Discussion with the world’s leading AI safety researcher:
- Does he regret inventing RLHF?
- What do we want the post-AGI world to look like (do we want to keep gods enslaved forever)?
- Why he has relatively modest timelines (40% by 2040, 15% by 2030),
- Why he’s leading the push to get labs to develop responsible scaling policies, & what it would take to prevent an AI coup or bioweapon,
- His current research into a new proof system, and how this could solve alignment by explaining a model’s behavior,
- and much more.