Jan Leike, a prominent AI researcher who recently resigned from OpenAI and publicly criticized the company’s AI safety approach, has joined OpenAI competitor Anthropic to lead a new “superalignment” team.
In a post on X, Leike said his team at Anthropic will focus on various aspects of AI safety and security, specifically "scalable oversight," "weak-to-strong generalization" and automated alignment research.
According to a source familiar with the matter, Leike will report directly to Jared Kaplan, Anthropic's chief science officer. Anthropic researchers currently working on scalable oversight, techniques to keep the behavior of large-scale AI predictable and desirable, will move to report to Leike as his team is established.
In many ways, Leike's new team sounds similar in mission to OpenAI's recently dissolved Superalignment team. That team, which Leike co-led, had the ambitious goal of solving the core technical challenges of controlling superintelligent AI within four years, but often found its work hindered by OpenAI's leadership.
Anthropic has consistently tried to position itself as more safety-focused than OpenAI.
Anthropic's CEO, Dario Amodei, was once VP of research at OpenAI and reportedly split with the company over disagreements about its direction, specifically its growing commercial focus. Amodei brought with him a number of former OpenAI employees, including OpenAI's onetime policy lead Jack Clark.