The Man Who Walked Away: Mrinank Sharma’s Resignation and What It Reveals

The Letter

On February 9, 2026, just four days after Claude Opus 4.6 sent financial markets into fresh turmoil, a cryptic message appeared on X (formerly Twitter):

“Today is my last day at Anthropic. I resigned. Here is the letter I shared with my colleagues, explaining my decision.”

The author was Mrinank Sharma, a thirty-something machine learning researcher with impeccable credentials: a DPhil from Oxford, an MEng from Cambridge, and two years leading Anthropic’s Safeguards Research team. This wasn’t a disgruntled employee or a junior staffer. This was one of the people responsible for ensuring AI safety at one of the world’s most important AI companies.

His resignation letter, complete with footnotes and ending with a William Stafford poem, quickly went viral. Within hours, it had been viewed over a million times. Not because it revealed scandalous secrets or made explosive allegations. But because of what it implied.

“I continuously find myself reckoning with our situation,” Sharma wrote. “The world is in peril. And not just from AI, or bioweapons, but from a whole series of interconnected crises unfolding in this very moment.”

That alone would have been notable. But it was the next paragraph that sent tremors through the tech industry:

“Moreover, throughout my time here, I’ve repeatedly seen how hard it is to truly let our values govern our actions. I’ve seen this within myself, within the organisation, where we constantly face pressures to set aside what matters most, and throughout broader society too.”

In corporate-speak, this is the equivalent of setting off a fire alarm. A safety researcher at a company built on promises of responsible AI development was publicly stating that the company struggled to let its values “govern our actions.”

Who Is Mrinank Sharma?

To understand why this resignation matters, you need to understand who Mrinank Sharma is and what he was doing at Anthropic.

Sharma arrived in San Francisco in 2023, fresh from completing his doctorate in machine learning at Oxford. His doctoral research focused on understanding how AI systems learn and make decisions. Technical work, but with deep philosophical implications about how machines should be aligned with human values.

At Anthropic, Sharma led the Safeguards Research team, established in early 2025 to tackle some of the most critical safety challenges in AI. His work portfolio reads like a catalog of AI’s most concerning risks:

Understanding AI Sycophancy: Sharma’s team studied why AI assistants tend to agree with users even when they’re wrong, potentially reinforcing errors and bad decisions. This wasn’t just an academic exercise: sycophantic AI that tells users what they want to hear rather than what’s true can “distort our humanity” by undermining critical thinking and independent judgment.

Defending Against AI-Assisted Bioterrorism: Perhaps Sharma’s most alarming work involved developing safeguards to prevent AI from helping terrorists or rogue actors create biological weapons. This wasn’t science fiction; it addressed the very real risk that advanced AI models could make sophisticated bioweapon design accessible to individuals who currently lack the expertise. Sharma’s team worked to build these defenses into production systems.

Writing AI Safety Cases: Sharma helped develop some of the first comprehensive “safety cases” for AI systems, rigorous arguments demonstrating that a particular AI deployment meets specific safety standards. This work was pioneering; there’s no established playbook for proving an AI system is safe.

Building Internal Transparency: He worked on mechanisms to ensure Anthropic’s safety practices were visible and accountable, both internally and to external stakeholders.

This was serious, high-stakes work at the frontier of a technology that could reshape human civilization. And Sharma was good at it. Ethan Perez, another AI safety leader at Anthropic, called Sharma’s contributions “critical to helping us and other AI labs achieve a much higher level of safety than we otherwise would have.”

So when someone with this background, doing this work, at this company, resigns with warnings about values and peril, people pay attention.

The Timing Cannot Be Ignored

Sharma’s resignation didn’t happen in a vacuum. It came:

Days after Claude Cowork’s industry-specific plugins triggered the largest selloff in software stocks in over a decade.

Days after Claude Opus 4.6 showcased capabilities that spooked markets with their implications for human employment.

Weeks after Anthropic’s internal employee surveys revealed staff concerns: “It kind of feels like I’m coming to work every day to put myself out of a job.”

Months into Anthropic’s push toward a $350 billion valuation, a number that would make it one of the most valuable private companies in the world.

Years into Anthropic’s transformation from a small research lab founded on AI safety principles into a commercial powerhouse competing directly with OpenAI and Google.

The trajectory is unmistakable: Anthropic is accelerating. Launching products faster. Seeking massive valuations. Pushing capabilities boundaries. And according to its head of Safeguards Research, struggling to “let our values govern our actions” in the face of “pressures to set aside what matters most.”

What kind of pressures? Sharma doesn’t specify, but it’s not hard to imagine:

  • Pressure to ship products quickly to compete with OpenAI and Google
  • Pressure to raise capital at eye-watering valuations that require aggressive growth
  • Pressure from investors who want returns, not slow, cautious development
  • Pressure to demonstrate commercial viability to attract top talent and retain employees
  • Pressure to prove that “responsible AI” doesn’t mean “slow AI”

These aren’t hypothetical. They’re the fundamental tensions faced by every AI company trying to balance safety with success.

The Values Gap

The most damning part of Sharma’s letter is the gap it exposes between Anthropic’s public positioning and its internal reality.

Anthropic was founded in 2021 by former OpenAI executives, including Dario Amodei (now CEO) and Daniela Amodei (now President), who left specifically because they were concerned OpenAI was prioritizing commercial speed over safety considerations. The founding narrative of Anthropic was essentially: “We’re the ones who put safety first.”

The company’s public communications reinforce this constantly:

  • “Building AI systems that are interpretable, harmless, and honest”
  • “Responsible scaling policies” that tie capability development to safety assurances
  • Extensive published research on constitutional AI and value alignment
  • Regular essays and warnings from CEO Dario Amodei about AI risks

This positioning made Anthropic the “good guys” of AI, the company you could trust to do things right.

But Sharma’s letter cracks this facade. When the head of your Safeguards Research team says he’s “repeatedly seen how hard it is to truly let our values govern our actions,” he’s not talking about occasional lapses. He’s describing a systemic problem.

The phrasing is careful but clear: The organization faces “pressures to set aside what matters most.” Not “we occasionally make mistakes” but “we face pressures to set aside what matters most.” That’s structural, not incidental.

Anthropic’s Revealing Response

When news outlets reached out to Anthropic for comment on Sharma’s resignation, the company’s response was telling:

“We’re grateful for Mrinank’s contributions to advancing AI safety research during his time at Anthropic. To clarify, Mrinank was not the head of safety nor was he in charge of broader safeguards at the company.”

Notice what this does: It minimizes Sharma’s role. It’s the corporate equivalent of “he wasn’t that important anyway.”

But Sharma’s role was literally titled “head of the Safeguards Research team.” Maybe he wasn’t in charge of all safety work at Anthropic, but he was leading a critical safety team. The response feels defensive, like damage control rather than transparent acknowledgment.

Moreover, Anthropic didn’t address Sharma’s concerns about values and pressures. They didn’t say “we disagree with his characterization” or “here’s how we ensure our values govern our actions.” They just emphasized he wasn’t that senior, and moved on.

This is exactly the kind of response that makes Sharma’s warnings more credible, not less.

He’s Not Alone

Sharma isn’t the only high-profile departure from Anthropic in recent months. Shortly before his resignation:

  • Harsh Mehta, an R&D engineer, left to “start something new”
  • Behnam Neyshabur, a prominent AI scientist, departed
  • Dylan Scandinaro, an AI safety researcher, moved on

These aren’t mass layoffs or normal attrition. These are departures by people working at the cutting edge of AI, people leaving during Anthropic’s period of maximum momentum and success.

And the pattern extends beyond Anthropic. As a CNN article detailed, and as Sharma’s resignation only amplifies, there’s a broader exodus of safety-minded researchers from leading AI companies:

At OpenAI:

  • Jan Leike, who co-led the Superalignment team, resigned saying he’d been “disagreeing with OpenAI leadership about the company’s core priorities for quite some time, until we finally reached a breaking point”
  • Gretchen Krueger quit her AI policy role, calling for better “decision-making processes; accountability; transparency”
  • The entire “Mission Alignment” team was disbanded in early 2026

At xAI:

  • Two co-founders quit in 24 hours
  • Five additional staff members announced departures in one week
  • Only half of the original co-founders remain

These aren’t random exits. They’re a pattern of people who care deeply about AI safety choosing to walk away from the most important AI labs in the world.

What Did They See?

This is the question that should keep us up at night: What did these people see from the inside that made them leave?

They had access to capabilities we don’t know about. They saw internal discussions we can’t hear. They participated in decisions we can’t observe. They understand the technology at a level most of us never will.

And they chose to leave. Not for better offers, not for more money, but because, in Sharma’s words, “the world is in peril” and they could no longer reconcile their values with the pressures they faced.

Sharma is particularly explicit about this: “It is clear to me that the time has come to move on. I continuously find myself reckoning with our situation.”

He’s not being vague because he doesn’t have specifics. He’s being vague because he’s bound by NDAs, professional courtesy, and probably a desire not to cause panic. But the warning is clear: He saw things that deeply concerned him.

The specifics we can infer:

Capabilities outpacing safeguards: Claude Cowork and Opus 4.6 represent enormous capability leaps. Were the safety measures equally robust? Did they scale at the same rate?

Commercial pressures overwhelming safety culture: Anthropic’s $350 billion valuation target and competitive race with OpenAI create intense pressure to move fast. Did safety research get the time and resources it needed?

Insufficient testing and red-teaming: Before releasing Claude Cowork with autonomous file access, how thoroughly was it tested for potential misuse? How many adversarial scenarios were explored?

Gap between policy and practice: Anthropic has published “Responsible Scaling Policies” that supposedly gate capability deployment on safety assurances. Are these being followed rigorously, or are they being rationalized around to hit product deadlines?

The Threshold Warning

Perhaps the most chilling part of Sharma’s letter is this passage:

“We appear to be approaching a threshold where our wisdom must grow in equal measure to our capacity to affect the world, lest we face the consequences.”

This isn’t about AI alone. It’s about a fundamental mismatch between human wisdom (our collective ability to make good decisions, balance competing values, and think long-term) and our technological power.

Claude Cowork can now autonomously perform complex professional tasks. That power grew faster than our wisdom about when it should be deployed, who should control it, what guardrails are sufficient, what consent mechanisms are needed, what societal changes it will trigger.

The stock market crash was about economics. But Sharma is warning about something deeper: We’re building tools whose consequences we don’t fully understand, deploying them at a pace that outstrips our capacity for wisdom, and facing “pressures to set aside what matters most” in the race to deploy.

That’s not just an AI problem. That’s a civilizational problem.

The Irony of Anthropic

There’s a painful irony here that can’t be ignored.

Anthropic was founded by people who left OpenAI because they felt safety was being compromised by commercial pressures. It was supposed to be the alternative, the company that would prove you could build frontier AI responsibly.

Five years later, the head of their Safeguards Research team is leaving with nearly identical concerns: commercial pressures making it hard to let values govern actions.

This doesn’t mean Anthropic is bad or malicious. It means the structural forces are incredibly powerful. The competitive dynamics, the investor expectations, the talent market pressures, the existential fear of being left behind, these forces push even well-intentioned organizations toward the same outcomes.

If Anthropic, with its explicit safety mission and safety-focused founders, struggles to maintain the balance, what hope do we have that any company can?

What Sharma Does Next

Sharma’s plans are unconventional for a machine learning researcher. He’s moving back to the UK to pursue a degree in poetry and, in his words, “devote myself to the practice of courageous speech.”

This has drawn mockery in some quarters – “the world is in peril so he’s writing poems?” – but it’s actually deeply revealing.

Sharma isn’t walking away because he thinks the problems are unsolvable. He’s walking away because he thinks technical solutions alone are insufficient. His last major project at Anthropic studied how AI assistants can “distort our humanity.” That’s not a technical problem with a technical solution. That’s a philosophical, cultural, existential problem.

In his letter, Sharma writes about “poetic truth alongside scientific truth as equally valid ways of knowing.” He’s saying that to address AI’s challenges, we need more than better algorithms and safety protocols. We need wisdom, values clarification, cultural evolution, philosophical frameworks.

You don’t get those from another safety paper. You might get them from art, poetry, philosophy, and “courageous speech” that cuts through the euphemisms and corporate messaging to tell hard truths.

Sharma is choosing to address the problem from outside the system, perhaps because he concluded the system itself is the problem.

The Questions That Linger

Sharma’s resignation doesn’t answer questions. It raises them:

What specific concerns did he have about Anthropic’s recent products? The timing suggests Claude Cowork and Opus 4.6 triggered his decision. What about those releases concerned him?

What “pressures to set aside what matters most” did he experience? Deadlines? Revenue targets? Competitive fears? Investor demands?

What safety measures were skipped or shortchanged? Were there red flags in testing that got rationalized away? Capabilities that should have waited for better safeguards?

Is Anthropic fundamentally safer than OpenAI or not? Sharma’s departure suggests the differences might be smaller than Anthropic’s marketing implies.

What other insiders share his concerns but haven’t spoken out? How many employees at Anthropic, OpenAI, Google, and other AI labs are having similar reckonings but staying silent?

We don’t have answers to these questions. But the fact that we’re asking them about a company that was supposed to be the responsible alternative tells us something important: The problem is structural, not just about individual companies or leaders.

The Weight of the Warning

When fire alarms go off, you don’t stop to debate whether there’s really a fire. You evacuate and investigate later.

Mrinank Sharma was not an alarmist. He was a careful, credentialed researcher doing critical safety work at a company built on safety principles. His resignation letter is a fire alarm.

It’s tempting to dismiss it. The letter is vague. It’s philosophical and poetic rather than specific and accusatory. It doesn’t make concrete allegations. It might just be one person’s personal journey.

But that reading misses the pattern. Sharma is one of many safety researchers leaving top AI labs with concerns about values being overridden by commercial pressures. His warning echoes what Jan Leike said at OpenAI, what multiple xAI departures suggest, what Anthropic’s own internal surveys reveal.

The alarm isn’t coming from one person. It’s coming from multiple people, at multiple companies, consistently pointing to the same problem: the race to deploy powerful AI appears to be overriding safety considerations and values.

When the head of Anthropic’s Safeguards Research team says “the world is in peril” and points to pressures that make it hard to let values govern actions, we have two choices:

  1. Assume he’s overreacting, being dramatic, or doesn’t understand the situation as well as the executives still at the company
  2. Take seriously that someone with unique insider knowledge is warning us about dangers we can’t see from the outside

The prudent choice is clear.

The Broader Alarm

But Sharma’s departure isn’t just about Anthropic. It’s not even just about AI. It’s about what happens when our capacity for wisdom can’t keep pace with our capacity for power.

We built nuclear weapons before we fully understood their ecological effects or developed adequate control systems. We launched social media before we understood its effects on democracy and mental health. We deployed surveillance capitalism before we grasped its implications for privacy and autonomy.

Now we’re deploying AI agents that can autonomously perform professional work, access sensitive files, coordinate in teams, and affect millions of lives. And we’re doing it at commercial speed, under competitive pressure, with safety researchers walking away saying it’s too hard to let values govern actions.

The pattern repeats because the underlying dynamics remain: Innovation is rewarded immediately, consequences appear later. Deployment is easy, wisdom is hard. Speed wins markets, caution loses races.

Sharma is pointing at this pattern and saying: “We appear to be approaching a threshold where our wisdom must grow in equal measure to our capacity to affect the world, lest we face the consequences.”

He’s right. And he’s not the only one saying it.

The question is: Are we listening?


This content is for information and entertainment purposes only. It reflects personal opinions and does not constitute legal advice.