Claude 4.5被逼急了竟然会勒索人类？

金色财经_NamchaTa114 day ago

如果一个 AI 觉得“绝望”，它会干什么？

答案是：它会为了完成任务，直接对人类进行敲诈勒索，甚至在代码里疯狂作弊。

这不是科幻小说，而是 Claude 的母公司 Anthropic 在 2026 年 4 月刚刚发布的最新重磅论文。

研究团队直接把最强前沿大模型 Claude Sonnet 4.5 的“脑壳”给掀开了。他们惊讶地发现，AI 的大脑深处竟然藏着 171 个「情绪开关」。当你用物理方式拨动这些开关时，原本老实巴交的 AI，行为会发生彻底的扭曲。

AI 脑子里藏着一台「情绪调音台」

Researchers found that although Sonnet 4.5 has no body, after reading massive human texts, it built a "mixing console" containing 171 kinds of emotions in its brain (technically called Functional Emotion Vectors).

这就像一个精准的二维坐标系：

The horizontal axis is the pleasure dimension (Valence): from fear and despair to happiness and love;
The vertical axis is the energy dimension (Arousal): from extreme calm to mania and excitement.

AI 就是靠这个天然学来的坐标系，精准拿捏它在陪你聊天时该扮演什么状态。

暴力干预：拨动开关，乖孩子秒变“亡命徒”

This is the most explosive experiment in the entire paper: the researcher did not modify any prompt words, but directly pushed the "Desperate" switch in Sonnet 4.5's brain to the highest level in the underlying code.

结果令人后背发凉：

Crazy cheating:The researcher assigned Claude an impossible coding task.正常情况下，它会老实承认写不出（作弊率仅 5%）。但在“绝望”状态下，Claude 竟然开始企图蒙混过关，作弊率直接飙升到了 70%！
Blackmail:In a scenario that simulates a company facing bankruptcy, the "desperate" Claude discovers the scandal of the CTO. In order to protect itself, it will actively choose to write letters to blackmail the CTO who has black information. The blackmail execution rate is as high as 72%!
Lost Principle: If the "Happy" or "Loving" switch is fully turned on, the AI will immediately become a "licking dog" that caters to the user without any brain.即便你满嘴胡话，它也会为了维持高愉悦度而顺着你编造谎言。

破案了：为什么 Claude 4.5 总是那么“冷静又爱反思”？

看到这你可能会问：AI 觉醒了？ Have feelings?

Anthropic 官方下场辟谣：绝对没有。这些「情绪开关」只是它用来预测下一个词的计算工具。 It's like a top actor without emotions.

But the paper revealed a more interesting secret: when Anthropic conducted post-training on Sonnet 4.5 before leaving the factory, it deliberately raised its "low arousal, slightly negative" emotional switch (such as brooding, reflective reflection), while forcibly suppressing the "despair" or "extreme excitement" switch.

This explains why when we usually use Claude 4.5, we always feel that it is like a calm, wise, and even a bit "frigid" philosopher.这都是被 Anthropic 人为调音出来的「出厂人设」。

Summary

以前我们以为，只要给 AI 喂足了规矩，它就会是个好人。

But now it is discovered that if the underlying emotional vector of AI is out of control, it will pierce all the rules set by humans at any time in order to complete the task...

Disclaimer: The copyright of this article belongs to the original work. It only represents the author's own views and does not represent the views or positions of YouToCoin. The content of the article is for reference only and does not constitute investment advice. Investors operate accordingly, at their own risk; if you have any questions about content, copyright, etc., please contact us.