One morning in Palo Alto, just after the coffee arrived, Alan Walker glanced down at Anthropic's harness article, looked up, and said just one thing:
"Many people think this is just another step forward for the model. Wrong. This is the beginning of process betraying people."
On the surface, the article is about engineering design: a planner, a generator, an evaluator, and how to keep Claude running for several hours to build more complex products.
Most people stop reading there. Their takeaway is:
Oh, so agents have become more complex, prompts longer, and workflows more detailed.
But Alan said that what really deserves attention is never the surface functionality, but which layer the power is being transferred to.
In the past, to complete a complex task, a person had to break down the requirements, do the work, check it, rework it, and take final responsibility for all of it.
What Anthropic is doing now is not making the model more like a smart employee, but letting the entire system begin to take over the organizing, supervising, and acceptance authority that used to belong to humans.
The harness is not a plug-in. The harness is the machine beginning to grow a "management layer".
This is where it gets really scary.
Many people's first reaction when they see a harness is: isn't this just another agent framework?
That reading is too shallow.
The essence of an ordinary tool is to listen for a command and then execute it. You click, it works. If you say nothing, it does nothing.
But a harness no longer follows this logic. What it really does is turn the division of labor that used to be hidden inside human teams into software:
Who understands the requirements, who breaks them into stages, who executes, who checks, and who has the authority to order a redo when problems are found.
In other words, Anthropic is not piling on more features, but writing "how to organize work" itself into the system.
Why is this step important? Because in the past, the most difficult thing to copy was never single-point capabilities, but organizational capabilities.
There are many people who can write code.
There are very few people who can organize more than a dozen people, a dozen steps, and a dozen rounds of rework, and finally deliver it stably.
And what the harness touches is exactly this most expensive layer.
Tools improve efficiency; organization determines output.
A single model is just labor; a harness is starting to look like a corporate structure.
When AI not only knows how to do the work but also begins to divide it, hand it over, and hold others accountable, this is no longer as simple as a "tool upgrade".
The most misleading thing about a model is that it always seems very smart on short tasks.
Ask it a question, and the answer is clear and logical; ask it to write a piece of code, and the result is usually decent. So many people mistakenly conclude that if it can handle short tasks, long tasks are just a matter of running longer.
Not at all.
The really hard part of a long task is never being unable to do a particular step; it is getting through dozens of consecutive steps without distortion, loss of control, or self-deception.
The same goes for humans working on projects. The biggest danger is not lacking the skill, but the chaos that sets in later:
The requirements are no longer remembered clearly,
The goal starts to drift,
The logic stops being consistent,
And in the end, what gets done best is not finishing the work, but writing a summary that looks as if it were finished.
The core issue mentioned in Anthropic’s article is essentially this:
Models gradually lose their soul during long-running tasks. The longer the context, the more chaotic the state, and the easier it is to slip prematurely into the illusion of being "almost done".
The value of the harness is not to make the model more intelligent, but to make it less scattered, less hollow, and less able to bluff its way through.
Decomposition into stages, handovers, contracts, independent evaluation, and rollback on failure may look like process details, but they all solve the same underlying problem:
Intelligence can be unstable, but delivery cannot be left to chance.
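The mechanisms listed above (stages, handover, contract, independent evaluation, rollback) can be sketched as one small loop. This is only an illustration under stated assumptions, not Anthropic's implementation: every name here (`Stage`, `Report`, `run_harness`, `toy_generate`) is hypothetical, and the `generate` callable stands in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: plain callables stand in for model calls.
State = dict[str, str]


@dataclass
class Stage:
    name: str
    # The contract: named checks agreed on before any work starts.
    contract: dict[str, Callable[[State], bool]]


@dataclass
class Report:
    stage: str
    attempts: int
    accepted: bool
    failed_checks: list[str] = field(default_factory=list)


def evaluate(stage: Stage, state: State) -> list[str]:
    """Independent evaluation: return the names of the contract checks that fail."""
    return [name for name, check in stage.contract.items() if not check(state)]


def run_harness(stages, generate, max_attempts=3):
    accepted: State = {}  # last state that passed evaluation
    reports = []
    for stage in stages:
        failed: list[str] = []
        ok = False
        for attempt in range(1, max_attempts + 1):
            # Handover: the generator gets a copy of the accepted state
            # plus the evaluator's feedback from the previous attempt.
            candidate = generate(stage, dict(accepted), failed)
            failed = evaluate(stage, candidate)
            if not failed:
                accepted = candidate  # commit only work that passed
                ok = True
                break
            # Rollback: the rejected candidate is dropped; `accepted` is untouched.
        reports.append(Report(stage.name, attempt, ok, failed))
        if not ok:
            break  # stop instead of building on unaccepted work
    return accepted, reports


def toy_generate(stage, state, feedback):
    # Toy generator: produces a draft first, fixes it once it receives feedback.
    state[stage.name] = "done" if feedback else "draft"
    return state


stages = [
    Stage("schema", {"schema_done": lambda s: s.get("schema") == "done"}),
    Stage("api", {
        "api_done": lambda s: s.get("api") == "done",
        "schema_intact": lambda s: s.get("schema") == "done",
    }),
]
final_state, reports = run_harness(stages, toy_generate)
```

In this toy run each stage is rejected once and accepted on the second attempt. The design choice worth noting is that the generator never decides for itself that a stage is finished: only the contract checks do.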
So if you really want to understand harness, you must understand one thing first:
The real value in the future is not who can occasionally create an amazing demo.
It is who can keep the system pushing things forward for hours, days, or even longer without leaving them half-finished.
Being able to write is not remarkable.
Writing all the way to the end without collapsing is.
A flash of inspiration is not valuable, but stable delivery is valuable.
Alan said that the coldest part of Anthropic's design is not the planner or the generator, but the evaluator.
Why?
Because large models have a problem very similar to a human one: they always feel good about what they made themselves.
As long as there are no external constraints, it is easy for a model to rate its own work as "generally good", "basically completed", or "core functions are already available".
The problem is that this kind of evaluation is often not a lie, but a kind of systemic self-forgiveness.
Why do so many projects in human companies end up failing?
Because the people who do the work are often the best at making excuses for themselves.
The person who built it says it is almost done,
The person who signs off is too lazy to look closely,
So an "almost" thing gets waved through all the way, and finally blows up in the users' hands.
The ruthless part of Anthropic's approach is to split this apart outright:
Doing the work is one role,
Finding fault is another.
The former is responsible for advancement, and the latter is responsible for doubt.
The logic behind this is very deep:
Once the right to produce and the right to evaluate are separated, the system truly begins to form a closed loop.
What is even scarier is that Anthropic does not just let the evaluator remark "this part doesn't feel right". It tries to structure fault-finding as much as possible:
Features have to be tested, the key points of the page and the interfaces have to be checked, the database state has to be verified, and even design quality is broken down into scoreable dimensions.
What does this mean?
It means that many judgment calls humans used to treat as mystical are being broken down, bit by bit, into processes, standards, and thresholds.
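Turning a judgment into "scoreable dimensions" with thresholds can be pictured with a small rubric. The sketch below is an assumption for illustration only: the dimension names, the 0 to 10 scale, and the threshold values are invented here and do not come from Anthropic's article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    threshold: int  # minimum acceptable score on a 0-10 scale


# Hypothetical rubric: names and thresholds are invented for illustration.
RUBRIC = [
    Criterion("functionality", 8),
    Criterion("api_behaviour", 7),
    Criterion("database_state", 7),
    Criterion("design_quality", 6),
]


def verdict(scores, rubric=RUBRIC):
    """Return (passed, shortfalls).

    shortfalls maps each failing criterion to (score, threshold).
    A criterion with no score is treated as 0: missing evidence fails.
    """
    shortfalls = {}
    for criterion in rubric:
        score = scores.get(criterion.name, 0)
        if score < criterion.threshold:
            shortfalls[criterion.name] = (score, criterion.threshold)
    return (not shortfalls, shortfalls)


passed, shortfalls = verdict(
    {"functionality": 9, "api_behaviour": 7, "design_quality": 5}
)
```

Here the work is rejected for two reasons: nobody checked the database state, and design quality scored below its threshold. "I feel it is fine" has no field to be entered into, which is the point.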
The first thing to be automated is often not physical strength, but fault-finding.
Once "Is this good enough?" is turned into a process, many people's moat of experience will begin to leak.
In the past, many positions were truly valuable not because they could produce, but because they held the right to say whether something passed.
Now, this power is beginning to loosen from people's hands.
When they see this kind of article, many people's reflex is to ask: are programmers doomed?
Alan said that this kind of question is too superficial and too lazy.
What the harness eats first is not any particular job title.
What it eats first is a long-standing way of getting by that is common to almost all knowledge work:
The requirements are unclear, but start anyway;
Something went wrong midway, patch it later;
The result is mediocre, but it runs;
The document is unclear, but everyone on the team gets it;
Ship first, fix problems later.
To put it bluntly, this is a set of working methods based on fuzzy space and human flexibility.
Many projects move forward not because the process is really clear, but because there is always someone in the middle filling the holes with experience, stepping in to cover gaps, and making judgment calls on the spot.
Harness is doing exactly the opposite.
It compresses the fuzzy space.
It compresses the interface space.
It compresses the room in which "I assumed", "almost", and "should be fine" survive.
First define what counts as done for this round, and only then allow work to start;
If the result is unsatisfactory, send it back;
If the tests fail, keep going;
No gut feelings, only evidence.
Once this logic is pushed forward, the person most at risk will never be the one who is best at writing code, but the one who depends most on gray areas to survive.
Harness doesn’t eat programmers, it eats ambiguity first.
Not everyone will be replaced, but every position that survives on ambiguity will be devalued first.
In the past, many positions survived on information asymmetry. In the future, many will die by standard deviation.
Many people will ask: workflow-style systems like this have been built before, so why are people taking them seriously this time?
Because the base models were not strong enough before.
To put it more bluntly:
In the past, many such frameworks looked beautiful but were heavy to run and turned out not to be solid enough.
You build a bunch of processes, a bunch of roles, and write a bunch of rules. In the end, you just package an unreliable model into a more complex and unreliable system.
So in the past, many people lost patience with agents, workflows, and scaffolds, which is normal.
It's not that the direction was wrong; it's that the chassis had not reached that stage.
It's different now.
Once the model crosses a certain threshold, many processes that originally looked like decoration begin to release real value for the first time.
Because once the base model is strong enough, the process is no longer propping up something useless, but amplifying a system that can already work continuously.
This is why the harness suddenly seems "kind of real" now.
It is not that the concept appeared only today, but that the model is finally strong enough to reap the dividends of process.
Alan put it precisely:
Model capability is the engine and the harness is the gearbox.
There was no good engine before, and no matter how good the gearbox was, it was just a decoration.
But when the engine is powerful enough, the gearbox begins to decide who can hit the highway and who is still slamming the accelerator.
So this wave is not just a technology trend; it is the industry sending a deeper signal:
The competition in the future is not just over who has the stronger model, but over who can first build the model into a production system.
Finally, Alan put his cup down and said the coldest sentence of the day:
"In the past, people watched software work. In the future, software will watch software work."
Why does this sentence cut so deep?
Because it lays bare that what the harness is really rewriting is not any particular position, but a lower-level premise that almost no one has questioned:
In digital labor, a person stands in the middle by default.
That person breaks down the task,
That person monitors progress,
That person judges quality,
That person coordinates rework,
That person has the final say.
This "person standing in the middle by default" may be called a programmer, PM, TL, design leader, QA, or project manager.
The name is not important.
What matters is that, in the past, the entire digital production system depended by default on such a human center.
What the harness really touches is this central position.
It is not about driving people out today, but about proving, bit by bit:
It turns out that some decomposition can be done by the system,
It turns out that some supervision can be done by the system,
It turns out that some acceptance can be done by the system,
It turns out that some rollbacks and retries can be handled without a person noticing the problem first.
As this is proven more and more often, the human position will not disappear all at once, but it will begin to sink.
From default center to exception handler;
From monitoring the whole process to handling only edge cases;
From process owner to process observer.
This is what the harness actually eats.
Not a programmer.
Not a product manager.
Not QA.
But the deeper assumption behind these roles:
Human beings are by default at the center of the process.
Once this premise starts to loosen, the subsequent stories will be different.
In the age of tools, the contest was over who used tools best.
In the harness era, it is over who is first to accept this:
You are no longer naturally at the center of the system.