Author: Andy Hall, a16z research consultant and professor of political economy at Stanford University; Source: a16z crypto; Compiled by: Shaw, Golden Finance
Last year, more than $6 million was traded in prediction market contracts on the outcome of Venezuela’s presidential election. But once the votes were counted, the market faced a dilemma: the government declared Nicolás Maduro the winner, while the opposition and international observers said the election was fraudulent. Should the contracts settle on “official information” (Maduro wins) or on “a consensus of credible reporting” (the opposition wins)?
In the Venezuela case, observers leveled a range of accusations: that the rules were ignored and participants’ “money was stolen”; that the dispute resolution mechanism acted as “judge, jury and executioner” in a high-stakes political farce; and that the process was “seriously rigged.”
This is not an isolated episode. It is a symptom of one of the biggest bottlenecks to scaling prediction markets: contract resolution.
The stakes are high. When resolutions are accurate, people trust the market and are willing to trade in it, and prices become meaningful signals for society. When resolutions are wrong, trading becomes frustrating and unpredictable: participants may leave, liquidity risks drying up, and prices stop reflecting accurate forecasts of the event being predicted. Instead, prices begin to reflect a murky mix of the probability that the event actually occurs and traders’ beliefs about how a flawed resolution mechanism will rule.
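A toy model makes this mix concrete. Assume, as a simplification not taken from the original analysis, that traders are risk-neutral, there are no fees or time value, and the resolver’s behavior can be summarized by two numbers: the chance it rules YES when the event happened, and the chance it rules YES when it did not. A YES share then trades at the probability of a YES *ruling*, not the probability of the event. The sketch below (function names are hypothetical) computes that price and, when the resolver’s tendencies are known, inverts it to recover the event probability.

```python
def implied_yes_price(p_event: float, p_yes_if_true: float, p_yes_if_false: float) -> float:
    """Price of a YES share = probability the contract *resolves* YES.

    p_event:        traders' belief that the event actually occurs
    p_yes_if_true:  chance the resolver rules YES when the event occurred
    p_yes_if_false: chance the resolver rules YES when it did not
    """
    for p in (p_event, p_yes_if_true, p_yes_if_false):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_event * p_yes_if_true + (1.0 - p_event) * p_yes_if_false


def implied_event_probability(price: float, p_yes_if_true: float, p_yes_if_false: float) -> float:
    """Invert the price to recover the event probability, given known resolver tendencies."""
    spread = p_yes_if_true - p_yes_if_false
    if spread <= 0.0:
        # This sketch only handles resolvers that lean toward YES when the event
        # occurs; with equal rates the price says nothing about the event.
        raise ValueError("resolver must favor YES when the event occurs")
    q = (price - p_yes_if_false) / spread
    return min(1.0, max(0.0, q))  # clip prices outside the feasible range


if __name__ == "__main__":
    # Reliable resolver: price and event probability coincide.
    print(implied_yes_price(0.7, 1.0, 0.0))  # 0.7
    # Noisy resolver: the same belief about the event yields a lower price.
    price = implied_yes_price(0.7, 0.6, 0.1)
    print(round(price, 3))  # 0.45
    print(round(implied_event_probability(price, 0.6, 0.1), 3))  # 0.7
```

The inversion works only when the resolver’s tendencies are known and stable; with an unpredictable resolver, traders cannot separate the two components of the price.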
Although the Venezuela dispute drew wide attention, similar failures occur on other platforms as well:
The Ukraine map manipulation incident shows that adversaries can game resolution mechanisms directly. A contract on control of territory specified that it would resolve according to a particular online map. The map was allegedly edited to influence the contract’s outcome. When your source of truth can be tampered with, so can your market.
The government shutdown contract shows that a resolution source can be inaccurate or unpredictable. The rules stated that the market would settle based on when the Office of Personnel Management (OPM) website indicated that the shutdown had ended. President Trump signed the appropriations bill on November 12, but for unexplained reasons the OPM website was not updated until November 13. Traders who correctly predicted that the shutdown would end on the 12th lost their bets because of the website administrators’ delayed update.
The Zelensky suit market raised concerns about conflicts of interest. The contract asked whether Ukrainian President Volodymyr Zelensky would wear a suit at a particular event, a seemingly innocuous question that attracted more than $200 million in bets. When Zelensky appeared at the NATO summit in what the BBC, the New York Post, and other outlets described as a suit, the market initially resolved “yes.” However, holders of the Universal Market Access (UMA Protocol) token disputed the result, and the final resolution was “no.”
In this article, I explore how large language models (LLMs), combined with cryptography, can help us build resolution mechanisms for prediction markets at scale that are hard to manipulate while also being accurate, fully transparent, and credibly neutral.
Similar problems plague traditional financial markets. For years, the International Swaps and Derivatives Association (ISDA) has grappled with resolution difficulties in the credit default swap market (contracts that pay out when a company or country defaults on its debt), and its 2024 review candidly acknowledges them. ISDA’s Determinations Committees, made up of major market participants, vote on whether a credit event has occurred. But like UMA’s process, they have been criticized for opacity, potential conflicts of interest, and inconsistent outcomes.
The underlying problem is the same: when large sums depend on deciding ambiguous situations, every resolution mechanism becomes a target, and every ambiguity becomes a potential flashpoint.
So what should a good resolution mechanism look like?
Any viable solution needs to deliver several key properties at once:
Resistance to manipulation. If an adversary can influence outcomes by editing Wikipedia, spreading fake news, bribing authorities, or exploiting bugs, the market becomes a contest of manipulation rather than prediction.
Reasonable accuracy. The mechanism must get the right answer most of the time. Perfect accuracy is impossible in a world of genuine ambiguity, but systematic errors or obvious mistakes seriously damage credibility.
Upfront transparency. Traders need to understand exactly how resolution will work before they place a bet. Changing the rules while trading is under way violates the basic agreement between the platform and its participants.
Credible neutrality. Participants need to trust that the mechanism does not favor any particular trader or outcome. This is why it is so problematic for large UMA token holders to resolve contracts they have bet on: even if they act fairly, the appearance of a conflict of interest undermines trust.
Human committees can satisfy some of these properties but fall short on others, especially at scale, where resisting manipulation and maintaining credible neutrality become difficult. Token-based voting systems like UMA also suffer from well-known problems of whale dominance and conflicts of interest.
This is where artificial intelligence (AI) comes in handy.
In the prediction market world, one proposal is gaining traction: using LLMs as judges, with the specific model and prompt locked on-chain when the contract is created.
The basic architecture works like this: at contract creation, the market creator specifies not only the resolution criteria in natural language but also the exact LLM (identified by a timestamped model version) and the exact prompt that will be used to determine the outcome.
This specification is cryptographically committed to the blockchain. When trading opens, participants can inspect the complete resolution mechanism: they know exactly which AI model will judge the outcome, what prompt it will receive, and which information sources it can access.
If they don’t like how the market will be resolved, they don’t trade.
At resolution time, the committed LLM runs with the committed prompt, consults the specified information sources, and renders a judgment. That judgment determines who gets paid.
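A minimal sketch of this commit-then-resolve flow follows. It assumes the on-chain commitment is a SHA-256 digest of a canonical JSON encoding of the specification (one reasonable choice; the proposal does not prescribe a format), and it replaces the actual LLM call with a `judge` callable, since no particular model API is specified. `ResolutionSpec`, `commitment`, `resolve`, the example model ID, and the rule that any answer other than a clean YES or NO resolves as INVALID are all hypothetical illustrations.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ResolutionSpec:
    """Everything traders can inspect before the market opens (hypothetical schema)."""
    question: str
    criteria: str              # natural-language resolution criteria
    model_id: str              # exact, timestamped model version
    prompt: str                # exact prompt the judge will receive
    sources: Tuple[str, ...]   # information sources the judge may consult


def commitment(spec: ResolutionSpec) -> str:
    """SHA-256 of a canonical JSON encoding; this digest is what gets stored on-chain."""
    canonical = json.dumps(asdict(spec), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# judge(model_id, prompt, sources) -> raw text answer; a stand-in for a real LLM call.
Judge = Callable[[str, str, Tuple[str, ...]], str]


def resolve(spec: ResolutionSpec, onchain_digest: str, judge: Judge) -> str:
    """Check the spec against the on-chain commitment, then run the committed judge."""
    if commitment(spec) != onchain_digest:
        raise ValueError("spec does not match the on-chain commitment")
    answer = judge(spec.model_id, spec.prompt, spec.sources).strip().upper()
    # Map anything other than a clean YES/NO to INVALID instead of guessing.
    return answer if answer in ("YES", "NO") else "INVALID"


if __name__ == "__main__":
    spec = ResolutionSpec(
        question="Did event X happen before the deadline?",
        criteria="Resolves YES if a majority of listed sources report that X occurred.",
        model_id="example-model-2025-01-01",  # hypothetical identifier
        prompt="Using only the listed sources, answer YES or NO: did X happen?",
        sources=("https://example.com/a", "https://example.org/b"),
    )
    digest = commitment(spec)  # published when the contract is created

    def stub_judge(model_id: str, prompt: str, sources: Tuple[str, ...]) -> str:
        return " yes\n"  # pretend model output

    print(resolve(spec, digest, stub_judge))  # YES
```

Because the digest is published at creation, anyone can recompute it and detect a change to the model, prompt, or sources after trading opens.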
This approach addresses several key constraints simultaneously:
Strong, though not absolute, resistance to manipulation. Unlike a Wikipedia page or a small news site, the output of a major LLM cannot easily be edited. The weights of the committed model version are fixed in advance. To manipulate the result, an adversary would need either to compromise the information sources the model relies on or to poison its training data well in advance, both of which are costly and risky compared with bribing an oracle or tampering with a map.
High accuracy. Reasoning models are rapidly improving at a wide range of complex tasks, especially when they can browse the web for new information, so LLMs should be able to resolve many market questions accurately; experiments to measure their accuracy are ongoing.
Full transparency. The entire resolution mechanism is visible and auditable before anyone places a bet. No mid-game rule changes, no discretionary rulings, no backroom dealing. You know exactly what you’re getting into.
Significantly improved credible neutrality. An LLM has no financial stake in the outcome, cannot be bribed, and holds no UMA tokens. Its biases, whatever they are, are properties of the model itself rather than the product of ad hoc decisions by interested parties.
Of course, LLM judges have limitations of their own, which I outline below:
Models make mistakes. An LLM might misread a news report, hallucinate facts, or apply resolution criteria inconsistently. But as long as traders know which model will be used, they can account for its flaws. If a model handles ambiguous cases in a characteristic way, experienced traders will adjust for it. The model does not have to be perfect; it has to be predictable.
Manipulation is harder, not impossible. If the prompt names particular news sources, an attacker can try to plant stories in them. Such an attack would be costly against major outlets but could work against smaller ones, a variant of the map-tampering problem. Prompt design is therefore critical: mechanisms that draw on diverse, redundant sources are more robust than those with a single point of failure.
Poisoning attacks are possible in theory. A well-resourced adversary could try to influence an LLM’s training data to sway its future judgments. But this requires acting long before the contract is created, for uncertain and costly payoffs, which is much harder than bribing a committee member.
A proliferation of models can create coordination problems. If different markets use different LLMs and prompts, liquidity fragments, and it becomes hard for traders to compare contracts or aggregate information across markets. Standardization matters, but so does letting the market discover which model-prompt combinations work best. The right approach is probably both: allow experimentation, while creating mechanisms that let the market converge over time on well-tested defaults.
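To illustrate why redundant sources help, here is a hypothetical aggregation step that resolves only when a supermajority of sources agree and otherwise returns INVALID for a fallback process. In an LLM-judged market this logic would more likely be expressed in the prompt; writing it out explicitly just shows the effect. The function name, the two-thirds threshold, and the source labels are assumptions for illustration, and the approach helps only if the sources are genuinely independent.

```python
from collections import Counter
from typing import Dict


def aggregate_verdicts(verdicts: Dict[str, str], quorum: float = 2 / 3) -> str:
    """Combine per-source YES/NO readings into a single outcome.

    verdicts: source name -> "YES", "NO", or anything else (treated as unclear)
    quorum:   fraction of *all* sources that must agree; must exceed 1/2 so
              two outcomes can never both qualify
    """
    if not 0.5 < quorum <= 1.0:
        raise ValueError("quorum must be in (0.5, 1]")
    if not verdicts:
        return "INVALID"
    votes = Counter(v for v in verdicts.values() if v in ("YES", "NO"))
    if not votes:
        return "INVALID"
    outcome, count = votes.most_common(1)[0]
    return outcome if count / len(verdicts) >= quorum else "INVALID"


if __name__ == "__main__":
    readings = {"major_outlet_a": "YES", "major_outlet_b": "YES", "small_site": "NO"}
    print(aggregate_verdicts(readings))  # YES: one planted story cannot flip it
    readings["small_site_2"] = "NO"
    print(aggregate_verdicts(readings))  # INVALID: 2 of 4 falls short of quorum
```

A single-source design would simply adopt whatever the one named source says, which is exactly the map-tampering failure mode.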
In short, AI-based resolution trades one set of problems (human bias, conflicts of interest, opacity) for another (model limitations, engineering challenges, vulnerable information sources) that may be easier to solve. So how do we move forward? Platforms should:
Build a track record by testing LLM resolution on low-stakes contracts. Which models perform best? Which prompt structures are most robust? What failure modes show up in practice?
Standardize. As best practices emerge, the industry should work toward standardized model-prompt combinations as defaults. This need not stifle innovation; it helps concentrate liquidity in markets that are easy to understand.
Build transparency tools, such as interfaces that show traders the complete resolution mechanism (the model, the prompt, and the information sources) before they trade. Resolution rules should not be buried in obscure fine print.
Ongoing governance. Even with AI judgment, humans still need to make meta-level decisions: which models to trust, how to handle situations where a model gives an obviously wrong answer, and when to update default settings. The goal is not to exclude humans entirely, but to guide them away from ad hoc case-by-case judgments and towards systematic rule-making.
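The first recommendation, building a track record, is largely bookkeeping. A minimal sketch, assuming each low-stakes contract is later checked against a reference outcome (for example, a human audit; the source does not say how ground truth would be established): tally accuracy and INVALID rates per model-prompt combination. The record format, function name, and identifiers are hypothetical, and the sample records are toy values, not real results.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Record(NamedTuple):
    model_id: str
    prompt_id: str
    llm_outcome: str        # what the LLM judge ruled: "YES", "NO", or "INVALID"
    reference_outcome: str  # outcome established afterwards, e.g. by audit


def score_configs(records: Iterable[Record]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Count, accuracy, and INVALID rate for each (model, prompt) combination."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    correct: Dict[Tuple[str, str], int] = defaultdict(int)
    invalid: Dict[Tuple[str, str], int] = defaultdict(int)
    for r in records:
        key = (r.model_id, r.prompt_id)
        totals[key] += 1
        correct[key] += int(r.llm_outcome == r.reference_outcome)
        invalid[key] += int(r.llm_outcome == "INVALID")
    return {
        key: {"n": n, "accuracy": correct[key] / n, "invalid_rate": invalid[key] / n}
        for key, n in totals.items()
    }


if __name__ == "__main__":
    # Toy records for illustration only.
    records = [
        Record("model-a", "prompt-v1", "YES", "YES"),
        Record("model-a", "prompt-v1", "INVALID", "NO"),
        Record("model-b", "prompt-v1", "NO", "NO"),
        Record("model-b", "prompt-v1", "YES", "YES"),
    ]
    for key, stats in sorted(score_configs(records).items()):
        print(key, stats)
```

Tracking the INVALID rate separately from accuracy helps distinguish a judge that is wrong from one that declines to decide, two failure modes that call for different fixes.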
***
Prediction markets have enormous potential to help us understand a complex world. But that potential depends on trust, which in turn depends on fair contract resolution. We’ve seen what failed resolution mechanisms produce: confusion, anger, and departing traders. I’ve watched people quit prediction markets entirely after feeling cheated by a resolution that seemed to contradict the spirit of their bet, swearing off platforms they had previously loved. Each such case is a missed opportunity to realize the benefits of prediction markets and extend them to wider applications.
LLMs are not perfect. But combined with cryptography, they can be transparent, neutral, and resistant to the kinds of manipulation that plague human-based systems. At a time when prediction markets are scaling far faster than their governance mechanisms, this may be exactly what we need.