The best AI models win gold medals at math Olympiads, but fail to read an analog clock half the time. The Stanford AI Index 2026 documents this paradox and its implications for companies deploying AI for sensitive tasks.
A large language model can win a gold medal at the International Mathematical Olympiad. The same model, when faced with an analog clock, will only tell the time correctly half the time.
This paradox has a name: the jagged frontier. It is now documented by the ninth annual AI Index report, published in April by the Stanford Institute for Human-Centered AI. And it changes what can be expected from AI in business.
A structural flaw, not a teething problem
Since 2017, the AI Index Report has been the most cited neutral reference by governments and businesses on the true state of artificial intelligence. The 2026 edition, over 400 pages across nine chapters, is authored by an independent committee comprising academics, industry professionals, and public experts.
Its main finding can be summarized in one sentence: large models are brilliant at times and flawed at others, and no one can predict when.
The clock example is not arbitrary. It illustrates what researchers call the jagged frontier: the same model excels at tasks humans consider extremely difficult, yet fails at tasks they deem trivial. No apparent logic, no reproducible pattern. Even the developer does not know where the next error will occur.
For a leader, the consequence is less technical than strategic. A general-purpose tool cannot be deployed as is for sensitive tasks—compliance, contracts, auditing, legal—because its failure zone is unpredictable. This is no longer an argument for caution: it's a measured fact, backed by data.
88% adoption, fewer than one in ten companies in production for judgment-based tasks
The second figure in the report challenges the prevailing narrative. According to Stanford, 88% of companies report having "adopted" AI. But fewer than one in ten have actually put it into production for tasks requiring judgment.
Successful deployments focus on repetitive tasks—customer support, code generation, marketing—with productivity gains of 14 to 26%. For tasks requiring judgment, the measured effects are weak, or even negative.
Yet compliance, legal monitoring, internal control, and audit response are, by their nature, judgment-based activities. The gap between the prevailing narrative of the AI revolution and its operational reality is now documented.
The silent collapse of transparency
The third finding directly concerns legal departments and audit committees. The transparency index for foundation models, calculated by Stanford, has dropped from 58 to 40 points in one year.
Developers are publishing less and less about what goes into training their models, the volume of data used, or the nature of the safeguards applied. For a data protection officer or an audit committee, the equation is simple: you cannot audit what you cannot see. And in the event of an incident, the ultimate responsibility remains with the user company, not the developer.
The 2026 report also dedicates, for the first time, an entire chapter to "AI Sovereignty." A striking figure: public trust in US AI regulation has fallen to 31%, the lowest score among major economies measured. Conversely, the European Union is now perceived as the most credible region, a strong signal at a time when NIS2, DORA, and the AI Act are reshaping compliance obligations for major corporations.
"A 75% score on a legal reasoning benchmark says nothing about performance in a real law firm." Raymond Perrault, co-director of the Stanford AI Index, April 2026
This statement deserves serious consideration. It refocuses the debate on three topics that are no longer solely the concern of the IT department: operational reliability, auditability, and sovereignty.
Countering the jagged frontier
Optivalue.ai takes the exact opposite approach: a specialized AI solution for tasks where generalist models fall short, such as security questionnaires, compliance, CSR, and tenders.
On reliability
Where general-purpose models exhibit unpredictable reliability, Optivalue.ai relies on 89 agents trained vertically on industry-specific datasets. It is specialization, not universality, that reduces the error surface.
On abstention
Where generalist models always respond, even when they don't know, Optivalue.ai abstains when it lacks a source. It's the only AI that knows how to say "I don't know." Every answer cites its document, page, and date. Zero hallucination, by design.
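As a purely illustrative sketch of the abstention principle described above, and not Optivalue.ai's actual implementation, the idea can be written in a few lines of Python. Every name here (`Source`, `Answer`, `overlap`, `answer`, the `threshold` value) is hypothetical, and the word-overlap score is a toy stand-in for whatever retrieval method a real system would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    """A cited passage: hypothetical structure, for illustration only."""
    document: str
    page: int
    date: str  # date of the cited document, e.g. "2026-01-15"
    text: str


@dataclass(frozen=True)
class Answer:
    text: str
    source: Optional[Source]  # None means the system abstained


ABSTAIN = "I don't know."


def overlap(question: str, passage: str) -> float:
    """Toy relevance score: share of question words found in the passage."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def answer(question: str, corpus: list[Source], threshold: float = 0.5) -> Answer:
    """Return the best-supported passage with its citation, or abstain."""
    best = max(corpus, key=lambda s: overlap(question, s.text), default=None)
    # Assumption: below the threshold, no source supports an answer.
    if best is None or overlap(question, best.text) < threshold:
        return Answer(ABSTAIN, None)
    return Answer(best.text, best)
```

With a one-passage corpus about encrypted backups, a question on backups would come back with its document, page, and date attached, while an unrelated question would return "I don't know." with no source. The design choice is that the citation and the answer travel together, so an answer without a source cannot be produced.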
On sovereignty
Where the transparency index of large models falls short, Optivalue.ai offers a private AI per client, never shared, available in a European sovereign cloud or on-premise. Your data never leaves your perimeter. ISO 27001 certified, winner of the European Sovereignty Award 2026.
Future analyses in this series will revisit two practical consequences of the Stanford findings: the emergence of hallucinations in the most sensitive judgment tasks, and the confidentiality of data entrusted to large models, which is now a governance issue.
This analysis is based on the Stanford AI Index Report 2026, published in April 2026 by the Stanford Institute for Human-Centered AI. The report and its data are freely available at hai.stanford.edu/ai-index/2026-ai-index-report.
To discover how Optivalue.ai answers security questionnaires, audits, and tenders without hallucination: optivalue.ai
The editorial team, Optivalue.ai