Published: 7 November 2025
Imagine for a moment: you enter a Formula 1 race, but the winner isn’t the gleaming Ferrari or the supercharged Red Bull. No, it’s a more modest, “intermediate” car that crosses the finish line first. That’s exactly the scenario that just unfolded in the world of artificial intelligence! France, with its public platform compar:IA, has unveiled its very first ranking of users’ favorite conversational AI models. And let me tell you, the results are astounding. Get ready to reconsider your go-to AI.
🏆 A Ranking That Shuffles the Deck: compar:IA’s Incredible Revelation
Launched a year ago by the French Interministerial Digital Directorate (DINUM) and the Ministry of Culture, compar:IA is a unique tool. Its principle? Testing AIs “blind.” In 95% of cases, you ask a question, and two unknown AIs respond. You choose the one you prefer, then their identities are revealed. A refreshing approach that promised results without bias. And those promises were kept, though perhaps not in the way we expected!
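How does a blind duel become a leaderboard? The article doesn’t spell out compar:IA’s scoring method, but the general mechanics of a pairwise arena are easy to sketch: two anonymous models answer the same prompt, the user votes for the answer they prefer, and each vote nudges the models’ scores. Below is a minimal, hypothetical sketch in Python using an Elo-style update; the model names, starting score, and K factor are illustrative assumptions, not compar:IA’s actual implementation.

```python
import random
from collections import defaultdict

# Hypothetical sketch of a blind pairwise arena (not compar:IA's actual code).
# Two anonymous models answer the same prompt, the user votes for the answer
# they prefer, and an Elo-style update turns the accumulated votes into scores.

K = 32  # update step size (assumed value)
ratings = defaultdict(lambda: 1000.0)  # every model starts at the same score

def expected(a: float, b: float) -> float:
    """Elo formula: probability that a model rated `a` beats a model rated `b`."""
    return 1.0 / (1.0 + 10 ** ((b - a) / 400))

def record_vote(winner: str, loser: str) -> None:
    """Update both ratings after one blind vote."""
    p_win = expected(ratings[winner], ratings[loser])
    ratings[winner] += K * (1 - p_win)
    ratings[loser] -= K * (1 - p_win)

def blind_duel(models: list[str]) -> tuple[str, str]:
    """Draw two distinct models at random; identities stay hidden until after the vote."""
    pair = random.sample(models, 2)
    return pair[0], pair[1]

# Simulate a handful of votes (a real arena would use genuine user choices).
models = ["model-A", "model-B", "model-C"]
for _ in range(200):
    m1, m2 = blind_duel(models)
    winner, loser = (m1, m2) if random.random() < 0.5 else (m2, m1)  # placeholder vote
    record_vote(winner, loser)

for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.0f}")
```

A real arena would replace the placeholder vote with genuine user choices and would also report how uncertain each score is, which is exactly where the confidence intervals discussed further down come into play.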
At the top of the list, a surprise for everyone: you won’t find the OpenAI behemoths or Google’s latest gems there. No, it’s a French model, Mistral-medium-3.1, that takes first place! And the craziest part? It isn’t even Mistral AI’s most powerful model, but a version optimized for a cost/performance trade-off in the cloud.
The key takeaway: An intermediate French model, Mistral-medium-3.1, has dethroned AI giants in the compar:IA ranking, upsetting all expectations and international benchmarks.
🤔 Where Did the Heavyweights Go? The Shock of Forgotten Titans
So, what about the usual stars? The Gemini 2.5 Pros, the Claude Opus 4.1s, the GPT-5s… Where are they in this French ranking? This is where the gap widens with other comparators like LMArena, which tend to reflect expert consensus. On compar:IA, the “Flash” versions of Gemini, lighter and faster, occupy 2nd and 3rd place. As for OpenAI models, the first GPT only appears in seventh place, and it’s an open-source version, gpt-oss-120b, not the brand’s latest flagships!
| Rank | compar:IA (France) | LMArena (International) |
|---|---|---|
| #1 | Mistral Medium 3.1 | Gemini 2.5 Pro |
| #2 | Gemini 2.5 Flash | Claude Opus 4.1 Thinking |
| #3 | Gemini 2.0 Flash | Claude Sonnet 4.5 Thinking |
| #4 | Qwen 3 Max | GPT-4.5 Preview |
| #5 | DeepSeek-V3 | GPT-4o |
This table is striking, isn’t it? It shows a stark divergence between the perception of French users and the pure performance evaluations that dominate international rankings. Do French users have different expectations? Do they prioritize fluidity and speed over ultra-complex answers? The question remains open.
🤷‍♀️ The Mystery of the Voters and the Push for Transparency
So, how should we interpret these unexpected results? The Ministry of Culture cautiously notes that very little information is available about the voter profiles. This is a deliberate choice, linked to personal data protection. It’s impossible to know whether those who vote are seasoned experts, curious individuals, or just regular users. This ensures anonymity but adds a layer of uncertainty regarding the panel’s homogeneity.
“We have very little information on voter profiles – a voluntary choice linked to personal data protection.”
This approach, while commendable for confidentiality, highlights that compar:IA is primarily a barometer of raw “user preference,” not a technical evaluation of the most advanced performances.
Important: The compar:IA ranking reflects the preferences of a broad, anonymous audience, and not necessarily expert opinions or raw technical performance. It’s a mirror of everyday usage.
📈 The Confidence Interval: A Compass for Reading the Results
To help us understand better, DINUM and the Ministry of Culture emphasize a crucial indicator: the confidence interval. This isn’t just obscure statistical jargon! It measures how robust a model’s position is. A narrow interval means the model’s score is very stable and users largely agree. Conversely, a wide interval indicates highly varied votes, with the model sometimes loved, sometimes less appreciated, making its position more fluid.
For example, if Mistral-medium-3.1 shows a confidence interval of -0/+0, it’s proof of an unshakable position, widely acclaimed by a large majority. In contrast, a model like deepseek-chat-V3.1, although 8th, has an interval of -10 to +7, meaning its position could climb or drop by several ranks with future votes. This detail is essential for not taking this ranking as an absolute and fixed truth.
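The article doesn’t say how these intervals are computed, but one standard way to estimate how stable a rank is would be to bootstrap the votes: resample the duels many times, recompute the leaderboard each time, and look at how far a model’s rank wanders. The sketch below uses made-up win counts and a simple 95% bootstrap interval; it is only an illustration of what a “-10/+7” style spread can mean, not DINUM’s methodology.

```python
import random

# Hedged sketch: estimating how stable a model's *rank* is by bootstrapping votes.
# This is NOT the method behind compar:IA's intervals (the article doesn't give it);
# it only illustrates what a "-10/+7" spread around a rank can mean.
# All win counts below are made up.

votes = {
    "model-A": (720, 1000),  # (duels won, duels played)
    "model-B": (705, 1000),
    "model-C": (500, 1000),
}

def rank_interval(model: str, n_boot: int = 1000) -> tuple[int, int]:
    """95% bootstrap interval on the rank of `model` (1 = first place)."""
    ranks = []
    for _ in range(n_boot):
        scores = {}
        for name, (wins, duels) in votes.items():
            # Re-draw each duel outcome as if the voting were repeated.
            resampled = sum(random.random() < wins / duels for _ in range(duels))
            scores[name] = resampled / duels
        ordering = sorted(scores, key=scores.get, reverse=True)
        ranks.append(ordering.index(model) + 1)
    ranks.sort()
    return ranks[int(0.025 * n_boot)], ranks[int(0.975 * n_boot)]

low, high = rank_interval("model-B")
print(f"model-B's rank plausibly ranges from #{low} to #{high}")
```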
🔮 So, Should We Trust AI Rankings?
This first compar:IA ranking is a real breath of fresh air in a universe dominated by a few giants. It reminds us that performance isn’t always about model size or its ability to solve complex problems. User experience, fluidity, or even an AI’s “personality” can make all the difference in the eyes of the general public.
It’s a living, evolving ranking that invites reflection. It highlights the importance of the “blind” approach to bypass our own biases. And what if, ultimately, the “best” AIs are the ones we prefer to use daily, rather than those that impress technical benchmarks? A fascinating question for the future of AI, isn’t it?