Unpublished model watch
Claude Mythos vs Gemini 3.5 Pro
A careful comparison of two models you cannot simply buy yet: Anthropic's gated Claude Mythos Preview and Google's still-unreleased Gemini 3.5 Pro.
The honest comparison starts with access
Claude Mythos Preview and Gemini 3.5 Pro are not normal model-comparison targets. You cannot treat them like public API models with stable pricing, model IDs, and third-party leaderboards. Claude Mythos Preview is gated through Anthropic's Project Glasswing research program. Gemini 3.5 Pro has been disclosed by Google as an internal model, with rollout expected next month, but no public Pro benchmark table exists yet.
That does not make the comparison useless. It just changes the question. Instead of asking which model wins, ask what each unpublished model signals about the next frontier: Anthropic's path toward long-horizon, risky, research-grade agents; Google's path from fast multimodal agents into a stronger Pro tier integrated through Search, Workspace, Android, and Vertex.
What is actually public?
This is the line between evidence and speculation.
| Model | Status as of May 28, 2026 | Public evidence | What not to claim |
|---|---|---|---|
| Claude Mythos Preview | Gated research preview under Anthropic's Project Glasswing, available to selected external AI safety researchers rather than general developers. | Anthropic has published Project Glasswing, a Mythos Preview system card, an alignment risk update, and later Opus 4.8 materials that say Mythos remains Anthropic's capability frontier. | Do not describe Mythos as generally available, priced for normal production, or directly comparable to public API models on every task. |
| Gemini 3.5 Pro | Unreleased public model. Google's Gemini 3.5 Flash launch says 3.5 Pro is already being used internally and is expected to roll out next month. | Google has published Gemini 3.5 Flash results and product positioning, but not a public Gemini 3.5 Pro benchmark table. | Do not assign Gemini 3.5 Pro benchmark scores from Gemini 3.5 Flash, Gemini 3.1 Pro, or rumors. |
| Claude Opus 4.8 | Generally available premium Opus model released May 28, 2026. | The Opus 4.8 system card gives a useful public anchor: Opus 4.8 sits above Opus 4.7 on most capability evaluations but below Mythos Preview overall. | Do not treat Opus 4.8 as a Mythos substitute. Anthropic explicitly says Mythos remains the higher capability frontier. |
| Gemini 3.5 Flash | Public Google model released May 19, 2026. | Google positions it as faster and more capable than 3.1 Pro across almost all tested benchmarks, with strong agentic and multimodal signals. | Do not assume the Pro version will preserve the same strengths or pricing tradeoffs until Google publishes it. |
What Claude Mythos signals
Mythos Preview is best understood as Anthropic's early-warning frontier, not its ordinary product tier. Anthropic's Project Glasswing framing says the model is being made available to selected researchers because it is powerful enough to stress-test catastrophic-risk assumptions, alignment behavior, and long-horizon autonomy without being released broadly.
The published Mythos material points to a model that is unusually strong at long-running agentic engineering and tool use, but also important enough to keep behind access controls. That access decision is itself part of the comparison. Anthropic is saying, in effect, that the research frontier and the generally available product frontier are no longer the same thing.
Claude Mythos evidence to keep in view
These are the public signals that make Mythos relevant even though it is not broadly available.
| Signal | Published detail | Interpretation |
|---|---|---|
| Capability frontier | Anthropic's Opus 4.8 system card says Opus 4.8 remains weaker than Claude Mythos Preview overall and does not advance Anthropic's capability frontier. | Mythos is the higher internal/gated reference point, while Opus 4.8 is the public product bridge. |
| Software engineering | Anthropic's Project Glasswing material reports Mythos Preview ahead of prior Claude models on SWE-bench, Terminal-Bench, FrontierSWE, ProgramBench, OSWorld, and internal R&D tasks. | The Mythos story is not just chat quality. It is long-horizon engineering autonomy. |
| Risk posture | Anthropic ties Mythos to safety research, an alignment risk update, and controlled researcher access. | Access control is part of the product signal: the model may be ahead, but not production-normal. |
| Roadmap implication | The Opus 4.8 release says Anthropic expects to bring Mythos-class models to broader users in the coming weeks. | Teams should prepare eval harnesses now, not rewrite production architecture around a model they cannot yet use. |
What Gemini 3.5 Pro has to prove
Google's public signal is different. Gemini 3.5 Flash is already out, and it is stronger than the old mental model of a cheap Flash tier. Google positions it around speed, deeper reasoning, multimodality, app control, and agentic workflows. Anthropic's own Opus 4.8 comparison table also shows Gemini 3.5 Flash as a serious tool-use and finance-agent competitor.
Gemini 3.5 Pro, however, is still a blank public benchmark sheet. It should be stronger than Flash on quality, but how much stronger, at what latency, at what price, and in which surfaces is exactly the part buyers cannot know yet. The honest comparison is therefore between Mythos's published gated research signal and Gemini 3.5 Pro's disclosed but unpublished product signal.
The benchmark watchlist for Gemini 3.5 Pro
When Gemini 3.5 Pro ships, these are the rows that will matter more than a generic 'best model' claim.
| Benchmark or area | Why it matters | Current anchor |
|---|---|---|
| SWE-bench Pro and Terminal-Bench | Shows whether Google can match Anthropic and OpenAI on real engineering-agent work. | Opus 4.8 leads SWE-bench Pro at 69.2%; GPT-5.5 leads Terminal-Bench 2.1 in Anthropic's table at 78.2% while Gemini 3.5 Flash is reported at 76.2%. |
| MCP Atlas and Toolathlon | Tests whether Gemini can use real tools cleanly, not just reason about tools in prose. | Gemini 3.5 Flash is already strong on MCP Atlas at 83.6, slightly above Opus 4.8's 82.2 in Anthropic's table. |
| Finance Agent and OfficeQA | Captures practical business research over filings, documents, tables, and numerical evidence. | Gemini 3.5 Flash leads Finance Agent v2 at 57.9%, while Opus 4.8 leads GDPval-AA in Anthropic's public comparison. |
| OSWorld and browser agents | Matters for product copilots that click, inspect screens, edit documents, and operate software. | Opus 4.8 posts 83.4% on OSWorld-Verified; Gemini 3.5 Flash is listed at 78.4 in Anthropic's table. |
| GDPval-AA | Useful for broad professional-work comparisons across documents, slides, diagrams, and spreadsheets. | Opus 4.8 posts 1890 Elo, GPT-5.5 1769, Gemini 3.5 Flash 1656, and Gemini 3.1 Pro 1314 across cited public tables. |
| Search grounding and product integration | Google's advantage may come less from raw scores and more from Search, Workspace, Vertex, Android, and app distribution. | No single benchmark captures the full Google ecosystem effect. |
Likely strengths if the evidence holds
This is a provisional read, not a leaderboard.
| Use case | Claude Mythos Preview | Gemini 3.5 Pro | Current practical advice |
|---|---|---|---|
| Long-horizon coding agents | The stronger disclosed signal because Mythos is repeatedly used as Anthropic's high-capability reference for AI R&D and engineering-agent risk. | Unknown until Pro publishes SWE-bench Pro, Terminal-Bench, OSWorld, and tool-use results. | Evaluate Opus 4.8 now; prepare to test Mythos-class Claude and Gemini 3.5 Pro when access opens. |
| Google-native enterprise assistants | Strong if the task is pure reasoning or code, but less obviously native to Workspace/Search/GCP distribution. | Likely stronger strategic fit because the product surface is Google's home field. | Use Gemini 3.5 Flash or 3.1 Pro as the current routing anchor, then retest when 3.5 Pro ships. |
| Financial and professional research | Promising by family resemblance to Opus 4.8 and Mythos's stronger frontier status, but not generally accessible. | Potentially strong if it inherits Gemini 3.5 Flash's Finance Agent signal while improving accuracy. | Run side-by-side evals on filings, spreadsheets, citations, and audit trails before migration. |
| Safety-sensitive autonomous agents | The more heavily documented risk and alignment story because Anthropic has published detailed Mythos and Opus 4.8 safety material. | Unknown until Google publishes Pro-specific safety and deployment details. | Prefer public, well-documented models for production until the unpublished tier has a system card and controls. |
| Consumer and everyday use | Not the right comparison target while gated. | Likely to matter through Gemini app, Android, Workspace, and Search once released. | Use public consumer surfaces rather than unpublished model expectations. |
The strategic read
Anthropic and Google appear to be optimizing for different kinds of inevitability. Anthropic is trying to make the most trusted high-capability agent, even if that means staged access and heavy safety documentation. Google is trying to make frontier capability feel like infrastructure: embedded in search, mobile, cloud, productivity software, and developer tools.
That means Mythos vs Gemini 3.5 Pro is not just a model contest. It is a contest between a gated research frontier moving toward productization and a product-distribution giant preparing to move its Pro tier forward. One may be stronger on hard agentic engineering. The other may be harder to avoid once it is wired into the software millions of teams already use.
Bottom line
Claude Mythos Preview has the stronger published frontier signal today, but it is gated and not a normal production choice. Gemini 3.5 Pro may become the more important mass-market model once Google ships it, but public evidence is not there yet.
For current production work, test Claude Opus 4.8, GPT-5.5, Gemini 3.5 Flash, and Gemini 3.1 Pro against your own tasks. For roadmap work, track Mythos-class Claude access and Gemini 3.5 Pro release details. The first team to update its eval harness will learn more than the team refreshing benchmark threads.
Frequently asked questions
The important questions are about evidence, access, and what a buyer can safely do before these models are broadly public.
Is Claude Mythos Preview available to normal developers?
No. It is a gated research preview through Anthropic's Project Glasswing, not a generally available production model. Anthropic has said Mythos-class models should reach broader users later, but teams should wait for access, model IDs, pricing, and migration details.
Is Gemini 3.5 Pro released?
No public Gemini 3.5 Pro release is available as of May 28, 2026. Google says it is already using Gemini 3.5 Pro internally and expects to roll it out next month.
Can Gemini 3.5 Flash scores predict Gemini 3.5 Pro performance?
Only weakly. Flash gives a directional signal about Google's new model family, especially around speed and agentic tasks, but Pro may differ in latency, price, context behavior, and benchmark strengths.
Should buyers wait for unpublished models?
Usually no. Build the evaluation harness now and run it against public models. Waiting makes sense only when the target workflow is not urgent and the unpublished model's expected strength is central to the architecture.
What benchmarks matter most for these models?
For engineering agents, watch SWE-bench Pro, Terminal-Bench, FrontierSWE, ProgramBench, OSWorld, MCP Atlas, Toolathlon, and AutomationBench. For everyday and professional work, watch GDPval, Finance Agent, OfficeQA, BrowseComp, HealthBench, Legal Agent Benchmark, multimodal chart/screen tests, and multilingual benchmarks.
Reference shelf
The comparison deliberately uses sources that disclose model status or benchmark methodology. Rumor is not a source of benchmark data.
Primary source for Mythos Preview's gated research access and Anthropic's reason for opening it to external safety researchers.
Primary source stating that Opus 4.8 improves on Opus 4.7 but remains below Claude Mythos Preview overall.
Release source for Opus 4.8's benchmark table, fast mode, effort controls, and Mythos-class roadmap language.
Primary Google source for Gemini 3.5 Flash and the disclosure that Gemini 3.5 Pro is in internal use and expected next month.
Useful for checking how Google's model choices become real enterprise architecture and cost decisions.
Useful current public OpenAI reference point for terminal agents, professional work, browsing, computer use, pricing, and model-routing comparisons.