GPT-4o vs Gemini 1.5 Pro vs Claude-3 Opus
“For most retail and professional quants, GPT-4o delivers the best balance of code quality, latency, and out-of-the-box hit-rate on back-tested signals. Gemini 1.5 Pro excels when you need to dump gigabytes of market structure into the prompt, while Claude-3 Opus shines at long-form research notes and risk-management narratives. Your cost-per-trade and latency budget should decide the winner for you.”
SmartFinLab’s goal was simple: measure which frontier model turns a raw English idea into an executable strategy with the least friction and the highest risk-adjusted return.
Step | What we did |
---|---|
1. Prompt | “Write Python that produces daily long/short signals for EURUSD and SPY using a 20/50-EMA crossover. Return a Pandas DataFrame with date, side, size.” |
2. Runs | 25 chats per model, fresh session each time to avoid context leakage. |
3. Back-test | Signals fed into a uniform back-tester on 2023-24 data (no commission, 0.1% slippage). |
4. Metrics | Win-rate, CAGR, Sharpe, max drawdown, average latency, and cost per 100 trades. |
5. Human audit | Two quants graded code readability & bug count. |
Disclaimer — These are internal lab numbers, not investment advice. Your mileage will vary.
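The deliverable requested in step 1 looks roughly like this (a minimal sketch: the `ema_crossover_signals` name, fixed unit sizing, and the use of pandas `ewm` are our illustrative choices, not any model's actual output):

```python
import pandas as pd

def ema_crossover_signals(prices: pd.Series, fast: int = 20, slow: int = 50) -> pd.DataFrame:
    """Daily long/short signals from a fast/slow EMA crossover.

    Long when the fast EMA is above the slow EMA, short otherwise.
    Returns a DataFrame with date, side, size, as the prompt specifies.
    """
    fast_ema = prices.ewm(span=fast, adjust=False).mean()
    slow_ema = prices.ewm(span=slow, adjust=False).mean()
    side = (fast_ema > slow_ema).map({True: "long", False: "short"})
    return pd.DataFrame({"date": prices.index, "side": side.values, "size": 1.0})
```

In the benchmark each model had to produce something equivalent from the one-sentence prompt alone, which is where the bug counts in the human audit come from.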
Feature | Gemini 1.5 Pro | Claude-3 Opus | GPT-4o |
---|---|---|---|
Release (public API) | Feb 2024 | Mar 2024 | May 2024 |
Context window | 2 M tokens (1 M stable) | 200 K tokens | 128 K tokens (32 K default) |
Price / 1 M tokens (in/out) | $0.075 / $0.15 (<128 K) | $15 / $75 | $5 / $15 |
Avg. latency (our tests) | 14.2 s | 12.7 s | 8.9 s |
Code interpreter | Yes (Google Colab-style) | Yes | Yes + function calling |
Vision/CSV upload | Native | Native | Native |
Fine-tuning | Preview (“Tune-with-Gemini”) | Yes (Bedrock, Vertex) | Yes |
Rate limit (TPM) | 4 M | 5 M (Bedrock) | 10 M |
Metric | Gemini 1.5 Pro | Claude-3 Opus | GPT-4o |
---|---|---|---|
Win-rate | 54 % | 52 % | 58 % |
CAGR (annualised) | 7.8 % | 6.4 % | 9.9 % |
Sharpe | 0.68 | 0.61 | 0.72 |
Max DD | -4.3 % | -3.9 % | -4.1 % |
Cost /100 trades* | $0.006 | $0.038 | $0.012 |
Bug-free lines of code | 96 % | 98 % | 94 % |
*Assumes 3 K input & 1 K output tokens per prompt/response.
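The headline metrics above can be recovered from a daily strategy-returns series along these lines (a sketch; SmartFinLab's exact back-tester, slippage handling, and position sizing are not published):

```python
import numpy as np
import pandas as pd

def backtest_metrics(daily_returns: pd.Series, trading_days: int = 252) -> dict:
    """Win-rate, annualised CAGR, Sharpe, and max drawdown from daily returns."""
    equity = (1 + daily_returns).cumprod()          # compounded equity curve
    years = len(daily_returns) / trading_days
    cagr = equity.iloc[-1] ** (1 / years) - 1       # annualised growth rate
    sharpe = np.sqrt(trading_days) * daily_returns.mean() / daily_returns.std()
    max_dd = (equity / equity.cummax() - 1).min()   # worst peak-to-trough loss
    win_rate = (daily_returns > 0).mean()
    return {"win_rate": win_rate, "cagr": cagr, "sharpe": sharpe, "max_drawdown": max_dd}
```

Running each model's signals through one shared implementation like this is what makes the table comparable across the three models.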
If you care most about… | Choose |
---|---|
Latency & live intraday use | GPT-4o |
Ingesting massive datasets | Gemini 1.5 Pro |
Regulatory reports & explanations | Claude-3 Opus |
Lowest token bill for medium context | Gemini (<128 K) |
Bug-free Python & doc-strings | Claude-3 Opus |
Easiest broker API orchestration | GPT-4o |
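The per-1 M-token prices in the comparison table translate into a per-call bill as follows (illustrative arithmetic only, using the 3 K input / 1 K output tokens per call assumed in the results footnote; the table's cost-per-100-trades figures then amortise this across the trades each generated strategy produces):

```python
def cost_per_call(in_tokens: int, out_tokens: int,
                  in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one API call, given per-million-token prices."""
    return in_tokens * in_price_per_m / 1e6 + out_tokens * out_price_per_m / 1e6

# Prices per 1 M tokens (input, output) as quoted in the comparison table
models = {
    "Gemini 1.5 Pro": (0.075, 0.15),
    "Claude-3 Opus": (15, 75),
    "GPT-4o": (5, 15),
}
for name, (p_in, p_out) in models.items():
    print(f"{name}: ${cost_per_call(3000, 1000, p_in, p_out):.5f} per call")
```

At these prices a single Claude-3 Opus call costs roughly 300× a Gemini 1.5 Pro call, which is why the "lowest token bill" row above goes to Gemini for medium contexts.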
For day-to-day retail or prop-desk-level algos, GPT-4o currently offers the highest signal quality per dollar. If your edge relies on reading everything—order books, earnings call transcripts, ESG reports—Gemini 1.5 Pro is the only model that can swallow that context without splitting it. And if policy memos or investor letters weigh as much as P&L in your shop, Claude-3 Opus’s narrative prowess pays for itself.
Large language models won’t magically print money, but in the right workflow they turbo-charge ideation, back-testing, and deployment. Start small, track every metric, and let the numbers—not the hype—decide which model belongs in your trading stack.
Author: Arosh