AI Models

A practical reference to the major frontier and open-source models: who makes them, what they're good at, and when to use them.

Frontier model families

Anthropic - Claude

Claude is a family of models from Anthropic focused on helpfulness, honesty, and safety. It's known for strong reasoning, long context windows, and careful tool use. The current lineup is organized into Opus (most capable), Sonnet (balanced), and Haiku (fast and cheap). A good default for coding agents, document analysis, and long-form reasoning.

OpenAI - GPT

The GPT family (GPT-4, GPT-4o, and the o-series reasoning models) pioneered modern LLMs. Strong general-purpose performance, rich tooling (function calling, structured outputs, Assistants API), and broad ecosystem support. Reasoning-focused variants trade latency for stronger multi-step thinking.

Google - Gemini

Gemini is Google's multimodal family (text, image, audio, video, code) with very long context windows. Tight integration with Google Cloud and Workspace, and competitive on multimodal benchmarks. Useful when you need native multimodal reasoning or deep GCP integration.

Meta - Llama

Llama is the most widely used open-weights family. Strong performance, permissive-ish licensing, and a massive ecosystem of fine-tunes. A good foundation when you need to run locally, fine-tune aggressively, or avoid sending data to third-party APIs.

Mistral

Mistral ships both open-weights models (Mistral 7B, Mixtral MoE) and hosted frontier models (Mistral Large). Known for efficient architectures - mixture-of-experts in particular - and strong performance per dollar.

xAI - Grok

xAI's Grok family is integrated with the X platform and positioned around real-time information and less restrictive outputs. Competitive on reasoning benchmarks in its top tiers.

DeepSeek, Qwen, and other open ecosystems

DeepSeek (DeepSeek-V3, R1 reasoning) and Alibaba's Qwen are leading open-weights families, often matching or exceeding Western open models on key benchmarks. Worth evaluating any time you're choosing an open baseline.

Quick comparison

| Family | Provider | Access | Strengths | Typical use |
|---|---|---|---|---|
| Claude | Anthropic | API | Reasoning, long context, coding, tool use | Agents, doc analysis, coding assistants |
| GPT / o-series | OpenAI | API | General purpose, ecosystem, multimodal, reasoning | Broad product features, structured outputs |
| Gemini | Google | API / GCP | Native multimodal, very long context | Video/audio, GCP-native apps |
| Llama | Meta | Open weights | Self-hosting, fine-tunes, ecosystem | On-prem, private data, custom models |
| Mistral / Mixtral | Mistral | Open + API | Efficient MoE, cost/perf | High-volume, latency-sensitive apps |
| Grok | xAI | API / X | Real-time data, reasoning | X-integrated and news-heavy use cases |
| DeepSeek / Qwen | DeepSeek / Alibaba | Open weights | Strong open baselines, reasoning variants | Open-source alternatives to frontier APIs |

How to pick a model

  1. Define the task. Classification, summarization, coding, multimodal, agentic? Each favors a different model.
  2. Set a quality bar. Build a small eval set (20–100 examples) with expected outputs.
  3. Set cost/latency budgets. Tokens-per-request × requests-per-day × price. Include latency SLAs.
  4. Run a bake-off. Test 3–5 candidates on the eval set with identical prompts.
  5. Re-evaluate quarterly. Models and prices move fast - your choice has a shelf life.
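
The budget math in step 3 and the bake-off in step 4 can be sketched in a few lines of Python. Everything here is illustrative: the token counts and per-million-token price are made up, and `call_model` and `grade` are placeholder hooks you would wire to your own provider client and scoring logic, not any real API.

```python
def monthly_cost(tokens_per_request: int, requests_per_day: int,
                 price_per_million_tokens: float, days: int = 30) -> float:
    """Step 3: tokens-per-request x requests-per-day x price, over a month."""
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens / 1_000_000 * price_per_million_tokens

# Example: 2,000 tokens/request, 10,000 requests/day, $3 per million tokens.
print(f"${monthly_cost(2_000, 10_000, 3.0):,.2f}/month")  # -> $1,800.00/month

def run_bakeoff(models, eval_set, call_model, grade):
    """Step 4: score each candidate on the same eval set, identical prompts.

    call_model(model, prompt) -> output string (your provider client)
    grade(output, expected)   -> bool (exact match, rubric, judge model, ...)
    """
    scores = {}
    for model in models:
        correct = sum(
            grade(call_model(model, example["prompt"]), example["expected"])
            for example in eval_set
        )
        scores[model] = correct / len(eval_set)
    return scores
```

Keeping `grade` pluggable matters: an exact-match grader is fine for classification, but summarization or coding tasks usually need a rubric or judge-based check.
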

Model capabilities, pricing, and context windows change frequently. Treat this page as a map, not a spec sheet - check each provider's docs for current numbers.