AI Models

A practical reference to the major frontier and open-source models: who makes them, what they're good at, and when to use them.

Frontier model families

Anthropic - Claude

Claude is a family of models from Anthropic focused on helpfulness, honesty, and safety. It's known for strong reasoning, long context windows, and careful tool use. The current lineup is organized into Opus (most capable), Sonnet (balanced), and Haiku (fast and cheap). A good default for coding agents, document analysis, and long-form reasoning.

OpenAI - GPT

The GPT family (GPT-4, GPT-4o, and the o-series reasoning models) pioneered modern LLMs. Strong general-purpose performance, rich tooling (function calling, structured outputs, Assistants API), and broad ecosystem support. Reasoning-focused variants trade latency for stronger multi-step thinking.

Google - Gemini

Gemini is Google's multimodal family (text, image, audio, video, code) with very long context windows. Tight integration with Google Cloud and Workspace, and competitive on multimodal benchmarks. Useful when you need native multimodal reasoning or deep GCP integration.

Meta - Llama

Llama is the most widely used open-weights family. Strong performance, permissive-ish licensing, and a massive ecosystem of fine-tunes. A good foundation when you need to run locally, fine-tune aggressively, or avoid sending data to third-party APIs.

Mistral

Mistral ships both open-weights models (Mistral 7B, Mixtral MoE) and hosted frontier models (Mistral Large). Known for efficient architectures - mixture-of-experts in particular - and strong performance per dollar.

xAI - Grok

xAI's Grok family is integrated with the X platform and positioned around real-time information and less restrictive outputs. Competitive on reasoning benchmarks in its top tiers.

DeepSeek, Qwen, and other open ecosystems

DeepSeek (DeepSeek-V3, R1 reasoning) and Alibaba's Qwen are leading open-weights families, often matching or exceeding Western open models on key benchmarks. Worth evaluating any time you're choosing an open baseline.

Quick comparison

| Family | Provider | Access | Strengths | Typical use |
|---|---|---|---|---|
| Claude | Anthropic | API | Reasoning, long context, coding, tool use | Agents, doc analysis, coding assistants |
| GPT / o-series | OpenAI | API | General purpose, ecosystem, multimodal, reasoning | Broad product features, structured outputs |
| Gemini | Google | API / GCP | Native multimodal, very long context | Video/audio, GCP-native apps |
| Llama | Meta | Open weights | Self-hosting, fine-tunes, ecosystem | On-prem, private data, custom models |
| Mistral / Mixtral | Mistral | Open + API | Efficient MoE, cost/perf | High-volume, latency-sensitive apps |
| Grok | xAI | API / X | Real-time data, reasoning | X-integrated and news-heavy use cases |
| DeepSeek / Qwen | DeepSeek / Alibaba | Open weights | Strong open baselines, reasoning variants | Open-source alternatives to frontier APIs |

How to pick a model

  1. Define the task. Classification, summarization, coding, multimodal, agentic? Each favors a different model.
  2. Set a quality bar. Build a small eval set (20–100 examples) with expected outputs.
  3. Set cost/latency budgets. Tokens-per-request × requests-per-day × price. Include latency SLAs.
  4. Run a bake-off. Test 3–5 candidates on the eval set with identical prompts.
  5. Re-evaluate quarterly. Models and prices move fast - your choice has a shelf life.
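
The budget math in step 3 and the bake-off in step 4 can be sketched in a few lines of Python. Everything here is illustrative: the token counts and per-million-token price are made up, and `call_model` and `grade` are placeholder hooks you would wire to your own provider client and scoring logic, not any real API.

```python
def monthly_cost(tokens_per_request: int, requests_per_day: int,
                 price_per_million_tokens: float, days: int = 30) -> float:
    """Step 3: tokens-per-request x requests-per-day x price, over a month."""
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens / 1_000_000 * price_per_million_tokens

# Example: 2,000 tokens/request, 10,000 requests/day, $3 per million tokens.
print(f"${monthly_cost(2_000, 10_000, 3.0):,.2f}/month")  # -> $1,800.00/month

def run_bakeoff(models, eval_set, call_model, grade):
    """Step 4: score each candidate on the same eval set, identical prompts.

    call_model(model, prompt) -> output string (your provider client)
    grade(output, expected)   -> bool (exact match, rubric, judge model, ...)
    """
    scores = {}
    for model in models:
        correct = sum(
            grade(call_model(model, example["prompt"]), example["expected"])
            for example in eval_set
        )
        scores[model] = correct / len(eval_set)
    return scores
```

Keeping `grade` pluggable matters: an exact-match grader is fine for classification, but summarization or coding tasks usually need a rubric or judge-based check.
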

Model capabilities, pricing, and context windows change frequently. Treat this page as a map, not a spec sheet - check each provider's docs for current numbers.