II. Feature Catalog
Every capability on the platform surface
34 features across seven categories: prompt development, evaluation and testing, managed agents, skills, models and infrastructure, tools and integrations, and admin and billing. Each feature is listed with its availability, a one-paragraph description, and an honest competitive note.
Prompt Development
Workbench
Browser-based prompt prototyping studio at platform.claude.com/workbench. Test prompts with variable substitution ({{var}} syntax), tune temperature and max-tokens, save versions, and export as code in 8 languages.
Availability Platform (all plans with API access)
Vs. OpenAI Playground: sharper on structured prompting, weaker on UI polish. Vs. LangSmith Studio: simpler, less opinionated. The best place to draft a prompt you'll later ship in production.
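The Workbench's `{{var}}` substitution is easy to reproduce when you move a prompt out of the studio and into code. A minimal sketch (the function name and template are illustrative, not part of the platform):

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    """Substitute {{var}} placeholders, raising if any variable is missing."""
    def sub(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing variable: {name}")
        return str(variables[name])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

template = "Summarize the following {{doc_type}} in {{n}} bullet points:\n\n{{content}}"
prompt = render_prompt(template, {
    "doc_type": "meeting transcript",
    "n": 3,
    "content": "Q3 revenue grew 12%.",
})
```

Keeping the same `{{var}}` syntax in code means prompts drafted in the Workbench can be pasted into production without rewriting placeholders.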
Prompt Generator
"Describe a task; get a working prompt" tool. Produces scaffolded first drafts with XML structure, examples, and chain-of-thought by default.
Availability Platform / Console
Vs. hand-writing from scratch: removes the blank-page problem. The output is a starting point, not a finish line — every prompt in Vol. II was originally seeded this way then heavily rewritten.
Prompt Improver
Takes an existing prompt and adds chain-of-thought steps, XML tags, clearer structure. Benchmarks show ~30% accuracy gain on classification tasks.
Availability Platform / Console
Vs. manual editing: faster for baseline improvements. Will over-engineer simple prompts — watch for bloat.
Examples Manager
Structured few-shot examples with clear input/output pairs. Auto-generates synthetic examples if you have none. Examples are inserted at the start of the first user message in the actual API call.
Availability Platform / Console (within Workbench)
Vs. pasting examples into the prompt text: less error-prone, versioned with the prompt, easier to swap out.
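Since the catalog notes that examples land at the start of the first user message, the same assembly can be sketched in code. The XML wrapper format below is an assumption, not the manager's exact serialization:

```python
def build_first_message(examples, user_input):
    """Format few-shot input/output pairs and prepend them to the user's
    actual input, mirroring where the Examples Manager injects them."""
    blocks = []
    for i, (ex_in, ex_out) in enumerate(examples, 1):
        blocks.append(
            f"<example_{i}>\n<input>{ex_in}</input>\n"
            f"<output>{ex_out}</output>\n</example_{i}>"
        )
    return "\n\n".join(blocks + [user_input])

msg = build_first_message(
    [("I love this product!", "positive"), ("Refund, now.", "negative")],
    "Classify: The shipping was slow but support was great.",
)
```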
Evaluation & Testing
Evaluation Tool
Create test-case suites with variable substitution, grade outputs on a 5-point scale, side-by-side compare prompt versions, auto-generate test cases, import from CSV.
Availability Platform / Console
Vs. PromptLayer / LangSmith: simpler, tied to Anthropic. Vs. bespoke eval harness: faster to start, less flexible. Best for structured prompts where the right answer is checkable.
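For the "right answer is checkable" case, the eval loop reduces to a pass-rate comparison between prompt versions. A toy harness (the stub callables stand in for real API calls; everything here is illustrative):

```python
def run_suite(run_model, cases):
    """Score one prompt version against test cases with known answers.
    run_model is any callable; in production it wraps an API call."""
    passed = sum(1 for inp, expected in cases if run_model(inp).strip() == expected)
    return passed / len(cases)

cases = [("2+2", "4"), ("3*3", "9"), ("10-7", "3")]
# Deterministic stand-ins so the comparison logic runs locally:
v1 = lambda q: {"2+2": "4", "3*3": "9", "10-7": "3"}[q]  # "prompt v1": all correct
v2 = lambda q: "4"                                        # "prompt v2": regressed
score_v1 = run_suite(v1, cases)
score_v2 = run_suite(v2, cases)
```

The same pattern underlies the side-by-side compare: run both versions over one suite, diff the pass rates.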
Ideal Output Column
Optional column in the eval sheet where you record the target output. Used for benchmarking and regression testing prompt changes.
Availability Platform / Console (within Eval Tool)
Vs. just grading outputs: gives you a ground truth to regress against, not just a quality score.
Prompt Versioning
Every Workbench prompt has a version history. Re-run the eval suite against any historical version to see how changes affected performance.
Availability Platform / Console
Vs. git-for-prompts workflows: less portable, more integrated. Makes A/B comparisons almost free.
Managed Agents (Beta)
Managed Agents
Server-managed stateful agents launched April 8, 2026. Anthropic hosts the sandbox, credentials, state, checkpointing. $0.08/hr runtime + token costs. Removes the "second job" of building agent infrastructure.
Availability Platform / API (requires managed-agents-2026-04-01 beta header)
Vs. OpenAI Assistants API: comparable surface, stronger on long-running tasks. Vs. LangGraph + self-hosted infra: 10x faster to production, less flexibility. Vs. Amazon Bedrock Agents: simpler onboarding, weaker AWS-ecosystem integration.
Agents API
Create an agent once (model, system prompt, tools, MCP servers, skills), reference it by ID across sessions. The "agent" is the persistent config, the "session" is each execution.
Availability Platform / API
Vs. stateless tool-use loops: better for complex workflows. Vs. LangChain agents: more opinionated, less DIY plumbing.
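The agent-vs-session split can be sketched as two request shapes. Only the beta header comes from this catalog; the field names, model ID, and endpoint schema below are assumptions, not documented API surface:

```python
# Hypothetical shapes only. The agent is the durable config, created once;
# each session is an execution that references the agent by ID.
agent_config = {
    "model": "claude-sonnet-4-6",          # assumed ID format
    "system": "You are a release-notes drafting agent.",
    "tools": [{"type": "web_search"}],
    "skills": ["claude-api"],
}

def start_session(agent_id: str, task: str) -> dict:
    """Shape of a hypothetical session-creation request."""
    return {
        "agent_id": agent_id,
        "input": task,
        "headers": {"anthropic-beta": "managed-agents-2026-04-01"},
    }

req = start_session("agent_abc123", "Draft notes for v2.3")
```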
Environments
Container configuration: pre-installed packages (Python, Node.js, Go, etc.), network access rules, mounted files, credential injection. Define once, reuse across sessions.
Availability Platform / API
Vs. Docker + your own orchestrator: zero infra effort. Vs. Replit-for-agents tools: Anthropic-managed, single-vendor.
Sessions
Each session is a fresh container with persistent event history. Stream events via server-sent events (SSE). Interrupt or guide mid-execution.
Availability Platform / API
Vs. one-shot tool-use: supports hours-long workflows. Vs. polling-based agent APIs: the SSE streaming is cleaner for UX.
Vaults
Secure credential storage for agent sessions. Inject API keys, database connections, and secrets without exposing them to the model context.
Availability Platform / API
Vs. environment variables in your own infra: no key rotation headache. Vs. HashiCorp Vault: narrower scope, Anthropic-native.
Outcomes, Multiagent, Memory (Research Preview)
Three experimental features gated by access request: outcome-based task completion signals, multi-agent orchestration, and persistent per-agent memory across sessions.
Availability Platform / API (request access)
Vs. the current beta: these are what separate an "agent framework" from an "agent platform". Worth requesting access even if you don't use them yet.
Skills (Platform)
Agent Skills
Reusable instruction packs with code, config, and context. Attach up to 20 per session. Same concept as claude.ai Skills, but installable in Managed Agents and distributable as artifacts.
Availability Platform / API / claude.ai
Vs. custom GPTs: compose together, work across surfaces, don't require a Store listing. Vs. LangChain tools: skills are richer (instructions + code + context), not just function wrappers.
Custom Skill Upload
Upload your own Skills as zip files through Settings > Features. Available on Pro, Max, Team, and Enterprise with code execution enabled.
Availability Platform / claude.ai (Pro+)
Vs. prompt templates in your own repo: Claude loads them contextually, no engineering overhead. Currently per-user — no org-wide central management yet.
Skill Builder
Create and edit Skills with a structured editor. The SKILL.md frontmatter pattern (--- name: ... description: ... ---) plus markdown body. Preview, test, publish.
Availability Platform / Claude Code
Vs. writing raw markdown files: lower friction for non-developers. Vs. Anthropic's example skills repo: fully customisable for your domain.
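The SKILL.md frontmatter pattern described above is simple enough to parse by hand. A minimal reader (the example skill content is invented for illustration):

```python
def parse_skill_md(text: str) -> dict:
    """Parse a SKILL.md file: a --- key: value --- frontmatter block
    followed by a markdown body."""
    lines = text.splitlines()
    assert lines[0] == "---", "frontmatter must open with ---"
    meta, i = {}, 1
    while lines[i] != "---":
        key, _, value = lines[i].partition(":")
        meta[key.strip()] = value.strip()
        i += 1
    meta["body"] = "\n".join(lines[i + 1:]).strip()
    return meta

skill = parse_skill_md(
    "---\n"
    "name: pdf-report\n"
    "description: Build branded PDF reports\n"
    "---\n"
    "Use reportlab. Follow brand colors."
)
```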
claude-api Skill (Built-in)
Official skill that teaches Claude the Messages API, Managed Agents API, and SDKs. Covers 8 languages for Messages API, 7 for Managed Agents. Uses progressive disclosure to keep context efficient.
Availability Platform / Claude Code (bundled)
Vs. reading docs yourself: Claude loads only the doc fragments relevant to your current task. The meta-move of using Claude to build with Claude.
Models & Infrastructure
Model Picker
Opus 4.7 (flagship), Sonnet 4.6 (balanced speed/intelligence), Haiku 4.5 (fast/cheap). Deprecated: Sonnet 4, Opus 4 (retire June 15, 2026); Haiku 3 (retired April 19, 2026).
Availability Platform / API
Vs. OpenAI's GPT-4o / 4.5 / o-series: cleaner model-family framing, more predictable pricing. Migrate deprecated models before sunset dates.
Extended Thinking
Sonnet 4.6 and Opus 4.7 support extended thinking — visible reasoning tokens before the final response. Tunable thinking budget. 1M-token context on Sonnet 4.6 beta.
Availability Platform / API
Vs. OpenAI o-series: comparable reasoning depth, more transparent about thinking tokens. Vs. hiding the reasoning: extended thinking is itself debuggable output.
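Tuning the thinking budget happens in the request body. A sketch of a Messages API payload with extended thinking enabled (the model ID format is an assumption; the `thinking` block follows the shape Anthropic's API uses, where `max_tokens` must exceed the thinking budget because thinking tokens count against it):

```python
def thinking_request(prompt: str, budget: int, max_tokens: int) -> dict:
    """Build a Messages API payload with an extended-thinking token budget."""
    assert max_tokens > budget, "max_tokens must exceed the thinking budget"
    return {
        "model": "claude-sonnet-4-6",   # assumed ID, per the catalog's model names
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget},
        "messages": [{"role": "user", "content": prompt}],
    }

req = thinking_request("Plan a zero-downtime Postgres migration.",
                       budget=8000, max_tokens=16000)
```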
Advisor Tool (Beta)
Pair a faster executor model with a higher-intelligence advisor model that provides strategic guidance mid-generation. Long-horizon workloads approach advisor-solo quality at executor-model cost. Beta header: advisor-tool-2026-03-01.
Availability Platform / API
Vs. running everything on Opus: massive cost savings on long tasks. Vs. manually switching models: automated hand-off, no app logic needed.
Prompt Caching
Cache large static prompt prefixes (system prompt, long context) and reuse them across requests. Cache writes cost slightly more than normal input tokens; cache reads cost roughly 10% of the standard input price.
Availability Platform / API
Vs. OpenAI's prompt caching: broadly comparable, different TTL defaults. The single biggest cost lever for any agent that re-sends context.
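The "biggest cost lever" claim is easy to check with arithmetic. The multipliers below follow the description above (reads around 10% of base, writes slightly more than base); the exact base price and write multiplier are illustrative assumptions:

```python
def caching_cost(prefix_tokens: int, requests: int,
                 base_price: float = 3.0,    # $/Mtok input, illustrative
                 write_mult: float = 1.25,   # assumed: writes slightly above base
                 read_mult: float = 0.10):   # per the text: reads ~10% of base
    """Cost of re-sending a static prefix with vs. without prompt caching:
    one cache write on the first request, cache reads on the rest."""
    per_mtok = prefix_tokens / 1_000_000
    uncached = requests * per_mtok * base_price
    cached = per_mtok * base_price * (write_mult + (requests - 1) * read_mult)
    return uncached, cached

uncached, cached = caching_cost(prefix_tokens=100_000, requests=50)
```

With a 100K-token prefix over 50 requests, the cached path comes out to roughly an eighth of the uncached cost under these assumptions.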
Batch API
Submit up to 100,000 requests in a batch, processed within 24 hours, at 50% of standard token prices.
Availability Platform / API
Vs. OpenAI Batch API: similar discount, similar latency. Vs. real-time API for bulk work: half the cost, slower turnaround.
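A batch submission pairs a `custom_id` (for matching results back) with ordinary Messages API params per request. A sketch of the request body (model ID and field details are illustrative):

```python
def build_batch(prompts, model: str = "claude-haiku-4-5") -> dict:
    """Body for a Message Batches submission: each entry carries a
    custom_id plus standard Messages API params, billed at half price."""
    return {"requests": [
        {"custom_id": f"task-{i}",
         "params": {"model": model, "max_tokens": 1024,
                    "messages": [{"role": "user", "content": p}]}}
        for i, p in enumerate(prompts)
    ]}

batch = build_batch(["Summarize doc A", "Summarize doc B"])
```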
Code Execution (Sandboxed)
Server-side Python execution as a first-party tool. Free when paired with web_search or web_fetch; standalone pricing otherwise.
Availability Platform / API
Vs. running your own Python sandbox: no security plumbing. Vs. OpenAI's Code Interpreter: tighter integration with the tool-use schema.
Structured Outputs
Force responses to conform to a JSON schema. Reduces parsing errors and LLM-vs-code contract failures.
Availability Platform / API
Vs. OpenAI structured outputs: rough feature parity. Vs. regex-parsing free-form output: orders of magnitude more reliable.
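One well-established way to get schema-conforming JSON from the Messages API is to force a single tool whose `input_schema` is your target schema, then read the `tool_use` input as the structured result. The tool name and schema below are invented for illustration:

```python
# Target schema: the model's output must be an object with these fields.
schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string",
                      "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number"},
    },
    "required": ["sentiment", "confidence"],
}

# Forcing tool_choice to this one tool makes the schema binding, not advisory.
request = {
    "model": "claude-haiku-4-5",
    "max_tokens": 256,
    "tools": [{"name": "record_sentiment",
               "description": "Record the sentiment classification.",
               "input_schema": schema}],
    "tool_choice": {"type": "tool", "name": "record_sentiment"},
    "messages": [{"role": "user",
                  "content": "Classify: 'Works great, shipped fast.'"}],
}
```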
Streaming
Server-sent event streaming for real-time response rendering. Essential for chat UIs and long-form generation UX.
Availability Platform / API
Standard across major AI APIs. Anthropic's SSE format is clean and well-documented.
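Consuming the stream means filtering for text deltas among the other event types. A sketch that parses sample SSE lines shaped like the Messages API's `content_block_delta` events (the sample payloads are hand-written, not captured output):

```python
import json

def accumulate_text(sse_lines):
    """Collect text from content_block_delta / text_delta events,
    ignoring lifecycle events like message_start and message_stop."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        if (event.get("type") == "content_block_delta"
                and event["delta"]["type"] == "text_delta"):
            text.append(event["delta"]["text"])
    return "".join(text)

sample = [
    'data: {"type": "message_start"}',
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello, "}}',
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "world."}}',
    'data: {"type": "message_stop"}',
]
out = accumulate_text(sample)
```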
Tools & Integrations
Tool Use
Claude calls your functions with typed arguments. You execute; Claude reads the result and continues. The primitive beneath every agent framework.
Availability Platform / API
Vs. OpenAI function calling: comparable. Claude's tool-use JSON schema has fewer footguns around strict mode.
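The primitive itself is a loop: the model emits `tool_use` blocks, you execute and reply with `tool_result` blocks in a user message, the model continues. A local sketch of one turn (the weather tool and the response array are invented; the block shapes follow the Messages API's tool-use format):

```python
def handle_tool_use(response_content, registry):
    """Execute each tool_use block from a model turn and package the
    tool_result blocks for the follow-up user message."""
    results = []
    for block in response_content:
        if block["type"] == "tool_use":
            output = registry[block["name"]](**block["input"])
            results.append({"type": "tool_result",
                            "tool_use_id": block["id"],
                            "content": str(output)})
    return {"role": "user", "content": results}

registry = {"get_weather": lambda city: f"18C and clear in {city}"}
# A content array shaped like a model turn that requested a tool call:
model_turn = [{"type": "tool_use", "id": "toolu_01",
               "name": "get_weather", "input": {"city": "Lisbon"}}]
follow_up = handle_tool_use(model_turn, registry)
```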
MCP Servers in Agents
Managed Agents can call any MCP server — Gmail, Drive, Slack, GitHub, custom ones. Same protocol as Desktop app MCP, but runs server-side.
Availability Platform / API (within Managed Agents)
Vs. writing custom tool wrappers: MCP is becoming the de facto standard, so your investment is portable.
Web Search Tool
First-party web search as a tool call. Real-time results with citation metadata.
Availability Platform / API
Vs. wiring in Tavily/Perplexity search APIs: no extra vendor, no separate billing.
Web Fetch Tool
First-party URL fetch with HTML-to-markdown extraction, rate limiting, and domain allow/block lists.
Availability Platform / API
Vs. rolling your own fetcher: battle-tested. Vs. Firecrawl / Jina: free when paired with web search.
Admin & Billing
Workspace Management
Multi-workspace support with separate keys, rate limits, spend caps. Invite team members, assign roles, audit usage per workspace.
Availability Platform / Console
Vs. OpenAI orgs: cleaner model for agencies managing multiple clients. Each client can live in its own workspace with its own spend cap.
Usage Dashboard
Real-time spend, request volume, rate-limit headroom, model-by-model breakdown. Export to CSV.
Availability Platform / Console
Vs. cobbling together with webhooks and BI tools: good-enough out of the box for small/mid teams.
Spend Limits & Alerts
Hard spend caps per workspace; email alerts at configurable thresholds. Prevents runaway agent costs.
Availability Platform / Console
Vs. "hope the bill doesn't blow up": essential for anyone running agents unsupervised. Set these before shipping.
SSO & RBAC (Team/Enterprise)
SAML SSO, SCIM provisioning, custom roles scoped to specific Claude capabilities per group.
Availability Platform / Console (Team/Enterprise)
Vs. OpenAI's enterprise tier: comparable controls, cleaner UI for role definition.
API Key Rotation
Generate, revoke, and rotate API keys per workspace. Keys are scoped to workspace permissions.
Availability Platform / Console
Standard. Do it on schedule, not after an incident.