# Direct Inference

> Direct Inference is a zero-knowledge inference endpoint. Point an OpenAI-,
> Anthropic-, or Gemini-compatible client at one base URL, keep the model id your
> app already sends, and every request is classified by its shape. Behind the
> endpoint, model orchestration and smart routing scale each call to its task
> complexity. The response echoes your model id back; which model, provider, or
> version served the request stays hidden. Only the request type is exposed. The
> endpoint keeps optimizing on its own: simple work is served cheap, the best
> available model is served for hard work, and new models are absorbed as they
> ship, so costs fall and quality rises without any change to your code. There is
> no routing layer, gateway, or proxy to stand up.

Base URL: https://api.directinference.com/di/v1

Auth: send your Direct Inference API key as the SDK's API key / Bearer token.

## Quickstart

OpenAI-compatible (Python):

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.directinference.com/di/v1",
        api_key="YOUR_DIRECT_INFERENCE_KEY",
    )
    resp = client.chat.completions.create(
        model="gpt-5.5-mini",  # keep your own model id; it is echoed back
        messages=[{"role": "user", "content": "Summarize this thread."}],
    )

The same base URL also accepts the Anthropic Messages shape and the Gemini
generateContent shape. Streaming, tool use, vision, PDFs, and structured output
all pass through.

## Request types (classified from the request shape)

- vision: image content in the request -> a vision-capable model.
- document: PDF or file input -> document-capable handling.
- long: input beyond the standard context window -> a long-context path.
- code: tool definitions, diffs, stack traces, repo paths -> coding/tool strength.
- json: a response/output JSON schema is set -> a schema-reliable model.
- reason: multi-step reasoning in the prompt -> a reasoning model.
- flash: simple request at low effort -> fast and cheap.
- pro: everything else (default) -> a strong all-rounder.

Capability outranks the model name: a PDF or image sent to a "mini" id still
gets a capable model. Unknown, legacy, and future ids resolve instead of erroring.

## Effort (optional cost/quality hint)

Send the X-DI-Effort header or an ?effort= query param. Levels: fast, minimal,
low, medium, high, xhigh, max (omitted = auto; none is accepted as an alias for
fast). OpenAI reasoning_effort and native thinking fields are read as the same
signal. Effort tunes the serving choice; request shape still decides the needed
capability. Where only a model id fits (pickers, fast/smart model slots), the
catalog at GET /di/v1/models also lists di-saver and di-max: the same model with
effort pinned, not separate models.

    resp = client.chat.completions.create(
        model="gpt-5.5",
        messages=[{"role": "user", "content": "Plan a database migration."}],
        extra_headers={"X-DI-Effort": "high"},
    )

## Links

- Product: https://directinference.com/
- Why Direct Inference (model orchestration inside one endpoint): https://directinference.com/why
- Developers (quickstart, request types, compatibility): https://directinference.com/developers
- Documentation (guides and API reference): https://docs.directinference.com
- Pricing: https://directinference.com/pricing
- Security: https://directinference.com/security
- Portal (create an API key): https://app.directinference.com
- Agent skill (automated migration): https://github.com/Direct-Inference/skills
- Full machine-readable docs: https://directinference.com/llms-full.txt
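
As a companion to the Effort section, here is a minimal client-side sketch for validating an effort level before sending it, either as the X-DI-Effort header or as the ?effort= query param. The helper names (`normalize_effort`, `effort_headers`, `effort_url`) are hypothetical and not part of any SDK. Mapping `none` to `fast` on the client is an illustrative choice, since the endpoint is described as accepting `none` directly. The `chat/completions` path in the usage note assumes the usual OpenAI-compatible route under the base URL.

```python
from urllib.parse import urlencode

# Base URL and effort levels are taken from the text above.
BASE_URL = "https://api.directinference.com/di/v1"
EFFORT_LEVELS = ("fast", "minimal", "low", "medium", "high", "xhigh", "max")


def normalize_effort(effort):
    """Return a documented effort level, or None to leave the choice on auto.

    Hypothetical helper: 'none' is folded into 'fast' locally (the endpoint
    is described as accepting it as an alias), and unknown values raise.
    """
    if effort is None:
        return None  # omitted = auto
    level = effort.strip().lower()
    if level == "none":
        return "fast"
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return level


def effort_headers(effort):
    """Build the extra headers carrying the effort hint (empty for auto)."""
    level = normalize_effort(effort)
    return {} if level is None else {"X-DI-Effort": level}


def effort_url(path, effort):
    """Build a request URL carrying the effort hint as a query param."""
    level = normalize_effort(effort)
    url = f"{BASE_URL}/{path.lstrip('/')}"
    if level is None:
        return url
    query = urlencode({"effort": level})
    return f"{url}?{query}"
```

With the OpenAI Python client, the header form can be passed per request as `extra_headers=effort_headers("high")`. For clients that cannot set headers, `effort_url("chat/completions", "high")` produces the query-param form.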