# Direct Inference

> Direct Inference is a zero-knowledge inference endpoint. Point an OpenAI-,
> Anthropic-, or Gemini-compatible client at one base URL, keep the model id your
> app already sends, and every request is classified by its shape. Behind the
> endpoint, model orchestration and smart routing scale each call to its task
> complexity. The response echoes your model id back; which model, provider, or
> version served the request stays hidden. Only the request type is exposed. The
> endpoint keeps optimizing on its own: simple work is served cheaply, hard work
> gets the best available model, and new models are absorbed as they ship, so
> costs fall and quality rises without any change to your code. There is no
> routing layer, gateway, or proxy to stand up.

Base URL: https://api.directinference.com/di/v1
Auth: send your Direct Inference API key as the SDK's API key / Bearer token.

## Quickstart

OpenAI-compatible (Python):

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.directinference.com/di/v1",
        api_key="YOUR_DIRECT_INFERENCE_KEY",
    )
    resp = client.chat.completions.create(
        model="gpt-5.5-mini",            # keep your own model id; it is echoed back
        messages=[{"role": "user", "content": "Summarize this thread."}],
    )
    print(resp.choices[0].message.content)  # standard OpenAI chat completion shape

The same base URL also accepts the Anthropic Messages shape and the Gemini
generateContent shape. Streaming, tool use, vision, PDFs, and structured output
all pass through.
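
For clients that do not use an SDK, the Quickstart call can be sketched with the Python standard library alone. This is a minimal sketch, not an official client: it assumes the chat endpoint lives at `/chat/completions` under the base URL (the path the OpenAI SDK appends to `base_url`), and `build_chat_request` is a hypothetical helper name. The request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.directinference.com/di/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-shaped chat completion request."""
    body = {
        "model": model,  # your existing model id, unchanged
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        # Assumed path: what the OpenAI SDK appends to base_url.
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # API key sent as a Bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "YOUR_DIRECT_INFERENCE_KEY", "gpt-5.5-mini", "Summarize this thread."
)
```

To send it, pass `req` to `urllib.request.urlopen` and parse the JSON body; with a real key, the reply should follow the same chat completion shape the SDK example reads.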

## Request types (classified from the request shape)

- vision: image content in the request -> a vision-capable model.
- document: PDF or file input -> document-capable handling.
- long: input beyond the standard context window -> a long-context path.
- code: tool definitions, diffs, stack traces, repo paths -> coding/tool strength.
- json: a response/output JSON schema is set -> a schema-reliable model.
- reason: multi-step reasoning in the prompt -> a reasoning model.
- flash: simple request at low effort -> fast and cheap.
- pro: everything else (default) -> a strong all-rounder.

Capability outranks the model name: a PDF or image sent to a "mini" id still gets
a capable model. Unknown, legacy, and future ids resolve instead of erroring.
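
As a rough mental model of shape-based classification, the toy function below guesses a request type from an OpenAI-shaped payload. It is a local illustration only, not the service's classifier: the function name, the precedence order, and the treatment of `fast`, `minimal`, and `low` as "low effort" are all assumptions, and the `long` and `reason` types are left out because the text does not say how they are detected. Note that the model id is never consulted, which mirrors the rule that capability outranks the model name.

```python
from typing import Optional

# Assumption: these effort levels count as "low effort" for the flash type.
LOW_EFFORT = {"fast", "minimal", "low"}


def _content_part_types(payload: dict) -> set:
    """Collect the `type` of every structured content part in the messages."""
    types = set()
    for message in payload.get("messages", []):
        content = message.get("content")
        if isinstance(content, list):  # plain string content has no parts
            for part in content:
                if isinstance(part, dict):
                    types.add(part.get("type"))
    return types


def guess_request_type(payload: dict, effort: Optional[str] = None) -> str:
    """Hypothetical approximation of shape-based classification."""
    part_types = _content_part_types(payload)
    if "image_url" in part_types:
        return "vision"
    if "file" in part_types:
        return "document"
    if payload.get("tools"):
        return "code"
    response_format = payload.get("response_format") or {}
    if response_format.get("type") == "json_schema":
        return "json"
    if effort in LOW_EFFORT:
        return "flash"
    return "pro"  # default: everything else


payload = {
    "model": "gpt-5.5-mini",  # a "mini" id does not change the outcome
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this picture?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
}
print(guess_request_type(payload))  # vision
```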

## Effort (optional cost/quality hint)

Send the X-DI-Effort header or an ?effort= query parameter. Levels: fast, minimal,
low, medium, high, xhigh, max. Omitting the hint means auto, and none is accepted
as an alias for fast. OpenAI reasoning_effort and native thinking fields are read
as the same signal. Effort tunes the serving choice; request shape still decides
the needed capability. Where only a model id fits (model pickers, fast/smart
model slots), use di-saver or di-max from the catalog at GET /di/v1/models: these
are the same model with effort pinned, not separate models.

    resp = client.chat.completions.create(
        model="gpt-5.5",
        messages=[{"role": "user", "content": "Plan a database migration."}],
        extra_headers={"X-DI-Effort": "high"},
    )
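
To catch a mistyped level before a request leaves your app, the header can be built by a small validating helper. This is a sketch under stated assumptions: `effort_headers` is a hypothetical name, the lowercasing is a client-side convenience (whether the server is case-sensitive is not stated here), and mapping `none` to `fast` simply applies the documented alias locally.

```python
from typing import Optional

# Levels as listed in the Effort section.
EFFORT_LEVELS = ("fast", "minimal", "low", "medium", "high", "xhigh", "max")


def effort_headers(effort: Optional[str] = None) -> dict:
    """Return headers carrying the effort hint, or {} to leave effort on auto."""
    if effort is None:
        return {}  # omitted = auto
    level = effort.strip().lower()
    if level == "none":
        level = "fast"  # documented alias
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {"X-DI-Effort": level}


print(effort_headers("high"))  # {'X-DI-Effort': 'high'}
```

The result can be passed straight to the SDK call, for example `extra_headers=effort_headers("high")`.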

## Links

- Product: https://directinference.com/
- Why Direct Inference (model orchestration inside one endpoint): https://directinference.com/why
- Developers (quickstart, request types, compatibility): https://directinference.com/developers
- Documentation (guides and API reference): https://docs.directinference.com
- Pricing: https://directinference.com/pricing
- Security: https://directinference.com/security
- Portal (create an API key): https://app.directinference.com
- Agent skill (automated migration): https://github.com/Direct-Inference/skills
- Full machine-readable docs: https://directinference.com/llms-full.txt