FlintCloud
Drop-in OpenAI-compatible API

Cut AI costs by up to 73%
without touching your code.

Flint Cloud routes your LLM calls to the cheapest model that meets quality thresholds, automatically optimises your prompts, and gives you full observability — all behind a single OpenAI-compatible endpoint.

Before → After
// Before
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  // ← default endpoint
});

// After
const openai = new OpenAI({
  apiKey: process.env.FLINT_API_KEY,
  baseURL: 'https://api.flintlogic.com/v1',
});

73%

avg. cost reduction

<1ms

routing overhead

5 min

to production

Routes across all major providers

OpenAI · Anthropic · Google · AWS Bedrock · Azure

Everything you need

One platform. Every AI cost lever.

From routing to prompt engineering to observability — Flint handles the entire AI cost stack so you can focus on building.

Multi-model routing

One API key, five providers. Flint selects the cheapest model that meets your quality bar for every request — automatically.

OpenAI · Anthropic · Google · Bedrock · Azure

Prompt optimisation

Upload a test dataset and let Flint run LLM-as-judge evaluation loops to iteratively improve your system prompts. New versions go live without a redeploy.

LLM-as-judge · Auto-iterate

Real-time savings dashboard

See exactly how much you're saving per request, per agent, and per day. Compare actual spend to what the same workload would cost on GPT-4o.

Security guard

Every prompt scanned for PII, injection attacks, jailbreak attempts, and excessive agency before it leaves your perimeter.

PII · Injection · Jailbreak

Prompt version control

Every optimised prompt is stored as an immutable version with its score, metrics, and parent diff. Roll back in one click.

Agent health monitoring

Track success rate, latency p99, and quality score per agent in real time. Get alerted before degraded prompts affect production.

Zero cold starts

Provisioned concurrency on the Pruner Lambda keeps P50 routing overhead under 1ms — even during traffic spikes.

Rate limiting & tier enforcement

Per-organisation token budgets and RPM caps enforced at the gateway layer. No surprise bills from runaway agents.

Auth0 JWT validation

JWKS-backed JWT verification with 5-minute caching. Your team's existing Auth0 tokens work out of the box.

Integration in minutes

How Flint works

01

Point your client at Flint

Change one line — swap `baseURL` to `api.flintlogic.com/v1`. Your existing OpenAI SDK, LangChain, or raw `fetch` calls work immediately.

const openai = new OpenAI({
  apiKey:  process.env.FLINT_API_KEY,
  baseURL: 'https://api.flintlogic.com/v1',
});
02

Flint routes to the cheapest capable model

The Pruner evaluates each request against your org's strategy (quality · cost · balanced) and rewrites `model` to the optimal provider before forwarding.

// Your request
{ model: 'gpt-4o', messages: [...] }

// What Flint forwards
{ model: 'gpt-4o-mini', provider: 'openai' }
// ↑ 20× cheaper, same quality for this task
03

Optimise prompts automatically

Upload a test dataset in the Workbench. Flint runs LLM-as-judge evaluation, mutates low-scoring prompts, and promotes the best version live — no redeploy needed.

POST /observe/optimizer
{
  "orgId":   "org_abc",
  "agentId": "agent_support",
  "dataset": [ ... ],     // test cases
  "targetScore": 90,      // stop when hit
  "strategy": "quality"
}
04

Watch savings accumulate in real time

The dashboard shows per-request, per-agent, and total savings vs. running everything on GPT-4o. Export usage data to your BI tool via the Usage API.

GET /observe/usage/org_abc
    ?from=2026-01-01&to=2026-01-31

{
  "totalCostUsd": 12.45,
  "savingsVsBaseline": 73.2,
  "byModel": { "gpt-4o-mini": {...} }
}
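The query above can be assembled with any HTTP client. A small sketch of building the request URL, assuming the observe endpoints live on the same host as the gateway (the helper itself is illustrative; only the path and query parameters come from the example above):

```typescript
// Build a Usage API URL for an org and date range,
// matching the GET /observe/usage example above.
// The helper is illustrative, not part of Flint's SDK.
function usageUrl(orgId: string, from: string, to: string): string {
  const url = new URL(`https://api.flintlogic.com/observe/usage/${orgId}`);
  url.searchParams.set("from", from);
  url.searchParams.set("to", to);
  return url.toString();
}

// Pass the result to fetch() along with your FLINT_API_KEY.
```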

Trusted by fast-moving teams at

Vercel · Stripe · Linear · Notion · Figma · Loom
74% saved
We cut our monthly OpenAI bill from $4,200 to $1,100 in two weeks. The prompt workbench alone paid for the year.
Priya S.

CTO, Series A SaaS

61% saved
Drop-in replacement took 3 minutes. The savings dashboard was running before lunch. Now we actually know what our AI stack costs.
Marcus L.

Staff Engineer, Growth Platform

68% saved
We had 12 agents with hand-tuned prompts that nobody dared to touch. Flint's versioning gave us the confidence to optimise them properly.
Aisha K.

Head of AI, Enterprise Fintech

$2.4M+

Saved for customers

850M+

Tokens routed

99.97%

Gateway uptime

<1ms

Routing overhead

Transparent pricing

Pay only for what you use

Free forever for individuals. Team and Enterprise plans scale with your usage.

Free

For individuals and side projects.

$0/mo
  • 2,048 max tokens / request
  • 20 requests / minute
  • gpt-4o-mini · claude-haiku · gemini-flash
  • 1 agent
  • Savings dashboard
  • Community support
Start for free
Most popular

Pro

For teams shipping production AI features.

$39/mo

Billed annually · $49/mo billed monthly

  • 8,192 max tokens / request
  • 200 requests / minute
  • All Free models + GPT-4o · Claude Sonnet
  • 20 agents
  • Prompt Workbench (5 iterations / run)
  • Agent health monitoring
  • Usage API & cost alerts
  • Team management (up to 10 seats)
  • Email support
Start Pro trial

Enterprise

For organisations with large-scale AI workloads.

Custom
  • 32,768 max tokens / request
  • 2,000 requests / minute
  • All models — o1 · Claude Opus · Gemini 2.5
  • Unlimited agents & seats
  • Prompt Workbench (unlimited iterations)
  • Custom model endpoints (Azure, Bedrock)
  • VPC / private cloud deployment
  • SSO & SAML
  • SOC 2 Type II report
  • Dedicated Slack channel + SLA
Contact sales

Token charges apply at cost — Flint does not mark up model API prices. See full pricing details →

Start saving in 5 minutes.

Free forever for individuals. No credit card required. Change one line of code and you're live.

Terminal
npm install @flintlogic/sdk