FlintCloud
Drop-in OpenAI-compatible API

Cut AI costs by up to 73%
without touching your code.

Flint Cloud routes your LLM calls to the cheapest model that meets quality thresholds, automatically optimises your prompts, and gives you full observability — all behind a single OpenAI-compatible endpoint.

Before → After
// Before
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  // ← default endpoint
});

// After
const openai = new OpenAI({
  apiKey: process.env.FLINT_API_KEY,
  baseURL: 'https://api.flintlogic.com/v1',
});

73%

avg. cost reduction

<1ms

routing overhead

5 min

to production

Routes across all major providers

OpenAI · Anthropic · Google · AWS Bedrock · Azure

Everything you need

One platform. Every AI cost lever.

From routing to prompt engineering to observability — Flint handles the entire AI cost stack so you can focus on building.

Multi-model routing

One API key, five providers. Flint selects the cheapest model that meets your quality bar for every request — automatically.

OpenAI · Anthropic · Google · Bedrock · Azure

Prompt optimisation

Upload a test dataset and let Flint run LLM-as-judge evaluation loops to iteratively improve your system prompts. New versions go live without a redeploy.

LLM-as-judge · Auto-iterate

Real-time savings dashboard

See exactly how much you're saving per request, per agent, and per day. Compare actual spend to what the same workload would cost on GPT-4o.

Security guard

Every prompt scanned for PII, injection attacks, jailbreak attempts, and excessive agency before it leaves your perimeter.

PII · Injection · Jailbreak

Prompt version control

Every optimised prompt is stored as an immutable version with its score, metrics, and parent diff. Roll back in one click.

Agent health monitoring

Track success rate, latency p99, and quality score per agent in real time. Get alerted before degraded prompts affect production.

Zero cold starts

Provisioned concurrency on the Pruner Lambda keeps P50 routing overhead under 1ms — even during traffic spikes.

Rate limiting & tier enforcement

Per-organisation token budgets and RPM caps enforced at the gateway layer. No surprise bills from runaway agents.

Auth0 JWT validation

JWKS-backed JWT verification with 5-minute caching. Your team's existing Auth0 tokens work out of the box.

Integration in minutes

How Flint works

01

Point your client at Flint

Change one line — swap `baseURL` to `api.flintlogic.com/v1`. Your existing OpenAI SDK, LangChain, or raw `fetch` calls work immediately.

const openai = new OpenAI({
  apiKey:  process.env.FLINT_API_KEY,
  baseURL: 'https://api.flintlogic.com/v1',
});
02

Flint routes to the cheapest capable model

The Pruner evaluates each request against your org's strategy (quality · cost · balanced) and rewrites `model` to the optimal provider before forwarding.

// Your request
{ model: 'gpt-4o', messages: [...] }

// What Flint forwards
{ model: 'gpt-4o-mini', provider: 'openai' }
// ↑ 20× cheaper, same quality for this task
03

Optimise prompts automatically

Upload a test dataset in the Workbench. Flint runs LLM-as-judge evaluation, mutates low-scoring prompts, and promotes the best version live — no redeploy needed.

POST /observe/optimizer
{
  "orgId":   "org_abc",
  "agentId": "agent_support",
  "dataset": [ ... ],     // test cases
  "targetScore": 90,      // stop when hit
  "strategy": "quality"
}
04

Watch savings accumulate in real time

The dashboard shows per-request, per-agent, and total savings vs. running everything on GPT-4o. Export usage data to your BI tool via the Usage API.

GET /observe/usage/org_abc
    ?from=2026-01-01&to=2026-01-31

{
  "totalCostUsd": 12.45,
  "savingsVsBaseline": 73.2,
  "byModel": { "gpt-4o-mini": {...} }
}
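The query above can be assembled with any HTTP client. A small sketch of building the request URL, assuming the observe endpoints live on the same host as the gateway (the helper itself is illustrative; only the path and query parameters come from the example above):

```typescript
// Build a Usage API URL for an org and date range,
// matching the GET /observe/usage example above.
// The helper is illustrative, not part of Flint's SDK.
function usageUrl(orgId: string, from: string, to: string): string {
  const url = new URL(`https://api.flintlogic.com/observe/usage/${orgId}`);
  url.searchParams.set("from", from);
  url.searchParams.set("to", to);
  return url.toString();
}

// Pass the result to fetch() along with your FLINT_API_KEY.
```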

Trusted by fast-moving teams at

Vercel · Stripe · Linear · Notion · Figma · Loom
74% saved
We cut our monthly OpenAI bill from $4,200 to $1,100 in two weeks. The prompt workbench alone paid for the year.
Priya S.

CTO, Series A SaaS

61% saved
Drop-in replacement took 3 minutes. The savings dashboard was running before lunch. Now we actually know what our AI stack costs.
Marcus L.

Staff Engineer, Growth Platform

68% saved
We had 12 agents with hand-tuned prompts that nobody dared to touch. Flint's versioning gave us the confidence to optimise them properly.
Aisha K.

Head of AI, Enterprise Fintech

$2.4M+

Saved for customers

850M+

Tokens routed

99.97%

Gateway uptime

<1ms

Routing overhead

Transparent pricing

Pay only for what you use

Free forever for individuals. Team and Enterprise plans scale with your usage.

Free

For individuals and side projects.

$0/mo
  • 2,048 max tokens / request
  • 20 requests / minute
  • gpt-4o-mini · claude-haiku · gemini-flash
  • 1 agent
  • Savings dashboard
  • Community support
Start for free
Most popular

Pro

For teams shipping production AI features.

$39/mo

Billed annually · $49/mo billed monthly

  • 8,192 max tokens / request
  • 200 requests / minute
  • All Free models + GPT-4o · Claude Sonnet
  • 20 agents
  • Prompt Workbench (5 iterations / run)
  • Agent health monitoring
  • Usage API & cost alerts
  • Team management (up to 10 seats)
  • Email support
Start Pro trial

Enterprise

For organisations with large-scale AI workloads.

Custom
  • 32,768 max tokens / request
  • 2,000 requests / minute
  • All models — o1 · Claude Opus · Gemini 2.5
  • Unlimited agents & seats
  • Prompt Workbench (unlimited iterations)
  • Custom model endpoints (Azure, Bedrock)
  • VPC / private cloud deployment
  • SSO & SAML
  • SOC 2 Type II report
  • Dedicated Slack channel + SLA
Contact sales

Token charges apply at cost — Flint does not mark up model API prices. See full pricing details →

Start saving in 5 minutes.

Free forever for individuals. No credit card required. Change one line of code and you're live.

Terminal
npm install @flintlogic/sdk