Everyone writes about building AI. Almost nobody writes about billing for it. Here's how I set up metered token billing, subscription tiers, and marketplace vendor payouts with Stripe.
I've set up Stripe billing for several AI SaaS clients over the past two years. Every one of them started the same way: the AI product worked, customers loved it, and billing was a spreadsheet.
"We'll figure out billing later" is the most expensive sentence in SaaS. One client was manually tracking token usage in a Google Sheet and sending invoices at the end of the month. By the time they had 40 customers, the founder was spending 8 hours a week on invoicing. That's a full workday lost to something Stripe can automate in a weekend.
Here's how I wire up usage-based billing for AI products. The same patterns apply whether you're billing for LLM tokens, API calls, compute minutes, or any metered resource.
The Billing Model
Most AI products land on one of these structures:
| Model | Example | Stripe implementation |
|---|---|---|
| Pay-per-token | $0.002 per 1K tokens | Metered usage with meter_events |
| Tiered with overage | 100K tokens/month included, $0.003/1K after | Licensed quantity + metered overage |
| Flat tiers | Starter ($29), Pro ($99), Enterprise (custom) | Standard subscriptions with feature flags |
| Marketplace split | Platform takes 15%, vendor gets 85% | Stripe Connect with application fees |
Most of my clients use tiered with overage — a base subscription that includes a token allowance, with metered billing for anything above the cap. It's the model that aligns incentives: customers get predictable costs at normal usage, and you capture revenue from heavy users without awkward "you've hit your limit" walls.
Tracking Token Usage
The foundation of usage-based billing is accurate metering. Every LLM call in your application needs to report how many tokens it consumed.
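```python
from datetime import datetime

from stripe import StripeClient

# In production, load the secret key from an environment variable or secrets manager
stripe = StripeClient("sk_live_...")


def report_token_usage(
    customer_id: str,
    input_tokens: int,
    output_tokens: int,
    model: str,
) -> int:
    total_tokens = input_tokens + output_tokens
    # Report to Stripe's metering API. StripeClient methods take a params dict,
    # not keyword arguments.
    stripe.billing.meter_events.create({
        "event_name": "ai_token_usage",
        "payload": {
            "stripe_customer_id": customer_id,
            "value": str(total_tokens),
        },
        "timestamp": int(datetime.now().timestamp()),
    })
    # `model` is not sent to Stripe: the meter only sums `value` per customer.
    # Store it in your own usage table alongside this event.
    return total_tokens
```
I call this after every LLM interaction in the application: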
```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"


def chat(customer_id: str, messages: list[dict]) -> str:
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
    )
    # Report usage to Stripe
    report_token_usage(
        customer_id=customer_id,
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
        model=MODEL,
    )
    return response.choices[0].message.content
```
Key details:
- Report total tokens, not cost. Let Stripe handle the pricing math. If you change your per-token price, you create a new Stripe price and move subscriptions to it, because a price's amount can't be edited after creation. Your application code doesn't change.
- Report immediately, not in batches. Stripe's metering API is designed for high-frequency events. Batching introduces lag between usage and billing, which creates customer support headaches ("why was I charged for usage I can't see?").
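- Record the model with every usage event. Different models have different costs, but the meter as configured here only sums the reported value per customer, so keep the model in your own usage table. If you later want to bill GPT-4o at a higher rate than GPT-4o-mini (for example, with a separate meter and price per model), the data is already there.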
Subscription Tiers with Included Allowance
The typical setup: three tiers, each with an included token allowance. Usage above the allowance is billed per-token.
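The allowance split is where subtle billing bugs creep in. A single call can straddle the cap, and two concurrent calls can both read the same pre-call total. The sketch below makes the split a pure function. It assumes your usage store offers an atomic increment that returns the new period total, for example Redis INCRBY or a SQL UPDATE ... RETURNING. split_allowance is a hypothetical helper, not part of Stripe's API.

```python
def split_allowance(
    new_period_total: int, tokens_used: int, allowance: int
) -> tuple[int, int]:
    """Split one call's tokens into (included, overage).

    new_period_total is the period's usage *after* this call was added,
    e.g. the value returned by an atomic increment in your usage store.
    """
    if tokens_used < 0 or new_period_total < tokens_used:
        raise ValueError("new_period_total must already include tokens_used")
    # Tokens above the allowance, capped at this call's own tokens
    overage = min(tokens_used, max(0, new_period_total - allowance))
    return tokens_used - overage, overage
```

Only the overage part would be reported to Stripe. Each call computes its share from its own post-increment total, so the overages of concurrent calls add up to exactly the usage above the allowance, whatever order they land in.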
```python
# One-time setup: create the meter and prices in Stripe
# (Enterprise is custom-priced, so only Starter and Pro are shown)

# Meter that sums the "value" of every "ai_token_usage" event per customer
meter = stripe.billing.meters.create({
    "display_name": "AI tokens",
    "event_name": "ai_token_usage",
    "default_aggregation": {"formula": "sum"},
})

# Base subscription prices (monthly)
starter_price = stripe.prices.create({
    "product": "prod_ai_platform",
    "unit_amount": 2900,  # $29
    "currency": "usd",
    "recurring": {"interval": "month"},
    "metadata": {"tier": "starter", "included_tokens": "100000"},
})
pro_price = stripe.prices.create({
    "product": "prod_ai_platform",
    "unit_amount": 9900,  # $99
    "currency": "usd",
    "recurring": {"interval": "month"},
    "metadata": {"tier": "pro", "included_tokens": "500000"},
})

# Overage price (metered, per 1K tokens above allowance), linked to the meter.
# transform_quantity can't be combined with tiers, so this is a per-unit price.
overage_price = stripe.prices.create({
    "product": "prod_ai_tokens_overage",
    "currency": "usd",
    "unit_amount_decimal": "0.3",  # 0.3 cents = $0.003 per 1K tokens
    "recurring": {
        "interval": "month",
        "usage_type": "metered",
        "meter": meter.id,
    },
    "transform_quantity": {"divide_by": 1000, "round": "up"},
})
```
When creating a subscription, attach both the base price and the overage price:
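```python
subscription = stripe.subscriptions.create({
    "customer": customer_id,
    "items": [
        {"price": pro_price.id},  # $99/month base
        {"price": overage_price.id},  # metered overage, no quantity
    ],
})
```
The application checks usage against the tier's included allowance and reports only the tokens above it. In this model, call maybe_report_overage in place of the direct report_token_usage call in chat(). Otherwise every token lands on the overage meter and gets billed: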
```python
async def maybe_report_overage(customer_id: str, tokens_used: int):
    # `db` stands in for your application's usage store
    # Get current billing period usage from your database
    period_usage = await db.get_period_usage(customer_id)
    tier_allowance = await db.get_tier_allowance(customer_id)
    total_usage = period_usage + tokens_used
    if total_usage > tier_allowance:
        # Only the part of this call above the allowance is overage
        overage = total_usage - max(period_usage, tier_allowance)
        if overage > 0:
            report_token_usage(customer_id, overage, 0, "overage")
    # Read-then-increment can miscount under concurrent requests; an atomic
    # increment that returns the new total avoids that
    await db.increment_usage(customer_id, tokens_used)
```
Stripe Connect for Marketplace Payouts
One of my clients ran a multi-vendor AI marketplace — think multiple AI service providers on one platform, each with their own pricing. The platform takes a cut, vendors get the rest.
Stripe Connect handles this with application fees:
```python
# When a customer pays for an AI service from a vendor (a destination charge)
payment_intent = stripe.payment_intents.create({
    "amount": 5000,  # $50
    "currency": "usd",
    "customer": customer_id,
    "application_fee_amount": 750,  # platform takes $7.50 (15%)
    "transfer_data": {
        "destination": vendor_stripe_account_id,
    },
})
```
The vendor receives $42.50 in their connected Stripe account. With destination charges, Stripe's processing fee is debited from the platform's balance, not the vendor's, so the platform nets $7.50 minus that fee. No manual payouts, no end-of-month reconciliation.
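On one-off payments, application_fee_amount is an integer in the smallest currency unit (cents for USD). For a percentage cut, the platform has to compute and round it itself. The sketch below is minimal: split_payment_cents is a hypothetical helper, and half-up rounding is an assumption, so use whatever rule your vendor agreement specifies.

```python
from decimal import ROUND_HALF_UP, Decimal


def split_payment_cents(amount_cents: int, platform_percent: Decimal) -> tuple[int, int]:
    """Return (application_fee_cents, vendor_cents) for a destination charge."""
    # Decimal avoids float error; quantize to whole cents, rounding half up
    fee = (Decimal(amount_cents) * platform_percent / 100).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    fee_cents = int(fee)
    return fee_cents, amount_cents - fee_cents
```

For the $50 example, `split_payment_cents(5000, Decimal("15"))` gives `(750, 4250)`, and the first value is what you pass as application_fee_amount. For subscriptions, application_fee_percent lets Stripe do this calculation on each invoice.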
For subscription-based vendor services, I use the same pattern on recurring payments:
```python
subscription = stripe.subscriptions.create({
    "customer": customer_id,
    "items": [{"price": vendor_service_price_id}],
    "application_fee_percent": 15,
    "transfer_data": {
        "destination": vendor_stripe_account_id,
    },
})
```
Webhook Handling
Stripe webhooks are where billing meets your application. The critical events to handle:
```python
import os

import stripe
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()
webhook_secret = os.environ["STRIPE_WEBHOOK_SECRET"]


@app.post("/webhooks/stripe")
async def stripe_webhook(request: Request):
    payload = await request.body()
    sig_header = request.headers.get("stripe-signature")
    try:
        event = stripe.Webhook.construct_event(payload, sig_header, webhook_secret)
    except (ValueError, stripe.SignatureVerificationError):
        # Malformed payload or failed signature check: reject the request
        raise HTTPException(status_code=400, detail="Invalid webhook")

    # db, notify_payment_failed and get_tier_from_subscription are your app's helpers
    match event.type:
        case "invoice.paid":
            invoice = event.data.object
            # Reset monthly usage counters on renewal invoices only, so a
            # proration invoice from a mid-cycle upgrade doesn't refresh the allowance
            if invoice.billing_reason == "subscription_cycle":
                await db.reset_period_usage(invoice.customer)
        case "invoice.payment_failed":
            # Notify customer, maybe throttle API access
            await notify_payment_failed(event.data.object.customer)
        case "customer.subscription.updated":
            # Tier change: update allowance in your database
            sub = event.data.object
            new_tier = get_tier_from_subscription(sub)
            await db.update_tier(sub.customer, new_tier)
        case "customer.subscription.deleted":
            # Churn: revoke API access
            await db.deactivate_customer(event.data.object.customer)

    return {"status": "ok"}
```
The invoice.paid handler is the most important: it's where you reset the monthly usage counter so the customer's included allowance refreshes.
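Stripe's webhook documentation warns that an endpoint can receive the same event more than once. Without a guard, the handler above would reset usage or revoke access twice. The sketch below records processed event IDs under a unique key. It assumes a SQL store, and uses the standard-library sqlite3 only so the example is self-contained; the table and function names are hypothetical.

```python
import sqlite3


def init_event_log(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
    )
    conn.commit()


def claim_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Record the event ID; True on first delivery, False on a redelivery."""
    # The primary key makes the insert a no-op (rowcount 0) for a known ID
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
        (event_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def release_event(conn: sqlite3.Connection, event_id: str) -> None:
    """Forget a claim whose handler failed, so Stripe's retry gets processed."""
    conn.execute("DELETE FROM processed_events WHERE event_id = ?", (event_id,))
    conn.commit()
```

In stripe_webhook, this would sit right after construct_event. Return a 2xx early when claim_event(conn, event.id) is False, and call release_event if a handler raises, so Stripe's redelivery is not silently skipped.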
The Customer-Facing Usage Dashboard
Billing without visibility is a trust problem. Customers need to see their usage in near-real-time, not discover it on their invoice.
I build a simple usage dashboard that shows:
- Current period usage vs. included allowance (progress bar)
- Projected end-of-month cost based on current run rate
- Daily usage breakdown (chart)
- Per-model breakdown if multiple LLMs are available
The data comes from your application's usage table, not from Stripe. Stripe processes meter events asynchronously, so its aggregated usage lags behind real activity. Your app should be the source of truth for real-time display, with Stripe as the billing system of record.
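For the projected end-of-month cost, a linear run-rate extrapolation is the simplest option. The sketch below assumes usage accrues evenly across the period and that overage is billed per started 1K tokens, matching a divide_by 1000, round up price. project_period is a hypothetical helper; the Pro figures used below ($99 base, 500K included, $0.003 per 1K) come from the setup earlier in this post.

```python
import math
from decimal import Decimal


def project_period(
    tokens_so_far: int,
    elapsed_days: float,
    period_days: float,
    included_tokens: int,
    base_usd: Decimal,
    usd_per_1k_overage: Decimal,
) -> tuple[int, Decimal]:
    """Linear run-rate projection: (projected_tokens, projected_cost_usd)."""
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    projected_tokens = math.ceil(tokens_so_far * period_days / elapsed_days)
    overage_tokens = max(0, projected_tokens - included_tokens)
    # Ceiling division: each started 1K of overage is billed as a full unit
    overage_units = -(-overage_tokens // 1000)
    return projected_tokens, base_usd + overage_units * usd_per_1k_overage
```

For a Pro customer who has used 200,000 tokens 10 days into a 30-day period, this projects 600,000 tokens and $99.30. Bursty usage will make a purely linear projection swing from day to day.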
Lessons from Multiple Implementations
Start with flat tiers, add metering later. If you have fewer than 50 customers, flat pricing ($29/$99/$299) with generous allowances is simpler to implement and easier for customers to understand. Add metered overage when you have customers consistently hitting their limits.
Never hard-block access at the limit. Throttle, warn, or auto-upgrade, but don't return a 403 when a customer's AI chatbot is mid-conversation with their user. The worst billing experience is one that breaks the product.
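One way to encode a graduated policy is a function that maps usage to a soft action and has no blocking state at all. The sketch below is minimal, and its 80% warning and 2× throttle thresholds are hypothetical values to tune against your margins. Because overage is billed, crossing 100% only keeps the warning active rather than cutting anything off.

```python
from enum import Enum


class UsageAction(Enum):
    ALLOW = "allow"
    WARN = "warn"  # e.g. dashboard banner or email
    THROTTLE = "throttle"  # e.g. lower rate limit; requests are still served


def usage_action(
    period_usage: int,
    allowance: int,
    warn_at: float = 0.8,
    throttle_at: float = 2.0,
) -> UsageAction:
    """Map period usage to a soft action. There is deliberately no BLOCK state."""
    if allowance <= 0:
        raise ValueError("allowance must be positive")
    ratio = period_usage / allowance
    if ratio >= throttle_at:
        return UsageAction.THROTTLE
    if ratio >= warn_at:
        return UsageAction.WARN
    return UsageAction.ALLOW
```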
Log everything. Every token usage report, every Stripe webhook, every tier change. When a customer disputes a charge ("I didn't use 2M tokens"), you need an audit trail that shows exactly when and where each token was consumed.
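A minimal sketch of the usage side of that audit trail appends one JSON line per LLM call and rebuilds per-model totals for a single customer. The field names and helpers are hypothetical. In production the sink would be an append-only file or a database table rather than anything in memory.

```python
import json
import time
from typing import Any, Iterable, TextIO


def log_usage_event(
    sink: TextIO,
    *,
    customer_id: str,
    request_id: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
) -> dict[str, Any]:
    """Append one usage record as a JSON line and return it."""
    record = {
        "ts": time.time(),
        "customer_id": customer_id,
        "request_id": request_id,  # your per-call ID, for tracing
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }
    sink.write(json.dumps(record) + "\n")
    return record


def tokens_by_model(lines: Iterable[str], customer_id: str) -> dict[str, int]:
    """Rebuild a customer's per-model token totals from the log, e.g. for a dispute."""
    totals: dict[str, int] = {}
    for line in lines:
        record = json.loads(line)
        if record["customer_id"] == customer_id:
            model = record["model"]
            totals[model] = totals.get(model, 0) + record["total_tokens"]
    return totals
```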
Test with Stripe's test clocks. They let you simulate billing cycles in minutes instead of waiting 30 days. Use them to verify that overage calculations, tier resets, and invoice generation work correctly before going live.
Handle the margin math. If you're reselling OpenAI tokens at $0.003/1K and paying $0.00015/1K for GPT-4o-mini input tokens, your margin is healthy. But if a customer switches to GPT-4o at $0.005/1K input and you're still charging $0.003/1K, you're losing money on every call. Output tokens cost more than input tokens on both models, and report_token_usage sums the two into one number, so run the math on your real input/output mix. Build per-model pricing or set your overage rate to cover your most expensive model with margin.
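Two small helpers cover that margin check. The first gives your effective cost per 1K reported tokens once input and output are summed. The second gives the lowest single rate that keeps a target margin on your costliest model. Both are hypothetical helpers. The margin is defined as a fraction of the price, and the per-model costs you feed in should be your own current rates.

```python
from decimal import Decimal


def blended_cost_per_1k(
    input_tokens: int,
    output_tokens: int,
    input_cost_per_1k: Decimal,
    output_cost_per_1k: Decimal,
) -> Decimal:
    """Your cost per 1K *reported* tokens (input and output summed)."""
    total = input_tokens + output_tokens
    if total == 0:
        return Decimal(0)
    # Token-weighted average of the two per-1K rates
    return (input_tokens * input_cost_per_1k + output_tokens * output_cost_per_1k) / total


def min_rate_per_1k(cost_per_1k: dict[str, Decimal], target_margin: Decimal) -> Decimal:
    """Lowest single per-1K price that keeps target_margin on the costliest model."""
    if not Decimal(0) <= target_margin < Decimal(1):
        raise ValueError("target_margin must be in [0, 1)")
    return max(cost_per_1k.values()) / (Decimal(1) - target_margin)
```

With the input rates quoted above ($0.00015/1K and $0.005/1K) and a 50% target margin, min_rate_per_1k returns $0.01/1K, well above the $0.003/1K overage price.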
Billing for AI products isn't technically hard — Stripe handles the complexity. What's hard is choosing the right model, setting the right prices, and building the trust layer (usage dashboards, transparent invoices, no surprise charges) that keeps customers paying.
The code above is the skeleton. The product sense — knowing when to throttle vs. hard-block, when to nudge an upgrade vs. eat the overage, when flat pricing is better than metered — that comes from watching real customers interact with real invoices.