Per-minute billing is the standard pricing model across the real-time video API industry. Agora, ZegoCloud, Dyte, Daily.co, and most other major providers charge based on the total participant-minutes your platform consumes. The model is straightforward: one user connected to a call for one minute equals one participant-minute. Your monthly bill is the sum of all participant-minutes multiplied by the provider’s rate.
For startups and small teams, per-minute billing is what makes it possible to ship video calling in an app without a five-figure infrastructure contract. But pricing models that work at one scale don’t always work at another. This article breaks down how per-minute billing works, what it costs at enterprise scale with real math, what building your own infrastructure costs, and what alternative pricing models exist, so you can make an informed decision for your specific situation. Let’s start with the basics.
What Is Per-Minute Video Pricing in Video Infrastructure?
Per-minute billing charges you based on participant-minutes: the total time each participant spends connected to a video or audio session. If 4 people are on a 30-minute call, that’s 120 participant-minutes billed.
Most providers calculate usage by counting subscribed streams. In a group call where User A watches User B and User C’s video, User A is consuming 2 video streams and is billed for both. Some providers have moved to simpler per-user billing (counting from join to leave regardless of streams), but the core mechanic remains: more users, more minutes, higher bill.
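As a rough illustration of the arithmetic above, the sketch below computes participant-minutes for a single call and the number of streams each user receives when everyone watches everyone else. The function names are invented for this example, and real providers apply their own rounding and resolution rules on top.

```python
def participant_minutes(participants: int, duration_min: int) -> int:
    """Total billable participant-minutes for one session."""
    return participants * duration_min


def subscribed_streams(participants: int) -> int:
    """Streams one user receives if they watch every other participant."""
    return max(participants - 1, 0)


# The example from the text: 4 people on a 30-minute call.
print(participant_minutes(4, 30))  # 120

# User A watching User B and User C in a 3-person call.
print(subscribed_streams(3))  # 2
```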
The rates are modest in isolation. Here’s what the major providers charge:
| Provider | Audio Rate | Video HD (720p) | Video Full HD |
|---|---|---|---|
| ZegoCloud | $0.99/1K min | $3.99/1K min | N/A |
| Agora | $0.99/1K min | $3.99/1K min | $8.99/1K min |
| Dyte | $0.001/user/min | $0.004/user/min | N/A |
| Daily.co | $0.00099/user/min | $0.004/user/min | N/A |
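The table mixes two units: dollars per 1,000 minutes and dollars per user per minute. A minimal sketch of the conversion, assuming one user-minute equals one participant-minute (the function name is made up for illustration):

```python
def per_1k_from_per_minute(rate_per_user_min: float) -> float:
    """Convert a $/user/minute rate to $ per 1,000 participant-minutes."""
    return rate_per_user_min * 1000


# Rates taken from the table above.
print(round(per_1k_from_per_minute(0.004), 2))    # 4.0
print(round(per_1k_from_per_minute(0.00099), 2))  # 0.99
```

On that basis, the per-user rates land within a cent of the per-1,000-minute rates in the same columns.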
These numbers look small on a per-call basis, but they add up quickly at volume. So where exactly does per-minute billing show up in real-world products?
Where Per-Minute Video Pricing Shows Up
Per-minute billing isn’t specific to one industry. It applies to every product that uses real-time video or voice at scale:
Video KYC: Banks verify customer identity via live video. Calls are short (5–15 minutes) but high-volume — a mid-size bank may process 500–2,000 verifications daily.
Telehealth: Doctors conduct 15–30 minute 1:1 consultations. Hospital networks running hundreds of concurrent consultations consume thousands of participant-minutes per hour.
EdTech: Live classes with 30–500 students, 1–2 hours each. Long sessions and large group sizes make education one of the highest-cost per-minute use cases.
Enterprise Meetings: Team standups, cross-department syncs, leadership reviews, client presentations. Mixed video/audio usage across 5–50 participants per session.
Contact Centers: Video-assisted support and screen-sharing. Individually short (10–20 min), but thousands of sessions daily at high-volume operations.
Social & Live Streaming: Live audio rooms, dating apps, social video. Usage is unpredictable — a viral moment can multiply participant-minutes overnight.
The common thread: per-minute billing creates a direct, linear relationship between product usage and infrastructure cost. Whether that’s a problem depends on your scale. Let’s put actual numbers on it.
The Real Math: What Enterprise Teams Actually Pay Per Month
Below are two realistic enterprise scenarios using the publicly available rates shown above.
Scenario 1: Telehealth Platform with 500 Concurrent Doctors
Configuration: 500 doctors, each conducting 12 consultations per day, average 20 minutes, 26 working days/month. 1:1 video calls (doctor + patient = 2 participants). 720p HD.
Monthly participant-minutes: 500 × 12 × 20 × 2 × 26 = 6,240,000.
| Provider | Video Rate | Audio Rate | Monthly Cost | Annual Cost |
|---|---|---|---|---|
| ZegoCloud | $3.99/1K min | $0.99/1K min | $24,898 | $298,771 |
| Agora (HD) | $3.99/1K min | $0.99/1K min | $24,898 | $298,771 |
| Agora (Full HD) | $8.99/1K min | $0.99/1K min | $56,098 | $673,171 |
| Dyte | $0.004/user/min | $0.001/user/min | $24,960 | $299,520 |
At 720p, roughly $25,000/month across providers. At Full HD on Agora, $56,000/month — over $670,000 annually. That’s video minutes alone, before recording, storage, or any add-ons.
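The figures in this scenario can be reproduced with a few lines of arithmetic. This is a sketch using only the configuration and rates stated above; the function and variable names are illustrative, and Dyte’s $0.004/user/min is expressed as $4.00 per 1,000 minutes.

```python
def monthly_participant_minutes(doctors, consults_per_day, minutes_per_consult,
                                participants_per_call, working_days):
    """Participant-minutes per month for the telehealth configuration."""
    return (doctors * consults_per_day * minutes_per_consult
            * participants_per_call * working_days)


def cost(minutes, rate_per_1k):
    """Dollar cost of a minute count at a per-1,000-minute rate."""
    return minutes / 1000 * rate_per_1k


minutes = monthly_participant_minutes(500, 12, 20, 2, 26)
print(f"{minutes:,} participant-minutes/month")

# Video rates per 1,000 minutes, as listed in the table.
rates = {"HD ($3.99/1K)": 3.99, "Full HD ($8.99/1K)": 8.99, "Dyte ($4.00/1K)": 4.00}
for label, rate in rates.items():
    monthly = cost(minutes, rate)
    print(f"{label}: ${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")
```

Annual figures are computed from the unrounded monthly cost, which is why they differ slightly from twelve times the rounded monthly number.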
But telehealth is high-concurrency, all-day usage. What about a more common enterprise scenario where meetings are shorter and not everyone has their camera on?
Scenario 2: 3,000-Person Enterprise with Mixed Meeting Culture
Configuration: A 3,000-employee company runs a mix of daily meetings across departments. 22 working days/month. Video at 720p HD where cameras are on.
120 team standups/scrums: 10 people, 20 minutes. 6 have video on, 4 are audio-only.
30 cross-functional project syncs: 15 people, 45 minutes. 8 on video, 7 audio-only.
8 client-facing presentations: 12 people (internal + external), 60 minutes. 8 on video, 4 audio-only.
4 leadership reviews: 20 people, 90 minutes. 6 on video, 14 audio-only.
Daily breakdown: 31,200 video minutes + 26,010 audio minutes = 57,210 participant-minutes/day.
Monthly totals: 686,400 video minutes + 572,220 audio minutes.
| Provider | Video Cost | Audio Cost | Monthly Total | Annual Total |
|---|---|---|---|---|
| ZegoCloud | $2,739 | $566 | $3,305 | $39,663 |
| Agora (HD) | $2,739 | $566 | $3,305 | $39,663 |
| Agora (Full HD) | $6,171 | $566 | $6,737 | $80,847 |
| Dyte | $2,746 | $572 | $3,318 | $39,814 |
At HD, $3,300/month — $40,000 annually. At Full HD, $6,700/month — nearly $81,000/year. And this is a company where most people aren’t even on video. Flip to a video-mandatory culture and these numbers climb significantly. The rates are the same as the telehealth scenario ($3.99/1K video min, $0.99/1K audio min) — it’s the usage volume that drives the difference.
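The daily and monthly totals for this scenario come from multiplying out each meeting type. A sketch of that calculation, using the meeting mix and rates stated above (the variable names are illustrative):

```python
WORKING_DAYS = 22

# (meetings per day, duration in minutes, participants on video, audio-only participants)
MEETINGS = [
    (120, 20, 6, 4),   # team standups
    (30, 45, 8, 7),    # cross-functional project syncs
    (8, 60, 8, 4),     # client-facing presentations
    (4, 90, 6, 14),    # leadership reviews
]

daily_video = sum(count * mins * video for count, mins, video, _ in MEETINGS)
daily_audio = sum(count * mins * audio for count, mins, _, audio in MEETINGS)

monthly_video = daily_video * WORKING_DAYS
monthly_audio = daily_audio * WORKING_DAYS


def monthly_cost(video_rate_per_1k, audio_rate_per_1k):
    """Monthly bill in dollars for the given per-1,000-minute rates."""
    return (monthly_video / 1000 * video_rate_per_1k
            + monthly_audio / 1000 * audio_rate_per_1k)


print(daily_video, daily_audio)      # 31200 26010
print(monthly_video, monthly_audio)  # 686400 572220
print(f"HD: ${monthly_cost(3.99, 0.99):,.0f}/month")
print(f"Full HD: ${monthly_cost(8.99, 0.99):,.0f}/month")
```

Changing the video and audio-only counts in `MEETINGS` shows how quickly a video-mandatory culture moves the total.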
These are base costs. Most providers charge separately for add-ons that are standard in enterprise deployments. Let’s look at what sits on top.
The Costs That Sit on Top of Per-Minute Rates
Cloud recording adds $1.49–$13.49 per 1,000 minutes depending on resolution (Agora’s published rates). TURN server / cloud proxy access starts at $500/month for 200 peak concurrent users. Cloud storage for recorded content is extra. For regulated use cases like vKYC (where every call must be recorded) or telehealth (medical record archiving), recording and storage can add 30–50% on top of the base bill.
AI features like real-time transcription or voice analysis are optional add-ons, typically billed separately — Agora’s speech-to-text runs $16.99/1K minutes. Not every deployment needs them, but factor them in if your use case does.
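To see how these add-ons stack up, here is a minimal sketch that prices them from the rates quoted above. The 1,000,000 recorded minutes is a made-up input, storage is left out because no rate is given here, and treating recording and transcription as metered on a simple minute count is a simplifying assumption; providers meter each differently.

```python
RECORDING_RATE_RANGE_PER_1K = (1.49, 13.49)  # Agora's published range quoted above
TURN_MONTHLY_MINIMUM = 500.00                # starting price for 200 peak concurrent users
SPEECH_TO_TEXT_PER_1K = 16.99                # Agora speech-to-text rate quoted above


def add_on_costs(recorded_minutes, transcribed_minutes=0, use_turn=True):
    """Return (low, high) monthly add-on cost in dollars, excluding storage."""
    low_rate, high_rate = RECORDING_RATE_RANGE_PER_1K
    fixed = TURN_MONTHLY_MINIMUM if use_turn else 0.0
    fixed += transcribed_minutes / 1000 * SPEECH_TO_TEXT_PER_1K
    low = fixed + recorded_minutes / 1000 * low_rate
    high = fixed + recorded_minutes / 1000 * high_rate
    return low, high


# Hypothetical month: 1,000,000 recorded minutes, TURN enabled, no transcription.
low, high = add_on_costs(1_000_000)
print(f"${low:,.0f} to ${high:,.0f} per month in add-ons")
```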
When you stack recording, TURN, storage, and support on top of the base rate, the all-in cost is typically 1.5x to 2.5x the headline per-minute number. When enterprise CTOs see these numbers, the first instinct is often: “WebRTC is open source. We’ll build our own.” It’s a rational thought. Here’s what the reality looks like.
How Much It Costs to Build Video Infrastructure Yourself
WebRTC is open-source and royalty-free. The technology is free. The engineering to make it production-grade is not. A basic proof-of-concept — two users on a browser call — takes a weekend. A production-grade platform with an SFU, signaling, TURN/STUN, recording, and adaptive bitrate costs $500,000–$2,000,000 to build from scratch. A more modest SDK-based approach (building on mediasoup or LiveKit) runs $50,000–$300,000.
The talent alone is expensive. WebRTC sits at the intersection of networking, codec engineering, browser APIs, and distributed systems. Specialists command $150,000–$250,000/year in North America. A minimum team of 2–3 engineers plus DevOps puts you at $400,000–$750,000/year in salary.
Then there’s the ongoing engineering effort that doesn’t show up on a spreadsheet. Browser compatibility is a moving target: Chrome, Safari, Firefox, and Edge each handle WebRTC differently, and updates can break things without warning. NAT traversal across corporate firewalls is a deep, ongoing challenge. Congestion control for users on unreliable networks takes months to tune. Operational maintenance runs 15–20 hours/month of monitoring, patching, and firefighting.
Scalability is the part most teams underestimate. Each video session consumes significant CPU, memory, and bandwidth. The system works at 50 concurrent sessions, starts struggling at 200, and needs a partial rewrite to handle 1,000. Horizontal scaling across SFU instances requires load balancing, session migration, and geo-distribution — none of which come for free.
Five-year total cost of ownership: $1.5M–$7M for a from-scratch build, $650K–$3.5M for an SDK-based integration.
Who It Suits
Building your own makes sense when video IS your core product: the thing you’re selling, not a feature inside something else. If you’re a video conferencing company, a telehealth platform where video quality is your competitive moat, or a media company building live streaming, owning the stack gives you control that no third party can match. You need the team, the budget, and the 12–24 month runway, but if video is what you sell, the investment pays back in product control and margin ownership.
For everyone else — companies where video is a feature inside a larger product — building from scratch is usually a distraction from what actually makes the business money. So what are the other options?
Alternative Pricing Models for Video Infrastructure
Per-minute billing is the most common model, but not the only one. Here are the other pricing structures and who each serves best.
Flat Licensing
You pay a fixed fee — monthly or annually — regardless of participant-minutes consumed. Your cost is a flat line, not a curve. Samvyo uses this model: a single negotiable license with unlimited calls, on-premise deployment, and full white-label capability.
Flat licensing becomes cheaper than per-minute billing once you cross a usage threshold — typically around 100–200 concurrent users. Below that, per-minute is more efficient because you only pay for what you use. Above it, flat licensing wins, and the gap widens with every additional user. It suits enterprises with high, predictable usage and finance teams that need a fixed budget number.
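The crossover can be estimated by dividing a flat fee by the per-minute rate. The sketch below does that under a loudly hypothetical assumption: the $10,000/month flat fee is invented purely for illustration, since no flat license price is published in this article. The minute counts are the video minutes from the two scenarios above.

```python
HYPOTHETICAL_FLAT_FEE = 10_000  # invented monthly fee, NOT a quoted price
HD_RATE_PER_1K = 3.99           # HD video rate from the provider table


def break_even_minutes(flat_monthly_fee, rate_per_1k):
    """Monthly participant-minutes at which both models cost the same."""
    return flat_monthly_fee / rate_per_1k * 1000


def cheaper_model(monthly_minutes, flat_monthly_fee, rate_per_1k):
    """Name the cheaper model for a given monthly volume."""
    per_minute_cost = monthly_minutes / 1000 * rate_per_1k
    return "flat" if flat_monthly_fee < per_minute_cost else "per-minute"


threshold = break_even_minutes(HYPOTHETICAL_FLAT_FEE, HD_RATE_PER_1K)
print(f"Break-even: {threshold:,.0f} participant-minutes/month")

# Enterprise meetings scenario (686,400 video minutes/month).
print(cheaper_model(686_400, HYPOTHETICAL_FLAT_FEE, HD_RATE_PER_1K))    # per-minute
# Telehealth scenario (6,240,000 participant-minutes/month).
print(cheaper_model(6_240_000, HYPOTHETICAL_FLAT_FEE, HD_RATE_PER_1K))  # flat
```

Swap in a real quote for `HYPOTHETICAL_FLAT_FEE` to find your own threshold.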
Per-User / Seat-Based Pricing
You pay per named user or host, regardless of usage. Zoom’s business plans work this way. This model is common in UCaaS products rather than embeddable APIs. It suits organizations with uniform usage per user. It becomes inefficient when usage varies widely — you pay the same for a power user and someone who barely logs in.
Bandwidth-Based Pricing
You pay based on data transferred rather than time. LiveKit prices at $0.18/GB after a free tier. This correlates cost with actual server resource consumption. It suits technical teams who can optimize encoding (resolution, bitrate, codec) to control costs directly.
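To make the link between encoding choices and cost concrete, here is a sketch that converts a stream bitrate into gigabytes and then into dollars at the $0.18/GB rate quoted above. The 1.5 Mbps bitrate is an assumed figure for illustration, not a number from any provider, and which traffic a provider actually meters is outside the scope of this example.

```python
PRICE_PER_GB = 0.18         # rate quoted in the text
ASSUMED_BITRATE_MBPS = 1.5  # assumption for illustration only


def gigabytes_transferred(bitrate_mbps, minutes):
    """Data volume in decimal gigabytes for one stream at a constant bitrate."""
    megabits = bitrate_mbps * 60 * minutes
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes


def bandwidth_cost(bitrate_mbps, minutes, price_per_gb=PRICE_PER_GB):
    """Dollar cost of transferring one stream for the given duration."""
    return gigabytes_transferred(bitrate_mbps, minutes) * price_per_gb


full = bandwidth_cost(ASSUMED_BITRATE_MBPS, 1000)
half = bandwidth_cost(ASSUMED_BITRATE_MBPS / 2, 1000)
print(f"1,000 stream-minutes at {ASSUMED_BITRATE_MBPS} Mbps: ${full:.3f}")
print(f"1,000 stream-minutes at half that bitrate: ${half:.3f}")
```

Because cost is proportional to bitrate in this model, halving the bitrate halves the bill, which is the lever the text refers to.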
Open Source (Zero License Cost)
Jitsi, LiveKit, and mediasoup are free. Your costs are servers, bandwidth, and engineering. As covered above, the engineering overhead is substantial, but for teams with the capacity, this offers complete cost control and zero vendor lock-in. It’s also a common starting point for companies that later migrate to a commercial platform when maintenance outweighs savings.
Now that you know the models, how do you decide which one fits your situation?
How to Evaluate What’s Right for Your Scale
Not every organization needs to move off per-minute billing. Here’s a framework:
Under 50 concurrent users: Per-minute billing is likely the most efficient model. Free tiers may cover you entirely. Don’t over-optimize.
50–200 concurrent users: Start running the math. Compare your actual monthly spend against flat licensing. You’re approaching the crossover point.
200–500 concurrent users: Per-minute costs are a material budget line item. Evaluate flat licensing, on-prem, or OEM platforms. The savings typically justify switching effort.
500+ concurrent users: If you’re still on per-minute billing, you’re likely overpaying relative to flat licensing alternatives.
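The tiers above can be written as a small lookup. This is only a sketch of the framework: the function name is invented, and because the ranges share their boundary values, placing exactly 50, 200, and 500 users in the higher tier is an assumption made here.

```python
def pricing_guidance(concurrent_users: int) -> str:
    """Map peak concurrent users to the guidance tier described above."""
    if concurrent_users < 50:
        return "Per-minute billing is likely the most efficient model."
    if concurrent_users < 200:
        return "Start running the math against flat licensing."
    if concurrent_users < 500:
        return "Evaluate flat licensing, on-prem, or OEM platforms."
    return "Likely overpaying relative to flat licensing alternatives."


for users in (30, 120, 350, 800):
    print(users, "->", pricing_guidance(users))
```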
Beyond cost, three factors matter. Data sovereignty: regulated industries (banking, healthcare, government) may need on-premise deployment, which not all per-minute providers offer. Predictability: per-minute billing fluctuates with usage; flat licensing gives finance a fixed number. Scalability economics: does your video cost grow linearly with your product’s growth, or stay flat? That answer determines whether video is a variable cost that tracks usage or a fixed cost whose share of revenue shrinks as you scale.
Frequently Asked Questions
What is per-minute billing in video APIs?
Per-minute billing charges based on total participant-minutes consumed on your platform. One user on a call for one minute equals one participant-minute. Among the providers compared above, rates run about $0.99 per 1,000 minutes for audio and $3.99–$8.99 per 1,000 minutes for video, depending on resolution. Platforms like Samvyo use an alternative model, flat licensing, where usage volume doesn’t affect cost.
How much does enterprise video infrastructure cost per month?
On per-minute billing, a 3,000-person enterprise running mixed meetings pays $3,300–$6,700/month. A telehealth platform with 500 concurrent doctors pays $25,000–$56,000/month. Recording, TURN, storage, and support typically bring the all-in cost to 1.5x–2.5x those base figures. Flat-licensed platforms like Samvyo decouple cost from usage, keeping the monthly number fixed regardless of volume.
When does per-minute billing stop making sense?
The crossover point is typically 100–200 concurrent users. Below that, per-minute billing is efficient — you pay only for what you use. Above it, flat licensing models become significantly cheaper. Samvyo’s flat licensing, for example, keeps cost constant whether you run 200 or 2,000 concurrent sessions.
Is it cheaper to build your own video infrastructure?
Building from scratch costs $500K–$2M upfront with WebRTC specialists at $150K–$250K/year. Five-year TCO ranges from $1.5M–$7M. It makes sense when video is your core product. For companies where video is a feature inside a larger product, managed platforms like Samvyo offer enterprise-grade infrastructure without the build-and-maintain burden.
What is flat licensing for video infrastructure?
Flat licensing means you pay a fixed fee for unlimited video usage regardless of participant-minutes. Your cost stays constant as usage grows. Samvyo uses this model — one negotiable license price that includes unlimited calls, on-premise deployment, and white-label capability. It becomes significantly cheaper than per-minute billing at enterprise scale.
What does on-premise video deployment mean?
On-premise deployment means the video platform runs entirely on servers you own and control, so data never leaves your network. Samvyo supports full on-premise deployment, which is often required or strongly preferred in regulated sectors such as banking (for example, RBI video KYC), healthcare (HIPAA-covered data), and government, where sensitive data must stay within the organization’s infrastructure.
Can Zoom or Microsoft Teams be used as embeddable video infrastructure?
Zoom and Teams are end-user communication products, not embeddable infrastructure. Neither supports full on-premise deployment, deep white-labeling, or OEM redistribution. For companies that need to embed video into their own product under their own brand, platforms like Samvyo are purpose-built for that use case — with SDKs, white-label capability, and on-prem deployment.
What’s Next?
If you’re currently on per-minute billing and running 200+ concurrent users, it’s worth running your own numbers through the math above. Calculate your actual monthly participant-minutes, apply the rates, add recording and TURN costs, and see where you land.
If the number surprises you, Samvyo offers a free infrastructure cost comparison where we map your current usage against our flat licensing model — no commitment, just the math. Book a 30-minute walkthrough using this link and bring your usage numbers. Or if you are looking for a detailed demo, book a demo using this link.