Estimating Internet Traffic Demand with the CASUAL Model
How the CASUAL model combines users, services, and access technologies into a three-axis framework for estimating bandwidth demand — applied to ten South American countries using publicly available data.
The problem: how much bandwidth does a country need?
Internet Service Providers, governments, and network operators share a fundamental question: given the number of users, the services they consume, and the access technologies available, how much network capacity is required?
Traditional traffic models like Fractional Brownian Motion (FBM) approach this from a purely traffic-centric perspective — they analyze packet traces and extrapolate. But they ignore a critical dimension: demand. They don’t account for the fact that a user on a 2 Mbps ADSL connection generates fundamentally different traffic than one on a 100 Mbps fiber link, even when visiting the same website.
This is where the CASUAL model offers a different lens.
What is CASUAL?
CASUAL stands for Cubo de Acceso, Servicios y Usuarios de Asignación Libre (Cube of Access, Services, and Users with Free Allocation). It was originally developed as a doctoral thesis at the University of Cantabria (Spain) and presented at WSEAS in 2006. The original implementation was validated using real traffic traces captured during the 1998 FIFA World Cup in France, a massive, well-documented traffic event that provided the empirical data needed to calibrate the model’s parameters.
The core idea is that internet traffic is the result of the interaction of three independent axes:
- Users axis — who is consuming the service (residential, small business, enterprise)
- Services axis — what service is being consumed (Web, VoIP, P2P, streaming, FTP)
- Access axis — how the user connects (ADSL, fiber, cable modem, satellite)
Each combination of user type, service, and access technology produces a different traffic profile. The model treats these as a three-dimensional space where each “cube” represents a specific scenario with its own parameters.
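One way to picture the cube in code is as a mapping keyed by (user, service, access) tuples. This is only a sketch: the axis values come from the lists above, while the `CellParams` fields and the numbers are placeholders I made up for illustration, not the model's actual parameter set.

```python
from dataclasses import dataclass
from itertools import product

# Axis values taken from the lists above.
USERS = ["residential", "small_business", "enterprise"]
SERVICES = ["web", "voip", "p2p", "streaming", "ftp"]
ACCESS = ["adsl", "fiber", "cable_modem", "satellite"]


@dataclass
class CellParams:
    """Hypothetical per-scenario parameters (field names are assumptions)."""
    subscribers: int = 0
    access_speed_mbps: float = 0.0
    activity_probability: float = 0.0


# One cell per (user, service, access) combination.
cube = {cell: CellParams() for cell in product(USERS, SERVICES, ACCESS)}

# Fill in a single scenario without touching any other cell (made-up values).
cube[("residential", "web", "adsl")] = CellParams(
    subscribers=1_000_000, access_speed_mbps=2.0, activity_probability=0.01
)


def slice_by_service(cube, service):
    """Return all cells for one service, across user and access types."""
    return {key: value for key, value in cube.items() if key[1] == service}


print(len(cube), len(slice_by_service(cube, "web")))
```

Keeping the key as a tuple makes it easy to slice along any single axis while leaving the other two untouched.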
Why three axes matter
Most traffic models flatten everything into a single stream of packets. CASUAL preserves the structure of demand by keeping the axes separate. This means you can:
- Predict capacity for user growth without changing the service or access assumptions
- Estimate the impact of a new access technology (e.g., fiber rollout) without recalculating the entire model
- Analyze service-specific bandwidth — how much capacity does web browsing alone require vs. video streaming?
Each axis can use its own prediction model independently. The user axis could use a Bass diffusion model for adoption forecasting. The access axis could use Markov transition matrices to project how subscribers migrate from ADSL to fiber over time.
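As a rough illustration of the access-axis idea, the sketch below projects subscriber shares forward with a two-state Markov transition matrix. The states, the 10% yearly migration rate, and the starting mix are all invented for the example; a real projection would estimate them from subscriber statistics.

```python
def step(shares, transition):
    """Advance shares by one period: new[j] = sum_i shares[i] * P[i][j]."""
    n = len(shares)
    return [sum(shares[i] * transition[i][j] for i in range(n)) for j in range(n)]


# States, in order: ADSL, fiber. All numbers below are made up.
TRANSITION = [
    [0.90, 0.10],  # each year, 10% of ADSL subscribers move to fiber
    [0.00, 1.00],  # fiber subscribers stay on fiber
]

shares = [0.80, 0.20]  # hypothetical starting mix
history = [shares]
for _ in range(5):
    shares = step(shares, TRANSITION)
    history.append(shares)

for year, (adsl, fiber) in enumerate(history):
    print(f"year {year}: ADSL {adsl:.3f}, fiber {fiber:.3f}")
```

Because each row of the matrix sums to 1, the shares keep summing to 1 at every step; only the mix between technologies changes.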
The On-Off multilevel foundation
Under the hood, CASUAL builds on On-Off traffic models, which have been used since the 1980s to describe the bursty nature of network traffic. A single traffic source alternates between two states:
- On — actively transmitting data (downloading a page, streaming a video)
- Off — idle, waiting (user reading a page, time between requests)
The probability of being in each state is derived from the mean duration of On and Off periods. This simple binary model, when superimposed across many independent sources whose On and Off durations are heavy-tailed, produces traffic with self-similar characteristics: the same bursty patterns appear whether you observe the traffic at millisecond, second, or minute timescales.
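For a source that alternates between the two states, the long-run probability of being On is the mean On duration divided by the mean length of a full On-Off cycle. A minimal sketch, with durations chosen only to keep the arithmetic simple:

```python
def on_probability(mean_on, mean_off):
    """Long-run fraction of time an alternating On-Off source spends in On."""
    if mean_on < 0 or mean_off < 0 or mean_on + mean_off == 0:
        raise ValueError("durations must be non-negative and not both zero")
    return mean_on / (mean_on + mean_off)


# Hypothetical source: transmits for 2 s on average, then idles for 18 s.
p_on = on_probability(2.0, 18.0)
p_off = 1.0 - p_on

# With N independent, identical sources, the expected number that are
# On at any instant is N * p_on.
n_sources = 1000
expected_active = n_sources * p_on
print(p_on, p_off, expected_active)
```

Only the two means are needed here, which is what makes the model usable without full traffic traces.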
CASUAL uses three nested On-Off levels, following ITU recommendations:
- Connection level — models whether the user is connected to the internet (timescale: minutes to hours)
- Session level — models the download and viewing of individual pages (timescale: seconds to minutes)
- Burst level — models the download of individual objects within a page (timescale: milliseconds to seconds)
Each level has its own On-Off probabilities derived from measurable parameters.
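To show how the three levels might combine, the sketch below assumes that each level's On probability is conditional on the level above being On, so the three probabilities multiply, and that a user who is transmitting does so at the full access speed. Both are simplifying assumptions of mine for illustration; the function names and input values are hypothetical.

```python
def transmitting_probability(p_connection, p_session, p_burst):
    """Probability that a user is transferring data at a given instant.

    Assumes each level's On probability is conditional on the level above
    being On, so the three probabilities multiply.
    """
    for p in (p_connection, p_session, p_burst):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_connection * p_session * p_burst


def mean_demand_mbps(users, access_speed_mbps, p_connection, p_session, p_burst):
    """Mean aggregate demand for a population of identical, independent users."""
    return users * access_speed_mbps * transmitting_probability(
        p_connection, p_session, p_burst
    )


# Hypothetical values, chosen only to make the arithmetic easy to follow.
demand = mean_demand_mbps(
    users=100_000, access_speed_mbps=2.0,
    p_connection=0.1, p_session=0.2, p_burst=0.5,
)
print(f"{demand:.0f} Mbps")
```

With these inputs a user transmits about 1% of the time, so 100,000 users on 2 Mbps links produce a mean demand of roughly 2,000 Mbps rather than the 200,000 Mbps a naive "everyone at full speed" estimate would give.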
From theory to measurable parameters
One of CASUAL’s practical strengths is that its parameters can be estimated from publicly available data — no expensive traffic measurement equipment required.
Connection level
- Connection frequency — how many times a user connects per day
- Connection duration — how long the user stays online
- Access speed — the bandwidth of the user’s connection
The connection probability is calculated as the ratio of average hours a user is online per month to the total hours in a month.
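The ratio described above is a one-line calculation. In this sketch the 30-day month and the 36 hours online are assumed values for illustration:

```python
HOURS_PER_DAY = 24


def connection_probability(hours_online_per_month, days_in_month=30):
    """Fraction of the month that an average user is connected."""
    total_hours = days_in_month * HOURS_PER_DAY
    if not 0 <= hours_online_per_month <= total_hours:
        raise ValueError("hours online must be between 0 and the hours in the month")
    return hours_online_per_month / total_hours


# Hypothetical user who is online 36 hours in a 30-day month (720 hours).
p_connection = connection_probability(36)
print(p_connection)
```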
Session level
- Page transfer time — how long it takes to download a requested page
- Thinking time — the time the user spends reading before requesting the next page
These can be measured using free web analysis tools that report download time, page size, number of objects, and latency for any public website.
Burst level
- Object download time — how long each individual object (image, script, stylesheet) takes to download
- Inter-object idle time — the gap between consecutive object downloads, essentially the network latency
The session and burst probabilities follow negative binomial distributions, which capture the self-similar characteristics of internet traffic.
Applying CASUAL to South America
For my undergraduate thesis in Electronic Engineering at the University of Antioquia (2012), I adapted and implemented the CASUAL model to estimate bandwidth demand across ten South American countries: Colombia, Ecuador, Peru, Chile, Bolivia, Paraguay, Uruguay, Argentina, Brazil, and Venezuela, plus Mexico.
The data came entirely from public sources:
- User statistics: ITU, CIA World Factbook, Internet World Stats, and national telecom ministries
- Website parameters: Alexa (top sites by country, time on site, pages per visit) and Pingdom (download time, page size, number of objects, latency)
- Traffic composition: annual reports from Akamai, Cisco, and TeleGeography on the percentage of total internet traffic attributable to web browsing
The thesis contribution: from traffic traces to web downloads
This is where my thesis diverges from the original work. The original CASUAL model depended on real traffic traces — packet captures from network equipment, like those obtained during the 1998 World Cup. That kind of data requires expensive measurement infrastructure (network taps, specialized probes) and access to ISP-level equipment, making the model impractical for administrative or strategic planning contexts.
My adaptation replaced the dependency on real traffic traces with data obtained from web page downloads. Instead of capturing packets at a network node, I downloaded the most visited websites in each country using free tools like Pingdom (which reports download time, page size, number of objects, and latency) and combined that with user behavior statistics from Alexa (time on site, pages per visit). This reinterpretation of the model’s parameters preserved the mathematical framework — the On-Off probabilities, the negative binomial distributions, the multi-level aggregation — while feeding it with data that anyone can obtain without specialized equipment.
This was possible because the Poisson On-Off process assumes that session and burst durations are identically distributed, allowing us to use aggregate statistics instead of individual traffic traces. The Central Limit Theorem justifies this approach: when the contributions of millions of independent users are summed, the aggregate concentrates around the sum of their means, so mean values carry enough information to study aggregate behavior.
The result: a model that an ISP’s planning department or a government regulator could use with publicly available data, without needing a single packet capture.
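To make the reinterpretation in this section concrete, here is one plausible way to turn page-test and analytics figures into session-level and burst-level probabilities. The mapping is an assumption for illustration, not necessarily the exact one used in the thesis: thinking time is taken as time per page minus download time, and objects are assumed to download one after another with one latency-sized gap each. The field names and the sample measurement are invented.

```python
from dataclasses import dataclass


@dataclass
class SiteMeasurement:
    """Figures of the kind reported by page-test and analytics tools.

    Field names are illustrative, not any tool's actual output format.
    """
    load_time_s: float       # time to download the full page
    objects: int             # number of objects in the page
    latency_s: float         # network latency per request
    time_on_site_s: float    # average visit duration
    pages_per_visit: float   # average pages viewed per visit


def session_probability(m):
    """Share of each page view spent downloading rather than reading."""
    time_per_page = m.time_on_site_s / m.pages_per_visit
    thinking_time = max(time_per_page - m.load_time_s, 0.0)
    return m.load_time_s / (m.load_time_s + thinking_time)


def burst_probability(m):
    """Share of the page download spent transferring objects.

    Assumes objects are fetched sequentially, with one latency-sized
    idle gap per object.
    """
    object_time = max(m.load_time_s / m.objects - m.latency_s, 0.0)
    return object_time / (object_time + m.latency_s)


# Made-up measurement for a single site.
site = SiteMeasurement(
    load_time_s=4.0, objects=40, latency_s=0.05,
    time_on_site_s=300.0, pages_per_visit=5.0,
)
print(session_probability(site), burst_probability(site))
```

For this sample site, a visitor spends 60 seconds per page, of which 4 seconds are download, so the session-level On probability is about 0.067; half of each object's 0.1-second slot is idle latency, so the burst-level probability is 0.5.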
Results: capacity by country
The model estimated web service capacity requirements for each country across multiple years (2007-2011). For example, in 2011:
| Country | Web capacity (Gbps) | Total estimated capacity (Gbps) |
|---|---|---|
| Brazil | 264.53 | 1,556 |
| Mexico | 126.72 | 745 |
| Argentina | 109.10 | 642 |
| Colombia | 99.55 | 586 |
| Chile | 290.06 | 1,706 |
Web traffic represented between 17% and 46% of total internet traffic depending on the year, with the percentage decreasing as video streaming and P2P grew. Using these percentages, the model extrapolated from web-only capacity to total bandwidth demand.
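The extrapolation itself is a division: total capacity is web capacity divided by the web share of traffic. The sketch below reproduces the totals in the table, assuming a 17% web share for 2011. That figure is the low end of the range quoted above and is what the ratios in the table imply; the function and constant names are mine.

```python
# Web capacity for 2011, in Gbps, copied from the table above.
WEB_CAPACITY_GBPS = {
    "Brazil": 264.53,
    "Mexico": 126.72,
    "Argentina": 109.10,
    "Colombia": 99.55,
    "Chile": 290.06,
}

# Assumed web share for 2011: the low end of the 17%-46% range,
# which is what the ratios in the table imply.
WEB_SHARE_2011 = 0.17


def total_capacity(web_capacity_gbps, web_share):
    """Scale web-only capacity up to total capacity."""
    if not 0.0 < web_share <= 1.0:
        raise ValueError("web_share must be in (0, 1]")
    return web_capacity_gbps / web_share


for country, web in WEB_CAPACITY_GBPS.items():
    print(f"{country}: {total_capacity(web, WEB_SHARE_2011):,.0f} Gbps")
```

Because the web share sits in the denominator, the total is sensitive to it: the same web capacity at a 46% share would give a total less than half as large.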
Validation against FBM
To contextualize the results, I compared CASUAL estimates with those from a Fractional Brownian Motion model fed with actual packet traces captured via Wireshark. The two models are not directly comparable — CASUAL is a demand model while FBM is a traffic model — but they share a structural similarity: both rely on On-Off state analysis and calculate capacity from mean values.
The comparison showed that both models identified consistent growth trends across countries, with CASUAL generally producing more conservative estimates because it accounts for access speed limitations that constrain actual throughput.
Lessons for modern capacity planning
While this research used 2007-2011 data, the framework remains relevant:
1. Demand ≠ traffic. A traffic model tells you what flows through the network. A demand model tells you what would flow if capacity were sufficient. For planning purposes, demand is what matters.
2. Publicly available data is surprisingly useful. You don’t always need expensive measurement equipment. Web analytics, CDN reports, and telecom statistics can feed a model that produces actionable estimates.
3. Separation of concerns works. By keeping users, services, and access as independent axes, you can update one dimension without rebuilding the entire model. This is the same principle that makes good software architecture maintainable.
4. Self-similarity is real and matters. Internet traffic exhibits the same bursty patterns at every timescale. Models that capture this property (through heavy-tailed distributions and On-Off superposition) produce more realistic estimates than those assuming smooth, Poisson-distributed arrivals.
5. The model scales with the question. Need to estimate capacity for a single website? Use one source. Need country-level bandwidth? Aggregate N sources. Need to project 5 years out? Plug a growth model into the user axis.
From teletraffic to platform engineering
Looking back at this work from a platform engineering perspective, I see the same patterns that apply to modern capacity planning:
- Know your axes. In cloud infrastructure, the equivalent axes might be: users (tenants), services (APIs, databases, queues), and access (regions, connection types). Understanding demand along each axis independently is more useful than a single aggregate metric.
- Model before you measure. Just as CASUAL designs the capacity estimation before deploying measurement equipment, modern SRE practice designs observability before deploying features, the same principle behind Risk-Driven Development.
- Use the data you have. Not every organization has APM tools or distributed tracing. But every organization has access logs, cloud billing data, and user counts. A simple model with available data beats a sophisticated model with no data.
The CASUAL model taught me that capacity planning is fundamentally about understanding the intersection of who uses your system, what they do with it, and how they connect to it. That lesson translates directly to every platform I’ve worked on since.
📄 Download the original thesis (PDF, Spanish) — “Implementación del Modelo CASUAL para la Estimación de la Demanda de Tráfico de Internet en Algunos Países de la Región Suramericana”, Universidad de Antioquia, 2012.