How I Would Serve Exam Results to 14 Lakh Students at Once Without Breaking a Sweat

Imagine fourteen lakh students, all opening the same URL at the same second. They have been waiting months for this result. The anxiety is real, the traffic spike is guaranteed, and the margin for error is zero. I designed this system knowing that the worst possible outcome is a student seeing someone else's marksheet, and the second worst is the system going down entirely. Every architectural choice I made flows from those two constraints. Scale is a solved problem if you pick the right abstraction. The real question is what that abstraction should be, and in this case, the answer is deceptively straightforward: treat every exam result as a static file that already exists before the first student ever makes a request. This shifts the entire conversation from "how do I scale my database" to "how do I make sure my CDN is healthy."

The standard approach to building a result portal looks like this: a student submits a roll number, the request hits a load balancer, the load balancer routes it to an application server, the application server queries a database, the database returns a row, and the server formats and returns a response. That pipeline works fine at ten thousand requests per second. At five lakh requests per second, the database becomes the wall. No amount of connection pooling or read replicas changes the fundamental physics of a relational database under that kind of concurrent read pressure. I have seen systems built exactly this way collapse within ninety seconds of results going live, with engineers scrambling to add read replicas while students are getting timeout errors. The architecture I am describing here makes that entire failure mode irrelevant by removing the database from the path that students ever touch.

The core idea is pre-generation. A pre-generation pipeline is a process that runs before results go public and converts every student record stored in a database into a static JSON file and a PDF stored in object storage. In this case, that object storage is Amazon S3, which is a managed file storage service that can serve files at effectively unlimited scale. For fourteen lakh students, the JSON files total about 2.8 gigabytes and the PDFs total about 168 gigabytes, which works out to roughly 2 kilobytes per JSON file and 120 kilobytes per PDF. Those numbers sound large, but S3 handles them without any configuration changes. Once these files exist in S3, the database's work is complete: it has contributed its data to the files. What remains is purely a file-serving problem, and file-serving is something content delivery networks were purpose-built to handle.
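
To make the pipeline concrete, here is a minimal sketch of the per-record publishing step in Python with boto3. The bucket name, key layout, and Cache-Control value are assumptions for illustration; a real run would batch and parallelise this across workers.

```python
# Minimal sketch of the pre-generation step: one student record becomes
# one static JSON object in S3. Bucket and key layout are assumed.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "exam-results-prod"  # assumed bucket name

def publish_result(record: dict) -> None:
    """Serialise one student record to a static JSON object in S3."""
    key = f"results/{record['exam_id']}/{record['roll_number']}.json"
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(record),
        ContentType="application/json",
        # 60-second TTL per the caching strategy; the SWR window is assumed.
        CacheControl="max-age=60, stale-while-revalidate=300",
    )
```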

A content delivery network, or CDN, is a globally distributed network of servers called edge nodes. Each edge node is a server located physically close to groups of users. When a student in Pune requests their result, the CDN serves that file from an edge node in Mumbai instead of routing the request all the way to a data center. This reduces latency, which is the time a packet takes to travel from the user to the server and back. It also means that the origin server, which is the server where the file actually lives, receives only a fraction of the total requests. The CDN caches the file after the first fetch and serves it from memory for every subsequent request. At the scale I am designing for, the CDN absorbs ninety-five to ninety-nine percent of all traffic. The backend infrastructure barely registers the load.

I use three CDN providers in this design: Akamai as the primary, Cloudflare as the secondary, and Fastly as the backup. Running multiple CDN providers in parallel is called a multi-CDN strategy. DNS-based traffic steering, handled through AWS Route 53 or NS1, directs requests to the appropriate CDN based on health checks. If Akamai's health checks fail, Route 53 automatically redirects traffic to Cloudflare within seconds. If Cloudflare also fails, traffic moves to Fastly. Each CDN independently serves from S3 as its origin. This removes the single point of failure that a single-CDN architecture introduces. A single CDN provider going down on result day is a real risk. It has happened to production systems at scale. Running three providers means I need all three to fail simultaneously before students notice anything.
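As a sketch of how the DNS failover leg of this could look with Route 53's failover routing policy: the hostnames, hosted zone, and health-check ID below are placeholders, and real multi-CDN steering typically layers weighted or latency-based records on top of a simple primary/secondary pair.

```python
# Sketch of a Route 53 failover pair: Akamai as PRIMARY with a health
# check attached, Cloudflare as SECONDARY. Hostnames are placeholders.
import boto3

route53 = boto3.client("route53")

def create_failover_pair(zone_id: str, primary_hc: str) -> None:
    changes = []
    for role, cdn_target in [("PRIMARY", "results.akamaized.net"),
                             ("SECONDARY", "results.cdn.cloudflare.net")]:
        record = {
            "Name": "results.example.org",
            "Type": "CNAME",
            "SetIdentifier": f"cdn-{role.lower()}",
            "Failover": role,
            "TTL": 30,
            "ResourceRecords": [{"Value": cdn_target}],
        }
        if role == "PRIMARY":
            record["HealthCheckId"] = primary_hc  # Route 53 probes the primary CDN
        changes.append({"Action": "UPSERT", "ResourceRecordSet": record})
    route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={"Changes": changes},
    )
```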

The pre-generation pipeline follows a specific timeline relative to result publication. At twelve hours before results go live, the source database, which is an Aurora PostgreSQL instance, enters a data freeze. No further writes are permitted. This guarantees that the data I generate files from is identical to the data students will expect. At eight hours before, I load every result record into DynamoDB. DynamoDB is a fully managed NoSQL database that scales horizontally without any configuration. It serves as the fast-query fallback layer in case a student hits the API directly rather than the CDN. At six hours before, I preload Redis, which is an in-memory data store, with the complete dataset of 2.8 gigabytes. Redis is fast enough to return a result in under one millisecond. At four hours before, the PDF generation job runs, converting every result record into a formatted PDF and uploading it to S3. At two hours before, I optionally warm the CDN cache by prefetching files from S3 to edge nodes, so the first student request hits cache rather than origin. At T=0, I flip a feature flag that makes results visible and purge the CDN cache to force a fresh fetch of the live flag state.
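For instance, the T-8 load into DynamoDB could look like the following boto3 sketch, assuming a hypothetical exam_results table keyed on roll_number; batch_writer transparently handles the 25-item batching and retries that the underlying BatchWriteItem API requires.

```python
# Sketch of the T-8h DynamoDB load. Table name and key schema are assumed.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("exam_results")  # assumed table name

def load_results(records: list[dict]) -> None:
    # batch_writer flushes in 25-item batches and retries unprocessed items.
    with table.batch_writer() as batch:
        for record in records:
            batch.put_item(Item=record)
```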

The file structure on S3 follows a predictable pattern, along the lines of /results/&lt;exam_id&gt;/&lt;roll_number&gt;.json for the result data and /results/&lt;exam_id&gt;/&lt;roll_number&gt;.pdf for the marksheet. When a student enters their roll number and date of birth on the frontend, the JavaScript running in their browser computes the exact S3 path and requests it through the CDN. The CDN checks its cache. On a hit, it returns the file instantly. On a miss, it fetches from S3, stores the response in cache, and returns it. The date of birth validation happens on the client side for the CDN path and on the server side only if the student ends up hitting the fallback API. This is an important security consideration: PDFs are served through pre-signed URLs, meaning each PDF URL is cryptographically signed with an expiry time and is tied to specific access parameters. A student who guesses another student's roll number can fetch the JSON file, but the PDF marksheet requires a valid signature, and a signature is issued only after the date of birth has been verified.
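
Signing requires server-side credentials, so in practice a lightweight endpoint would issue the URL once the date-of-birth check passes. A minimal sketch using boto3's standard generate_presigned_url, with the bucket and key layout assumed from the pattern above:

```python
# Issue a time-limited PDF link after the date-of-birth check passes.
import boto3

s3 = boto3.client("s3")

def signed_pdf_url(exam_id: str, roll_number: str) -> str:
    """Return a pre-signed S3 URL valid for 15 minutes."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "exam-results-prod",  # assumed bucket name
                "Key": f"results/{exam_id}/{roll_number}.pdf"},
        ExpiresIn=900,  # 15 minutes, matching the security section's window
    )
```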

Caching strategy varies by resource type. The root HTML page at "/" gets a 300-second TTL, which means edge nodes serve it from cache for five minutes before checking for updates. Static assets like JavaScript and CSS bundles get a one-year TTL because I use content-addressed filenames that change when the file changes. Result JSON files get a 60-second TTL with stale-while-revalidate behavior. Stale-while-revalidate means the CDN serves the cached version immediately to the user while fetching a fresh copy from origin in the background. This keeps latency low even when the cache is technically expired. PDF files get a 24-hour TTL because they are static and will never change once generated. Together, these TTLs mean the CDN serves almost every request from memory while still refreshing fast enough to reflect any emergency corrections.
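Expressed as Cache-Control headers, which is how these TTLs would actually be communicated to the edge, the policy looks roughly like this; the stale-while-revalidate window is an assumed value, and each CDN provider has its own override knobs on top.

```python
# The caching strategy above as path-pattern -> Cache-Control policy.
CACHE_POLICY = {
    "/":               "max-age=300",                               # root HTML, 5 min
    "/static/*":       "max-age=31536000, immutable",               # hashed assets, 1 year
    "/results/*.json": "max-age=60, stale-while-revalidate=300",    # SWR window assumed
    "/results/*.pdf":  "max-age=86400",                             # marksheets, 24 h
}
```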

The API layer exists as a fallback and for internal operations, but it does not sit in the hot path. I run it on ECS Fargate, which is a serverless container execution environment on AWS. Fargate scales horizontally based on load without me provisioning individual servers. The API accepts requests only from the CDN's IP ranges, meaning students cannot bypass the CDN and hit the API directly. When a request does reach the API layer because of a CDN miss or a validation step, it first checks Redis. If Redis has the result, it returns it immediately. If Redis misses, it queries DynamoDB. If DynamoDB misses, it falls back to Aurora PostgreSQL. This three-tier fallback ensures no student request returns an error as long as at least one storage layer is functional. In practice, the Redis layer serves everything. Its entire dataset fits in memory and has been preloaded before result time.
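The fallback chain itself is a few lines of logic. A sketch, assuming a redis-py client, a DynamoDB table handle, and a PostgreSQL connection, with key names and schema invented for illustration:

```python
# Three-tier fallback read path: Redis -> DynamoDB -> Aurora PostgreSQL.
import json

def get_result(roll_number: str, redis_client, ddb_table, pg_conn):
    # Tier 1: Redis, preloaded with the full dataset (sub-millisecond).
    cached = redis_client.get(f"result:{roll_number}")
    if cached:
        return json.loads(cached)
    # Tier 2: DynamoDB in on-demand mode (single-digit milliseconds).
    item = ddb_table.get_item(Key={"roll_number": roll_number}).get("Item")
    if item:
        return item
    # Tier 3: Aurora PostgreSQL, the source of truth (slow but functional).
    with pg_conn.cursor() as cur:
        cur.execute("SELECT payload FROM results WHERE roll_number = %s",
                    (roll_number,))
        row = cur.fetchone()
    return row[0] if row else None
```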

The Redis cluster runs on ElastiCache Serverless, which is AWS's managed Redis offering that scales capacity automatically. The full 2.8-gigabyte dataset of result records fits comfortably in memory. In-memory storage means retrieval time stays consistently below one millisecond at the ninety-ninth percentile. I preload Redis at T-6 hours by running a batch job that reads every record from DynamoDB and writes it into Redis in pipeline mode. Pipeline mode batches multiple Redis write commands into a single network round trip, making the preload fast enough to complete in under thirty minutes even for fourteen lakh records. DynamoDB runs in on-demand capacity mode, meaning AWS automatically allocates read and write units as needed without me setting a fixed capacity ceiling. This ensures DynamoDB never throttles requests even if Redis has a cold start and pushes unexpected load down the chain.
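A minimal version of that preload job might look like this, assuming redis-py against the ElastiCache endpoint and a paginated DynamoDB scan; the 1,000-command flush size is an illustrative batch size, not a tuned value.

```python
# Sketch of the T-6h Redis preload from a DynamoDB scan, in pipeline mode.
import json
import boto3
import redis

# ElastiCache Serverless requires TLS in transit; host is a placeholder.
r = redis.Redis(host="results-cache.example.cache.amazonaws.com",
                port=6379, ssl=True)
table = boto3.resource("dynamodb").Table("exam_results")  # assumed name

def preload() -> None:
    pipe = r.pipeline(transaction=False)
    pending, scan_kwargs = 0, {}
    while True:
        page = table.scan(**scan_kwargs)
        for item in page["Items"]:
            # default=str handles DynamoDB's Decimal values.
            pipe.set(f"result:{item['roll_number']}",
                     json.dumps(item, default=str))
            pending += 1
            if pending >= 1000:  # one network round trip per 1,000 writes
                pipe.execute()
                pending = 0
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    if pending:
        pipe.execute()
```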

Security in this system operates at multiple layers simultaneously. TLS 1.3, which is the latest version of the transport layer security protocol that encrypts data in transit, is enforced across every endpoint. Backend services run inside private subnets within a Virtual Private Cloud. A Virtual Private Cloud, or VPC, is an isolated network segment on AWS where resources are invisible to the public internet unless explicitly exposed. The Application Load Balancer, which sits between the CDN and the API layer, uses security group rules that accept connections only from known CDN IP ranges. IAM policies follow least-privilege rules, meaning each service has permission to do exactly what it needs and nothing more. Secrets like database credentials and signing keys live in AWS Secrets Manager rather than environment variables. No personally identifiable information appears in log outputs. PDF access requires a signed URL with an expiry window of fifteen minutes, making it time-limited and tied to a specific request context.
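Fetching credentials from Secrets Manager at startup is a one-call affair; the secret name and JSON shape here are assumptions.

```python
# Pull the PDF signing secret at startup instead of from an env variable.
import json
import boto3

secrets = boto3.client("secretsmanager")

def load_signing_key() -> str:
    resp = secrets.get_secret_value(SecretId="results/pdf-signing-key")
    return json.loads(resp["SecretString"])["key"]  # assumed secret shape
```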

The waiting room is the last line of defense before backend overload, and in this architecture it only activates if the CDN fails. I implement it using Cloudflare Waiting Room, which holds users in a virtual queue and admits them at a configurable rate. I set the initial active user limit to two to three lakh and tune the admission rate dynamically based on observed backend load. Cloudflare Waiting Room works at the edge, meaning it intercepts requests before they reach any origin server and holds overflow users on Cloudflare's infrastructure. The waiting room protects the API layer in the rare scenario where CDN traffic suddenly redirects to origin at scale. Without it, a CDN failure would translate directly into five lakh simultaneous requests hitting Fargate containers, which would cascade into Redis and then DynamoDB. The waiting room acts as a pressure valve that prevents that cascade.
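Configuring this through Cloudflare's v4 API could look roughly like the sketch below; total_active_users and new_users_per_minute are the two core Waiting Room settings, but the zone ID, hostname, and admission rate are placeholders, so verify field names against the current API reference.

```python
# Hedged sketch of creating a Cloudflare Waiting Room via the v4 API.
import requests

def create_waiting_room(zone_id: str, api_token: str) -> None:
    requests.post(
        f"https://api.cloudflare.com/client/v4/zones/{zone_id}/waiting_rooms",
        headers={"Authorization": f"Bearer {api_token}"},
        json={
            "name": "results-day",
            "host": "results.example.org",      # placeholder hostname
            "path": "/",
            "total_active_users": 250000,       # the 2-3 lakh initial limit
            "new_users_per_minute": 60000,      # assumed rate, tuned live
        },
        timeout=10,
    ).raise_for_status()
```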

The multi-region strategy uses AWS ap-south-1 in Mumbai as the primary region and ap-south-2 in Hyderabad as the disaster recovery region. All data is replicated to the secondary region continuously. Aurora PostgreSQL uses Global Database, which replicates across regions with a replication lag under one second. DynamoDB uses Global Tables, which provide active-active replication meaning both regions can serve reads and writes simultaneously. Route 53 runs health checks against both regions every ten seconds. If the primary region fails, Route 53 detects the failure within thirty seconds and updates DNS records to point to the secondary region. CDN origins also have regional fallback configured, so the CDN switches origin endpoints without requiring DNS propagation to complete. My target recovery time objective is sixty seconds, meaning students experience at most one minute of degraded service during a full regional failure.
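Adding the Hyderabad replica to the DynamoDB table uses the update_table call from the current Global Tables version; the table name is an assumption.

```python
# Add an ap-south-2 replica via Global Tables (version 2019.11.21).
import boto3

dynamodb = boto3.client("dynamodb", region_name="ap-south-1")

dynamodb.update_table(
    TableName="exam_results",  # assumed table name
    ReplicaUpdates=[{"Create": {"RegionName": "ap-south-2"}}],
)
```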

Observability in this system centers on three things: CDN cache hit ratio, latency percentiles, and error rates. Cache hit ratio tells me whether the CDN is doing its job. If that number drops below ninety percent, it means a large portion of requests are hitting origin, which puts pressure on the API layer. I track latency at the fiftieth, ninety-fifth, and ninety-ninth percentiles. The ninety-fifth percentile target is 200 milliseconds and the ninety-ninth percentile target is 500 milliseconds. Anything above those thresholds triggers an alert. Error rates are segmented by status code: 4xx errors indicating client-side issues like invalid roll numbers and 5xx errors indicating server-side failures. I use Datadog for application-level metrics and AWS CloudWatch for infrastructure metrics. CDN logs from both Akamai and Cloudflare stream into a central log aggregation pipeline so I can correlate events across providers in real time.
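As one example of wiring the latency target into an alert, here is a CloudWatch alarm on the ALB's TargetResponseTime metric at p99; the load balancer dimension and SNS topic ARN are placeholders.

```python
# Alarm when p99 latency exceeds the 500 ms target for three minutes.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="results-api-p99-latency",
    Namespace="AWS/ApplicationELB",
    MetricName="TargetResponseTime",
    Dimensions=[{"Name": "LoadBalancer",
                 "Value": "app/results-alb/abc123"}],  # placeholder dimension
    ExtendedStatistic="p99",
    Period=60,
    EvaluationPeriods=3,
    Threshold=0.5,  # seconds, i.e. the 500 ms p99 target
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:ap-south-1:123456789012:oncall"],  # placeholder
)
```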

Failure scenarios map cleanly to the layered architecture. If Akamai goes down, Route 53 moves traffic to Cloudflare within seconds and students see a brief delay at most. If Redis fails entirely, every request falls through to DynamoDB, which runs in on-demand mode and absorbs the load. The latency increases from sub-millisecond to single-digit milliseconds but the system keeps responding. If DynamoDB has a regional failure, Aurora PostgreSQL takes over as the source of truth. Aurora is not in the hot path by design, so this path is slow but functional. If the entire Mumbai region fails, Route 53 and CDN origin failover redirect traffic to Hyderabad within sixty seconds. The system is designed so that each failure degrades performance gradually rather than causing a hard outage. Students may notice slower responses in a cascading failure scenario, but they will still get their results.

The thing I find most satisfying about this architecture is what result day looks like for the engineering team. The CDN cache hit ratio sits at ninety-seven percent. The API layer processes a few hundred requests per second instead of five lakh. Redis memory usage holds steady at the preloaded 2.8 gigabytes. Aurora's connection count barely moves because it serves no student traffic. The Fargate containers handling the fallback API run at fifteen percent CPU utilization. Every metric is green. Fourteen lakh students get their results in under 200 milliseconds. The system is, in every meaningful sense, bored. That is the design goal. The system should handle the biggest traffic spike of the year the same way it handles a Tuesday afternoon. When you build around static assets and CDN-first delivery, peak load becomes a configuration concern rather than an engineering emergency.