
A modular, auditable framework for compiling, randomising, and distributing large-scale examination question papers — built for transparency, security, and adaptability.
No organisation, board, or institution is referenced. This is a neutral conceptual model — free to fork, adapt, and deploy.
At scale, a question paper passes through many hands before it reaches a candidate. Each handoff — printing, transport, distribution — is a potential leak point. Traditional paper-based systems rely on physical secrecy: sealed envelopes, last-minute printing, and courier logistics. These measures are costly, fragile, and hard to audit.
A digital approach introduces new challenges: how do you prevent a question set from being screen-captured, forwarded, or reconstructed from partial information? How do you ensure that even the systems generating the paper do not expose it too early?
The model described here addresses these problems through three principles: late generation, structural randomisation, and modular compilation.
The system separates question compilation (done well in advance) from question paper generation (done one hour before the exam). A large, curated question bank exists — but no actual question paper exists until the latest responsible moment.
A randomisation engine draws from this bank to produce multiple non-overlapping question sets at once. Each set is hashed and registered before distribution. Centres receive their assigned set only at the point of delivery, via a CDN-based distribution layer.
The question bank itself is a living, auditable artefact — built by subject-matter contributors, deduplicated by an AI filter, and version-controlled.
Build a question bank well in advance. Deduplicate with AI. Lock the pool.
Run the randomiser exactly one hour before exam start. No paper exists before that window.
Push hashed, sealed QP sets via CDN to all exam centres simultaneously.
The diagram below maps the full lifecycle — from question compilation through AI filtering, to late-stage randomised generation and CDN-based centre distribution.
All counts and timings shown are illustrative defaults — every parameter is modular.
Each stage in the pipeline addresses a specific problem. The architecture is intentionally separable — phases can be implemented independently or swapped out.
How do you build a large, reliable question pool without duplication?
Subject-matter contributors (parameterised as N setters per domain) each contribute a fixed quota per cycle. In the reference model: 25 contributors from a technical/science domain each submit 10 questions per cycle (e.g. 5 Physics + 5 Chemistry), yielding 250 questions. Another 25 contributors from a life-science domain each submit 10 per cycle (e.g. 5 Botany + 5 Zoology), yielding a further 250. Total raw pool: 500 questions.
All parameters — number of contributors, questions per contributor, subject groupings — are modular variables. Adjust them freely for your context.
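As a minimal sketch, the reference parameters above could be expressed as configuration. Everything here is illustrative: the field names are hypothetical, and the defaults simply reproduce the 25 × 10 × 2 = 500 arithmetic of the reference model.

```python
from dataclasses import dataclass

@dataclass
class BankConfig:
    """Illustrative defaults from the reference model; every field is tunable."""
    setters_per_domain: int = 25        # N contributors in each domain
    questions_per_setter: int = 10      # fixed quota per cycle
    domains: tuple = ("technical/science", "life-science")

    @property
    def raw_pool_size(self) -> int:
        # 25 setters x 10 questions x 2 domains = 500 raw questions
        return self.setters_per_domain * self.questions_per_setter * len(self.domains)

print(BankConfig().raw_pool_size)  # 500
```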
With 500 questions from many independent contributors, overlaps are inevitable. How do you clean the pool?
Before any paper is generated, the full question pool is passed through an AI filter that detects semantic duplicates — questions that test the same concept with different wording. When a match is found, the questions are merged or one is removed, and the total count is decremented accordingly. This phase runs during the setup window, not at exam time.
The ceiling target (e.g. 500 questions) is a quality threshold. If the filtered pool falls below it, contributors add replacements until the threshold is met. Once met, the pool is locked.
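One plausible realisation of the AI filter is embedding-based similarity. The sketch below assumes an `embed` callable (any sentence-embedding model would do) and an illustrative similarity threshold; both are assumptions layered on top of the model, which only specifies "detect semantic duplicates, then merge or remove".

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.92  # assumption: tune per question style and model

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dedupe(questions: list[str], embed) -> list[str]:
    """Greedy semantic dedup: accept a question only if it is not too
    similar to any question already accepted into the pool."""
    kept: list[str] = []
    kept_vecs: list[np.ndarray] = []
    for q in questions:
        v = embed(q)
        if all(cosine(v, kv) < SIMILARITY_THRESHOLD for kv in kept_vecs):
            kept.append(q)
            kept_vecs.append(v)
    return kept

# kept = dedupe(raw_questions, model.encode)  # any embedding model's encode fn
```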
How do you store and version-control a curated question pool?
The locked, deduplicated pool is stored as a structured Book — a collection of question modules, each with metadata (subject, difficulty, topic tags, contributor ID). Each question is an individual, addressable unit. The Book can be version-controlled, audited, and extended for future exam cycles. In a CBT context, each question module renders independently in the exam interface.
The Book format is storage-agnostic — it can be JSON, a database schema, or a CMS. The key property is that each question is individually addressable and carries its own metadata.
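As one concrete, entirely hypothetical shape for a question module, here is a Python dataclass that serialises to JSON. The field names are inventions for illustration; the only property that matters is the one stated above: each question is a self-describing, addressable unit.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class QuestionModule:
    # Hypothetical field names; the model only requires addressability + metadata.
    question_id: str
    subject: str
    difficulty: str
    topic_tags: list[str]
    contributor_id: str
    body: str

q = QuestionModule("PHY-0042", "Physics", "medium",
                   ["kinematics"], "setter-07", "A body moves along ...")
print(json.dumps(asdict(q), indent=2))  # one addressable unit of the Book
```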
How do you ensure no complete question paper exists before the exam window?
Exactly one hour before the exam start time, the generation engine runs. A seeded RNG draws 180 questions from the locked pool (or your configured count) and assembles a Question Paper. This output is hashed — the hash acts as a fingerprint. A Scrambler then checks whether this hash collides with any previously generated set. If it does, a new seed is generated and the process repeats until a unique set is confirmed. The result: QP-Set1, QP-Set2, QP-Set3, QP-Set4 — each unique, each hashed.
The one-hour window is a deliberate constraint. Short enough to drastically reduce leak risk, long enough to handle distribution and fallback. This is a configurable parameter.
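A sketch of the generation loop as described: seeded draw, hash fingerprint, Scrambler-style collision check, retry with a fresh seed. SHA-256 and the retry strategy are illustrative choices, and a stricter Scrambler could additionally enforce disjoint question membership between sets.

```python
import hashlib
import random
import secrets

PAPER_SIZE = 180  # configurable draw count

def generate_unique_set(pool: list[str], registry: set[str]) -> tuple[list[str], str]:
    """Draw a paper from the locked pool; regenerate with a fresh seed
    until the fingerprint does not collide with any registered set."""
    while True:
        seed = secrets.token_hex(16)           # fresh seed per attempt
        paper = random.Random(seed).sample(pool, PAPER_SIZE)
        digest = hashlib.sha256("\n".join(paper).encode()).hexdigest()
        if digest not in registry:             # Scrambler uniqueness check
            registry.add(digest)
            return paper, digest

# Run at T-60: four unique, hashed sets from one locked pool.
registry: set[str] = set()
# qp_sets = [generate_unique_set(locked_pool, registry) for _ in range(4)]
```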
How do you deliver different sets to different centres simultaneously, at scale?
A main coordination node holds the generated sets. Sets are pushed to a CDN layer, which routes each sealed package to its assigned exam centre. Each centre receives exactly one QP set — the assignment is determined by the coordination node and logged immutably. The CDN handles the scale; the coordination node handles the auditability.
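One way the coordination node might compute the centre-to-set map, sketched below. The shuffle-then-round-robin policy is an assumption; the model only requires that every centre receives exactly one set and that the map is logged before the CDN push.

```python
import secrets

def assign_sets(set_ids: list[str], centres: list[str]) -> dict[str, str]:
    """Deal QP sets to centres: shuffle centres with a system RNG so the
    mapping is unpredictable, then assign sets round-robin."""
    shuffled = list(centres)
    secrets.SystemRandom().shuffle(shuffled)
    return {centre: set_ids[i % len(set_ids)]
            for i, centre in enumerate(shuffled)}

# assignment = assign_sets(["QP-Set1", "QP-Set2", "QP-Set3", "QP-Set4"], centres)
# The coordination node logs this map immutably, then hands off to the CDN.
```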
In a hybrid model, encrypted local copies are pre-seeded at centres ahead of time. The QP selection is generated centrally at T−60, and the decryption key is released via the CDN at T−15, exactly when the centre disables its network. This eliminates last-mile connectivity as a failure point during the session.
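A sketch of the seal-then-release timing, using Fernet from the `cryptography` package as a stand-in for whatever authenticated encryption a deployment prefers; the payload variable is a placeholder.

```python
from cryptography.fernet import Fernet  # one possible authenticated-encryption primitive

qp_package_bytes = b"...serialised, hashed QP set..."  # placeholder payload

# Days ahead: seal the package and pre-seed it at the centre.
key = Fernet.generate_key()                 # held only at the coordination node
sealed = Fernet(key).encrypt(qp_package_bytes)

# At T-15: the CDN releases `key` as the hall goes offline;
# only now can the centre recover the paper.
assert Fernet(key).decrypt(sealed) == qp_package_bytes
```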
Hybrid here describes how the exam is delivered, not the format of the questions. The question paper is rendered on a computer; the candidate answers on a paper OMR sheet. The terminal is a presentation surface — it does not capture, store, or transmit responses.
This split is deliberate. It lowers the interaction burden on candidates who may have little or no prior computer experience, keeps the answer record physical and verifiable, and removes an entire class of digital-input attack surface from the exam itself.
From T−15, the local terminal runs offline. Internet access in the hall is disabled before the QP package is decrypted, so no online aid — search, AI assistance, remote help — is available during the session. Rough sheets are provided at every desk for working. The only inputs the candidate needs are the left and right arrow keys to move between questions. A mouse may be available for quick navigation, but it is never required.
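To make the minimal-input claim concrete, here is a hypothetical terminal viewer using Python's curses: the left and right arrows are the only keys that do anything. It sketches the interaction model, not a hardened exam client.

```python
import curses

def viewer(stdscr, questions: list[str]) -> None:
    """One question at a time; LEFT/RIGHT arrows only. Every other key
    is ignored, and no answer is ever captured on this surface."""
    curses.curs_set(0)
    idx = 0
    while True:
        stdscr.erase()
        stdscr.addstr(0, 0, f"Question {idx + 1} / {len(questions)}")
        stdscr.addstr(2, 0, questions[idx])
        key = stdscr.getch()
        if key == curses.KEY_RIGHT and idx < len(questions) - 1:
            idx += 1
        elif key == curses.KEY_LEFT and idx > 0:
            idx -= 1
        # Deliberately no candidate-facing exit: the session ends only
        # when the invigilator closes the terminal.

# curses.wrapper(viewer, decrypted_questions)  # run on the offline terminal
```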
A locked-down screen displays one question at a time, drawn from the locally cached, sealed QP set. No answers are captured here. The screen is a viewing surface only.
All answers are marked on an OMR sheet at the desk. Working is done on the rough sheets provided. The answer record stays physical end-to-end, exactly as in a paper-based exam.
At T−15, the hall is taken offline and the package is decrypted locally. The terminal needs nothing online for the rest of the session. Connectivity returns only after collection.
Computers display questions; they never capture answers. This removes input-based leak vectors and keeps the interface minimal.
Left and right arrow keys are the only required inputs. A mouse is optional, never essential. A full keyboard is never needed.
From T−15, the hall is offline. Once the package is decrypted, the exam runs without any network — and none is allowed in the room.
The local QP package is encrypted at rest. The decryption key arrives via CDN at T−15 — the same moment the hall goes offline — and is never stored at the centre in advance.
A candidate who has never used a computer can complete the exam. Familiarity with OMR carries over from existing paper-based exams.
Every event — key delivery, decryption, session start, OMR collection — is logged with a verifiable timestamp.
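Hash chaining is one way to make that trail tamper-evident: each entry embeds the hash of its predecessor, so any retroactive edit breaks the chain. The sketch below uses a local clock as a placeholder; a real deployment would layer trusted timestamping on top.

```python
import hashlib, json, time

def append_event(log: list[dict], event: str, detail: str) -> dict:
    """Append a tamper-evident entry: editing any past record
    invalidates every hash that follows it."""
    prev_hash = log[-1]["hash"] if log else "genesis"
    entry = {"event": event, "detail": detail,
             "ts": time.time(), "prev": prev_hash}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()
    log.append(entry)
    return entry

log: list[dict] = []
for event in ("key_delivery", "decryption", "session_start", "omr_collection"):
    append_event(log, event, "centre-042")
```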
The model does not claim to be leak-proof. It claims to be structurally harder to leak than traditional approaches, and auditable when a leak does occur.
Because QP generation happens at T−60, there is no complete question paper to leak before that window. Contributors never see a composed paper — only their own questions.
Every generated set has a unique cryptographic hash. If a paper leaks, the hash identifies which set and — by inference — which centre received it (see the lookup sketch below).
Leaking one set does not compromise others. The Scrambler ensures sets are structurally distinct.
Contributors do not know which of their questions will be selected, in what order, or in which set. The pool is large relative to any single paper.
Pre-seeded centre copies are useless without the decryption key. The key is held in the coordination node until T−15, then released to centres as the hall goes offline.
The CDN delivery log records exactly which set reached which centre, when, and with what confirmation. Post-incident forensics become tractable.
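Given the hash registry and the delivery log, tracing a leak reduces to a lookup, as sketched below. The sketch assumes the leaked paper can be re-serialised into exactly the form that was hashed at generation time; the log structure is hypothetical.

```python
import hashlib

def trace_leak(leaked_paper: str, delivery_log: dict[str, list[str]]) -> list[str]:
    """Hash the leaked paper, then return every centre whose delivered
    sets include that fingerprint."""
    digest = hashlib.sha256(leaked_paper.encode()).hexdigest()
    return [centre for centre, set_hashes in delivery_log.items()
            if digest in set_hashes]
```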
This model is a conceptual blueprint — not a finished product. Every parameter in it is a variable. The number of question setters, the size of the pool, the number of generated sets, the timing of the generation trigger, the hash strategy — all of these are configuration choices, not fixed requirements.
If you are building an examination system, you are encouraged to read this model, take what is useful, discard what is not, and publish your adaptations. The goal is a shared vocabulary for thinking about exam security — not a single canonical implementation.
Examination systems are high-stakes infrastructure. This model is shared in the spirit of open technical discourse — to surface ideas, invite critique, and lower the cost of reasoning carefully about a hard problem. It is not a claim of completeness. Take it, improve it, and share what you learn.