Open Source / Conceptual Model · v0.1 — Conceptual Release

CBT Hybrid Examination Model


A modular, auditable framework for compiling, randomising, and distributing large-scale examination question papers — built for transparency, security, and adaptability.

No organisation, board, or institution is referenced. This is a neutral conceptual model — free to fork, adapt, and deploy.

What makes large-scale examinations hard to secure?

At scale, a question paper passes through many hands before it reaches a candidate. Each handoff — printing, transport, distribution — is a potential leak point. Traditional paper-based systems rely on physical secrecy: sealed envelopes, last-minute printing, and courier logistics. These measures are costly, fragile, and hard to audit.

A digital approach introduces new challenges: how do you prevent a question set from being screen-captured, forwarded, or reconstructed from partial information? How do you ensure that even the systems generating the paper do not expose it too early?

The model described here addresses these problems through three principles: late generation, structural randomisation, and modular compilation.

Compile early. Generate late. Distribute fast.

The system separates question compilation (done well in advance) from question paper generation (done one hour before the exam). A large, curated question bank exists — but no actual question paper exists until the latest responsible moment.

A randomisation engine draws from this bank to produce multiple non-overlapping question sets at once. Each set is hashed and registered before distribution. Centres receive their assigned set only at the point of delivery, via a CDN-based distribution layer.

The question bank itself is a living, auditable artifact — built by subject-matter contributors, deduplicated by an AI filter, and version-controlled.

Compile early

Build a question bank well in advance. Deduplicate with AI. Lock the pool.

Generate late

Run the randomiser exactly one hour before exam start. No paper exists before that window.

Distribute fast

Push hashed, sealed QP sets via CDN to all exam centres simultaneously.

System Overview

The diagram below maps the full lifecycle — from question compilation through AI filtering, to late-stage randomised generation and CDN-based centre distribution.

[Flow diagram: the five-stage pipeline, reconstructed as text]

Stage 1 · Compile — Syllabus → N setters in Domain A and N setters in Domain B. Domain A contributes (5A + 5B) × N ≈ 250 questions; Domain B contributes (5C + 5D) × N ≈ 250 questions. Raw pool: 250 + 250 = 500 Qs.

Stage 2 · Filter — AI filter detects semantic duplicates; merge or remove, decrementing the count. Ceiling met? No → add more questions to the pool. Yes → lock the pool.

Stage 3 · Book — Question Book: ▸ Question Module 1 ▸ Question Module 2 ··· Each question is an individually addressable module.

Stage 4 · Generate (RUN AT T−60 — 1 HOUR BEFORE EXAM) — RNG Engine → QP output: 180 Qs · #hash per set. Scrambler: hash ∈ prior sets? → regenerate with a new seed. Final: QP-Set1 · QP-Set2 · QP-Set3 · QP-Set4.

Stage 5 · Deliver — Main coordination node → CDN → Centre 1, Centre 2, … Centre N.

All counts and timings shown are illustrative defaults — every parameter is modular.

What each phase solves

Each stage in the pipeline addresses a specific problem. The architecture is intentionally separable — phases can be implemented independently or swapped out.

Phase 1 · Question Compilation

Problem

How do you build a large, reliable question pool without duplication?

Approach

Subject-matter contributors (parameterised as N setters per domain) each contribute a fixed quota per cycle. In the reference model: 25 contributors from a technical/science domain each submit 10 questions per cycle (e.g. 5 Physics + 5 Chemistry), yielding 250 questions. Another 25 contributors from a life-science domain each submit 10 per cycle (e.g. 5 Botany + 5 Zoology), yielding a further 250. Total raw pool: 500 questions.

All parameters — number of contributors, questions per contributor, subject groupings — are modular variables. Adjust them freely for your context.
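The quota arithmetic above can be captured as a small configuration sketch. The parameter names and structure here are illustrative, not part of the model itself; only the reference defaults (25 setters per domain, 10 questions each) come from the text.

```python
# Illustrative defaults from the reference model -- every value is a
# modular variable, not a fixed requirement.
params = {
    "domains": {
        "A": {"setters": 25, "subjects": ["Physics", "Chemistry"], "q_per_subject": 5},
        "B": {"setters": 25, "subjects": ["Botany", "Zoology"], "q_per_subject": 5},
    }
}

def raw_pool_size(params):
    """Total questions contributed per cycle across all domains."""
    return sum(
        d["setters"] * len(d["subjects"]) * d["q_per_subject"]
        for d in params["domains"].values()
    )

print(raw_pool_size(params))  # 500 with the reference defaults
```

Changing any field, adding a third domain, halving the quota, recomputes the pool size with no structural change.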

Configurable parameters:
  • N setters per domain
  • Questions per setter per subject
  • Subject groupings
  • Domain count
Phase 2 · AI Deduplication Filter

Problem

With 500 questions from many independent contributors, overlaps are inevitable. How do you clean the pool?

Approach

Before any paper is generated, the full question pool is passed through an AI filter that detects semantic duplicates — questions that test the same concept with different wording. When a match is found, the questions are merged or one is removed, and the total count decremented accordingly. This phase runs during the setup window, not at exam time.

The ceiling target (e.g. 500 questions) is a quality threshold. If the filtered pool falls below it, contributors add replacements until the threshold is met. Once met, the pool is locked.

Ceiling target (default: 500)Similarity threshold for mergeReview workflow for edge cases
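A minimal sketch of the dedup pass, assuming a greedy keep-or-drop policy. The real filter would use a semantic model (e.g. sentence embeddings plus cosine similarity); `SequenceMatcher` stands in here only to make the sketch self-contained, and it catches near-identical wording rather than true paraphrases.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Stand-in for a real semantic-similarity model; only catches
    # near-identical wording, not rephrased duplicates.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def dedup(pool, threshold=0.9):
    """Greedy pass: keep a question only if it is not too similar to
    any already-kept question; otherwise drop it and the count falls."""
    kept = []
    for q in pool:
        if all(similarity(q, k) < threshold for k in kept):
            kept.append(q)
    return kept

pool = [
    "State Newton's second law of motion.",
    "state newton's second law of motion.",   # near-duplicate
    "Define the SI unit of force.",
]
print(len(dedup(pool)))  # 2 -- the near-duplicate is removed
```

If the cleaned pool falls below the ceiling target, the surrounding workflow loops back to contributors for replacements before locking.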
Phase 3 · The Question Book

Problem

How do you store and version-control a curated question pool?

Approach

The locked, deduplicated pool is stored as a structured Book — a collection of question modules, each with metadata (subject, difficulty, topic tags, contributor ID). Each question is an individual, addressable unit. The Book can be version-controlled, audited, and extended for future exam cycles. In a CBT context, each question module renders independently in the exam interface.

The Book format is agnostic — it can be stored as JSON, a database schema, or a CMS. The key property is that each question is individually addressable and carries its own metadata.

Configurable parameters:
  • Question schema
  • Metadata fields
  • Storage backend
  • Versioning strategy
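One possible shape for a question module, sketched as a dataclass. The field names and the sample question are illustrative assumptions; the model only requires that each question be individually addressable and carry its own metadata.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class QuestionModule:
    """One individually addressable unit of the Question Book.
    Field names are illustrative, not prescribed by the model."""
    qid: str
    subject: str
    difficulty: str
    topic_tags: list
    contributor_id: str
    body: str
    options: list
    answer_key: str  # held only in the sealed master copy

q = QuestionModule(
    qid="QB-0417",
    subject="Physics",
    difficulty="medium",
    topic_tags=["mechanics", "newton-laws"],
    contributor_id="SETTER-A-07",
    body="A 2 kg mass accelerates at 3 m/s^2. What net force acts on it?",
    options=["2 N", "3 N", "6 N", "9 N"],
    answer_key="C",
)

# The Book can be serialised as JSON, one module per record,
# and diffed between versions like any other text artifact.
record = json.dumps(asdict(q), sort_keys=True)
```

Because each module is a self-describing record, version control, auditing, and per-question rendering in the CBT interface all fall out of the same representation.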
Phase 4 · Late Generation & Randomisation

Problem

How do you ensure no complete question paper exists before the exam window?

Approach

Exactly one hour before the exam start time, the generation engine runs. A seeded RNG draws 180 questions from the locked pool (or your configured count) and assembles a Question Paper. This output is hashed — the hash acts as a fingerprint. A Scrambler then checks whether this hash collides with any previously generated set. If it does, a new seed is generated and the process repeats until a unique set is confirmed. The result: QP-Set1, QP-Set2, QP-Set3, QP-Set4 — each unique, each hashed.

The one-hour window is a deliberate constraint. Short enough to drastically reduce leak risk, long enough to handle distribution and fallback. This is a configurable parameter.

Configurable parameters:
  • Questions per paper (default: 180)
  • Number of sets
  • Generation trigger time
  • Hash collision strategy
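The generate-hash-check loop can be sketched as follows. The pool here is a toy list of question IDs, and names like `generate_unique_sets` are assumptions of this sketch, not part of the model.

```python
import hashlib
import random

def generate_set(pool, n_questions, seed):
    """Seeded draw of one paper, plus its SHA-256 fingerprint."""
    rng = random.Random(seed)
    questions = rng.sample(pool, n_questions)
    digest = hashlib.sha256("|".join(questions).encode()).hexdigest()
    return questions, digest

def generate_unique_sets(pool, n_sets=4, n_questions=180, seed=0):
    """RNG + Scrambler: on a hash collision with any prior set,
    regenerate with a new seed until the set is unique."""
    seen, sets = set(), []
    while len(sets) < n_sets:
        questions, digest = generate_set(pool, n_questions, seed)
        seed += 1  # next (or retry) seed
        if digest in seen:
            continue  # collision -> regenerate
        seen.add(digest)
        sets.append((questions, digest))
    return sets

pool = [f"Q{i:03d}" for i in range(500)]  # toy locked pool
sets = generate_unique_sets(pool)
```

Registering each digest before distribution is what later lets a leaked paper be traced back to a specific set.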
Phase 5 · CDN Distribution

Problem

How do you deliver different sets to different centres simultaneously, at scale?

Approach

A main coordination node holds the generated sets. Sets are pushed to a CDN layer, which routes each sealed package to its assigned exam centre. Each centre receives exactly one QP set — the assignment is determined by the coordination node and logged immutably. The CDN handles the scale; the coordination node handles the auditability.

In a hybrid model, encrypted local copies are pre-seeded at centres ahead of time. The QP selection is generated centrally at T−60, and the decryption key is released via the CDN at T−15, exactly when the centre disables its network. This eliminates last-mile connectivity as a failure point during the session.

Configurable parameters:
  • CDN provider
  • Centre-to-set assignment logic
  • Encrypted fallback strategy
  • Delivery confirmation protocol
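One way to make the delivery log immutable in practice is a hash chain: each entry commits to the previous entry's hash, so any after-the-fact edit breaks verification. This is a sketch under that assumption; the model itself only requires that the log be append-only and auditable.

```python
import hashlib
import json

def log_delivery(log, centre, set_hash, ts):
    """Append a centre-to-set assignment to a hash-chained log."""
    prev = log[-1]["entry_hash"] if log else "GENESIS"
    entry = {"centre": centre, "set_hash": set_hash, "ts": ts, "prev": prev}
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)

def verify(log):
    """Recompute the chain end-to-end; any tampering is detected."""
    prev = "GENESIS"
    for e in log:
        body = {k: e[k] for k in ("centre", "set_hash", "ts", "prev")}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if e["prev"] != prev or e["entry_hash"] != expected:
            return False
        prev = e["entry_hash"]
    return True

log = []
log_delivery(log, "Centre-1", "a3f1", ts=1700000000)
log_delivery(log, "Centre-2", "b7c2", ts=1700000003)
assert verify(log)
```

Rewriting any earlier entry changes its hash and breaks every later link, which is exactly the property post-incident forensics needs.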

What makes it a hybrid model?

Hybrid here describes how the exam is delivered, not the format of the questions. The question paper is rendered on a computer; the candidate answers on a paper OMR sheet. The terminal is a presentation surface — it does not capture, store, or transmit responses.

This split is deliberate. It lowers the interaction burden on candidates who may have little or no prior computer experience, keeps the answer record physical and verifiable, and removes an entire class of digital-input attack surface from the exam itself.

From T−15, the local terminal runs offline. Internet access in the hall is disabled before the QP package is decrypted, so no online aid — search, AI assistance, remote help — is available during the session. Rough sheets are provided at every desk for working. The only inputs the candidate needs are the left and right arrow keys to move between questions. A mouse may be available for quick navigation, but it is never required.

  1. View on screen
     Locked-down terminal renders one question at a time.
  2. Mark on OMR
     Answers are bubbled on paper; working on rough sheets.
  3. OMR collected
     Paper record is sealed and dispatched after the session.
What the computer does
Render the paper

A locked-down screen displays one question at a time, drawn from the locally cached, sealed QP set. No answers are captured here. The screen is a viewing surface only.

What the candidate does
Answer on OMR

All answers are marked on an OMR sheet at the desk. Working is done on the rough sheets provided. The answer record stays physical end-to-end, exactly as in a paper-based exam.

What the centre does
Disable the network

At T−15, the hall is taken offline and the package is decrypted locally. The terminal needs nothing online for the rest of the session. Connectivity returns only after collection.

  1. T−24h
    Encrypted bundle prepositioned
    Sealed local copy is staged at every centre well before the exam.
  2. T−60
    QP generated centrally
    RNG + Scrambler produce the unique sealed sets at the main coordination node.
  3. T−15
    Disable network · decrypt locally
    The hall goes offline. The decryption key arrives via CDN and unlocks the package on local systems.
  4. T 0
    Session runs fully offline
    Questions render on screen; candidates mark answers on OMR; rough sheets are at each desk.
  5. After
    Reconnect · upload logs
    Network is restored, audit logs are synced, and OMR sheets are dispatched for processing.
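The timeline above is a set of offsets from exam start, which makes it easy to treat as configuration. The offset values below are the document's illustrative defaults; the names are assumptions of this sketch.

```python
from datetime import datetime, timedelta

# Offsets in minutes relative to exam start (T0); all configurable.
SCHEDULE = {
    "preposition_bundle": -24 * 60,  # T-24h: encrypted copy staged at centres
    "generate_qp": -60,              # T-60: RNG + Scrambler run centrally
    "release_key_offline": -15,      # T-15: hall goes offline, key via CDN
    "session_start": 0,              # T0: session runs fully offline
}

def absolute_times(exam_start, schedule=SCHEDULE):
    """Resolve relative offsets into wall-clock trigger times."""
    return {name: exam_start + timedelta(minutes=m)
            for name, m in schedule.items()}

times = absolute_times(datetime(2025, 6, 1, 10, 0))
print(times["generate_qp"])  # 2025-06-01 09:00:00
```

Shifting the generation trigger or the key-release moment is then a one-line configuration change rather than a structural one.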
Rendering only

Computers display questions; they never capture answers. This removes input-based leak vectors and keeps the interface minimal.

Two-key navigation

Left and right arrow keys are the only required inputs. A mouse is optional, never essential. No full keyboard is needed.

Offline during session

From T−15, the hall is offline. Once the package is decrypted, the exam runs without any network — and none is allowed in the room.

Sealed until T−15

The local QP package is encrypted at rest. The decryption key arrives via CDN at T−15 — the same moment the hall goes offline — and is never stored at the centre in advance.

Accessible by default

A candidate who has never used a computer can complete the exam. Familiarity with OMR carries over from existing paper-based exams.

Audit trail

Every event — key delivery, decryption, session start, OMR collection — is logged with a verifiable timestamp.

Where does the model reduce leak surface?

The model does not claim to be leak-proof. It claims to be structurally harder to leak than traditional approaches, and auditable when a leak does occur.

01 · No complete paper early

Because QP generation happens at T−60, there is no complete question paper to leak before that window. Contributors never see a composed paper — only their own questions.

02 · Hashed set fingerprinting

Every generated set has a unique cryptographic hash. If a paper leaks, the hash identifies which set and — by inference — which centre received it.

03 · Multiple non-overlapping sets

Leaking one set does not compromise others. The Scrambler ensures sets are structurally distinct.

04 · AI filter reduces insider knowledge

Contributors do not know which of their questions will be selected, in what order, or in which set. The pool is large relative to any single paper.

05 · Encrypted local cache

Pre-seeded centre copies are useless without the decryption key. The key is held in the coordination node until T−15, then released to centres as the hall goes offline.

06 · Immutable distribution log

The CDN delivery log records exactly which set reached which centre, when, and with what confirmation. Post-incident forensics become tractable.

Fork it. Adapt it. Improve it.

This model is a conceptual blueprint — not a finished product. Every parameter in it is a variable. The number of question setters, the size of the pool, the number of generated sets, the timing of the generation trigger, the hash strategy — all of these are configuration choices, not fixed requirements.

If you are building an examination system, you are encouraged to read this model, take what is useful, discard what is not, and publish your adaptations. The goal is a shared vocabulary for thinking about exam security — not a single canonical implementation.

Design Principles
  • Parameters over hard-coded values
  • Auditability over opacity
  • Late generation over early exposure
  • Structural randomisation over manual shuffling
  • Modular components over monolithic systems
MIT License — use freely, adapt freely, share improvements

A starting point, not an endpoint.

Examination systems are high-stakes infrastructure. This model is shared in the spirit of open technical discourse — to surface ideas, invite critique, and lower the cost of reasoning carefully about a hard problem. It is not a claim of completeness. Take it, improve it, and share what you learn.