Jānis Kacēns 34eb47b595 Phase 0: skeleton, config, chi router, /healthz

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-11 11:25:00 +03:00

QBank — Execution Plan

A self-hosted quiz app. Import PDF/DOCX documents containing multiple-choice questions, parse them with an LLM, store them, generate randomised tests, take them on phone or laptop, and review wrong answers afterward.

This document is the build plan. Work through phases in order. Each phase has a clear acceptance test — don't move on until it passes.

Locked-in tech decisions

Language: Go (1.22+ for net/http pattern routing, though we'll use chi).
Router: github.com/go-chi/chi/v5.
DB: SQLite via modernc.org/sqlite (pure Go, no CGo — simplifies Docker builds).
Sessions: github.com/alexedwards/scs/v2 with SQLite store.
Passwords: golang.org/x/crypto/bcrypt.
PDF text extraction: github.com/ledongthuc/pdf as the primary; fall back to shelling out to pdftotext (poppler-utils) if quality is poor on real docs.
DOCX text extraction: hand-rolled — DOCX is a zip; unzip in memory, parse word/document.xml, concatenate <w:t> text nodes. ~50 lines, no extra deps beyond archive/zip and encoding/xml.
OpenAI client: github.com/sashabaranov/go-openai. Use the current cost-efficient model that supports JSON schema response_format (verify the current model name in OpenAI docs at build time; e.g. gpt-4o-mini or its successor).
Templates: html/template standard library.
CSS: Tailwind via the standalone CLI (no Node.js needed) — download the binary in the Dockerfile.
Auth model: two named accounts seeded at first run. Each user has their own test history.
Hosting target: self-hosted on a local Portainer instance. Deployed as a Docker container via a stack (docker-compose). A named volume holds qbank.db and uploaded files.

Project layout

qbank/
├── cmd/server/main.go              # entry point
├── internal/
│   ├── config/                     # env + flags
│   ├── db/
│   │   ├── schema.sql
│   │   ├── db.go                   # open, migrate, helpers
│   │   └── repo.go                 # queries
│   ├── models/                     # Question, Answer, Test, etc.
│   ├── parse/
│   │   ├── pdf.go
│   │   ├── docx.go
│   │   └── chunk.go
│   ├── llm/
│   │   └── openai.go               # ExtractQuestions(text) -> []Question
│   ├── auth/                       # login, middleware, session
│   ├── handlers/                   # http handlers, one file per area
│   └── web/
│       ├── templates/*.html
│       └── static/                 # tailwind output, manifest, sw.js, icons
├── data/                           # gitignored: qbank.db, uploads/
├── .env.example
├── Dockerfile
├── docker-compose.yml
├── go.mod
└── README.md

Phase 0 — Skeleton & config

Tasks:

go mod init qbank.
Wire cmd/server/main.go with chi, a /healthz route returning 200, structured logging via log/slog.
Config loader in internal/config: reads OPENAI_API_KEY, SESSION_SECRET, DATA_DIR (default ./data), PORT (default 8080), ADMIN_USERS (comma-separated name:password pairs, seeded on first start only).
.env.example with the above keys.
Makefile with run, build, tailwind, tidy.

Acceptance: go run ./cmd/server boots, curl localhost:8080/healthz returns OK.
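
As an illustration of the config loader's ADMIN_USERS handling, here is a minimal sketch of parsing the comma-separated name:password pairs. The AdminUser type and ParseAdminUsers function are hypothetical names, and the choice to split on the first colon only (so passwords may contain colons but not commas) is an assumption rather than something the plan specifies.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// AdminUser is a hypothetical type for one account to seed.
type AdminUser struct {
	Name     string
	Password string
}

// ParseAdminUsers turns "alice:pw1,bob:pw2" into a slice of AdminUser.
// Each pair is split on its first colon only, so passwords may contain
// colons but not commas. Errors name the entry index instead of echoing
// the entry, which keeps passwords out of logs.
func ParseAdminUsers(raw string) ([]AdminUser, error) {
	if strings.TrimSpace(raw) == "" {
		return nil, errors.New("ADMIN_USERS is empty")
	}
	seen := make(map[string]bool)
	var users []AdminUser
	for i, pair := range strings.Split(raw, ",") {
		name, password, ok := strings.Cut(strings.TrimSpace(pair), ":")
		if !ok || name == "" || password == "" {
			return nil, fmt.Errorf("ADMIN_USERS entry %d is not name:password", i+1)
		}
		if seen[name] {
			return nil, fmt.Errorf("ADMIN_USERS entry %d repeats user %q", i+1, name)
		}
		seen[name] = true
		users = append(users, AdminUser{Name: name, Password: password})
	}
	return users, nil
}

func main() {
	users, err := ParseAdminUsers("alice:hunter2,bob:correcthorse")
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	for _, u := range users {
		fmt.Println("would seed user:", u.Name)
	}
}
```

Failing fast on a malformed value at startup is probably friendlier than silently seeding zero users and discovering it at the login page.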

Phase 1 — Database & migrations

Schema in internal/db/schema.sql:

CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  name TEXT UNIQUE NOT NULL,
  password_hash TEXT NOT NULL,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE questions (
  id TEXT PRIMARY KEY,            -- sha256(question_text)[:16]
  text TEXT NOT NULL,
  source TEXT,
  tags TEXT,                      -- comma-separated, simple
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE answers (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  question_id TEXT NOT NULL REFERENCES questions(id) ON DELETE CASCADE,
  text TEXT NOT NULL,
  is_correct BOOLEAN NOT NULL,
  position INTEGER NOT NULL        -- canonical order as imported
);

CREATE TABLE tests (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  user_id INTEGER NOT NULL REFERENCES users(id),
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  completed_at DATETIME,
  n_questions INTEGER NOT NULL,
  question_ids TEXT NOT NULL       -- JSON array, in chosen order
);

CREATE TABLE test_answers (
  test_id INTEGER NOT NULL REFERENCES tests(id) ON DELETE CASCADE,
  question_id TEXT NOT NULL REFERENCES questions(id),
  selected_answer_id INTEGER REFERENCES answers(id),
  is_correct BOOLEAN,
  answered_at DATETIME,
  PRIMARY KEY (test_id, question_id)
);

-- Per-user mastery stats. Drives weighted sampling in Phase 6 and the
-- accuracy display in Phase 5. A missing row means "never seen" — treat
-- as (times_seen=0, times_correct=0).
CREATE TABLE user_question_stats (
  user_id INTEGER NOT NULL REFERENCES users(id),
  question_id TEXT NOT NULL REFERENCES questions(id) ON DELETE CASCADE,
  times_seen INTEGER NOT NULL DEFAULT 0,
  times_correct INTEGER NOT NULL DEFAULT 0,
  last_seen_at DATETIME,
  PRIMARY KEY (user_id, question_id)
);

CREATE INDEX idx_test_answers_test ON test_answers(test_id);
CREATE INDEX idx_answers_question ON answers(question_id);
CREATE INDEX idx_stats_user ON user_question_stats(user_id);

Tasks:

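db.Open(path): applies schema.sql at startup. This is only idempotent if every statement in the file uses IF NOT EXISTS (CREATE TABLE IF NOT EXISTS ..., CREATE INDEX IF NOT EXISTS ...), so write the schema file that way. Also enable PRAGMA foreign_keys = ON for every connection; SQLite ignores REFERENCES and ON DELETE CASCADE without it.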
Repository functions: CreateUser, GetUserByName, InsertQuestion, GetQuestion, ListQuestions(filter), CountQuestions, CountAnswers, CreateTest, RecordAnswer, FinishTest, GetTest, ListTestsForUser, UpsertStat(userID, questionID, gotItRight bool) (atomically increments times_seen and conditionally times_correct, sets last_seen_at), GetStatsForUser(userID, questionIDs) (returns a map; missing entries mean unseen).
Seed admin users from ADMIN_USERS env var on first start only (skip if users table non-empty).

Acceptance: delete qbank.db, start the server, verify users are seeded; sqlite3 qbank.db ".schema" shows all tables.
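
The atomic increment that UpsertStat needs can be done in one statement with SQLite's upsert syntax. The sketch below is one possible shape, not the definitive repo code: the execer interface, the recorder stand-in, and boundArgs are hypothetical helpers that let the example run without a database driver. In the real repo the db argument would be a *sql.DB or *sql.Tx.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB and *sql.Tx that UpsertStat needs, so the
// same function works inside or outside a transaction.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// upsertStatSQL inserts a first-sighting row or bumps the counters of an
// existing one in a single statement (SQLite 3.24+ upsert syntax).
const upsertStatSQL = `
INSERT INTO user_question_stats
    (user_id, question_id, times_seen, times_correct, last_seen_at)
VALUES (?, ?, 1, ?, CURRENT_TIMESTAMP)
ON CONFLICT (user_id, question_id) DO UPDATE SET
    times_seen    = times_seen + 1,
    times_correct = times_correct + excluded.times_correct,
    last_seen_at  = excluded.last_seen_at`

// UpsertStat records one answered question for one user.
func UpsertStat(ctx context.Context, db execer, userID int64, questionID string, gotItRight bool) error {
	correct := 0
	if gotItRight {
		correct = 1
	}
	if _, err := db.ExecContext(ctx, upsertStatSQL, userID, questionID, correct); err != nil {
		return fmt.Errorf("upsert stat for question %s: %w", questionID, err)
	}
	return nil
}

// recorder is a stand-in for a real database: it only remembers the bound
// arguments so the example can print them.
type recorder struct{ args []any }

func (r *recorder) ExecContext(_ context.Context, _ string, args ...any) (sql.Result, error) {
	r.args = args
	return nil, nil
}

// boundArgs shows which values UpsertStat would bind for one answer.
func boundArgs(gotItRight bool) []any {
	rec := &recorder{}
	if err := UpsertStat(context.Background(), rec, 7, "3f2a9c0d1b4e5f67", gotItRight); err != nil {
		panic(err)
	}
	return rec.args
}

func main() {
	fmt.Println(boundArgs(true))
	fmt.Println(boundArgs(false))
}
```

Because the statement is a single write, it stays atomic even when called outside an explicit transaction; Phase 6 still wraps it together with the test_answers insert.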

Phase 2 — Auth & layout

Login page (POST → set session via scs).
RequireAuth middleware on everything except /login, /healthz, /static/*.
Base template with header (app name, current user, logout, nav: Library / Upload / Take Test / History).
Tailwind setup: download standalone CLI in Dockerfile and make tailwind; internal/web/static/tailwind.css is the build output (matching the project layout above). Use mobile-first layout: single column under md:, content max-width 2xl on desktop.
Add internal/web/static/manifest.json and internal/web/static/sw.js (minimal: cache shell, network-first for API). Wire them in the base template so "Add to Home Screen" works on iOS and Android.

Acceptance: log in as each seeded user, see your name in the header, log out. On a phone, "Add to Home Screen" produces a standalone-looking icon.

Phase 3 — Document parsing & LLM extraction

internal/parse:

ExtractPDF(r io.Reader) (string, error) using ledongthuc/pdf. If text comes back empty or gibberish (heuristic: <50 chars or <2% alphanumeric ratio), return a sentinel error so the handler can show "scan-based PDF — please convert to text first."
ExtractDOCX(r io.Reader) (string, error) — open as zip, find word/document.xml, walk XML, concatenate <w:t> text content, insert newlines on <w:p> boundaries.
Chunk(text string, maxRunes int) []string — split on double-newlines greedily up to ~10k runes per chunk (well under model context).
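
The hand-rolled DOCX extraction described above might look roughly like the following. This is a sketch under assumptions: it matches elements by the transitional WordprocessingML namespace only, ignores tabs, line breaks, tables, headers and footnotes, and the extractSample helper exists purely to make the example runnable.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// wordNS is the namespace of ordinary (transitional) DOCX files.
// Strict-mode documents use a different URI and are not handled here.
const wordNS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

// ExtractDOCX returns the text of every <w:t> node, one line per <w:p>.
func ExtractDOCX(r io.Reader) (string, error) {
	data, err := io.ReadAll(r) // zip needs random access, so buffer it all
	if err != nil {
		return "", err
	}
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return "", fmt.Errorf("open docx as zip: %w", err)
	}
	for _, f := range zr.File {
		if f.Name != "word/document.xml" {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return "", err
		}
		defer rc.Close()
		return textFromXML(rc)
	}
	return "", errors.New("word/document.xml not found")
}

// textFromXML streams tokens and keeps character data found inside <w:t>.
func textFromXML(r io.Reader) (string, error) {
	dec := xml.NewDecoder(r)
	var sb strings.Builder
	inText := false
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			inText = t.Name.Space == wordNS && t.Name.Local == "t"
		case xml.EndElement:
			inText = false
			if t.Name.Space == wordNS && t.Name.Local == "p" {
				sb.WriteByte('\n') // paragraph boundary
			}
		case xml.CharData:
			if inText {
				sb.Write(t)
			}
		}
	}
}

// extractSample builds a tiny in-memory DOCX and runs ExtractDOCX on it.
func extractSample(paragraphs ...string) (string, error) {
	var doc bytes.Buffer
	doc.WriteString(`<w:document xmlns:w="` + wordNS + `"><w:body>`)
	for _, p := range paragraphs {
		doc.WriteString("<w:p><w:r><w:t>")
		xml.EscapeText(&doc, []byte(p))
		doc.WriteString("</w:t></w:r></w:p>")
	}
	doc.WriteString("</w:body></w:document>")
	var archive bytes.Buffer
	zw := zip.NewWriter(&archive)
	w, err := zw.Create("word/document.xml")
	if err != nil {
		return "", err
	}
	if _, err := w.Write(doc.Bytes()); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return ExtractDOCX(&archive)
}

func main() {
	text, err := extractSample("What is 2+2?", "A) 3", "B) 4")
	if err != nil {
		fmt.Println("extract failed:", err)
		return
	}
	fmt.Print(text)
}
```

Reading the whole upload into memory is acceptable here because the plan already caps uploads at 20MB.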

internal/llm:

ExtractQuestions(ctx, chunk string) ([]ParsedQuestion, error) calls OpenAI chat completions with response_format set to a JSON schema:

{
  "type": "object",
  "properties": {
    "questions": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["question", "answers"],
        "properties": {
          "question": {"type": "string"},
          "answers": {
            "type": "array",
            "minItems": 2,
            "items": {
              "type": "object",
              "required": ["text", "correct"],
              "properties": {
                "text": {"type": "string"},
                "correct": {"type": "boolean"}
              }
            }
          }
        }
      }
    }
  },
  "required": ["questions"]
}

System prompt: "You extract multiple-choice questions from study material. Return every question found. Exactly one answer per question must be marked correct. If the source doesn't clearly mark a correct answer, omit that question entirely. Do not invent questions not present in the text."

Validate the response: drop any question without exactly one correct: true. Dedupe by question text hash.
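
A minimal sketch of that validation and dedupe step follows. The struct shapes mirror the JSON schema above. Two details are assumptions: the ID is taken as the first 16 hex characters of the SHA-256 (the schema comment could also mean 16 bytes), and the text is normalised (lower-cased, whitespace collapsed) before hashing so trivially different copies collide.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// ParsedAnswer and ParsedQuestion mirror the response schema.
type ParsedAnswer struct {
	Text    string `json:"text"`
	Correct bool   `json:"correct"`
}

type ParsedQuestion struct {
	Question string         `json:"question"`
	Answers  []ParsedAnswer `json:"answers"`
}

// QuestionID hashes the normalised question text and keeps 16 hex chars.
// The normalisation is an assumption, not something the plan prescribes.
func QuestionID(text string) string {
	norm := strings.ToLower(strings.Join(strings.Fields(text), " "))
	sum := sha256.Sum256([]byte(norm))
	return hex.EncodeToString(sum[:])[:16]
}

// ValidateAndDedupe keeps questions with at least two answers and exactly
// one marked correct, and drops repeats of an already-kept question.
func ValidateAndDedupe(in []ParsedQuestion) (kept []ParsedQuestion, dropped int) {
	seen := make(map[string]bool)
	for _, q := range in {
		correct := 0
		for _, a := range q.Answers {
			if a.Correct {
				correct++
			}
		}
		id := QuestionID(q.Question)
		if len(q.Answers) < 2 || correct != 1 || seen[id] {
			dropped++
			continue
		}
		seen[id] = true
		kept = append(kept, q)
	}
	return kept, dropped
}

// sampleInput returns one valid question, one duplicate, and one invalid.
func sampleInput() []ParsedQuestion {
	return []ParsedQuestion{
		{Question: "What is 2+2?", Answers: []ParsedAnswer{{"3", false}, {"4", true}}},
		{Question: "what is  2+2?", Answers: []ParsedAnswer{{"3", false}, {"4", true}}},
		{Question: "Pick one", Answers: []ParsedAnswer{{"a", true}, {"b", true}}},
	}
}

func main() {
	kept, dropped := ValidateAndDedupe(sampleInput())
	fmt.Println(len(kept), "kept,", dropped, "dropped")
}
```

The dropped count can feed the "M skipped" part of the flash message in Phase 4.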

Acceptance: feed in a small handcrafted PDF and DOCX with 3–4 known Q&As, the function returns them correctly.
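
For the Chunk helper listed under internal/parse, a greedy paragraph packer could be sketched as below. It assumes paragraphs are separated by "\n\n" (so Windows line endings would need normalising first) and, as a simplification, lets a single paragraph longer than maxRunes become its own oversized chunk rather than splitting it mid-sentence.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Chunk packs paragraphs greedily into chunks of at most maxRunes runes.
// A paragraph that alone exceeds maxRunes is emitted as its own chunk.
func Chunk(text string, maxRunes int) []string {
	var chunks []string
	var cur strings.Builder
	curRunes := 0
	for _, para := range strings.Split(text, "\n\n") {
		para = strings.TrimSpace(para)
		if para == "" {
			continue
		}
		n := utf8.RuneCountInString(para)
		// Close the current chunk if separator + paragraph would overflow it.
		if curRunes > 0 && curRunes+2+n > maxRunes {
			chunks = append(chunks, cur.String())
			cur.Reset()
			curRunes = 0
		}
		if curRunes > 0 {
			cur.WriteString("\n\n")
			curRunes += 2
		}
		cur.WriteString(para)
		curRunes += n
	}
	if curRunes > 0 {
		chunks = append(chunks, cur.String())
	}
	return chunks
}

func main() {
	for i, c := range Chunk("aaa\n\nbbb\n\nccc", 8) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

A question whose answers straddle a chunk boundary can still be lost or mangled; if that shows up on real documents, overlapping adjacent chunks by one paragraph and relying on the dedupe step is one option to evaluate.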

Phase 4 — Upload & import review flow

GET /upload — file input (accepts .pdf,.docx), optional comma-separated tags, optional source override.
POST /upload:
- Save file to data/uploads/<timestamp>_<filename>.
- Extract text → chunk → call LLM per chunk → merge & dedupe → store candidate questions in a temporary import_drafts table keyed by a UUID (cleaner than stashing them in the session; add the table to schema.sql).
- Redirect to /import/<draft_id>.
GET /import/<draft_id> — editable preview. Each candidate question shows: question text (editable), each answer (editable, with radio for which is correct), tag input, "delete this question" checkbox.
POST /import/<draft_id> — write surviving questions to questions and answers. Delete the draft. Redirect to library with a flash message ("Imported N questions, M skipped").

Files larger than ~5MB or with >100 candidate questions should still work — paginate the review page if needed.

Acceptance: upload a real PDF, edit a couple of answers in the preview, confirm; library count goes up by the right amount.
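
Because the saved path embeds a client-supplied filename, it is worth sanitising before writing under data/uploads/. The sketch below is one conservative approach; SafeUploadName and nameAt are hypothetical names, and the allowed character set is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
	"time"
)

// SafeUploadName builds "<unix-timestamp>_<filename>" from a client-supplied
// filename. It keeps only the final path element and replaces every character
// outside [A-Za-z0-9._-] with an underscore, so the result contains no path
// separators and cannot climb out of the uploads directory.
func SafeUploadName(original string, now time.Time) string {
	// Some clients send backslash-separated Windows paths.
	base := path.Base(strings.ReplaceAll(original, "\\", "/"))
	var sb strings.Builder
	for _, r := range base {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9',
			r == '.', r == '-', r == '_':
			sb.WriteRune(r)
		default:
			sb.WriteByte('_')
		}
	}
	return fmt.Sprintf("%d_%s", now.Unix(), sb.String())
}

// nameAt pins the clock so the example output is reproducible.
func nameAt(original string) string {
	return SafeUploadName(original, time.Unix(1700000000, 0))
}

func main() {
	fmt.Println(path.Join("data", "uploads", nameAt("../../etc/passwd")))
	fmt.Println(path.Join("data", "uploads", nameAt("My Quiz (v2).pdf")))
}
```

The original filename can still be kept verbatim in the questions.source column for display, since html/template escapes it on output.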

Phase 5 — Library & stats

GET / (library page):

Total questions, total answers (sum across all questions), per-source breakdown, per-tag breakdown.
Searchable/filterable list of all questions. Each row shows the current user's mastery for that question: "seen 8× · 3 correct (38%)" — pull from user_question_stats. Unseen questions show "—".
Sort options: alphabetical, lowest accuracy first (i.e. "weakest first"), most-seen first.
"Take a test" CTA.

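GET /questions/<id>: view and edit a single question; show the same per-user stats prominently. POST /questions/<id>: save edits. POST /questions/<id>/delete: hard delete, which cascades to answers and user_question_stats. Note that test_answers.question_id references questions(id) without ON DELETE CASCADE, so once foreign keys are enforced, deleting a question that appears in a past test will fail. Either add a cascade there (and accept gaps in old results) or archive instead via a deleted_at column. Hard delete is fine for v1 as long as one of these is chosen.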

Acceptance: counts match SELECT COUNT(*); editing a question persists; deleting a question removes its answers too.

Phase 6 — Take a test

GET /test/new — form:

Number of questions (default 10, capped at total available after filters).
Optional tag filter, optional source filter. Show count of matching questions live.
Sampling mode (radio): "Focus on weak spots" (default, weighted) vs "Mix evenly" (uniform random). Weighted mode biases toward questions the current user gets wrong more often; uniform gives a representative cross-section. See "Weighting algorithm" appendix below.

POST /test/new —

Resolve the candidate pool from filters.
For each candidate, compute the user's weight from user_question_stats (formula in appendix). Uniform mode skips weighting (all weights = 1).
Weighted-sample N candidates without replacement using the A-Res algorithm.
Shuffle the resulting order.
Create tests row with question_ids JSON.
Redirect to /test/<id>/q/1.

GET /test/<id>/q/<n> — show the n-th question. Answers shuffled deterministically per (test_id, question_id) using a seeded RNG so refreshing doesn't re-shuffle. Progress indicator ("Question 3 of 10"). Mobile-friendly: large tap targets for answer choices.
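
One way to get a refresh-stable shuffle is to derive the RNG seed from a hash of (test_id, question_id). The sketch below assumes answers are passed in canonical position order; ShuffleAnswers is a hypothetical helper, and FNV-1a is just a convenient stdlib hash, not a requirement.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// ShuffleAnswers returns a shuffled copy of answerIDs. The order depends
// only on (testID, questionID), so reloading the page shows the same order,
// while different tests generally see different orders.
func ShuffleAnswers(testID int64, questionID string, answerIDs []int64) []int64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d:%s", testID, questionID)
	rng := rand.New(rand.NewSource(int64(h.Sum64())))

	out := append([]int64(nil), answerIDs...) // copy; leave the input untouched
	rng.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	canonical := []int64{101, 102, 103, 104} // answers in imported position order
	fmt.Println(ShuffleAnswers(1, "3f2a9c0d1b4e5f67", canonical))
	fmt.Println(ShuffleAnswers(1, "3f2a9c0d1b4e5f67", canonical)) // same order again
	fmt.Println(ShuffleAnswers(2, "3f2a9c0d1b4e5f67", canonical)) // another test may differ
}
```

The determinism only holds if the input order is stable, so load answers with ORDER BY position before shuffling.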

POST /test/<id>/q/<n> — inside one transaction:

Record selection in test_answers (compute is_correct server-side).
Call UpsertStat(userID, questionID, isCorrect) to bump times_seen, conditionally times_correct, and update last_seen_at.
If more questions remain → redirect to next.
Else → mark test complete (completed_at), redirect to /test/<id>/results.

Only the owning user can access their test.

Acceptance:

Start a 5-question test, answer all 5, end on results page.
After several tests, weighted mode demonstrably surfaces previously-wrong questions more often than ones consistently answered correctly — verify by inspecting user_question_stats and re-running test creation a few times.
A mastered question (e.g. seen 10×, correct 10×) still has a non-zero chance of appearing — confirm by running enough tests with a small pool that the floor weight is exercised.

Phase 7 — Results & review

GET /test/<id>/results:

Score: X / N (Y%).
Time taken.
For each question: question text, every answer listed, with markers: ✅ on the correct answer and ❌ on the user's choice if it was wrong. When the user chose correctly, the single ✅ marks both.
Source & tags shown per question.

Acceptance: wrong answers are clearly visible, correct answer always shown, page renders cleanly on a phone.

Phase 8 — History

GET /history:

List of past tests for the current user: date, score, n_questions, link to results.
Aggregate stat: overall % correct across all attempts.
Stretch: "Weak spots" — questions this user has gotten wrong more than once, with link to the question.

Acceptance: taking multiple tests builds up a history; clicking back into a past test shows the full review.

Phase 9 — Containerise & deploy to Portainer

Build:

Dockerfile: multi-stage.
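- Stage 1 (golang:1.22-alpine or newer): go mod download, run the Tailwind standalone CLI to produce internal/web/static/tailwind.css, then CGO_ENABLED=0 go build -o /out/qbank ./cmd/server. The default Tailwind standalone binary may not run on Alpine's musl libc; if it fails, use a musl build if one is published for your Tailwind version, or run this step in a Debian-based stage.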
- Stage 2 (gcr.io/distroless/static-debian12 if not using pdftotext, otherwise alpine:3.20 with apk add --no-cache poppler-utils). Copy the binary, internal/web/templates/, and internal/web/static/. EXPOSE 8080. Non-root user. ENTRYPOINT ["/qbank"].
.dockerignore: at minimum data/, *.db, .env, .git, node_modules (if any sneak in).

docker-compose.yml (this is what you paste into Portainer's "Stacks → Add stack → Web editor"):

services:
  qbank:
    image: ghcr.io/<you>/qbank:latest   # or build locally and push, see below
    container_name: qbank
    restart: unless-stopped
    ports:
      - "8080:8080"                     # or put it behind your existing reverse proxy
    environment:
      DATA_DIR: /data
      PORT: "8080"
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      SESSION_SECRET: ${SESSION_SECRET}
      ADMIN_USERS: ${ADMIN_USERS}       # e.g. "alice:hunter2,bob:correcthorse"
    volumes:
      - qbank-data:/data
    healthcheck:
      test: ["CMD", "/qbank", "healthcheck"]   # implement a tiny subcommand, or use wget if base image has it
      interval: 30s
      timeout: 5s
      retries: 3

volumes:
  qbank-data:

Set the three env vars in Portainer's stack "Environment variables" section so they don't end up in the compose file. ADMIN_USERS is only read on first start (when the users table is empty) — subsequent restarts ignore it, so it's safe to leave set.

Image options — pick one:

Build on your machine, push to a registry: docker build -t ghcr.io/<you>/qbank:latest . then docker push. Portainer pulls it. Simplest if you already use GHCR/Docker Hub.
Build on the Portainer host directly: in the stack, replace image: with build: { context: ., dockerfile: Dockerfile } and use Portainer's git-based stack deployment pointing at the repo. Portainer pulls the repo and builds. Good for iterating without a registry round-trip.

HTTPS / external access:

If you have an existing reverse proxy on that host (Traefik, Caddy, Nginx Proxy Manager), don't publish 8080 — put qbank on the proxy's network and add labels/config so the proxy terminates TLS and forwards to the container's internal :8080. Mention this in the README so future-you remembers.
If you don't have a reverse proxy yet and want it reachable from her phone outside your LAN, the easiest add-on is Caddy in front, or a Tailscale sidecar so the app is only reachable on your tailnet (private, no public exposure, works on her phone with the Tailscale app installed).

Healthcheck note: the distroless image has no shell, so the healthcheck command must be the binary itself. Implement a healthcheck subcommand in cmd/server/main.go that does a localhost GET against /healthz and exits non-zero on failure. Or, if you switch the base image to alpine, you can use wget --spider -q http://localhost:8080/healthz and skip the subcommand.
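
A sketch of such a healthcheck subcommand, assuming the server listens on PORT (default 8080) as in Phase 0. The function name runHealthcheck and the 3-second client timeout (chosen to stay under the compose file's 5s limit) are assumptions; the real main.go would start the chi server where the placeholder message is printed.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// runHealthcheck performs one GET and reports any non-200 outcome as an error.
func runHealthcheck(url string) error {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	// "qbank healthcheck" probes the running server and exits 0 or 1.
	if len(os.Args) > 1 && os.Args[1] == "healthcheck" {
		port := os.Getenv("PORT")
		if port == "" {
			port = "8080"
		}
		if err := runHealthcheck("http://127.0.0.1:" + port + "/healthz"); err != nil {
			fmt.Fprintln(os.Stderr, "healthcheck failed:", err)
			os.Exit(1)
		}
		return
	}
	fmt.Println("normal server startup would go here")
}
```

Docker treats a non-zero exit code as unhealthy, which is what the compose healthcheck above relies on.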

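Backups: the qbank-data volume holds everything that matters. A simple weekly cron job on the host can run docker run --rm -v qbank-data:/data -v /backups:/out alpine tar czf /out/qbank-$(date +%F).tar.gz /data. In a crontab entry, escape the percent sign (date +\%F), because cron treats a bare % as a newline. Archiving a live SQLite file can capture a half-written state, so stop the container first or take a snapshot with sqlite3's .backup command and archive that. Document this in the README.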

Acceptance: stack comes up in Portainer, container is healthy, you can log in from your laptop and from her phone, restarting the container preserves all questions and history.

Appendix A — Weighting algorithm

Each question has a per-user weight that determines its sampling probability in "Focus on weak spots" mode. Implement in internal/sampling (new package).

Weight formula

For a question with times_seen = s, times_correct = c, and last_seen_at = t:

wrong = s - c

# Laplace-smoothed error rate. Pretending you've seen the question once
# right and once wrong dampens noise from tiny sample sizes — one wrong
# answer doesn't catapult the question to top priority, and one right
# answer doesn't bury it.
error_rate = (wrong + 1) / (s + 2)

# Floor so mastered questions still appear occasionally. 0.15 means a
# perfectly-mastered question shows up at roughly 15% the frequency of a
# brand-new one. Tune to taste.
base = max(0.15, error_rate)

# Recency nudge. Long-unseen questions creep back up regardless of past
# accuracy — this prevents the "staleness death-spiral" where low-weight
# questions never get sampled, never get updated, and stay low forever.
# Caps at 2× after ~30 days; unseen questions (t is NULL) get the full 2×.
days_since = (now - t).days  if t != NULL else 30
recency = 1 + min(days_since / 30.0, 1.0)

weight = base * recency

Unseen questions (no row in user_question_stats) get base = 0.5, which is exactly what the Laplace formula yields for s = 0, and the full recency multiplier. They end up at weight 1.0, the same as a question you've gotten wrong half the time and haven't seen for 30 days or more.

Knobs worth exposing as constants at the top of the file (don't make them config — just easy to find for tuning):

const (
    FloorWeight       = 0.15  // never drop below this
    RecencyCapDays    = 30.0  // days at which recency multiplier saturates
    RecencyMaxMult    = 2.0   // peak recency multiplier
    UnseenBaseWeight  = 0.5   // base for questions with no stats row
)
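
Putting the formula and constants together, ComputeWeight might be sketched as follows. The UserQuestionStat field names are assumptions, and the sketch uses fractional days where the pseudocode uses whole days, which only smooths the recency ramp.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

const (
	FloorWeight      = 0.15 // never drop below this
	RecencyCapDays   = 30.0 // days at which recency multiplier saturates
	RecencyMaxMult   = 2.0  // peak recency multiplier
	UnseenBaseWeight = 0.5  // base for questions with no stats row
)

// UserQuestionStat mirrors one user_question_stats row (field names assumed).
type UserQuestionStat struct {
	TimesSeen    int
	TimesCorrect int
	LastSeenAt   time.Time // zero value means NULL
}

// ComputeWeight returns the sampling weight; pass nil for an unseen question.
func ComputeWeight(stat *UserQuestionStat, now time.Time) float64 {
	if stat == nil || stat.TimesSeen == 0 {
		return UnseenBaseWeight * RecencyMaxMult
	}
	wrong := stat.TimesSeen - stat.TimesCorrect
	errorRate := float64(wrong+1) / float64(stat.TimesSeen+2) // Laplace smoothing
	base := math.Max(FloorWeight, errorRate)

	days := RecencyCapDays // a NULL last_seen_at counts as fully stale
	if !stat.LastSeenAt.IsZero() {
		days = math.Max(0, now.Sub(stat.LastSeenAt).Hours()/24)
	}
	recency := 1 + (RecencyMaxMult-1)*math.Min(days/RecencyCapDays, 1)
	return base * recency
}

// demoWeight computes the weight for a question last seen daysAgo days ago;
// seen == 0 stands for "no stats row".
func demoWeight(seen, correct, daysAgo int) float64 {
	now := time.Date(2026, 5, 11, 12, 0, 0, 0, time.UTC)
	if seen == 0 {
		return ComputeWeight(nil, now)
	}
	last := now.Add(-time.Duration(daysAgo) * 24 * time.Hour)
	return ComputeWeight(&UserQuestionStat{TimesSeen: seen, TimesCorrect: correct, LastSeenAt: last}, now)
}

func main() {
	fmt.Println("unseen:               ", demoWeight(0, 0, 0))
	fmt.Println("mastered, seen today: ", demoWeight(10, 10, 0))
	fmt.Println("half wrong, 30d stale:", demoWeight(4, 2, 30))
}
```

Passing now in as a parameter, as the signature in "Where it plugs in" already does, keeps the function deterministic and easy to table-test.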

Weighted sample without replacement (A-Res)

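package sampling

import (
    "math"
    "math/rand"
    "sort"
)

// Candidate pairs a question ID with its precomputed sampling weight.
type Candidate struct {
    ID     string
    Weight float64
}

// SelectWeighted picks n distinct items from candidates using their weights.
// Equivalent to repeated weighted draws without replacement, in one pass.
// Weights must be > 0; the floor in the weight formula guarantees that.
// Time: O(m log m) where m = len(candidates).
func SelectWeighted(candidates []Candidate, n int, rng *rand.Rand) []Candidate {
    if n <= 0 {
        return nil
    }
    if n >= len(candidates) {
        // Return a copy so a later shuffle cannot reorder the caller's slice.
        return append([]Candidate(nil), candidates...)
    }
    type keyed struct {
        c   Candidate
        key float64
    }
    keys := make([]keyed, len(candidates))
    for i, c := range candidates {
        u := rng.Float64()
        if u == 0 {
            u = 1e-12 // keep the key strictly positive
        }
        // Efraimidis–Spirakis A-Res key: u^(1/weight)
        keys[i] = keyed{c, math.Pow(u, 1.0/c.Weight)}
    }
    // The n largest keys form the weighted sample.
    sort.Slice(keys, func(i, j int) bool { return keys[i].key > keys[j].key })
    out := make([]Candidate, n)
    for i := 0; i < n; i++ {
        out[i] = keys[i].c
    }
    return out
}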

Testing the sampler

Write a deterministic test in internal/sampling/sampling_test.go that:

Builds a fake pool of 100 questions with hand-crafted stats (some mastered, some weak, some unseen).
Runs SelectWeighted 10,000 times with a seeded RNG.
Asserts that high-error-rate questions are sampled at >3× the rate of mastered ones, and that mastered ones are still sampled at least N times (proving the floor works).
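
The statistical check described above can be prototyped as a small standalone program before it is moved into sampling_test.go. This sketch repeats a compact copy of SelectWeighted so it is self-contained; the pool composition (50 mastered at weight 0.15, 50 weak at weight 0.9), the draw size of 10, and the countPicks helper are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

type Candidate struct {
	ID     string
	Weight float64
}

// SelectWeighted is a compact copy of the A-Res sampler from this appendix.
func SelectWeighted(candidates []Candidate, n int, rng *rand.Rand) []Candidate {
	if n <= 0 {
		return nil
	}
	if n >= len(candidates) {
		return append([]Candidate(nil), candidates...)
	}
	type keyed struct {
		c   Candidate
		key float64
	}
	keys := make([]keyed, len(candidates))
	for i, c := range candidates {
		u := rng.Float64()
		if u == 0 {
			u = 1e-12
		}
		keys[i] = keyed{c, math.Pow(u, 1.0/c.Weight)}
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i].key > keys[j].key })
	out := make([]Candidate, n)
	for i := range out {
		out[i] = keys[i].c
	}
	return out
}

// countPicks draws 10 questions per round from a pool of 50 mastered
// (weight 0.15) and 50 weak (weight 0.9) questions and tallies each group.
func countPicks(rounds int, seed int64) (weak, mastered int) {
	pool := make([]Candidate, 0, 100)
	for i := 0; i < 50; i++ {
		pool = append(pool, Candidate{ID: fmt.Sprintf("mastered-%d", i), Weight: 0.15})
		pool = append(pool, Candidate{ID: fmt.Sprintf("weak-%d", i), Weight: 0.9})
	}
	rng := rand.New(rand.NewSource(seed))
	for r := 0; r < rounds; r++ {
		for _, c := range SelectWeighted(pool, 10, rng) {
			if c.Weight > 0.5 {
				weak++
			} else {
				mastered++
			}
		}
	}
	return weak, mastered
}

func main() {
	weak, mastered := countPicks(10000, 42)
	fmt.Printf("weak picks: %d, mastered picks: %d, ratio: %.2f\n",
		weak, mastered, float64(weak)/float64(mastered))
}
```

With equal group sizes the pick ratio approximates the per-question sampling ratio; in the real test, wrap the same assertions in a func TestSelectWeighted(t *testing.T).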

Where it plugs in

internal/sampling/weight.go — ComputeWeight(stat *UserQuestionStat, now time.Time) float64. Takes nil for unseen.
internal/sampling/select.go — SelectWeighted above.
internal/handlers/test.go POST /test/new —
1. candidates := repo.ListQuestions(filter).
2. stats := repo.GetStatsForUser(userID, candidateIDs) (map keyed by question_id).
3. For each candidate c: c.Weight = sampling.ComputeWeight(stats[c.ID], time.Now()), or 1.0 if mode is "uniform".
4. picked := sampling.SelectWeighted(candidates, n, rng).
5. Shuffle picked, persist as the test's question_ids.

Conventions for the build

Errors bubble up; handlers translate to HTTP via a small httpErr helper. Never panic in a handler.
All DB access goes through internal/db/repo.go — no inline SQL in handlers.
Templates use a single layout.html with {{block "content" .}}{{end}} (the block action requires a pipeline argument). Pages define content only.
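CSRF: scs is only a session manager and has no CSRF protection of its own. Add a dedicated middleware (for example github.com/justinas/nosurf) on all state-changing routes, and keep the session cookie at SameSite=Lax or stricter.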
File upload limit: 20MB, enforced via http.MaxBytesReader.
All user-supplied text rendered with html/template's default escaping — never use template.HTML on imported content.
Log: one structured line per request with method, path, status, duration, user.
Tests: write table-driven tests for parse/, llm/ (mock the OpenAI client behind an interface), and the question-dedup logic. Handler tests with httptest for the auth flow and the test-taking flow.

Open questions to confirm before Phase 0

Two separate accounts or one shared? (Plan above assumes separate.)
Should the LLM auto-suggest tags during import, or are tags manual-only? (Plan above is manual; auto-tagging is a small addition to the prompt + schema.)
Hard delete or soft delete for questions? (Plan above is hard delete.)
Comfortable with the weighting defaults in Appendix A (floor 0.15, recency cap at 30 days, unseen base 0.5)? These are tuneable later but it's worth a sanity check up front: the floor is the minimum weight a mastered question can have, so if you want mastered questions to appear more rarely, lower the floor; more often, raise it.
