How the Chatbot Works

The chatbot is an interactive resume. It lets people ask questions about Tim's work and ties each answer back to evidence.

When you ask a question, it searches a small library of approved resume documents. Then it writes an answer using those documents and adds source buttons so you can check the proof yourself.

The "Who are you?" choice helps the chatbot decide what kind of answer is most useful. An investor may want proof and risk. A recruiter may want skills and work history. Either way, it uses the same source library.

The "Answer depth" choice controls which local model path answers the question. Fast is the recommended mode. Deep uses a small reasoning-capable model, takes 5-10x longer, and is still experimental. Recruiter role-fit checks use a separate job-fit prompt and model route.

Backend stack

Fast model family: IBM Granite 4.1 3B
Fast model: granite-4.1-3b-tim-resume:latest
Deep model: jackrong-qwen35-fixed:latest
Role-fit Fast model: hf.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M
Embedding model: qwen3-embedding:0.6b

Current working architecture: the static site sends questions to a private evidence API. That API retrieves relevant source passages, asks the selected local model to answer with citations, and returns the answer plus source metadata to the browser.

Retrieval uses a local SQLite vector index built from a curated evidence corpus documenting Tim's work experience and completed projects. Embeddings are generated with qwen3-embedding:0.6b using 256-dimensional normalized vectors. For normal resume questions, model_profile selects either granite-4.1-3b-tim-resume:latest for the Fast path or jackrong-qwen35-fixed:latest for the experimental Deep path. The Developer/Builder model selector exposes a curated comparison catalog, with the Tim Resume Granite fine-tune as the default Granite option. Recruiter role-fit checks use the role-fit prompt and currently expose the Fast route through hf.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M; a Jackrong Deep role-fit route exists backend-side but remains hidden in the public UI while testing.
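The retrieval and routing steps described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual code: the `passages` table schema, the float32 blob encoding, the brute-force scan, and the `"fast"` / `"deep"` profile values are all assumptions. The embedding call itself is left out, so the sketch takes vectors as plain lists.

```python
import math
import sqlite3
import struct

DIM = 256  # the page states 256-dimensional normalized vectors

# Assumed mapping from model_profile values to the model names listed above.
MODEL_BY_PROFILE = {
    "fast": "granite-4.1-3b-tim-resume:latest",
    "deep": "jackrong-qwen35-fixed:latest",
}


def select_model(model_profile):
    """Pick a generation model; unknown profiles fall back to Fast (an assumption)."""
    return MODEL_BY_PROFILE.get(model_profile, MODEL_BY_PROFILE["fast"])


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def pack(vec):
    """Encode a vector as a float32 blob for storage in SQLite."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Decode a float32 blob back into a list of floats."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def build_index(conn, passages):
    """Store (title, vector) pairs in a hypothetical passages table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS passages "
        "(id INTEGER PRIMARY KEY, title TEXT, embedding BLOB)"
    )
    conn.executemany(
        "INSERT INTO passages (title, embedding) VALUES (?, ?)",
        [(title, pack(normalize(vec))) for title, vec in passages],
    )


def top_k(conn, query_vec, k=3):
    """Return the k best (score, title) pairs by cosine similarity."""
    query = normalize(query_vec)
    scored = []
    for title, blob in conn.execute("SELECT title, embedding FROM passages"):
        score = sum(a * b for a, b in zip(query, unpack(blob)))
        scored.append((score, title))
    scored.sort(reverse=True)
    return scored[:k]
```

Because the stored vectors are already normalized, ranking needs only a dot product per passage. A linear scan like this is plausible for a small curated corpus, though the real index may work differently.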

The Tim Resume fine-tune is aimed at answer semantics, citation behavior, and formatting discipline. The facts live in the RAG corpus, so new source docs can be ingested without retraining the model for every content update.

The Deep (thinking) profile is designed for a larger synthesis pass: more retrieved candidates, expanded source packing, and a larger context budget when the local machine can handle it. In Deep mode, the site can stream the model's raw thinking output in a capped trace box with a disclaimer; the cited answer remains the actual output.

The backend returns an answer plus source objects, including full cleaned source text when available. The frontend replaces citation IDs with clickable chips and formats the source text in a readable popup.
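The citation-to-chip step could look something like the sketch below. The real frontend is JavaScript; Python is used here only to illustrate the transformation. The `[S1]`-style citation format, the `id` and `title` fields on each source object, and the `source-chip` class name are all hypothetical.

```python
import html
import re

# Assumed citation ID format: [S1], [S2], ...
CITATION_PATTERN = re.compile(r"\[(S\d+)\]")


def render_chips(answer, sources):
    """Replace known citation IDs with button markup; escape everything else."""
    titles = {source["id"]: source["title"] for source in sources}
    parts = []
    cursor = 0
    for match in CITATION_PATTERN.finditer(answer):
        # Escape the plain text between the previous citation and this one.
        parts.append(html.escape(answer[cursor:match.start()]))
        source_id = match.group(1)
        if source_id in titles:
            label = html.escape(titles[source_id])
            parts.append(
                f'<button class="source-chip" data-source-id="{source_id}">'
                f"{label}</button>"
            )
        else:
            # Unknown IDs stay as literal text rather than becoming dead chips.
            parts.append(html.escape(match.group(0)))
        cursor = match.end()
    parts.append(html.escape(answer[cursor:]))
    return "".join(parts)
```

Escaping the text around each citation keeps model output from being treated as markup, which matters when an answer is inserted into the page as HTML.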

Current stack

Static frontend: Plain HTML, CSS, and a small JavaScript client
API service: Private Flask and Waitress evidence API
Fast generation model: granite-4.1-3b-tim-resume:latest
Deep generation model: jackrong-qwen35-fixed:latest
Embedding model: qwen3-embedding:0.6b
Role-fit Fast model: hf.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M
Index store: Local SQLite vector index
Retrieval profiles: Fast keeps a compact source pack. Deep (thinking) can request more candidates, grouped source packing, and a larger context window.
Runtime limits: 5 active sessions, 15-second cooldown, 1 active generation, queue depth 5. Normal Fast and Deep routes use an 8192-token context window, with output budgets of 768 tokens for Fast answers and 1024 tokens for Deep answers. Role-fit checks use a 16384-token context window and a 3072-token output budget.
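To make the runtime limits concrete, here is a minimal admission-control sketch using the numbers listed above. How the real API combines these limits is not stated, so the per-session cooldown, the absence of session expiry, and the `Gatekeeper` class itself are assumptions made for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Limits:
    max_sessions: int = 5
    cooldown_seconds: float = 15.0
    max_active_generations: int = 1
    max_queue_depth: int = 5


class Gatekeeper:
    """Hypothetical admission check; not the site's actual implementation."""

    def __init__(self, limits=None, clock=time.monotonic):
        self.limits = limits or Limits()
        self.clock = clock
        self.last_request = {}  # session_id -> time of last accepted request
        self.active = 0
        self.queued = 0

    def admit(self, session_id):
        now = self.clock()
        is_new = session_id not in self.last_request
        if is_new and len(self.last_request) >= self.limits.max_sessions:
            return "rejected: too many sessions"
        last = self.last_request.get(session_id)
        if last is not None and now - last < self.limits.cooldown_seconds:
            return "rejected: cooldown"
        if self.active < self.limits.max_active_generations:
            self.active += 1
            self.last_request[session_id] = now
            return "running"
        if self.queued < self.limits.max_queue_depth:
            self.queued += 1
            self.last_request[session_id] = now
            return "queued"
        return "rejected: queue full"

    def finish(self):
        """Call when a generation completes; a queued request takes the slot."""
        if self.queued > 0:
            self.queued -= 1
        else:
            self.active = max(0, self.active - 1)
```

Passing the clock in as a parameter lets the cooldown be exercised in tests without sleeping for real.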

Current eval numbers

The 100-question corpus coverage eval checks whether the model can answer across the resume evidence set, cite the right sources, respect missing-evidence boundaries, and follow the role/profile contract. It reports both a strict pass count and a weighted partial score out of 100. The partial score weights source/citation behavior most heavily: cited source titles, citation contract, required mentions, missing-evidence behavior, retrieved source titles, forbidden terms, answer/source presence, role contract, and model-profile contract.

The 25-question deep synthesis eval is a harder answer-quality check. It uses multi-source questions with frozen evidence packs and grades semantic answer quality rather than retrieval quality. The rubric is faithfulness 30, completeness 25, synthesis 20, citation quality 15, and audience fit 10.
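The rubric weights above sum to 100, so a per-question score can be sketched as a weighted sum. The page does not say how each criterion is graded or how the result maps to the 10-point figures in the table, so the 0.0 to 1.0 grade scale and the `rubric_score` function are assumptions.

```python
# Weights as stated in the rubric; key names are illustrative.
RUBRIC_WEIGHTS = {
    "faithfulness": 30,
    "completeness": 25,
    "synthesis": 20,
    "citation_quality": 15,
    "audience_fit": 10,
}


def rubric_score(grades):
    """Combine per-criterion grades (assumed 0.0 to 1.0) into a score out of 100."""
    missing = set(RUBRIC_WEIGHTS) - set(grades)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in RUBRIC_WEIGHTS:
        if not 0.0 <= grades[name] <= 1.0:
            raise ValueError(f"{name} grade must be between 0.0 and 1.0")
    return sum(RUBRIC_WEIGHTS[name] * grades[name] for name in RUBRIC_WEIGHTS)
```

Under these assumptions, an answer that is fully faithful and well cited but only half-synthesizes its sources loses at most 10 of 100 points, which reflects how heavily the rubric favors faithfulness and completeness.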

The latest current-corpus Granite runs use corpus local-08ed82e14185. Jackrong Qwen with reasoning has not completed the current 100-question corpus suite; the numbers below use the latest saved reasoning-enabled Jackrong runs and mark the older corpus snapshots explicitly.

Model	100q corpus coverage	Avg / max latency	Notes
`hf.co/ibm-granite/granite-4.1-3b-GGUF:Q4_K_M`	`60/100` strict, `92.199/100` partial	`16.670s` / `39.344s`	Original IBM Granite 4.1 3B base model, 0 citation-format failures.
`granite-4.1-3b-tim-resume:latest`	`67/100` strict, `94.324/100` partial	`6.282s` / `14.390s`	Selected Tim Resume fine-tune, fresh May 28 rerun, 0 citation-format failures.
`jackrong-qwen35-fixed:latest` with thinking	No full current 100q run recorded; latest strict coverage-style run was `2/25` strict, `20.600/25` partial	`51.001s` / `94.500s`	Older `local-61da9f7ebdc7` 25-question strict run, 0 citation-format failures.

Model	25q synthesis quality	Avg / max latency	Notes
`hf.co/ibm-granite/granite-4.1-3b-GGUF:Q4_K_M`	`8.1/10`; 20 strong, 5 passable, 0 problematic	`9.287s` / `15.704s`	Original IBM Granite 4.1 3B base model on `local-08ed82e14185`, 0 errors and 0 citation failures.
`granite-4.1-3b-tim-resume:latest`	`8.8/10`; 23 strong, 2 passable, 0 problematic	`4.390s` / `6.516s`	Selected Tim Resume fine-tune on `local-08ed82e14185`, 0 errors and 0 citation failures.
`jackrong-qwen35-fixed:latest` with thinking	`9.00/10`; 21 pass, 4 borderline, 0 fail	`74.608s` / `218.625s`	Older six-model synthesis batch on `local-77763852aa26`, 0 errors and 0 citation failures.

Stats for Nerds