How the chatbot works
The chatbot is an interactive resume. It helps people ask questions about Tim's work and then points back to evidence.
When you ask a question, it searches a small library of approved resume documents. Then it writes an answer using those documents and adds source buttons so you can check the proof yourself.
The "Who are you?" choice helps the chatbot decide what kind of answer is most useful. An investor may want proof and risk. A recruiter may want skills and work history. It still uses the same source library.
The "Answer depth" choice controls which local model path answers the question. Fast is the recommended mode. Deep uses a small reasoning-capable model path, takes 5-10x longer, and is still experimental. Recruiter role-fit checks use a separate job-fit prompt and model route.
Backend stack
- Fast model family: IBM Granite 4.1 3B
- Fast model: granite-4.1-3b-tim-resume:latest
- Deep model: jackrong-qwen35-fixed:latest
- Role-fit Fast model: hf.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M
- Embedding model: qwen3-embedding:0.6b
Current working architecture: the static site sends questions to a private evidence API. That API retrieves relevant source passages, asks the selected local model to answer with citations, and returns the answer plus source metadata to the browser.
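The request flow described above can be sketched in a few lines of Python. This is only an illustration, not the site's actual code: the `Passage` type, the `answer_question` function, the profile keys `"fast"` and `"deep"`, the prompt wording, and the `retrieve` and `generate` callables are all assumptions. Only the two model tags come from this page.

```python
from dataclasses import dataclass
from typing import Callable

# Model tags are taken from this page; the profile keys are assumed names.
MODEL_BY_PROFILE = {
    "fast": "granite-4.1-3b-tim-resume:latest",
    "deep": "jackrong-qwen35-fixed:latest",
}


@dataclass
class Passage:
    """Hypothetical shape of one retrieved source passage."""
    source_id: str
    title: str
    text: str


def answer_question(
    question: str,
    model_profile: str,
    retrieve: Callable[[str], list],
    generate: Callable[[str, str], str],
) -> dict:
    """Retrieve passages, ask the selected model, return answer plus sources."""
    if model_profile not in MODEL_BY_PROFILE:
        raise ValueError(f"unknown model_profile: {model_profile!r}")
    model = MODEL_BY_PROFILE[model_profile]

    passages = retrieve(question)
    # Pack each passage under its ID so the model can cite it.
    context = "\n\n".join(
        f"[{p.source_id}] {p.title}\n{p.text}" for p in passages
    )
    prompt = (
        "Answer using only the sources below and cite them by ID.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    answer = generate(model, prompt)

    # The browser receives the answer and the source metadata together.
    return {
        "answer": answer,
        "model": model,
        "sources": [
            {"id": p.source_id, "title": p.title, "text": p.text}
            for p in passages
        ],
    }
```

Passing `retrieve` and `generate` in as callables keeps the sketch independent of any particular index or model runtime, so it can be exercised with stubs.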
Retrieval uses a local SQLite vector index built from a curated evidence corpus documenting Tim's work experience and completed projects. Embeddings are generated with qwen3-embedding:0.6b using 256-dimensional normalized vectors. For normal resume questions, model_profile selects either granite-4.1-3b-tim-resume:latest for the Fast path or jackrong-qwen35-fixed:latest for the experimental Deep path. The Developer/Builder model selector exposes a curated comparison catalog, with the Tim Resume Granite fine-tune as the default Granite option. Recruiter role-fit checks use the role-fit prompt and currently expose the Fast route through hf.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M; a Jackrong Deep role-fit route exists backend-side but remains hidden in the public UI while testing.
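To make the retrieval step concrete, here is a minimal sketch of a SQLite-backed vector lookup using only the Python standard library. It assumes vectors are stored as float32 BLOBs and searched by brute force. The table name `chunks`, the column names, and every function name are hypothetical, and the real index may be laid out differently. The one detail taken from this page is the 256-dimensional normalized vector: once vectors are unit length, cosine similarity reduces to a dot product.

```python
import math
import sqlite3
import struct

DIM = 256  # vector size stated on this page


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def pack(vec):
    """Serialize a DIM-length vector as float32 bytes (assumed storage format)."""
    return struct.pack(f"{DIM}f", *vec)


def unpack(blob):
    return struct.unpack(f"{DIM}f", blob)


def build_index(conn, rows):
    """rows: iterable of (title, vector). Table layout is hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(id INTEGER PRIMARY KEY, title TEXT, vec BLOB)"
    )
    conn.executemany(
        "INSERT INTO chunks (title, vec) VALUES (?, ?)",
        [(title, pack(normalize(vec))) for title, vec in rows],
    )
    conn.commit()


def search(conn, query_vec, k=3):
    """Return the k best (title, score) pairs by dot product."""
    q = normalize(query_vec)
    scored = []
    for title, blob in conn.execute("SELECT title, vec FROM chunks"):
        score = sum(a * b for a, b in zip(q, unpack(blob)))
        scored.append((score, title))
    scored.sort(reverse=True)
    return [(title, score) for score, title in scored[:k]]
```

A brute-force scan like this is usually adequate for a small curated corpus; a larger index would likely want an approximate nearest-neighbor structure instead.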
The Tim Resume fine-tune is aimed at answer semantics, citation behavior, and formatting discipline. The facts live in the RAG corpus, so new source docs can be ingested without retraining the model for every content update.
The thinking profile is designed for a larger synthesis pass: more retrieved candidates, expanded source packing, and a larger context budget when the local machine can handle it. In Deep mode, the site can stream the model's raw thinking output in a capped trace box with a disclaimer; the cited answer remains the actual output.
The backend returns an answer plus source objects, including full cleaned source text when available. The frontend replaces citation IDs with clickable chips and formats the source text in a readable popup.
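The citation-chip step can be illustrated with a short sketch. The site does this in its JavaScript client; the version below is written in Python only to show the idea. The `[S1]`-style marker format, the `source-chip` class, the `data-source-id` attribute, and the `render_citations` function are all assumptions, since this page does not specify the citation ID format or the markup.

```python
import html
import re

# Assumed marker format: [S1], [S2], ... The real citation IDs may differ.
CITATION = re.compile(r"\[(S\d+)\]")


def render_citations(answer, sources):
    """Replace known citation markers with button markup.

    sources: list of dicts with "id" and "title" keys (hypothetical shape).
    Markers with no matching source are left as plain text.
    """
    by_id = {s["id"]: s for s in sources}

    def chip(match):
        source = by_id.get(match.group(1))
        if source is None:
            return match.group(0)
        title = html.escape(source["title"])
        return (
            f'<button class="source-chip" '
            f'data-source-id="{source["id"]}">{title}</button>'
        )

    # Escape the model output first so it cannot inject markup;
    # square brackets survive escaping, so the markers still match.
    return CITATION.sub(chip, html.escape(answer))
```

Leaving unmatched markers untouched makes a missing source visible to the reader instead of silently dropping the citation.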
Current stack
- Static frontend: Plain HTML, CSS, and a small JavaScript client
- API service: Private Flask and Waitress evidence API
- Fast generation model: granite-4.1-3b-tim-resume:latest
- Deep generation model: jackrong-qwen35-fixed:latest
- Embedding model: qwen3-embedding:0.6b
- Role-fit Fast model: hf.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M
- Index store: Local SQLite vector index
- Retrieval profiles: Fast keeps a compact source pack. Thinking can request more candidates, grouped source packing, and a larger context window.
- Runtime limits: 5 active sessions, 15 second cooldown, 1 active generation, queue depth 5, 8192-token context window for normal Fast and Deep routes, 768 output-token budget for normal Fast answers, 1024 output-token budget for normal Deep answers, 16384-token context window for role-fit checks, and 3072 output-token budget for role-fit checks
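The runtime limits listed above could be captured as a small config, sketched below. The numbers come from this page; the names (`RouteLimits`, `ROUTE_LIMITS`, `admit`, `prompt_budget`) and the admission logic are assumptions. In particular, this sketch treats the cooldown as a minimum gap between requests from one session, and assumes generated tokens count against the context window, which may not match how the backend actually enforces them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteLimits:
    context_tokens: int
    output_tokens: int


# Values from this page; the route keys are assumed names.
ROUTE_LIMITS = {
    "fast": RouteLimits(context_tokens=8192, output_tokens=768),
    "deep": RouteLimits(context_tokens=8192, output_tokens=1024),
    "role_fit": RouteLimits(context_tokens=16384, output_tokens=3072),
}

MAX_ACTIVE_SESSIONS = 5
COOLDOWN_SECONDS = 15
MAX_ACTIVE_GENERATIONS = 1
MAX_QUEUE_DEPTH = 5


def admit(active_generations, queued, seconds_since_last_request):
    """Decide what to do with a new request (assumed policy)."""
    if seconds_since_last_request < COOLDOWN_SECONDS:
        return "cooldown"
    if active_generations < MAX_ACTIVE_GENERATIONS:
        return "run"
    if queued < MAX_QUEUE_DEPTH:
        return "queue"
    return "reject"


def prompt_budget(route):
    """Tokens left for prompt and sources if output shares the context window."""
    limits = ROUTE_LIMITS[route]
    return limits.context_tokens - limits.output_tokens
```

Under these assumptions, `prompt_budget("fast")` leaves 7424 tokens for the question and packed sources, and `prompt_budget("role_fit")` leaves 13312.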
Current eval numbers
The 100-question corpus coverage eval checks whether the model can answer across the resume evidence set, cite the right sources, respect missing-evidence boundaries, and follow the role/profile contract. It reports both a strict pass count and a weighted partial score out of 100. The partial score weights source/citation behavior most heavily: cited source titles, citation contract, required mentions, missing-evidence behavior, retrieved source titles, forbidden terms, answer/source presence, role contract, and model-profile contract.
The 25-question deep synthesis eval is a harder answer-quality check. It uses multi-source questions with frozen evidence packs and grades semantic answer quality rather than retrieval quality. The rubric is faithfulness 30, completeness 25, synthesis 20, citation quality 15, and audience fit 10.
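As a worked illustration of the synthesis rubric, the sketch below combines per-criterion ratings using the weights listed above. The weights are from this page. Rating each criterion as a fraction between 0 and 1 and dividing the 100-point total by 10 to get a score out of 10 are both assumptions about how the grader works, and `synthesis_score` is a hypothetical name.

```python
# Weights from this page; they sum to 100.
RUBRIC_WEIGHTS = {
    "faithfulness": 30,
    "completeness": 25,
    "synthesis": 20,
    "citation_quality": 15,
    "audience_fit": 10,
}


def synthesis_score(ratings):
    """Combine criterion ratings (each 0.0 to 1.0) into a score out of 10."""
    missing = set(RUBRIC_WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    for name in RUBRIC_WEIGHTS:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"{name} rating must be between 0 and 1")
    points = sum(RUBRIC_WEIGHTS[name] * ratings[name] for name in RUBRIC_WEIGHTS)
    return round(points / 10, 2)
```

For example, an answer rated perfect on everything except completeness (0.8) and synthesis (0.5) earns 30 + 20 + 10 + 15 + 10 = 85 points, or 8.5 out of 10 under this scaling.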
The latest Granite runs use the current corpus, local-08ed82e14185. Jackrong Qwen with reasoning has not completed the current 100-question corpus suite; the numbers below use the latest saved reasoning-enabled Jackrong runs and mark the older corpus snapshots explicitly.
| Model | 100q corpus coverage | Avg / max latency | Notes |
|---|---|---|---|
| hf.co/ibm-granite/granite-4.1-3b-GGUF:Q4_K_M | 60/100 strict, 92.199/100 partial | 16.670s / 39.344s | Original IBM Granite 4.1 3B base model, 0 citation-format failures. |
| granite-4.1-3b-tim-resume:latest | 67/100 strict, 94.324/100 partial | 6.282s / 14.390s | Selected Tim Resume fine-tune, fresh May 28 rerun, 0 citation-format failures. |
| jackrong-qwen35-fixed:latest with thinking | No full current 100q run recorded; latest strict coverage-style run was 2/25 strict, 20.600/25 partial | 51.001s / 94.500s | Older local-61da9f7ebdc7 25-question strict run, 0 citation-format failures. |

| Model | 25q synthesis quality | Avg / max latency | Notes |
|---|---|---|---|
| hf.co/ibm-granite/granite-4.1-3b-GGUF:Q4_K_M | 8.1/10; 20 strong, 5 passable, 0 problematic | 9.287s / 15.704s | Original IBM Granite 4.1 3B base model on local-08ed82e14185, 0 errors and 0 citation failures. |
| granite-4.1-3b-tim-resume:latest | 8.8/10; 23 strong, 2 passable, 0 problematic | 4.390s / 6.516s | Selected Tim Resume fine-tune on local-08ed82e14185, 0 errors and 0 citation failures. |
| jackrong-qwen35-fixed:latest with thinking | 9.00/10; 21 pass, 4 borderline, 0 fail | 74.608s / 218.625s | Older six-model synthesis batch on local-77763852aa26, 0 errors and 0 citation failures. |