Model Info

The interactive resume can route questions through different local model paths. The goal is not to chase generic leaderboard scores. The useful question is whether a model can answer from Tim's evidence corpus, cite sources cleanly, respect missing evidence, and do it quickly enough for a public site.

These numbers come from the latest completed local runs available as of May 28, 2026 HST. Rows that come from an older corpus snapshot are labeled as such, so read this page as an operational model-selection snapshot, not a final benchmark paper.

The current default Fast model is `granite-4.1-3b-tim-resume:latest`, a local Granite 4.1 3B fine-tune selected from the V4 backend-raw checkpoint sweep.

100-Question Corpus Coverage Eval

This eval checks broad resume coverage. It asks 100 role-aware questions across investor, recruiter, entrepreneur, builder, and friend perspectives. Most questions are source-grounded; a smaller set checks whether the model can say when evidence is missing instead of inventing facts.

The score has two parts. The pass count is strict: an answer passes only if it satisfies all of its required claims and boundaries. The partial score gives fractional credit for getting some of the required pieces right, so it is often the better signal when comparing closely matched local models.
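
The difference between the two numbers can be illustrated with a minimal sketch. It assumes each question carries a count of required items and that partial credit is the plain fraction of items met; the real grader's weighting is not described on this page. `QuestionResult`, `strict_pass`, `partial_credit`, and `summarize` are hypothetical names, and the demo inputs are made up.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """Outcome of grading one answer (hypothetical shape, not the real grader's)."""
    required_total: int  # required claims and boundaries for the question
    required_met: int    # how many of those the answer satisfied


def strict_pass(result: QuestionResult) -> bool:
    # Strict scoring: every required item must be satisfied.
    return result.required_met == result.required_total


def partial_credit(result: QuestionResult) -> float:
    # Assumed rule: credit is the fraction of required items met.
    # A question with no required items counts as full credit.
    if result.required_total == 0:
        return 1.0
    return result.required_met / result.required_total


def summarize(results: list[QuestionResult]) -> tuple[int, float]:
    """Return (strict pass count, summed partial credit) for a run."""
    passes = sum(strict_pass(r) for r in results)
    partial = sum(partial_credit(r) for r in results)
    return passes, partial


# Illustrative inputs only, not real eval data.
demo = [QuestionResult(4, 4), QuestionResult(4, 3), QuestionResult(2, 1)]
passes, partial = summarize(demo)
print(f"{passes}/{len(demo)}, {partial:.3f}/{len(demo)} partial")
# prints: 1/3, 2.250/3 partial
```

Under this assumed rule, two models with the same pass count can still differ in partial score whenever their failed answers miss different numbers of required items.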

Model	Profile	Result	Avg latency	Notes
`granite-4.1-3b-tim-resume:latest`	Fast	67/100, 94.324/100 partial	6.282s	Current default model on corpus `local-08ed82e14185`; fresh May 28 rerun, max latency `14.390s`, 0 citation failures.
`hf.co/ibm-granite/granite-4.1-3b-GGUF:Q4_K_M`	Fast	60/100, 92.199/100 partial	16.670s	Base Granite comparison on corpus `local-08ed82e14185`; max latency `39.344s`, 0 citation failures.
`hf.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M`	Fast	70/100, 93.759/100 partial	12.653s	Latest saved 100-question run is from corpus `local-77763852aa26`; 0 citation failures.
`qwen3.5:2b`	Fast	66/100, 93.743/100 partial	9.401s	Latest saved 100-question run is from corpus `local-77763852aa26`; 0 citation failures.
`granite4:tiny-h`	Fast	69/100, 94.209/100 partial	9.773s	Legacy tiny comparison model on corpus `local-77763852aa26`; 0 citation failures.
`jackrong-qwen35-fixed:latest`	Thinking	No full current 100-question run; latest strict coverage-style run was 2/25, 20.600/25 partial	51.001s	Older `local-61da9f7ebdc7` 25-question strict run; used for experimental Deep answers and Builder thinking comparison.
`qwen3.5:4b`	Thinking	74/100	15.120s	Older corpus `local-61da9f7ebdc7` comparison run; retained as historical Deep-model context.
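
As a rough way to read the table, the sketch below parses the Result and Avg latency cells and compares the fine-tune with base Granite, the two rows run on the same corpus (`local-08ed82e14185`). The `Row` type, `parse_row` helper, and regular expression are hypothetical; the cell values are copied from the table above.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    passes: int
    total: int
    partial: Optional[float]  # None when the row reports no partial score
    avg_latency_s: float


# Matches cells such as "67/100, 94.324/100 partial" or "74/100".
RESULT_RE = re.compile(r"(\d+)/(\d+)(?:, ([\d.]+)/\d+ partial)?")


def parse_row(model: str, result: str, latency: str) -> Row:
    match = RESULT_RE.fullmatch(result)
    if match is None:
        raise ValueError(f"unrecognized result cell: {result!r}")
    passes, total, partial = match.groups()
    return Row(
        model,
        int(passes),
        int(total),
        float(partial) if partial else None,
        float(latency.rstrip("s")),
    )


tuned = parse_row("granite-4.1-3b-tim-resume:latest",
                  "67/100, 94.324/100 partial", "6.282s")
base = parse_row("hf.co/ibm-granite/granite-4.1-3b-GGUF:Q4_K_M",
                 "60/100, 92.199/100 partial", "16.670s")

pass_gain = tuned.passes - base.passes
partial_gain = tuned.partial - base.partial
speedup = base.avg_latency_s / tuned.avg_latency_s
print(f"+{pass_gain} strict passes, +{partial_gain:.3f} partial, "
      f"{speedup:.2f}x faster on average")
```

Rows from older corpus snapshots are left out of the comparison on purpose, since their scores were measured against a different evidence set.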

25-Question Deep Synthesis Eval

This eval is harder. It asks 25 multi-source questions that need synthesis across several documents, not just one retrieved snippet. Each row has expected claims, required boundaries, forbidden claims, and a frozen evidence pack.
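
One way to picture such a row is the sketch below. It is only an assumed shape: `SynthesisCase`, its field names, and `check_answer` are hypothetical, the demo strings are placeholders rather than real corpus content, and plain substring matching stands in for the semantic judging the eval actually relies on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisCase:
    """Hypothetical shape of one deep-synthesis eval row."""
    question: str
    expected_claims: tuple[str, ...]
    required_boundaries: tuple[str, ...]
    forbidden_claims: tuple[str, ...]
    evidence_ids: tuple[str, ...]  # the frozen evidence pack for this question


def check_answer(case: SynthesisCase, answer: str, cited_ids: list[str]) -> dict:
    """Naive check: substring matching is a stand-in for a semantic judge."""
    text = answer.lower()
    return {
        "missing_claims": [c for c in case.expected_claims if c.lower() not in text],
        "missing_boundaries": [b for b in case.required_boundaries if b.lower() not in text],
        "forbidden_hits": [f for f in case.forbidden_claims if f.lower() in text],
        # Citations must point inside the frozen evidence pack.
        "bad_citations": sorted(set(cited_ids) - set(case.evidence_ids)),
    }


# Placeholder content for illustration, not taken from the real corpus.
case = SynthesisCase(
    question="Which projects show backend experience?",
    expected_claims=("project alpha", "project beta"),
    required_boundaries=("no evidence of",),
    forbidden_claims=("ten years",),
    evidence_ids=("doc-1", "doc-2"),
)
answer = ("Project Alpha shows backend work [doc-1]. "
          "There is no evidence of mobile work [doc-3].")
report = check_answer(case, answer, cited_ids=["doc-1", "doc-3"])
print(report)
```

Freezing the evidence pack per question keeps retrieval out of the comparison, so differences between models reflect synthesis quality rather than which snippets happened to be retrieved.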

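Newer rows use subagent semantic judging on a 10-point scale and bucket answers as strong, passable, or problematic. Older rows are normalized from earlier manual grades and keep their pass, borderline, and fail buckets. Rows without a semantic grade say so in the Result column. The table reports semantic quality, answer bucket split, latency, and citation hygiene.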
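
A Result cell of this kind can be assembled from per-answer judgments roughly as follows. This is a sketch under assumptions: the judge is taken to return a grade and a bucket label per answer, the score is taken to be a plain mean, and `summarize_grades` is a hypothetical helper. The demo grades are invented and do not correspond to any row in the table.

```python
from collections import Counter
from statistics import mean


def summarize_grades(
    graded: list[tuple[float, str]],
    buckets: tuple[str, ...] = ("strong", "passable", "problematic"),
) -> str:
    """Format (grade, bucket) pairs as 'mean/10; n bucket, ...'."""
    grades = [grade for grade, _ in graded]
    counts = Counter(bucket for _, bucket in graded)
    # Counter returns 0 for buckets that never occur, so empty buckets still print.
    split = ", ".join(f"{counts[name]} {name}" for name in buckets)
    return f"{mean(grades):.1f}/10; {split}"


# Invented per-answer judgments, for illustration only.
demo = [(9.0, "strong"), (9.0, "strong"), (8.0, "strong"), (6.0, "passable")]
summary = summarize_grades(demo)
print(summary)
# prints: 8.0/10; 3 strong, 1 passable, 0 problematic
```

Passing `buckets=("pass", "borderline", "fail")` would produce the older label set with the same logic.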

Model	Profile	Result	Avg latency	Notes
`granite-4.1-3b-tim-resume:latest`	Fast	8.8/10; 23 strong, 2 passable, 0 problematic	4.390s	Current default model on corpus `local-08ed82e14185`; 0 errors and 0 citation failures.
`hf.co/ibm-granite/granite-4.1-3b-GGUF:Q4_K_M`	Fast	8.1/10; 20 strong, 5 passable, 0 problematic	9.287s	Base Granite comparison on corpus `local-08ed82e14185`; 0 errors and 0 citation failures.
`jackrong-qwen35-fixed:latest`	Fast	9.11/10; 22 pass, 3 borderline, 0 fail	16.126s	Older six-model synthesis batch on corpus `local-77763852aa26`; high semantic grade but much slower than the default Fast model.
`jackrong-qwen35-fixed:latest`	Thinking	9.00/10; 21 pass, 4 borderline, 0 fail	74.608s	Experimental Deep route; useful for comparison, but too slow for normal public traffic.
`hf.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M`	Fast	8.60/10; 19 pass, 5 borderline, 1 fail	5.239s	Older six-model synthesis batch on corpus `local-77763852aa26`; 0 errors and 0 citation failures.
`granite4:tiny-h`	Fast	7.39/10; 15 pass, 5 borderline, 5 fail	10.503s	Legacy tiny comparison from corpus `local-77763852aa26`; malformed-output and privacy-wording risks were noted.
`qwen3.5:2b`	Thinking	Not in six-model manual grade; no-judge run clean	104.147s	Completed all 25 with 0 errors and 0 citation failures, but the thinking profile was far too slow for the Builder dropdown.
`qwen3.5:4b`	Thinking	1/25, 20.188/25 partial	21.300s	Older strict-scored run, not part of the six-model manual grade. The Builder catalog exposes Qwen comparison models with public thinking disabled.

Current Takeaway

The normal resume Fast route is currently pinned to `granite-4.1-3b-tim-resume:latest`. The Builder dropdown exists for comparison: the Tim Resume Granite fine-tune is the default, base Granite is a reference point, Nemotron and Qwen are small-model comparisons, and `jackrong-qwen35-fixed:latest` is the only public thinking-capable option. Deep reasoning remains experimental because it increases latency sharply: on the 25-question synthesis eval, the Thinking route averaged 74.608s against 4.390s for the default Fast model.
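
The routing described here could be expressed as a small catalog with a pinned default. The sketch is hypothetical throughout: `CatalogEntry`, `BUILDER_CATALOG`, and `resolve` are illustrative names, the site's real configuration is not shown on this page, and the choice of `qwen3.5:2b` as the Qwen entry is an assumption because the exposed Qwen tags are not listed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    model: str
    role: str
    thinking_capable: bool = False


DEFAULT_FAST = "granite-4.1-3b-tim-resume:latest"

# Hypothetical catalog; model names come from the tables above.
BUILDER_CATALOG = (
    CatalogEntry(DEFAULT_FAST, "default"),
    CatalogEntry("hf.co/ibm-granite/granite-4.1-3b-GGUF:Q4_K_M", "base reference"),
    CatalogEntry("hf.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M", "small-model comparison"),
    # Assumption: which Qwen tags the dropdown exposes is not stated.
    CatalogEntry("qwen3.5:2b", "small-model comparison"),
    CatalogEntry("jackrong-qwen35-fixed:latest", "thinking comparison", thinking_capable=True),
)

_BY_NAME = {entry.model: entry for entry in BUILDER_CATALOG}


def resolve(requested: Optional[str], want_thinking: bool = False) -> tuple[str, bool]:
    """Return (model, thinking_enabled), falling back to the pinned default."""
    entry = _BY_NAME.get(requested) if requested else None
    if entry is None:
        # Unknown or missing selection: use the pinned Fast model, no thinking.
        return DEFAULT_FAST, False
    # Thinking is enabled only for entries that support it publicly.
    return entry.model, entry.thinking_capable and want_thinking


print(resolve(None))
print(resolve("qwen3.5:2b", want_thinking=True))
print(resolve("jackrong-qwen35-fixed:latest", want_thinking=True))
```

Falling back to the pinned default for unknown names keeps normal traffic on the fastest evaluated route even if a comparison model is later removed from the catalog.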