Limitations and workarounds
This runs from Tim's own desktop, not from a cloud chatbot provider. That makes it more impressive from a "work with what you got" standpoint, but it also means uptime will be uneven: model tests, hardware maintenance, power issues, heat, network outages, and all of the realities of serving a local AI system from consumer hardware in rural Hawaii can temporarily take it offline.
Further, because his GPU has very little memory, the models it can run are limited in capability. But necessity is the mother of invention. Here's a list of some of the issues resulting from Tim's rather constrained environment, and what he did to work around them.
Poor instruction following
Problem: Most compatible models are simply too weak for reliable instruction following.
Solution: Fine-tuning. A properly fine-tuned tiny model can often match a much bigger model on a single type of task, at the cost of being worse at all others.
High hallucination rate
Problem: Despite running a RAG system, hallucination risk is still relatively high.
Solution: All surfaced citations are clickable, so users can quickly check whether a claim is supported by the actual documentation.
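As a rough illustration of the clickable-citation idea, here is a minimal Python sketch that turns retrieved sources into a plain HTML list of links. The function name render_citations and the (title, url) pair format are assumptions made for this example; the site's actual code is not shown on this page.

```python
from html import escape


def render_citations(citations):
    """Render (title, url) pairs as a plain HTML list of links.

    Hypothetical helper: the input shape is an assumption for this sketch.
    """
    items = []
    for title, url in citations:
        # Skip anything that is not an http(s) link, so a bad retrieval
        # record cannot inject a javascript: URL into the page.
        if not url.startswith(("http://", "https://")):
            continue
        # Escape both the URL and the title before placing them in markup.
        items.append(
            f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        )
    return "<ul>" + "".join(items) + "</ul>"
```

Escaping the title and URL matters here because the text comes from retrieved documents and model output, neither of which should be trusted as raw HTML.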
Low context limits
Problem: Even with tiny models, context length can easily cause memory to explode.
Solution: Every prompt is a one-off. Normal resume questions are kept tweet-sized in the browser, role-fit checks get a larger but still hard-capped input, and total context size is capped. This means you can't refer back to a previous question or answer, but since it keeps Tim's computer from crashing, it's a pretty reasonable trade-off.
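The capping scheme above could be sketched roughly like this in Python. Everything specific here is an assumption for illustration: the mode names, the character limits (280, 4000, 8000), and the build_prompt helper are hypothetical, and character counts stand in for whatever token counting the real backend uses.

```python
# Hypothetical limits; the real caps are not published on this page.
MAX_INPUT_CHARS = {"resume": 280, "role_fit": 4000}
MAX_CONTEXT_CHARS = 8000
SEPARATOR = "\n\n"


def build_prompt(mode, user_text, retrieved_chunks):
    """Build a single stateless prompt that never exceeds the total cap."""
    limit = MAX_INPUT_CHARS[mode]
    text = user_text.strip()
    if len(text) > limit:
        raise ValueError(f"{mode} input is limited to {limit} characters")

    # No chat history is carried over: the prompt contains only this
    # question plus as many retrieved chunks as fit in the remaining budget.
    budget = MAX_CONTEXT_CHARS - len(text)
    context = []
    for chunk in retrieved_chunks:
        cost = len(chunk) + len(SEPARATOR)
        if cost > budget:
            break  # stop at the first chunk that does not fit
        context.append(chunk)
        budget -= cost
    return SEPARATOR.join(context + [text])
```

Rejecting oversized input outright, instead of silently truncating it, keeps the behavior predictable for the visitor and guarantees an upper bound on memory use per request.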
Model overthinking
Problem: Most tiny reasoning models suffer from overthinking and never return a valid output.
Solution: After searching for and testing many open-source reasoning models, Tim found an existing fine-tune that is specifically designed to produce shorter reasoning traces. This allows it to actually output a valid answer. Usually.
Concurrent visitor traffic
Problem: The backend cannot handle multiple concurrent requests.
Solution: Prompts are rate-limited by IP to prevent spam, the prompt queue is capped at 5, active site sessions are capped at 5 at a time, and Deep-mode requests have a much longer cooldown between them.
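For the curious, a per-IP rate limit combined with a small bounded queue might look something like the Python sketch below. The Gatekeeper class, the 3-prompts-per-60-seconds window, and the status strings are hypothetical choices for this example; only the queue size of 5 comes from the description above. The session cap and the longer Deep-mode cooldown are left out to keep it short.

```python
import time
from collections import defaultdict, deque


class Gatekeeper:
    """Sliding-window rate limit per IP in front of a bounded prompt queue."""

    def __init__(self, max_queue=5, per_ip=3, window_s=60.0, clock=time.monotonic):
        self.max_queue = max_queue    # queue size of 5, as described above
        self.per_ip = per_ip          # assumed: prompts allowed per window
        self.window_s = window_s      # assumed: window length in seconds
        self.clock = clock            # injectable so the logic can be tested
        self.queue = deque()
        self.recent = defaultdict(deque)  # ip -> timestamps of accepted prompts

    def submit(self, ip, prompt):
        now = self.clock()
        stamps = self.recent[ip]
        # Forget accepted prompts that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_s:
            stamps.popleft()
        if len(stamps) >= self.per_ip:
            return "rate_limited"
        if len(self.queue) >= self.max_queue:
            return "queue_full"
        stamps.append(now)
        self.queue.append((ip, prompt))
        return "queued"

    def next_prompt(self):
        """Hand the oldest queued prompt to the single model worker."""
        return self.queue.popleft() if self.queue else None
```

Because the backend handles one request at a time, a single worker can pull from next_prompt() in a loop while everything else is turned away early with a clear status.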
Environmental problems
Problem: Tim lives in an actual jungle, and power/network outages are not uncommon.
Solution: The page you're reading right now. Since setting up a generator and paying for Starlink just to have reliable power and internet was outside the $0 budget, the next best thing was to create this subpage to persuade visitors that these limitations exemplify creative innovation under tight constraints.
Frontend design
Problem: Tim is not an elite frontend UX designer, but he still wanted to stand out from every other vibe-coded slopsite.
Solution: Plain HTML. The apparent lack of design is 100% intentional. Every element on the site has a purpose, and nothing was included that wasn't intentional. The contrast of having a custom chatbot from 2026 on a website that looks like it's from 2006 is much more visually interesting than purple gradients, glowing text, and rounded containers.