← back to the BuildWise brief
BuildWise · System diagrams

Facts in code,
judgment in the model.

● LIVE ON HUGGING FACE · DETERMINISTIC FEASIBILITY + GEOMETRY · LOCAL-EMBEDDING RAG

An AI slicer coach for 3D printing. You state your printer, your material, and what you're trying to make; BuildWise returns hardware-aware settings with an honest reason on every row. The spine is one rule: everything that can be made deterministic — specs, physical feasibility, geometry, retrieval — is computed in code and handed to the model as authoritative context. Claude is reserved for the one thing it's actually good at: reasoning over grounded facts and communicating them well. These five diagrams trace exactly where that line falls.

Blue — deterministic, computed in code
Amber — Claude · the judgment
Coral — refused in code (the gate says no)
01 / 05 — THE CORE LOOP

Many concerns funnel into one grounded prompt, and one model call.

What you provide on the left is turned into facts by deterministic code — printer and material specs, a feasibility verdict, the measured geometry of your part, and the engineering principles your goal depends on. All of it converges into a single assembled prompt — the narrow waist — which is handed to Claude for one low-temperature, cached call. The model reasons and writes the settings table; it never sources the facts itself.

You provide Deterministic — computed in code Printermake · model Materialfilament type STL · optionalthe actual part Intentstrong · fast · hot car Factstyped repositories · specs Feasibilityverdict · blockers / warnings Geometrybbox · volume · overhangs RetrievalRAG · local embeddings ONE PROMPT Assembled context facts · verdict geometry · principles level · verbosity Claude LOW TEMP · CACHED reasons · communicates Settings table value · why honest tradeoffs Blocker → refused in code

Everything left of Claude is deterministic — same inputs, same context, every time. There is exactly one model call, at low temperature, over a cached system prompt. If the feasibility step returns a blocker, the assembled prompt carries a refuse verdict — the model is never handed a physics question it could get wrong.

02 / 05 — THE FEASIBILITY GATE

Whether a material can print on a printer is physics, not opinion — so it's decided in code.

Can polycarbonate print on a stock Ender 3? The hotend caps at 240 °C, PC needs ~260 °C, and PC requires an enclosure the machine doesn't have. A pure-code check compares material requirements against printer limits and returns a verdict — blockers that make it impossible, warnings that make it tricky. The impossible is refused before the model is ever called; Claude only communicates the verdict and pivots.

Material · requirements PC (polycarbonate) print_temp_min 260 °C bed_temp_min 110 °C enclosure_need REQUIRED abrasive no Printer · limits Ender 3 (stock) hotend.max 240 °C bed.max 100 °C chamber.enclosed false nozzle brass check_feasibility() PURE CODE · ZERO LLM compare reqs against limits Blockers — physics says no 260 °C needed > 240 °C hotend 110 °C bed > 100 °C max enclosure required · frame open Warnings — printable, caveats abrasive → fit hardened nozzle passive chamber → big parts warp recommended-enclosure → expect it compatible = false → REFUSED before any model call Claude communicates the verdict, pivots to a printable material — never guesses the physics

This is the single most important reliability decision in the system. The model is good at judgment and terrible at quietly inventing a hotend temperature; so the physics is taken away from it entirely. A blocker ends the request in code with the exact reason; Claude only ever sees a verdict it has to explain, never a number it has to guess.

03 / 05 — TWO KINDS OF KNOWLEDGE

Facts and reasoning are different things, so they live in different places.

A printer's max hotend temperature is a fact — it belongs in sourced, diffable JSON, read through typed repositories, never invented. Why walls beat infill for strength is reasoning — it belongs in a curated Markdown corpus, embedded locally and retrieved by relevance. Both are grounded; neither is left to the model to recall. An unlisted printer still gets sane advice by mapping to the closest of four archetypes.

Facts — diffable, sourced, never invented data/*.json 11 printers · 14 materials sourced specs Typed repositories Pydantic models · validated authoritative at read time Archetype fallback unlisted printer → closest of 4 enclosed-heated · open-allmetal · … Reasoning — vetted principles, not forum myth corpus/*.md curated principles hand-written Local embeddings sentence-transformers · MiniLM runs on the host · no 2nd key Chroma vector store baked at build retrieved by relevance TWO STORES, ONE PROMPT Assembled context facts you can diff + principles you can read

The split is the point. Facts change by editing a JSON row and reading the diff; reasoning changes by adding a Markdown doc to the corpus. Neither path asks the model to remember something it might get wrong — it's handed both, already grounded.

04 / 05 — HYBRID RETRIEVAL

Plain semantic search let the material drown the intent. So retrieval pins the goal first.

A single embedding of "strength + PLA + Ender 3" once surfaced material-selection docs instead of the canonical walls-before-infill principle — the material words drowned the intent. The fix: for each stated optimization axis, pin its canonical principle with a metadata-filtered query, then fill the remaining slots with use-case reality-checks. The principle the goal depends on is always present, never left to chance — and an eval keeps it that way.

What we don't do — pure semantic embed("strength + PLA + Ender 3") material words dominate the vector → surfaces material docs, not walls-before-infill 1 · pin per axis for each goal — strength, speed, surface, accuracy, material: metadata-filtered query · top-2 → the canonical principle is guaranteed 2 · fill with reality-checks remaining slots: use-case + material, semantic, capped at k "lives in a hot car" → PLA-fails-in-heat surfaces alongside GROUNDED PRINCIPLE SET the goal's principle is always present pinned principles + use-case checks, de-duplicated, capped Eval harness — the guard "phone case that needs to flex" → flexible-parts-use-TPU 8 evals · run on every change

The eval is the part that matters most: retrieval relevance is held to a test, not to vibes. When it ran during development it caught a missing corpus doc — a goal with no principle to surface — and that class of regression has been fenced off ever since.

05 / 05 — THE GROUNDED PROMPT

A static brain that caches, a per-request body that grounds — inspectable for free.

The coach's behavior lives in one static system prompt — the engineering rules, the persona, the experience-level depth control, the output structure. Keeping it static means it caches on the API. Everything request-specific — the facts, the verdict, the geometry, the retrieved principles — rides in the user message. And dry-run assembles the entire thing without spending a token, so the prompt can be inspected and tuned for nothing.

Static system prompt engineering rules · persona depth control · output structure anti-repetition voice rule PROMPT CACHE ✓ Per-request user message facts + feasibility verdict geometry report + principles experience level + verbosity changes every request Claude LOW TEMPERATURE reasons over grounded context Settings table value · why · tradeoffs Markdown → rendered in the dark UI Dry-run — assemble, don't call the full prompt, returned for inspection $0 · caught a retrieval bug before any token spent

It's built like production AI, not a demo: a cached static prompt, low temperature for repeatability, and a dry-run mode that lets the entire assembled prompt be read and tuned without a single paid call. The model is the last step, and the cheapest one to get right.