Use the LLM Once. Then Never Again.

When you build with Claude, the default move is to put the model in the request path. User clicks a button, server calls the API, response goes back. It feels modern. It also makes every click cost money, take seconds, and produce a subtly different answer each time.

There’s a quieter pattern that works better for a lot of the AI work we’ve shipped: call Claude once during development, bake the answer into a JSON file checked into the repo, and at runtime do nothing but a dictionary lookup. The model isn’t in the path. It’s in the build.

The problem (a worked example)

Compliance assessments require sampling. HITRUST particularly, but SOC 2 and ISO 27001 follow the same shape. An auditor can’t test every user account, every change ticket, every backup, so they test a representative subset and infer the rest. Each control requirement specifies what to sample, and the assessor has to draw items from a defined population: applications, accounts, tickets, devices, and so on.

So there’s a mapping problem at the heart of every assessment. For each sampling requirement in the framework, which population do I pull from?

In HITRUST CSF v11.7.0, there are approximately 88 distinct populations an assessor can sample from. The mapping from a given sampling requirement to its population isn’t obvious. “Verify that user access reviews are performed quarterly” pulls from one population; “Verify that privileged user activity is logged” pulls from a different one. A human can figure out any single mapping in a few seconds. Doing it across every sampling requirement in an assessment is tedious and error-prone. That makes it exactly the kind of task Claude is good at.

The default approach we didn’t take

The obvious build: when the assessor exports their test plan, the server walks every sampling requirement, asks Claude “which population catalog entry fits this?”, waits for the answer, and writes it into the export.

That’s one API call per sampling requirement, every export. Across a full assessment, best case: tens of seconds and a few dollars per click. Worst case: one of those calls returns a slightly different population than last time, and now the same assessment exported twice has two different test plans. Auditors do not love that.

The move

We asked Claude once, during development, to map every sampling requirement in the framework to its population. The output is a static JSON file checked into the repo. At application startup the file is loaded into a dictionary. At export time, the sampling logic looks up the requirement by ID and gets the population, and a plain function computes the sample count from the population size using the framework’s sampling rules (population ≥250 → 25 items, 50–249 → 10% rounded up, <50 → minimum of 3). Pure math, no API.
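
A minimal sketch of that runtime path, assuming the mapping file is a flat JSON object from requirement ID to population name. The function names, the IDs, and the file shape are hypothetical, not taken from the actual codebase. The rule for populations under 50 is read here as "sample 3 items, or everything if fewer than 3 exist", which is one interpretation of the rule quoted above.

```python
import json
from pathlib import Path


def load_mapping(path: Path) -> dict[str, str]:
    """Load the build-time mapping (requirement ID -> population) once at startup."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def sample_count(population_size: int) -> int:
    """Sample size for a population, following the rules quoted above.

    Assumption: under 50 items, the sample is 3, capped at the
    population size when fewer than 3 items exist.
    """
    if population_size < 0:
        raise ValueError("population size cannot be negative")
    if population_size >= 250:
        return 25
    if population_size >= 50:
        return (population_size + 9) // 10  # 10%, rounded up, in integer math
    return min(3, population_size)


def plan_row(requirement_id: str, mapping: dict[str, str], sizes: dict[str, int]) -> dict:
    """Build one test-plan row with a dictionary lookup and plain arithmetic."""
    population = mapping[requirement_id]  # a KeyError here means the mapping file has a gap
    return {
        "requirement": requirement_id,
        "population": population,
        "sample": sample_count(sizes[population]),
    }


# Hypothetical data standing in for the checked-in JSON file and a customer's population sizes.
mapping = {"REQ-0001": "User accounts"}
sizes = {"User accounts": 120}
row = plan_row("REQ-0001", mapping, sizes)
```

Nothing in this path touches the network, which is where the millisecond runtime comes from.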

Runtime cost: milliseconds. Recurring API cost: zero.

Why this is better than it looks

A few things click into place once the model is out of the request path:

Determinism. Same assessment in, same test plan out, every time. This matters more in compliance tooling than people give it credit for.
Auditability. The mapping is a JSON file. A domain expert can open it, scan it, diff it against a prior version. Try doing that with “what the model said last Tuesday.”
Correctability. When we found a bad mapping, we fixed one line in JSON and shipped. No rerun, no retraining, no backfill.
Failure mode shifts left. A wrong mapping is caught once, during the one-time generation, instead of slipping silently into production exports where the customer might not notice it for a quarter.
Cost shape changes. You pay model cost in the build phase, not per user, per click, forever.
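
The auditability and shift-left points can be sketched as two small build-time checks: one that reports gaps or populations outside the catalog, and one that diffs two versions of the mapping for expert review. All names are hypothetical, and the data shape (requirement ID to population name) is an assumption.

```python
def validate_mapping(mapping, requirement_ids, catalog):
    """Return a list of problems; an empty list means the mapping is usable."""
    known = set(requirement_ids)
    mapped = set(mapping)
    problems = []
    for req_id in sorted(known - mapped):
        problems.append(f"{req_id}: no population mapped")
    for req_id in sorted(mapped - known):
        problems.append(f"{req_id}: not a known requirement")
    for req_id, population in sorted(mapping.items()):
        if population not in catalog:
            problems.append(f"{req_id}: unknown population {population!r}")
    return problems


def diff_mappings(old, new):
    """Map each changed requirement ID to (old population, new population)."""
    changed = {}
    for req_id in sorted(set(old) | set(new)):
        before, after = old.get(req_id), new.get(req_id)
        if before != after:
            changed[req_id] = (before, after)  # None marks an added or removed entry
    return changed
```

Run in CI, a non-empty result from `validate_mapping` can fail the build, so a bad entry never reaches an export.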

Where the pattern fits

It works when three things are true:

The input domain is bounded. The framework’s sampling requirements and its roughly 88 populations are both knowable and finite.
The mapping is stable. Framework updates are slow; you regenerate when the framework changes, not per request.
A human can review the output. JSON is the right format precisely because the SME can read it.

When any of those breaks (user-generated input, per-user context, mappings that change per session), the pattern falls apart and you do need the model in the path.

Where it doesn’t fit

The same project has features that do call Claude at runtime, and they’re the right call:

Diagram analysis. Every customer uploads a different architecture diagram. You can’t pre-compute a classification for an image you haven’t seen yet. Slow and expensive, but necessary.
Per-row explain. The assessor wants prose-quality rationale for a specific row in their specific environment. Generated on demand.

The two patterns coexist. The rule of thumb: if the answer is the same for everyone, generate it once. If the answer depends on this specific customer’s inputs, generate it live.

A side effect: cleaner QA

There’s a benefit for HITRUST assessment quality that’s worth naming. Every validated assessment goes through HITRUST QA before certification, and QA reviewers look hard at sampling: did the assessor pull from the right population, and can they show their work?

A deterministic mapping answers both questions before QA asks them. The test plan is reproducible, so the same assessment never produces two different sampling decisions. The mapping is a JSON file, so the rationale for every population choice is inspectable line by line, not buried in an assessor’s memory or a one-off spreadsheet. And a wrong mapping is caught once, during the one-time generation and review, rather than surfacing as a QA finding that sends the assessment back for rework.
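
One way to make the reproducibility claim checkable is to fingerprint the exported plan and assert that two exports of the same assessment match. This is a sketch under assumptions, not the project's actual test: `plan_fingerprint` and the row shape are hypothetical.

```python
import hashlib
import json


def plan_fingerprint(rows):
    """Hash a test plan in canonical form so identical plans give identical digests."""
    canonical = json.dumps(
        sorted(rows, key=lambda row: row["requirement"]),  # row order must not matter
        sort_keys=True,  # key order inside a row must not matter
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A regression test can export the same assessment twice and compare the two digests; any difference means something nondeterministic has crept into the export path.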

For assessor firms running multiple concurrent engagements, that shift matters. QA rework is the expensive kind of delay: it lands late, it pulls senior assessors back onto closed files, and it moves certification dates. Pushing the sampling-decision failure mode to build time, where a domain expert reviews it once, takes a recurring QA risk off the table.

The lesson, compressed

The instinct when building with an LLM is to put it where the user clicks. Sometimes that’s right. Often the better move is to put the LLM where the developer clicks, capture the output, and ship the artifact. The model becomes a build-time tool, not a runtime dependency. Your latency, cost, and audit story all get materially better, and you stop paying for the same answer 10,000 times.

Use Claude where the answer is unknowable in advance. Cache the rest in JSON.

Running HITRUST assessments, or building tooling to support them? Get in touch.

About the author

Gary Isaac is a HITRUST CCSFP and CISA. He spent fourteen years inside the ONC test lab and HITRUST practice at Drummond Group, where he helped launch the firm’s HITRUST service line. He builds AI-assisted compliance assessment tooling on an open-source cybersecurity assessment platform.