Borrowed Intelligence

From owning physical servers, to renting cloud resources, to renting intelligence at scale: we keep moving up the value chain, and our approach to problem solving is changing just as quickly.

Someone asked me this week how to identify use cases worth solving with AI. The conversation kept ending up in the same place: teams asking “should we use AI here?” or declaring “let's just use AI for that,” and then getting stuck on the technical answer. I think that's the wrong question to start with. The question is more about procurement than engineering. If we treat “the AI” as a large intelligence machine, are we borrowing the right amount of cognition (intelligence as a service) for what we're trying to solve?

Most teams skip that question and end up in one of the following:

Force-fitting: using AI for simple logic that plain code could handle, and paying for intelligence you don't really need.
Over-borrowing: paying for more cognitive capacity than the task can use, so the cost outruns the value you get from AI.
Under-borrowing: paying anyway, in errors and in the human time spent cleaning up after AI.

The mindset adjustment

When you use cloud compute, you know you're renting resources. When you rent an EC2 instance, you know what you are getting in terms of memory, CPU, disk space, and so on. You know what you are paying for. With AI, the same procurement reality is harder to see: the borrowed thing can feel like a capability you own, but it is not. It's intelligence you're metering by the prompt. The model can change. The price can change. The terms can change. Your team's actual capability, the thing that compounds, is what they do with the borrowed intelligence.

That reframe matters because it shifts use case identification from a technical question (can the model do this?) to an economic one (is this problem worth borrowing intelligence to solve?). Most of the bad AI projects we see started with the first question and never asked the second.

“Just use an agent for that”

Almost every conversation about AI use cases I've had this year includes the line “let's just use an agent for that.” The customer has heard about agents, seen the demos, and reaches for the most powerful primitive as a silver bullet. It's almost always the wrong starting point.

Agents are the most expensive form of borrowed intelligence among the primitives currently on offer. They carry the highest cognition per task: multiple model calls, sometimes multiple models. They carry the highest variance per outcome: loops can run long, tool selection can drift, context windows fill up, and hallucinations can get amplified without proper guardrails or memory management. And they have the highest surface area to govern: access to external tools, credentials, retry behavior, output verification. Sometimes the problem genuinely needs all of that, as with software engineering work that requires reading, understanding, reasoning, writing code, and assessing the impact on the overall system. But most problems are not that complex and can be solved with one structured prompt and a few function calls.

The borrowing decision has two dimensions, and “just use an agent” collapses both into a single choice while ignoring every other aspect of the problem:

Tier-fit — which model, at what reasoning capacity, for what task complexity.
Form-fit — which primitive fits the shape of the work, from a single LLM call up to an autonomous agent. The primitives menu below walks the full set.

We made the longer case for when an agent is actually the right form in When to Build an Agent. The summary: pick the agent form when the work genuinely requires the model to plan, decide which tool to call next, and reason over the results of its own actions. Most use cases don't — and starting at “agent” means starting at the most expensive form of borrowing the menu offers.

The AI Primitives

Once you stop reaching for “agent” by default, the question becomes: what is the most cost-effective way to solve this problem? Thinking this way helps you zero in on the right solution. Start with the simplest possible primitive and work your way up only if needed. That is the essence of form-fit.

By form, cheapest to most expensive:

Single LLM completion. Classify, summarize, extract, generate. One round-trip. Cheapest cognition you can borrow. The right answer for the majority of “I need AI for X” use cases.
Structured output. Same as above, but the model returns typed JSON your downstream code consumes directly. No parsing layer. The right answer for anything that becomes a record.
Function calling. The model picks one of a small set of tools and supplies arguments. One turn, sometimes two. Right for “look it up and answer.”
Retrieval-augmented generation (RAG). Search → ground the model in the retrieved context → answer. Right when the knowledge isn't in the model and won't fit in a single prompt.
Workflow with model calls inside. A deterministic pipeline with one or two model steps. You own the orchestration; the model handles variance only at the points where variance is irreducible. Right for most “process this” or “triage this” use cases.
Agent (autonomous loop). Model plans, decides which tool to call, observes results, iterates. Right when the work genuinely requires deciding what to do next based on what just happened. Most expensive form. Most teams' first reach.
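
To make the "start simple, move up only if needed" idea concrete, here is a minimal Python sketch. It treats the list above as a cost-ordered ladder and returns the lowest rung that a task's requirements force. The `TaskShape` questions and the `cheapest_form` function are hypothetical names made up for illustration, and real form selection involves more judgment than five yes/no flags.

```python
from dataclasses import dataclass
from enum import IntEnum


class Form(IntEnum):
    """Primitives ordered cheapest to most expensive, following the list above."""
    SINGLE_COMPLETION = 1
    STRUCTURED_OUTPUT = 2
    FUNCTION_CALLING = 3
    RAG = 4
    WORKFLOW = 5
    AGENT = 6


@dataclass
class TaskShape:
    # Hypothetical yes/no questions about the work, not an established checklist.
    output_is_a_record: bool = False
    needs_a_lookup: bool = False
    needs_external_knowledge: bool = False
    has_fixed_multi_step_process: bool = False
    must_decide_next_step_from_results: bool = False


def cheapest_form(task: TaskShape) -> Form:
    """Return the highest rung any requirement forces, and nothing above it."""
    required = [Form.SINGLE_COMPLETION]  # the default starting point
    if task.output_is_a_record:
        required.append(Form.STRUCTURED_OUTPUT)
    if task.needs_a_lookup:
        required.append(Form.FUNCTION_CALLING)
    if task.needs_external_knowledge:
        required.append(Form.RAG)
    if task.has_fixed_multi_step_process:
        required.append(Form.WORKFLOW)
    if task.must_decide_next_step_from_results:
        required.append(Form.AGENT)
    return max(required)


if __name__ == "__main__":
    triage = TaskShape(output_is_a_record=True, has_fixed_multi_step_process=True)
    print(cheapest_form(triage).name)
```

The point of the sketch is the default: a task with no special requirements lands on a single completion, and the agent rung is reached only when the work must decide its next step from what just happened.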

By modality (of growing importance in almost every domain):

Frontier models in 2026 are natively multimodal: the same Claude / GPT / Gemini call can take text, image, and audio. The primitive question isn't which model to use. It's which input/output shape fits the use case, and what the limitations and costs are for image vs. audio vs. video vs. text input and output.

Text-only. The default and cheapest. Right for most things.
Vision input. Document understanding, layout-aware extraction, screenshot reasoning. Often replaces an OCR-plus-LLM pipeline at a fraction of the engineering cost.
Voice in/out. Transcription with intent capture, voice-driven workflows, call summarization. The right primitive when the existing process is a phone call or a recording.
Image/video generation. Right when the output is the image or video — not when you want a description of one.
Code-specific tooling. Purpose-tuned models that beat general models at lower price tiers for completion, review, refactoring.

The offerings matter because the borrowing decision is form × modality × tier — three aspects, not one. “Just use an agent on Claude Opus 4.8” is a single point on a much larger landscape, and almost never the right one.

Use intelligence when it's more cost-effective to borrow than to build.

What's actually worth borrowing

The task has a clear input → output mapping but high variance per instance. Borrowed intelligence handles variance well; deterministic code handles the predictable cases better at scale. If 90% of your cases can be handled by plain heuristics, write the code and let AI handle the other 10%. Don't borrow cognition for the 90% just because it's easier to ship a prompt than a rule engine.
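
One way to picture the 90/10 split is a rules-first classifier that only falls back to a model when no rule matches. The sketch below is illustrative: the ticket categories, the keyword rules, and `fake_model` (a stand-in for whatever model call you would actually make) are all assumptions, not part of any real system.

```python
from typing import Callable, Optional, Tuple


def rule_based_label(ticket: str) -> Optional[str]:
    """Cheap deterministic heuristics for the predictable cases."""
    text = ticket.lower()
    if "refund" in text or "invoice" in text:
        return "billing"
    if "password" in text or "login" in text:
        return "account"
    return None  # no rule matched; this case needs borrowed intelligence


def classify(ticket: str, model_fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Return (label, source) so you can measure how often the model is used."""
    label = rule_based_label(ticket)
    if label is not None:
        return label, "rules"
    return model_fallback(ticket), "model"


def fake_model(ticket: str) -> str:
    # Hypothetical stand-in for a real model call.
    return "other"


if __name__ == "__main__":
    for ticket in ["I need a refund", "Cannot login", "The chart looks odd"]:
        print(ticket, "->", classify(ticket, fake_model))
```

Returning the source alongside the label is a deliberate choice: the share of calls that reach the model tells you whether the heuristics really cover the bulk of your traffic.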

The value of being right vastly exceeds the cost of compute. A model that prevents a $1,000 mistake at $0.50 of token spend is a good borrow. A model that drafts an email at $0.50 of token spend is okay, but not exactly the AI Transformation story people sell.
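
The arithmetic behind that comparison can be written down in a few lines. This is a rough sketch, not a pricing model: the function name and the `chance_prevented` parameter (how often the call actually prevents the mistake) are assumptions added for illustration.

```python
def net_value_per_call(value_if_right: float,
                       chance_prevented: float,
                       token_cost: float) -> float:
    """Expected value of one model call minus what the call costs."""
    if not 0.0 <= chance_prevented <= 1.0:
        raise ValueError("chance_prevented must be between 0 and 1")
    return value_if_right * chance_prevented - token_cost


if __name__ == "__main__":
    # The $1,000 mistake from the text, assuming every call prevents it.
    print(net_value_per_call(1000.0, 1.0, 0.50))
    # Same mistake, assuming only 1 call in 100 actually catches one.
    print(net_value_per_call(1000.0, 0.01, 0.50))
```

Even under the pessimistic assumption that only one call in a hundred catches a real mistake, the borrow stays clearly positive, which is what separates it from the email-drafting case.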

The work is repetitive enough to justify the setup. You're going to spend real engineering time on prompts, evals, retry logic, output schemas, guardrails. If you're doing all that for 30 invocations a quarter, you're over-investing. If it's 30,000, you're under-investing. The setup cost is the same; the borrow only pays off at volume.
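
A quick break-even calculation makes the volume argument tangible. The figures below (a $20,000 setup cost and $2 of net value per invocation) are invented for illustration; only the 30 and 30,000 invocations per quarter come from the text.

```python
import math


def break_even_invocations(setup_cost: float, net_value_per_invocation: float) -> int:
    """Number of invocations needed before the setup cost is paid back."""
    if net_value_per_invocation <= 0:
        raise ValueError("a borrow with no net value per call never breaks even")
    return math.ceil(setup_cost / net_value_per_invocation)


def quarters_to_break_even(setup_cost: float,
                           net_value_per_invocation: float,
                           invocations_per_quarter: int) -> float:
    """How many quarters of usage it takes to recover the setup cost."""
    needed = break_even_invocations(setup_cost, net_value_per_invocation)
    return needed / invocations_per_quarter


if __name__ == "__main__":
    setup, value = 20_000.0, 2.0  # hypothetical numbers
    print(break_even_invocations(setup, value))
    print(quarters_to_break_even(setup, value, 30))
    print(quarters_to_break_even(setup, value, 30_000))
```

Under these assumed numbers the setup pays back after 10,000 invocations: more than 300 quarters at 30 invocations a quarter, and well inside the first quarter at 30,000.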

AI augments judgment instead of replacing it. This is the one most teams get wrong. Augmenting — the human still owns the conclusion, the model just expands what they consider. Replacing — you're trusting the model's output as the final answer with no review layer. The augmenting use cases compound. The replacing use cases are where the guardrails break down, and you start seeing the issues that were never expected in the first place.

Over-borrowing vs. under-borrowing

Over-borrowing (intelligence-overspend) is the more visible mistake. It looks like routing every API call through Opus 4.8 or GPT-5.5 by default, because that's the “best” model right now. The result is real money spent on cognition you're not using — a classification task that needs flag-or-don't-flag capability, but you give it full reasoning capacity. Most teams catch this when the bill arrives.

Under-borrowing is the less visible mistake, and it's everywhere. It looks like running a complex reasoning task on a small, cheap model because the unit price (input/output tokens) is low, then compensating for the failure rate with human review. The token bill stays small, but the labor cost shifts elsewhere, and the total cost of ownership ends up higher.
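
The shift from token cost to labor cost is easy to see once both sit in the same formula. In this sketch every number is hypothetical: the per-task token prices, the failure rates, and the $5 of human cleanup per failed task are placeholders, not measurements of any real model.

```python
def total_cost_per_task(token_cost: float,
                        failure_rate: float,
                        cleanup_cost: float) -> float:
    """Token spend plus the expected human cleanup cost for one task."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    return token_cost + failure_rate * cleanup_cost


if __name__ == "__main__":
    cleanup = 5.00  # hypothetical cost of a human fixing one failed task
    small = total_cost_per_task(token_cost=0.002, failure_rate=0.20, cleanup_cost=cleanup)
    large = total_cost_per_task(token_cost=0.05, failure_rate=0.02, cleanup_cost=cleanup)
    print(f"small model: ${small:.3f} per task")
    print(f"large model: ${large:.3f} per task")
```

With these assumed inputs the small model costs roughly $1.00 per task against $0.15 for the large one, even though its token price is 25 times lower. Different failure rates or cleanup costs can flip the result, which is why the inputs need to be measured rather than guessed.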

The clean way to think about both is utilised intelligence: of the cognitive capacity you're paying for, what fraction are you actually consuming? A flag/don't-flag classifier on Opus 4 is consuming maybe 2% of the model's reasoning — you're paying for the other 98% as a tax on default behavior. A multi-step planning task on Haiku 4.5 might be consuming 110% of the model's reasoning — and that extra 10% is showing up as human cleanup time in another team's budget.

The right tier is the one where utilised intelligence sits in a healthy range — not maxed out, not wildly under-spent. Most teams have never measured this even once.

What to do

Tier-match by task complexity, not by default. Build a routing table that maps model tier to use case. Start every new task at the cheapest tier that meets your failure-rate target. Move up only when the data tells you to, never because the bigger model is the newer one.
Track failure rates per tier alongside costs. Cost-per-1k-tokens without failure-rate-per-1k-invocations is half a picture. The composite — cost-per-correct-outcome — is what matters. We made the broader case for output-side measurement in Tokens Are Not the Metric.
Treat the borrow as architectural, and re-tier on a schedule. Pricing tier, latency, rate limits, model availability — all load-bearing for what you can ship. Model price/perf curves move faster than your architecture review cycle. The Pro tier today may be the Lite tier in six months (see Flash Beats Pro), and the architectural cost of being wrong about this compounds (see The Month-Six Bill). Re-tier quarterly. Build the muscle before you need it.
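
The first two steps can be combined into a small routing helper: record cost and failures per tier, compute cost-per-correct-outcome, and pick the cheapest tier that meets the failure-rate target. This is a sketch under stated assumptions. The tier names and all the figures are hypothetical, and `TierStats` and `cheapest_tier_meeting_target` are illustrative names rather than part of any library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TierStats:
    name: str             # hypothetical tier label, not a real model name
    cost_per_call: float  # average token spend per invocation
    calls: int            # invocations observed on this tier
    failures: int         # invocations whose output was judged wrong

    def __post_init__(self) -> None:
        if self.calls <= 0 or not 0 <= self.failures <= self.calls:
            raise ValueError("need calls > 0 and 0 <= failures <= calls")

    @property
    def failure_rate(self) -> float:
        return self.failures / self.calls

    @property
    def cost_per_correct_outcome(self) -> float:
        correct = self.calls - self.failures
        if correct == 0:
            return float("inf")  # nothing correct, so no finite price
        return self.cost_per_call * self.calls / correct


def cheapest_tier_meeting_target(tiers: List[TierStats],
                                 max_failure_rate: float) -> Optional[TierStats]:
    """Cheapest tier (by cost per correct outcome) within the failure target."""
    eligible = [t for t in tiers if t.failure_rate <= max_failure_rate]
    if not eligible:
        return None  # no tier qualifies; revisit the task or the target
    return min(eligible, key=lambda t: t.cost_per_correct_outcome)


if __name__ == "__main__":
    observed = [
        TierStats("small", 0.001, 1000, 150),
        TierStats("mid", 0.010, 1000, 30),
        TierStats("large", 0.100, 1000, 10),
    ]
    choice = cheapest_tier_meeting_target(observed, max_failure_rate=0.05)
    print(choice.name if choice else "no tier meets the target")
```

Re-running the same selection on fresh numbers each quarter is one lightweight way to practice the re-tiering habit described above.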

Borrowed intelligence is leverage. We made the broader case for that in 100x Engineers — leverage with judgment compounds; leverage without it just amplifies waste, faster. The procurement question (borrow or not, at what tier, for what) is the upstream version of the deployment question (point it at what).

The team that wins isn't the one borrowing the most intelligence. It's the one that knows when to borrow it, at what tier, in what form, for what — and isn't attached to the answers they started with.