Chapter 2: AI as a System, Not a Model
The Integration That Wasn't Reviewed
A healthcare technology company decided to add AI-powered clinical decision support to their electronic health records platform. The implementation seemed straightforward: connect an LLM to their existing patient data, add some retrieval logic to pull relevant medical history, and expose the capability through their clinician interface.
The architecture review focused on the model. Which provider? What were the data processing terms? Was PHI being sent externally? The team chose a provider with healthcare-specific compliance certifications, implemented the API integration, and deployed behind their existing authentication layer. The model was secure. The vendor was vetted.
Three months after launch, a clinician noticed something odd. The AI was referencing lab results from a patient she hadn't queried. Investigation revealed the problem: the retrieval system, built quickly during a sprint focused on "making the AI smarter," had been configured to pull context from a shared vector database. That database contained embeddings from multiple patients—and the similarity search didn't respect patient boundaries. The AI wasn't accessing the wrong patient's records directly. It was finding semantically similar content across all patients and surfacing it as relevant context.
The breach wasn't in the model. It wasn't in the API. It was in a retrieval component that nobody had thought to include in the security review because it wasn't "the AI part."
This chapter is about why that blind spot exists—and how to see AI systems as the distributed architectures they actually are.
Why Thinking in Systems Matters
When engineers think about AI, they think about the model. This is natural. The model is what produces the outputs. It's what gets the attention in demos, in vendor pitches, in executive presentations. "We're using GPT-4" or "we fine-tuned Llama" are the sentences that define AI projects in most organizations.
But the model is not the system.
An AI system in production is a collection of components: data stores that hold context, retrieval mechanisms that find relevant information, orchestration logic that decides what to do with inputs, integration points that connect to external services, logging pipelines that capture activity, identity systems that authenticate users and authorize actions. The model sits somewhere in the middle—important, but not singular.
What actually happens when someone interacts with an AI system? A request arrives at an API gateway. Authentication is validated. The request is routed to an orchestration layer. That layer might call a retrieval system to gather context. The context and the original request are assembled into a prompt. The prompt goes to the model. The response comes back. The orchestration layer might parse that response, call tools, invoke other services, or store results. The final output is logged, potentially filtered, and returned to the user.
The model touched one step. The system handled a dozen.
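To make that concrete, here is a compressed, runnable sketch of the flow, with every component stubbed out; the names are illustrative, not any particular framework's API. The handle_request function plays the role of the orchestrator, and the model call is a single line inside it.

```python
# A minimal, runnable sketch of the request path described above. Every
# component is a stub standing in for real infrastructure; only the shape
# of the flow is the point: the model call is one step among many.

def authenticate(request: dict) -> str:
    # Request layer: establish identity (stubbed).
    return request.get("user", "anonymous")

def retrieve_context(query: str, user: str) -> list[str]:
    # Retrieval layer: fetch context for this query and user (stubbed).
    return [f"doc relevant to '{query}' visible to {user}"]

def call_model(prompt: str) -> str:
    # Inference layer: the single model call (stubbed).
    return f"model answer for: {prompt[:40]}..."

def maybe_run_tools(response: str, user: str) -> str:
    # Tool layer: execute any tool calls the model requested (stubbed, no-op).
    return response

def handle_request(request: dict) -> str:
    # This function is the orchestration layer.
    user = authenticate(request)                           # request layer
    context = retrieve_context(request["query"], user)     # retrieval layer
    prompt = f"Context: {context}\nQuestion: {request['query']}"
    response = call_model(prompt)                          # inference layer
    response = maybe_run_tools(response, user)             # tool layer
    print(f"audit: user={user} prompt_len={len(prompt)}")  # output layer: logging
    return response

print(handle_request({"user": "clinician-7", "query": "latest A1C results"}))
```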
Security teams that focus on the model are securing one component of a distributed system. They're checking the brakes while ignoring the steering, the fuel line, and the navigation system. The vehicle might stop when asked—but that doesn't mean it was going the right direction, or that it won't leak fuel along the way.
The Architecture of AI Systems
To secure AI systems, you first have to see them clearly. What follows is a breakdown of the architectural layers that exist in most production AI deployments. Not every system has every layer, but most have more than their builders realize.
The Request Layer
Every AI system starts with input. That input arrives through some interface: an API endpoint, a chat widget, a programmatic integration, an internal service call. This is the request layer.
The request layer is where authentication happens. It's where rate limiting gets enforced. It's where initial input validation occurs. In traditional application security, this layer gets significant attention. In AI systems, it often gets the same attention—and teams assume that's sufficient.
The problem is that the request layer establishes identity, but it doesn't constrain what the rest of the system does with that identity. A user who authenticates as "employee" might trigger a retrieval operation that accesses data intended only for "executive." The request layer approved the request. The downstream layers didn't check whether the approval was sufficient for what they were about to do.
Think of the request layer as a building's front door. It verifies you're allowed to enter the building. It doesn't verify you're allowed to enter every room.
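One way to close that gap is to carry the authenticated principal into every downstream operation and check it again at each room, not just the front door. The sketch below uses invented roles and a hypothetical fetch_report operation to show the shape of the check.

```python
# Sketch: pass the authenticated principal downstream and check it again at
# each sensitive operation. Roles and the fetch_report operation are
# illustrative, not a real API.

from dataclasses import dataclass

@dataclass
class Principal:
    user_id: str
    roles: set[str]

def fetch_report(principal: Principal, required_role: str) -> str:
    # The downstream component enforces its own check instead of trusting
    # that the front door already approved everything.
    if required_role not in principal.roles:
        raise PermissionError(f"{principal.user_id} lacks role {required_role}")
    return f"report contents restricted to {required_role}"

employee = Principal("u123", {"employee"})
print(fetch_report(employee, "employee"))   # allowed
# fetch_report(employee, "executive")       # raises PermissionError
```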
The Orchestration Layer
Modern AI systems rarely send requests directly to models. Instead, an orchestration layer sits between input and inference. This layer decides what happens with a request: Does it need context? Should tools be invoked? Is this a simple query or a multi-step workflow?
Orchestration is where agentic behavior lives. It's where chain-of-thought reasoning gets implemented, where retry logic handles failures, where complex workflows get decomposed into steps. Frameworks like LangChain, Semantic Kernel, and custom-built orchestrators all operate at this layer.
The orchestration layer is also where privilege accumulates invisibly. Each capability the orchestrator can invoke—each tool, each data source, each external service—represents a privilege. The orchestrator typically has access to all of these capabilities simultaneously. It's a single component with the combined permissions of every integration it can call.
In traditional systems, we'd call this a "god service" and flag it as an antipattern. In AI systems, we call it an orchestrator and treat it as normal.
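A useful exercise is to enumerate the orchestrator's tool registry and take the union of the permissions behind it. The sketch below uses hypothetical tools and scopes, but the arithmetic is the point: the orchestrator's effective privilege is everything it can reach.

```python
# Sketch: aggregate the permissions behind an orchestrator's registered tools.
# The tools and scope names are hypothetical.

registered_tools = {
    "search_tickets": {"scopes": {"tickets:read"}},
    "update_crm":     {"scopes": {"crm:read", "crm:write"}},
    "run_sql":        {"scopes": {"db:read", "db:write"}},
    "send_email":     {"scopes": {"email:send"}},
}

effective_privilege: set[str] = set()
for tool, meta in registered_tools.items():
    effective_privilege |= meta["scopes"]

# The orchestrator's effective privilege is the union of everything it can call.
print(sorted(effective_privilege))
# ['crm:read', 'crm:write', 'db:read', 'db:write', 'email:send', 'tickets:read']
```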
The Retrieval Layer
Most production AI systems don't rely solely on the model's training data. They augment the model's context with retrieved information—documents, database records, conversation history, user profiles. This is Retrieval-Augmented Generation, or RAG, and it's become the default architecture for enterprise AI.
The retrieval layer is where data flows from storage into the model's context window. It typically includes:
- Vector databases that store embeddings and support similarity search
- Document stores that hold the source material those embeddings reference
- Query logic that translates user requests into retrieval operations
- Ranking and filtering that determines what context is actually used
The retrieval layer is the most common source of data exposure in AI systems. Not because it's poorly built, but because it's built without security constraints. Retrieval systems are optimized to find relevant content. "Relevant" is a semantic judgment, not a security boundary. The most semantically relevant document might also be the most sensitive.
When the healthcare company's retrieval system pulled patient data across boundaries, it was doing exactly what it was designed to do: find similar content. The design just didn't include the concept of "similar content that this user is allowed to see."
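A minimal sketch of the missing constraint, using an in-memory index and invented field names in place of a real vector database: authorization filtering happens before similarity ranking, so the search can only ever surface records the caller is allowed to see.

```python
# Sketch: similarity search constrained to records the caller may see.
# The in-memory index and patient_id field stand in for a real vector
# database with metadata filtering.

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

index = [
    {"patient_id": "p1", "text": "HbA1c 7.2", "embedding": [0.90, 0.10]},
    {"patient_id": "p2", "text": "HbA1c 8.4", "embedding": [0.88, 0.15]},
]

def search(query_embedding: list[float], allowed_patients: set[str], k: int = 3):
    # Filter by authorization *before* ranking by similarity.
    candidates = [r for r in index if r["patient_id"] in allowed_patients]
    ranked = sorted(candidates,
                    key=lambda r: cosine(query_embedding, r["embedding"]),
                    reverse=True)
    return ranked[:k]

print(search([0.9, 0.1], allowed_patients={"p1"}))  # never surfaces p2, however similar
```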
The Inference Layer
This is where the model lives. A request—now enriched with retrieved context—gets sent to an LLM for processing. The model generates a response.
The inference layer is what most AI security tools focus on. Input scanners look for injection attempts. Output filters check for sensitive content. Guardrails try to prevent policy violations. These controls matter, but they're operating at a single point in a multi-stage pipeline.
The inference layer also has trust implications that teams often miss. If you're using a third-party model API, your prompts—including all that retrieved context—are leaving your environment. If you're running models locally, you need infrastructure capable of inference, which often means GPUs with their own security considerations. If you're using fine-tuned models, the weights themselves become an artifact that needs protection.
The model is a component. It has inputs, outputs, and a trust relationship with the rest of the system. Treating it as the entire system is like treating the database as the entire application.
The Tool Layer
Modern AI systems can do things. They call APIs, execute code, query databases, send messages, modify files. This capability—tool use—is what separates a chatbot from an agent.
The tool layer is where AI systems acquire privilege in the external world. Each tool is a capability, and capabilities are privileges. A tool that can read files has read access. A tool that can execute code has execution privilege. A tool that can call external APIs has network access and whatever permissions those APIs grant.
Tool invocations are typically triggered by the model's output. The model decides to use a tool, the orchestrator executes that tool, and the result flows back into the model's context. This creates a control flow where the model—often running on external infrastructure, trained on unknown data, behaving in ways that are difficult to predict—is making decisions about privilege exercise.
The tool layer is where "AI safety" in the abstract becomes "access control" in the concrete. The question isn't whether the model is aligned. The question is whether the model's decisions about tool use are constrained by the same authorization policies that would constrain a human making those decisions.
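In practice, that constraint is an authorization gate between the model's proposed tool call and its execution. The sketch below uses hypothetical tool names and a toy policy table to show where the check sits.

```python
# Sketch: check the model's proposed tool call against the calling user's
# entitlements before executing it. Tool names and the policy table are
# hypothetical.

ALLOWED_TOOLS = {
    "analyst": {"query_readonly_db"},
    "support": {"query_readonly_db", "create_ticket"},
}

def execute_tool_call(user_role: str, tool_name: str, args: dict) -> str:
    # The model proposed this call; the system decides whether it runs.
    if tool_name not in ALLOWED_TOOLS.get(user_role, set()):
        raise PermissionError(f"role '{user_role}' may not invoke '{tool_name}'")
    return f"executed {tool_name} with {args}"

print(execute_tool_call("support", "create_ticket", {"title": "reset password"}))
# execute_tool_call("analyst", "create_ticket", {...})  -> PermissionError
```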
The Memory Layer
AI systems that maintain context across interactions need memory. This might be conversation history, user preferences, accumulated knowledge, or task state. Memory allows AI systems to feel continuous rather than stateless.
Memory is persistence, and persistence is a security concern. What gets stored? For how long? With what access controls? Memory systems often accumulate sensitive information over time—users share personal details, confidential information appears in conversations, patterns of behavior become visible.
The memory layer also creates attack surface for indirect prompt injection. If an attacker can influence what gets stored in memory, they can influence future interactions. Memory becomes a vector for persistent compromise.
Most memory implementations are optimized for utility, not security. They store everything that might be useful, retain it indefinitely, and make it accessible to improve response quality. These are reasonable defaults for user experience. They're dangerous defaults for security.
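Safer defaults are not complicated. The sketch below, with invented field names, attaches an owner, a sensitivity label, and an expiry to every memory write and scopes reads to the owner.

```python
# Sketch: memory entries carry an owner, a sensitivity label, and a TTL,
# and reads are scoped to the owner. Field names are illustrative.

import time

memory_store: list[dict] = []

def remember(owner: str, content: str, sensitivity: str, ttl_seconds: int) -> None:
    memory_store.append({
        "owner": owner,
        "content": content,
        "sensitivity": sensitivity,
        "expires_at": time.time() + ttl_seconds,
    })

def recall(owner: str) -> list[str]:
    now = time.time()
    # Only the owner's unexpired entries are ever returned to the model.
    return [m["content"] for m in memory_store
            if m["owner"] == owner and m["expires_at"] > now]

remember("user-42", "prefers metric units", sensitivity="low", ttl_seconds=86_400)
print(recall("user-42"))
print(recall("user-99"))  # empty list: other users' memory is invisible
```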
The Output Layer
Eventually, the AI system produces output that goes somewhere: back to a user, into a downstream system, logged for analysis. The output layer handles this final stage.
Output filtering is common here—scanning responses for PII, checking against content policies, redacting sensitive information. These controls are valuable, but they're also the last line of defense. By the time information reaches the output layer, it's already been retrieved from storage, processed by the model, and potentially acted upon by tools.
The output layer also includes logging, and logging AI systems is harder than logging traditional applications. Prompts can be huge. Responses can be nondeterministic. Context windows contain data from multiple sources. What you log affects your ability to investigate incidents, but also creates its own data protection challenges.
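A sketch of both ideas, with illustrative patterns and fields rather than a complete policy: a last-line redaction pass over the response, and a log record that captures metadata instead of becoming a second copy of the sensitive content.

```python
# Sketch: a final redaction pass plus a metadata-only log record. The regex
# patterns and log fields are illustrative, not a complete PII policy.

import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    return SSN_LIKE.sub("[REDACTED_ID]", text)

def log_response(user: str, prompt: str, response: str) -> dict:
    # Store hashes and sizes so incidents can be correlated without the
    # log itself holding a second copy of the sensitive data.
    return {
        "user": user,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_len": len(response),
    }

raw = "Contact jane.doe@example.com, SSN 123-45-6789."
print(redact(raw))
print(log_response("user-42", "lookup contact", raw))
```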
Control Planes and Data Planes
A useful mental model for AI systems comes from network architecture: the distinction between control planes and data planes.
The data plane is where the actual work happens. In networking, it's where packets flow. In AI systems, it's where data moves: user inputs, retrieved context, model outputs, tool results. The data plane is high-volume, performance-sensitive, and where most activity occurs.
The control plane is where decisions are made about how work happens. In networking, it's where routing tables are configured. In AI systems, it's where orchestration logic runs, where retrieval queries are constructed, where tool invocations are decided. The control plane is lower-volume, but higher-impact.
Security in traditional systems often distinguishes between data plane attacks (intercepting traffic, manipulating packets) and control plane attacks (poisoning routing tables, hijacking configuration). The same distinction applies to AI systems.
Data plane risks in AI:
- Sensitive information flowing through the system
- Outputs containing data that shouldn't be exposed
- Logging capturing content that creates liability
- Data exfiltration through normal-seeming interactions
Control plane risks in AI:
- Prompt injection changing system behavior
- Retrieval queries being manipulated to access unauthorized data
- Tool selection being influenced by adversarial inputs
- Orchestration logic being bypassed or corrupted
Most AI security tools operate at the data plane. They scan inputs and outputs, filter content, detect sensitive information. Fewer tools address control plane risks—the ways that system behavior itself can be manipulated.
An attacker who controls the control plane doesn't need to exfiltrate data directly. They can change how the system behaves, what it retrieves, what tools it uses. The data plane security remains intact while the control plane is compromised.
Trust Boundaries in AI Systems
A trust boundary is a point in a system where the level of trust changes. In traditional application security, trust boundaries exist between users and servers, between internal services and external APIs, between privileged and unprivileged processes.
AI systems have trust boundaries, but they're often invisible because the system is treated as monolithic.
The User-System Boundary
Users interact with AI systems through interfaces. They're not trusted—their inputs are validated, their access is controlled, their actions are logged. This boundary is usually well-understood.
But consider: when a user's request triggers retrieval, what trust level does the retrieval operation run at? Usually, it runs with the system's credentials, not the user's. The user triggered the operation, but the operation doesn't inherit the user's access constraints. This is a trust boundary that most systems cross without noticing.
The System-Model Boundary
If you're using a third-party model API, every inference request crosses a trust boundary. Your prompts leave your environment. Your retrieved context—potentially containing sensitive information—goes to the model provider. The model's response comes back, but you have limited visibility into what happened during processing.
Even if you trust the model provider contractually, you're still crossing a boundary. The model was trained on data you didn't control. It behaves in ways you can't fully predict. It might be updated, changed, or degraded without your knowledge.
The Model-Tool Boundary
When a model invokes a tool, it's directing action in the external world. The model is making a decision, and that decision is being executed with real privileges.
This boundary is where trust gets inverted from traditional security models. Normally, we trust the caller and distrust the called. The caller has context, intent, authorization. The called service just executes.
With AI tool use, the "caller" is a model making probabilistic decisions based on statistical patterns. The "called" is a tool with real access to real systems. We're granting execution privilege to something that doesn't have intent in any traditional sense.
The Retrieval-Storage Boundary
Retrieval systems sit between models and data stores. They query storage, rank results, and return context. This boundary determines what information the model sees.
If the retrieval system can access storage with broad permissions, every query inherits those permissions. A user who can only see certain documents might trigger retrieval that returns any document—because the retrieval system's permissions, not the user's, govern what's accessible.
Why Model-Centric Thinking Fails
Model-centric security fails because it misidentifies where risk lives.
The model is the least controllable component. You can't patch an LLM like you patch a server. You can't firewall its internal reasoning. You can't apply least privilege to attention mechanisms. The model is, in a meaningful sense, a black box. You can influence its inputs and filter its outputs, but you can't secure its internals.
The surrounding system is the most controllable component. You can absolutely control what data flows into retrieval. You can constrain what tools are available. You can enforce authorization at every layer. You can log, monitor, and audit system behavior. The system is where security engineering actually works.
Model-centric thinking leads teams to invest heavily in controls they can't fully implement (model-layer security) while underinvesting in controls they absolutely can implement (system-layer security).
It also creates a dangerous gap in threat modeling. If you think of the model as the attack surface, you defend the model. But an attacker who can manipulate what the model sees (retrieval poisoning), what the model can do (tool escalation), or how the model's outputs are used (downstream trust) doesn't need to attack the model at all.
The model is the least productive place to spend security effort because it's the hardest component to secure. The system is the most productive place because it's where your controls actually have leverage.
Common Mistakes Organizations Make
Treating the Model Provider as the Security Boundary
Organizations evaluate model providers extensively. They review SOC 2 reports, negotiate data processing agreements, assess compliance certifications. This is appropriate—but it creates false confidence when teams treat provider security as system security.
The model provider is one component. Their security controls protect their infrastructure. They don't protect your data pipelines, your retrieval systems, your tool integrations, or your output handling. A secure provider inside an insecure system doesn't make the system secure.
What this misses: The provider boundary is real, but it's not the most important boundary. The boundaries between your own components—where data flows without authentication, where privileges accumulate without authorization, where trust is assumed without verification—are where breaches happen.
Building Retrieval Without Access Control
Retrieval systems are often built by ML engineers focused on relevance, not security. The goal is to find the most useful context. Access control adds complexity, reduces flexibility, and isn't in the ML playbook.
The result is retrieval systems that can access broad data stores with no concept of authorization. They index everything, embed everything, and return whatever is most semantically similar. This is exactly what you want for utility. It's exactly what you don't want for security.
What this misses: Retrieval is a query mechanism, and queries need authorization. Every retrieval operation should be constrained by the permissions of the user or service that triggered it. "Find similar documents" should mean "find similar documents that this principal is allowed to see."
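Concretely, that usually means the authorization filter travels with the query itself rather than being applied after the fact. The sketch below shows the shape of such a query; the filter syntax is illustrative, not any particular vector database's API.

```python
# Sketch: build the retrieval query with the caller's authorization baked in,
# so "find similar documents" can only ever mean "similar documents this
# principal may see." The filter structure is illustrative, not a real API.

def build_retrieval_query(query_text: str, principal_groups: set[str]) -> dict:
    return {
        "text": query_text,
        "top_k": 5,
        # The authorization filter travels with the query; the index never
        # ranks documents outside the principal's groups.
        "filter": {"acl_group": {"in": sorted(principal_groups)}},
    }

print(build_retrieval_query("quarterly revenue summary", {"finance", "all-staff"}))
```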
Granting Tools Without Constraining Their Use
Teams add tools to AI systems because tools make systems useful. The AI can query databases, call APIs, execute code. Each tool expands capability.
But tools are privileges, and privileges should be constrained. Organizations that would never grant a service account admin access to production databases will give an AI orchestrator the ability to execute arbitrary SQL because "it needs to answer questions about data."
What this misses: Tool use should be governed by the same principles as any privilege grant. Least privilege applies. Separation of duties applies. Audit requirements apply. The fact that decisions are made by a model doesn't exempt them from authorization policy.
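Applied to the arbitrary-SQL example, least privilege might look like the sketch below: the tool exposes a small allowlist of named, parameterized, read-only queries instead of a raw SQL interface. The query names are hypothetical.

```python
# Sketch: instead of an execute-arbitrary-SQL tool, expose a small set of
# named, parameterized, read-only queries. The queries are hypothetical.

ALLOWED_QUERIES = {
    "orders_by_customer": "SELECT id, total FROM orders WHERE customer_id = ?",
    "open_tickets":       "SELECT id, title FROM tickets WHERE status = 'open'",
}

def run_named_query(name: str, params: tuple = ()) -> str:
    if name not in ALLOWED_QUERIES:
        raise PermissionError(f"query '{name}' is not in the allowlist")
    sql = ALLOWED_QUERIES[name]
    # A real implementation would execute against a read-only replica under a
    # least-privilege database role; here we only show the constraint.
    return f"would run: {sql} with params {params}"

print(run_named_query("orders_by_customer", ("c-1001",)))
# run_named_query("delete_all_orders")  -> PermissionError
```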
Logging Prompts Without Logging Context
Many organizations log AI interactions—inputs and outputs. This is better than nothing. But it's insufficient for understanding what actually happened.
If an AI system exposed sensitive information, the logs might show that a prompt was sent and a response was returned. They won't show what the retrieval system found, what context was injected, what tools were considered, what intermediate steps occurred. You can see the question and the answer with no visibility into how the answer was assembled.
What this misses: AI systems have intermediate states that matter for security. Retrieval results, tool invocations, orchestration decisions—these are where exposures happen. If your logging doesn't capture these layers, your incident response is flying blind.
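The difference looks something like the sketch below: one trace record per request that captures what the retrieval and tool layers actually did, not just the prompt and the answer. The field names are illustrative.

```python
# Sketch: one trace record per request, capturing the intermediate stages an
# incident responder will need. Field names are illustrative.

import json
import time
import uuid

def make_trace(user: str) -> dict:
    return {"trace_id": str(uuid.uuid4()), "user": user,
            "started_at": time.time(), "stages": []}

def record_stage(trace: dict, stage: str, detail: dict) -> None:
    trace["stages"].append({"stage": stage, "at": time.time(), **detail})

trace = make_trace("clinician-7")
record_stage(trace, "retrieval", {"doc_ids": ["doc-88", "doc-91"], "filter": "patient:p1"})
record_stage(trace, "tool_call", {"tool": "lab_lookup", "authorized": True})
record_stage(trace, "inference", {"prompt_tokens": 1423, "response_tokens": 212})
print(json.dumps(trace, indent=2))
```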
Assuming Statelessness When Memory Exists
Engineers often think of LLM interactions as stateless: request in, response out. But production AI systems maintain state. Conversation history persists. User preferences accumulate. Vector databases grow. Memory systems remember.
This state creates attack surface. Information stored in memory can be exfiltrated later. Malicious content injected into memory can influence future interactions. The "stateless" system has persistence that enables persistent compromise.
What this misses: Memory is storage, and storage is a security concern. What goes into memory? Who can access it? How long is it retained? Can it be poisoned? These questions apply to AI memory just as they apply to any data store.
Architectural Questions to Ask
System Boundary Questions
- Can you draw a diagram of every component in your AI system, including retrieval, orchestration, tools, and memory?
- For each component, do you know what data it can access and what actions it can take?
- Are there components that were added "to make the AI smarter" that weren't included in security reviews?
Why these matter: You can't secure a system you can't see. Most AI systems have components that were added iteratively without architectural review.
Data Flow Questions
- Can you trace the path of a user input from receipt to final response?
- At each stage, what additional data gets added to the context?
- Do you know the sensitivity classification of data that enters the model's context window?
Why these matter: Data exposure happens along the flow, not just at endpoints. If you can't trace the flow, you can't identify where sensitive data might leak.
Trust Boundary Questions
- When a user triggers a retrieval operation, whose permissions govern what gets retrieved?
- If your orchestrator calls a tool, what authorization check occurs before execution?
- When context crosses from your system to a model API, do you know what's in that context?
Why these matter: Trust boundaries are where security controls should concentrate. Invisible boundaries mean missing controls.
Privilege Questions
- What is the combined privilege of your orchestration layer if you aggregate all its capabilities?
- Could your AI system, through tool use, take actions that a human in the same role could not?
- If the model's outputs were malicious, what's the worst action your system would execute?
Why these matter: Privilege accumulation in AI systems is subtle. Capabilities granted individually can combine into excessive access.
Failure Mode Questions
- If your retrieval system returned unauthorized data, would anything else in the system prevent exposure?
- If your model started behaving unexpectedly, how would you know?
- If a tool invocation had unintended consequences, do you have rollback capability?
Why these matter: Defense in depth requires knowing your failure modes. Single points of failure become breach points.
Key Takeaways
AI systems are distributed systems, not model wrappers. The model is one component among many. Retrieval, orchestration, tools, memory, and integration points all carry risk. Security attention should be proportional to attack surface, not novelty.
Control planes matter more than data planes. Filtering inputs and outputs is necessary but insufficient. The decisions about what data to retrieve, what tools to invoke, and how to orchestrate workflows—the control plane—determine what the system actually does.
Trust boundaries in AI systems are often invisible. Data crosses from user context to system context to model context to tool context without explicit authorization at each boundary. Making these boundaries visible is the first step to controlling them.
Model-centric security invests in the wrong controls. The model is the component you can't secure directly. The surrounding system is where you have leverage. Securing the system secures the model's environment, which is the best you can actually do.
Treat your AI system like the distributed architecture it is. The same principles that govern microservices, data pipelines, and integration patterns apply. Least privilege, defense in depth, separation of concerns, observability—these aren't AI-specific, but they're AI-necessary.
The model is what makes an AI system feel magical. The architecture is what makes it safe—or dangerous. The next chapter examines the most neglected layer of that architecture: the data that feeds everything else.