Chapter 4: Training and Fine-Tuning: The Forgotten Supply Chain
The Model That Came with Baggage
A mid-sized insurance company wanted to automate claims processing. Building a model from scratch was too expensive—they didn't have the data science team or the compute budget. Instead, they did what everyone told them to do: start with a foundation model, fine-tune it on their domain-specific data, and deploy.
They chose a well-regarded open-weights model from a popular repository. The model had good benchmark scores, an active community, and permissive licensing. The data science team downloaded it, fine-tuned it on eighteen months of claims data, ran validation tests, and deployed it to production. The security team reviewed the deployment infrastructure—network isolation, access controls, inference logging. Everything looked solid.
Three months later, a security researcher published findings about the base model they'd used. The model had been trained on a dataset that included scraped corporate documents, some containing credentials and API keys. Under specific prompting patterns, the model could be induced to generate text that looked remarkably like those credentials—not exact copies, but close enough to suggest the model had memorized patterns from sensitive training data.
The insurance company's fine-tuned model inherited this behavior. Their claims processing system—handling sensitive customer data—was built on a foundation that had been trained on data it should never have seen. The fine-tuning hadn't removed the problem. It had just buried it under a layer of domain-specific capability.
The security review had checked the right things at the wrong layer. Nobody had asked where the base model came from, what it had been trained on, or whether its provenance could be verified.
This chapter is about treating models as software artifacts—with all the supply chain discipline that implies—because a model you didn't train is code you didn't write, running in your environment, shaped by decisions you didn't make.
Why Training Is a Supply Chain Problem
When security teams think about supply chains, they think about code dependencies. They scan for vulnerable libraries, verify package signatures, and monitor for compromised maintainers. Years of painful breaches have taught the industry that code you import is code you trust—and trust requires verification.
Models are the same, but worse.
A software library has source code you can read. You can audit it, scan it for vulnerabilities, understand what it does. A model is a black box by design. You can observe its inputs and outputs, but you cannot meaningfully inspect billions of parameters to understand what behaviors are encoded. A library executes deterministically—given the same inputs, it produces the same outputs. A model's behavior varies with prompts, temperature settings, and contextual factors in ways that defy complete characterization.
And yet organizations treat model acquisition with less rigor than they treat npm packages.
The supply chain for models extends further back than most teams realize. When you fine-tune a foundation model, you're not just using that model—you're inheriting everything that went into it. The base model was trained on data you've never seen. That data was curated, filtered, and processed through pipelines you can't audit. The training infrastructure had security properties you can't verify. The organization that trained it made decisions about what to include and exclude that you can't review.
Fine-tuning doesn't reset this lineage. It extends it. Your fine-tuned model is the base model plus your modifications. Anything problematic in the base model—memorized sensitive data, encoded biases, backdoor triggers—persists unless specifically addressed. And addressing it requires knowing it exists, which requires transparency that most model providers don't offer.
The real problem is that the AI industry has normalized practices that would be unacceptable anywhere else in software. Downloading pretrained models from the internet and deploying them in production would be equivalent to downloading random binaries and running them on your servers. But because models are new, and because the urgency to ship AI is intense, organizations skip the supply chain hygiene they'd apply to anything else.
This chapter treats training and fine-tuning as what they are: the creation of software artifacts that will run in your environment, influence decisions, and process sensitive data. The same supply chain discipline applies—or should apply.
The Model Supply Chain: An Architectural View
To secure training and fine-tuning, you need to understand the supply chain as a system with discrete stages, each with its own trust implications.
Foundation Model Acquisition
Most organizations don't train foundation models. They acquire them—from commercial providers, open-weights repositories, or research releases. This acquisition is the first link in your supply chain.
Commercial API providers offer models as services. You don't receive the model; you receive access to inference endpoints. The supply chain risk here is operational dependency and data exposure, not artifact integrity. Your prompts and data flow to infrastructure you don't control.
Open-weights models are distributed as downloadable artifacts. You get the model weights and can run inference locally. This feels safer—no data leaving your environment—but you've imported an artifact of unknown provenance. The weights could have been modified between training and distribution. The training data and process are described in documentation you have to trust.
Commercial model downloads fall between these extremes. You license a model, download it, and run it in your infrastructure. You get contractual assurances about training, but verification remains limited.
Each acquisition path has different trust properties:
| Acquisition Type | What You Control | What You Trust |
|---|---|---|
| API Provider | Nothing | Provider's security, training choices, data handling |
| Open-Weights | Inference infrastructure | Training provenance, artifact integrity, community vetting |
| Commercial Download | Inference infrastructure | Provider's attestations, contractual commitments |
The architectural question is: what are you trusting, and is that trust justified?
For open-weights models, trust is often based on reputation and community adoption. A model with thousands of users and active discussion feels safer than an obscure release. But popularity isn't security. The insurance company in our opening scenario used a popular model. Popularity meant many eyes looked at capability benchmarks. It didn't mean anyone audited training data provenance.
Model Storage and Distribution
Once acquired, models need to live somewhere. Model storage is artifact storage, and artifact storage is a security boundary.
Model registries function like container registries or package repositories. They store versioned model artifacts with metadata. Access to the registry determines who can retrieve models for deployment or further training.
Storage infrastructure holds the actual bits—often object storage given model sizes. The storage layer has its own access controls separate from registry-level permissions.
Distribution channels move models from storage to training or inference environments. This might be direct download, artifact replication, or streaming from storage.
Each stage is an opportunity for tampering or unauthorized access:
- Registry compromise could substitute malicious models for legitimate ones
- Storage access could enable direct modification of model files
- Distribution interception could replace models in transit
The parallel to software supply chain attacks is exact. Just as attackers target package registries and build pipelines, model storage and distribution are natural attack points. The difference is that model tampering is harder to detect. You can diff two versions of a source file. You cannot meaningfully diff two versions of a billion-parameter model.
Training Infrastructure
When you train or fine-tune models, you create new artifacts. The infrastructure where this happens has security properties that transfer to the resulting model.
Compute environments run training workloads. These environments have access to training data, model weights, and training code. A compromised training environment could modify any of these.
Training pipelines orchestrate the training process—loading data, configuring hyperparameters, executing training loops, saving checkpoints. Pipeline integrity determines whether training produces the artifact you intended.
Experiment tracking systems record training configurations, metrics, and artifacts. These systems know what was trained on what data with what parameters. They're valuable for reproducibility—and valuable targets for attackers who want to understand your models.
Checkpointing and artifact storage save intermediate and final models. Checkpoint storage might be treated as ephemeral, with weaker controls than production model storage. But checkpoints are models—they can be deployed, exfiltrated, or tampered with.
The training environment is where data becomes model. This is a privilege concentration. Whatever identity runs training has access to potentially sensitive training data and produces an artifact that will influence production decisions. If that identity is compromised, or if the environment is compromised, the resulting model is suspect.
Fine-Tuning: Inheritance and Extension
Fine-tuning deserves special attention because it's where supply chain risks compound.
When you fine-tune a model, you start with a base model and adjust its weights using your data. The result is a new artifact that combines the base model's capabilities with domain-specific learning. This combination is not a replacement. It's an extension.
What transfers from the base model:
- Encoded knowledge and capabilities
- Memorized training examples
- Biases from base training data
- Any backdoors or triggers
- Behavioral patterns and tendencies
What fine-tuning adds:
- Domain-specific capabilities
- Adaptation to your data's patterns
- Potentially, new memorization from fine-tuning data
- Your organization's specific biases
Fine-tuning doesn't sanitize the base model. It's not a filter. If the base model can generate problematic content, the fine-tuned model likely can too. If the base model memorized sensitive data, that memorization persists. Fine-tuning might make problematic behaviors harder to elicit—the model is now more focused on your domain—but it doesn't remove them.
This inheritance is the core supply chain challenge. You didn't train the base model. You don't know exactly what's in it. But whatever's in it is now in your model, running in your environment, processing your data.
Artifact Integrity and Provenance
For software, we have mature mechanisms for integrity and provenance: signed commits, reproducible builds, software bills of materials, attestations about build environments. For models, these mechanisms are nascent.
Model cards document intended uses, training data sources, evaluation results, and limitations. They're useful but not verifiable. A model card says what the creator claims, not what can be independently confirmed.
Training attestations describe the training process, data, and environment. Some providers offer these; many don't. Even when provided, attestations are statements of intent, not cryptographic proofs.
Model signatures verify that an artifact hasn't been modified since signing. This confirms integrity from a point in time but doesn't verify what the model contains or how it was trained.
Reproducibility would let you verify a model by retraining from documented inputs. In practice, training is rarely reproducible. Hardware variations, random seeds, and undocumented preprocessing make exact reproduction impossible.
The gap between software supply chain maturity and model supply chain maturity is significant. For software, you can verify that the code you run matches the code in the repository, built by a known process. For models, you can verify that the file hasn't been corrupted in transit. That's about it.
CI/CD for Models: Building a Secure Pipeline
The solution isn't to abandon modern AI practices—it's to apply software engineering discipline to model development. Continuous integration and deployment patterns work for models, with adaptations.
Version Control and Lineage
Model development should be version-controlled with the same rigor as code.
Model versions should be immutable artifacts with unique identifiers. A deployed model should be traceable to a specific training run, which should be traceable to specific data, code, and base model versions.
Training code should live in version control. The code that produced a model should be recoverable, not lost in a data scientist's notebook history.
Configuration as code captures hyperparameters, data selections, and training settings. These configurations should be versioned alongside training code.
Data versioning tracks what data was used for which training run. This is harder than code versioning—data is larger and changes differently—but it's essential for understanding what went into a model.
The goal is reproducibility of lineage, if not reproducibility of exact weights. You should be able to answer: "What produced this model?" Even if you can't exactly recreate it, you should know what went in.
Automated Testing
Software has test suites. Models need them too, adapted for probabilistic behavior.
Capability tests verify that the model does what it should. Can it perform the tasks it was fine-tuned for? These are functional tests.
Boundary tests verify that the model doesn't do what it shouldn't. Does it refuse inappropriate requests? Does it handle adversarial inputs gracefully? These are security-relevant tests.
Regression tests compare new model versions to baselines. Has capability degraded? Have unwanted behaviors appeared? These catch problems introduced by training changes.
Extraction tests probe for memorization. Can specific prompts elicit training data? This is especially important for fine-tuned models with sensitive training data.
Bias and fairness tests evaluate outputs across demographic dimensions. These aren't just ethics checks—biased outputs can indicate problematic training data or inherited biases from base models.
Automated testing won't catch everything. Models are too complex for exhaustive testing. But automated testing catches the predictable problems and establishes a baseline for human review.
Gated Deployment
Models shouldn't flow directly from training to production. Gates between stages create checkpoints for verification.
Training → Staging: After training completes, models move to a staging environment for evaluation. This gate verifies that training succeeded and produced a functional model.
Staging → Review: Automated tests run in staging. Results go to reviewers who assess whether the model meets requirements. This gate catches what automated tests miss.
Review → Production: Approved models are promoted to production through a controlled process. This gate ensures only reviewed models reach production.
Each gate is an opportunity to catch problems before they affect production. The insurance company in our opening scenario had no gate that examined base model provenance. If they had, they might have asked questions about training data before deployment.
Artifact Signing and Verification
Every model artifact should be signed. Every deployment should verify signatures.
Signing creates a cryptographic attestation that a specific artifact was produced by a specific process at a specific time. The signing key should be controlled—ideally by a build system, not individuals.
Verification confirms that a model being deployed matches the signed artifact. This prevents substitution attacks where a malicious model replaces a legitimate one.
Chain of custody tracks artifact movement from creation to deployment. Combined with signatures, this creates an audit trail showing exactly which model is running where.
These mechanisms exist for software. Applying them to models requires treating models as first-class artifacts in your development and deployment systems.
Third-Party Model Risk
The most acute supply chain risks come from models you didn't create. Managing third-party model risk requires structured evaluation.
Evaluating Model Providers
Before acquiring a model, evaluate the source:
Training transparency: What does the provider disclose about training data, methodology, and infrastructure? More disclosure enables better risk assessment. Complete opacity should be disqualifying for sensitive use cases.
Security practices: How does the provider secure their training environment, model storage, and distribution? Have they experienced breaches? How did they respond?
Update and patching cadence: How does the provider handle discovered issues? Do they release updated models? How are updates communicated and distributed?
Contractual commitments: What does the provider guarantee about training practices, data handling, and security? Are there audit rights? Breach notification requirements?
This evaluation parallels vendor security assessments for any critical supplier. The difference is that model providers are often startups, research labs, or open-source communities without enterprise security programs.
Isolation and Containment
Third-party models should be isolated from your most sensitive systems until trust is established.
Network isolation limits what a model can access. A model running inference shouldn't have network access to internal systems it doesn't need.
Data isolation controls what data flows to the model. If you're evaluating a new model, start with non-sensitive data. Expand access as trust increases.
Blast radius limitation constrains what a compromised model could affect. If a third-party model turns out to have problems, how much damage could it cause? Design systems so the answer is "limited."
Isolation isn't permanent. It's a trust-building mechanism. As you verify a model's behavior and provenance, you can expand its access. But starting isolated protects against the unknown.
Ongoing Monitoring
Third-party model risk doesn't end at acquisition. It requires ongoing attention.
Behavioral monitoring watches for unexpected outputs. If a model starts behaving differently—generating unusual content, failing on tasks it previously handled—investigate.
Vulnerability tracking follows disclosures about models you use. When researchers publish findings about a model, assess whether those findings affect your deployment.
Provider monitoring tracks the health of your model sources. If an open-weights model's maintainers disappear, or if a commercial provider experiences a breach, reassess your risk.
Third-party models are living dependencies. They require the same ongoing management as any critical external component.
Common Mistakes Organizations Make
Treating Model Download Like Data Download
Organizations that would never download and run arbitrary executables will download and deploy arbitrary models. The reasoning: it's just weights, just numbers, just data. What could go wrong?
What's wrong is that those weights encode behavior. A model is executable in the sense that matters—it processes inputs and produces outputs that influence decisions. A malicious model can produce malicious outputs. A compromised model can leak information. A poorly trained model can make harmful decisions.
The download is the beginning, not the end. What comes after download determines whether those weights become a trusted part of your system or a liability.
What this misses: Models are artifacts that execute in your environment. Treat them like binaries from external sources, with appropriate verification and isolation.
Fine-Tuning Without Base Model Assessment
Teams eager to ship often grab a foundation model, fine-tune it, and deploy—without assessing the base model beyond capability benchmarks. Does it perform the task? Great, let's customize it.
This skips the supply chain evaluation. Where did the base model come from? What was it trained on? Are there known issues? Has it been evaluated for the type of content it might generate?
Fine-tuning doesn't give you a clean slate. It gives you everything in the base model plus your additions. Skipping base model assessment means deploying unknown risks.
What this misses: Base model risk transfers to fine-tuned models. Assessment before fine-tuning is assessment before deployment.
No Model Versioning or Lineage
Data science workflows often evolve organically. Train a model, adjust parameters, retrain, adjust again. The model that reaches production might be several iterations removed from documented training, with changes recorded only in a data scientist's memory.
When something goes wrong—the model produces unexpected output, shows bias, leaks training data—there's no trail to follow. What was this model trained on? What version of the base model? What code? The answers are lost.
This is the "works on my machine" problem scaled to models. If you can't reconstruct how a model was created, you can't investigate problems or confidently recreate it.
What this misses: Lineage is auditability. Without versioning and lineage, you can't investigate incidents, satisfy auditors, or understand your own systems.
Storing Models Like Regular Files
Models are large files. Large files go in object storage. Object storage has default configurations. Those defaults often aren't appropriate for artifacts that will influence production decisions.
Training checkpoints accumulate in blob storage with open access. Model registries run without authentication because "it's just internal." Intermediate artifacts persist indefinitely because nobody thought to set retention policies.
This creates exfiltration targets, tampering opportunities, and compliance liabilities. Model artifacts deserve the same storage controls as production data.
What this misses: Model storage is sensitive artifact storage. Access controls, encryption, audit logging, and retention policies all apply.
Assuming Commercial Providers Handle Security
Using a commercial model API must be secure, right? They're a big company. They have security teams. We just call the API.
This reasoning outsources risk assessment to hope. Commercial providers have varying security postures, different data handling practices, and different transparency levels. "They're a provider" is not a security control.
Even if the provider is secure, you have responsibilities. What data are you sending in prompts? How are you handling outputs? What happens if the provider is breached? The shared responsibility model applies to AI just as it applies to cloud.
What this misses: Provider relationships are trust relationships that require explicit evaluation, not assumptions.
Architectural Questions to Ask
Model Provenance Questions
- For every model in production, can you identify its complete lineage—base model, training data, training code, and all fine-tuning steps?
- Do you have documented provenance for third-party models, including training data sources and methodology?
- If a base model was found to have problematic training data, could you identify all derived models in your environment?
- Are there models in production whose provenance you cannot fully reconstruct?
Why these matter: You can't assess risk for artifacts you can't trace. Unknown provenance is unknown risk.
Supply Chain Trust Questions
- What due diligence do you perform before acquiring a new model?
- For open-weights models, how do you verify artifact integrity between download and deployment?
- Do you have criteria for acceptable model sources, or can anyone download and deploy?
- How do you track vulnerabilities or issues disclosed about models you use?
Why these matter: Model acquisition is supply chain intake. Without structured evaluation, you're accepting whatever arrives.
Training Infrastructure Questions
- Who has access to training environments, and is that access regularly reviewed?
- Are training pipelines instrumented for integrity—can you detect unauthorized modifications?
- How are training checkpoints and intermediate artifacts protected?
- Can you verify that a deployed model was produced by your intended training process?
Why these matter: Training infrastructure is where models are created. Compromise here compromises everything downstream.
Artifact Management Questions
- Are model artifacts signed, and is signature verification enforced at deployment?
- Do you have version control for models, training code, and training configurations?
- Are there models in storage that should have been deleted?
- Can you produce a bill of materials for a deployed model—what went into it?
Why these matter: Artifact management determines whether you control your models or just store them.
Fine-Tuning Questions
- Before fine-tuning a base model, do you assess the base model's training provenance and known issues?
- How do you evaluate whether problematic behaviors from the base model persist after fine-tuning?
- Do you test fine-tuned models for training data extraction—both base model data and fine-tuning data?
- If the base model receives a security update, do you have a process to incorporate it into fine-tuned versions?
Why these matter: Fine-tuning extends the supply chain. Base model risks become your risks.
Third-Party Risk Questions
- Do you maintain an inventory of third-party models, including version and source?
- What is your process when a vulnerability is disclosed in a model you use?
- For commercial model providers, what security attestations do you require?
- How do you isolate third-party models during evaluation before full deployment?
Why these matter: Third-party models are the largest source of supply chain risk. Structured management is essential.
Key Takeaways
Models are software artifacts with supply chain risks. A model you didn't train is code you didn't write, shaped by decisions you didn't make, running in your environment. Apply the same supply chain discipline you'd apply to any critical dependency—verification, isolation, and ongoing monitoring.
Fine-tuning extends lineage; it doesn't reset it. Everything in the base model transfers to your fine-tuned model. Problematic training data, encoded biases, memorized content, and potential backdoors persist through fine-tuning. Base model assessment is essential before fine-tuning, not optional.
Model provenance is the foundation for trust. If you can't trace a model to its origins—what data, what code, what infrastructure, what base model—you can't assess its risk. Unknown provenance is unknown risk that you're accepting into production.
CI/CD patterns apply to models. Version control, automated testing, gated deployments, and artifact signing work for models. They require adaptation for probabilistic systems, but the principles translate. Building these practices prevents the ad-hoc workflows that create ungoverned models.
Third-party model risk requires active management. Evaluating providers, isolating new models, and monitoring for disclosures are ongoing responsibilities. A model from a reputable source is still a model you didn't create, with properties you can't fully verify.
Training and fine-tuning produce the artifacts that will run in production. Get supply chain security wrong here, and you've built on a compromised foundation. Get it right, and you've established the provenance and integrity that make deployment trustworthy. The next chapter examines that deployment—where models become dangerous by connecting to real users, real data, and real systems.