Developing a Non-Deterministic AI Tool According to a Safety Standard

A practical ISO 26262 argument for qualifying non-deterministic AI tools: Clause 11 as the tool qualification frame, a selected Part 6 and Part 2 development process, Clause 12-style evidence where it helps, and TR 5469 as AI-specific support.

A non-deterministic AI tool cannot be qualified for ISO 26262 by showing a few good outputs.

That is the trap.

The question is not whether the model usually gives useful answers. The question is whether a defined tool version, configuration, model setup, prompt set, retrieval corpus, workflow, and human-review process can be trusted for a defined ISO 26262 activity.

For that, the starting point is ISO 26262-8, Clause 11: confidence in the use of software tools. If the tool can introduce or fail to detect errors in a safety lifecycle activity, and the downstream process relies on its output, the tool needs a confidence argument. For an AI tool classified at TCL3, the highest tool confidence level, validation matters, but validation alone is not enough. You also need to show that the tool was developed, configured, verified, used, and maintained under a controlled process.

The short version is this:

Clause 11 is the tool qualification frame.
Part 6 can provide the selected software-development lifecycle evidence.
Part 2 can provide the selected safety-management evidence.
Clause 12 is not the tool qualification route, but it is useful as an evidence pattern and for reused software components.
ISO/IEC TR 5469:2024 is informative AI-specific support for issues such as probabilistic behaviour, drift, explainability, monitoring, and V&V limits.

That hierarchy is important. It keeps the claim narrow enough for an assessor to evaluate.

Figure: Clause 11 is the qualification frame. Part 6, Part 2, Clause 12, and TR 5469 support the argument without replacing it.

The assessor question

An assessor will not ask whether the AI tool is impressive. They will ask whether the qualification claim is controlled.

In practice, that means answering questions like these:

What exact tool version, model, prompts, retrieval corpus, configuration, and environment are qualified?
Which ISO 26262 activity does the tool support?
Which erroneous outputs could affect the safety lifecycle?
What downstream checks prevent or detect those errors?
What Tool Confidence Level follows from the tool impact and detection argument?
Which Clause 11 qualification method is selected?
Which Part 6 and Part 2 requirements are used for the development-process argument?
Which reused components need component-level qualification evidence?
What validation set shows suitability for the qualified use?
What changes trigger impact analysis, re-verification, or re-qualification?

If the article, plan, or qualification report cannot answer those questions plainly, the argument is not ready.

Clause 11 first

Clause 11 deals with confidence in software tools used to support ISO 26262 activities. Its concern is simple: a tool malfunction can create an erroneous output, fail to detect an error, or present a result in a way that the engineering team relies on when it should not.

For a non-deterministic AI tool, the malfunction is not limited to crashes or obvious software bugs. The risky output may be a confident but unsupported clause interpretation, a missed requirement, a wrong ASIL assumption, a hallucinated citation, a broken traceability link, or a report section that looks review-ready but is not supported by the source evidence.
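The step from "which erroneous outputs matter" to a Tool Confidence Level can be sketched as a small lookup. This is a minimal sketch of the tool impact (TI) and tool error detection (TD) classification as I understand it from ISO 26262-8 Clause 11; the class and function names are my own, and the mapping should be checked against the standard's text before it is used in a real criteria evaluation.

```python
from enum import Enum


class ToolImpact(Enum):
    TI1 = 1  # argued: a malfunction cannot introduce or mask an error
    TI2 = 2  # all other cases


class ToolErrorDetection(Enum):
    TD1 = 1  # high confidence that an erroneous output is prevented or detected
    TD2 = 2  # medium confidence
    TD3 = 3  # all other cases


def tool_confidence_level(ti: ToolImpact, td: ToolErrorDetection) -> str:
    """Return the TCL for a given TI/TD pair (sketch, verify against the standard)."""
    if ti is ToolImpact.TI1:
        return "TCL1"
    return {
        ToolErrorDetection.TD1: "TCL1",
        ToolErrorDetection.TD2: "TCL2",
        ToolErrorDetection.TD3: "TCL3",
    }[td]


# An AI tool whose unsupported output could reach the safety case,
# with no strong downstream detection argument:
print(tool_confidence_level(ToolImpact.TI2, ToolErrorDetection.TD3))  # TCL3
```

The lookup is trivial on purpose. The engineering work is in justifying the TI and TD inputs for each use case, especially any claim that human review gives high detection confidence.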

That is why the qualification scope has to be precise. The report should define:

the AI tool version and qualified configuration,
the model and model-version assumptions,
the prompt, retrieval, and source-corpus assumptions,
the qualified use cases,
the users and required review workflow,
the execution environment,
the safety-relevant outputs,
the known limitations and excluded use cases, and
the conditions under which the qualification remains valid.

This is also where many weak arguments fail. They try to qualify “the AI” instead of a controlled tool package used in a defined way.
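One way to make "a controlled tool package" concrete is to record the scope items above in a single manifest and derive a fingerprint from it. The following is a minimal sketch, not a prescribed format: the field names, the example values, and the idea of hashing the manifest are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


def sha256_text(text: str) -> str:
    """Hash a text artefact such as a prompt set export (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class QualifiedConfiguration:
    tool_version: str
    model_id: str
    prompt_set_sha256: str
    retrieval_index_sha256: str
    source_corpus_sha256: str
    environment: str
    qualified_use_cases: tuple
    excluded_use_cases: tuple
    review_workflow: str

    def fingerprint(self) -> str:
        """Stable hash over every field, so any change is visible."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# All values below are placeholders for illustration only.
baseline = QualifiedConfiguration(
    tool_version="1.4.0",
    model_id="example-model-2024-06",
    prompt_set_sha256=sha256_text("prompt set v7"),
    retrieval_index_sha256=sha256_text("index build 12"),
    source_corpus_sha256=sha256_text("corpus snapshot A"),
    environment="container image example/tool:1.4.0",
    qualified_use_cases=("clause gap analysis",),
    excluded_use_cases=("ASIL determination",),
    review_workflow="independent reviewer sign-off",
)

candidate = replace(baseline, prompt_set_sha256=sha256_text("prompt set v8"))
print(baseline.fingerprint() == candidate.fingerprint())  # False
```

The qualification report can then cite one fingerprint instead of a loose description, and any run whose configuration hashes differently is, by construction, outside the qualified scope until someone has assessed the change.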

The development-process route

For TCL3, Clause 11 allows qualification by evaluating the tool development process. In practical terms, the tool developer has to show that a suitable software development process was applied.

The standard also acknowledges an important reality: no safety standard is fully applicable to software tools. A relevant subset has to be selected.

That matters for AI tools.

A non-deterministic AI compliance evaluator is not an embedded ECU. It does not have a vehicle-level safe state. It does not have a hardware-software interface (HSI) in the normal ECU sense. It may include model selection, prompt control, retrieval control, evaluation datasets, guardrails, review workflows, logging, and monitoring of non-deterministic behaviour. Applying every embedded-software expectation literally would create paperwork, not confidence.

But selecting and adapting relevant requirements from Part 6 and Part 2 can create a defensible development framework.

Part 6 gives the lifecycle logic: requirements, architecture, implementation, verification, integration, validation, and control of the development environment.

Part 2 gives the management logic: responsibilities, competence, planning, configuration discipline, anomaly handling, confirmation measures, and assessment expectations.

Together, they let the team show that the AI tool was engineered instead of improvised.

Why validation alone is not enough

Choosing the development-process route does not make validation optional.

Validation is still necessary. It shows whether the tool behaves acceptably for the intended purpose.

But for a TCL3 non-deterministic AI tool, a validation set by itself rarely closes the argument. The tool can change because the model changes, the prompt changes, the retrieval index changes, the source corpus changes, or the user workflow changes. Even if the code stays the same, the behaviour may move.

So the stronger argument combines two things:

process evidence: the tool was developed and controlled using a selected safety-standard framework; and
validation evidence: the tool was tested against representative use cases, expected failures, edge cases, and known limitations.

The process evidence does not replace validation. It explains why the tool should be stable, controlled, reviewable, and maintainable. The validation evidence shows that this is true for the qualified use.
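Because the tool is non-deterministic, a validation result is better stated as a pass rate with an uncertainty bound than as a single pass or fail. The sketch below uses the Wilson score interval, a standard statistical technique that I am swapping in for illustration. Neither ISO 26262 nor TR 5469 prescribes it, and the 0.95 acceptance threshold is a made-up example value.

```python
import math


def wilson_lower_bound(passes: int, runs: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a pass rate.

    z = 1.96 corresponds to a two-sided 95 % interval.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= passes <= runs:
        raise ValueError("passes must be between 0 and runs")
    p_hat = passes / runs
    centre = p_hat + z * z / (2 * runs)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / runs + z * z / (4 * runs * runs))
    return (centre - margin) / (1 + z * z / runs)


ACCEPTANCE_THRESHOLD = 0.95  # hypothetical criterion from the qualification plan

for passes, runs in [(100, 100), (20, 20)]:
    bound = wilson_lower_bound(passes, runs)
    verdict = "accept" if bound >= ACCEPTANCE_THRESHOLD else "not enough evidence"
    print(f"{passes}/{runs}: lower bound {bound:.3f} -> {verdict}")
```

The point of the example is that twenty clean runs do not support the same claim as a hundred clean runs. A bound like this only covers the cases that were sampled, so it complements the representativeness argument rather than replacing it.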

Where Clause 12 helps, and where it does not

Clause 12 qualifies software components for reuse. It does not qualify software tools.

So the claim should not be:

“The AI tool is qualified according to ISO 26262-8:12.”

That is too easy to challenge.

Clause 12 is useful in two narrower ways.

First, it can be used directly for reused or pre-existing software components inside the AI tool. That may include an ingestion pipeline, OCR component, retrieval/indexing component, embedding model, LLM interface, deterministic classifier, report generator, or review-workflow component. The point is to show that the component is suitable for its intended use inside the qualified tool package.

Second, Clause 12 provides a useful evidence pattern when selecting the relevant subset of Part 6 and Part 2. It asks practical questions that also matter for AI tool qualification: What exactly is being used? For what intended use? Against which requirements? Under which configuration? With what evidence that the result remains valid?

Used this way, Clause 12 is not the authority for the tool qualification claim. Clause 11 is. Clause 12 is a pattern for organizing evidence.

Clause 12 evidence question	AI tool qualification implication
What is the software component?	Identify the AI tool package, model, prompts, retrieval index, source corpus, libraries, guardrails, and configuration.
What is the intended use?	Define qualified use cases, input types, outputs, users, assumptions, exclusions, review expectations, and downstream checks.
Does it meet its requirements?	Show AI tool requirements, architecture, implementation controls, verification evidence, and regression evidence.
Is it suitable for the intended use?	Validate representative safety-engineering tasks and known failure modes.
Is the configuration controlled?	Control versions, prompts, retrieval data, dependencies, environment, releases, anomalies, and changes.
Are the qualification results still valid?	Review whether the evidence remains valid for the project context, model behaviour, corpus, workflow, and TCL claim.

Figure: A defensible AI tool argument connects controlled inputs, retrieval, model behaviour, human review, regression evidence, and traceability.

What TR 5469 adds

ISO/IEC TR 5469:2024 is useful, but it has to be used carefully. It is informative guidance on AI and functional safety. It is not a new normative qualification route for automotive tools.

Its value is that it names AI-specific issues that ordinary ISO 26262 software lifecycle language does not always make explicit enough: probabilistic behaviour, data and concept drift, explainability, adversarial inputs, monitoring, lifecycle artefacts, and the limits of verification and validation for AI systems.

In this argument, TR 5469 sits underneath the tailoring rationale. It supports the decision to add controls such as:

pinned model and prompt configurations,
controlled retrieval indexes and source corpora,
documented validation-data relevance,
representative evaluation sets,
drift monitoring,
human-review constraints,
explainability and citation expectations,
incident feedback into regression tests, and
periodic re-validation where the model or corpus can change.

That is the right level for TR 5469. It strengthens the AI-specific reasoning without pretending to replace Clause 11.
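A citation expectation, for example, can be turned into an automated check that runs before human review. This is a minimal sketch under stated assumptions: the bracketed `[SRC-001]` citation format and the source identifiers are hypothetical, and the check only confirms that a cited source exists in the controlled corpus, not that the source actually supports the statement.

```python
import re


def unsupported_citations(answer: str, corpus_ids: set) -> list:
    """Return cited identifiers that are not in the controlled source corpus."""
    cited = re.findall(r"\[([A-Za-z0-9_.:-]+)\]", answer)
    return sorted({identifier for identifier in cited if identifier not in corpus_ids})


controlled_corpus = {"SRC-001", "SRC-002"}  # hypothetical corpus index
answer = "The requirement is covered [SRC-001] and confirmed by [SRC-999]."

print(unsupported_citations(answer, controlled_corpus))  # ['SRC-999']
```

An output with unsupported citations would be blocked or flagged for the reviewer. Whether the cited passage really backs the claim remains a human-review task in this sketch.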

The claim I would defend

A defensible claim would read something like this:

The AI tool is qualified under ISO 26262-8:11 for the defined use cases, configuration, environment, and human-review assumptions. For the TCL3 development-process route, a selected subset of ISO 26262 Part 6 and Part 2 was used as the safety-standard framework. ISO 26262-8:12 was used as an evidence pattern for the tailoring rationale and, where applicable, for reused software components. ISO/IEC TR 5469:2024 was used as informative AI-specific guidance. The selected subset, limitations, validation evidence, and conditions of use are documented in the tool qualification report.

That claim says what is qualified. It also says what is not being claimed.

It does not say the AI tool is generally “ISO 26262 compliant.” It does not say Clause 12 qualifies the tool. It does not say TR 5469 creates new requirements. It does not qualify every future model, prompt, corpus, environment, or use case.

That discipline is what makes the argument assessable.

What the dossier should contain

If I were reviewing this as a safety assessor, I would expect the dossier to contain at least the following evidence.

Area	Work product	Why it matters
Frame	AI tool criteria evaluation report	Defines tool impact, error detection, TCL, intended use, inputs, outputs, environment, constraints, and qualified configuration.
Plan	Tool qualification plan	Explains the selected Clause 11 method, the safety-standard subset, the tailoring rationale, and how the evidence will close the confidence argument.
Management	AI tool safety management framework	Defines responsibilities, competence, review flow, configuration/change management, anomaly handling, release criteria, independence expectations, and confirmation measures.
Tailoring	Safety-standard applicability matrix	Maps selected Part 6 and Part 2 requirements to evidence. Marks each requirement as applicable, adapted, not applicable, or covered elsewhere. Gives rationale for exclusions.
Requirements	AI tool requirements specification	Defines functions, safety-relevant behaviours, interfaces, constraints, expected outputs, error handling, logging, misuse cases, retrieval limits, citation expectations, confidence handling, and human-review assumptions.
Architecture	AI tool architecture and safety mechanisms	Shows where erroneous or unsupported AI output can be introduced, prevented, detected, reviewed, logged, or contained.
Verification	Verification evidence	Provides reviews, unit tests, integration tests, regression tests, static checks, requirements coverage, anomaly records, and corrective-action evidence.
Validation	AI tool validation evidence	Shows the tool works for the qualified purpose using representative cases, edge cases, known failure modes, and realistic project inputs.
Components	AI component qualification evidence	Provides Clause 12-style evidence for reused components whose behaviour affects the qualified output.
AI tailoring	TR 5469 tailoring note	Maps AI-specific controls to topics such as lifecycle artefacts, V&V limits, drift, explainability, monitoring, and incident feedback.
Closure	Qualification report	States the qualified version, use cases, TCL, methods, evidence summary, validation result, limitations, residual risks, and conditions of use.

The high-effort items are usually the qualification plan, applicability matrix, requirements specification, validation evidence, and component evidence. Those are also the places where assessors tend to find gaps.
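The applicability matrix is a good candidate for a machine-checkable format, because the most common gaps are mechanical: a selected requirement with no evidence, or a tailored requirement with no rationale. The sketch below assumes a simple row structure. The requirement identifiers are placeholders rather than real clause numbers, and requiring a rationale for every status other than "applicable" is my own stricter reading of "gives rationale for exclusions".

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    APPLICABLE = "applicable"
    ADAPTED = "adapted"
    NOT_APPLICABLE = "not applicable"
    COVERED_ELSEWHERE = "covered elsewhere"


@dataclass
class MatrixRow:
    requirement_id: str
    status: Status
    evidence: list = field(default_factory=list)
    rationale: str = ""


def find_gaps(rows: list) -> list:
    """List (requirement_id, problem) pairs an assessor would likely raise."""
    gaps = []
    for row in rows:
        needs_evidence = row.status is not Status.NOT_APPLICABLE
        if needs_evidence and not row.evidence:
            gaps.append((row.requirement_id, "no evidence linked"))
        if row.status is not Status.APPLICABLE and not row.rationale.strip():
            gaps.append((row.requirement_id, "no rationale for tailoring"))
    return gaps


# Placeholder rows for illustration only.
matrix = [
    MatrixRow("P6-EXAMPLE-1", Status.APPLICABLE, ["TEST-REPORT-01"]),
    MatrixRow("P6-EXAMPLE-2", Status.ADAPTED, [], "Adapted for prompt control"),
    MatrixRow("P6-EXAMPLE-3", Status.NOT_APPLICABLE),
    MatrixRow("P2-EXAMPLE-1", Status.COVERED_ELSEWHERE, ["QM-PLAN-02"], "Covered by the organisation's QMS"),
]

for requirement_id, problem in find_gaps(matrix):
    print(requirement_id, "->", problem)
```

A check like this does not judge whether the rationale is convincing. It only makes sure that no row reaches the assessor empty.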

What assessors will challenge

The hardest part is not naming the standards. It is proving that the selected evidence is enough for the intended use.

Common weak spots include:

an intended-use statement that is too broad,
a TCL argument that assumes human review catches everything,
missing traceability from AI tool requirements to verification and validation,
validation data that does not represent real project inputs,
uncontrolled prompt or retrieval changes,
model-version assumptions that are not enforceable,
component limitations that are not carried into the tool-level argument,
review workflows that allow AI output into the safety case without sufficient checks, and
a qualification claim that is broader than the evidence supports.

A good qualification report makes these limits visible. It does not hide them in optimistic language.

Align early with the assessor and customer

Do not wait until the end to reveal the tailoring strategy.

A Tier 1 customer, OEM, or TÜV assessor may accept the structure, but still expect more explicit traceability, more independence in verification, a stronger validation set, or clearer configuration controls. Those expectations should be discovered while the plan can still change.

The qualification plan should make the tailoring visible early:

which Part 6 requirements are selected,
which Part 2 requirements are selected,
which requirements are adapted,
which requirements are excluded,
which Clause 12-style component arguments are used,
which validation activities close the remaining confidence gap, and
which assumptions limit the qualified use.

If someone challenges the Clause 12 analogy, do not defend it as a separate route. Bring the discussion back to Clause 11. The selected method is evaluation of the tool development process. Clause 12 only helps structure the evidence selection.

Maintenance and re-qualification triggers

Qualification is not a badge you attach once and forget.

For non-deterministic AI tools, the qualification argument has to define which changes require impact analysis, partial re-verification, or full re-qualification. The obvious triggers are tool-version changes, dependency changes, execution-environment changes, and workflow changes. AI adds more:

model changes,
prompt changes,
retrieval-index changes,
embedding model changes,
source-corpus changes,
output-schema changes,
guardrail changes,
validation-set changes,
vendor API changes, and
changes to the human-review process.

The practical rule is simple: if a change can alter a safety-relevant output or the confidence argument for that output, it needs a controlled impact analysis before the qualified-use claim can be carried forward.
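That rule can be enforced mechanically at release time by comparing the candidate configuration with the qualified one and refusing to proceed while any changed item lacks an impact analysis. The following is a minimal sketch; the dictionary keys and values are hypothetical, and a completed impact analysis may of course still conclude that re-verification or re-qualification is needed.

```python
def changed_items(qualified: dict, candidate: dict) -> list:
    """Names of configuration items that differ between two configurations."""
    keys = sorted(set(qualified) | set(candidate))
    return [key for key in keys if qualified.get(key) != candidate.get(key)]


def open_impact_items(qualified: dict, candidate: dict, analysed=frozenset()) -> list:
    """Changed items for which no impact analysis has been recorded yet."""
    return [key for key in changed_items(qualified, candidate) if key not in analysed]


# Placeholder values for illustration only.
qualified = {
    "model_id": "example-model-2024-06",
    "prompt_set": "v7",
    "retrieval_index": "idx-12",
    "output_schema": "v3",
}
candidate = dict(qualified, prompt_set="v8", retrieval_index="idx-13")

print(changed_items(qualified, candidate))
# ['prompt_set', 'retrieval_index']
print(open_impact_items(qualified, candidate, analysed={"prompt_set"}))
# ['retrieval_index']
```

In this sketch a release gate would block the candidate until the open list is empty, which keeps silent prompt or index changes from reaching a project under the old qualification claim.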

Figure: For non-deterministic AI tools, qualification has to be maintained through change control, impact analysis, re-validation, and updated evidence.

Toolchain versioning needs special care when the underlying LLM is provided through an external API. If the vendor can update model behaviour silently, or without a version signal that is meaningful for qualification, the qualified configuration needs compensating controls. Examples include pinned model identifiers where available, vendor change notifications, regression monitoring, periodic re-validation, or contractual/API constraints on model updates.
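Regression monitoring against a fixed golden set is one of the simpler compensating controls to automate. The sketch below is deliberately vendor-neutral: `run_tool` stands for whatever callable wraps the qualified tool, the golden set and the 0.02 tolerance are invented for illustration, and exact-match comparison assumes the output is a short classification label rather than free text.

```python
from typing import Callable


def agreement_rate(run_tool: Callable[[str], str], golden: dict, repeats: int = 3) -> float:
    """Fraction of repeated runs whose output matches the expected label."""
    total = 0
    agree = 0
    for prompt, expected in golden.items():
        for _ in range(repeats):
            total += 1
            if run_tool(prompt).strip().lower() == expected.strip().lower():
                agree += 1
    if total == 0:
        raise ValueError("golden set is empty or repeats is zero")
    return agree / total


def drift_alarm(baseline_rate: float, current_rate: float, tolerance: float = 0.02) -> bool:
    """True when agreement has dropped by more than the allowed tolerance."""
    return (baseline_rate - current_rate) > tolerance


# Hypothetical golden set and a stub standing in for the real tool.
golden_set = {
    "Is requirement R1 traceable to a test?": "yes",
    "Is requirement R2 traceable to a test?": "no",
}


def stub_tool(prompt: str) -> str:
    return "Yes"  # a drifted tool that now answers "yes" to everything


current = agreement_rate(stub_tool, golden_set, repeats=2)
print(current, drift_alarm(baseline_rate=1.0, current_rate=current))  # 0.5 True
```

Each prompt is run several times because a single run says little about a non-deterministic tool. An alarm here would not itself invalidate the qualification; it would trigger the impact analysis described above.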

Ownership should also be explicit. The tool developer usually owns the generic impact analysis for changes to the qualified tool package. The tool user owns the project-specific decision that the tool remains valid for the intended use. For AI tools, both sides need to cooperate because model, prompt, corpus, and workflow changes can sit on either side of that boundary.

Bottom line

A TCL3 qualification argument for a non-deterministic AI tool can be defensible, but only if the claim is narrow.

Qualify the tool version, configuration, use cases, environment, model and retrieval assumptions, and human-review workflow. Show the selected development process. Show validation evidence. Define what changes trigger impact analysis or re-qualification.

Do not claim that the AI tool is generically ISO 26262 compliant. Do not present Clause 12 as the tool qualification route. Do not use TR 5469 as if it creates new automotive tool-qualification requirements.

Use Clause 11 as the frame. Use Part 6 and Part 2 for the selected development and management evidence. Use Clause 12 where it helps structure the evidence and qualify reused components. Use TR 5469 to make the AI-specific controls explicit.

Scoped, tailored, traceable, reviewable, and maintained under change control.