The Limits of Pure Hierarchical Agent Orchestration

Why evidence-based safety assessment needs controlled cross-checking between specialist agents.

When the lead agent becomes a lossy compression layer

Most multi-agent engineering systems begin with hierarchy. A lead agent receives the user request, decomposes the work, assigns tasks to specialist agents, gathers their outputs, and writes the final report. That pattern is useful. For a Digital Safety Assessor, it is often the right starting point.

By pure hierarchy, I mean a workflow where all specialist-agent interaction is routed through the lead agent.

The lead agent owns scope, sequencing, conflict resolution, user-facing communication, and the final report. The specialist agents perform focused checks: standards expectations, evidence inventory, traceability, verification evidence, and safety-case structure.

The limitation appears when findings depend on relationships across specialist domains. Peer-to-peer checks between specialists can address this, but the lead remains the final decision-maker: such checks should inform the lead assessor, and they should not allow specialist agents to make final safety conclusions independently.

If every cross-domain question must pass through the lead, the lead can become a lossy compression layer: compressing detailed evidence, exceptions, uncertainty, and specialist reasoning into a clean but incomplete synthesis.

The goal is not free-form communication between agents, but logged, scoped, evidence-specific verification requests.

Where hierarchy loses important findings

Consider a simplified assessment involving one software safety requirement and one linked test:

Traceability Agent:
SSR-014 links to TC-088.

Test Agent:
TC-088 exists and passed.

Safety Case Agent:
The safety case claims all safety requirements are verified.

Individually, these statements may look acceptable. But the important assessment question is more precise:

Does TC-088 actually verify the behavior required by SSR-014?

The difference matters. A trace can exist, and a test can pass, while the verification remains inadequate.

SSR-014 requires safe-state transition within 100 ms.
TC-088 only checks diagnostic flag propagation.
The trace exists, but verification adequacy is insufficient.
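
The distinction between "a trace exists", "the test passed", and "the test verifies the required behavior" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Requirement and LinkedTest types, the behavior tags, and the result labels are hypothetical, and a real adequacy judgment would rest on a review of the test procedure rather than on set comparison.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Requirement:
    req_id: str
    required_behaviors: FrozenSet[str]  # hypothetical behavior tags


@dataclass(frozen=True)
class LinkedTest:
    test_id: str
    passed: bool
    verified_behaviors: FrozenSet[str]  # what the test actually exercises


def assess_link(req: Requirement, test: LinkedTest) -> str:
    """Classify a requirement-to-test link; the trace itself is assumed to exist."""
    if not test.passed:
        return "test_failed"
    # Behaviors the requirement demands that the test never exercises.
    missing = req.required_behaviors - test.verified_behaviors
    if missing:
        return "trace_exists_but_verification_inadequate"
    return "verified"


ssr_014 = Requirement("SSR-014", frozenset({"safe_state_transition_within_100_ms"}))
tc_088 = LinkedTest("TC-088", True, frozenset({"diagnostic_flag_propagation"}))
print(assess_link(ssr_014, tc_088))  # trace_exists_but_verification_inadequate
```

The point of the sketch is that "linked" and "passed" are inputs to the adequacy question, not answers to it.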

This is not a formatting issue in a traceability matrix. It is a materially different safety-assessment result. A controlled peer-to-peer cross-check makes the dependency explicit:

Traceability Agent → Test Agent:
Does TC-088 verify the 100 ms safe-state transition requirement in SSR-014,
or only related diagnostic behavior?

Operationally, each cross-check should include the source claim, referenced artifact, question, expected response format, confidence or uncertainty statement, and log entry. In a safety context, the cross-check itself should become part of the assessment record.
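
One possible shape for such a request is sketched below. The field names, the record helper, and the JSON-lines style log are assumptions made for illustration, not a prescribed schema; the only elements taken from the text are the listed contents of a cross-check and the requirement that it be logged.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class CrossCheckRequest:
    requester: str                      # specialist raising the question
    responder: str                      # specialist expected to answer
    source_claim: str                   # the claim that triggered the check
    referenced_artifacts: Tuple[str, ...]
    question: str
    expected_response_format: str
    uncertainty: str                    # requester's confidence or uncertainty statement
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record(request: CrossCheckRequest, assessment_log: List[str]) -> dict:
    """Append the request to the assessment log as one JSON line and return the entry."""
    entry = {"type": "cross_check_request", **asdict(request)}
    assessment_log.append(json.dumps(entry, sort_keys=True))
    return entry


assessment_log: List[str] = []
request = CrossCheckRequest(
    requester="Traceability Agent",
    responder="Test Agent",
    source_claim="SSR-014 links to TC-088.",
    referenced_artifacts=("SSR-014", "TC-088"),
    question="Does TC-088 verify the 100 ms safe-state transition requirement "
             "in SSR-014, or only related diagnostic behavior?",
    expected_response_format="verdict plus evidence reference",
    uncertainty="Trace link confirmed; test content not reviewed by requester.",
)
record(request, assessment_log)
```

Making the request immutable and writing it to the log before it is answered keeps the question itself, not just the answer, in the assessment record.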

The same pattern appears when an architecture document is from the wrong baseline, when a safety-case claim overstates verification coverage, or when a standards expectation depends on evidence that has not actually been reviewed. The important finding emerges from the relationship between specialist outputs, not from any one output alone.

Why clean reports are not enough

A Digital Safety Assessor is not a generic summarization tool. It should support engineering judgment by making the relationship between claims, artifacts, assumptions, and findings explicit.

A useful assessment output should answer:

What was checked?
Which artifacts were used?
Which claims passed, failed, or remained inconclusive?
Which findings are evidence-backed?
Which assumptions or human judgments remain open?
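
As a rough sketch, the questions above can be mapped onto a small output structure. The class names, fields, and outcome labels here are hypothetical and chosen only to mirror the list; they are not taken from any existing tool.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClaimResult:
    claim: str
    outcome: str            # "passed", "failed", or "inconclusive"
    artifacts: List[str]    # artifacts actually used for this check
    evidence_backed: bool   # False if the finding rests on assumption or judgment


@dataclass
class AssessmentOutput:
    scope: str                                       # what was checked
    results: List[ClaimResult] = field(default_factory=list)
    open_assumptions: List[str] = field(default_factory=list)

    def artifacts_used(self) -> List[str]:
        return sorted({a for r in self.results for a in r.artifacts})

    def outcome_counts(self) -> Dict[str, int]:
        return dict(Counter(r.outcome for r in self.results))

    def unbacked_findings(self) -> List[str]:
        return [r.claim for r in self.results if not r.evidence_backed]


output = AssessmentOutput(
    scope="Verification of software safety requirements",
    results=[
        ClaimResult("SSR-014 is traced to a test", "passed",
                    ["SSR-014", "TC-088"], True),
        ClaimResult("TC-088 verifies SSR-014", "failed",
                    ["SSR-014", "TC-088"], True),
    ],
    open_assumptions=["Test environment matches the assessed baseline."],
)
print(output.outcome_counts())  # {'passed': 1, 'failed': 1}
```

A structure like this does not make the report correct, but it does make it harder to write a conclusion without stating what it rests on.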

A hierarchy-only workflow can produce a clean report. But a clean report is not necessarily a reliable one. Consider how the following checks interact:

Requirement claim:
All software safety requirements are verified.

Traceability check:
Most requirements have linked tests.

Test adequacy check:
Some linked tests do not actually verify the required behavior.

Safety case check:
The safety case conclusion is therefore too strong.

The problem is not that hierarchy is wrong. The problem is that hierarchy can hide these interactions if they are not explicitly designed into the workflow.

Benefits of hierarchy

Hierarchy is still needed. A lead assessor or orchestrator gives the system:

a single owner of assessment scope
predictable sequencing
centralized conflict resolution
a clear final report owner
simpler human review
simpler logging
clearer access control
more predictable cost and latency

For simple tasks, such as reviewing one safety plan against expected sections, a lead agent plus one standards/checklist agent may be enough. There is no need to create a multi-agent communication network for every task. The issue arises when assessment findings depend on relationships across artifacts and specialist domains. That is common in safety engineering.

Failure modes of hierarchy-only systems

1. Lossy synthesis

The lead receives specialist summaries, not always the reasoning and exceptions behind them. “Requirements are mostly traced” can become an overstated conclusion if the exceptions are not inspected.
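
One way to reduce this loss, sketched here under assumptions, is to have specialists return exceptions as structured data and to derive the lead's wording from them rather than from the headline. The SpecialistSummary type, the lead_statement helper, and the item count of 3 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecialistSummary:
    agent: str
    headline: str                 # the fluent one-line summary
    total_items: int
    exceptions: List[str] = field(default_factory=list)  # items the headline glosses over


def lead_statement(summary: SpecialistSummary) -> str:
    """Build the lead's wording from counts and exceptions, not from the headline."""
    if not summary.exceptions:
        return f"{summary.agent}: all {summary.total_items} items satisfied."
    covered = summary.total_items - len(summary.exceptions)
    listed = ", ".join(summary.exceptions)
    return (f"{summary.agent}: {covered} of {summary.total_items} items satisfied; "
            f"exceptions: {listed}.")


summary = SpecialistSummary(
    agent="Traceability Agent",
    headline="Requirements are mostly traced",
    total_items=3,                # hypothetical count for illustration
    exceptions=["SSR-014"],
)
print(lead_statement(summary))
# Traceability Agent: 2 of 3 items satisfied; exceptions: SSR-014.
```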

2. Bottlenecked cross-checking

All cross-domain interactions depend on the lead noticing the right follow-up question. The lead must also catch cases where the final wording is stronger than the evidence supports.

3. Context overload

If the lead owns all cross-domain reasoning, it must carry standards expectations, evidence, requirements, architecture, tests, traceability, safety-case arguments, and report structure. Larger context windows help with capacity, but they do not replace context design.

4. Overconfident final reports

A hierarchy-only system may smooth uncertainty into fluent prose. A safety assessment should distinguish verified evidence, partial support, missing evidence, conflicting evidence, assumptions, and areas that require human judgment.
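
These categories can be kept distinct in the data rather than left to prose. The sketch below is illustrative only: the EvidenceStatus enumeration mirrors the categories named above, while the two helper functions and their names are assumptions.

```python
from enum import Enum
from typing import Iterable


class EvidenceStatus(Enum):
    VERIFIED = "verified evidence"
    PARTIAL = "partial support"
    MISSING = "missing evidence"
    CONFLICTING = "conflicting evidence"
    ASSUMPTION = "assumption"
    HUMAN_JUDGMENT = "requires human judgment"


def may_claim_all_verified(statuses: Iterable[EvidenceStatus]) -> bool:
    """Allow an 'all verified' statement only if every item is VERIFIED."""
    items = list(statuses)
    return bool(items) and all(s is EvidenceStatus.VERIFIED for s in items)


def report_line(claim_id: str, status: EvidenceStatus) -> str:
    """Render the status explicitly so it cannot be smoothed away in the report."""
    return f"{claim_id}: {status.value}"


statuses = [EvidenceStatus.VERIFIED, EvidenceStatus.PARTIAL]
print(may_claim_all_verified(statuses))                  # False
print(report_line("SSR-014", EvidenceStatus.PARTIAL))    # SSR-014: partial support
```

An empty list deliberately does not count as "all verified", since no evidence is not the same as verified evidence.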

5. Weak challenge function

Safety assessment benefits from independent challenge. The same lead agent that coordinates and writes the report may not be the best place to independently challenge every conclusion in that report.

Practical takeaway

For the SSR-014 / TC-088 example, the lead assessor should not merely receive “linked test passed.” The workflow should preserve the more important question: whether the linked test actually verifies the required safe-state transition behavior.

The purpose is not to make agents chat. The purpose is to prevent important cross-domain findings from being lost in synthesis, while keeping the lead assessor in control of the workflow and the final conclusion.

Use hierarchy for control. Use controlled specialist-to-specialist checks for bounded, evidence-specific cross-checking.

Part 2 will look at the mechanics: who may request a cross-check, what the request must contain, how it is logged, how conflicts are escalated, and how the lead assessor keeps final control. In safety assessment, orchestration is not only about assigning work. It is about preserving the evidence relationships that make a conclusion trustworthy.

Implementation support: Quenos Technology helps organizations design practical Digital Safety Assessor workflows: lead-assessor control, specialist roles, evidence boundaries, review points, and integration with existing safety engineering processes. Learn more at quenos.technology.