AI Act: Strengthening Compliance Through Execution Understanding

An AI system can be compliant, audited, and validated before deployment… yet still become difficult to understand when an incident occurs in production.

The European framework for artificial intelligence, the AI Act, represents a major step forward. For high-risk AI systems, it structures requirements around risk management (Article 9), record-keeping (Article 12), transparency (Article 13), human oversight (Article 14), and accuracy, robustness and cybersecurity (Article 15).

This introduces a fundamental shift: compliance is no longer a static state, but a continuous process grounded in real-world use.

Article 9 requires a continuous, iterative risk management system across the entire lifecycle, including after deployment.
Article 14 requires effective human oversight, capable of detecting and correcting anomalies during operation.

The operational blind spot

In practice, a system can be:

  • formally compliant
  • properly documented
  • validated before deployment

…and yet, in production, it may still be difficult to:

  • observe it in real-world conditions
  • understand its decisions
  • diagnose its failures

However, several obligations under the AI Act depend directly on real execution:

  • human oversight must be demonstrable
  • risk management must rely on post-deployment data
  • traceability must remain usable over time

In other words:
compliance ultimately depends on real system behaviour in production.

The real problem: the incident

When an AI system:

  • produces an error
  • drifts
  • or generates an unexpected decision

the critical question becomes:

Can we reconstruct what actually happened?

Today, despite advances in observability (logs, traces, metrics), the answer is often partial.

Current tools help us understand:

  • what happened
  • where it happened

But far less:

  • why it happened
  • in which real context
  • with which complete causal chain

Thesis

Execution should not only be compliant or observable.
It must be reconstructible, explainable, and diagnosable.

Approach: OBELISK — Execution Evidence Layer

This gap led us to develop a layer that complements existing approaches.

Objective: move from system observation to causal reconstruction of behaviour in production.

What this changes in practice

Operational reality measurement
Capture what actually happens: accepted, modified, and rejected decisions.
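As a minimal sketch, assuming each AI suggestion and the operator's response are logged as a single event, capturing operational reality could start from a record like the one below. The names `DecisionOutcome`, `DecisionEvent` and `outcome_counts` are hypothetical and do not describe OBELISK's actual implementation.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class DecisionOutcome(Enum):
    # Hypothetical outcome categories mirroring the text: accepted, modified, rejected
    ACCEPTED = "accepted"
    MODIFIED = "modified"
    REJECTED = "rejected"


@dataclass(frozen=True)
class DecisionEvent:
    # Hypothetical record: one AI suggestion and what the operator did with it
    event_id: str
    suggestion: str      # what the AI system proposed
    outcome: DecisionOutcome
    final_value: str     # what was actually used in production


def outcome_counts(events: list[DecisionEvent]) -> dict[DecisionOutcome, int]:
    # Count each outcome; outcomes that never occurred are reported as 0
    counts = Counter(event.outcome for event in events)
    return {outcome: counts[outcome] for outcome in DecisionOutcome}
```

Keeping `final_value` next to `suggestion` is what makes a modification visible: the difference between the two is the operator's correction.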

Real friction analysis
Detect weak signals: corrections, rework, escalations, operator time*.
This quantifies the gap between expected performance and real usage.
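One way to quantify that gap, sketched under the assumption that each decision carries a few boolean weak signals plus the operator's handling time, is to compare the observed correction rate with the rate assumed during validation. `FrictionSignal`, `friction_report` and `expected_correction_rate` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class FrictionSignal:
    # Hypothetical per-decision weak signals captured in production
    corrected: bool           # operator edited the AI output
    reworked: bool            # the case had to be reopened later
    escalated: bool           # the case was handed to a human expert
    operator_seconds: float   # time the operator spent on the decision


def friction_report(signals: list[FrictionSignal],
                    expected_correction_rate: float) -> dict[str, float]:
    # Summarise weak signals and compare corrections with the rate assumed at validation
    if not signals:
        raise ValueError("no signals to analyse")
    n = len(signals)
    correction_rate = sum(s.corrected for s in signals) / n
    return {
        "correction_rate": correction_rate,
        "rework_rate": sum(s.reworked for s in signals) / n,
        "escalation_rate": sum(s.escalated for s in signals) / n,
        "mean_operator_seconds": mean(s.operator_seconds for s in signals),
        # Positive gap: operators correct the system more often than expected
        "correction_gap": correction_rate - expected_correction_rate,
    }
```

A positive `correction_gap` is exactly the kind of weak signal that stays invisible when only uptime and latency are monitored.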

Failure diagnosis
In case of incidents, reconstruct:

  • the full execution chain
  • intermediate decisions
  • real business context
  • the precise point of failure

This moves the analysis from correlation to actionable causal explanation.
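A minimal sketch of that reconstruction, assuming every execution step records the step whose output it consumed (similar to parent span IDs in distributed tracing): walk back from the step where the incident surfaced, then locate the earliest failing step. `Step`, `causal_chain` and `first_failure` are hypothetical, and the sketch ignores steps with several inputs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    # Hypothetical trace record for one step of an AI-assisted workflow
    step_id: str
    parent_id: Optional[str]  # step whose output fed this one; None for the entry point
    name: str
    ok: bool                  # whether the step's output passed its checks
    context: dict = field(default_factory=dict)  # business context captured at that step


def causal_chain(steps: dict[str, Step], incident_step_id: str) -> list[Step]:
    # Follow parent links from where the incident surfaced back to the entry point,
    # then reverse so the chain reads in execution order
    chain: list[Step] = []
    seen: set[str] = set()
    current: Optional[str] = incident_step_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in trace at step {current!r}")
        seen.add(current)
        step = steps[current]
        chain.append(step)
        current = step.parent_id
    chain.reverse()
    return chain


def first_failure(chain: list[Step]) -> Optional[Step]:
    # Earliest failing step in execution order: a candidate failure point, not proof of cause
    return next((step for step in chain if not step.ok), None)
```

The point of the design is that the step where an incident becomes visible (for example, the final recommendation) is often not where it started: an earlier retrieval or scoring step may already have failed.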

Performance validation
Claims become verifiable: productivity gains, error reduction, actual robustness.
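For instance, a claimed error reduction can be checked against measured error rates before and after deployment. This sketch assumes both periods count errors over comparable decision volumes; the function names are hypothetical, and it deliberately omits any statistical test, which a real validation on small samples would need.

```python
def error_rate(errors: int, total: int) -> float:
    # Fraction of decisions that turned out to be errors
    if total <= 0:
        raise ValueError("total must be positive")
    return errors / total


def verify_error_reduction(claimed_reduction: float,
                           baseline_errors: int, baseline_total: int,
                           observed_errors: int, observed_total: int) -> tuple[float, bool]:
    # Relative reduction = 1 - (observed error rate / baseline error rate)
    baseline = error_rate(baseline_errors, baseline_total)
    if baseline == 0:
        raise ValueError("baseline has no errors; a relative reduction is undefined")
    observed = error_rate(observed_errors, observed_total)
    measured = 1 - observed / baseline
    # The claim holds only if the measured reduction is at least the claimed one
    return measured, measured >= claimed_reduction
```

With 50 errors in 1,000 baseline decisions and 30 in 1,000 observed decisions, the measured reduction is 40%, so a 30% claim holds and a 50% claim does not.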

Context anchoring
Every observation is linked to:

  • a business workflow
  • a risk level
  • a real operational context

Example (instrumented real case)

In a typical use case, AI suggestions were handled as follows:

  • 72% of suggestions accepted
  • 18% modified
  • 6% rejected
  • 4% escalated to humans

These indicators allow:

  • measuring real human oversight
  • identifying where the system fails
  • understanding why it fails
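Because the four shares in this example sum to 100%, they can be read as mutually exclusive outcomes, and a single human-intervention indicator follows directly: 18% + 6% + 4% = 28% of suggestions were changed, refused, or taken over by a human. The sketch below computes it; the counts are hypothetical, chosen only to reproduce the percentages above, and `oversight_indicators` is an illustrative name.

```python
def oversight_indicators(counts: dict[str, int]) -> dict[str, float]:
    # Convert raw outcome counts into percentage shares (outcomes assumed mutually exclusive)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no decisions recorded")
    shares = {outcome: 100 * n / total for outcome, n in counts.items()}
    # Share of suggestions where a human changed, refused or took over the AI output
    shares["human_intervention"] = sum(
        shares.get(outcome, 0.0) for outcome in ("modified", "rejected", "escalated")
    )
    return shares


# Hypothetical counts matching the example's percentages
example = oversight_indicators({"accepted": 72, "modified": 18, "rejected": 6, "escalated": 4})
```

Here `example["human_intervention"]` evaluates to 28.0.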

Positioning

Existing tools enable tracing and monitoring.
This approach enables reconstruction, explanation, and diagnosis.

Link to the AI Act

This approach directly strengthens:

  • traceability (Article 12)
  • human oversight (Article 14)
  • post-market monitoring (Article 72)
  • continuous risk management (Article 9)

and most importantly:
the ability to analyse real incidents in production.

Insight

Future compliance will not be a certification.
It will be a demonstration of real-world system behaviour.

Conclusion

Without the ability to explain failures, some AI Act obligations remain difficult to verify in practice.

Europe does not only need compliant systems.
It needs systems whose behaviour is:

  • observable
  • understandable
  • analysable when failures occur
