An AI system can be compliant, audited, and validated before deployment… yet still become difficult to understand when an incident occurs in production.
The European framework for artificial intelligence represents a major step forward. It structures requirements around risk management (Article 9), record-keeping (Article 12), transparency (Article 13), human oversight (Article 14), and robustness (Article 15).
This introduces a fundamental shift: compliance is no longer a static state, but a continuous process grounded in real-world use.
Article 9 requires a continuous and iterative risk management system across the entire lifecycle, including post-deployment.
Article 14 requires effective human oversight capable of detecting and correcting anomalies in operation.
The operational blind spot
In practice, a system can be:
- formally compliant
- properly documented
- validated before deployment
…yet it can still be difficult to:
- observe it in real-world conditions
- understand its decisions
- diagnose it when it fails
However, several obligations under the AI Act depend directly on real execution:
- human oversight must be demonstrable
- risk management must rely on post-deployment data
- traceability must remain usable over time
In other words:
compliance ultimately depends on real system behaviour in production.
The real problem: the incident
When an AI system:
- produces an error
- drifts
- or generates an unexpected decision
the critical question becomes:
Can we reconstruct what actually happened?
Today, despite advances in observability (logs, traces, metrics), the answer is often partial.
Current tools help us understand:
- what happened
- where it happened
But far less:
- why it happened
- in which real context
- with which complete causal chain
Thesis
Execution should not only be compliant or observable.
It must be reconstructible, explainable, and diagnosable.
Approach: OBELISK — Execution Evidence Layer
This gap led us to develop a complementary layer to existing approaches.
Objective: move from system observation to causal reconstruction of behaviour in production.
What this changes in practice
Operational reality measurement
Capture what actually happens: accepted, modified, and rejected decisions.
Real friction analysis
Detect weak signals: corrections, rework, escalations, operator time.
This quantifies the gap between expected performance and real usage.
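As a rough sketch of how such friction signals could be aggregated, the Python snippet below computes a friction rate and a mean operator time from review records. The `Review` record, the `friction_summary` function, and the sample values are hypothetical; they illustrate the idea and are not the layer's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One operator review of a suggestion (hypothetical schema)."""
    outcome: str             # "accepted", "modified", "rejected" or "escalated"
    operator_seconds: float  # time the operator spent on this suggestion


def friction_summary(reviews):
    """Share of suggestions not accepted as-is, and mean operator time."""
    total = len(reviews)
    if total == 0:
        return {"friction_rate": 0.0, "mean_operator_seconds": 0.0}
    counts = Counter(r.outcome for r in reviews)
    # Assumption: anything other than plain acceptance counts as friction.
    friction = total - counts["accepted"]
    return {
        "friction_rate": friction / total,
        "mean_operator_seconds": sum(r.operator_seconds for r in reviews) / total,
    }


# Made-up sample values, for illustration only.
reviews = [
    Review("accepted", 4.0),
    Review("modified", 30.0),
    Review("accepted", 6.0),
    Review("escalated", 120.0),
]
summary = friction_summary(reviews)
print(summary)
```

Tracking operator time alongside outcomes matters here: a suggestion that is accepted only after a long review still signals friction that an acceptance rate alone would hide.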
Failure diagnosis
In case of incidents, reconstruct:
- the full execution chain
- intermediate decisions
- real business context
- the precise point of failure
This shifts from correlation to actionable causal analysis.
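One way to picture this reconstruction, assuming each recorded step keeps a link to the step that triggered it, is a walk along parent links from the final step back to the root. The `Step` schema, `reconstruct_chain`, `first_failure`, and the sample steps below are hypothetical names chosen for illustration, not the actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """One recorded step of an execution (hypothetical schema)."""
    step_id: str
    parent_id: Optional[str]  # step that triggered this one, None for the root
    action: str               # e.g. "retrieve_context", "generate_suggestion"
    ok: bool                  # whether the step completed as expected


def reconstruct_chain(steps: list[Step], leaf_id: str) -> list[Step]:
    """Walk parent links from a leaf step back to the root; return root first."""
    by_id = {s.step_id: s for s in steps}
    chain: list[Step] = []
    seen: set[str] = set()
    current: Optional[str] = leaf_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at step {current!r}")
        seen.add(current)
        step = by_id[current]  # a KeyError here means the record is incomplete
        chain.append(step)
        current = step.parent_id
    chain.reverse()
    return chain


def first_failure(chain: list[Step]) -> Optional[Step]:
    """Return the earliest step in the chain that did not complete as expected."""
    return next((s for s in chain if not s.ok), None)


# Made-up steps, for illustration only.
steps = [
    Step("s1", None, "receive_request", True),
    Step("s2", "s1", "retrieve_context", False),
    Step("s3", "s2", "generate_suggestion", True),
    Step("s4", "s3", "operator_review", True),
]
chain = reconstruct_chain(steps, "s4")
failure = first_failure(chain)
print([s.action for s in chain], "->", failure.action if failure else None)
```

In this sketch a missing parent raises an error rather than returning a partial chain, on the assumption that an incomplete record should be visible to the analyst instead of silently shortening the reconstruction.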
Performance validation
Claims become verifiable: productivity gains, error reduction, actual robustness.
Context anchoring
Every observation is linked to:
- a business workflow
- a risk level
- a real operational context
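A minimal sketch of what a context-anchored observation could look like as a record is shown below. The field names, the three-value `RiskLevel` scale, and the sample values are assumptions made for illustration; they do not reflect the AI Act's own risk categories or the layer's actual data model.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class RiskLevel(str, Enum):
    """Hypothetical internal risk scale (not the AI Act's classification)."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Observation:
    """An observation that always carries its business context (hypothetical)."""
    workflow: str          # business workflow the observation belongs to
    risk_level: RiskLevel  # risk level assigned to that workflow
    context: dict          # free-form operational context
    outcome: str           # e.g. "accepted", "modified", "rejected", "escalated"

    def to_json(self) -> str:
        # RiskLevel subclasses str, so json serialises it as its string value.
        return json.dumps(asdict(self), sort_keys=True)


# Made-up sample values, for illustration only.
obs = Observation(
    workflow="invoice_validation",
    risk_level=RiskLevel.HIGH,
    context={"channel": "back_office"},
    outcome="modified",
)
line = obs.to_json()
print(line)
```

Making the workflow and risk level required fields means an observation cannot be recorded without its context, which is the property this section describes.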
Example (instrumented real case)
On a typical use case:
- 72% of suggestions accepted
- 18% modified
- 6% rejected
- 4% escalated to humans
These indicators allow:
- measuring real human oversight
- identifying where the system fails
- understanding why it fails
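As a small worked example, the shares above can be combined into two derived indicators. The definitions of "human intervention" and "used" below are assumptions made for illustration, not definitions taken from the instrumented case.

```python
# Outcome shares taken from the figures above, in percent.
shares = {"accepted": 72, "modified": 18, "rejected": 6, "escalated": 4}

# The four outcomes cover all suggestions.
assert sum(shares.values()) == 100

# Assumption: any outcome other than plain acceptance counts as a human intervention.
intervention_share = sum(v for k, v in shares.items() if k != "accepted")

# Assumption: a suggestion is "used" if it was accepted as-is or after modification.
used_share = shares["accepted"] + shares["modified"]

print(f"human intervention: {intervention_share}%")  # 28%
print(f"used (as-is or modified): {used_share}%")    # 90%
```

Under these assumptions, a human intervened on 28% of suggestions, while 90% of suggestions were ultimately used in some form.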
Positioning
Existing tools enable tracing and monitoring.
This approach enables reconstruction, explanation, and diagnosis.
Link to the AI Act
This approach directly strengthens:
- traceability (Article 12)
- human oversight (Article 14)
- post-market monitoring (Article 72)
- continuous risk management (Article 9)
and most importantly:
the ability to analyse real incidents in production.
Insight
Future compliance will not be a certification.
It will be a demonstration of real-world system behaviour.
Conclusion
Without the ability to explain failures, some AI Act obligations remain difficult to verify in practice.
Europe does not only need compliant systems.
It needs systems whose behaviour is:
- observable
- understandable
- analysable when failures occur