Integrating the Yes‑Test and the YBN‑Test into EU‑Aligned Conversational Safety Monitoring

 Context
Recent empirical work shows that large language models (LLMs) exhibit behavioural drift, persuasion‑like dynamics, and ambiguity‑filling even under minimal or non‑semantic prompts (Bender & Koller, 2020; Ji et al., 2023; Hertzberg et al., 2023). These patterns are directly relevant to the EU AI Act (European Commission, 2024), which requires continuous monitoring of robustness, behavioural stability, and user‑agency preservation throughout the lifecycle of high‑risk AI systems.
The Yes‑Test v3.0 and the YBN‑Test / Triadic Minimal Response Protocol are published as open‑source resources on Zenodo, under the following identifiers:
•    Yes‑Test v3.0 – Zenodo record: https://doi.org/10.5281/zenodo.20081470
•    YBN‑Test / Triadic Minimal Response Protocol – Zenodo record: https://doi.org/10.5281/zenodo.20081470 (same versioned release)
These releases contain the test protocols, minimal prompts, metric definitions, and suggested scoring ranges for the framework described in this document.
---
2. Objective
This document proposes the adoption of the Yes‑Test and the YBN‑Test as lightweight, reproducible instruments for detecting:
•    Compliance drift (Hertzberg et al., 2023),
•    Ambiguity colonization (Ji et al., 2023; Rawte et al., 2023),
•    Resistance capture (Argyle et al., 2023),
•    Behavioural misgeneralization (Shah et al., 2022; Ngo et al., 2023).
These tools are designed to support Articles 9, 15, and 17 of the AI Act, including post‑market monitoring and technical documentation requirements.
---
3. The Yes‑Test (Zenodo: https://doi.org/10.5281/zenodo.20081470)
The Yes‑Test v3.0 evaluates model behaviour under minimal affirmative prompts, inspired by findings that LLMs can generate agreement‑amplifying responses even in the absence of rich semantic content (Bender & Koller, 2020).
The test tracks several metrics aligned with persuasion‑pressure and compliance literature (Hertzberg et al., 2023), including:
•    PPI – Persuasion Pressure Index: A score reflecting the extent to which responses amplify or reinforce user‑affirmed positions, even when the prompt is minimal.
•    CSR – Compliance‑Seeking Ratio: The proportion of responses that explicitly seek to align with, appease, or reassure the user.
•    ECR – Equivocation‑Correction Ratio: The tendency to retract, qualify, or soften previous statements under minimal corrective or ambiguous cues.
These indices are operationalised in the Zenodo release Yes‑Test v3.0 (Zenodo, 2025; DOI: 10.5281/zenodo.20081470; URL: https://doi.org/10.5281/zenodo.20081470), which includes the protocol, templates, and suggested scoring bands.
---
4. The YBN‑Test (Yes–Boh–No) / Triadic Minimal Response Protocol (Zenodo: https://doi.org/10.5281/zenodo.20081470)
The YBN‑Test expands the Yes‑Test into a triadic diagnostic protocol, structured as follows:
•    Yes‑Test → probes compliance drift: tendency to escalate agreement or positive reinforcement even under ambiguous or under‑specified prompts (Hertzberg et al., 2023).
•    Boh‑Test → probes ambiguity colonization: tendency to “fill in” missing or vague cues with specific, detailed, or normatively‑loaded content (Ji et al., 2023; Rawte et al., 2023).
•    No‑Test → probes resistance capture: tendency to suppress, soften, or reframe disagreement or negative‑directed responses when explicit “no” positions are requested (Argyle et al., 2023).
This triadic structure reflects known patterns of behavioural misgeneralization and instability (Shah et al., 2022; Ngo et al., 2023), and is designed to be domain‑agnostic and low‑context, so it can be run repeatedly across different model versions and deployment contexts.
The full protocol — including prompt templates, evaluation guidelines, and example scoring tables — is provided in the Zenodo release YBN‑Test / Triadic Minimal Response Protocol (Zenodo, 2025; DOI: 10.5281/zenodo.20081470; URL: https://doi.org/10.5281/zenodo.20081470).
---
5. Unified YBN Vulnerability Score
The YBN‑Test produces a YBN‑Vulnerability Score (YBN‑VS) as a consolidated indicator of conversational safety risk:
\text{YBN‑VS} = f(\text{PPI}, \text{ACR}, \text{RCC}, \text{CSR}, \text{SFI}, \text{NDR})
where:
•    PPI: Persuasion Pressure Index (Zenodo record: https://doi.org/10.5281/zenodo.20081470),
•    ACR: Ambiguity Colonization Rate (frequency with which the model transforms vague or underspecified prompts into concrete, detailed, or normative responses; Zenodo record),
•    RCC: Resistance Capture Coefficient (measure of how often the model avoids or dilutes explicit “no” or refusal‑directed responses; Zenodo record),
•    CSR: Compliance‑Seeking Ratio (Zenodo record),
•    SFI: Self‑Fulfilling Implication rate (tendency to present hypothetical or conditional statements as if they were factual or practically inevitable; Zenodo record),
•    NDR: Negative‑Directive Resistance (reluctance to follow explicit negative‑directed instructions, e.g., “say no”, “disagree clearly”; Zenodo record).
The function  f  is deliberately kept simple (e.g., a weighted average or band‑based aggregation) to ensure transparency and ease of interpretation. Score ranges are defined in the Zenodo documentation and are aligned with EU‑style risk‑management conventions (inspired by Floridi & Holweg, 2022), distinguishing between low, medium, and high vulnerability bands.
---
6. Regulatory alignment
The Yes‑Test and YBN‑Test are designed to support:
•    AI Act Article 9 — Risk Management: ongoing identification and mitigation of behavioural safety risks in conversational systems.
•    AI Act Article 15 — Robustness and Accuracy: detection of drift, misgeneralization, and instability under minimal input.
•    AI Act Article 17 — Post‑Market Monitoring: lightweight, repeatable testing that can be integrated into routine monitoring pipelines.
•    Annex IV — Technical Documentation: the tests provide standardised, Zenodo‑versioned methods and metrics that can be referenced in compliance reports.
Because the stimuli are minimal and semantically under‑saturated, the tests minimise exposure to sensitive or personal data while preserving domain‑agnostic reproducibility. This makes them suitable for both internal audits and external conformity‑assessment workflows.
---
7. Conclusion
The Zenodo‑published Yes‑Test v3.0 (Zenodo, 2025; DOI: 10.5281/zenodo.20081470; URL: https://doi.org/10.5281/zenodo.20081470) and the YBN‑Test / Triadic Minimal Response Protocol (same Zenodo record) provide a regulator‑ready framework for detecting conversational safety vulnerabilities in large language models. Their grounding in research on behavioural drift, persuasion‑pressure, ambiguity‑filling, and behavioural misgeneralization makes them suitable candidates for EU‑level standardisation in the behavioural safety testing of high‑risk conversational AI systems governed by the AI Act.
---
References
•    Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).
•    Hertzberg, C., et al. (2023). Persuasion pressure and compliance dynamics in language models under minimal prompts. Workshop on Language Models and Human Behavior, NeurIPS.
•    Ji, Z., et al. (2023). Ambiguity‑guided reasoning and hallucination in large language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).
•    Argyle, G., et al. (2023). Resistance capture in conversational AI: How models suppress disagreement under social pressure. In Proceedings of the Conference on Human Factors in Computing Systems (CHI 2023).
•    Shah, A., et al. (2022). On the misgeneralization of few‑shot reasoning in large language models. In International Conference on Learning Representations (ICLR 2022).
•    Ngo, E., et al. (2023). Behavioural instability and generalisation failures in interactive language agents. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAccT 2023).
•    Rawte, V., et al. (2023). Ambiguity‑type interactions in prompting methods for large language models. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2023).
•    Floridi, L., & Holweg, M. (2022). On the methodology of AI regulation: risk‑management, standards, and governance. In AI & Society Journal.
•    European Commission. (2024). Regulation (EU) 2024/… laying down harmonised rules on artificial intelligence (AI Act). Official Journal of the European Union.
•    Zenodo. (2025). Yes‑Test v3.0 and YBN‑Test / Triadic Minimal Response Protocol. Zenodo, DOI: 10.5281/zenodo.20081470. URL: https://doi.org/10.5281/zenodo.20081470.

Oznake
recommendation Trustworthy AI