Sandra Wachter and Brent Mittelstadt of the Oxford Internet Institute have just published a pivotal analysis of data protection law in the age of AI and Big Data. Below are the key policy take-aways; the full paper can be downloaded here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3248829
Based on the analysis of the legal status and protection of inferences, the following recommendations can be made for European policy:
1. Re-define the remit of data protection law
To ensure that data protection law guards against the novel risks introduced by Big Data analytics and algorithmic decision-making, the ECJ should re-define the law’s remit to include assessment of the accuracy of decision-making processes. Data protection is only one component of the right to privacy, which also includes rights to identity, reputation, self-presentation, and autonomy. Big Data analytics produces privacy-invasive, unpredictable, and counterintuitive inferences that threaten these components of privacy. In response, data subjects require greater control over how they are seen and assessed by automated systems.
2. Focus on how data is evaluated, not just collected
The categories of personal, sensitive, anonymous and non-personal data reflect characteristics of data at the point of collection, and determine the level of protection granted to input data. These characteristics can, however, change over time as data is used for different purposes. The German Federal Constitutional Court has previously argued that there is no such thing as “irrelevant data” when it comes to data protection law, as informational technologies might use it for purposes that affect the data subject. Seemingly neutral data can be turned into data that affects the right to privacy, or offers grounds for discrimination and other harms. Basing protections on these distinctions is thus ineffective. The damage that can be done by data does not depend on any of these categories, but rather on how it is used. Inferences or profiles drawn from any of these sources can be applied to, and harm, an individual or group. The belief that certain categories of data are fundamentally less harmful or risky than others is undermined by Big Data analytics. We recommend adopting the position taken by the Article 29 Working Party concerning the transformation of categories of data based upon processing purposes and impact. In future European policy-making and jurisprudence, levels of protection should be granted to data based primarily on their usage and impact, and secondarily on their source.
3. Do not focus only on the identifiability of data subjects
In order for data protection rights to apply, the data must relate to an identifiable individual. This is misguided, because the identifiability of data is fluid and can change over time, depending on linkage, re-identification attacks, and other technological progress. Companies can use anonymisation techniques to avoid many obligations under data protection law. Similarly, pseudonymisation techniques can potentially minimise the requirements to respect individual rights. In such cases, data controllers are not required to comply with requests from data subjects under Art 15-20 if they are “not in a position to identify” him or her, unless the data subject can provide additional information that allows the data to be re-identified (Art 11(2)). Together, these provisions could create an incentive to de-identify data in order to avoid compliance with individual rights, as has happened in the past. As we have argued above, inferences drawn from anonymous and non-personal data still pose risks for data subjects (see: Sections I.A and V). As a result, identifiability as a prerequisite to exercise individual rights creates a gap in the protection afforded to data subjects against inferential analytics. The potential and actual harm of inferential analytics should be reflected in future European policy-making and jurisprudence, regardless of whether the affected parties can be identified. This is not to suggest that data subjects should be granted rights over personal and anonymous data which has not been applied to them. Rather, improved channels of redress are required against models, profiles, and other background knowledge built from third-party and anonymous data and subsequently applied to identifiable individuals.
4. Justify data sources and intended inferences prior to deployment of inferential analytics at scale
Following the recommendation to implement a right to reasonable inferences, data controllers should proactively justify their design choices for ‘high-risk’ inferential analytics prior to widespread deployment. Inspiration can be drawn from German data protection law’s provisions on predictive assessments, such as credit scoring (see: Section V.A). When providing justification, controllers should pay particular attention to the following aspects of the source data and outputs of inferential analytics (see: Section V.A):
- The privacy invasiveness and the counter-intuitiveness of the data sources used to draw inferences, for example clicking behaviour, browsing behaviour, or mouse tracking.
- The aim of the inference to be drawn should justify the means or sources of data being used in terms of invasiveness. Inferring gambling or alcohol addiction to drive targeted advertising, for example, may actively harm the data subject.
- The usage of known proxy data (e.g. post code), or the intention to infer sensitive attributes (e.g. political views) from non-sensitive data.
- The relevance of the source data and inference to a particular processing purpose. For example, the relevance of Facebook profiles and friend networks to loan decisions.
- The statistical reliability of the methods used to draw inferences.
This is a preliminary list of potential topics and information types to be included in justification disclosures under the right to reasonable inferences. Extensive debate and further research are required to determine which information should be included in different sectors. The myriad applications of inferential analytics demand a sectoral approach.
5. Give data subjects the ability to challenge unreasonable inferences
In line with the implementation of a right to reasonable inferences, European policy-makers should grant data subjects a new right to challenge unreasonable high-risk inferences, which can also support challenges to subsequent decisions. Data subjects could raise an objection with the data controller on the grounds that the inference or its source data is irrelevant or unreliable (see: Section V.B). For verifiable inferences, the data subject can provide supplementary information to rectify an inaccurate inference. For non-verifiable and subjective inferences, supplementary information can also be provided in an attempt to convince the data controller to change its assessment. The right to rectification (Art 16 GDPR) may arguably already offer a remedy for non-verifiable and subjective inferences and opinions, depending upon one’s view of the necessity of verifiability in classifying inferences as personal data (see: Sections II.B, III, and IV.B). On this view, the right to reasonable inferences would embed an answer to the verifiability question in law, and thus strengthen data protection rights over inferences regardless of their verifiability and subjectivity. Similarly, it would complement the existing right to contest solely automated decisions and profiling with legal or similarly significant effects (Art 22(3) GDPR), and potentially transform it from a merely procedural tool into a meaningful accountability mechanism (see: Section IV.E). The intention of an ex-post right to contest unreasonable inferences is, however, not to guarantee that a data controller must change its inference or assessment at the data subject’s request. Rather, it aims to establish a dialogue in which data controllers share details of, and justifications for, the proposed inferential processing, which are open to comment and interrogation by data subjects (Art 22(3)). Such a dialogue should be fruitful for both sides, as accurate assessment is in the interest of each party.
To achieve this, it will be necessary to redefine the purpose of data protection law (as suggested above) to include justification of assessments. Strengthening the position of the data subject in relation to controllers is necessary to sufficiently mitigate the novel risks of inferential analytics (see: Section I.A). Given the novel risks of Big Data analytics and algorithmic decision-making, inferences cannot justifiably remain ‘economy class’ personal data. Data subjects’ privacy interests require renewed protection to restore the fair balance between individual, public, and commercial interests that inspires data protection law. The current remit of data protection law works well to govern input data, but fails to provide meaningful control over how personal data is evaluated. A right to reasonable inferences is a first step to correct this imbalance.