Machine Learning & EU Data Sharing Practices

New multidisciplinary Stanford Law School research article: ‘Machine Learning & EU Data Sharing Practices’, published in Stanford - Vienna Transatlantic Technology Law Forum, Transatlantic Antitrust and IPR Developments, Stanford University, Issue No. 1/2020.

The article connects the dots between intellectual property (IP) on data, data ownership and data protection (GDPR and FFD), in an easy-to-understand manner. It also provides AI & Data policy and regulatory recommendations to the EU legislature.

As we all know, machine learning & data science can help accelerate many aspects of the development of drugs, antibody prophylaxis, serology tests and vaccines.

In an era of exponential innovation, it is urgent and opportune that the EU Commission reform the Trade Secrets Directive, the Copyright Directive and the Database Directive with the data-driven economy in mind. A right to machine legibility that drastically improves access to data will greatly benefit the growth of the AI ecosystem.

Blog: https://airecht.nl/blog/2020/machine-learning-eu-data-sharing-practices

IP: Hand-labelled, annotated training datasets (corpora) are a sine qua non for supervised machine learning. But what about intellectual property (IP) and data protection? The article offers three solutions that address the input (training) data copyright clearance problem and create breathing room for AI developers. A right to machine legibility that drastically improves access to data will greatly benefit the growth of an AI ecosystem.

Autonomously generated non-personal data should fall into the public domain. The article argues that strengthening and articulating competition law is more opportune than extending IP rights. In an era of exponential innovation, it is urgent and opportune that the EU Commission reform the Trade Secrets Directive (TSD), the Copyright in the Digital Single Market Directive (CDSM) and the Database Directive (DD) with the data-driven economy in mind.

Both data sharing practices and AI regulation are high on the EU Commission’s agenda. The article discusses important European initiatives in the field of open data and data sharing.

Data protection: More and more datasets consist of both personal and non-personal machine-generated data. Both the General Data Protection Regulation (GDPR) and the Regulation on the free flow of non-personal data (FFD) apply to these ‘mixed datasets’. Based on these two Regulations, data can move freely within the European Union.

Besides the legal dimensions, the article describes the technical dimensions of data in machine learning. Most AI models need centralized data. Federated learning, in contrast, trains algorithms by bringing the code to the data, instead of bringing the data to the code. Because only model updates leave each data holder, the raw data itself does not have to be shared.
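To make "bringing the code to the data" concrete, here is a minimal sketch of federated averaging, one common federated learning scheme. The article does not name a specific algorithm, so that choice is an assumption, and every function name, parameter and the synthetic data below are hypothetical illustrations rather than anything taken from the article. Each client fits a tiny linear model on its own private records, and only the resulting weights travel to the coordinator.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one client's private data.

    The raw (x, y) records never leave this function; only the
    updated weights are returned to the coordinator.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Combine client weights, weighting each by its dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def train_federated(clients, rounds=100):
    """Coordinator loop: send the model out, average what comes back."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


def make_client(rng, n):
    """Hypothetical private dataset: y = 3x + 1 plus a little noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.01)) for x in xs]


rng = random.Random(0)
clients = [make_client(rng, n) for n in (20, 50, 30)]
w, b = train_federated(clients)
print(f"learned slope={w:.3f}, intercept={b:.3f}")
```

In this sketch the coordinator only ever sees weight pairs and dataset sizes. Whether such model updates themselves count as personal data under the GDPR is a separate legal question that the code does not settle.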

Society should actively shape technology for good. The alternative is that other societies, with perhaps different social norms and democratic standards, impose their values on us through the design of their technology.

SSRN version: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3409712