Text and data mining (TDM) in the EU: What you need to know about copyright law and data analysis

Maryna Manteghi

“Big Data” and TDM

We are producing more and more data every day, every minute, and every second. Much of this data is created through online platforms and services, such as social media (e.g., Facebook, Twitter, Instagram, and LinkedIn), e-commerce websites (e.g., Amazon and eBay), video streaming services (e.g., Netflix and YouTube), gaming platforms (e.g., Twitch, HitBox and Steam) and many others. The biggest problem is that most of this content (about 80%) is unstructured and unsearchable. To analyze this “raw” data, we need automated analytical techniques such as TDM (p.2), which can find “hidden” information (e.g., patterns and trends) within tons of digital data.

Everyone can benefit from TDM. For instance, it helps businesses improve products through social media analysis, pharmaceutical companies find drug side effects in reports, and video streaming services enhance movie recommendations based on users’ reviews. As a researcher, I also find it highly beneficial, as TDM enables and speeds up the analysis of tons of online publications.

Be careful: TDM might be illegal!

To mine data, a computer first needs to make copies of the collected material so that it can process and analyze it. But such material may be protected by copyright law if it is an original artistic, musical, literary or dramatic work created as a result of an intellectual process (e.g., books, music, photos and movies). Copying such works without authorization is not permitted (Article 2 of Directive 2001/29/EC). So, to avoid legal issues, we should either ask for permission or rely on copyright exceptions. The former solution is not practical, because getting permission from the copyright owners of all mined works would be difficult, costly and time-consuming. So, the latter option is better. I have found that in the EU there are two specific copyright exceptions which allow the use of protected works for TDM under certain conditions.

EU Rules on TDM

TDM in the EU is regulated by two exceptions in the Directive on copyright and related rights in the Digital Single Market (Directive (EU) 2019/790). The first exception (Article 3) benefits only research organizations and cultural heritage institutions which carry out TDM for the purpose of scientific research. The beneficiaries must also have so-called “lawful access” to the materials they want to analyze (e.g., through subscriptions or licenses). The second exception (Article 4) seems broader at first sight, as it covers all types of beneficiaries and purposes. But in reality, it is quite restrictive. To use this exception, users must not only have “lawful access” to the content but also ensure that copyright holders have not prohibited the use of their data for TDM, whether through licenses or, for example, in the terms and conditions of their websites. If the use is reserved, users would have to pay twice to be able to mine: first to acquire “lawful access” to the works, and a second time to read and analyze the digital data.
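
The conditions described above can be summarized as a small decision sketch in Python. This is only a simplified illustration of the reading given in this post, not a statement of how the Directive must be applied in a concrete case. The class `TdmUse`, its field names and the function `applicable_exception` are hypothetical names chosen for this example and do not appear in the Directive.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of a planned TDM use; all names are illustrative
# assumptions, not terms taken from the Directive.
@dataclass
class TdmUse:
    is_research_or_heritage_institution: bool
    purpose_is_scientific_research: bool
    has_lawful_access: bool
    rightholder_reserved_tdm: bool  # "opt-out" via licence or website terms


def applicable_exception(use: TdmUse) -> Optional[str]:
    """Return which exception may apply under the simplified reading above."""
    # Both exceptions presuppose lawful access to the mined material.
    if not use.has_lawful_access:
        return None
    # Article 3: limited beneficiaries and limited purpose.
    if (use.is_research_or_heritage_institution
            and use.purpose_is_scientific_research):
        return "Article 3"
    # Article 4: any beneficiary and purpose, unless the use is reserved.
    if not use.rightholder_reserved_tdm:
        return "Article 4"
    return None


university = TdmUse(True, True, True, rightholder_reserved_tdm=True)
startup = TdmUse(False, False, True, rightholder_reserved_tdm=True)

print(applicable_exception(university))  # Article 3
print(applicable_exception(startup))     # None
```

In this sketch, a reservation by the copyright holder blocks the startup but not the university, which mirrors the asymmetry between the two exceptions described above.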

Conclusion

TDM has great potential and can benefit all fields. But the law regulating TDM in the EU could be better drafted. It is important to make the copyright exceptions in Articles 3 and 4 more user-friendly. For instance, it may be beneficial to extend the scope of the former provision by including more beneficiaries and using the broader term “research” instead of “scientific research”. The latter exception needs more clarity on how the “opt-out” mechanism works. These changes would foster research and innovation.

Maryna Manteghi
The writer is a Doctoral Researcher at the Faculty of Law at the University of Turku, currently doing research on copyright and text and data mining. Her research interests include IPRs, AI and fundamental rights.

References:

Rayaprolu, “How Much Data Is Created Every Day in 2023?”, Feb 27, 2023, available at https://techjury.net/blog/how-much-data-is-created-every-day/#gref accessed May 3, 2023.

Jiawei Han, Jian Pei, Hanghang Tong, Data Mining: Concepts and Techniques, Elsevier 2012, available at http://myweb.sabanciuniv.edu/rdehkharghani/files/2016/02/The-Morgan-Kaufmann-Series-in-Data-Management-Systems-Jiawei-Han-Micheline-Kamber-Jian-Pei-Data-Mining.-Concepts-and-Techniques-3rd-Edition-Morgan-Kaufmann-2011.pdf accessed May 4 2023.

Sean Flynn, Lokesh Vyas, “Examples of Text and Data Mining Research Using Copyrighted Materials”, March 6 2023, available at https://copyrightblog.kluweriplaw.com/2023/03/06/examples-of-text-and-data-mining-research-using-copyrighted-materials/ accessed April 21 2023.

Dame Wendy Hall, Jérôme Pesenti, “Growing the Artificial Intelligence Industry in the UK”, available at https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/652097/Growing_the_artificial_intelligence_industry_in_the_UK.pdf accessed May 5 2023.

Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society, OJ L 167, 22.6.2001, p. 10–19 available at https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=celex%3A32001L0029, accessed May 3 2023.

Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC, OJ L 130, 17.5.2019, p. 92–125, available at https://eur-lex.europa.eu/eli/dir/2019/790/oj, accessed May 7 2023.
