Intellectual Property Watch

Original news and analysis on international IP policy

The Dilemma Of Fair Use And Expressive Machine Learning: An Interview With Ben Sobel

23/08/2017 by Intellectual Property Watch

The views expressed in this article are solely those of the authors and are not associated with Intellectual Property Watch. IP-Watch expressly disclaims and refuses any responsibility or liability for the content, style or form of any posts made to this forum, which remain solely the responsibility of their authors.

By Elise De Geyter for Intellectual Property Watch

Intellectual Property Watch recently conducted an interview with Ben Sobel, law and technology researcher, teacher, and fellow at Harvard University’s Berkman Klein Center for Internet and Society. Sobel has focused his research on copyright and the fair use doctrine, in particular in the context of artificial intelligence (AI). Below, he shares his views on expressive machine learning, “the fair use dilemma” and “Big Content versus Little Users”. Of note: the most pressing copyright question has to do with AI readers, not AI authors, according to Sobel.

Intellectual Property Watch (IPW): Copyright laws have always been challenged by the development of new technologies. To what extent are the challenges imposed by artificial intelligence different from the challenges faced by copyright before?

BEN SOBEL (SOBEL): Today’s AI is getting better at generating things that resemble works of human authorship: prose, images, music, movies, and the like. The question on everyone’s mind is, “who ought to own these?” “Can a computer program be an ‘author’ for the purposes of copyright law?” These are intriguing problems, but not novel ones: we ask similar questions whenever a new technology alters or attenuates an author’s role in the creative process. In the late 1800s, the United States Supreme Court considered whether a photograph could be a copyrightable work of authorship, rather than just a mechanical recording of facts about the world. In the 1980s, US Courts of Appeals evaluated who “authors” images of a video game that are generated by software in response to a player’s input. And IP scholars have been writing about how to treat output generated by an artificial intelligence for at least 30 years.

What’s overlooked is that before today’s AI can create anything, it has to learn from works made by human beings. This technique, called machine learning, lets computers learn to mimic or find patterns in input data. Training an AI typically requires making copies of the data on which it will be trained, and sometimes, copyrighted works are used to train AI without the permission of the rightsholders. This is presumptively copyright infringement unless it’s excused by something like fair use.

In some ways, machine learning looks a lot like other projects that involve large-scale, unauthorized reproduction of copyrighted works by computers. Projects like these—think image search engines and Google Books—have historically been deemed fair use in the United States. This is often because the uses are what some scholars call “non-expressive:” they analyse facts about works instead of using authors’ copyrightable expression.

I’m not certain that this rationale can protect emerging applications of machine learning. More than ever before, machine learning can take expression in works, precisely what copyright protects, and cobble it into something that companies hope to use in commerce. A good example is a recent Google project that taught an AI to write more conversationally by feeding it thousands of romance novels. Sometimes, these creative AI programs are even designed to compete with human creators at expressive tasks, like composing music or writing news stories. If expressive machine learning threatens to displace human authors, it seems unfair to train AI on copyrighted works without compensating the authors of those works.

So, to me, the most pressing question doesn’t have to do with AI authors; it has to do with AI readers. When humans copy without authorization, it’s infringement. When does robotic consumption become expressive enough and/or commercially significant enough that it, too, is infringing unless authorized, and what will we do about it?

IPW: How will the concept of fair use in the US be challenged by artificial intelligence?

SOBEL: We’re approaching what I call a “fair use dilemma,” because, in the context of commercial, expressive machine learning, no outcome seems desirable. If expressive machine learning weren’t fair use, an author could seek outsize remedies simply because her work ended up in a training dataset among thousands of other works. This would be a huge obstacle to the progress of a valuable technology.

Then again, if fair use gave companies carte blanche to train AI on copyrighted works without compensating authors, human creators would miss out on income that the spirit, and arguably the letter, of copyright law entitle them to receive. This would be a boon for AI and for those who stand to profit from it, but it’s not clear that society as a whole would benefit. A hyper-literate AI would be more likely to displace humans in creative jobs, and that could exacerbate the income inequalities that many people fear in the AI age.

IPW: Should the unequal power relationship between small creators and big AI enterprises influence the interpretation of fair use?

SOBEL: Absolutely. First, a disclaimer: I don’t mean to suggest that fair use must have a redistributive outcome, or that authors are entitled to compensation from any use of their work. Fair use should facilitate innovation, and it’s fine if that innovation proves to be lucrative for the innovators. But as machine learning and AI expand, we should think carefully about what fair use ought to subsidize.

The rhetoric around fair use often depicts “rightsholders” as powerful, incumbent companies and “users” as private individuals or scrappy startups with limited resources. Big Content versus Little Users may have been the paradigm in a previous decade, but I’m not sure it describes the present day. The internet’s most powerful companies are not, primarily, content owners; rather, they are platforms for user-generated content that make money by collecting users’ data and displaying ads. This means that ordinary people are the rightsholders to troves of copyrighted content—wall posts, emails, pictures, videos, music, etc.—that they license to platforms by accepting websites’ Terms of Use. In the platform economy, Big Users tend to have more power than Little Content.

This economic reordering should influence our views of fair use. Fair use exists to foster free speech, research, innovation, and other socially beneficial activities—not to subsidize powerful companies that already have access to licensed data pursuant to their Terms of Use. Because of this, while I do think fair use will excuse expressive machine learning done for academic research or some artistic purposes, I’m less confident that it will protect companies that train commercial AI on the expressive aspects of copyrighted works, without the permission of those works’ authors.

IPW: Can you elaborate on the distinction between expressive use and non-expressive use and between low and high expressive engagement in the context of AI?

SOBEL: “Non-expressive” versus “expressive” use is a distinction that copyright scholars—most notably Matthew Sag and James Grimmelmann—devised to describe how courts have handled fair use claims that involve large-scale copying by computers. It’s premised on the idea that copyright protects engagement with an author’s expression, but it doesn’t give authors the right to control facts about their works (that is, the non-expressive elements of their works). When Google Books tells you where and how many times a particular keyword appears within a book, it provides a fact about that book. A use like this is therefore non-expressive, even though it involves wholesale copying without authorization.

Some machine learning, by my lights, clearly makes non-expressive fair use of input data. Facial recognition is a good example. Though training a facial recognition AI may require copying lots of copyrighted photographs, the information being used has nothing to do with photographers’ expressive choices and everything to do with matching facts about the subjects’ identity with facts about their physical appearance.

But now that machine learning is getting more sophisticated and its applications more varied, uses of input data seem more and more expressive. When an AI learns to write better prose by reading prose, or how to generate catchy melodies by listening to music, those uses come much closer to copyright-protected interests than a Google Books keyword search does. I’m not sure how the doctrine will evaluate these uses, but I doubt the label “non-expressive” ought to apply. Given that AI is engaging more and more with human expression, in the manner that we assume human readers always do, it seems strange that we would give AI free rein to consume copyrighted works in ways that would be infringement if done by humans. I can’t download an infringing copy of an album just because listening to it will help me write better music in the future.

IPW: Should there be distinct standards for originality or infringement with respect to works created by machine?

SOBEL: In some ways, today’s AI technology makes the question of “independent creation” and infringement easier to evaluate. There’s no way to track every single copyrighted work that a human author encounters in her lifetime, and copyright doctrine has developed convoluted proxies to determine when one author is likely to have copied from another. With machine learning, however, training data could be easily catalogued. We could determine what works an AI has and has not seen. It’s not clear what should be done after that point, though. Say an AI generated a novel without human oversight, and that novel infringed a pre-existing work—who ought to be liable for that infringement?

AI would be great at generating merely “novel” works (that is, works that haven’t existed before), but “original” works raise more difficult issues, because originality typically requires a small amount of creativity. Whether or not a computer program could impart that creativity is a thorny—and, as I understand it, unresolved—question of philosophy and semantics.

IPW: Is there a way out of the fair use dilemma you have described?

SOBEL: The dilemma of expressive machine learning is serious enough that it may prompt us to revise doctrine and policy for the better. Many people are calling for changes in law and policy that promote social equity in the AI age, like universal basic income and “robot taxes.” Unexpectedly, a faithful interpretation of today’s copyright doctrine, paired with some higher-level compromises, could promote distributive justice in a similar way: by compensating the creators whose expression gives artificial intelligence some of its intelligence.

Ben Sobel is a Fellow at Harvard University’s Berkman Klein Center for Internet & Society.

Elise De Geyter obtained an LLM in Intellectual Property and Technology Law from the National University of Singapore (class of 2017). She has a particular interest in intellectual property policies and new technologies and was an intern at Intellectual Property Watch.

Creative Commons License: "The Dilemma Of Fair Use And Expressive Machine Learning: An Interview With Ben Sobel" by Intellectual Property Watch is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
