A settlement has been reached in one of the major court cases in which authors have sued AI companies over their large language models and chatbots. In this case, a class action suit was brought on behalf of authors against Anthropic. Anthropic produces Claude, a chatbot similar to ChatGPT but with certain differences, including a claim to be more ethical. Since they were using authors' stolen works to train their model without permission, perhaps instead of calling it an ethical model we will say less unethical.
These programmers create their large language models using huge amounts of data gathered from places like the internet and books. This is how, when you ask them a question, and there's an unlimited number of questions you might ask, they always have an answer, and they give it to you in a matter of seconds. They had to do a lot of learning, and a major source of that data is millions, literally millions of books.
None of these modelers pay anything to the authors. The writers feel abused. They do all the work, and then these companies make lots of money off the information they created. Now, it must be pointed out that they are not providing users all the content in a book, just small portions at a time. Still, many people are only looking for a small amount of the content in each book. In the past, they might have had no choice but to buy the books and give the author their royalties, but now that is no longer necessary.
Anthropic may not be a household name, but it is big. It is backed by Amazon and Google, and some of their searches make use of data from Claude. For the record, Google (technically Alphabet) and Amazon are two of the five largest companies in the world, each valued at between $2 trillion and $3 trillion. Anthropic is big time.
A few weeks ago, the authors announced they had reached a settlement with Anthropic. They said, “this landmark settlement will be the largest publicly reported copyright recovery in history, larger than any other copyright class action settlement or any individual copyright case litigated to final judgment.” The amount was $1.5 billion, with a “b.” That must be a major victory for the authors.
Not so fast. The amount is not as big as it may seem, and the outcome of the case does not bode all that well for authors long term.
As for the $1.5 billion, it should be noted that just last month Anthropic raised another $13 billion in financing. The company is now valued at $183 billion. That $1.5 billion is a drop in the bucket, especially considering the amount of data they were able to obtain for it, and the fact that they were able to get away with doing it illegally. It actually sounds like a bargain now. So there is no need to have any sympathy for Anthropic. Considering the company is only four years old, it has done all right for itself. It was founded by a pair of siblings who previously worked for OpenAI, creator of ChatGPT.
The reason this is not an overwhelming victory for the authors is the nature of the copyright violation. The court ruled that Anthropic had a legal right to use the authors' work to train its model and to provide answers based on what they wrote. That is not a copyright violation because it is within the “fair use” exception to copyright law. That is what allows you to write an article (or a student to prepare a report) using information found in books, maybe even using a direct quote. It would be very hard to spread knowledge to others if you could not repeat anything you learned from reading books or watching or listening to television or radio. Add to that what is (sadly) now the largest source of “information,” unfortunately often false: the internet. It is hard to imagine a world in which information cannot be passed around, and these enormously useful chatbots (we all use them) have become a prime means of doing so. For this reason, I think it would be very difficult for judges to put a stop to chatbots using others' works, even though the chatbots have made it a lot easier for everyone to learn what is inside some author's copyrighted book.
When the court issued its ruling in June that led to this settlement, it addressed three issues. The first was whether Anthropic's use of the copyrighted works in this limited manner was legal fair use, and the court said yes. Next it tackled the issue of how Anthropic came upon the books. One way was downloading them from “pirate” sites. These are sites located in unknown foreign countries that illegally copy entire books, which they then make available in their entirety to others, usually free. They move around and change URLs to make it hard for authorities to find them. Anthropic (and others) used books from these pirate sites to create their large language models. This court and others have repeatedly found that using these in effect stolen books to train models is illegal. They had no right to these books.
Then there is the third issue. These were books that Anthropic purchased legally and then copied into digital form so they could easily be saved and searched. The court ruled that this was legal, being “transformative” use. They were just changing the format of a legally obtained book for greater convenience. The authors and publishers lost no sales this way.
The outcome was that using the pirated books was the only count on which the authors won. The judge did not set an amount that Anthropic had to pay, but the two sides reached an agreement between themselves - $1.5 billion. There are several other cases out there brought by authors and publishers against AI companies over pirated books, and the AI chatbots are losing them all. Those defendants had better be prepared to pay. However, going forward, if this precedent is followed by other courts, the chatbot producers now know how to gain access to books whether the authors are happy or not - buy a copy. Not addressed is whether they could borrow books from a library and copy those. Even if not, there is no obvious reason why they can't buy used copies. There are millions of those everywhere: library fairs, AbeBooks, Better World Books, and collectors and libraries with unsalable, essentially worthless copies. Owners are happy to have you cart them away for nothing. Those are all legally obtained books. Usually the most popular books will have cheap used copies available, later editions printed in thousands or hundreds of thousands of copies. The authors will get nothing if the chatbots use used copies. It's hard for an author to take on a $183 billion company.
Note: A group of authors recently sued Apple over the same issue. The complaint cites both the use of copyrighted works for training AI models and the use of pirated books (the issue on which Anthropic lost).