

On Ownership

This one is about the claim: “All AI art is theft.” Or variations of it.

The claim needs unpacking, of course. My goal in unpacking it isn’t pedantry, nor is it to wriggle out of the concept of theft by taking the word apart via semantics.

When someone says “AI art is theft” they are talking about several things, but each of these things can have several meanings. “AI art” for one, which can refer to:

  1. An inference. An image or video output, made entirely with genAI tools, without human intervention beyond prompting. In other words, I typed something or pressed a button, and presto! the output came into existence.
  2. GenAI-assisted content. The end product of a creative process that involved genAI tools. That is, the image or video output is a mixture of things made by humans and things wholly generated by genAI tools, or an image or video generated by genAI tools and later refined by humans with other tools. (This is mostly what we do at OSR+.)
  3. A genAI model. The end result of training on a corpus of reference material: a large file called a model, from which we run inference to produce generations. So if “AI art” here refers to the model, then we are talking about the training process itself, where the model ingests a bunch of data in the form of images in order to develop its own internal understanding of the patterns that make up those images.

Okay, so we have a general idea of what “AI art” refers to. Now on to theft:

  1. There’s “theft” in the legal sense. The unlawful deprivation of property. With it, comes the concept of ownership: that is, this thing is mine, and you are not allowed to have it. Or, in copyright law specifically, I created this thing, and I retain the rights to make copies of the thing, and to make other things based on it (among other rights). This is my derivative right.
  2. There’s theft in a vaguer sense: something like, you have stolen economic potential from me. This is entangled with #1 with respect to copyright law, for sure, but I’m not a lawyer, so bear with me. I just want to recognize here that sometimes people mean “theft” as the loss of the ability to compete in the market.
  3. Then there’s “theft” in the moral sense, plain old stealing. It’s wrong to take things from people that are not yours.

The State of Affairs in Copyright Law

Too many words have been written already about #1, and what the state of affairs is with respect to copyright law surrounding genAI. So I’ll just summarize what we know already:

  • Images you produce from genAI without adding substantial human input cannot be copyrighted. So if I inference, the images I generate are just that: raw output from a model. That output can’t be copyrighted.
  • However “substantial human input” gives us some leeway, such that if I “substantially” alter that output, then it may qualify as copyrightable. I would imagine that most of the examples of non-slop I shared would fit the bill here. 
  • Lastly, the act of training a model on lawfully obtained copyrighted materials is not copyright infringement (see Bartz v. Anthropic). That is, if you’ve legally obtained your training materials, it’s considered fair use to train a model on them. Therefore, it is very unlikely to be considered theft (per legal definition #1), since there is fair use precedent for training a model on materials you do not have the derivative rights to, so long as you obtained those materials legally. That is, if I own or have a valid license to 500,000 books, I can train on them, and that’s not illegal. In fact, that’s the spirit of what Google Books did when it scanned books in Authors Guild v. Google to create its searchable index. Google partnered with research libraries, which had legal rights to the books, and the court ruled that the incorporation of the scanned books into Google’s archive was a transformative fair use.
  • Which leads us to the other outcome of the Anthropic case: it’s still illegal to download and store copyrighted material you don’t have a license to, even if your ultimate use case is to train on those materials. Anthropic will potentially settle for 1.5 billion dollars over the metric shit ton of copyrighted material it obtained illegally. But one important distinction here: it is settling not because it trained on those materials, but because it downloaded them illegally. The act of training itself was deemed to be fair use.

So we know that training can be considered fair use. What remains then is that we don’t know whether a generation can be considered infringing. For example: if I produce a perfect replica of some specific artist’s work, or a chunk of code that is identical to the materials the model was trained on, is my generation infringing? 

Or more broadly: is every inference infringing, since ultimately the output relies on knowledge of the original material that the model learned to produce it?

This question is an active battleground.

We don’t yet have a sweeping court decision saying that all genAI outputs are theft (or not) in a legal sense. What we do have is guidance from the U.S. Copyright Office and Congress’s own lawyers that says what boils down to “We’ll know it when we see it.” Each generation would be evaluated under the normal four factors judges consider in copyright infringement cases, of which substantial similarity (the degree to which it takes from the protected expression of the original work) and market harm (the effect of the use on the potential market) become vitally important.

So Where Do I Stand on Theft?

To put it as plainly as possible:

I don’t think the use of genAI, when training models on copyrighted materials, or in inference, is theft. Unless, of course, you’re trying to replicate a specific artist’s work, with the intent to compete with them in the market. And that sort of use case, I think, would already be protected by the way the law currently works.

My view is more or less in alignment with what we already know to be the case for #1 and #2, since the vaguer notion of economic harm is already captured by the four-factor analysis: if work you produce with genAI fails those tests, market harm to the creator is part of what it fails on.

So the claim “All AI art is theft” rings false to me, at least with respect to the definition of theft in #1 and #2, across all definitions of AI art.

Which leaves us with #3. Is “All AI art is theft” true in a moral sense?

I don’t mean to spoil the surprise, but my answer is no. I could agree that “Some AI art” is theft, morally. But not all.

Let’s take a step back.

On Remix Culture

I’m reminded of how Lawrence Lessig discussed remix culture back in the 2000s. In Free Culture, he characterized the state of copyright law at the time as “as silly as a sheriff arresting an airplane for trespass.” Copyright law means well, but it’s always horribly behind the development of technology, and it often becomes captured by corporate interests who seek to exploit it against the interests of the rest of us poors (e.g., Disney).

Here’s a TED talk that speaks to this idea concisely. (19 minutes long—watch it. It’s 18 years old, but relevant to our use of genAI today.)

Just as Lessig argues, by analogy to the regulation of land in 1945 (where planes became trespassers because the law had not yet caught up with the new technology of flight), that copyright law was ill-equipped to deal with remix culture, I argue that copyright law is ill-equipped to handle this new technology that is genAI.

Fining training companies a trillion dollars ($150,000 per work infringed under statutory damages) is as ridiculous as trying to negotiate payments for licenses with billions of copyright holders. If we were somehow able to negotiate rights with each of those billions of people for, say, ChatGPT (which is physically impossible), and we negotiated only 1 penny per work, that would mean a payout of 1.5 billion dollars. If we relied on the WGA’s recommendation of 10 cents per word, then we’d be looking at 23 billion dollars. And that’s just for text. For the image models, if we did a conservative license of $1 to $20 per image, that’s 5.8 to 29 billion dollars per model version. And these models are continuously training on new content.
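The figures above can be reproduced with a quick sketch. Note that the corpus sizes below are back-solved from the dollar figures in the text; they are illustrative assumptions for this thought experiment, not published dataset statistics.

```python
# Back-of-envelope licensing math. Corpus sizes are back-solved from the
# dollar figures in the essay; they are assumptions, not dataset statistics.

def licensing_cost(num_works: float, price_per_work: float) -> float:
    """Total payout if every training work were individually licensed."""
    return num_works * price_per_work

# ~150 billion text works at 1 cent each -> the $1.5B figure
text_penny = licensing_cost(150e9, 0.01)

# ~230 billion words at the WGA's 10 cents per word -> the $23B figure
text_wga = licensing_cost(230e9, 0.10)

# ~5.8 billion images (an assumed LAION-scale corpus) at $1 each -> the
# low end of the $5.8B-$29B range for an image model
images_low = licensing_cost(5.8e9, 1.00)

# Statutory damages: at $150,000 per infringed work, fewer than
# 7 million works already reach a trillion dollars
works_to_a_trillion = 1e12 / 150_000  # about 6.7 million works
```

Whatever the exact counts, any per-work price multiplied by a corpus in the billions lands in the billions of dollars, which is the point of the paragraph above.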

Now, this is back of an envelope math that is probably wrong (again, I deal in words, not numbers), but the point is to illustrate that ought implies can when it comes to making moral calculations. 

So what, you say? These megacorporations can afford it. They make zillions of dollars off our information. In fact, they are more likely to settle (and some already have) for obscene sums in response to these legal challenges, because they can afford to. Why shouldn’t we force them to pay for each and every creative input when they consume to train these models then?

Because I think this view is shortsighted and sets a precedent that will destroy our ability to compete as individual creators.

GenAI is in its infancy. The public consciousness is mostly captured by megacorps with gigantic marketing budgets: Sora, ChatGPT, Midjourney, Gemini, Veo3 and so many others are what the public sees as the examples of genAI right now.

Not the open source tools that anyone who has a powerful enough video card can use.

In time, I believe these tools will mature and become more powerful. Even now, creators using local video generators can compete with creators who use commercial models, even though the latter have insane compute behind them in the form of multi-million-dollar data centers. How? Because with time and ingenuity, the open source hivemind optimizes everything. I’ve seen it happen firsthand, where an open source model is released and then almost immediately optimized so that it can run on inferior machines. This makes it possible for ordinary people like you and me to create using state-of-the-art technology on our personal computers.

Lessig says in the video:

In the digital world, the one fact we can’t escape is that every single use of culture produces a copy. Every single use therefore requires permission, and without permission, you are a trespasser. There’s a growing extremism that comes from both sides in response to the law and the use of these technologies.

Sound familiar?

GenAI as Remix Culture

In my view, genAI is the new remix culture, and it’s even less egregious (from a copyright law perspective) than the remix culture Lessig was talking about in the 2000s.

Instead of redubbing footage from an anime with an incongruous but copyrighted track to make something funny and new, it’s drawing an atomic fraction of a million and a half creators’ work (plus countless unrelated photographic and illustrative inputs under a plethora of licenses) from its training data to inform the production of something new.

In the former example, two copyright holders are involved: whoever holds the rights to the anime, and whoever holds the rights to the music track. In the latter example, millions upon millions of creators' rights are involved.

Now I am not talking about consent or attribution here.

I’m arguing that AI training involves weaker, more diffuse, and less substitutive relationships to original works than remix culture, which directly reuses identifiable protected expression. Part of the magic trick in 2000s remix culture was in knowing which copyrighted materials were used derivatively in the remixed end product, and that is not lost on me. With genAI output, the exact influences are usually opaque, and that may feel unsettling if we don’t acknowledge that, creatively, our goals in creating genAI output are different from our goals in creating remixes.

When I prompt for Nicolas Delort and Edward Gorey and Rembrandt; apply a LoRA (low-rank adaptation, a fine-tune of a genAI model) that replicates generic sketch art scanned from 50 photos by some random kid out of his personal sketchbook that he uploaded to Civitai; add a LoRA I created by scanning images from an art book I own to introduce a new concept into the model; use three random images of a postcard whose colors I liked via IPAdapter to adopt its color grades; inpaint into the output image by hand using an inpainting model in ComfyUI; and then bring it into Photoshop to do further manipulation and apply a generative fill, who should I be making a check out to, and for how much?

What exactly have I taken away (in an economic sense) from the several million creators whose work contributed infinitesimal fragments of information to my generation, or from the artists Nicolas Delort and Edward Gorey and Rembrandt, or that kid who shared his sketchbook to create the LoRA, or the five postcard artists whose colorations I adopted, or the unknown number of creators whose work was involved in the training of the inpainting model and whatever black box model Adobe claims has been “ethically trained”?

Can that be quantified? Should it be quantified? 

As someone who rejects the moral reasoning of utilitarians, to me this kind of “sourcing” of ownership stinks of a chain of causality so long it stretches credulity into infinity.

Conversely: if you used a LoRA trained exclusively on, say, the art of SamDoesArts, and your goal was to produce work that looks exactly like his, and then, to add insult to injury, you started competing with him in the marketplace, we’d be in total agreement that such a use was liable to fail not only the four-factor test but the moral one as well.

And so this is why I believe the claim “All AI art is theft” is false also in a moral sense, because it should be clear to you now that there is a difference between the hypothetical “theft” I’ve described above, and the “theft” I’ve posed with respect to SamDoesArts.

Quoting Justice Douglas, Lessig also says:

“The doctrine protecting land all the way to the sky has no place in the modern world; otherwise every transcontinental flight would subject the operator to countless trespass suits. Common sense” (a rare idea in the law) “revolts at the idea.”

Are you sure?