Where does Midjourney get its Art/Data? Is it Copyrighted?

Artists’ rights and copyright law have been under scrutiny since AI art came into being. This article sheds light on how Midjourney’s model was trained, the copyright rules that apply, and what the U.S. Copyright Office has to say about it.

Let us get down to it and analyze everything from scratch. 


Where does Midjourney get its Art/Data from?

To generate AI art, a model is trained on millions of images so that it learns to understand the input and process it accordingly. Midjourney, a text-to-image AI bot developed for Discord users, uses an advanced machine-learning algorithm.

David Holz, the platform’s founder, stated that hundreds of millions of images were used to train the AI model and that those images could come from anywhere. In an interview, he also said that it is difficult to trace an image back to its owner.

AI models are typically trained on datasets of millions of images that are available on the Internet. These datasets are mostly open to everyone and are widely used by AI engineers. This is where the creators of the images, such as artists and photographers, feel violated: their pictures are being used to train AI projects without their consent.

A more ethical approach would be to train models solely on open dataset banks. For example, the Harvard Library APIs & Datasets are used by many. Such sources offer thousands of images, but that is still too few for platforms like Midjourney.

Are the images used to make Midjourney copyrighted?

The images used to train Midjourney’s model have caused a hue and cry among creators, who took to Twitter to share their views. Some artists said they felt “robbed” and called the practice “unjust” since the images were not copyrighted. Many claim that projects like this one use their images in datasets without consent.

In an interview with Forbes, when asked if he sought consent from living artists, David Holz said, “No. There isn’t really a way to get a hundred million images and know where they’re coming from. It would be cool if images had metadata embedded in them about the copyright owner or something. But that’s not a thing; there’s not a registry. There’s no way to find a picture on the Internet and then automatically trace it to an owner and then have any way of doing anything to authenticate it.”
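To make the metadata point concrete, here is a minimal Python sketch of what an embedded copyright field can look like. Some formats do allow one: the PNG specification, for example, defines an optional `Copyright` keyword for text chunks. The field is voluntary, easy to strip, and not verified by anyone, so it is no substitute for the registry Holz describes. The function names (`read_text_fields`, `demo_png`) and the artist name are hypothetical and exist only for this illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def read_text_fields(png_bytes: bytes) -> dict:
    """Return keyword -> text for every uncompressed tEXt chunk in a PNG."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    fields = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        length, chunk_type = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # A tEXt chunk is "keyword", a NUL byte, then Latin-1 text.
            keyword, _, text = data.partition(b"\x00")
            fields[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return fields


def demo_png(copyright_notice=None) -> bytes:
    """Build a 1x1 grayscale PNG, optionally with a Copyright text field."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    chunks = [make_chunk(b"IHDR", ihdr)]
    if copyright_notice is not None:
        payload = b"Copyright\x00" + copyright_notice.encode("latin-1")
        chunks.append(make_chunk(b"tEXt", payload))
    # One scanline: filter byte 0 followed by a single gray pixel.
    chunks.append(make_chunk(b"IDAT", zlib.compress(b"\x00\x00")))
    chunks.append(make_chunk(b"IEND", b""))
    return PNG_SIGNATURE + b"".join(chunks)


if __name__ == "__main__":
    # Hypothetical artist name, used only for the demonstration.
    print(read_text_fields(demo_png("Example Artist (hypothetical)")))
    print(read_text_fields(demo_png()))
```

The second call returns an empty dictionary, which is the common case for images scraped from the web: nothing in the file says who owns it, and even when a field is present nothing authenticates it.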

The statement unsettled many creators.

The Databases and the United States Legal Code state that:

“In the United States, facts by themselves are not protected by copyright. Therefore, data, as a collection of facts, is not protected by U.S. copyright law. Databases as a whole can be protected by copyright as a compilation, but only under certain conditions. The first is that mere collection of data is not enough. The arrangement and selection of data must be sufficiently creative or original.”

Can you claim Copyright over images generated on Midjourney?

Midjourney allows users to claim copyright over the images they create, provided they have upgraded to a paid plan. Specific clauses apply, which are explained in detail in this article.

If you want to claim copyright on your Midjourney creations, you can do so as long as you are not a free user.

Will you face any Copyright issues while using Midjourney images?

The company’s Terms of Service are clear: only paid members can claim copyright, and free users are not eligible to do so. This is because Midjourney holds the copyright for all images created with a free account. Anyone who does not follow these terms can be taken to court.

The U.S. Copyright Office states that ownership of AI art depends on authorship. Because AI art generators like Midjourney are viewed as machines, and only humans can be the authors of an artwork, an image produced entirely by the AI does not qualify for copyright protection.


I. What is Midjourney’s monthly revenue?

According to Ebersweiler, an early beta tester, Midjourney had crossed the $1 million monthly revenue mark by the conclusion of the previous year. The launch of Midjourney coincided with a crucial juncture in Discord’s journey.

II. Does Midjourney steal Art?

In the past, Midjourney has openly acknowledged using a methodology similar to its competitors’: scraping the web for images and their accompanying text descriptions, and using millions of publicly available images for training. The images themselves are generated with diffusion, a technique used by most AI image generators and available in public source code. Whether this amounts to “stealing” is still a grey area.

III. How does Midjourney work?

Midjourney takes your text description and passes it through a complex machine learning (ML) algorithm. The algorithm is designed to analyze and interpret the input text and to generate a visually appealing image that matches it.
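Midjourney’s own model is proprietary, so its exact algorithm cannot be shown. The diffusion technique mentioned above can still be illustrated with its textbook form. The Python sketch below covers only the forward (noising) step of a standard diffusion model: an image is gradually mixed with random noise, and a neural network (omitted here, along with the text conditioning) is later trained to undo that noise. The function names, the linear schedule, and its default values are assumptions for illustration, not Midjourney’s implementation.

```python
import math
import random


def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Noise amounts that grow linearly from beta_start to beta_end."""
    if steps == 1:
        return [beta_start]
    span = beta_end - beta_start
    return [beta_start + span * t / (steps - 1) for t in range(steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta): the share of signal left at each step."""
    remaining = 1.0
    result = []
    for beta in betas:
        remaining *= 1.0 - beta
        result.append(remaining)
    return result


def add_noise(pixels, alpha_bar, rng):
    """Jump straight to a noising step:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * gaussian_noise
    """
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    image = [0.1, 0.5, 0.9, 0.3]  # a toy four-pixel "image"
    for step in (0, 250, 999):
        noisy = add_noise(image, alpha_bars[step], rng)
        print(step, [round(v, 3) for v in noisy])
```

Early steps leave the toy image almost intact, while the last step is close to pure noise. Generation runs the process in reverse: it starts from noise and removes a little at a time, guided by the text prompt.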