Previously, I had tried generating AI art with Midjourney, with mixed results. Midjourney seemed to give good results for landscapes, including futuristic ones, and was good at generating art in different styles, such as those of Picasso, Van Gogh, or Ghibli.
This time I tried yet another AI art generator called Stable Diffusion, which had its public release recently.
What is Stable Diffusion
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images from any text input.
The model is based on the arXiv paper available here.
How to use Stable Diffusion to generate art
The API is available here. One can sign in at that link with a GitHub account and start using Stable Diffusion to generate art; the free version is available as soon as one signs in.
Pricing in Stable Diffusion
They offer both a free plan (my free allowance ran out after roughly 60 images) and paid plans. For a paid plan, one enters credit card details via Stripe and is billed at the end of the month.
Interface and options in Stable Diffusion
The interface for generating images in Stable Diffusion exposes more model-related options than Midjourney's. Besides the prompt (the text from which the image is generated), the options include width, height, an initial image, the number of inference steps, and so on.
I went for the default options and changed only the prompts.
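To make those options concrete, here is a small sketch of how such a generation request might be assembled. All names here (the helper function and its fields) are hypothetical, not taken from the actual API; the multiple-of-64 check reflects the fact that Stable Diffusion operates on 64-pixel latent blocks.

```python
# Hypothetical helper illustrating the options the interface exposes:
# prompt, width, height, initial image, and number of inference steps.
# Stable Diffusion works on 64-pixel latent blocks, so width/height are
# validated to be multiples of 64. All names are illustrative.
def generation_options(prompt, width=512, height=512,
                       init_image=None, num_inference_steps=50):
    if not prompt:
        raise ValueError("a text prompt is required")
    for name, value in (("width", width), ("height", height)):
        if value % 64 != 0:
            raise ValueError(f"{name} must be a multiple of 64, got {value}")
    return {
        "prompt": prompt,
        "width": width,
        "height": height,
        "init_image": init_image,
        "num_inference_steps": num_inference_steps,
    }

# Keeping the defaults and changing only the prompt, as in my experiments:
opts = generation_options("a time machine in a futuristic city")
```

Leaving everything but the prompt at its default, as I did, keeps the comparison between generators fair.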
Examples of generated images with Stable Diffusion
The results for some of the images I generated (along with the prompts to generate them) are as follows:
Comparison with Midjourney generated images
For comparison, these were the results I got using Midjourney with similar prompts and default options:
Comparison with DALL-E 2
DALL-E 2 (https://labs.openai.com) is another famous AI art generator, from OpenAI, the makers of GPT-3.
One can open a free account with OpenAI (possibly after joining a waiting list) or a paid one to use the software. After logging in, one fills in a prompt to generate the art.
DALL-E 2 then generates four images corresponding to the prompt.
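For those using DALL-E 2 programmatically rather than through the web page, a request to OpenAI's image generation endpoint (POST /v1/images/generations) can be sketched as below. The helper function is hypothetical; the `n=4` default mirrors the four images returned per prompt, and the listed sizes are the ones the API accepts.

```python
import json

# Sizes accepted by OpenAI's image generation endpoint.
VALID_SIZES = {"256x256", "512x512", "1024x1024"}

def dalle_request_body(prompt, n=4, size="1024x1024"):
    """Build the JSON body for POST /v1/images/generations.

    Hypothetical helper: n=4 matches the four images DALL-E 2
    generates for each prompt.
    """
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported size: {size}")
    return json.dumps({"prompt": prompt, "n": n, "size": size})

body = dalle_request_body("an astronaut riding a horse")
```

The body would then be sent with an `Authorization: Bearer <API key>` header; the web interface does the equivalent behind the scenes.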
I found the images to be quite realistic.
Below, for comparison, are the images for the same prompts previously used with Stable Diffusion and Midjourney.
For the prompt “Mehmet conquers Constantinople”, however, I got an error saying the prompt does not follow the content policy.
The content policy for DALL-E 2 seems to be a little stricter than that of Stable Diffusion or Midjourney.
I found DALL-E 2 to be a little weaker for imaginative or futuristic images (such as a time machine) and for subjects like war or revolution. For those, Midjourney was the best.
Conclusion
As one can see, Stable Diffusion creates more realistic images, whereas Midjourney's are more artistic. Stable Diffusion is better for images of people doing things.
Midjourney, on the other hand, is better for landscapes, especially futuristic ones. DALL-E 2 seems to be great for amazingly realistic images, as long as the prompts adhere to its content policy, but perhaps not so good for imaginative or futuristic ones.
All of these are exciting tools for creating art based simply on thoughts, something that people would have considered impossible not so long ago.