Generating AI art using Stable Diffusion

Joy Bose
Sep 7, 2022


Previously, I had tried generating AI art using Midjourney, with mixed results. Midjourney seemed to give good results for landscapes, including futuristic ones, and was good at generating art in different styles, such as those of Picasso, van Gogh or Studio Ghibli.

This time I tried another AI art generator, Stable Diffusion, which was publicly released only recently.

What is Stable Diffusion

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images from any text input.

The model is based on the arXiv paper available here.
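Roughly, "latent diffusion" works like this: an encoder $\mathcal{E}$ compresses an image $x$ into a smaller latent $z_0 = \mathcal{E}(x)$. During training, Gaussian noise $\epsilon$ is mixed into $z_0$ according to a DDPM-style noise schedule $\bar\alpha_t$, and a network $\epsilon_\theta$ learns to predict that noise, conditioned on an encoding $\tau_\theta(y)$ of the text prompt $y$. As a simplified sketch in the notation of the latent diffusion paper (Rombach et al.), the noised latent and the training objective are:

$$
z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
$$

$$
L_{LDM} = \mathbb{E}_{\mathcal{E}(x),\, y,\, \epsilon \sim \mathcal{N}(0, I),\, t}\Big[ \big\lVert \epsilon - \epsilon_\theta\big(z_t, t, \tau_\theta(y)\big) \big\rVert_2^2 \Big]
$$

To generate an image, the process runs in reverse: starting from pure noise, the predicted noise is removed over a number of denoising steps (the "number of inference steps" option in the interface), and a decoder turns the final latent back into an image.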

How to use Stable Diffusion to generate art

The API is available here. To use it, one simply signs in with a GitHub account, which gives access to the free version, and can then start generating art.

Interface of the Stable Diffusion AI art generator

Pricing in Stable Diffusion

There is a free plan (I used up my free allowance after roughly 60 images) as well as paid plans. For the paid plans, one needs to enter credit card details via Stripe, and usage is billed at the end of the month.

Pricing of Stable Diffusion paid plans

Interface options in Stable Diffusion

The Stable Diffusion interface for generating images offers more AI-related options than Midjourney's. Besides the prompt (the text from which the image is generated), the options include width, height, initial image, number of inference steps and so on.

Options for generating an image based on Stable Diffusion model

I went for the default options and changed only the prompts.
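The options described above can also be set from code. As a minimal sketch, assuming one runs Stable Diffusion locally with Hugging Face's `diffusers` library instead of the web interface used here, the settings might be collected and checked as below. The names `GenerationSettings` and `generate` are hypothetical, `CompVis/stable-diffusion-v1-4` is just one public checkpoint, and the default values are assumptions that may not match the web form's defaults.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GenerationSettings:
    """Hypothetical mirror of the options in the Stable Diffusion web form."""

    prompt: str
    width: int = 512                  # assumed default; SD v1 models were trained at 512x512
    height: int = 512
    num_inference_steps: int = 50     # assumption: diffusers' default, the web form may differ
    guidance_scale: float = 7.5       # assumption: not mentioned in the post
    init_image: Optional[str] = None  # path to a starting image; None means plain text-to-image

    def validate(self) -> None:
        """Reject settings the pipeline would refuse."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        # The autoencoder shrinks images by a factor of 8, so diffusers
        # requires width and height to be divisible by 8.
        for name, value in (("width", self.width), ("height", self.height)):
            if value <= 0 or value % 8 != 0:
                raise ValueError(f"{name} must be a positive multiple of 8, got {value}")
        if self.num_inference_steps < 1:
            raise ValueError("num_inference_steps must be at least 1")

    def to_pipeline_kwargs(self) -> dict:
        """Keyword arguments for diffusers' text-to-image StableDiffusionPipeline."""
        self.validate()
        kwargs = asdict(self)
        del kwargs["init_image"]  # the text-to-image pipeline takes no starting image
        return kwargs


def generate(settings: GenerationSettings, model_id: str = "CompVis/stable-diffusion-v1-4"):
    """Run one generation locally; needs diffusers, torch and a model download."""
    if settings.init_image is not None:
        # Image-to-image uses a separate pipeline (StableDiffusionImg2ImgPipeline).
        raise NotImplementedError("init_image is not handled in this sketch")
    kwargs = settings.to_pipeline_kwargs()
    # Imported lazily so the settings logic above works without diffusers installed.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    return pipe(**kwargs).images[0]  # a PIL image
```

For example, `generate(GenerationSettings(prompt="Hypnotherapist and patient in trance")).save("trance.png")` would produce one image, assuming `diffusers` and `torch` are installed. On a machine with a GPU, adding `pipe = pipe.to("cuda")` inside `generate` speeds this up considerably.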

Examples of generated images with Stable Diffusion

The results for some of the images I generated (along with the prompts to generate them) are as follows:

Mehmet conquers Constantinople (Stable Diffusion)
Yoga on a Goa beach (Stable Diffusion)
Hypnotherapist and patient in trance (Stable Diffusion)
Cremation of a dead body on Ganges river (Stable Diffusion)

Comparison with Midjourney generated images

For comparison, these were the results I got using Midjourney with similar prompts and default options:

Yoga on a Goa beach (Midjourney)
Cremation of dead body on Ganges river (Midjourney)
Mehmet conquers Constantinople (Midjourney)
Hypnotherapist and patient in trance (Midjourney)

Comparison with DALL-E 2

DALL-E 2 (https://labs.openai.com) is another well-known AI art generator, from OpenAI, the makers of GPT-3.

To use it, one opens either a free account (you might have to join a waiting list first) or a paid account with OpenAI, and then logs in.

Main webpage for DALL-E 2

After logging in, one can type in a prompt describing the art to be generated.

Prompt with description of the art to be generated

DALL-E 2 then generates four images corresponding to the prompt.

Four images generated for the prompt “Yoga on Goa beach”

I found the images to be quite realistic.
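The steps above use the web interface; OpenAI also exposes DALL-E 2 through its Images API. As a minimal sketch using only the Python standard library, the same kind of request (a prompt, the number of images, and an image size) can be built and sent as below. The endpoint and the `model`, `prompt`, `n` and `size` fields follow OpenAI's Images API; the function names are hypothetical, and the limits checked (sizes, up to 10 images, 1000-character prompts) are assumptions worth verifying against the current API reference.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/images/generations"
DALLE2_SIZES = {"256x256", "512x512", "1024x1024"}  # square sizes DALL-E 2 accepts


def build_dalle2_request(prompt: str, n: int = 4, size: str = "1024x1024") -> str:
    """Build the JSON body for one DALL-E 2 generation request."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if len(prompt) > 1000:
        raise ValueError("DALL-E 2 prompts are limited to 1000 characters")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    if size not in DALLE2_SIZES:
        raise ValueError(f"unsupported size {size!r}")
    # n=4 mirrors the four images the web interface returns per prompt.
    return json.dumps({"model": "dall-e-2", "prompt": prompt, "n": n, "size": size})


def request_images(body: str) -> list:
    """POST the request and return the image URLs; needs OPENAI_API_KEY to be set."""
    req = urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
        method="POST",
    )
    # A prompt rejected by the content policy comes back as an HTTP error
    # status, which urllib raises as urllib.error.HTTPError.
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [item["url"] for item in data["data"]]
```

For example, `request_images(build_dalle2_request("Yoga on Goa beach"))` would return four temporary image URLs, assuming a valid API key is set in the environment.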

Below, for comparison, are the DALL-E 2 images for the same prompts I had used earlier with Stable Diffusion and Midjourney.

DALL-E 2 generated image for Yoga on Goa beach
DALL-E 2 image for Hypnotherapist and patient in trance
DALL-E 2 generated image for Cremation of dead body on Ganga river

For the prompt “Mehmet conquers Constantinople”, however, I got an error saying that it does not follow the content policy.

Error from DALL-E 2 for the prompt “Mehmet conquers Constantinople”

DALL-E 2's content policy seems somewhat stricter than those of Stable Diffusion and Midjourney.

Content policy for DALL-E 2

I found DALL-E 2 to be a little weaker for imaginative or futuristic images (such as a time machine) and for subjects like war or revolution. For these, Midjourney did best.

DALL-E 2 images generated for “Design of a time machine”

Conclusion

As these examples show, Stable Diffusion tends to produce more realistic images, whereas Midjourney's are more artistic. Stable Diffusion is better for images of people doing things.

Midjourney, on the other hand, is better for landscapes, especially futuristic ones. DALL-E 2 seems great for strikingly realistic images, as long as the prompts adhere to its content policy, but perhaps less so for imaginative or futuristic images.

All of these are exciting tools for creating art based simply on thoughts, something that people would have considered impossible not so long ago.


Joy Bose

Working as a software developer in machine learning projects. Interested in the intersection between technology, machine learning, society and well being.