So, I will start by dropping a few keywords here and explaining what they are about. I am compiling this as a guideline on where to start.
- VQGAN (Vector Quantized Generative Adversarial Network, a neural network): The model that actually generates the image
- CLIP (Contrastive Language-Image Pre-training, a neural network): A model that steers the generated image toward the input text (the user prompt)
- VQGAN+CLIP: The two neural networks working in tandem — VQGAN generates an image, CLIP scores how well it matches the prompt, and the image is iteratively updated to raise that score.
- CLIP-Guided-Diffusion: A related technique for doing text-to-image synthesis cheaply, using a pre-trained CLIP model to guide a diffusion model instead of a VQGAN.
- Google Colab notebook: A tool made by Google where you can run Python code and use Google's GPUs; both free and paid tiers exist
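The core mechanic behind CLIP's guidance in the setups above is that the text prompt and the candidate image are both mapped into the same embedding space, and their similarity (typically cosine similarity) is the score being maximized. Here is a minimal, self-contained sketch of just that scoring step — the vectors are toy hand-picked stand-ins, not real CLIP embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for a CLIP text embedding and two candidate image embeddings
text_emb = [0.2, 0.9, 0.1]
image_emb_close = [0.25, 0.85, 0.05]  # hypothetical image that matches the prompt well
image_emb_far = [0.9, 0.1, 0.4]       # hypothetical image that matches poorly

print(cosine_similarity(text_emb, image_emb_close))  # higher score
print(cosine_similarity(text_emb, image_emb_far))    # lower score
```

In VQGAN+CLIP, the generator's latent input is nudged by gradient descent so that the image embedding's score against the text embedding goes up each iteration.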