The VQGAN model generates images while CLIP guides the process by scoring how well the current image matches the text prompt. This is done over many iterations until the generator learns to produce more “accurate” images. I won’t discuss the inner workings of VQGAN or CLIP here, as they are not the focus of this article. But if you want a deeper explanation of VQGAN, CLIP, or DALL-E, you can refer to these amazing resources that I found:

- CLIP Paper Explanation Video by Yannic Kilcher: CLIP paper explanation.
- DALL-E Explained by Charlie Snell: Great DALL-E explanations from the basics.
- The Illustrated VQGAN by LJ Miranda: Explanation of VQGAN with great illustrations.

VQGAN+CLIP is simply an example of what combining an image generator with CLIP is able to do. However, you can replace VQGAN with any kind of generator and it can still work really well, depending on the generator. Many variants of X + CLIP have come up, such as StyleCLIP (StyleGAN + CLIP), CLIPDraw (which uses a vector art generator), BigGAN + CLIP, and many more. There is even AudioCLIP, which uses audio instead of images.
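To make that loop concrete, here is a minimal sketch of CLIP-guided generation. To keep it self-contained, it optimizes raw pixels directly rather than VQGAN latents, so it illustrates the idea only and is not the clipit implementation; it assumes PyTorch and OpenAI’s clip package are installed.

```python
# A minimal sketch of the CLIP-guided loop (illustrative; not the clipit code).
# Assumes: pip install torch, plus pip install git+https://github.com/openai/CLIP.git
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Encode the text prompt once; it stays fixed during optimization.
tokens = clip.tokenize(["an underwater city"]).to(device)
with torch.no_grad():
    text_features = F.normalize(model.encode_text(tokens), dim=-1)

# Stand-in "generator": optimize raw pixels directly instead of a VQGAN latent.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

# CLIP's input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(300):
    optimizer.zero_grad()
    normalized = (image.clamp(0, 1) - mean) / std
    image_features = F.normalize(model.encode_image(normalized), dim=-1)
    # CLIP scores the image against the prompt; minimize the negative similarity.
    loss = -(image_features * text_features).sum()
    loss.backward()
    optimizer.step()
```

In the real pipeline, the optimized variable is the VQGAN latent, and the decoded image is fed to CLIP through random crops and augmentations, which is a big part of why the results look like art rather than adversarial noise.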
Next, we need to set up the codebase and the dependencies first. Make sure to change the Colab runtime type to GPU beforehand (Runtime → Change runtime type → GPU).

```python
from IPython.utils import io
with io.capture_output() as captured:
    !git clone https://github.com/openai/CLIP
    # !pip install taming-transformers
    !git clone https://github.com/CompVis/taming-transformers.git
    !rm -Rf clipit
    !git clone https://github.com/dribnet/clipit  # assumed upstream clipit repo; swap in your own fork if needed
    !pip install ftfy regex tqdm omegaconf pytorch-lightning
    !pip install kornia
    !pip install imageio-ffmpeg
    !pip install einops
    !pip install torch-optimizer
    !pip install easydict
    !pip install braceexpand
    !pip install git+https://github.com/pvigier/perlin-numpy  # assumed perlin-numpy dependency

    # ClipDraw deps
    !pip install svgwrite
    !pip install svgpathtools
    !pip install cssutils
    !pip install numba
    !pip install torch-tools
    !pip install visdom

    !pip install gradio

    !git clone https://github.com/BachiLi/diffvg
    %cd diffvg
    # !ls
    !git submodule update --init --recursive
    !python setup.py install
    %cd ..
```

(NOTE: “!” is a special prefix in Google Colab that runs the command in bash instead of Python.)
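To confirm that the GPU runtime took effect, you can run a quick check like this (my own sanity check, not part of the clipit setup):

```python
# Sanity check: verify Colab allocated a GPU before the long setup/run.
import torch
print(torch.cuda.is_available())  # should print True on a GPU runtime

!nvidia-smi  # shows the allocated GPU and its memory
```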
Once we have installed the libraries, we can just import clipit and run a few lines of code to generate our art with VQGAN+CLIP, as shown in the sketch below. Simply change the text prompt to whatever you want. Additionally, you can give clipit options such as the number of iterations, the width and height, the generator model, whether you want to generate a video, and many more. You can read the source code for more information on the available options.
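Here is a minimal sketch of that driver code. It follows the settings-style API used in the clipit notebook, but treat the exact function and option names (reset_settings, add_settings, apply_settings, do_init, do_run, iterations, make_video) as assumptions to verify against the repo:

```python
# Minimal clipit driver (function/option names based on the clipit notebook; verify against the repo).
import sys
sys.path.append("clipit")  # the directory we cloned during setup

import clipit

clipit.reset_settings()                          # start from default settings
clipit.add_settings(prompts="underwater city")   # the text prompt to draw
# Optional knobs: iteration count, output size, and whether to render a video.
clipit.add_settings(iterations=300, size=[500, 280], make_video=True)

settings = clipit.apply_settings()               # freeze settings into a config
clipit.do_init(settings)                         # load the VQGAN and CLIP models
clipit.do_run(settings)                          # run the optimization loop
```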
However, Gradio only works well for demoing a single function. Creating a custom site with additional features like a gallery, login, or even just custom CSS is fairly limited or not possible at all. One quick solution I could think of was to create my demo site separately from the Gradio UI; then, I could embed the Gradio UI in the site using the iframe element. I initially tried this method but then realized one important drawback: I could not personalize any of the parts that need to interact with the ML app itself. For example, things such as input validation and a custom progress bar are not possible with an iframe. This is when I decided to build an API instead.

I’ve been using FastAPI instead of Flask to quickly build my APIs. The main reason is that I find FastAPI faster to write (less code), and it also auto-generates documentation (using Swagger UI) that allows me to test the API with a basic UI. Additionally, FastAPI supports asynchronous functions and is said to be faster than Flask. Image and video results are emailed to the user once generation finishes.

Now that everything seemed to be working, I built the front end and shared the site with my friends. However, I found that there was a concurrency problem when testing it out with multiple users: when a second user makes a request to the server while the first task is still processing, the second task somehow terminates the current process instead of creating a parallel process or queueing. I was not sure what caused this; maybe it was the use of global variables in the clipit code, or maybe not.
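For illustration, here is a hedged sketch of what such an endpoint could look like; the route, the request model, and the generate_and_email helper are hypothetical stand-ins for my actual code:

```python
# Hypothetical sketch of the generation endpoint (names are illustrative).
from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel

app = FastAPI()  # interactive Swagger UI docs are auto-generated at /docs

class GenerationRequest(BaseModel):
    prompt: str
    email: str
    iterations: int = 300

def generate_and_email(req: GenerationRequest) -> None:
    # Placeholder: run the clipit generation, then email the image/video results.
    ...

@app.post("/generate")
async def generate(req: GenerationRequest, background_tasks: BackgroundTasks):
    # Respond immediately; the long-running generation continues in the background
    # and the results are emailed to the user when ready.
    background_tasks.add_task(generate_and_email, req)
    return {"status": "queued"}
```

Note that FastAPI’s BackgroundTasks run inside the same server process, so a design like this still shares state (such as clipit’s global variables) across requests, which could explain the kind of interference described above.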