Can we have a scalable fastapi service with common cache ?

So, as it goes, we were using FastAPI for one of our apps. The app uses a lot of memory (for ML models).

Premise: I wanted to launch multiple workers of the app, since a single Python process is effectively limited to one core for CPU-bound work (the GIL), and also have a common cache across them.

We can use Uvicorn to launch multiple workers of FastAPI. But Uvicorn doesn't support a preload option, that is, loading the main app only once while still running multiple workers.
So I had to look at Gunicorn, and since Gunicorn is a WSGI server, we had to use the Uvicorn worker class to run FastAPI (an ASGI app) under it.

We can use Gunicorn's preload option so that the app loads only once, with multiple workers handling the load. Check this.
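A minimal launch command for this setup (a sketch, assuming the FastAPI instance is named `app` in `main.py`; the module path, worker count, and port are illustrative):

```shell
# Import main:app once in the Gunicorn master, then fork 4 Uvicorn workers.
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --preload \
  --bind 0.0.0.0:8000
```

With `--preload`, the app (and the ML models loaded at import time) are imported once in the master process before the workers are forked.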

Ok, cool.

But I wanted a common data structure (cache) across all the workers, so I instead went with the multiple-threads option and just one worker.

Oops: if we use any worker class other than gthread, Gunicorn ignores the threads setting, and in this case I had to use the Uvicorn worker for the ASGI interface between Gunicorn and FastAPI.

From here

Threads is only meaningful with the threaded worker. Every other worker type ignores that setting and runs one thread per process.

So, I cannot use threads.
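For contrast, a quick sketch of the two cases (again assuming an illustrative `main:app`): the threads flag only takes effect with the gthread worker class.

```shell
# threads honored: the gthread worker runs a thread pool inside each process
gunicorn main:app --worker-class gthread --workers 1 --threads 8

# threads ignored: the uvicorn worker runs one event loop per process
gunicorn main:app --worker-class uvicorn.workers.UvicornWorker --workers 1 --threads 8
```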

Also, if you are using an async framework such as FastAPI, using threads is somewhat orthogonal: concurrency within a worker comes from the event loop, not from threads.

Ok, can I use multiple workers with the preload option and have a common data structure loaded in the app as a module-level variable?

Oops, but as per this, a mutable cache shared between workers is not possible.

With or without the preload option you will end up with one background thread in each worker because when a process forks it forks all its threads. Whether the threads are created before the fork or after does not matter. In both cases the processes are independent once forked and do not share data structures. If you populate the data at module load time, that initial data will be visible to every worker. Future modifications will not be because they happen in separate processes. To share memory between processes (workers) you need to use a construct for explicitly sharing memory (/dev/shm, filesystem, network cache, db, etc).

You might be able to read such preloaded data in every worker, but you cannot mutate it as a common data structure.
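A small sketch of that behaviour with plain `multiprocessing` (the `fork` start method is what Gunicorn uses on Linux; the names here are illustrative, not from the original app):

```python
import multiprocessing as mp

# Module-level "cache", populated before forking (analogous to gunicorn --preload).
cache = {"loaded_at_import": True}

def worker(queue):
    # The forked child sees the data that existed before the fork...
    queue.put("loaded_at_import" in cache)
    # ...but any mutation stays local to this child process.
    cache["added_in_child"] = True

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # POSIX only; not available on Windows
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    p.join()
    print(q.get())                    # True: preloaded data is visible in the child
    print("added_in_child" in cache)  # False: the child's write never reached us
```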

So it's not possible to have a common mutable cache across workers, at least not in a straightforward way, unless you employ other techniques (shared memory, an external cache, a database).

Just to end this, let's see how the preload option itself works. I came across the blog below, which explains it well.

https://www.joelsleppy.com/blog/gunicorn-application-preloading/

P.S.: References

https://github.com/tiangolo/fastapi/issues/2425
levelup.gitconnected.com/supercharging-pyth..
stackoverflow.com/questions/38425620/gunico..