[Python Tips] Caching data with CacheTools
My Python Tips Series
- f-strings in Python 3.6
- Underscores in numeric literals
- A better interactive shell
- Secrets Module - New in 3.6
- PEP 8
- Named Tuples
- Type Annotation
- Jupyter Notebooks
- Getting Help
- Virtual Environments
- Expiring Dict
- DRY Programming
- Knowing what exists
What is cachetools?
"This module provides various memoizing collections and decorators, including variants of the Python 3 Standard Library @lru_cache function decorator."
- Cachetools docs
Cachetools is a great tool when you need to cache frequently used data that is expensive to create. A good example is my Global Blacklist API, where I fetch tribe data. I don't want to keep hitting the Scotbot API for data that rarely changes when looking it up in RAM is far cheaper. With cachetools I can cache the data for an hour and only refresh it when it has expired from the cache or has never been fetched before.
Cachetools covers similar ground to the expiring dictionary module I wrote about earlier in this series, with more cache types and decorator support.
How to use cachetools
I will walk you through the basic use of the module to get you started. If you have further questions or want to do something more advanced, I recommend checking out the docs.
You will need to install the cachetools package before using it.
pip install -U cachetools
This will install cachetools, or update it if you already have it installed. I recommend using a virtual environment, but you can also install it globally.
Now that you have cachetools installed, you can import it into your code. For most applications, you will need cached and TTLCache.
from cachetools import cached, TTLCache
Configure your cache
You can have multiple caches, but you need at least one to take advantage of cachetools. The settings for your cache depend on your application and caching needs. If the data doesn't change often, you can typically get away with a longer time to live (TTL) unless it is critical to have up-to-date data. How much you benefit from caching depends on how frequently you access the data, how expensive it is to fetch, and how often it is already in the cache when you need it.
cache = TTLCache(maxsize=100, ttl=3600)
This creates a new cache with a maximum size of 100 elements and a time to live of 3600 seconds (1 hour). If you cache more than 100 elements, the least recently used elements are evicted. Don't just copy my settings here; choose them based on your application's needs.
The two things to consider are how many data elements you will typically need to cache and how expensive it is to fetch that data.
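To make the TTL idea concrete, here is a minimal stdlib-only sketch of what a TTL cache does conceptually. This is an illustration, not cachetools' actual implementation: each entry stores an expiry timestamp, and lookups treat expired entries as missing.

```python
import time

class TinyTTLCache:
    """A minimal illustrative TTL cache: entries expire after `ttl` seconds."""

    def __init__(self, maxsize, ttl):
        self.maxsize = maxsize
        self.ttl = ttl
        self._data = {}  # key -> (value, expiry_time)

    def __setitem__(self, key, value):
        if len(self._data) >= self.maxsize and key not in self._data:
            # Evict the entry closest to expiry to stay under maxsize.
            oldest = min(self._data, key=lambda k: self._data[k][1])
            del self._data[oldest]
        self._data[key] = (value, time.monotonic() + self.ttl)

    def __getitem__(self, key):
        value, expiry = self._data[key]
        if time.monotonic() >= expiry:
            # Stale entry: drop it and behave as if it was never stored.
            del self._data[key]
            raise KeyError(key)
        return value

    def __contains__(self, key):
        try:
            self[key]
            return True
        except KeyError:
            return False
```

cachetools' TTLCache does more than this (full MutableMapping interface, LRU eviction), but the core idea is the same: once an entry is older than the TTL, it is simply gone.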
Decorate your functions
If you don't understand decorators, I recommend reading up on them to become familiar. In short, a decorator is syntactic sugar for wrapping one function in another.
You can decorate any function whose result you want cached with your cache.
It is really that simple. Let me give you an example.
```python
@cached(cache)
def load_data():
    # run the slow query to get all user data
    user_data = db.fetch_user_data()
    return user_data
```
If you have not called the function before, or the cached result is older than 1 hour, it will run the slow database query, cache the result, and return it. If the result is already cached and less than 1 hour old, you get it straight from the cache.
For the global blacklist, gathering the data can take 1-10 seconds in some cases, and thanks to caching it happens at most once per hour. I use two forms of caching in the Global Blacklist API: cachetools only for collecting Tribes data, and simple in-memory variables for most of the rest.
Putting it all together
```python
from cachetools import cached, TTLCache

cache = TTLCache(maxsize=100, ttl=3600)

@cached(cache)
def load_data():
    # run the slow query to get all user data
    user_data = db.fetch_user_data()
    return user_data
```