Cache implementation using weakref
Fri 30 April 2021
Bird's cache (Photo credit: Wikipedia)
This article presents a quick example of how to use weakref to implement a home-made cache mechanism.
Let's use the following use case:
- Let's consider items:
  - items can be stored in a storage
  - items can be retrieved from storage
  - items are identified by an ID
- All processing on items takes an iterable over items as input
Items considered
As I'm lazy and don't want to set up a database, I'll use Q-items and reuse some functions from a previous article. Items must expose one method to build themselves from storage and one to store themselves into storage. I'll call them from_storage() and to_storage().
Let's consider a QItem. Its from_storage() fetches the item from an external resource (wikidata). Its to_storage() is a dummy function, as we don't want to modify wikidata. In real life, this function should check the storage and either store the item or update it if it already exists.
class QItem:
    def __init__(self, q_num):
        self.q_num = q_num
        self.values = {}

    def from_storage(self):
        """Build item from storage"""
        self.values = wikidata_to_dict(get_item(self.q_num))

    def to_storage(self):
        """Store this item to storage.

        If item already exists in storage, update it
        """
        # nothing to do here, we won't try to modify wikidata
        print(f"storing {self.q_num}")

    def iter_properties(self):
        """Whatever method to pretend this item is not useless."""
        for k, v in self.values.items():
            yield {k: v}

    def get_any_property(self):
        """Whatever method to pretend this item is not useless."""
        return next(self.iter_properties())
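The only thing the cache needs from an item is the pair from_storage() / to_storage(). As a sketch, that contract can be written down with typing.Protocol. The names Storable and Note below are hypothetical and are not part of the original code; Note only pretends to talk to a storage.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Storable(Protocol):
    """Hypothetical name for the interface expected from an item."""

    def from_storage(self) -> None: ...

    def to_storage(self) -> None: ...


class Note:
    """Hypothetical item type, unrelated to wikidata, that fits the interface."""

    def __init__(self, note_id):
        self.note_id = note_id
        self.text = ""

    def from_storage(self) -> None:
        # pretend to read from a storage
        self.text = f"content of {self.note_id}"

    def to_storage(self) -> None:
        # pretend to write to a storage
        pass


note = Note("n1")
note.from_storage()
# runtime_checkable only verifies that the methods exist, not their signatures
is_storable = isinstance(note, Storable)
print(is_storable, note.text)
```

Any class offering these two methods could be handled the same way as QItem.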
Cache implementation
Let's consider a collection of items. The collection consists of the set of all item IDs and a dict of items. The dict acts as a cache: if the item is in the dict, it is returned; otherwise it is built from storage (based on its ID) and put into the dict. Memory freeing is handled by weakref: the dict is a WeakValueDictionary, so an entry disappears once nothing else references the item. The set of IDs keeps track of every key that has been put in the dict.
from weakref import WeakValueDictionary


class WikidataCollection:
    def __init__(self):
        self.items = WeakValueDictionary()
        self.ids = set()

    def get(self, q_num):
        if q_num not in self.ids:
            raise ValueError("unknown item")
        try:
            # item in cache
            return self.items[q_num]
        except KeyError:
            # get item from elsewhere (e.g. database)
            q_item = QItem(q_num)
            q_item.from_storage()
            # q_num is already in self.ids (checked above), only refill the cache
            self.items[q_num] = q_item
            return q_item

    def set_item(self, q_num, q_item):
        """Add q_item with id q_num in cache"""
        q_item.to_storage()
        self.items[q_num] = q_item
        self.ids.add(q_num)

    def iter_items(self):
        for q_num in self.ids:
            yield self.get(q_num)
I tried to keep this implementation as simple as possible so that it can easily be adapted to other objects.
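The memory-freeing behaviour the collection relies on can be seen in isolation. Here is a minimal sketch, assuming CPython (where an object is freed as soon as its last strong reference disappears). The Item class is a hypothetical stand-in, not the QItem above.

```python
import gc
from weakref import WeakValueDictionary


class Item:
    """Hypothetical stand-in for a cached object."""

    def __init__(self, name):
        self.name = name


cache = WeakValueDictionary()

item = Item("Q42")
cache["Q42"] = item

# While a strong reference (the variable `item`) exists, the entry stays.
in_cache_before = "Q42" in cache

# Drop the only strong reference: the entry goes away with the object.
del item
gc.collect()  # not needed on CPython, harmless elsewhere
in_cache_after = "Q42" in cache

print(in_cache_before, in_cache_after)
```

On CPython this prints True False: the cache never keeps an object alive by itself, it only remembers objects that something else is still using.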
Now let's use it. First, let's create a collection and populate it:
from time import sleep


def get_qitem(q_num):
    qitem = QItem(q_num)
    qitem.from_storage()
    sleep(1)  # don't overload wikidata
    return qitem


q_collection = WikidataCollection()

# populate collection
for num in range(42, 56):
    if num in (47, 50):
        continue
    q_num = "Q" + str(num)
    q_collection.set_item(q_num, get_qitem(q_num))
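Now you can iterate over the qitems belonging to q_collection. Note that the WeakValueDictionary q_collection.items can hold fewer items than the set of IDs q_collection.ids: an item that is no longer referenced anywhere else is dropped from the cache and rebuilt from storage the next time it is requested.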
[qitem.get_any_property() for qitem in q_collection.iter_items()]
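To watch the cache hold fewer items than the set of IDs without querying wikidata, here is a sketch that applies the same caching logic to a hypothetical in-memory storage. FAKE_STORAGE, FakeItem and FakeCollection are made-up names for this illustration, and the counts in the closing note assume CPython's immediate reference counting.

```python
import gc
from weakref import WeakValueDictionary

# Hypothetical in-memory "storage", standing in for wikidata.
FAKE_STORAGE = {f"Q{n}": {"label": f"item {n}"} for n in range(42, 47)}


class FakeItem:
    """Hypothetical item with the from_storage()/to_storage() interface."""

    def __init__(self, q_num):
        self.q_num = q_num
        self.values = {}

    def from_storage(self):
        self.values = dict(FAKE_STORAGE[self.q_num])

    def to_storage(self):
        FAKE_STORAGE[self.q_num] = dict(self.values)


class FakeCollection:
    """Same caching logic as WikidataCollection, reduced to the essentials."""

    def __init__(self):
        self.items = WeakValueDictionary()
        self.ids = set()

    def get(self, q_num):
        if q_num not in self.ids:
            raise ValueError("unknown item")
        try:
            return self.items[q_num]
        except KeyError:
            # cache miss: rebuild the item from storage
            item = FakeItem(q_num)
            item.from_storage()
            self.items[q_num] = item
            return item

    def set_item(self, q_num, item):
        item.to_storage()
        self.items[q_num] = item
        self.ids.add(q_num)

    def iter_items(self):
        for q_num in self.ids:
            yield self.get(q_num)


collection = FakeCollection()
for q_num in list(FAKE_STORAGE):
    item = FakeItem(q_num)
    item.from_storage()
    collection.set_item(q_num, item)

kept = collection.get("Q42")  # this strong reference keeps Q42 cached
del item                      # drop the loop variable's last reference
gc.collect()

n_ids = len(collection.ids)
n_cached = len(collection.items)
# Iterating still reaches every item: missing ones are rebuilt on the fly.
labels = sorted(i.values["label"] for i in collection.iter_items())
print(n_ids, n_cached, labels)
```

Here all five IDs are known, but only the item still referenced by kept remains in the cache. The iteration nevertheless yields all five items, each reloaded from storage when needed.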