hybrid_cache

tpcp.caching.hybrid_cache(joblib_memory: Memory = Memory(location=None), lru_cache_maxsize: int | None | bool = False)

Cache a function using joblib memory and an LRU cache at the same time.

This function attempts to combine the best of both worlds: it uses joblib.Memory to cache function calls between runs and an lru_cache to cache function calls during a run.

When the cached function is called, the lookup will work as follows:

  1. Is the function result in the lru cache? If yes, return it.

  2. Is the function result in the joblib memory? If yes, return it and cache it in the lru cache.

  3. Call the function and cache it in the joblib memory and the lru cache. Return the result.
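The lookup order above can be illustrated with a minimal sketch. This is not the tpcp implementation: `two_level_cache` is a hypothetical helper, and a plain dict stands in for joblib.Memory (which would persist results on disk). It assumes hashable positional arguments only.

```python
from functools import lru_cache


def two_level_cache(persistent_store: dict, maxsize=None):
    """Hypothetical decorator: an in-memory LRU layer in front of a persistent layer.

    `persistent_store` is a dict used as a stand-in for joblib.Memory.
    """

    def decorator(func):
        def persistent_lookup(*args):
            # Step 2: check the persistent layer.
            if args in persistent_store:
                return persistent_store[args]
            # Step 3: compute and store in the persistent layer.
            result = func(*args)
            persistent_store[args] = result
            return result

        # Step 1: the LRU layer is consulted first. On a miss it falls
        # through to persistent_lookup and keeps whatever that returns.
        return lru_cache(maxsize=maxsize)(persistent_lookup)

    return decorator


calls = []  # records every real execution of the function
store = {}


@two_level_cache(store, maxsize=1)
def square(x):
    calls.append(x)
    return x * x


square(3)  # miss in both layers: function runs
square(3)  # hit in the LRU layer
square(4)  # function runs; evicts 3 from the LRU layer (maxsize=1)
square(3)  # LRU miss, persistent hit: function does not run again
print(calls)  # [3, 4]
```

With `maxsize=1` the last call misses the in-memory layer, but the result is still served from the persistent layer rather than recomputed.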

It also solves a problem you might run into with lru_cache: it is difficult to create a wrapped function at runtime, because each direct call to lru_cache creates a new cache. We work around this by using a global cache that stores the wrapped functions. The cache key is a tuple of the function name and a hash of the function, the joblib memory, and the lru_cache parameters. This means that if you create a new cache with different cache parameters, you will get a new cache, but if you call hybrid_cache with the same parameters, you will get the same object back.

You can access this global cache via the __cache_registry__ attribute of this function (hybrid_cache.__cache_registry__).

Parameters:
joblib_memory

The joblib memory object that is used to cache the results. Memory(None) is equivalent to no caching.

lru_cache_maxsize

The maximum number of entries in the cache. If None, the cache will grow without limit. If False, no lru_cache is used.
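As a rough illustration of the difference between a bounded and an unbounded LRU cache, the sketch below uses functools.lru_cache directly. It assumes that an integer or None passed as lru_cache_maxsize behaves like the maxsize argument of functools.lru_cache; the False case (no lru_cache at all) is specific to hybrid_cache and is not shown.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def unbounded(x):
    return x + 1


@lru_cache(maxsize=1)
def bounded(x):
    return x + 1


for value in range(5):
    unbounded(value)
    bounded(value)

# The unbounded cache keeps all five results; the bounded one only the latest.
print(unbounded.cache_info().currsize)  # 5
print(bounded.cache_info().currsize)  # 1
```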

Returns:
caching_decorator

A decorator that can be used to cache a function with the given parameters.

Examples

>>> import pandas as pd
>>> from tpcp.caching import hybrid_cache
>>> from joblib import Memory
>>>
>>> @hybrid_cache(Memory(".cache"), lru_cache_maxsize=1)
... def add(a: pd.DataFrame, b: pd.DataFrame):
...     return a + b
>>> df1 = pd.DataFrame({"a": [1, 2, 3]})
>>> df2 = pd.DataFrame({"a": [4, 5, 6]})
>>> add(df1, df2)

Examples using tpcp.caching.hybrid_cache

Caching