Caching Model Outputs
Overview
Some AIDE models support caching their outputs to disk to avoid recomputing expensive transformations. This is made available with the CacheMixin
class, which is inherited by models that support caching. You can check if a model supports caching by checking if it inherits from CacheMixin
:
from aide_predict.bespoke_models.base import CacheMixin
assert isinstance(model, CacheMixin) # True if model supports caching
Using Caches
Caching is enabled by default for models that support it. To explicitly control caching:
from aide_predict import ESM2Embedding
# Disable caching
model = ESM2Embedding(use_cache=False)
# Enable caching (default)
model = ESM2Embedding(use_cache=True)
How It Works
Each protein sequence gets a unique hash based on its sequence, ID, and structure (if present)
Outputs are stored in HDF5 format for efficient retrieval
Cache also hashes the model parameters, so if model parameters change it will not use previous cache values
Stores metadata in SQLite for quick cache checking
Caches are stored in the model’s metadata folder
Models Supporting Caching
You can check if a model supports caching by checking if it inherits from CacheMixin
:
from aide_predict.bespoke_models.base import CacheMixin
isinstance(model, CacheMixin) # True if model supports caching
NOTE: When wrapping a new model, it is recommended that CacheMixin
be inherited first behind ProteinModelWrapper
. This ensures that the final model outputs after any processing conducted by other mixins is what get cached, preventing any unnecessary recomputation.
Cache Location
Caches are stored in a cache
subdirectory of the model’s metadata folder:
# Specify cache location
model = ESM2Embedding(metadata_folder="my_model")
# Creates: my_model/cache/cache.db (metadata)
# my_model/cache/embeddings.h5 (outputs)
# Random temporary directory if not specified
model = ESM2Embedding()