Caching Model Outputs

Overview

Some AIDE models support caching their outputs to disk to avoid recomputing expensive transformations. This functionality is provided by the CacheMixin class, which models that support caching inherit from. You can check whether a model supports caching by testing for that inheritance:

from aide_predict.bespoke_models.base import CacheMixin
isinstance(model, CacheMixin)  # True if the model supports caching

Using Caches

Caching is enabled by default for models that support it. To explicitly control caching:

from aide_predict import ESM2Embedding

# Disable caching
model = ESM2Embedding(use_cache=False)

# Enable caching (default)
model = ESM2Embedding(use_cache=True)

How It Works

  • Each protein sequence gets a unique hash based on its sequence, ID, and structure (if present)

  • Outputs are stored in HDF5 format for efficient retrieval

  • Model parameters are also hashed, so cached values are invalidated whenever the parameters change

  • Metadata is stored in SQLite for quick cache lookups

  • Caches are stored in the model's metadata folder
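The hashing step above can be sketched with the standard library. This is an illustrative sketch, not the aide_predict implementation: the function name `cache_key` and its parameters are hypothetical, but the idea matches the list above, since sequence, ID, structure, and model parameters all feed into one deterministic key.

```python
import hashlib
import json


def cache_key(sequence, seq_id=None, structure=None, model_params=None):
    """Hypothetical sketch: derive a deterministic cache key from the
    sequence, its ID, an optional structure, and the model parameters."""
    # sort_keys makes the serialization order-independent, so logically
    # identical inputs always hash to the same key
    payload = json.dumps(
        {"seq": sequence, "id": seq_id, "structure": structure,
         "params": model_params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


# Same inputs give the same key; changing a model parameter changes the key,
# so stale cache entries are never reused
k1 = cache_key("MKTAYIAK", seq_id="p1", model_params={"layer": 33})
k2 = cache_key("MKTAYIAK", seq_id="p1", model_params={"layer": 12})
```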

NOTE: When wrapping a new model, it is recommended that CacheMixin appear first in the inheritance list, before ProteinModelWrapper. This ensures that the final model outputs, after any processing by other mixins, are what get cached, preventing unnecessary recomputation.
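The effect of this inheritance order can be shown with toy stand-ins. The class bodies below are illustrative placeholders, not the real aide_predict classes: listing the cache class first in the MRO means its transform() runs first, checks the cache, and only falls through to the expensive computation on a miss.

```python
class ProteinModelWrapper:
    """Toy stand-in for the real wrapper; pretend this is expensive."""
    def transform(self, X):
        return [x.upper() for x in X]  # placeholder computation


class CacheMixin:
    """Toy stand-in: intercepts transform() and caches its results."""
    def transform(self, X):
        cache = getattr(self, "_cache", None)
        if cache is None:
            cache = self._cache = {}
        key = tuple(X)
        if key not in cache:
            # Cache miss: delegate to the next class in the MRO
            cache[key] = super().transform(X)
        return cache[key]


# CacheMixin listed first, so its transform() wraps the wrapper's and
# the cached value is the final, fully processed output
class MyModel(CacheMixin, ProteinModelWrapper):
    pass


m = MyModel()
m.transform(["mktay"])  # computed once; identical calls hit the cache
```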

Cache Location

Caches are stored in a cache subdirectory of the model’s metadata folder:

# Specify cache location
model = ESM2Embedding(metadata_folder="my_model")
# Creates: my_model/cache/cache.db (metadata)
#         my_model/cache/embeddings.h5 (outputs)

# A random temporary directory is used if metadata_folder is not specified
model = ESM2Embedding()
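The layout above can be sketched with the standard library. This is a hedged sketch, not aide_predict code: the helper `init_cache_dir` is hypothetical, and the file names cache.db and embeddings.h5 are taken from the example above (in practice the HDF5 file would be written via a library such as h5py).

```python
import sqlite3
import tempfile
from pathlib import Path


def init_cache_dir(metadata_folder=None):
    """Hypothetical helper: create the cache/ layout described above,
    falling back to a temporary directory when no folder is given."""
    root = Path(metadata_folder) if metadata_folder else Path(tempfile.mkdtemp())
    cache = root / "cache"
    cache.mkdir(parents=True, exist_ok=True)
    # SQLite file holding per-entry metadata for quick lookups
    with sqlite3.connect(cache / "cache.db") as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS entries (key TEXT PRIMARY KEY, dataset TEXT)"
        )
    # Placeholder for the HDF5 outputs file
    (cache / "embeddings.h5").touch()
    return cache


cache = init_cache_dir()  # no folder given, so a temp directory is used
```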