CcdCache

class openff.pablo.ccd.CcdCache(library_paths: Iterable[Path | str], cache_path: Path | str | None = None, preload: list[str] = [], patches: Iterable[Mapping[str, Callable[[ResidueDefinition], list[ResidueDefinition]]]] = {}, extra_definitions: Mapping[str, Iterable[ResidueDefinition]] = {}, auto_download: bool = False)

Bases: Mapping[str, tuple[ResidueDefinition, …]]

Caches, patches, and presents the CCD as a Python Mapping.

This class is a wrapper around a dict that stores residue definitions. When a residue is requested via the indexing syntax (for example, my_ccd_cache["ALA"]), this dictionary is checked first. If the residue is not present and the auto_download attribute is truthy, the CCD is then checked. If the residue cannot be retrieved from the inner dict or the CCD, a KeyError is raised.

Iterating over the mapping, checking its length, using the in operator, or otherwise treating the CcdCache as a mapping other than with the indexing syntax works only on the inner dict. As a result, accessing a residue via indexing may return a value even if these other methods suggest it won’t.
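The asymmetry between indexing and the other mapping operations can be illustrated with a small stand-in class. This is a minimal sketch and not Pablo's implementation: LazyLookup, fake_ccd, and the string "definitions" are all hypothetical names invented for the illustration. Note that collections.abc.Mapping implements the in operator via indexing by default, so the sketch overrides __contains__ to consult the inner dict only.

```python
from collections.abc import Callable, Iterator, Mapping


class LazyLookup(Mapping[str, tuple[str, ...]]):
    """Toy stand-in for the lookup behaviour described above (not Pablo's code)."""

    def __init__(
        self,
        known: dict[str, tuple[str, ...]],
        fetch: Callable[[str], tuple[str, ...]],
        auto_download: bool = False,
    ) -> None:
        self._inner = dict(known)
        self._fetch = fetch  # hypothetical stand-in for a CCD download
        self.auto_download = auto_download

    def __getitem__(self, key: str) -> tuple[str, ...]:
        # Indexing checks the inner dict first, then the remote source.
        if key in self._inner:
            return self._inner[key]
        if self.auto_download:
            # The fetcher raises KeyError itself if the entry does not exist.
            self._inner[key] = self._fetch(key)
            return self._inner[key]
        raise KeyError(key)

    # Everything except indexing looks at the inner dict only.
    def __contains__(self, key: object) -> bool:
        return key in self._inner

    def __iter__(self) -> Iterator[str]:
        return iter(self._inner)

    def __len__(self) -> int:
        return len(self._inner)


def fake_ccd(name: str) -> tuple[str, ...]:
    """Pretend remote source that only knows about GLY."""
    if name == "GLY":
        return ("GLY definition",)
    raise KeyError(name)


cache = LazyLookup({"ALA": ("ALA definition",)}, fake_ccd, auto_download=True)
gly_known_before = "GLY" in cache  # consults the inner dict only
gly = cache["GLY"]                 # indexing falls back to the fetcher
gly_known_after = "GLY" in cache   # the fetched entry is now in the inner dict
```

In this sketch the membership test changes its answer once the entry has been fetched, which mirrors the caveat above about indexing succeeding where the other mapping methods suggest it would not.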

CcdCache can apply patches to the entries it downloads from the CCD. This is used to work around known errors, deficiencies and inconsistencies in the CCD definitions. Patches are specified as functions that take a single residue definition and return a list of them.

The extra_definitions argument and the with_() and with_replaced() methods allow custom definitions to be added to a CcdCache. These custom definitions are not patched. Since they are stored alongside cached entries in the inner dictionary, custom definitions supersede any CCD entries that have not already been downloaded.

When a CCD entry is downloaded, the corresponding CIF file is stored in the cache_path. This means that each entry will be downloaded only once, even across multiple Python invocations. All entries in the cache are loaded (and patches applied) when a new CcdCache is created. CcdCache assumes that the files in the cache path were downloaded from the CCD and may do unexpected things if they are edited by hand.
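The download-once behaviour can be pictured as a simple file-backed cache. The following is a sketch under stated assumptions rather than Pablo's code: fetch_cached and fake_download are hypothetical helpers, and the "<name>.cif" file naming is an assumption made for the illustration.

```python
import tempfile
from collections.abc import Callable
from pathlib import Path


def fetch_cached(res_name: str, cache_path: Path, download: Callable[[str], str]) -> str:
    """Return CIF text for res_name, downloading only if no cached file exists."""
    cif_file = cache_path / f"{res_name}.cif"  # file naming is an assumption
    if cif_file.exists():
        return cif_file.read_text()
    text = download(res_name)
    cache_path.mkdir(parents=True, exist_ok=True)
    cif_file.write_text(text)
    return text


download_calls: list[str] = []


def fake_download(res_name: str) -> str:
    """Hypothetical stand-in for a CCD download that records each call."""
    download_calls.append(res_name)
    return f"data_{res_name}\n"


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_cached("ALA", Path(tmp), fake_download)
    second = fetch_cached("ALA", Path(tmp), fake_download)  # served from disk
```

Because the second call finds the file on disk, download_calls records a single download. A hand-edited file would be returned as-is by such a scheme, which is one reason edited cache files can lead to surprises.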

Users may provide additional CCD entries by specifying library paths. By default, this is used to ship commonly used residues with Pablo. At the moment, patches are applied to files in library paths, but in the future they likely will not be, and residues shipped with Pablo will instead be shipped pre-patched to speed up load times. As with the cache, files from library paths are loaded when a CcdCache is created.

Accessing the CCD requires internet access. Without internet access, entries from the cache or library paths can still be loaded, as can any entries added to an instance of this class.

Parameters:
  • library_paths – Paths to search for user-provided or packaged CCD entries. All paths are searched.

  • cache_path – The path to which to download CCD entries. This path is searched in addition to library_paths. If None, no on-disk cache is used, though the inner dictionary still functions as an in-memory cache.

  • preload – A list of residue names to ensure are present when initializing the class. If absent from the library or cache directory, these are downloaded from the CCD even if auto_download is falsy. Note that a download failure will raise an error when the corresponding residue is first requested, not at instantiation.

  • patches – Functions to call on ResidueDefinitions downloaded from the CCD before they are returned or added to the inner dict. This is an iterable of maps, each from residue names to a single callable. The maps are applied in iteration order, each to the residues with the given names. A patch under the key "*" is applied to all residues, before the more specific patch in the same map.

  • extra_definitions – Additional residue definitions to add to the cache. Note that patches are not applied to these definitions.

  • auto_download – If True, automatically attempt to download unknown residues from the CCD. If False, raise KeyError when residues absent from the cache are requested.
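The ordering rules given for the patches parameter can be sketched as follows. This is an illustration only: apply_patches is a hypothetical helper, and plain strings stand in for ResidueDefinition objects so that the order of application is visible in the result.

```python
from collections.abc import Callable, Iterable, Mapping

Definition = str  # toy stand-in for ResidueDefinition
Patch = Callable[[Definition], list[Definition]]


def apply_patches(
    res_name: str,
    definitions: list[Definition],
    patches: Iterable[Mapping[str, Patch]],
) -> list[Definition]:
    """Apply each patch map in order; within a map, "*" runs before the named patch."""
    for patch_map in patches:
        for key in ("*", res_name):
            patch = patch_map.get(key)
            if patch is None:
                continue
            # A patch maps one definition to a list, so flatten the results.
            definitions = [new for old in definitions for new in patch(old)]
    return definitions


patches = [
    {"*": lambda d: [d + "+all"], "ALA": lambda d: [d + "+ala1"]},
    {"ALA": lambda d: [d, d + "+ala2"]},  # keeps the input and adds a variant
]
patched_ala = apply_patches("ALA", ["ALA"], patches)
patched_gly = apply_patches("GLY", ["GLY"], patches)
```

Here the "ALA" entry passes through the wildcard patch, then the specific patch from the first map, then the patch from the second map, while "GLY" is only touched by the wildcard patch.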

Instance and Static Methods

  • get_from_ccd – Retrieve a residue definition from the CCD, adding it to the cache.

  • with_ – Get a copy of this CcdCache with additional definitions added.

  • with_crosslink – Add a custom crosslink between residues.

  • with_patch – Add a patch to the residues loaded via a copy of this CcdCache.

  • with_replaced – Get a copy of this CcdCache with some definitions replaced.

  • with_varied_protonation – Get a copy of self with all combinations of some protonation states.

  • with_virtual_sites – Copy self, adding new residue definitions requiring some virtual sites.

  • with_vsite_water – Copy self, adding new definitions for common multisite water models.

  • without – Get a copy of this CcdCache lacking any definitions with some names.

get_from_ccd(res_name: str, patch: bool = True) → list[ResidueDefinition]

Retrieve a residue definition from the CCD, adding it to the cache.

Downloads the residue definition from the CCD, applies any patches, adds it to the cache, and returns the new, patched residue definitions. Does not return any definitions existing in the cache. Always downloads, even when the entry is already in the cache or auto_download is falsy.

Parameters:
  • res_name – The 3-letter code for the residue.

  • patch – If True, apply patches before adding to the cache and returning. If False, do not apply patches.

with_(definitions: Mapping[str, Sequence[ResidueDefinition]] | Sequence[ResidueDefinition]) → Self

Get a copy of this CcdCache with additional definitions added.

Definitions may be supplied as a mapping from residue names to sequences of residue definitions, or as a sequence of residue definitions. In the latter case, the residue names are taken from the residue definitions themselves.

Note that patches are not applied to the new definitions.

Examples

Add a custom definition to the STD_CCD_CACHE. We use a 4-letter residue code because such codes are supported by Pablo’s PDB reader and do not clash with the CCD’s definitions.

>>> from openff.pablo import STD_CCD_CACHE, ResidueDefinition
>>> my_ccd_cache = STD_CCD_CACHE.with_([
...     ResidueDefinition.from_smiles(
...         "[H:1][O:2][O:3][H:4]",
...         {1: "H1", 2: "O1", 3: "O2", 4: "H2"},
...         "HOOH",
...     )
... ])

Add protonation variants of a residue by specifying acidic and basic atoms.

>>> from openff.pablo import STD_CCD_CACHE, ResidueDefinition
>>>
>>> # Get the GABA (γ-amino butanoic acid) residue definition from CCD
>>> gaba_resdef = STD_CCD_CACHE["ABU"][0]
>>>
>>> # Generate the variants and add them to a new cache
>>> my_ccd_cache = STD_CCD_CACHE.with_({
...     "ABU": gaba_resdef.vary_protonation(
...         acidic=["HXT"], # Atom name of abstractable proton
...         basic=[("N", "H3")], # Atom to protonate, name of new proton
...     )[1:], # Skip the first entry, which is already in the cache
... })
>>> # Should have added three variants - positive, negative, zwitterion
>>> len(my_ccd_cache["ABU"]) - len(STD_CCD_CACHE["ABU"])
3

with_crosslink(*, residues: tuple[str, str], linking_atoms: tuple[str, str], leaving_atoms: tuple[Collection[str], Collection[str]], bond_order: int, aromatic: bool = False, stereo: Literal['E', 'Z'] | None = None) → Self

Add a custom crosslink between residues.

Parameters:
  • residues – The names of the residues the crosslink should be formed between. May have 1 entry for a homodimer or 2 entries for a heterodimer.

  • linking_atoms – The atom names between which the crosslink bond is formed. Should have one corresponding entry for each residue.

  • leaving_atoms – The atom names that are absent from the PDB file when the crosslink is in place. These atom names are replaced by the crosslink bond. Should have 1 entry for a homodimer or 2 entries for a heterodimer. Each entry should include all atom names that are absent when the crosslink exists. For example, for a disulfide bond between two cysteines, the leaving atoms are [["HG"]]. For a peptide bond between two alpha amino acids, the leaving atoms are [["H2"], ["OXT", "HXT"]], though peptide bonds are typically modelled with linking bonds rather than crosslinks.

  • bond_order – The bond order of the crosslink; 1 for a single bond, 2 for a double bond, etc.

  • aromatic – True if the bond is aromatic; False otherwise.

  • stereo – The stereochemistry of the crosslink bond. For a bond without stereochemistry, choose None.

with_patch(residue_name: str, patch: Callable[[ResidueDefinition], list[ResidueDefinition]]) → Self

Add a patch to the residues loaded via a copy of this CcdCache.

The patch is added to a copy of the CcdCache, and the copy is returned. The original CcdCache is left unmodified.

The patch function is called on each residue definition stored under the given residue name. The returned residue definitions are concatenated and replace the originals. Patches can therefore add, modify, split, or replace residue definitions depending on whether they include the original definition in the output.

The patch is applied to all definitions in the cache at the time this method is called, as well as to any definitions downloaded from the CCD in the future. It is not applied to definitions added by the other CcdCache.with_*() methods.
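How the output of a patch function determines whether it adds or replaces definitions can be shown with a toy model. This is a sketch, not Pablo's implementation: ToyDefinition, patch_all, add_variant, and uppercase are hypothetical names, and the dataclass only mimics the idea of a residue definition.

```python
from collections.abc import Callable
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyDefinition:
    """Hypothetical stand-in for ResidueDefinition."""

    residue_name: str
    description: str


Patch = Callable[[ToyDefinition], list[ToyDefinition]]


def patch_all(definitions: list[ToyDefinition], patch: Patch) -> list[ToyDefinition]:
    """Call the patch on every definition and concatenate the returned lists."""
    return [new for old in definitions for new in patch(old)]


def add_variant(d: ToyDefinition) -> list[ToyDefinition]:
    # Including the input in the output keeps the original alongside the new one.
    return [d, replace(d, description=d.description + " (variant)")]


def uppercase(d: ToyDefinition) -> list[ToyDefinition]:
    # Omitting the input from the output replaces the original.
    return [replace(d, description=d.description.upper())]


originals = [ToyDefinition("ABC", "neutral")]
added = patch_all(originals, add_variant)
modified = patch_all(originals, uppercase)
```

In this model, added holds two definitions (the original and its variant) and modified holds one rewritten definition, while originals is left untouched.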

with_replaced(definitions: Mapping[str, Sequence[ResidueDefinition]] | Sequence[ResidueDefinition]) → Self

Get a copy of this CcdCache with some definitions replaced.

Similar to with_, but does not retain existing definitions for the specified residue names. All residue names that are keys of a definitions mapping or are residue names in a definitions sequence are removed from the new CcdCache before adding the new definitions.

Note that patches are not applied to the new definitions.
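The difference between with_ and with_replaced can be modelled on a plain dictionary. The sketch below is illustrative only: ToyDefinition, by_name, toy_with, and toy_with_replaced are hypothetical, the residue_name attribute is an assumed name, and appending new definitions after existing ones is an assumption about ordering that the description above does not specify.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDefinition:
    """Hypothetical stand-in for ResidueDefinition."""

    residue_name: str  # assumed attribute name, for illustration
    description: str


Store = dict[str, tuple[ToyDefinition, ...]]


def by_name(
    definitions: Mapping[str, Sequence[ToyDefinition]] | Sequence[ToyDefinition],
) -> dict[str, list[ToyDefinition]]:
    """Accept either a mapping or a flat sequence and group by residue name."""
    if isinstance(definitions, Mapping):
        return {name: list(defs) for name, defs in definitions.items()}
    grouped: dict[str, list[ToyDefinition]] = {}
    for definition in definitions:
        grouped.setdefault(definition.residue_name, []).append(definition)
    return grouped


def toy_with(store: Store, definitions) -> Store:
    """Copy the store, keeping existing definitions and adding the new ones."""
    new = dict(store)
    for name, defs in by_name(definitions).items():
        new[name] = new.get(name, ()) + tuple(defs)  # append order is an assumption
    return new


def toy_with_replaced(store: Store, definitions) -> Store:
    """Copy the store, discarding existing definitions for the given names."""
    new = dict(store)
    for name, defs in by_name(definitions).items():
        new[name] = tuple(defs)
    return new


store: Store = {"ABC": (ToyDefinition("ABC", "original"),)}
extra = [ToyDefinition("ABC", "custom"), ToyDefinition("XYZ", "custom")]
added = toy_with(store, extra)
replaced = toy_with_replaced(store, extra)
```

In this model, added keeps the original "ABC" definition next to the custom one, replaced holds only the custom "ABC" definition, and both gain "XYZ". The starting store is never modified, matching the copy semantics described for these methods.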

See also

with_, without

with_varied_protonation(residue_name: str, *, acidic: Iterable[str] = (), basic: Iterable[tuple[str, str]] = ()) → Self

Get a copy of self with all combinations of some protonation states.

Note that all combinations of protonations and deprotonations are generated; this means that if acidic has length n and basic has length m, 2**(n+m) variants will be generated for each existing variant.
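The 2**(n+m) count follows from treating each acidic and each basic site as an independent on/off choice. The enumeration can be sketched as below. protonation_combinations is a hypothetical helper that only lists the choices; it does not build residue definitions or skip chemically invalid combinations.

```python
from itertools import product

# One combination: (acidic hydrogens removed, (heavy atom, new hydrogen) pairs added)
Combo = tuple[tuple[str, ...], tuple[tuple[str, str], ...]]


def protonation_combinations(
    acidic: list[str], basic: list[tuple[str, str]]
) -> list[Combo]:
    """Enumerate every on/off choice for each acidic and basic site."""
    n = len(acidic)
    combos: list[Combo] = []
    for flags in product([False, True], repeat=n + len(basic)):
        removed = tuple(name for name, on in zip(acidic, flags[:n]) if on)
        added = tuple(pair for pair, on in zip(basic, flags[n:]) if on)
        combos.append((removed, added))
    return combos


# One acidic and one basic site give 2**(1+1) combinations.
combos = protonation_combinations(["HXT"], [("N", "H3")])
```

The first combination leaves every site untouched and so corresponds to the existing variant, leaving three new variants for one acidic and one basic site.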

If no variants at all are generated, PabloError is raised. Otherwise, whatever variants make sense are created for each existing variant.

This method will download the given residue name from the CCD if it is not already in the cache.

Parameters:
  • residue_name – The name of the residue to generate alternate protonation states for.

  • acidic – Existing hydrogen atoms that can be removed to form a new protonation state. Each element specifies an atom name to remove, decrementing the formal charge on the neighbouring heavy atom. Multiply bonded, unbonded, missing, or non-hydrogen atoms are skipped unless no variants at all are generated.

  • basic – Existing non-hydrogen atoms that can be protonated to form a new protonation state, as well as the canonical name of the new hydrogen. Each tuple specifies an atom name to protonate (increment the formal charge and form a bond) and the name of the added proton. Unknown heavy atoms and new atom names that clash with existing names are skipped unless no variants at all are generated.

with_virtual_sites(residue_name: str, virtual_sites: Iterable[str]) → Self

Copy self, adding new residue definitions requiring some virtual sites.

The new definition is added to a copy of the CcdCache, and the copy is returned. The original CcdCache is left unmodified.

A new residue definition is added for each definition currently stored in the cache under the given name. The new definition requires that all the given virtual site names be present in order for it to match, and it discards the corresponding ATOM/HETATM records.

This method works by adding a patch. It will affect any residue definition already added to the cache under the given name, or any definition downloaded in the future, but not any definition added in the future by the with_ or with_replaced methods.

with_vsite_water() → Self

Copy self, adding new definitions for common multisite water models.

The new definitions are added to a copy of the CcdCache, and the copy is returned. The original CcdCache is left unmodified.

The new definitions require that all the virtual site names be present in order for them to match, and they discard the corresponding ATOM/HETATM records. The name for the 4-point model virtual site is EPW, and for the 5-point model EP1 and EP2.

This method works by adding a patch. It will affect any 3-atom residue definitions already added to the cache under the names HOH, WAT, or SOL, or any so-named definition downloaded in the future, but not any definition added in the future by the with_ or with_replaced methods.

without(residue_names: Iterable[str]) → Self

Get a copy of this CcdCache lacking any definitions with some names.

The new cache contains no definitions for any of the given residue names. Note that residues that are in the CCD will still be returned when they are requested, as long as they can be re-downloaded or found in the on-disk cache.

See also

with_replaced, with_