Code Structure and Architecture#
This page provides a detailed overview of the aiida-atomistic code architecture, explaining the design decisions, class hierarchies, and how different components work together.
Design Philosophy#
The aiida-atomistic package follows a site-centric architecture where:
Sites are fundamental units: Each atomic site contains all its properties (position, symbol, charge, magnetic moment, etc.)
Arrays are computed fields: Properties like positions, charges, magmoms are automatically generated from individual sites
Kinds are derived: Groups of sites with identical properties (except position) are automatically detected and grouped
Dual mutability system: Separate mutable and immutable classes provide flexibility while maintaining AiiDA compatibility
This design makes it easy to add/modify individual atoms while maintaining data integrity and enabling efficient storage.
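As a rough illustration of the site-centric idea, the sketch below uses plain dataclasses rather than the package's Pydantic models. The names SketchSite and SketchStructure are hypothetical and are not part of aiida-atomistic. Array-like properties are derived from the sites on every access, so editing one site is reflected in the arrays.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchSite:
    """Hypothetical stand-in for a site: one atom carrying all of its properties."""
    symbol: str
    position: List[float]
    charge: Optional[float] = None


@dataclass
class SketchStructure:
    """Hypothetical container whose array-like properties are derived from its sites."""
    sites: List[SketchSite] = field(default_factory=list)

    @property
    def symbols(self) -> List[str]:
        return [site.symbol for site in self.sites]

    @property
    def positions(self) -> List[List[float]]:
        return [site.position for site in self.sites]

    @property
    def charges(self) -> List[Optional[float]]:
        return [site.charge for site in self.sites]


water = SketchStructure(sites=[
    SketchSite("H", [0.0, 0.0, 0.0], charge=0.4),
    SketchSite("H", [1.0, 0.0, 0.0], charge=0.4),
    SketchSite("O", [0.0, 1.0, 0.0], charge=-0.8),
])

# Changing a single site changes the derived array on the next access.
water.sites[2].charge = -1.0
print(water.charges)
```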
Class Hierarchy#
Overview#
BaseModel (Pydantic)
├── Site
│   ├── FrozenSite (immutable variant)
│   └── Kind (extends Site with position array)
│
├── StructureBaseModel
│   ├── MutableStructureModel
│   └── ImmutableStructureModel
│
└── Data (AiiDA)
    ├── StructureData (immutable, database-storable)
    │   └── properties: ImmutableStructureModel
    └── StructureBuilder (mutable, for building/modifying)
        └── properties: MutableStructureModel
Two Parallel Paths#
AiiDA Path: StructureData (immutable) → stored in database, provenance tracked
Python Path: StructureBuilder (mutable) → for building and modifying structures
You can easily convert between them:
# Immutable → Mutable
builder = structure_data.get_value()
# Mutable → Immutable
structure_data = StructureData.from_builder(builder)
Core Components#
1. Site Model (site.py)#
The Site class represents a single atomic site with all its properties.
Key Fields:
class Site(BaseModel):
    symbol: Union[str, List[str]]        # Element symbol(s)
    position: np.ndarray                 # 3D position [x, y, z]
    mass: Optional[float]                # Atomic mass
    charge: Optional[float]              # Charge
    magmom: Optional[np.ndarray]         # Magnetic moment vector [mx, my, mz]
    magnetization: Optional[float]       # Scalar magnetization
    weight: Optional[Tuple[float, ...]]  # For alloys/vacancies
    kind_name: Optional[str]             # Kind identifier
Special Features:
Alloy support: Multiple symbols with weights (e.g., ["Si", "Ge"] with weights=(0.5, 0.5))
Vacancy support: Weights summing to < 1 indicate vacancies
Automatic mass: Calculated from element symbols if not provided
Validation: Ensures that positions are 3D, that symbols are valid elements, and that similar constraints hold
Immutable Variant:
FrozenSite is an immutable version where all fields are frozen and cannot be modified after creation.
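The alloy and vacancy rules listed under Special Features can be sketched with a small standalone helper. This is an illustration only: classify_occupancy is a hypothetical function, and the tolerance and error messages are assumptions rather than the package's actual validation code.

```python
from typing import Sequence


def classify_occupancy(symbols: Sequence[str], weights: Sequence[float], tol: float = 1e-8) -> str:
    """Classify a site from its symbols and weights (hypothetical helper)."""
    if len(symbols) != len(weights):
        raise ValueError("symbols and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total > 1 + tol:
        raise ValueError("weights must not sum to more than 1")

    has_vacancy = total < 1 - tol   # occupancy below 1 leaves part of the site empty
    is_alloy = len(symbols) > 1     # more than one element shares the site

    if is_alloy and has_vacancy:
        return "alloy with vacancies"
    if is_alloy:
        return "alloy"
    if has_vacancy:
        return "partially occupied site"
    return "fully occupied site"


print(classify_occupancy(["Si", "Ge"], (0.5, 0.5)))
print(classify_occupancy(["Si"], (0.8,)))
```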
2. Structure Model (models.py)#
The StructureBaseModel represents a complete atomic structure.
Global Properties:
class StructureBaseModel(BaseModel):
    pbc: list[bool]              # Periodic boundary conditions [x, y, z]
    cell: np.ndarray             # 3×3 lattice vectors matrix
    sites: list[Site]            # List of atomic sites
    tot_magnetization: Optional[float]
    tot_charge: Optional[float]
    hubbard: Optional[Hubbard]   # Hubbard parameters
    custom: Optional[dict]       # Custom properties
Computed Fields (automatically calculated from sites):
positions → Array of all site positions
symbols → List of chemical symbols
masses, charges, magmoms, magnetizations, weights → Property arrays
kind_names → List mapping each site to its kind
kinds → List of Kind objects (grouped sites)
formula → Chemical formula
cell_volume → Unit cell volume
dimensionality → 0D/1D/2D/3D classification
is_alloy → Whether structure contains alloys
has_vacancies → Whether structure has vacancies
Two Variants:
MutableStructureModel: _mutable = True, sites can be modified
ImmutableStructureModel: _mutable = False, all properties frozen
3. The Kinds System#
What are Kinds?
A “kind” represents a group of sites that share ALL properties except position. For example, in a water molecule, both H atoms have the same properties (mass, charge, etc.) but different positions, so they belong to the same kind.
Why Kinds?
Storage efficiency: Store one kind definition + N positions instead of N identical site definitions
Computational efficiency: Perform operations on groups of identical atoms
Physical meaning: Distinguish chemically identical but structurally distinct atoms (e.g., bulk vs. surface atoms)
Automatic Detection:
The generate_kinds() method automatically detects kinds by grouping sites with identical properties:
builder = StructureBuilder(sites=[...])
builder.generate_kinds() # Automatically assigns kind_names
Kind Model:
class Kind(Site):
    positions: np.ndarray    # All positions for this kind [(x1,y1,z1), (x2,y2,z2), ...]
    site_indices: List[int]  # Indices of sites belonging to this kind
Example:
# Input: 3 sites
sites = [
    {"symbol": "H", "position": [0, 0, 0], "charge": 0.4},
    {"symbol": "H", "position": [1, 0, 0], "charge": 0.4},  # Same as site 0
    {"symbol": "O", "position": [0, 1, 0], "charge": -0.8},
]
# After kind detection → 2 kinds:
# Kind 1: symbol="H", charge=0.4, positions=[[0,0,0], [1,0,0]], site_indices=[0,1]
# Kind 2: symbol="O", charge=-0.8, positions=[[0,1,0]], site_indices=[2]
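The grouping in the example above can be reproduced with a few lines of plain Python. This is a minimal sketch that matches properties exactly (no tolerances) and is not the package's generate_kinds() implementation. The function name group_sites_into_kinds and the naming scheme (element symbol plus a running counter) are assumptions made for illustration.

```python
def group_sites_into_kinds(sites):
    """Group site dictionaries that agree on every property except position."""
    kinds = {}             # property signature -> kind dictionary
    per_symbol_count = {}  # how many kinds each symbol has produced so far
    for index, site in enumerate(sites):
        shared = {name: value for name, value in site.items() if name != "position"}
        key = tuple(sorted(shared.items()))  # hashable signature of the shared properties
        if key not in kinds:
            count = per_symbol_count.get(site["symbol"], 0) + 1
            per_symbol_count[site["symbol"]] = count
            kinds[key] = {
                **shared,
                "kind_name": f"{site['symbol']}{count}",  # assumed naming scheme
                "positions": [],
                "site_indices": [],
            }
        kinds[key]["positions"].append(site["position"])
        kinds[key]["site_indices"].append(index)
    return list(kinds.values())


example_sites = [
    {"symbol": "H", "position": [0, 0, 0], "charge": 0.4},
    {"symbol": "H", "position": [1, 0, 0], "charge": 0.4},
    {"symbol": "O", "position": [0, 1, 0], "charge": -0.8},
]
example_kinds = group_sites_into_kinds(example_sites)
for kind in example_kinds:
    print(kind["kind_name"], kind["site_indices"], kind["positions"])
```

Using the shared properties as a dictionary key keeps the grouping to a single pass over the sites. Matching within tolerances would need a pairwise comparison against existing kinds instead of exact hashing.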
4. Validation System#
The package implements multi-layer validation to ensure data integrity:
Layer 1: Pydantic Field Validators
Type checking (e.g., position must be array-like)
Shape validation (e.g., cell must be 3×3)
Value constraints (e.g., mass > 0)
Layer 2: Pydantic Model Validators
check_minimal_requirements: Validates structure completeness
check_is_alloy: Detects and configures alloy sites
Mutual exclusivity checks (e.g., can’t have both magmom and magnetization)
Layer 3: Custom Validation Methods
validate_kinds(): Ensures all sites with the same kind_name have identical properties
_check_valid_sites(): Checks that sites aren’t too close together (prevents unphysical structures)
Layer 4: Setter Validation
Length matching (e.g., charges array must match number of sites)
Type coercion
Automatic updates to dependent fields
Configurable Tolerances:
When comparing properties for kind validation, the following tolerances are used:
_DEFAULT_THRESHOLDS = {
    "charges": 0.1,
    "masses": 1e-4,
    "magmoms": 1e-4,
}
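One plausible way to apply such thresholds is an absolute-difference comparison per property, sketched below. Whether the package compares absolute differences, and how it treats vector properties such as magmoms, is an assumption here. The helper values_match is hypothetical.

```python
# Same values as the thresholds quoted above, copied so the sketch is self-contained.
SKETCH_THRESHOLDS = {
    "charges": 0.1,
    "masses": 1e-4,
    "magmoms": 1e-4,
}


def values_match(prop, a, b, thresholds=SKETCH_THRESHOLDS):
    """Return True if two property values agree within the threshold for `prop`."""
    tol = thresholds[prop]
    if isinstance(a, (list, tuple)):
        # Vector property: every component must agree within the tolerance.
        return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))
    return abs(a - b) <= tol


print(values_match("charges", 0.40, 0.45))    # within 0.1
print(values_match("masses", 1.008, 1.010))   # differs by more than 1e-4
```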
Immutability Implementation#
AiiDA requires stored data nodes to be immutable. The package implements this through multiple protection layers:
Layer 1: Pydantic Configuration#
class ImmutableStructureModel(StructureBaseModel):
    model_config = ConfigDict(frozen=True)  # Pydantic-level immutability
Layer 2: Custom __setattr__#
def __setattr__(self, key, value):
    if key in self.model_fields:
        raise ValueError("StructureData is immutable. Use get_value() for mutable copy.")
    super().__setattr__(key, value)
Layer 3: Frozen NumPy Arrays#
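@field_validator('position', 'magmom', mode='before')
@classmethod
def ensure_numpy_array(cls, v):
    if v is None:  # optional fields such as magmom may be unset
        return v
    array_v = np.array(v)  # copy, so the caller's array is not made read-only
    array_v.flags.writeable = False  # Make array read-only
    return array_v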
Layer 4: FrozenList and FrozenSite#
class FrozenList(list):
    def __setitem__(self, index, value):
        raise ValueError("This list is immutable. Use setter methods on mutable copy.")
Layer 5: Nested Freezing#
All nested structures (lists of sites, dictionaries, etc.) are recursively frozen.
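Recursive freezing can be sketched with the standard library alone. The version below is an illustration, not the package's code: SketchFrozenList and freeze are hypothetical names, the list blocks the common mutating methods in addition to item assignment, and dictionaries are wrapped in a read-only types.MappingProxyType.

```python
from types import MappingProxyType


class SketchFrozenList(list):
    """Hypothetical list subclass that rejects every in-place modification."""

    def _blocked(self, *args, **kwargs):
        raise ValueError("This list is immutable. Use setter methods on mutable copy.")

    __setitem__ = __delitem__ = __iadd__ = __imul__ = _blocked
    append = extend = insert = pop = remove = clear = sort = reverse = _blocked


def freeze(value):
    """Recursively convert dicts and lists into read-only counterparts."""
    if isinstance(value, dict):
        return MappingProxyType({key: freeze(item) for key, item in value.items()})
    if isinstance(value, (list, tuple)):
        return SketchFrozenList(freeze(item) for item in value)
    return value


frozen = freeze({
    "pbc": [True, True, False],
    "cell": [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]],
})

try:
    frozen["cell"][0][0] = 1.0  # nested list is frozen as well
except ValueError as exc:
    print("blocked:", exc)
```

Reading stays unchanged (frozen["pbc"][2] still returns False). Only writes are rejected, at every nesting level.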
Working with Immutable Structures#
# Get immutable structure from database
structure = load_node(pk)
# Cannot modify directly
structure.properties.pbc[0] = False # ❌ Raises ValueError
# Get mutable copy for modifications
builder = structure.get_value()
builder.set_pbc([True, True, False]) # ✅ Works
builder.sites[0].charge = -1.0 # ✅ Works
# Convert back to immutable for storage
new_structure = StructureData.from_builder(builder)
new_structure.store()
Storage Architecture#
Database vs Repository Storage#
Currently, all data is stored in the AiiDA database attributes for simplicity and queryability:
# Without kinds (site-based storage)
{
    "pbc": [true, true, true],
    "cell": [[3.0, 0, 0], [0, 3.0, 0], [0, 0, 3.0]],
    "positions": [[0,0,0], [1,0,0], [0,1,0]],
    "symbols": ["H", "H", "O"],
    "charges": [0.4, 0.4, -0.8],
    ...
}
# With kinds (compressed storage)
{
    "pbc": [true, true, true],
    "cell": [[3.0, 0, 0], [0, 3.0, 0], [0, 0, 3.0]],
    "kind_names": ["H1", "H1", "O1"],
    "site_indices": [[0, 1], [2]],
    "positions": [[[0,0,0], [1,0,0]], [[0,1,0]]],  # Grouped by kind
    "symbols": ["H", "O"],                         # One per kind
    "charges": [0.4, -0.8],                        # One per kind
    ...
}
Compression Benefits:
For a structure with many identical atoms (e.g., 1000 water molecules = 3000 atoms but only 2 kinds), kinds-based storage significantly reduces database size.
Loading Process#
When loading from the database, the data is automatically decompressed:
Check if kind_names exists in attributes
If yes: decompress kinds → sites using rebuild_site_lists_from_kind_lists()
Build Site objects from expanded properties
Create ImmutableStructureModel with reconstructed sites
Key Insight: Storage is optimized (kinds-based), but the in-memory representation is always site-based for ease of use.
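The decompression step can be sketched as follows, using the kinds-based layout shown earlier. This is not the package's rebuild_site_lists_from_kind_lists(). The function expand_kinds_to_sites is a hypothetical stand-in that handles only the symbols and charges properties.

```python
def expand_kinds_to_sites(stored):
    """Rebuild one dictionary per site from kinds-based (compressed) attributes."""
    sites = [None] * len(stored["kind_names"])
    per_kind = zip(
        stored["site_indices"],
        stored["positions"],
        stored["symbols"],
        stored["charges"],
    )
    for indices, positions, symbol, charge in per_kind:
        # Every site of this kind gets the shared properties plus its own position.
        for site_index, position in zip(indices, positions):
            sites[site_index] = {
                "symbol": symbol,
                "position": position,
                "charge": charge,
                "kind_name": stored["kind_names"][site_index],
            }
    return sites


stored_attributes = {
    "kind_names": ["H1", "H1", "O1"],
    "site_indices": [[0, 1], [2]],
    "positions": [[[0, 0, 0], [1, 0, 0]], [[0, 1, 0]]],
    "symbols": ["H", "O"],
    "charges": [0.4, -0.8],
}
expanded_sites = expand_kinds_to_sites(stored_attributes)
for site in expanded_sites:
    print(site)
```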
Getter and Setter Mixins#
GetterMixin (getter_mixin.py)#
GetterMixin is available for both StructureData and StructureBuilder and provides:
Convenience Properties:
structure.cell # Access cell directly
structure.pbc # Access PBC
structure.sites # Access sites list
structure.kinds # Access kinds (if detected)
structure.formula # Get chemical formula
Conversion Methods:
atoms = structure.to_ase() # → ASE Atoms
pmg_struct = structure.to_pymatgen() # → pymatgen Structure
structure.to_file('output.cif') # Write to file
Factory Methods:
structure = StructureData.from_ase(atoms)
structure = StructureData.from_pymatgen(pmg_struct)
structure = StructureData.from_file('input.cif')
Query Methods:
structure.get_supported_properties() # All available properties
structure.get_defined_properties() # Properties actually set
Validation:
structure.validate_kinds() # Check kinds consistency
Serialization:
data_dict = structure.to_dict()
dump = structure.properties.model_dump()
SetterMixin (setter_mixin.py)#
SetterMixin is available only for StructureBuilder (mutable structures) and provides modification methods:
Setting Properties:
builder.set_cell([[3, 0, 0], [0, 3, 0], [0, 0, 3]])
builder.set_pbc([True, True, False])
builder.set_charges([0.4, 0.4, -0.8])
builder.set_magmoms([[0, 0, 1], [0, 0, 1], [0, 0, 0]])
Adding/Removing Atoms:
builder.append_atom(symbol='H', position=[0, 0, 0], charge=0.4)
builder.remove_sites([0, 2]) # Remove sites by index
Removing Properties:
builder.remove_charges() # Remove all charges
builder.remove_magmoms() # Remove all magnetic moments
Updating Sites:
builder.update_sites(site_indices=[0, 1], charge=0.5) # Update specific sites
Kind Generation:
builder.generate_kinds(tolerance={'charges': 0.1}) # Auto-detect kinds
Performance Considerations#
Computational Complexity#
Kind detection: O(N) where N = number of sites
Site validation: O(N²) for pairwise distance checking (vectorized)
Computed fields: Cached after first access (O(1) subsequent access)
Kinds compression/decompression: O(N × K) where K = number of kinds
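To make the O(N²) cost of the distance check concrete, here is a plain-Python sketch that visits every pair of sites once. It is not the package's _check_valid_sites(): the package is described as vectorized, whereas this version loops explicitly, ignores periodic images, and uses a hypothetical function name and threshold.

```python
import math
from itertools import combinations


def find_close_pairs(positions, min_distance=0.1):
    """Return index pairs (i, j) whose Cartesian distance is below min_distance."""
    close = []
    # combinations() yields N * (N - 1) / 2 pairs, hence the quadratic scaling.
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        if math.dist(p, q) < min_distance:
            close.append((i, j))
    return close


sample_positions = [[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(find_close_pairs(sample_positions))
```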
Optimization Features#
Automatic Caching:
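Computed fields are cached after first access. Note that Pydantic's @computed_field does not cache by itself; the caching comes from pairing it with a cached property such as functools.cached_property: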
positions = structure.properties.positions # Computed once
positions2 = structure.properties.positions # Retrieved from cache
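Compute-once behaviour of this kind can be demonstrated with the standard library's functools.cached_property. The class SketchModel below is hypothetical, and whether the package relies on this exact mechanism is an assumption. The sketch also shows why a cache on a mutable object has to be invalidated when the underlying sites change.

```python
from functools import cached_property


class SketchModel:
    """Hypothetical model with one cached derived property."""

    def __init__(self, sites):
        self.sites = sites
        self.compute_count = 0  # counts how often the property body runs

    @cached_property
    def positions(self):
        self.compute_count += 1
        return [site["position"] for site in self.sites]


model = SketchModel([{"position": [0, 0, 0]}, {"position": [1, 0, 0]}])
first = model.positions    # computed
second = model.positions   # served from the cache, same object

model.sites.append({"position": [0, 1, 0]})
stale = model.positions    # still the cached two-entry list

del model.positions        # drop the cached value
fresh = model.positions    # recomputed with three entries
print(len(stale), len(fresh), model.compute_count)
```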
Efficient Copying:
The efficient_copy() utility selectively copies only mutable parts, avoiding deep copy overhead.
Vectorized Operations: NumPy operations are used throughout for array manipulations (e.g., distance calculations).
Best Practices for Large Structures#
For structures with >10,000 atoms:
Use kinds compression (automatically enabled when kinds are detected)
Consider disabling _check_valid_sites validation if sites are trusted
Use batch operations instead of modifying sites one-by-one
String Representations#
The package provides informative __repr__ methods for debugging and inspection:
Site Representation#
site = Site(symbol='Fe', position=[0, 0, 0], charge=1.0, magnetization=2.5)
print(site)
# Output: Site(Fe @ [0.000, 0.000, 0.000], charge=1.00, mag=2.50)
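A representation in this format could be produced along the following lines. This is only a sketch that reproduces the output shown above. The helper format_site is hypothetical, and the rule of omitting unset properties is an assumption.

```python
def format_site(symbol, position, charge=None, magnetization=None, kind_name=None):
    """Build a compact one-line description of a site, skipping unset properties."""
    coords = ", ".join(f"{x:.3f}" for x in position)
    parts = [f"{symbol} @ [{coords}]"]
    if charge is not None:
        parts.append(f"charge={charge:.2f}")
    if magnetization is not None:
        parts.append(f"mag={magnetization:.2f}")
    if kind_name is not None:
        parts.append(f"kind={kind_name}")
    return "Site(" + ", ".join(parts) + ")"


print(format_site("Fe", [0, 0, 0], charge=1.0, magnetization=2.5))
```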
Structure Representation#
structure = StructureData(...)
print(structure.properties)
# Output:
# | formula: H2O, sites: 3, dimensionality: 3D, V=27.00 A^3 | Sites: [
# Site(H @ [0.000, 0.000, 0.000], kind=H1)
# Site(H @ [1.000, 0.000, 0.000], kind=H1)
# Site(O @ [0.500, 0.866, 0.000], kind=O1)
# ]
Summary#
The aiida-atomistic architecture provides:
✅ Flexible: Site-centric design makes modifications intuitive
✅ Safe: Multiple immutability layers protect stored data
✅ Efficient: Kinds compression and computed field caching optimize performance
✅ Compatible: Seamless integration with ASE, pymatgen, and AiiDA
✅ Extensible: Clear patterns for adding new properties
✅ Validated: Multi-layer validation ensures data integrity
This design balances ease of use with the strict requirements of scientific data management and provenance tracking in AiiDA.