.. index:: fitpod

fitpod command
==============

Syntax
""""""

.. code-block:: LAMMPS

   fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod

* fitpod = style name of this command
* Ta_param.pod = an input file that describes proper orthogonal descriptors (PODs)
* Ta_data.pod = an input file that specifies DFT data used to fit a POD potential
* Ta_coefficients.pod (optional) = an input file that specifies trainable coefficients of a POD potential

Examples
""""""""

.. code-block:: LAMMPS

   fitpod Ta_param.pod Ta_data.pod
   fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod

Description
"""""""""""
.. versionadded:: 22Dec2022

Fit a machine-learning interatomic potential (ML-IAP) based on proper
orthogonal descriptors (POD); please see :ref:`(Nguyen and Rohskopf)
<Nguyen20222a>`, :ref:`(Nguyen2023) <Nguyen20232a>`, :ref:`(Nguyen2024)
<Nguyen20242a>`, and :ref:`(Nguyen and Sema) <Nguyen20243a>` for details.
The fitted POD potential can be used to run MD simulations via
:doc:`pair_style pod <pair_pod>`.

Two input files are required for this command. The first input file
describes the parameter settings of a POD potential, while the second
input file specifies the DFT data used for the fitting procedure. All
keywords except *species* have default values. If a keyword is not set
in the input file, its default value is used. The table below has
one-line descriptions of all the keywords that can be used in the first
input file (i.e. ``Ta_param.pod`` in the example above):

.. list-table::
   :header-rows: 1
   :widths: 40 9 10 41

   * - Keyword
     - Default
     - Type
     - Description
   * - species
     - (none)
     - STRING
     - chemical symbols of all elements in the system; they have to match the XYZ training files
   * - pbc
     - 1 1 1
     - INT
     - three integer constants that specify the boundary conditions
   * - rin
     - 0.5
     - REAL
     - a real number that specifies the inner cut-off radius
   * - rcut
     - 5.0
     - REAL
     - a real number that specifies the outer cut-off radius
   * - bessel_polynomial_degree
     - 4
     - INT
     - the maximum degree of Bessel polynomials
   * - inverse_polynomial_degree
     - 8
     - INT
     - the maximum degree of inverse radial basis functions
   * - number_of_environment_clusters
     - 1
     - INT
     - the number of clusters for environment-adaptive potentials
   * - number_of_principal_components
     - 2
     - INT
     - the number of principal components for dimensionality reduction
   * - onebody
     - 1
     - BOOL
     - turns on/off one-body potential
   * - twobody_number_radial_basis_functions
     - 8
     - INT
     - number of radial basis functions for two-body potential
   * - threebody_number_radial_basis_functions
     - 6
     - INT
     - number of radial basis functions for three-body potential
   * - threebody_angular_degree
     - 5
     - INT
     - angular degree for three-body potential
   * - fourbody_number_radial_basis_functions
     - 4
     - INT
     - number of radial basis functions for four-body potential
   * - fourbody_angular_degree
     - 3
     - INT
     - angular degree for four-body potential
   * - fivebody_number_radial_basis_functions
     - 0
     - INT
     - number of radial basis functions for five-body potential
   * - fivebody_angular_degree
     - 0
     - INT
     - angular degree for five-body potential
   * - sixbody_number_radial_basis_functions
     - 0
     - INT
     - number of radial basis functions for six-body potential
   * - sixbody_angular_degree
     - 0
     - INT
     - angular degree for six-body potential
   * - sevenbody_number_radial_basis_functions
     - 0
     - INT
     - number of radial basis functions for seven-body potential
   * - sevenbody_angular_degree
     - 0
     - INT
     - angular degree for seven-body potential

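As an illustration, a parameter file for a single-element tantalum
system might look like the sketch below. The numerical values are
illustrative choices and not recommendations, every keyword that is
omitted keeps its default, and the ``#`` comment lines assume the
comment convention used by the example files distributed with LAMMPS.

.. code-block:: LAMMPS

   # chemical element symbols (must match the XYZ training files)
   species Ta

   # periodic boundary conditions in x, y, z
   pbc 1 1 1

   # inner and outer cut-off radii (illustrative values)
   rin 1.0
   rcut 5.0

   # polynomial degrees for the radial basis functions
   bessel_polynomial_degree 3
   inverse_polynomial_degree 6

   # one-, two-, three-, and four-body terms (illustrative values)
   onebody 1
   twobody_number_radial_basis_functions 6
   threebody_number_radial_basis_functions 5
   threebody_angular_degree 4
   fourbody_number_radial_basis_functions 4
   fourbody_angular_degree 2
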
Note that neither the number of radial basis functions nor the angular
degree may increase as the body order increases. The next table
describes all keywords that can be used in the second input file
(i.e. ``Ta_data.pod`` in the example above):

.. list-table::
   :header-rows: 1
   :widths: 38 9 10 43

   * - Keyword
     - Default
     - Type
     - Description
   * - file_format
     - extxyz
     - STRING
     - only the extended xyz format (extxyz) is currently supported
   * - file_extension
     - xyz
     - STRING
     - extension of the data files
   * - path_to_training_data_set
     - (none)
     - STRING
     - specifies the path to training data files in double quotes
   * - path_to_test_data_set
     - ""
     - STRING
     - specifies the path to test data files in double quotes
   * - path_to_environment_configuration_set
     - ""
     - STRING
     - specifies the path to environment configuration files in double quotes
   * - fraction_training_data_set
     - 1.0
     - REAL
     - a real number (<= 1.0) that specifies the fraction of the training set used to fit POD
   * - randomize_training_data_set
     - 0
     - BOOL
     - turns on/off randomization of the training set
   * - fraction_test_data_set
     - 1.0
     - REAL
     - a real number (<= 1.0) that specifies the fraction of the test set used to validate POD
   * - randomize_test_data_set
     - 0
     - BOOL
     - turns on/off randomization of the test set
   * - fitting_weight_energy
     - 100.0
     - REAL
     - a real constant that specifies the weight for energy in the least-squares fit
   * - fitting_weight_force
     - 1.0
     - REAL
     - a real constant that specifies the weight for force in the least-squares fit
   * - fitting_regularization_parameter
     - 1.0e-10
     - REAL
     - a real constant that specifies the regularization parameter in the least-squares fit
   * - error_analysis_for_training_data_set
     - 0
     - BOOL
     - turns on/off error analysis for the training data set
   * - error_analysis_for_test_data_set
     - 0
     - BOOL
     - turns on/off error analysis for the test data set
   * - basename_for_output_files
     - pod
     - STRING
     - a basename string added to the output files
   * - precision_for_pod_coefficients
     - 8
     - INT
     - number of digits after the decimal point for numbers in the coefficient file
   * - group_weights
     - global
     - STRING
     - ``table`` uses group weights defined for each group named by filename

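A data file built from these keywords could look like the following
sketch. The directory name ``XYZ`` and the basename ``Ta`` are
hypothetical placeholders, and the ``#`` comment lines again assume the
comment convention used by the example files distributed with LAMMPS.

.. code-block:: LAMMPS

   file_format extxyz
   file_extension xyz

   # directories holding the training and test configurations (placeholders)
   path_to_training_data_set "XYZ"
   path_to_test_data_set "XYZ"

   # weights and regularization of the least-squares fit (default values)
   fitting_weight_energy 100.0
   fitting_weight_force 1.0
   fitting_regularization_parameter 1.0e-10

   # request the detailed error analysis for the training set only
   error_analysis_for_training_data_set 1
   error_analysis_for_test_data_set 0

   # output files are then named Ta_*.pod
   basename_for_output_files Ta
   precision_for_pod_coefficients 5
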
All keywords except *path_to_training_data_set* have default values. If
a keyword is not set in the input file, its default value is used. After
successful training, a number of output files are produced, if enabled:

* ``<basename>_training_errors.pod`` reports the errors in energy and forces for the training data set
* ``<basename>_training_analysis.pod`` reports detailed errors for all training configurations
* ``<basename>_test_errors.pod`` reports errors for the test data set
* ``<basename>_test_analysis.pod`` reports detailed errors for all test configurations
* ``<basename>_coefficients.pod`` contains the coefficients of the POD potential

After training the POD potential, ``Ta_param.pod`` and
``<basename>_coefficients.pod`` are the two files needed to use the POD
potential in LAMMPS. See :doc:`pair_style pod <pair_pod>` for using the
POD potential. Examples of training and using POD potentials are found
in the directory ``lammps/examples/PACKAGES/pod`` and in the GitHub repo
https://github.com/cesmix-mit/pod-examples.

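For orientation, the two files would typically be passed to the pair
style as sketched below. This assumes a single-element Ta system and
that *basename_for_output_files* was set to ``Ta``, so that the
coefficient file is named ``Ta_coefficients.pod``. The
:doc:`pair_style pod <pair_pod>` page remains the authoritative
reference for the syntax.

.. code-block:: LAMMPS

   # parameter file, fitted coefficient file, then one element per atom type
   pair_style pod
   pair_coeff * * Ta_param.pod Ta_coefficients.pod Ta
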
Loss Function Group Weights
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The *group_weights* keyword in the second input file (e.g.
``Ta_data.pod``) assigns weights to individual groups of configurations
in the loss function. For example:

.. code-block:: LAMMPS

   group_weights table
   Displaced_A15 100.0 1.0
   Displaced_BCC 100.0 1.0
   Displaced_FCC 100.0 1.0
   Elastic_BCC 100.0 1.0
   Elastic_FCC 100.0 1.0
   GSF_110 100.0 1.0
   GSF_112 100.0 1.0
   Liquid 100.0 1.0
   Surface 100.0 1.0
   Volume_A15 100.0 1.0
   Volume_BCC 100.0 1.0
   Volume_FCC 100.0 1.0

This applies an energy weight of ``100.0`` and a force weight of
``1.0`` to all groups in the ``Ta`` example. The groups are named by
their respective filename. If a group is left out of this table, the
globally defined weights from the ``fitting_weight_energy`` and
``fitting_weight_force`` keywords are used for it.

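As a further sketch with purely hypothetical weights, a table that
lists only some of the groups might look as follows. Here ``Liquid``
and ``Surface`` would receive their own energy and force weights, while
every group not listed would fall back to the global values given by
``fitting_weight_energy`` and ``fitting_weight_force``.

.. code-block:: LAMMPS

   fitting_weight_energy 100.0
   fitting_weight_force 1.0
   group_weights table
   Liquid 50.0 2.0
   Surface 200.0 1.0
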
POD Potential
"""""""""""""

We consider a multi-element system of *N* atoms with :math:`N_{\rm e}`
unique elements. We denote by :math:`\boldsymbol r_n` and :math:`Z_n`
the position vector and type of atom *n* in the system,
respectively. Note that we have :math:`Z_n \in \{1, \ldots, N_{\rm e}
\}`, :math:`\boldsymbol R = (\boldsymbol r_1, \boldsymbol r_2, \ldots,
\boldsymbol r_N) \in \mathbb{R}^{3N}`, and :math:`\boldsymbol Z = (Z_1,
Z_2, \ldots, Z_N) \in \mathbb{N}^{N}`. The total energy of the
POD potential is expressed as :math:`E(\boldsymbol R, \boldsymbol Z) =
\sum_{i=1}^N E_i(\boldsymbol R_i, \boldsymbol Z_i)`, where

.. math::

   E_i(\boldsymbol R_i, \boldsymbol Z_i) \ = \ \sum_{m=1}^M c_m \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)

Here :math:`c_m` are trainable coefficients and
:math:`\mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)` are per-atom
POD descriptors. Summing the per-atom descriptors over :math:`i` yields
the global descriptors :math:`d_m(\boldsymbol R, \boldsymbol Z) =
\sum_{i=1}^N \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)`. It
thus follows that :math:`E(\boldsymbol R, \boldsymbol Z) = \sum_{m=1}^M
c_m d_m(\boldsymbol R, \boldsymbol Z)`.

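Because the energy is linear in the coefficients, so are the forces. As
a short worked step, using only the standard definition of the force as
the negative gradient of the energy, the force on atom *n* can be
written as

.. math::

   \boldsymbol F_n(\boldsymbol R, \boldsymbol Z) \ = \ -\frac{\partial E(\boldsymbol R, \boldsymbol Z)}{\partial \boldsymbol r_n} \ = \ -\sum_{m=1}^M c_m \frac{\partial d_m(\boldsymbol R, \boldsymbol Z)}{\partial \boldsymbol r_n}

so the derivatives of the global descriptors with respect to the atom
positions are the only additional quantities needed to evaluate forces.
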
The per-atom POD descriptors include one, two, three, four, five, six,
and seven-body descriptors, which can be specified in the first input
file. Furthermore, the per-atom POD descriptors also depend on the
number of environment clusters specified in the first input file.
Please see :ref:`(Nguyen2024) <Nguyen20242a>` and :ref:`(Nguyen and Sema)
<Nguyen20243a>` for the detailed description of the per-atom POD
descriptors.

Training
""""""""

A POD potential is trained by least-squares regression against
density functional theory (DFT) data. Let :math:`J` be the number of
training configurations, with :math:`N_j` being the number of atoms in
the :math:`j`-th configuration. The training configurations are
extracted from the extended XYZ files located in a directory (set by
the *path_to_training_data_set* keyword in the second input file). Let
:math:`\{E^{\star}_j\}_{j=1}^{J}` and :math:`\{\boldsymbol
F^{\star}_j\}_{j=1}^{J}` be the DFT energies and forces for the :math:`J`
configurations. Next, we calculate the global descriptors and their
derivatives for all training configurations. Let :math:`d_{jm}, 1 \le m
\le M`, be the global descriptors associated with the :math:`j`-th
configuration, where :math:`M` is the number of global descriptors. We
then form a matrix :math:`\boldsymbol A \in \mathbb{R}^{J \times M}`
with entries :math:`A_{jm} = d_{jm}/ N_j` for :math:`j=1,\ldots,J` and
:math:`m=1,\ldots,M`. Moreover, we form a matrix :math:`\boldsymbol B
\in \mathbb{R}^{\mathcal{N} \times M}` by stacking the derivatives of
the global descriptors for all training configurations from top to
bottom, where :math:`\mathcal{N} = 3\sum_{j=1}^{J} N_j`.

The coefficient vector :math:`\boldsymbol c` of the POD potential is
found by solving the following least-squares problem

.. math::

   {\min}_{\boldsymbol c \in \mathbb{R}^{M}} \ w_E \|\boldsymbol A \boldsymbol c - \bar{\boldsymbol E}^{\star} \|^2 + w_F \|\boldsymbol B \boldsymbol c + \boldsymbol F^{\star} \|^2 + w_R \|\boldsymbol c \|^2,

where :math:`w_E` and :math:`w_F` are weights for the energy
(*fitting_weight_energy*) and force (*fitting_weight_force*),
respectively; and :math:`w_R` is the regularization parameter
(*fitting_regularization_parameter*). Here :math:`\bar{\boldsymbol
E}^{\star} \in \mathbb{R}^{J}` is a vector with entries
:math:`\bar{E}^{\star}_j = E^{\star}_j/N_j` and :math:`\boldsymbol
F^{\star}` is a vector of :math:`\mathcal{N}` entries obtained by
stacking :math:`\{\boldsymbol F^{\star}_j\}_{j=1}^{J}` from top to
bottom.

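As a sketch of how such a problem can be solved (this is the standard
closed form for a regularized linear least-squares problem and not
necessarily the algorithm implemented in LAMMPS), setting the gradient
of the objective with respect to :math:`\boldsymbol c` to zero yields
the normal equations

.. math::

   \left( w_E \boldsymbol A^T \boldsymbol A + w_F \boldsymbol B^T \boldsymbol B + w_R \boldsymbol I \right) \boldsymbol c \ = \ w_E \boldsymbol A^T \bar{\boldsymbol E}^{\star} - w_F \boldsymbol B^T \boldsymbol F^{\star},

where :math:`\boldsymbol I` is the :math:`M \times M` identity matrix.
The minus sign on the force term arises because :math:`\boldsymbol B
\boldsymbol c` is the energy gradient, i.e. the negative of the
predicted forces. For :math:`w_R > 0` the matrix on the left-hand side
is symmetric positive definite, so the solution :math:`\boldsymbol c`
is unique.
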
Validation
""""""""""

A POD potential can be validated on a test data set located in the
directory specified by the *path_to_test_data_set* keyword in the
second input file. It is also possible to validate the POD potential
after the training is complete. This is done by providing the
coefficient file as an input to :doc:`fitpod <fitpod_command>`, for
example,

.. code-block:: LAMMPS

   fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod

Restrictions
""""""""""""

This command is part of the ML-POD package. It is only enabled if
LAMMPS was built with that package. See the :doc:`Build package
<Build_package>` page for more info.

Related commands
""""""""""""""""

:doc:`pair_style pod <pair_pod>`,
:doc:`compute pod/atom <compute_pod_atom>`,
:doc:`compute podd/atom <compute_pod_atom>`,
:doc:`compute pod/local <compute_pod_atom>`,
:doc:`compute pod/global <compute_pod_atom>`

Default
"""""""

The keyword defaults are also given in the description of the input files.

----------

.. _Nguyen20222a:

**(Nguyen and Rohskopf)** Nguyen and Rohskopf, Journal of Computational Physics, 480, 112030, (2023).

.. _Nguyen20232a:

**(Nguyen2023)** Nguyen, Physical Review B, 107(14), 144103, (2023).

.. _Nguyen20242a:

**(Nguyen2024)** Nguyen, Journal of Computational Physics, 113102, (2024).

.. _Nguyen20243a:

**(Nguyen and Sema)** Nguyen and Sema, https://arxiv.org/abs/2405.00306, (2024).