various documentation fixups, dedup references, wrap paragraphs, adjust underlines, add missing index

Axel Kohlmeyer
2024-06-26 07:26:03 -04:00
parent 8173142950
commit 44b66cb56b
4 changed files with 131 additions and 102 deletions

@ -10,10 +10,10 @@ compute podd/atom command
========================= =========================
compute pod/local command compute pod/local command
======================= =========================
compute pod/global command compute pod/global command
======================= ==========================
Syntax Syntax
"""""" """"""
@ -50,41 +50,50 @@ Description
Define a computation that calculates a set of quantities related to the Define a computation that calculates a set of quantities related to the
POD descriptors of the atoms in a group. These computes are used POD descriptors of the atoms in a group. These computes are used
primarily for calculating the dependence of energy and force components primarily for calculating the dependence of energy and force components
on the linear coefficients in the :doc:`pod pair_style on the linear coefficients in the :doc:`pod pair_style <pair_pod>`,
<pair_pod>`, which is useful when training a POD potential to match which is useful when training a POD potential to match target data. POD
target data. POD descriptors of an atom are characterized by the descriptors of an atom are characterized by the radial and angular
radial and angular distribution of neighbor atoms. The detailed distribution of neighbor atoms. The detailed mathematical definition is
mathematical definition is given in the papers by :ref:`(Nguyen and Rohskopf) <Nguyen20222>`, given in the papers by :ref:`(Nguyen and Rohskopf) <Nguyen20222c>`,
:ref:`(Nguyen2023) <Nguyen20232>`, :ref:`(Nguyen2024) <Nguyen20242>`, and :ref:`(Nguyen and Sema) <Nguyen20243>`. :ref:`(Nguyen2023) <Nguyen20232c>`, :ref:`(Nguyen2024) <Nguyen20242c>`,
and :ref:`(Nguyen and Sema) <Nguyen20243c>`.
Compute *pod/atom* calculates the per-atom POD descriptors. Compute *pod/atom* calculates the per-atom POD descriptors.
Compute *podd/atom* calculates derivatives of the per-atom POD descriptors with respect to atom positions. Compute *podd/atom* calculates derivatives of the per-atom POD
descriptors with respect to atom positions.
Compute *pod/local* calculates the per-atom POD descriptors and their derivatives with respect to atom positions. Compute *pod/local* calculates the per-atom POD descriptors and their
derivatives with respect to atom positions.
Compute *pod/global* calculates the global POD descriptors and their derivatives with respect to atom positions. Compute *pod/global* calculates the global POD descriptors and their
derivatives with respect to atom positions.
Examples how to use Compute POD commands are found in the directory lammps/examples/PACKAGES/pod. Examples how to use Compute POD commands are found in the directory
``examples/PACKAGES/pod``.
---------- ----------
Output info Output info
""""""""""" """""""""""
Compute *pod/atom* produces an 2D array of size :math:`N \times M`, where :math:`N` is the number of atoms Compute *pod/atom* produces an 2D array of size :math:`N \times M`,
and :math:`M` is the number of descriptors. Each column corresponds to a particular POD descriptor. where :math:`N` is the number of atoms and :math:`M` is the number of
descriptors. Each column corresponds to a particular POD descriptor.
Compute *podd/atom* produces an 2D array of size :math:`N \times (M * 3 N)`. Each column Compute *podd/atom* produces an 2D array of size :math:`N \times (M * 3
corresponds to a particular derivative of a POD descriptor. N)`. Each column corresponds to a particular derivative of a POD
descriptor.
Compute *pod/local* produces an 2D array of size :math:`(1 + 3N) \times (M * N)`. Compute *pod/local* produces an 2D array of size :math:`(1 + 3N) \times
The first row contains the per-atom descriptors, and the last 3N rows contain the derivatives (M * N)`. The first row contains the per-atom descriptors, and the last
of the per-atom descriptors with respect to atom positions. 3N rows contain the derivatives of the per-atom descriptors with respect
to atom positions.
Compute *pod/global* produces an 2D array of size :math:`(1 + 3N) \times (M)`. Compute *pod/global* produces an 2D array of size :math:`(1 + 3N) \times
The first row contains the global descriptors, and the last 3N rows contain the derivatives (M)`. The first row contains the global descriptors, and the last 3N
of the global descriptors with respect to atom positions. rows contain the derivatives of the global descriptors with respect to
atom positions.
Restrictions Restrictions
"""""""""""" """"""""""""
@ -107,19 +116,19 @@ none
---------- ----------
.. _Nguyen20222: .. _Nguyen20222c:
**(Nguyen and Rohskopf)** Nguyen and Rohskopf, Journal of Computational Physics, 480, 112030, (2023). **(Nguyen and Rohskopf)** Nguyen and Rohskopf, Journal of Computational Physics, 480, 112030, (2023).
.. _Nguyen20232: .. _Nguyen20232c:
**(Nguyen2023)** Nguyen, Physical Review B, 107(14), 144103, (2023). **(Nguyen2023)** Nguyen, Physical Review B, 107(14), 144103, (2023).
.. _Nguyen20242: .. _Nguyen20242c:
**(Nguyen2024)** Nguyen, Journal of Computational Physics, 113102, (2024). **(Nguyen2024)** Nguyen, Journal of Computational Physics, 113102, (2024).
.. _Nguyen20243: .. _Nguyen20243c:
**(Nguyen and Sema)** Nguyen and Sema, https://arxiv.org/abs/2405.00306, (2024). **(Nguyen and Sema)** Nguyen and Sema, https://arxiv.org/abs/2405.00306, (2024).

@ -1,7 +1,7 @@
.. index:: fitpod .. index:: fitpod
fitpod command fitpod command
====================== ==============
Syntax Syntax
"""""" """"""
@ -28,15 +28,19 @@ Description
.. versionadded:: 22Dec2022 .. versionadded:: 22Dec2022
Fit a machine-learning interatomic potential (ML-IAP) based on proper Fit a machine-learning interatomic potential (ML-IAP) based on proper
orthogonal descriptors (POD); please see :ref:`(Nguyen and Rohskopf) <Nguyen20222>`, orthogonal descriptors (POD); please see :ref:`(Nguyen and Rohskopf)
:ref:`(Nguyen2023) <Nguyen20232>`, :ref:`(Nguyen2024) <Nguyen20242>`, and :ref:`(Nguyen and Sema) <Nguyen20243>` for details. <Nguyen20222a>`, :ref:`(Nguyen2023) <Nguyen20232a>`, :ref:`(Nguyen2024)
The fitted POD potential can be used to run MD simulations via :doc:`pair_style pod <pair_pod>`. <Nguyen20242a>`, and :ref:`(Nguyen and Sema) <Nguyen20243a>` for details.
The fitted POD potential can be used to run MD simulations via
:doc:`pair_style pod <pair_pod>`.
Two input files are required for this command. The first input file describes a POD potential parameter Two input files are required for this command. The first input file
settings, while the second input file specifies the DFT data used for describes a POD potential parameter settings, while the second input
the fitting procedure. All keywords except *species* have default values. If a keyword is not file specifies the DFT data used for the fitting procedure. All keywords
set in the input file, its default value is used. The table below has one-line descriptions of all the keywords that can except *species* have default values. If a keyword is not set in the
be used in the first input file (i.e. ``Ta_param.pod``) input file, its default value is used. The table below has one-line
descriptions of all the keywords that can be used in the first input
file (i.e. ``Ta_param.pod``)
.. list-table:: .. list-table::
:header-rows: 1 :header-rows: 1
@ -127,8 +131,10 @@ be used in the first input file (i.e. ``Ta_param.pod``)
- INT - INT
- angular degree for seven-body potential - angular degree for seven-body potential
Note that both the number of radial basis functions and angular degree must decrease as the body order increases. The next table describes all keywords that can be used in the second input file Note that both the number of radial basis functions and angular degree
(i.e. ``Ta_data.pod`` in the example above): must decrease as the body order increases. The next table describes all
keywords that can be used in the second input file (i.e. ``Ta_data.pod``
in the example above):
.. list-table:: .. list-table::
@ -218,17 +224,19 @@ successful training, a number of output files are produced, if enabled:
* ``<basename>_test_analysis.pod`` reports detailed errors for all test configurations * ``<basename>_test_analysis.pod`` reports detailed errors for all test configurations
* ``<basename>_coefficients.pod`` contains the coefficients of the POD potential * ``<basename>_coefficients.pod`` contains the coefficients of the POD potential
After training the POD potential, ``Ta_param.pod`` and ``<basename>_coefficients.pod`` After training the POD potential, ``Ta_param.pod`` and
are the two files needed to use the POD potential in LAMMPS. ``<basename>_coefficients.pod`` are the two files needed to use the POD
See :doc:`pair_style pod <pair_pod>` for using the POD potential. Examples potential in LAMMPS. See :doc:`pair_style pod <pair_pod>` for using the
about training and using POD potentials are found in the directory POD potential. Examples about training and using POD potentials are
lammps/examples/PACKAGES/pod and the Github repo https://github.com/cesmix-mit/pod-examples. found in the directory lammps/examples/PACKAGES/pod and the Github repo
https://github.com/cesmix-mit/pod-examples.
Loss Function Group Weights Loss Function Group Weights
^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``group_weights`` keyword in the ``data.pod`` file is responsible for weighting certain groups The *group_weights* keyword in the ``data.pod`` file is responsible for
of configurations in the loss function. For example: weighting certain groups of configurations in the loss function. For
example:
.. code-block:: LAMMPS .. code-block:: LAMMPS
@ -246,9 +254,10 @@ of configurations in the loss function. For example:
Volume_BCC 100.0 1.0 Volume_BCC 100.0 1.0
Volume_FCC 100.0 1.0 Volume_FCC 100.0 1.0
This will apply an energy weight of ``100.0`` and a force weight of ``1.0`` for all groups in the This will apply an energy weight of ``100.0`` and a force weight of
``Ta`` example. The groups are named by their respective filename. If certain groups are left out of ``1.0`` for all groups in the ``Ta`` example. The groups are named by
this table, then the globally defined weights from the ``fitting_weight_energy`` and their respective filename. If certain groups are left out of this table,
then the globally defined weights from the ``fitting_weight_energy`` and
``fitting_weight_force`` keywords will be used. ``fitting_weight_force`` keywords will be used.
POD Potential POD Potential
@ -269,38 +278,43 @@ POD potential is expressed as :math:`E(\boldsymbol R, \boldsymbol Z) =
E_i(\boldsymbol R_i, \boldsymbol Z_i) \ = \ \sum_{m=1}^M c_m \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i) E_i(\boldsymbol R_i, \boldsymbol Z_i) \ = \ \sum_{m=1}^M c_m \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)
Here :math:`c_m` are trainable coefficients and :math:`\mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)` Here :math:`c_m` are trainable coefficients and
are per-atom POD descriptors. Summing the per-atom descriptors over :math:`i` yields the :math:`\mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)` are per-atom
global descriptors :math:`d_m(\boldsymbol R, \boldsymbol Z) = \sum_{i=1}^N \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)`. POD descriptors. Summing the per-atom descriptors over :math:`i` yields
It thus follows that :math:`E(\boldsymbol R, \boldsymbol Z) = the global descriptors :math:`d_m(\boldsymbol R, \boldsymbol Z) =
\sum_{m=1}^M c_m d_m(\boldsymbol R, \boldsymbol Z)`. \sum_{i=1}^N \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)`. It
thus follows that :math:`E(\boldsymbol R, \boldsymbol Z) = \sum_{m=1}^M
c_m d_m(\boldsymbol R, \boldsymbol Z)`.
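The relation between per-atom and global descriptors described in this hunk can be illustrated with a minimal Python sketch. The function names and the toy numbers are hypothetical and are not part of LAMMPS; the sketch only checks that summing the per-atom energies gives the same result as contracting the coefficients with the global descriptors.

```python
def global_descriptors(per_atom):
    """Sum per-atom descriptors D[i][m] over atoms i to obtain d[m]."""
    return [sum(column) for column in zip(*per_atom)]


def per_atom_energies(coeffs, per_atom):
    """E_i = sum_m c_m * D[i][m] for every atom i."""
    return [sum(c * d for c, d in zip(coeffs, row)) for row in per_atom]


def pod_energy(coeffs, per_atom):
    """E = sum_m c_m * d_m, using the global descriptors."""
    d = global_descriptors(per_atom)
    return sum(c * dm for c, dm in zip(coeffs, d))


# Toy data (hypothetical): N = 3 atoms, M = 2 descriptors.
D = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]
c = [2.0, -1.0]

total_from_atoms = sum(per_atom_energies(c, D))
total_from_global = pod_energy(c, D)
```

Both totals agree because the energy is linear in the descriptors, which is what allows the fit to work with global descriptors only.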
The per-atom POD descriptors include one, two, three, four, five, six, and seven-body The per-atom POD descriptors include one, two, three, four, five, six,
descriptors, which can be specified in the first input file. Furthermore, the per-atom POD descriptors and seven-body descriptors, which can be specified in the first input
also depend on the number of environment clusters specified in the first input file. file. Furthermore, the per-atom POD descriptors also depend on the
Please see :ref:`(Nguyen2024) <Nguyen20242>` and :ref:`(Nguyen and Sema) <Nguyen20243>` for the detailed description of the per-atom POD descriptors. number of environment clusters specified in the first input file.
Please see :ref:`(Nguyen2024) <Nguyen20242a>` and :ref:`(Nguyen and Sema)
<Nguyen20243a>` for the detailed description of the per-atom POD
descriptors.
Training Training
"""""""" """"""""
POD potential is trained using the least-squares regression against A POD potential is trained using the least-squares regression against
density functional theory (DFT) data. Let :math:`J` be the number of density functional theory (DFT) data. Let :math:`J` be the number of
training configurations, with :math:`N_j` being the number of atoms in training configurations, with :math:`N_j` being the number of atoms in
the j-th configuration. The training configurations are extracted from the j-th configuration. The training configurations are extracted from
the extended XYZ files located in a directory (i.e., path_to_training_data_set the extended XYZ files located in a directory (i.e.,
in the second input file). Let :math:`\{E^{\star}_j\}_{j=1}^{J}` and path_to_training_data_set in the second input file). Let
:math:`\{\boldsymbol F^{\star}_j\}_{j=1}^{J}` be the DFT energies and :math:`\{E^{\star}_j\}_{j=1}^{J}` and :math:`\{\boldsymbol
forces for :math:`J` configurations. Next, we calculate the global F^{\star}_j\}_{j=1}^{J}` be the DFT energies and forces for :math:`J`
descriptors and their derivatives for all training configurations. Let configurations. Next, we calculate the global descriptors and their
:math:`d_{jm}, 1 \le m \le M`, be the global descriptors associated with derivatives for all training configurations. Let :math:`d_{jm}, 1 \le m
the j-th configuration, where :math:`M` is the number of global \le M`, be the global descriptors associated with the j-th
descriptors. We then form a matrix :math:`\boldsymbol A \in configuration, where :math:`M` is the number of global descriptors. We
\mathbb{R}^{J \times M}` with entries :math:`A_{jm} = d_{jm}/ N_j` for then form a matrix :math:`\boldsymbol A \in \mathbb{R}^{J \times M}`
:math:`j=1,\ldots,J` and :math:`m=1,\ldots,M`. Moreover, we form a with entries :math:`A_{jm} = d_{jm}/ N_j` for :math:`j=1,\ldots,J` and
matrix :math:`\boldsymbol B \in \mathbb{R}^{\mathcal{N} \times M}` by :math:`m=1,\ldots,M`. Moreover, we form a matrix :math:`\boldsymbol B
stacking the derivatives of the global descriptors for all training \in \mathbb{R}^{\mathcal{N} \times M}` by stacking the derivatives of
configurations from top to bottom, where :math:`\mathcal{N} = the global descriptors for all training configurations from top to
3\sum_{j=1}^{J} N_j`. bottom, where :math:`\mathcal{N} = 3\sum_{j=1}^{J} N_j`.
The coefficient vector :math:`\boldsymbol c` of the POD potential is The coefficient vector :math:`\boldsymbol c` of the POD potential is
found by solving the following least-squares problem found by solving the following least-squares problem
@ -311,20 +325,22 @@ found by solving the following least-squares problem
where :math:`w_E` and :math:`w_F` are weights for the energy where :math:`w_E` and :math:`w_F` are weights for the energy
(*fitting_weight_energy*) and force (*fitting_weight_force*), (*fitting_weight_energy*) and force (*fitting_weight_force*),
respectively; and :math:`w_R` is the regularization parameter (*fitting_regularization_parameter*). Here :math:`\bar{\boldsymbol E}^{\star} \in respectively; and :math:`w_R` is the regularization parameter
\mathbb{R}^{J}` is a vector of with entries :math:`\bar{E}^{\star}_j = (*fitting_regularization_parameter*). Here :math:`\bar{\boldsymbol
E^{\star}_j/N_j` and :math:`\boldsymbol F^{\star}` is a vector of E}^{\star} \in \mathbb{R}^{J}` is a vector of with entries
:math:`\mathcal{N}` entries obtained by stacking :math:`\{\boldsymbol :math:`\bar{E}^{\star}_j = E^{\star}_j/N_j` and :math:`\boldsymbol
F^{\star}_j\}_{j=1}^{J}` from top to bottom. F^{\star}` is a vector of :math:`\mathcal{N}` entries obtained by
stacking :math:`\{\boldsymbol F^{\star}_j\}_{j=1}^{J}` from top to
bottom.
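The least-squares equation itself lies outside this hunk, so the following Python sketch assumes a common weighted, regularized form: minimize ``w_E*|A c - E_bar|^2 + w_F*|B c + F|^2 + w_R*|c|^2``. The plus sign in the force term follows from forces being the negative derivative of the energy, so the predicted forces are ``-B c``. The function names and the toy data are hypothetical, and the solver uses the normal equations with plain Gaussian elimination; it is not the routine used by fitpod.

```python
def solve_linear_system(H, g):
    """Solve H x = g by Gaussian elimination with partial pivoting."""
    n = len(g)
    aug = [row[:] + [rhs] for row, rhs in zip(H, g)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_pod_coefficients(A, E_bar, B, F, w_E, w_F, w_R):
    """Assumed objective: w_E|A c - E_bar|^2 + w_F|B c + F|^2 + w_R|c|^2."""
    M = len(A[0])
    # Normal matrix H = w_E A^T A + w_F B^T B + w_R I and right-hand side g.
    H = [[w_R if p == q else 0.0 for q in range(M)] for p in range(M)]
    g = [0.0] * M
    force_targets = [-f for f in F]  # B c should match -F
    for rows, targets, w in ((A, E_bar, w_E), (B, force_targets, w_F)):
        for row, t in zip(rows, targets):
            for p in range(M):
                g[p] += w * row[p] * t
                for q in range(M):
                    H[p][q] += w * row[p] * row[q]
    return solve_linear_system(H, g)


# Toy data (hypothetical), built from known coefficients [2.0, -1.0].
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # d_jm / N_j, J = 3
E_bar = [2.0, -1.0, 1.0]                   # per-atom reference energies
B = [[1.0, 2.0], [3.0, 1.0]]               # stacked descriptor derivatives
F = [0.0, -5.0]                            # reference forces, equal to -B c
c_fit = fit_pod_coefficients(A, E_bar, B, F, 100.0, 1.0, 0.0)
```

With the regularization weight set to zero and consistent toy data, the fit recovers the coefficients used to generate the data; a positive ``w_R`` shrinks the solution toward zero.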
Validation Validation
"""""""""" """"""""""
POD potential can be validated on a test dataset in a directory specified POD potential can be validated on a test dataset in a directory
by setting path_to_test_data_set in the second input file. It is possible to specified by setting path_to_test_data_set in the second input file. It
validate the POD potential after the training is complete. This is done by is possible to validate the POD potential after the training is
providing the coefficient file as an input to :doc:`fitpod <fitpod_command>`, complete. This is done by providing the coefficient file as an input to
for example, :doc:`fitpod <fitpod_command>`, for example,
.. code-block:: LAMMPS .. code-block:: LAMMPS
@ -353,19 +369,19 @@ The keyword defaults are also given in the description of the input files.
---------- ----------
.. _Nguyen20222: .. _Nguyen20222a:
**(Nguyen and Rohskopf)** Nguyen and Rohskopf, Journal of Computational Physics, 480, 112030, (2023). **(Nguyen and Rohskopf)** Nguyen and Rohskopf, Journal of Computational Physics, 480, 112030, (2023).
.. _Nguyen20232: .. _Nguyen20232a:
**(Nguyen2023)** Nguyen, Physical Review B, 107(14), 144103, (2023). **(Nguyen2023)** Nguyen, Physical Review B, 107(14), 144103, (2023).
.. _Nguyen20242: .. _Nguyen20242a:
**(Nguyen2024)** Nguyen, Journal of Computational Physics, 113102, (2024). **(Nguyen2024)** Nguyen, Journal of Computational Physics, 113102, (2024).
.. _Nguyen20243: .. _Nguyen20243a:
**(Nguyen and Sema)** Nguyen and Sema, https://arxiv.org/abs/2405.00306, (2024). **(Nguyen and Sema)** Nguyen and Sema, https://arxiv.org/abs/2405.00306, (2024).

@ -1,4 +1,5 @@
.. index:: pair_style pod .. index:: pair_style pod
.. index:: pair_style pod/kk
pair_style pod command pair_style pod command
======================== ========================
@ -26,23 +27,25 @@ Description
.. versionadded:: 22Dec2022 .. versionadded:: 22Dec2022
Pair style *pod* defines the proper orthogonal descriptor (POD) Pair style *pod* defines the proper orthogonal descriptor (POD)
potential :ref:`(Nguyen and Rohskopf) <Nguyen20222>`, potential :ref:`(Nguyen and Rohskopf) <Nguyen20222b>`,
:ref:`(Nguyen2023) <Nguyen20232>`, :ref:`(Nguyen2024) <Nguyen20242>`, and :ref:`(Nguyen and Sema) <Nguyen20243>`. :ref:`(Nguyen2023) <Nguyen20232b>`, :ref:`(Nguyen2024) <Nguyen20242b>`,
The :doc:`fitpod <fitpod_command>` is used to fit the POD potential. and :ref:`(Nguyen and Sema) <Nguyen20243b>`. The :doc:`fitpod
<fitpod_command>` is used to fit the POD potential.
Only a single pair_coeff command is used with the *pod* style which Only a single pair_coeff command is used with the *pod* style which
specifies a POD parameter file followed by a coefficient file, specifies a POD parameter file followed by a coefficient file, a
a projection matrix file, and a centroid file. projection matrix file, and a centroid file.
The POD parameter file (``Ta_param.pod``) can contain blank and comment lines The POD parameter file (``Ta_param.pod``) can contain blank and comment
(start with #) anywhere. Each non-blank non-comment line must contain lines (start with #) anywhere. Each non-blank non-comment line must
one keyword/value pair. See :doc:`fitpod <fitpod_command>` for the description contain one keyword/value pair. See :doc:`fitpod <fitpod_command>` for
of all the keywords that can be assigned in the parameter file. the description of all the keywords that can be assigned in the
parameter file.
The coefficient file (``Ta_coefficients.pod``) contains coefficients for the The coefficient file (``Ta_coefficients.pod``) contains coefficients for
POD potential. The top of the coefficient file can contain any number of the POD potential. The top of the coefficient file can contain any
blank and comment lines (start with #), but follows a strict format number of blank and comment lines (start with #), but follows a strict
after that. The first non-blank non-comment line must contain: format after that. The first non-blank non-comment line must contain:
* model_coefficients: *ncoeff* *nproj* *ncentroid* * model_coefficients: *ncoeff* *nproj* *ncentroid*
@ -124,19 +127,19 @@ none
---------- ----------
.. _Nguyen20222: .. _Nguyen20222b:
**(Nguyen and Rohskopf)** Nguyen and Rohskopf, Journal of Computational Physics, 480, 112030, (2023). **(Nguyen and Rohskopf)** Nguyen and Rohskopf, Journal of Computational Physics, 480, 112030, (2023).
.. _Nguyen20232: .. _Nguyen20232b:
**(Nguyen2023)** Nguyen, Physical Review B, 107(14), 144103, (2023). **(Nguyen2023)** Nguyen, Physical Review B, 107(14), 144103, (2023).
.. _Nguyen20242: .. _Nguyen20242b:
**(Nguyen2024)** Nguyen, Journal of Computational Physics, 113102, (2024). **(Nguyen2024)** Nguyen, Journal of Computational Physics, 113102, (2024).
.. _Nguyen20243: .. _Nguyen20243b:
**(Nguyen and Sema)** Nguyen and Sema, https://arxiv.org/abs/2405.00306, (2024). **(Nguyen and Sema)** Nguyen and Sema, https://arxiv.org/abs/2405.00306, (2024).

@ -3816,6 +3816,7 @@ typeJ
typelabel typelabel
typeN typeN
typesafe typesafe
typestr
Tz Tz
Tzou Tzou
ub ub