From adb3343a173079a31a450470201bf85af9f6dfb0 Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Wed, 25 Sep 2024 23:14:50 -0400 Subject: [PATCH 01/13] Start general document about file formats --- doc/src/Run_formats.rst | 119 ++++++++++++++++++++ doc/src/Run_head.rst | 10 +- doc/utils/sphinx-config/false_positives.txt | 1 + 3 files changed, 126 insertions(+), 4 deletions(-) create mode 100644 doc/src/Run_formats.rst diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst new file mode 100644 index 0000000000..896ddcade2 --- /dev/null +++ b/doc/src/Run_formats.rst @@ -0,0 +1,119 @@ +File formats used by LAMMPS +=========================== + +This page provides a general overview of the kinds of files and file +formats that LAMMPS is reading and writing. + +Character Encoding +^^^^^^^^^^^^^^^^^^ + +For text files, LAMMPS uses `ASCII character encoding +`_ which represents the digits 0 to +9, the lower and upper case letters a to z, some common punctuation and +other symbols and a few whitespace characters including a regular "space +character", "line feed", "carriage return", "tabulator". These are all +represented by bytes with a value smaller than 128 and only 95 of those +128 values represent printable characters. This is sufficient to represent +most English text, but misses accented characters or umlauts or Greek +symbols and more. + +Modern text often uses `UTF-8 character encoding +`_ instead. This is a way to +represent many more different characters as defined by the Unicode +standard. This is compatible with ASCII, since the first 128 values are +identical with the ASCIII encoding. It is important to note, however, +that there are Unicode characters that look similar or even identical to +ASCII characters, but have a different representation. As a general +rule, these characters are not be correctly recognized by LAMMPS. 
For +some parts of LAMMPS' text processing, there are translation tables with +known "lookalike" characters that will transparently substitute them +with their ASCII equivalents. Non-ASCII lookalike characters are often +used by web browsers or PDF viewers to improve the readability of +text. Thus, when using copy-n-paste to transfer text from such an +application to your input file, you may unintentionally create text that +is not fully in ASCII encoding and may cause errors when LAMMPS is +trying to read it. + +Lines with non-printable and non-ASCII characters in text files can be +detected for example with: + +.. code-block:: bash + + env LC_ALL=C grep -n '[^ -~]' some_file.txt + +Number Formatting +^^^^^^^^^^^^^^^^^ + +Different countries and languages have different conventions to format +numbers. While in some regions commas are used for fractions and points +to indicate thousand, million and so on, this is reversed in other +regions. Modern operating systems have facilities to adjust input and +output accordingly. The exact rules are often applied according to the +value of the ``$LANG`` environment variable (e.g. "en_US.utf8"). + +For the sake of simplicity of the implementation and transferability of +results, LAMMPS does not support this and instead expects numbers being +formatted in the generic or "C" locale. The "C" locale has no +punctuation for thousand, million and so on and uses a decimal point for +fractions. One thousand would be represented as "1000.0" and not as +"1,000.0" nor as "1.000,0". + +LAMMPS also only accepts integer numbers when an integer is required, +so using "1.0" is not accepted; you have to use "1" instead. + +For floating point numbers in scientific notation, the Fortran double +precision notation "1.1d3" is not accepted either; you have to use +"1100", "1100.0" or "1.1e3". + +Input file +^^^^^^^^^^ + +A LAMMPS input file is a text file with commands. It is read +line-by-line and each line is processed *immediately*. 
Before looking +for commands and executing them, there is a pre-processing step where +`${variable}` and `$(expression)` constructs are expanded or evaluated +and lines that end in the ampersand character '&' are combined with the +next line (similar to Fortran 90 free format source code). + +The LAMMPS input syntax has minimal support for conditionals and loops, +but if more complex operations are required, it is recommended to use +the library interface, e.g. :doc:`from Python using the LAMMPS Python +module `. + +There is a frequent misconception about the :doc:`if command `: +this is a command for conditional execution **outside** a run or +minimization. To trigger actions on specific conditions **during** +a run is a non-trivial operation that usually requires adopting one +of the available fix commands or creating a new one. + +LAMMPS commands can change the internal state and thus the order of +commands matters and reordering them can produce different results. + +Each line must have an "end-of-line" character (line feed or carriage +return plus line feed). Some text editors do not automatically insert +one which may have the result that LAMMPS ignores the last command. +It is thus recommended, to always have an empty line at the end of an +input file. + +The specific details describing how LAMMPS input is processed and parsed +are explained in :doc:`Commands_parse`. + +Data file +^^^^^^^^^ + + +Molecule file +^^^^^^^^^^^^^ + + +Potential file +^^^^^^^^^^^^^^ + + +Restart file +^^^^^^^^^^^^ + + +Dump file +^^^^^^^^^ + diff --git a/doc/src/Run_head.rst b/doc/src/Run_head.rst index 5da5942d9b..6739df5cbb 100644 --- a/doc/src/Run_head.rst +++ b/doc/src/Run_head.rst @@ -1,10 +1,11 @@ Run LAMMPS ********** -These pages explain how to run LAMMPS once you have :doc:`installed an executable ` or :doc:`downloaded the source code ` -and :doc:`built an executable `. 
The :doc:`Commands ` -doc page describes how input scripts are structured and the commands -they can contain. +These pages explain how to run LAMMPS once you have :doc:`installed an +executable ` or :doc:`downloaded the source code ` and +:doc:`built an executable `. The :doc:`Commands ` doc +page describes how input scripts are structured and the commands they +can contain. .. toctree:: :maxdepth: 1 @@ -12,4 +13,5 @@ they can contain. Run_basics Run_options Run_output + Run_formats Run_windows diff --git a/doc/utils/sphinx-config/false_positives.txt b/doc/utils/sphinx-config/false_positives.txt index 70d6b4e323..694e4ec871 100644 --- a/doc/utils/sphinx-config/false_positives.txt +++ b/doc/utils/sphinx-config/false_positives.txt @@ -3930,6 +3930,7 @@ username usleep usolve usr +utf util utils utsa From c36e1a9c8e4acffc3d4b985fa28430091e785be2 Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Sun, 29 Sep 2024 21:41:54 -0400 Subject: [PATCH 02/13] save current status to git --- doc/src/Run_formats.rst | 144 +++++++++++++++++++++++++++++++++++----- 1 file changed, 128 insertions(+), 16 deletions(-) diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst index 896ddcade2..6e3c87cf79 100644 --- a/doc/src/Run_formats.rst +++ b/doc/src/Run_formats.rst @@ -24,18 +24,18 @@ standard. This is compatible with ASCII, since the first 128 values are identical with the ASCIII encoding. It is important to note, however, that there are Unicode characters that look similar or even identical to ASCII characters, but have a different representation. As a general -rule, these characters are not be correctly recognized by LAMMPS. For -some parts of LAMMPS' text processing, there are translation tables with -known "lookalike" characters that will transparently substitute them -with their ASCII equivalents. Non-ASCII lookalike characters are often -used by web browsers or PDF viewers to improve the readability of -text. 
Thus, when using copy-n-paste to transfer text from such an +rule, these characters are not correctly recognized by LAMMPS. For some +parts of LAMMPS' text processing, translation tables with known +"lookalike" characters are used that transparently substitute non-ASCII +characters with their ASCII equivalents. Non-ASCII lookalike characters +are often used by web browsers or PDF viewers to improve the readability +of text. Thus, when using copy-n-paste to transfer text from such an application to your input file, you may unintentionally create text that -is not fully in ASCII encoding and may cause errors when LAMMPS is -trying to read it. +is not exclusively using ASCII encoding and may cause errors when LAMMPS +is trying to read it. Lines with non-printable and non-ASCII characters in text files can be -detected for example with: +detected for example with a (Linux) command like the following: .. code-block:: bash @@ -71,9 +71,55 @@ Input file A LAMMPS input file is a text file with commands. It is read line-by-line and each line is processed *immediately*. Before looking for commands and executing them, there is a pre-processing step where -`${variable}` and `$(expression)` constructs are expanded or evaluated +comments (text starting with a pound sign '#') are removed, +`${variable}` and `$(expression)` constructs are expanded or evaluated, and lines that end in the ampersand character '&' are combined with the -next line (similar to Fortran 90 free format source code). +next line (similar to Fortran 90 free format source code). After the +pre-processing, lines are split into "words" and the first word must be a +:doc:`command ` and everything . Below are some example lines: + +.. 
code-block:: LAMMPS + + # full line comment + + # some global settings + units lj + atom_style atomic + # ^^ command ^^ argument(s) + + variable x index 1 # may be overridden from command line with -var x + variable xx equal 20*$x # variable "xx" is always 20 times "x" + + lattice fcc 0.8442 + + # multi-line command, uses spacing from "lattice" command + region box block 0.0 ${xx} & + 0.0 40.0 & + 0.0 30.0 + # create simulation box and fillwith atoms according to lattice setting + create_box 1 box + create_atoms 1 box + + # set force field and parameters + mass 1 1.0 + pair_style lj/cut 2.5 + pair_coeff 1 1 1.0 1.0 2.5 + + # run simulation + fix 1 all nve + run 1000 + +The pivotal command in this example input is the :doc:`create_box +command `. It defines the simulation system and many +parameters that go with it: units, atom style, number of atom types (and +other types) and more. Those settings are *locked in* after the box is +created. Commands that change these kinds of settings are only allowed +**before** a simulation box is created and many other commands are only +allowed **after** the simulation box is defined (e.g. :doc:`pair_coeff +`). Very few commands (e.g. :doc:`pair_style `) +may be used in either part of the input. The :doc:`read_data +` and :doc:`read_restart ` commands also create +the system box and thus have a similar pivotal function. The LAMMPS input syntax has minimal support for conditionals and loops, but if more complex operations are required, it is recommended to use @@ -86,14 +132,16 @@ minimization. To trigger actions on specific conditions **during** a run is a non-trivial operation that usually requires adopting one of the available fix commands or creating a new one. -LAMMPS commands can change the internal state and thus the order of -commands matters and reordering them can produce different results. +LAMMPS commands change the internal state and thus the order of commands +matters and reordering them can produce different results. 
For example, +the region defined by the :doc:`region command ` in the example +above depends on the :doc:`lattice setting ` and thus its +dimensions will be different depending on the order of the two commands. Each line must have an "end-of-line" character (line feed or carriage return plus line feed). Some text editors do not automatically insert -one which may have the result that LAMMPS ignores the last command. -It is thus recommended, to always have an empty line at the end of an -input file. +one which may cause LAMMPS to ignore the last command. It is thus +recommended to always have an empty line at the end of an input file. The specific details describing how LAMMPS input is processed and parsed are explained in :doc:`Commands_parse`. @@ -101,6 +149,70 @@ are explained in :doc:`Commands_parse`. Data file ^^^^^^^^^ +A LAMMPS data file contains a description of a system suitable for +reading with the :doc:`read_data command `. This is commonly +used for setting up more complex and particularly molecular systems +which can be difficult to achieve with the commands :doc:`create_box +` and :doc:`create_atoms ` alone. Also, data +files can be used as a portable alternative to a :doc:`binary restart +file `. A restart file can be converted into a data file +from the :doc:`command line `. + +The file is generally structured into a header section at the very +beginning of the file and multiple titled sections like "Atoms", +Masses", "Pair Coeffs", and so on. The data file **always** starts +with a "title" line, which will be **ignored** by LAMMPS. Omitting +the title line can lead to unexpected behavior as then a line of +the header with an actual setting may be ignored. This is often a +line with the "atoms" keyword, which results in LAMMPS assuming that +there are no atoms in the data file and thus throwing an error on the +contents of the "Atoms" section. 
The title line may contain some +keywords that can be used by external programs to convey information +about the system, that is not required and not read by LAMMPS. + +Data files may contain comments, which start with the pound sign '#'. +There must be at least one blank between a valid keyword and the pound +sign. + +.. code-block:: bash + + LAMMPS Title line (ignored) + # full line comment + + 10 atoms # comment + 4 atom types + + -36.840194 64.211560 xlo xhi + -41.013691 68.385058 ylo yhi + -29.768095 57.139462 zlo zhi + + Masses + + 1 12.0110 + 2 12.0110 + 3 15.9990 + 4 1.0080 + + Pair Coeffs + + 1 0.110000 3.563595 0.110000 3.563595 + 2 0.080000 3.670503 0.010000 3.385415 + 3 0.120000 3.029056 0.120000 2.494516 + 4 0.022000 2.351973 0.022000 2.351973 + + Atoms # full + + 1 1 1 0.560 43.99993 58.52678 36.78550 0 0 0 + 2 1 2 -0.270 45.10395 58.23499 35.86693 0 0 0 + 3 1 3 -0.510 43.81519 59.54928 37.43995 0 0 0 + 4 1 4 0.090 45.71714 57.34797 36.13434 0 0 0 + 5 1 4 0.090 45.72261 59.13657 35.67007 0 0 0 + 6 1 4 0.090 44.66624 58.09539 34.85538 0 0 0 + 7 1 3 -0.470 43.28193 57.47427 36.91953 0 0 0 + 8 1 4 0.070 42.07157 57.45486 37.62418 0 0 0 + 9 1 1 0.510 42.19985 57.57789 39.12163 0 0 0 + 10 1 1 0.510 41.88641 58.62251 39.70398 0 0 0 + # ^^atomID ^^molID ^^type ^^charge ^^xcoord ^^ycoord ^^ycoord ^^image^^flags Molecule file ^^^^^^^^^^^^^ From d658c589f73d003611da890df3c388c9b0fc6ec8 Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Tue, 25 Mar 2025 21:47:04 -0400 Subject: [PATCH 03/13] update formulations some more --- doc/src/Run_formats.rst | 74 ++++++++++++++++++++++------------------- 1 file changed, 40 insertions(+), 34 deletions(-) diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst index 6e3c87cf79..a8749d1f38 100644 --- a/doc/src/Run_formats.rst +++ b/doc/src/Run_formats.rst @@ -7,32 +7,33 @@ formats that LAMMPS is reading and writing. 
Character Encoding ^^^^^^^^^^^^^^^^^^ -For text files, LAMMPS uses `ASCII character encoding -`_ which represents the digits 0 to -9, the lower and upper case letters a to z, some common punctuation and -other symbols and a few whitespace characters including a regular "space -character", "line feed", "carriage return", "tabulator". These are all -represented by bytes with a value smaller than 128 and only 95 of those -128 values represent printable characters. This is sufficient to represent -most English text, but misses accented characters or umlauts or Greek -symbols and more. +For processing text files, the LAMMPS source code assumes `ASCII +character encoding `_ which +represents the digits 0 to 9, the lower and upper case letters a to z, +some common punctuation and other symbols and a few whitespace +characters including a regular "space character", "line feed", "carriage +return", "tabulator". These are all represented by single bytes with a +value smaller than 128 and only 95 of those 128 values represent +printable characters. This is sufficient to represent most English +text, but misses accented characters or umlauts or Greek symbols and +more. Modern text often uses `UTF-8 character encoding -`_ instead. This is a way to +`_ instead. This is a way to represent many more different characters as defined by the Unicode standard. This is compatible with ASCII, since the first 128 values are -identical with the ASCIII encoding. It is important to note, however, -that there are Unicode characters that look similar or even identical to -ASCII characters, but have a different representation. As a general -rule, these characters are not correctly recognized by LAMMPS. For some -parts of LAMMPS' text processing, translation tables with known -"lookalike" characters are used that transparently substitute non-ASCII -characters with their ASCII equivalents. 
Non-ASCII lookalike characters -are often used by web browsers or PDF viewers to improve the readability -of text. Thus, when using copy-n-paste to transfer text from such an -application to your input file, you may unintentionally create text that -is not exclusively using ASCII encoding and may cause errors when LAMMPS -is trying to read it. +identical with the ASCII encoding. It is important to note, however, +that there are Unicode characters that *look* similar to ASCII +characters, but have a different binary representation. As a general +rule, these characters may not be correctly recognized by LAMMPS. For +some parts of LAMMPS' text processing, translation tables with known +"lookalike" characters are used. Those transparently substitute +non-ASCII characters with their ASCII equivalents. Non-ASCII lookalike +characters are often used by web browsers or PDF viewers to improve the +readability of text. Thus, when using copy-n-paste to transfer text +from such an application to your input file, you may unintentionally +create text that is not exclusively using ASCII encoding and may cause +errors when LAMMPS is trying to read it. Lines with non-printable and non-ASCII characters in text files can be detected for example with a (Linux) command like the following: @@ -48,22 +49,27 @@ Different countries and languages have different conventions to format numbers. While in some regions commas are used for fractions and points to indicate thousand, million and so on, this is reversed in other regions. Modern operating systems have facilities to adjust input and -output accordingly. The exact rules are often applied according to the -value of the ``$LANG`` environment variable (e.g. "en_US.utf8"). +output accordingly that are collectively referred to as "native language +support" (NLS). The exact rules are often applied according to the +value of the ``$LANG`` environment variable (e.g. "en_US.utf8" for +English text in UTF-8 encoding). 
For the sake of simplicity of the implementation and transferability of results, LAMMPS does not support this and instead expects numbers being formatted in the generic or "C" locale. The "C" locale has no punctuation for thousand, million and so on and uses a decimal point for fractions. One thousand would be represented as "1000.0" and not as -"1,000.0" nor as "1.000,0". +"1,000.0" nor as "1.000,0". Having native language support enabled for +a locale other than "C" will result in different behavior when converting +or formatting numbers that can trigger unexpected errors. -LAMMPS also only accepts integer numbers when an integer is required, -so using "1.0" is not accepted; you have to use "1" instead. +LAMMPS also only accepts integer numbers when an integer is required, so +using floating point equivalents like "1.0" is not accepted; you *must* +use "1" instead. For floating point numbers in scientific notation, the Fortran double -precision notation "1.1d3" is not accepted either; you have to use -"1100", "1100.0" or "1.1e3". +precision notation "1.1d3" is not accepted; you have to use "1100", +"1100.0" or "1.1e3". Input file ^^^^^^^^^^ @@ -71,12 +77,13 @@ Input file A LAMMPS input file is a text file with commands. It is read line-by-line and each line is processed *immediately*. Before looking for commands and executing them, there is a pre-processing step where -comments (text starting with a pound sign '#') are removed, +comments (non-quoted text starting with a pound sign '#') are removed, `${variable}` and `$(expression)` constructs are expanded or evaluated, and lines that end in the ampersand character '&' are combined with the next line (similar to Fortran 90 free format source code). After the pre-processing, lines are split into "words" and the first word must be a -:doc:`command ` and everything . Below are some example lines: +:doc:`command ` and everything else is considered an argument. +Below are some example lines: .. 
code-block:: LAMMPS @@ -92,11 +99,11 @@ pre-processing, lines are split into "words" and the first word must be a lattice fcc 0.8442 - # multi-line command, uses spacing from "lattice" command + # multi-line command, uses spacing from "lattice" command, else add "units box" to command region box block 0.0 ${xx} & 0.0 40.0 & 0.0 30.0 - # create simulation box and fillwith atoms according to lattice setting + # create simulation box and fill with atoms according to lattice setting create_box 1 box create_atoms 1 box @@ -228,4 +235,3 @@ Restart file Dump file ^^^^^^^^^ - From bc1b22a2f867db4f26bc8ed47338493c1cc65e8d Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Tue, 25 Mar 2025 23:37:32 -0400 Subject: [PATCH 04/13] finish (for now) the summary of the data file format --- doc/src/Run_formats.rst | 101 +++++++++++++++++++++++++++++++++++----- 1 file changed, 89 insertions(+), 12 deletions(-) diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst index a8749d1f38..54830549e8 100644 --- a/doc/src/Run_formats.rst +++ b/doc/src/Run_formats.rst @@ -167,19 +167,38 @@ from the :doc:`command line `. The file is generally structured into a header section at the very beginning of the file and multiple titled sections like "Atoms", -Masses", "Pair Coeffs", and so on. The data file **always** starts -with a "title" line, which will be **ignored** by LAMMPS. Omitting -the title line can lead to unexpected behavior as then a line of -the header with an actual setting may be ignored. This is often a -line with the "atoms" keyword, which results in LAMMPS assuming that -there are no atoms in the data file and thus throwing an error on the -contents of the "Atoms" section. The title line may contain some -keywords that can be used by external programs to convey information -about the system, that is not required and not read by LAMMPS. +"Masses", "Pair Coeffs", and so on. Header keywords can only be used +*before* the first titled section. 
+ +The data file **always** starts with a "title" line, which will be +**ignored** by LAMMPS. Omitting the title line can lead to unexpected +behavior as then a line of the header with an actual setting may be +ignored. This is often a line with the "atoms" keyword, which results +in LAMMPS assuming that there are no atoms in the data file and thus +throwing an error on the contents of the "Atoms" section. The title +line may contain some keywords that can be used by external programs to +convey information about the system (included as comments), that is not +required and not read by LAMMPS. + +The line following a section title is also **ignored**. Skipping it +will lead to short reads and thus errors. The number of lines in titled +sections depends on header keywords, like the number of atom types, the +number of atoms, the number of bond types, or the number of bonds and so +on. The data in those sections has to be complete. A special case are +the "Pair Coeffs" and "PairIJ Coeffs" sections; the former is for force +fields and pair styles that use mixing of non-bonded potential +parameters, the latter for pair styles and force fields requiring +explicit coefficients. Thus with *N* being the number of atom types, +the "Pair Coeffs" section has *N* entries while "PairIJ Coeffs" has +:math:`N \cdot (N+1) / 2` entries. Internally, these sections will be +converted to :doc:`pair_coeff ` commands. Thus the +corresponding :doc:`pair style ` must have been set *before* +the :doc:`read_data command ` reads the data file. Data files may contain comments, which start with the pound sign '#'. There must be at least one blank between a valid keyword and the pound -sign. +sign. Below is a simple example case of a data file for :doc:`atom style +full `. .. code-block:: bash @@ -200,7 +219,7 @@ sign. 3 15.9990 4 1.0080 - Pair Coeffs + Pair Coeffs # this section is optional 1 0.110000 3.563595 0.110000 3.563595 2 0.080000 3.670503 0.010000 3.385415 @@ -219,7 +238,69 @@ sign. 
8 1 4 0.070 42.07157 57.45486 37.62418 0 0 0 9 1 1 0.510 42.19985 57.57789 39.12163 0 0 0 10 1 1 0.510 41.88641 58.62251 39.70398 0 0 0 - # ^^atomID ^^molID ^^type ^^charge ^^xcoord ^^ycoord ^^ycoord ^^image^^flags + # ^^atomID ^^molID ^^type ^^charge ^^xcoord ^^ycoord ^^zcoord ^^image^^flags (optional) + Velocities # this section is optional + + 1 0.0050731 -0.00398928 0.00391473 + 2 -0.0175184 0.0173484 -0.00489207 + 3 0.00597225 -0.00202006 0.00166454 + 4 -0.010395 -0.0082582 0.00316419 + 5 -0.00390877 0.00470331 -0.00226911 + 6 -0.00111157 -0.00374545 -0.0169374 + 7 0.00209054 -0.00594936 -0.000124563 + 8 0.00635002 -0.0120093 -0.0110999 + 9 -0.004955 -0.0123375 0.000403422 + 10 0.00265028 -0.00189329 -0.00293198 + +A common problem is processing the "Atoms" section, since its format depends +on the :doc:`atom style ` used and that setting must be done in the +input file *before* reading the data file. To assist with detecting incompatible +data files, a comment is appended to the "Atoms" title indicating the atom style +used (or intended) when *writing* the data file. For example below is the same +section for :doc:`atom style charge `, which omits the molecule ID +column. + +.. code-block:: bash + + Atoms # charge + + 1 1 0.560 43.99993 58.52678 36.78550 + 2 2 -0.270 45.10395 58.23499 35.86693 + 3 3 -0.510 43.81519 59.54928 37.43995 + 4 4 0.090 45.71714 57.34797 36.13434 + 5 4 0.090 45.72261 59.13657 35.67007 + 6 4 0.090 44.66624 58.09539 34.85538 + 7 3 -0.470 43.28193 57.47427 36.91953 + 8 4 0.070 42.07157 57.45486 37.62418 + 9 1 0.510 42.19985 57.57789 39.12163 + 10 1 0.510 41.88641 58.62251 39.70398 + # ^^atomID ^^type ^^charge ^^xcoord ^^ycoord ^^zcoord + +Another source of confusion about the "Atoms" section format is the +ordering of columns. 
The three atom style variants `atom_style full`, +`atom_style hybrid charge molecular`, and `atom_style hybrid molecular +charge` all carry the same per-atom information, but in the data file +the Atoms section has the columns 'Atom-ID Molecule-ID Atom-type Charge +X Y Z' for atom style full, while for hybrid atom styles the first columns are +always 'Atom-ID Atom-type X Y Z', followed by any *additional* +data added by the hybrid styles, and thus 'Charge Molecule-ID' for the +first hybrid style and 'Molecule-ID Charge' in the second hybrid style +variant. Finally, an alternative to a hybrid atom style is to use fix +property/atom, e.g. to add molecule IDs to atom style charge. In this +case the "Atoms" section is formatted according to atom style charge and +a new section, "Molecules", is added that contains lines with 'Atom-ID +Molecule-ID', one for each atom in the system. For adding charges +to atom style molecular with fix property/atom, the "Atoms" section is +now formatted according to the atom style and a "Charges" section is +added. + Molecule file ^^^^^^^^^^^^^ +Molecule files look quite similar to data files but they do not have a +compatible format, i.e. one cannot use a data file as molecule file and +vice versa. Potential file ^^^^^^^^^^^^^^ From 194b3408f7f835f8c944cdb35710b44c79ebb80a Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Wed, 26 Mar 2025 09:51:09 -0400 Subject: [PATCH 05/13] add section about molecule files --- doc/src/Run_formats.rst | 73 ++++++++++++++++++++++++++++++++++++++--- 1 file changed, 69 insertions(+), 4 deletions(-) diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst index 54830549e8..2a8713c40d 100644 --- a/doc/src/Run_formats.rst +++ b/doc/src/Run_formats.rst @@ -294,14 +294,79 @@ Molecule-ID', one for each atom in the system. For adding charges to atom style molecular with fix property/atom, the "Atoms" section is now formatted according to the atom style and a "Charges" section is added. 
- Molecule file ^^^^^^^^^^^^^ -Molecule files look quite similar to data files but they do not have a -compatible format, i.e. one cannot use a data file as molecule file and -vice versa. +Molecule files for use with the :doc:`molecule command ` look +quite similar to data files but they do not have a compatible format, +i.e. one cannot use a data file as molecule file and vice versa. Below +is a simple example for a water molecule (SPC/E model). Same as a data +file, there is an ignored title line and you can use comments. However, +there is no information about the number of types or the box dimensions. +These are set when the simulation box is created. Thus the header only +has the count of atoms, bonds, and so on. +While there also is a header part and sections and the sections must +come after the header, the (required) section names may be +different. There is no "Atoms" section and the section format is +independent of the atom style. Its information is split across multiple +sections, like "Coords", "Types", and "Charges". Note that no "Masses" +section is needed here. The atom masses are by default tied to the atom +type and set with a data file or the :doc:`mass command `. A +"Masses" section would only be required for atom styles with per-atom +masses, e.g. atom style sphere. + +Since the entire file is a 'molecule', LAMMPS will assign a new +molecule-ID (if supported by the atom style) when atoms are instantiated +from a molecule file, e.g. with the :doc:`create_atoms command +`. It is possible to include a "Molecules" section, in +case the atoms belong to multiple 'molecules'. Atom-IDs and +molecule-IDs in the molecule file are relative to the file (starting +from 1) and will be translated into actual IDs when the +molecule is created. + +.. code-block:: bash + + # Water molecule. SPC/E model. 
+ + 3 atoms + 2 bonds + 1 angles + + Coords + + 1 1.12456 0.09298 1.27452 + 2 1.53683 0.75606 1.89928 + 3 0.49482 0.56390 0.65678 + + Types + + 1 1 + 2 2 + 3 2 + + Charges + + 1 -0.8472 + 2 0.4236 + 3 0.4236 + + Bonds + + 1 1 1 2 + 2 1 1 3 + + Angles + + 1 1 2 1 3 + + +There are also optional sections, e.g. about :doc:`SHAKE ` and +:doc:`special bonds `. Those are only needed if the molecule +command is issues *before* the simulation box is defined. Otherwise, the +molecule command can derive the required settings internally. + Potential file ^^^^^^^^^^^^^^ From dcbc3c9dbc391c20b3692edfb681cce89c6ff85d Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Wed, 26 Mar 2025 09:52:19 -0400 Subject: [PATCH 06/13] whitespace --- doc/src/Run_formats.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst index 2a8713c40d..a906425168 100644 --- a/doc/src/Run_formats.rst +++ b/doc/src/Run_formats.rst @@ -360,13 +360,13 @@ molecule is created. Angles 1 1 2 1 3 - + There are also optional sections, e.g. about :doc:`SHAKE ` and :doc:`special bonds `. Those are only needed if the molecule command is issues *before* the simulation box is defined. Otherwise, the molecule command can derive the required settings internally. - + Potential file ^^^^^^^^^^^^^^ From 7f0b71f7c016e3dd405ac39764561c70270cfaf4 Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Wed, 26 Mar 2025 09:00:25 -0400 Subject: [PATCH 07/13] spelling --- doc/src/Errors_details.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/src/Errors_details.rst b/doc/src/Errors_details.rst index 9ca69c5068..c94d96b405 100644 --- a/doc/src/Errors_details.rst +++ b/doc/src/Errors_details.rst @@ -471,7 +471,7 @@ pair-wise additive pair styles like :doc:`Lennard-Jones `, :doc:`Morse `, :doc:`Born-Meyer-Huggins `, and similar. 
Such required callbacks have not been implemented for many-body potentials so one would have to implement them to add -compatiability with these computes (which may be difficult to do in a +compatibility with these computes (which may be difficult to do in a generic fashion). Whether this warning indicates that contributions to the computed properties are missing depends on the groups used. At any rate, careful testing of the results is advised when this warning @@ -931,7 +931,7 @@ the documentation carefully. XXX command before simulation box is defined -------------------------------------------- -This error occurs when trying to excute a LAMMPS command that requires +This error occurs when trying to execute a LAMMPS command that requires information about the system dimensions, or the number atom, bond, angle, dihedral, or improper types, or the number of atoms or similar data that is only available *after* the simulation box has been created. @@ -943,7 +943,7 @@ created ` for additional information. 
XXX command after simulation box is defined -------------------------------------------- -This error occurs when trying to excute a LAMMPS command that changes a +This error occurs when trying to execute a LAMMPS command that changes a global setting *after* it is locked in when the simulation box is created (for instance defining the :doc:`atom style `, :doc:`dimension `, :doc:`newton `, or :doc:`units From 2542b989ee3aaf8c5b4df82b0b074018e89e098d Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Wed, 26 Mar 2025 19:59:52 -0400 Subject: [PATCH 08/13] small tweak --- doc/src/Run_formats.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst index a906425168..a67a28af3b 100644 --- a/doc/src/Run_formats.rst +++ b/doc/src/Run_formats.rst @@ -1,9 +1,16 @@ + File formats used by LAMMPS =========================== This page provides a general overview of the kinds of files and file formats that LAMMPS is reading and writing. +.. contents:: On this page + :depth: 2 + :backlinks: top + +------------------- + Character Encoding ^^^^^^^^^^^^^^^^^^ From 738fb4a502e2e35e9910b93b29eafffc28f1eb0e Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Thu, 27 Mar 2025 16:29:13 -0400 Subject: [PATCH 09/13] add info about restart files --- doc/src/Run_formats.rst | 39 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 36 insertions(+), 3 deletions(-) diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst index a67a28af3b..fe48347edc 100644 --- a/doc/src/Run_formats.rst +++ b/doc/src/Run_formats.rst @@ -374,13 +374,46 @@ There are also optional sections, e.g. about :doc:`SHAKE ` and command is issues *before* the simulation box is defined. Otherwise, the molecule command can derive the required settings internally. -Potential file -^^^^^^^^^^^^^^ - +Native Dump file +^^^^^^^^^^^^^^^^ Restart file ^^^^^^^^^^^^ +LAMMPS restart files are binary files and not available in text format. 
+They can be identified by the first few bytes that contain the (C-style) +string "LammpS RestartT" as `magic string +`_. This is followed by a +16-bit integer of the number 1 used for detecting whether the computer +writing the restart has the same `endianness +`_ as the computer reading it. +If not the file cannot be read correctly. This is followed by a 32-bit +integer indicating the file format revision (currently 3), which can be +used to implement backward compatibility for reading older revisions. + +This information has been added to the `Unix "file" command's +` "magic" file so that restart files +can be identified without opening them. If you have a fairly recent +version, it should already be included. If you have an older version, +the LAMMPS source package :ref:`contains a file with the necessary +additions `. + +The rest of the file is organized in sections of a 32-bit signed integer +constant indicating the kind of content and the corresponding value (or +values). If those values are arrays (including C-style strings), then +the integer constant is followed by a 32-bit integer indicating the +length of the array. This mechanism will read the data regardless of +the ordering of the sections. Symbolic names of the section constants +are in the ``lmprestart.h`` header file. + +LAMMPS restart files are not expected to be portable between platforms +or LAMMPS versions, but changes to the file format are rare. 
+ Dump file ^^^^^^^^^ + + +Potential files +^^^^^^^^^^^^^^^ + From 9661c21052eee23a945d701b5a3d862e41cdb9da Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Fri, 28 Mar 2025 09:56:06 -0400 Subject: [PATCH 10/13] comment out possible additional sections --- doc/src/Run_formats.rst | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst index fe48347edc..59a29f6bbb 100644 --- a/doc/src/Run_formats.rst +++ b/doc/src/Run_formats.rst @@ -410,10 +410,9 @@ LAMMPS restart files are not expected to be portable between platforms or LAMMPS versions, but changes to the file format are rare. -Dump file -^^^^^^^^^ - - -Potential files -^^^^^^^^^^^^^^^ +.. Dump file +.. ^^^^^^^^^ +.. +.. Potential files +.. ^^^^^^^^^^^^^^^ From 4dbf18e2c94b946109dd0b8e23fe098bfd8f5355 Mon Sep 17 00:00:00 2001 From: Jacob Gissinger Date: Fri, 28 Mar 2025 23:15:33 -0400 Subject: [PATCH 11/13] small suggested changes --- doc/src/Run_formats.rst | 115 ++++++++++++++++++++-------------------- 1 file changed, 57 insertions(+), 58 deletions(-) diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst index 59a29f6bbb..17c37a8e2c 100644 --- a/doc/src/Run_formats.rst +++ b/doc/src/Run_formats.rst @@ -19,25 +19,25 @@ character encoding `_ which represents the digits 0 to 9, the lower and upper case letters a to z, some common punctuation and other symbols and a few whitespace characters including a regular "space character", "line feed", "carriage -return", "tabulator". These are all represented by single bytes with a -value smaller than 128 and only 95 of those 128 values represent -printable characters. This is sufficient to represent most English +return", "tabulator". These characters are all represented by single bytes +with a value smaller than 128 and only 95 of those 128 values represent +printable characters. 
This list is sufficient to represent most English text, but misses accented characters or umlauts or Greek symbols and more. Modern text often uses `UTF-8 character encoding -`_ instead. This is a way to -represent many more different characters as defined by the Unicode -standard. This is compatible with ASCII, since the first 128 values are +`_ instead. This encoding is a way +to represent many more different characters as defined by the Unicode +standard. UFT-8 is compatible with ASCII, since the first 128 values are identical with the ASCII encoding. It is important to note, however, that there are Unicode characters that *look* similar to ASCII characters, but have a different binary representation. As a general rule, these characters may not be correctly recognized by LAMMPS. For some parts of LAMMPS' text processing, translation tables with known -"lookalike" characters are used. Those transparently substitute +"lookalike" characters are used. The tables are used to substitute non-ASCII characters with their ASCII equivalents. Non-ASCII lookalike characters are often used by web browsers or PDF viewers to improve the -readability of text. Thus, when using copy-n-paste to transfer text +readability of text. Thus, when using copy and paste to transfer text from such an application to your input file, you may unintentionally create text that is not exclusively using ASCII encoding and may cause errors when LAMMPS is trying to read it. @@ -85,11 +85,11 @@ A LAMMPS input file is a text file with commands. It is read line-by-line and each line is processed *immediately*. 
Before looking for commands and executing them, there is a pre-processing step where comments (non-quoted text starting with a pound sign '#') are removed, -`${variable}` and `$(expression)` constructs are expanded or evaluated, +``${variable}`` and ``$(expression)`` constructs are expanded or evaluated, and lines that end in the ampersand character '&' are combined with the -next line (similar to Fortran 90 free format source code). After the -pre-processing, lines are split into "words" and the first word must be a -:doc:`command ` and everything else is considered argument. +next line (similar to Fortran 90 free-format source code). After the +pre-processing, lines are split into "words" and evaluated. The first word +must be a :doc:`command ` and all following words are arguments. Below are some example lines: .. code-block:: LAMMPS @@ -102,11 +102,12 @@ Below are some example lines: # ^^ command ^^ argument(s) variable x index 1 # may be overridden from command line with -var x - variable xx equal 20*$x # variable "xx" is always 20 time "x" + variable xx equal 20*$x # variable "xx" is always 20 times "x" lattice fcc 0.8442 - # multi-line command, uses spacing from "lattice" command, else add "units box" to command + # example of a command written across multiple lines + # the "region" command uses spacing from "lattice" command, unless "units box" is specified region box block 0.0 ${xx} & 0.0 40.0 & 0.0 30.0 @@ -144,7 +145,7 @@ There is a frequent misconception about the :doc:`if command `: this is a command for conditional execution **outside** a run or minimization. To trigger actions on specific conditions **during** a run is a non-trivial operation that usually requires adopting one -of the available fix commands or creating a new one. +of the available "fix" commands or creating a new "fix" command. LAMMPS commands change the internal state and thus the order of commands matters and reordering them can produce different results. 
For example, @@ -155,7 +156,7 @@ dimensions will be different depending on the order of the two commands. Each line must have an "end-of-line" character (line feed or carriage return plus line feed). Some text editors do not automatically insert one which may cause LAMMPS to ignore the last command. It is thus -recommended, to always have an empty line at the end of an input file. +recommended to always have an empty line at the end of an input file. The specific details describing how LAMMPS input is processed and parsed are explained in :doc:`Commands_parse`. @@ -164,33 +165,32 @@ Data file ^^^^^^^^^ A LAMMPS data file contains a description of a system suitable for -reading with the :doc:`read_data command `. This is commonly -used for setting up more complex and particularly molecular systems -which can be difficult to achieve with the commands :doc:`create_box -` and :doc:`create_atoms ` alone. Also, data -files can be used as a portable alternatives to a :doc:`binary restart -file `. A restart file can be converted into a data file -from the :doc:`command line `. +reading with the :doc:`read_data command `. Data files are +commonly used for setting up complex molecular systems that can be +difficult to achieve with the commands :doc:`create_box ` +and :doc:`create_atoms ` alone. Also, data files can be +used as a portable alternatives to a :doc:`binary restart file `. +A restart file can be converted into a data file from the +:doc:`command line `. -The file is generally structured into a header section at the very -beginning of the file and multiple titled sections like "Atoms", -Masses", "Pair Coeffs", and so on. Header keywords can only be used -*before* the first title section. +Data files have a header section at the very beginning of the file and +multiple titled sections such as "Atoms", Masses", "Pair Coeffs", and so on. +Header keywords can only be used *before* the first title section. 
The data file **always** starts with a "title" line, which will be **ignored** by LAMMPS. Omitting the title line can lead to unexpected -behavior as then a line of the header with an actual setting may be -ignored. This is often a line with the "atoms" keyword, which results -in LAMMPS assuming that there are no atoms in the data file and thus -throwing an error on the contents of the "Atoms" section. The title -line may contain some keywords that can be used by external programs to -convey information about the system (included as comments), that is not -required and not read by LAMMPS. +behavior because a line of the header with an actual setting may be +ignored. In this case, the mistakenly ignored line often contains the +"atoms" keyword, which results in LAMMPS assuming that there are no atoms +in the data file and thus throwing an error on the contents of the "Atoms" +section. The title line may contain some keywords that can be used by +external programs to convey information about the system (included as +comments), that is not required and not read by LAMMPS. -The line following a section title is also **ignored**. Skipping it -will lead to short reads and thus errors. The number of lines in titled +The line following a section title is also **ignored**. An error will occur +if an empty line is not placed after a section title. The number of lines in titled sections depends on header keywords, like the number of atom types, the -number of atoms, the number of bond types, or the number of bonds and so +number of atoms, the number of bond types, the number of bonds, and so on. The data in those sections has to be complete. A special case are the "Pair Coeffs" and "PairIJ Coeffs" sections; the former is for force fields and pair styles that use mixing of non-bonded potential @@ -261,10 +261,10 @@ full `. 
10 0.00265028 -0.00189329 -0.00293198 The common problem is processing the "Atoms" section, since its format depends -on the :doc:`atom style ` used and that setting must be done in the +on the :doc:`atom style ` used, and that setting must be done in the input file *before* reading the data file. To assist with detecting incompatible data files, a comment is appended to the "Atoms" title indicating the atom style -used (or intended) when *writing* the data file. For example below is the same +used (or intended) when *writing* the data file. For example, below is an "Atoms" section for :doc:`atom style charge `, which omits the molecule ID column. @@ -287,11 +287,11 @@ column. Another source of confusion about the "Atoms" section format is the ordering of columns. The three atom style variants `atom_style full`, `atom_style hybrid charge molecular`, and `atom_style hybrid molecular -charge` all carry the same per-atom information, but in the data file +charge` all carry the same per-atom information. However, in data files, the Atoms section has the columns 'Atom-ID Molecule-ID Atom-type Charge -X Y Z' for atom style full, but hybrid atom styles the first columns are -always 'Atom-ID Atom-type X Y Z' and then followed by any *additional* -data added by the hybrid styles, and thus 'Charge Molecule-ID' for the +X Y Z' for atom style full, but for hybrid atom styles the first columns are +always 'Atom-ID Atom-type X Y Z' followed by any *additional* +data added by the hybrid styles, for example, 'Charge Molecule-ID' for the first hybrid style and 'Molecule-ID Charge' in the second hybrid style variant. Finally, an alternative to a hybrid atom style is to use fix property/atom, e.g. to add molecule IDs to atom style charge. In this @@ -307,27 +307,26 @@ Molecule file Molecule files for use with the :doc:`molecule command ` look quite similar to data files but they do not have a compatible format, -i.e. one cannot use a data file as molecule file and vice versa. 
Below +i.e., one cannot use a data file as molecule file and vice versa. Below is a simple example for a water molecule (SPC/E model). Same as a data file, there is an ignored title line and you can use comments. However, there is no information about the number of types or the box dimensions. -These are set when the simulation box is created. Thus the header only -has the count of atoms, bonds, and so on. +These parameters are set when the simulation box is created. Thus the +header only has the count of atoms, bonds, and so on. -While there also is a header part and sections and the sections must -come after the header, the (required) section names are may be -different. There is no "Atoms" section and the section format is -independent of the atom style. Its information is split across multiple -sections, like "Coords", "Types", and "Charges". Note that no "Masses" -section is needed here. The atom masses are by default tied to the atom -type and set with a data file or the :doc:`mass command `. A +Molecule files have a header followed by sections, but the section names are +different than those of a data file. There is no "Atoms" section and the +section format is independent of the atom style. Its information is split +across multiple sections, like "Coords", "Types", and "Charges". Note that +no "Masses" section is needed here. The atom masses are by default tied to +the atom type and set with a data file or the :doc:`mass command `. A "Masses" section would only be required for atom styles with per-atom masses, e.g. atom style sphere. Since the entire file is a 'molecule', LAMMPS will assign a new molecule-ID (if supported by the atom style) when atoms are instantiated from a molecule file, e.g. with the :doc:`create_atoms command -`. It is possible to include a "Molecules" section, in +`. It is possible to include a "Molecules" section in case the atoms belong to multiple 'molecules'. 
Atom-IDs and molecule-IDs in the molecule file are relative for the file (starting from 1) and will be translated into actual atom-IDs also when the @@ -370,8 +369,8 @@ molecule is created. There are also optional sections, e.g. about :doc:`SHAKE ` and -:doc:`special bonds `. Those are only needed if the molecule -command is issues *before* the simulation box is defined. Otherwise, the +:doc:`special bonds `. Those sections are only needed if the molecule +command is issued *before* the simulation box is defined. Otherwise, the molecule command can derive the required settings internally. Native Dump file @@ -383,11 +382,11 @@ Restart file LAMMPS restart files are binary files and not available in text format. They can be identified by the first few bytes that contain the (C-style) string "LammpS RestartT" as `magic string -`_. This is followed by a +`_. This string is followed by a 16-bit integer of the number 1 used for detecting whether the computer writing the restart has the same `endianness `_ as the computer reading it. -If not the file cannot be read correctly. This is followed by a 32-bit +If not, the file cannot be read correctly. This integer is followed by a 32-bit integer indicating the file format revision (currently 3), which can be used to implement backward compatibility for reading older revisions. 
From 990007c87b5c4938cc981a4ea3556eee85b715b3 Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Fri, 28 Mar 2025 23:55:38 -0400 Subject: [PATCH 12/13] whitespace, rewrap, and comments --- doc/src/Run_formats.rst | 152 ++++++++++++++++++++-------------------- 1 file changed, 75 insertions(+), 77 deletions(-) diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst index 17c37a8e2c..ff74bf1c56 100644 --- a/doc/src/Run_formats.rst +++ b/doc/src/Run_formats.rst @@ -19,18 +19,18 @@ character encoding `_ which represents the digits 0 to 9, the lower and upper case letters a to z, some common punctuation and other symbols and a few whitespace characters including a regular "space character", "line feed", "carriage -return", "tabulator". These characters are all represented by single bytes -with a value smaller than 128 and only 95 of those 128 values represent -printable characters. This list is sufficient to represent most English -text, but misses accented characters or umlauts or Greek symbols and -more. +return", "tabulator". These characters are all represented by single +bytes with a value smaller than 128 and only 95 of those 128 values +represent printable characters. This list is sufficient to represent +most English text, but misses accented characters or umlauts or Greek +symbols and more. Modern text often uses `UTF-8 character encoding `_ instead. This encoding is a way to represent many more different characters as defined by the Unicode -standard. UFT-8 is compatible with ASCII, since the first 128 values are -identical with the ASCII encoding. It is important to note, however, -that there are Unicode characters that *look* similar to ASCII +standard. UTF-8 is compatible with ASCII, since the first 128 values +are identical with the ASCII encoding. It is important to note, +however, that there are Unicode characters that *look* similar to ASCII characters, but have a different binary representation. 
As a general rule, these characters may not be correctly recognized by LAMMPS. For some parts of LAMMPS' text processing, translation tables with known @@ -67,8 +67,8 @@ formatted in the generic or "C" locale. The "C" locale has no punctuation for thousand, million and so on and uses a decimal point for fractions. One thousand would be represented as "1000.0" and not as "1,000.0" nor as "1.000,0". Having native language support enabled for -a locale other than "C" will result in different behavior when converting -or formatting numbers that can trigger unexpected errors. +a locale other than "C" will result in different behavior when +converting or formatting numbers that can trigger unexpected errors. LAMMPS also only accepts integer numbers when an integer is required, so using floating point equivalents like "1.0" are not accepted; you *must* @@ -81,16 +81,16 @@ precision notation "1.1d3" is not accepted; you have to use "1100", Input file ^^^^^^^^^^ -A LAMMPS input file is a text file with commands. It is read +A LAMMPS input file is a text file with commands. It is read line-by-line and each line is processed *immediately*. Before looking for commands and executing them, there is a pre-processing step where comments (non-quoted text starting with a pound sign '#') are removed, -``${variable}`` and ``$(expression)`` constructs are expanded or evaluated, -and lines that end in the ampersand character '&' are combined with the -next line (similar to Fortran 90 free-format source code). After the -pre-processing, lines are split into "words" and evaluated. The first word -must be a :doc:`command ` and all following words are arguments. -Below are some example lines: +``${variable}`` and ``$(expression)`` constructs are expanded or +evaluated, and lines that end in the ampersand character '&' are +combined with the next line (similar to Fortran 90 free-format source +code). After the pre-processing, lines are split into "words" and +evaluated. 
The first word must be a :doc:`command ` and +all following words are arguments. Below are some example lines: .. code-block:: LAMMPS @@ -166,38 +166,38 @@ Data file A LAMMPS data file contains a description of a system suitable for reading with the :doc:`read_data command `. Data files are -commonly used for setting up complex molecular systems that can be +commonly used for setting up complex molecular systems that can be difficult to achieve with the commands :doc:`create_box ` and :doc:`create_atoms ` alone. Also, data files can be -used as a portable alternatives to a :doc:`binary restart file `. -A restart file can be converted into a data file from the +used as a portable alternative to a :doc:`binary restart file +`. A restart file can be converted into a data file from the :doc:`command line `. Data files have a header section at the very beginning of the file and -multiple titled sections such as "Atoms", Masses", "Pair Coeffs", and so on. -Header keywords can only be used *before* the first title section. +multiple titled sections such as "Atoms", "Masses", "Pair Coeffs", and so +on. Header keywords can only be used *before* the first title section. The data file **always** starts with a "title" line, which will be **ignored** by LAMMPS. Omitting the title line can lead to unexpected behavior because a line of the header with an actual setting may be ignored. In this case, the mistakenly ignored line often contains the -"atoms" keyword, which results in LAMMPS assuming that there are no atoms -in the data file and thus throwing an error on the contents of the "Atoms" -section. The title line may contain some keywords that can be used by -external programs to convey information about the system (included as -comments), that is not required and not read by LAMMPS. +"atoms" keyword, which results in LAMMPS assuming that there are no +atoms in the data file and thus throwing an error on the contents of the +"Atoms" section. 
The title line may contain some keywords that can be +used by external programs to convey information about the system +(included as comments), that is not required and not read by LAMMPS. -The line following a section title is also **ignored**. An error will occur -if an empty line is not placed after a section title. The number of lines in titled -sections depends on header keywords, like the number of atom types, the -number of atoms, the number of bond types, the number of bonds, and so -on. The data in those sections has to be complete. A special case are -the "Pair Coeffs" and "PairIJ Coeffs" sections; the former is for force -fields and pair styles that use mixing of non-bonded potential -parameters, the latter for pair styles and force fields requiring -explicit coefficients. Thus with *N* being the number of atom types, -the "Pair Coeffs" section has *N* entries while "PairIJ Coeffs" has -:math:`N \cdot (N-1)` entries. Internally, these sections will be +The line following a section title is also **ignored**. An error will +occur if an empty line is not placed after a section title. The number +of lines in titled sections depends on header keywords, like the number +of atom types, the number of atoms, the number of bond types, the number +of bonds, and so on. The data in those sections has to be complete. A +special case are the "Pair Coeffs" and "PairIJ Coeffs" sections; the +former is for force fields and pair styles that use mixing of non-bonded +potential parameters, the latter for pair styles and force fields +requiring explicit coefficients. Thus with *N* being the number of atom +types, the "Pair Coeffs" section has *N* entries while "PairIJ Coeffs" +has :math:`N \cdot (N+1)/2` entries. Internally, these sections will be converted to :doc:`pair_coeff ` commands. Thus the corresponding :doc:`pair style ` must have been set *before* the :doc:`read_data command ` reads the data file. @@ -260,12 +260,13 @@ full `. 
9 -0.004955 -0.0123375 0.000403422 10 0.00265028 -0.00189329 -0.00293198 -The common problem is processing the "Atoms" section, since its format depends -on the :doc:`atom style ` used, and that setting must be done in the -input file *before* reading the data file. To assist with detecting incompatible -data files, a comment is appended to the "Atoms" title indicating the atom style -used (or intended) when *writing* the data file. For example, below is an "Atoms" -section for :doc:`atom style charge `, which omits the molecule ID +The common problem is processing the "Atoms" section, since its format +depends on the :doc:`atom style ` used, and that setting +must be done in the input file *before* reading the data file. To +assist with detecting incompatible data files, a comment is appended to +the "Atoms" title indicating the atom style used (or intended) when +*writing* the data file. For example, below is an "Atoms" section for +:doc:`atom style charge `, which omits the molecule ID column. .. code-block:: bash @@ -289,18 +290,17 @@ ordering of columns. The three atom style variants `atom_style full`, `atom_style hybrid charge molecular`, and `atom_style hybrid molecular charge` all carry the same per-atom information. However, in data files, the Atoms section has the columns 'Atom-ID Molecule-ID Atom-type Charge -X Y Z' for atom style full, but for hybrid atom styles the first columns are -always 'Atom-ID Atom-type X Y Z' followed by any *additional* -data added by the hybrid styles, for example, 'Charge Molecule-ID' for the +X Y Z' for atom style full, but for hybrid atom styles the first columns +are always 'Atom-ID Atom-type X Y Z' followed by any *additional* data +added by the hybrid styles, for example, 'Charge Molecule-ID' for the first hybrid style and 'Molecule-ID Charge' in the second hybrid style variant. Finally, an alternative to a hybrid atom style is to use fix property/atom, e.g. to add molecule IDs to atom style charge. 
In this case the "Atoms" section is formatted according to atom style charge and a new section, "Molecules" is added that contains lines with 'Atom-ID -Molecule-ID', one for each atom in the system. For adding charges -to atom style molecular with fix property/atom, the "Atoms" section is -now formatted according to the atom style and a "Charges" section is -added. +Molecule-ID', one for each atom in the system. For adding charges to +atom style molecular with fix property/atom, the "Atoms" section is now +formatted according to the atom style and a "Charges" section is added. Molecule file ^^^^^^^^^^^^^ @@ -314,14 +314,14 @@ there is no information about the number of types or the box dimensions. These parameters are set when the simulation box is created. Thus the header only has the count of atoms, bonds, and so on. -Molecule files have a header followed by sections, but the section names are -different than those of a data file. There is no "Atoms" section and the -section format is independent of the atom style. Its information is split -across multiple sections, like "Coords", "Types", and "Charges". Note that -no "Masses" section is needed here. The atom masses are by default tied to -the atom type and set with a data file or the :doc:`mass command `. A -"Masses" section would only be required for atom styles with per-atom -masses, e.g. atom style sphere. +Molecule files have a header followed by sections, but the section names +are different than those of a data file. There is no "Atoms" section +and the section format is independent of the atom style. Its +information is split across multiple sections, like "Coords", "Types", +and "Charges". Note that no "Masses" section is needed here. The atom +masses are by default tied to the atom type and set with a data file or +the :doc:`mass command `. A "Masses" section would only be +required for atom styles with per-atom masses, e.g. atom style sphere. 
Since the entire file is a 'molecule', LAMMPS will assign a new molecule-ID (if supported by the atom style) when atoms are instantiated @@ -368,32 +368,31 @@ molecule is created. 1 1 2 1 3 -There are also optional sections, e.g. about :doc:`SHAKE ` and -:doc:`special bonds `. Those sections are only needed if the molecule -command is issued *before* the simulation box is defined. Otherwise, the -molecule command can derive the required settings internally. - -Native Dump file -^^^^^^^^^^^^^^^^ +There are also optional sections, e.g. about :doc:`SHAKE ` +and :doc:`special bonds `. Those sections are only needed +if the molecule command is issued *before* the simulation box is +defined. Otherwise, the molecule command can derive the required +settings internally. Restart file ^^^^^^^^^^^^ LAMMPS restart files are binary files and not available in text format. They can be identified by the first few bytes that contain the (C-style) -string "LammpS RestartT" as `magic string -`_. This string is followed by a -16-bit integer of the number 1 used for detecting whether the computer -writing the restart has the same `endianness +string ``LammpS RestartT`` as `magic string +`_. This string is followed +by a 16-bit integer of the number 1 used for detecting whether the +computer writing the restart has the same `endianness `_ as the computer reading it. -If not, the file cannot be read correctly. This integer is followed by a 32-bit -integer indicating the file format revision (currently 3), which can be -used to implement backward compatibility for reading older revisions. +If not, the file cannot be read correctly. This integer is followed by +a 32-bit integer indicating the file format revision (currently 3), +which can be used to implement backward compatibility for reading older +revisions. This information has been added to the `Unix "file" command's ` "magic" file so that restart files can be identified without opening them. 
If you have a fairly recent -version, it should already be included. If you have an older version, +version, it should already be included. If you have an older version, the LAMMPS source package :ref:`contains a file with the necessary additions `. @@ -409,9 +408,8 @@ LAMMPS restart files are not expected to be portable between platforms or LAMMPS versions, but changes to the file format are rare. -.. Dump file -.. ^^^^^^^^^ +.. Native Dump file +.. ^^^^^^^^^^^^^^^^ .. .. Potential files .. ^^^^^^^^^^^^^^^ - From 7ff9ee51e58a45e4f503d0e54a01e227818cc090 Mon Sep 17 00:00:00 2001 From: Axel Kohlmeyer Date: Sat, 29 Mar 2025 15:56:34 -0400 Subject: [PATCH 13/13] small tweaks --- doc/src/Run_formats.rst | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/doc/src/Run_formats.rst b/doc/src/Run_formats.rst index ff74bf1c56..d03227f091 100644 --- a/doc/src/Run_formats.rst +++ b/doc/src/Run_formats.rst @@ -314,23 +314,25 @@ there is no information about the number of types or the box dimensions. These parameters are set when the simulation box is created. Thus the header only has the count of atoms, bonds, and so on. -Molecule files have a header followed by sections, but the section names -are different than those of a data file. There is no "Atoms" section -and the section format is independent of the atom style. Its -information is split across multiple sections, like "Coords", "Types", -and "Charges". Note that no "Masses" section is needed here. The atom -masses are by default tied to the atom type and set with a data file or -the :doc:`mass command `. A "Masses" section would only be -required for atom styles with per-atom masses, e.g. atom style sphere. +Molecule files have a header followed by sections (just as in data +files), but the section names are different than those of a data file. +There is no "Atoms" section and the section formats in molecule files are +independent of the atom style.
Its information is split across multiple +sections, like "Coords", "Types", and "Charges". Note that no "Masses" +section is needed here. The atom masses are by default tied to the atom +type and set with a data file or the :doc:`mass command `. A +"Masses" section would only be required for atom styles with per-atom +masses, e.g. atom style sphere, where in data files you would provide +the density and the diameter instead of the mass. Since the entire file is a 'molecule', LAMMPS will assign a new molecule-ID (if supported by the atom style) when atoms are instantiated from a molecule file, e.g. with the :doc:`create_atoms command -`. It is possible to include a "Molecules" section in -case the atoms belong to multiple 'molecules'. Atom-IDs and -molecule-IDs in the molecule file are relative for the file (starting -from 1) and will be translated into actual atom-IDs also when the -molecule is created. +`. It is possible to include a "Molecules" section to +indicate that the atoms belong to multiple 'molecules'. Atom-IDs and +molecule-IDs in the molecule file are relative to the file +(i.e. starting from 1) and will be translated into actual atom-IDs +when the atoms from the molecule are created. .. code-block:: bash @@ -407,7 +409,6 @@ are in the ``lmprestart.h`` header file. LAMMPS restart files are not expected to be portable between platforms or LAMMPS versions, but changes to the file format are rare. - .. Native Dump file .. ^^^^^^^^^^^^^^^^ ..
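The restart file section touched by this series says that restart files can be identified by their first bytes, which hold the C-style string ``LammpS RestartT``. A minimal sketch of such a check in Python follows. It is not LAMMPS code: the helper name ``looks_like_lammps_restart`` is hypothetical, and reading "C-style" as "followed by one terminating NUL byte" is an assumption. The sketch deliberately does not decode the endianness marker or the format revision that follow the magic string; their exact layout should be taken from ``lmprestart.h`` and the LAMMPS source rather than from this example.

```python
# Magic string quoted in the documentation text. Assumption: "C-style" means
# the 15 visible characters are followed by a single terminating NUL byte.
MAGIC = b"LammpS RestartT\x00"


def looks_like_lammps_restart(path):
    """Return True if the file at `path` starts with the restart magic string.

    Hypothetical helper for illustration only; it inspects just the leading
    bytes and does not validate the rest of the file.
    """
    with open(path, "rb") as fh:
        head = fh.read(len(MAGIC))
    # A file shorter than the magic string yields fewer bytes and fails here.
    return head == MAGIC
```

Calling ``looks_like_lammps_restart("some_file.restart")`` (the file name is only a placeholder) returns ``True`` or ``False`` without reading more than the first 16 bytes, which mirrors what the ``file`` command does with its "magic" database.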