add not input parsing section about handling of UFT-8 characters

This commit is contained in:
Axel Kohlmeyer
2021-01-25 21:44:02 -05:00
parent a119164496
commit afc58659db

View File

@ -162,3 +162,26 @@ LAMMPS:
triple quotes can be nested in the usual manner. See the doc pages
for those commands for examples. Only one of level of nesting is
allowed, but that should be sufficient for most use cases.
.. admonition:: ASCII versus UTF-8
:class: note
LAMMPS expects and processes 7-bit ASCII format text internally.
Many modern environments use UTF-8 encoding, which is a superset
of the 7-bit ASCII character table and thus mostly compatible.
However, there are several non-ASCII characters that can look
very similar to their ASCII equivalents or are invisible (so they
look like a blank), but are encoded differently. Web browsers,
PDF viewers, document editors are known to sometimes replace one
with the other for a better looking output. However, that can
lead to problems, for instance, when using cut-n-paste of input
file examples from web pages, or when using a document editor
(not a dedicated plain text editor) for writing LAMMPS inputs.
LAMMPS will try to detect this and substitute the non-ASCII
characters with their ASCII equivalents where known. There also
is going to be a warning printed, if this occurs. It is
recommended to avoid such characters altogether in LAMMPS input,
data and potential files. The replacement tables are likely
incomplete and dependent on users reporting problems processing
correctly looking input containing UTF-8 encoded non-ASCII
characters.