From afc58659db0ed6c62e1825ff1c20c6962569129e Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer
Date: Mon, 25 Jan 2021 21:44:02 -0500
Subject: [PATCH] add note to input parsing section about handling of UTF-8 characters

---
 doc/src/Commands_parse.rst | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/doc/src/Commands_parse.rst b/doc/src/Commands_parse.rst
index 37283823d7..64d5100715 100644
--- a/doc/src/Commands_parse.rst
+++ b/doc/src/Commands_parse.rst
@@ -162,3 +162,26 @@ LAMMPS:
    triple quotes can be nested in the usual manner.  See the doc pages
    for those commands for examples.  Only one of level of nesting is
    allowed, but that should be sufficient for most use cases.
+
+.. admonition:: ASCII versus UTF-8
+   :class: note
+
+   LAMMPS expects and processes 7-bit ASCII format text internally.
+   Many modern environments use UTF-8 encoding, which is a superset
+   of the 7-bit ASCII character table and thus mostly compatible.
+   However, there are several non-ASCII characters that can look
+   very similar to their ASCII equivalents or are invisible (so they
+   look like a blank), but are encoded differently.  Web browsers,
+   PDF viewers, and document editors are known to sometimes replace
+   one with the other for better-looking output.  This can lead to
+   problems, for instance, when copying and pasting input file
+   examples from web pages, or when using a document editor (not
+   a dedicated plain text editor) to write LAMMPS inputs.
+   LAMMPS will try to detect this and substitute the non-ASCII
+   characters with their ASCII equivalents where known.  A warning
+   is also printed when this occurs.  It is recommended to avoid
+   such characters altogether in LAMMPS input, data, and potential
+   files.  The replacement tables are likely incomplete and
+   depend on users reporting problems when processing input
+   that looks correct but contains UTF-8 encoded non-ASCII
+   characters.
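
The substitution described in the note can be illustrated with a short Python sketch. It is not the LAMMPS implementation: the table `REPLACEMENTS` and the function `to_ascii` are hypothetical names chosen for this example, the table holds only a few common look-alike characters, and the real replacement tables may differ. Characters without a known equivalent are left untouched here, which is also an assumption.

```python
import warnings

# Hypothetical replacement table (illustration only, not the LAMMPS table).
REPLACEMENTS = {
    "\u00a0": " ",   # no-break space, looks like a blank
    "\u2009": " ",   # thin space
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2010": "-",   # hyphen
    "\u2013": "-",   # en dash
    "\u2212": "-",   # minus sign
}


def to_ascii(line: str) -> str:
    """Replace known non-ASCII look-alikes and warn if any were found."""
    out = []
    replaced = False
    for ch in line:
        if ord(ch) >= 128 and ch in REPLACEMENTS:
            out.append(REPLACEMENTS[ch])
            replaced = True
        else:
            # ASCII characters and unknown non-ASCII characters pass through.
            out.append(ch)
    if replaced:
        warnings.warn("replaced non-ASCII characters with ASCII equivalents")
    return "".join(out)
```

For example, a line pasted from a web page with typographic quotes, such as `print \u201chello\u201d`, would come back as `print "hello"` together with a warning, while a pure ASCII line is returned unchanged and silently.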