Updated README and cleaned up

This commit is contained in:
Trung Nguyen
2024-07-21 10:06:37 -05:00
parent e8f09bfb02
commit f23835932c
2 changed files with 345 additions and 377 deletions

View File

@ -1,20 +1,36 @@
The script `run_tests.py` in this folder is used to perform regression tests
using in-place example scripts.
This single script launches the selected LAMMPS binary
using a testing configuration defined in a `.yaml` file (e.g., `config.yaml`)
for the set of input scripts inside the given `examples/` subfolders,
and compares the thermo output with that in the existing log file produced with the same number of procs.
If there are multiple log files for the same input script (e.g., `log.melt.*.g++.1` and `log.melt.*.g++.4`),
the one with the highest number of procs is chosen.
The output includes the number of passed and failed tests and
an `output.xml` file in the JUnit XML format for downstream reporting.
The output and error of any crashed runs are logged.
A test with an input script is considered passed when the given LAMMPS binary produces
thermo output quantities consistent with those in the reference log file
within the tolerances specified in the test configuration `.yaml` file.
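As a sketch of what "consistent within tolerances" can mean (the helper name and the exact rule below are illustrative, not the script's actual code; the script reads its `epsilon` and `nugget` values from the config file):

```python
def within_tolerance(value, ref, abs_tol=1e-10, rel_tol=1e-6, nugget=1.0):
    # Illustrative pass/fail rule: a thermo quantity passes if it agrees
    # with the reference within an absolute OR a relative tolerance.
    # 'nugget' stands in for near-zero references to avoid division by zero.
    diff = abs(value - ref)
    if diff <= abs_tol:
        return True
    denom = abs(ref) if ref != 0 else nugget
    return diff / denom <= rel_tol
```

A test then passes only if every compared thermo column in every run satisfies such a check.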
With the current features, users can:
+ specify which LAMMPS binary version to test (e.g., the version from a commit, or those from `lammps-testing`)
+ specify the examples subfolders (thus the reference log files) separately (e.g., from other LAMMPS versions or commits)
+ specify tolerances for individual quantities for any input script to override the global values
+ launch tests with `mpirun` with all supported command line features (multiple procs, multiple partitions, and suffixes)
+ skip certain input files if not interested, or no reference log file exists
+ simplify the main LAMMPS builds, as long as a LAMMPS binary is available
Limitations:
- input scripts that use thermo style multi (e.g., examples/peptide) do not produce the expected thermo output format
- input scripts that require partition runs (e.g., examples/neb) need a separate config file, e.g., "args: --partition 2x1"
- testing accelerator packages (GPU, INTEL, KOKKOS, OPENMP) needs separate config files, e.g., "args: -sf omp -pk omp 4"
TODO:
+ keep track of the testing progress to resume the testing from the last checkpoint
@ -22,6 +38,14 @@ TODO:
split the list of input scripts into separate runs (there are 800+ input scripts under the top-level examples)
+ be able to be invoked from run_tests in the lammps-testing infrastructure
The following Python packages need to be installed into an activated environment:
python3 -m venv testing-env
source testing-env/bin/activate
pip install numpy pyyaml junit_xml
Example uses:
1. Simple use with the provided `tools/regression-tests/config.yaml` and the `examples/` folder at the top level:
@ -38,6 +62,10 @@ Example uses:
--example-folders="/path/to/examples/folder1;/path/to/examples/folder2" \
--config-file=/path/to/config/file/config.yaml
4) Test a LAMMPS binary with the whole top-level /examples folder in a LAMMPS source tree
python3 run_tests.py --lmp-bin=/path/to/lmp_binary --example-top-level=/path/to/lammps/examples
An example of the test configuration `config.yaml` is given below.
---

View File

@ -1,21 +1,35 @@
#!/usr/bin/env python3
'''
UPDATE: July 21, 2024:
pip install numpy pyyaml junit_xml
UPDATE: July 5, 2024:
Launching the LAMMPS binary under testing using a configuration defined in a yaml file (e.g. config.yaml).
Comparing the output thermo with that in the existing log file (with the same nprocs)
+ data in the log files are extracted and converted into yaml data structure
+ using the in-place input scripts, no need to add REG markers to the input scripts
With the current features, users can:
+ specify which LAMMPS binary version to test (e.g., the version from a commit, or those from `lammps-testing`)
+ specify the examples subfolders (thus the reference log files) separately (e.g., from other LAMMPS versions or commits)
+ specify tolerances for individual quantities for any input script to override the global values
+ launch tests with `mpirun` with all supported command line features (multiple procs, multiple partitions, and suffixes)
+ skip certain input files if not interested, or no reference log file exists
+ simplify the main LAMMPS builds, as long as a LAMMPS binary is available
Limitations:
- input scripts that use thermo style multi (e.g., examples/peptide) do not produce the expected thermo output format
- input scripts that require partition runs (e.g., examples/neb) need a separate config file, e.g., "args: --partition 2x1"
- testing accelerator packages (GPU, INTEL, KOKKOS, OPENMP) needs separate config files, e.g., "args: -sf omp -pk omp 4"
TODO:
+ keep track of the testing progress to resume the testing from the last checkpoint
+ distribute the input list across multiple processes via multiprocessing, or
split the list of input scripts into separate runs (there are 800+ input scripts under the top-level examples)
+ be able to be invoked from run_tests in the lammps-testing infrastructure
The following Python packages need to be installed into an activated environment:
python3 -m venv testing-env
source testing-env/bin/activate
pip install numpy pyyaml junit_xml
Example usage:
1) Simple use (using the provided tools/regression-tests/config.yaml and the examples/ folder at the top level)
@ -26,6 +40,8 @@ UPDATE: July 5, 2024:
python3 run_tests.py --lmp-bin=/path/to/lmp_binary \
--example-folders="/path/to/examples/folder1;/path/to/examples/folder2" \
--config-file=/path/to/config/file/config.yaml
4) Test a LAMMPS binary with the whole top-level /examples folder in a LAMMPS source tree
python3 run_tests.py --lmp-bin=/path/to/lmp_binary --example-top-level=/path/to/lammps/examples
'''
import os
@ -39,8 +55,8 @@ from multiprocessing import Pool
import logging
# need "pip install numpy pyyaml"
import numpy as np
import yaml
# need "pip install junit_xml"
from junit_xml import TestSuite, TestCase
@ -60,11 +76,13 @@ class TestResult:
'''
get the thermo output from a log file with thermo style yaml
yamlFileName: input YAML file with thermo structured
as described in https://docs.lammps.org/Howto_structured_data.html
return: thermo, which is a list containing a dictionary for each run
where the tag "keywords" maps to the list of thermo header strings
and the tag data has a list of lists where the outer list represents the lines
of output and the inner list the values of the columns matching the header keywords for that step.
'''
def extract_thermo(yamlFileName):
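Since a thermo-style-yaml log is multi-document YAML (one document per run, per the Howto page linked above), the extraction can be sketched as follows; `extract_thermo_sketch` is an illustrative stand-in, not the script's exact body:

```python
import yaml

def extract_thermo_sketch(yamlFileName):
    # Each run is one YAML document with a "keywords" list (thermo headers)
    # and a "data" list of rows, one row per thermo output line.
    with open(yamlFileName) as f:
        return list(yaml.safe_load_all(f))
```

`thermo[i]['keywords']` then gives the header strings for run `i`, and `thermo[i]['data']` the per-step rows.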
@ -78,7 +96,7 @@ def extract_thermo(yamlFileName):
'''
Convert an existing log file into a thermo yaml style log
inputFileName = a provided log file in an examples folder (e.g. examples/melt/log.8Apr21.melt.g++.4)
return a YAML data structure as if loaded from a thermo yaml file
'''
@ -200,8 +218,9 @@ def divide_into_N(original_list, N):
b.append(l)
return b
'''
process the #REG markers in an input script, add/replace with what follows each marker
inputFileName: LAMMPS input file with comments #REG:ADD and #REG:SUB as markers
outputFileName: modified input file ready for testing
'''
@ -271,11 +290,13 @@ def has_markers(inputFileName):
Iterate over a list of input files using the given lmp_binary, the testing configuration
return test results, as a list of TestResult instances
To map a function to individual workers:
def func(input1, input2, output):
    # do something
    return result
# args is a list of num_workers tuples, each tuple contains the arguments passed to the function executed by a worker
args = []
for i in range(num_workers):
    args.append((input1, input2, output))
@ -284,7 +305,7 @@ def has_markers(inputFileName):
results = pool.starmap(func, args)
'''
def iterate(lmp_binary, input_list, config, results, removeAnnotatedInput=False, output=None):
EPSILON = np.float64(config['epsilon'])
nugget = float(config['nugget'])
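The worker-mapping pattern outlined in the docstring above, as a runnable sketch (the worker body here is a placeholder, not the script's actual per-chunk test runner):

```python
from multiprocessing import Pool

def func(input1, input2):
    # placeholder for a worker that would process one chunk of input scripts
    return input1 + input2

if __name__ == "__main__":
    num_workers = 4
    # one tuple of arguments per worker
    args = [(i, 10 * i) for i in range(num_workers)]
    with Pool(num_workers) as pool:
        # starmap unpacks each tuple into func's arguments;
        # results come back in the same order as args
        results = pool.starmap(func, args)
    print(results)
```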
@ -292,7 +313,7 @@ def iterate(lmp_binary, input_list, config, results, removeAnnotatedInput=False)
num_passed = 0
test_id = 0
# using REG-commented input scripts, now turned off (False)
using_markers = False
# iterate over the input scripts
@ -330,6 +351,7 @@ def iterate(lmp_binary, input_list, config, results, removeAnnotatedInput=False)
str_t = "\nRunning " + input_test + f" ({test_id+1}/{num_tests})"
else:
input_test = input
print(str_t)
print(f"-"*len(str_t))
logger.info(str_t)
@ -368,7 +390,7 @@ def iterate(lmp_binary, input_list, config, results, removeAnnotatedInput=False)
thermo_ref = extract_data_to_yaml(thermo_ref_file)
num_runs_ref = len(thermo_ref)
else:
logger.info(f"Cannot find a reference log file {thermo_ref_file} for {input_test}.")
# try to read in the thermo yaml output from the working directory
thermo_ref_file = 'thermo.' + input + '.yaml'
file_exist = os.path.isfile(thermo_ref_file)
@ -382,10 +404,6 @@ def iterate(lmp_binary, input_list, config, results, removeAnnotatedInput=False)
test_id = test_id + 1
continue
# or more customizable with config.yaml
cmd_str, output, error, returncode = execute(lmp_binary, config, input_test)
@ -400,12 +418,11 @@ def iterate(lmp_binary, input_list, config, results, removeAnnotatedInput=False)
test_id = test_id + 1
continue
# process thermo output from the run
thermo = extract_data_to_yaml("log.lammps")
num_runs = len(thermo)
if num_runs == 0:
logger.info(f"The run terminated with {input_test} gives the following output:\n") logger.info(f"The run terminated with {input_test} gives the following output:\n")
logger.info(f"\n{output}")
if "Unrecognized" in output:
@ -534,7 +551,8 @@ def iterate(lmp_binary, input_list, config, results, removeAnnotatedInput=False)
'''
TODO:
- automate annotating the example input scripts if thermo style is multi (e.g. examples/peptide)
'''
if __name__ == "__main__":
@ -627,24 +645,15 @@ if __name__ == "__main__":
p = subprocess.run(cmd_str, shell=True, text=True, capture_output=True)
input_list = p.stdout.split('\n')
input_list.remove("")
# find out which folder to cd into to run the input script
for input in input_list:
folder = input.rsplit('/', 1)[0]
folder_list.append(folder)
print(f"There are {len(input_list)} input scripts in total under the {example_toplevel} folder.")
# divide the list of input scripts into num_workers chunks
sublists = divide_into_N(input_list, num_workers)
# if only statistics, not running anything
if dry_run == True:
@ -655,10 +664,12 @@ if __name__ == "__main__":
test_cases = []
# if the example folders are not specified from the command-line argument --example-folders
# then use the path from --example-top-folder
if len(example_subfolders) == 0:
# get the input file list, for now the first in the sublist
# TODO: generate a list of tuples, each tuple contains a folder list for a worker,
# then use multiprocessing.Pool starmap()
folder_list = []
for input in sublists[0]:
folder = input.rsplit('/', 1)[0]
@ -668,82 +679,9 @@ if __name__ == "__main__":
example_subfolders = folder_list
all_results = []
# default setting is to use inplace_input
if inplace_input == True:
# save current working dir
p = subprocess.run("pwd", shell=True, text=True, capture_output=True)
@ -754,7 +692,6 @@ if __name__ == "__main__":
# change dir to a folder under examples/, need to use os.chdir()
# TODO: loop through the subfolders under examples/, depending on the installed packages
'''
args = []
for i in range(num_workers):
@ -766,7 +703,6 @@ if __name__ == "__main__":
total_tests = 0
passed_tests = 0
for directory in example_subfolders:
p = subprocess.run("pwd", shell=True, text=True, capture_output=True)
@ -786,6 +722,7 @@ if __name__ == "__main__":
num_passed = iterate(lmp_binary, input_list, config, results)
passed_tests += num_passed
# append the results to the all_results list
all_results.extend(results)
# get back to the working dir
@ -798,11 +735,14 @@ if __name__ == "__main__":
results = []
passed_tests = iterate(lmp_binary, input_list, config, results)
all_results.extend(results)
# print out summary
print("Summary:")
print(f" - {passed_tests} numerical tests passed / {total_tests} tests")
print(f" - Details are given in {output_file}.")
# optional: need to check if the junit_xml package is already installed in the env
# generate a JUnit XML file
with open(output_file, 'w') as f:
test_cases = []