diff --git a/doc/Section_accelerate.html b/doc/Section_accelerate.html index 4b4838ba70..b64c88ca40 100644 --- a/doc/Section_accelerate.html +++ b/doc/Section_accelerate.html @@ -190,55 +190,16 @@ from the GPU package, you can either append "gpu" to the style name switch, or use the suffix command.
-The fix gpu command controls the GPU selection and -initialization steps. +
The package gpu command must be used near the beginning +of your script to control the GPU selection and initialization steps. +It also enables asynchronous splitting of force computations between +the CPUs and GPUs.
-The format for the fix is: -
-fix fix-ID all gpu mode first last split --
where fix-ID is the name for the fix. The gpu fix must be the first -fix specified for a given run, otherwise LAMMPS will exit with an -error. The gpu fix does not have any effect on runs that do not use -GPU acceleration, so there should be no problem specifying the fix -first in any input script. -
-The mode setting can be either "force" or "force/neigh". In the -former, neighbor list calculation is performed on the CPU using the -standard LAMMPS routines. In the latter, the neighbor list calculation -is performed on the GPU. The GPU neighbor list can be used for better -performance, however, it cannot not be used with a triclinic box or -with hybrid pair styles. -
-There are cases when it may be more efficient to select the CPU for -neighbor list builds. If a non-GPU enabled style (e.g. a fix or -compute) requires a neighbor list, it will also be built using CPU -routines. Redundant CPU and GPU neighbor list calculations will -typically be less efficient. -
-The first setting is the ID (as reported by -lammps/lib/gpu/nvc_get_devices) of the first GPU that will be used on -each node. The last setting is the ID of the last GPU that will be -used on each node. If you have only one GPU per node, first and -last will typically both be 0. Selecting a non-sequential set of GPU -IDs (e.g. 0,1,3) is not currently supported. -
-The split setting is the fraction of particles whose forces, -torques, energies, and/or virials will be calculated on the GPU. This -can be used to perform CPU and GPU force calculations simultaneously, -e.g. on a hybrid node with a multicore CPU and a GPU(s). If split -is negative, the software will attempt to calculate the optimal -fraction automatically every 25 timesteps based on CPU and GPU -timings. Because the GPU speedups are dependent on the number of -particles, automatic calculation of the split can be less efficient, -but typically results in loop times within 20% of an optimal fixed -split. -
-As an example, if you have two GPUs per node, 8 CPU cores per node, +
As an example, if you have two GPUs per node and 8 CPU cores per node, and would like to run on 4 nodes (32 cores) with dynamic balancing of -force calculation across CPU and GPU cores, the fix might be +force calculation across CPU and GPU cores, you could specify
-fix 0 all gpu force/neigh 0 1 -1 +package gpu force/neigh 0 1 -1In this case, all CPU cores and GPU devices on the nodes would be utilized. Each GPU device would be shared by 4 CPU cores. The CPU @@ -246,39 +207,14 @@ cores would perform force calculations for some fraction of the particles at the same time the GPUs performed force calculation for the other particles.
-Asynchronous pair computation on GPU and CPU -
-The GPU accelerated pair styles can perform pair style force -calculation on the GPU at the same time other force calculations -within LAMMPS are being performed on the CPU. These include pair, -bond, angle, etc forces as well as long-range Coulombic forces. This -is enabled by the split setting in the gpu fix as described above. -
-With a split setting less than 1.0, a portion of the pair-wise force -calculations will also be performed on the CPU. When the CPU finishes -its pair style computations (if any), the next LAMMPS force -computation will begin (bond, angle, etc), possibly before the GPU has -finished its pair style computations. -
-This means that if split is set to 1.0, the GPU will begin the -LAMMPS force computation immediately. This can be used to run a -hybrid GPU pair style at the same time as a hybrid -CPU pair style. In this case, the GPU pair style should be first in -the hybrid command in order to perform simultaneous calculations. This -also allows bond, angle, -dihedral, improper, and -long-range force computations to run -simultaneously with the GPU pair style. If all CPU force computations -complete before the GPU, LAMMPS will block until the GPU has finished -before continuing the timestep. -
Timing output:
-As noted above, GPU accelerated pair styles can perform computations -asynchronously with CPU computations. The "Pair" time reported by -LAMMPS will be the maximum of the time required to complete the CPU -pair style computations and the time required to complete the GPU pair -style computations. Any time spent for GPU-enabled pair styles for +
As described by the package gpu command, GPU +accelerated pair styles can perform computations asynchronously with +CPU computations. The "Pair" time reported by LAMMPS will be the +maximum of the time required to complete the CPU pair style +computations and the time required to complete the GPU pair style +computations. Any time spent for GPU-enabled pair styles for computations that run simultaneously with bond, angle, dihedral, improper, and long-range diff --git a/doc/Section_accelerate.txt b/doc/Section_accelerate.txt index c67655f13e..b348fe207d 100644 --- a/doc/Section_accelerate.txt +++ b/doc/Section_accelerate.txt @@ -185,55 +185,16 @@ from the GPU package, you can either append "gpu" to the style name switch"_Section_start.html#2_6, or use the "suffix"_suffix.html command. -The "fix gpu"_fix_gpu.html command controls the GPU selection and -initialization steps. +The "package gpu"_package.html command must be used near the beginning +of your script to control the GPU selection and initialization steps. +It also enables asynchronous splitting of force computations between +the CPUs and GPUs. -The format for the fix is: - -fix fix-ID all gpu {mode} {first} {last} {split} :pre - -where fix-ID is the name for the fix. The gpu fix must be the first -fix specified for a given run, otherwise LAMMPS will exit with an -error. The gpu fix does not have any effect on runs that do not use -GPU acceleration, so there should be no problem specifying the fix -first in any input script. - -The {mode} setting can be either "force" or "force/neigh". In the -former, neighbor list calculation is performed on the CPU using the -standard LAMMPS routines. In the latter, the neighbor list calculation -is performed on the GPU. The GPU neighbor list can be used for better -performance, however, it cannot not be used with a triclinic box or -with "hybrid"_pair_hybrid.html pair styles. - -There are cases when it may be more efficient to select the CPU for -neighbor list builds. 
If a non-GPU enabled style (e.g. a fix or -compute) requires a neighbor list, it will also be built using CPU -routines. Redundant CPU and GPU neighbor list calculations will -typically be less efficient. - -The {first} setting is the ID (as reported by -lammps/lib/gpu/nvc_get_devices) of the first GPU that will be used on -each node. The {last} setting is the ID of the last GPU that will be -used on each node. If you have only one GPU per node, {first} and -{last} will typically both be 0. Selecting a non-sequential set of GPU -IDs (e.g. 0,1,3) is not currently supported. - -The {split} setting is the fraction of particles whose forces, -torques, energies, and/or virials will be calculated on the GPU. This -can be used to perform CPU and GPU force calculations simultaneously, -e.g. on a hybrid node with a multicore CPU and a GPU(s). If {split} -is negative, the software will attempt to calculate the optimal -fraction automatically every 25 timesteps based on CPU and GPU -timings. Because the GPU speedups are dependent on the number of -particles, automatic calculation of the split can be less efficient, -but typically results in loop times within 20% of an optimal fixed -split. - -As an example, if you have two GPUs per node, 8 CPU cores per node, +As an example, if you have two GPUs per node and 8 CPU cores per node, and would like to run on 4 nodes (32 cores) with dynamic balancing of -force calculation across CPU and GPU cores, the fix might be +force calculation across CPU and GPU cores, you could specify -fix 0 all gpu force/neigh 0 1 -1 :pre +package gpu force/neigh 0 1 -1 :pre In this case, all CPU cores and GPU devices on the nodes would be utilized. Each GPU device would be shared by 4 CPU cores. The CPU @@ -241,39 +202,14 @@ cores would perform force calculations for some fraction of the particles at the same time the GPUs performed force calculation for the other particles. 
-[Asynchronous pair computation on GPU and CPU] - -The GPU accelerated pair styles can perform pair style force -calculation on the GPU at the same time other force calculations -within LAMMPS are being performed on the CPU. These include pair, -bond, angle, etc forces as well as long-range Coulombic forces. This -is enabled by the {split} setting in the gpu fix as described above. - -With a {split} setting less than 1.0, a portion of the pair-wise force -calculations will also be performed on the CPU. When the CPU finishes -its pair style computations (if any), the next LAMMPS force -computation will begin (bond, angle, etc), possibly before the GPU has -finished its pair style computations. - -This means that if {split} is set to 1.0, the GPU will begin the -LAMMPS force computation immediately. This can be used to run a -"hybrid"_pair_hybrid.html GPU pair style at the same time as a hybrid -CPU pair style. In this case, the GPU pair style should be first in -the hybrid command in order to perform simultaneous calculations. This -also allows "bond"_bond_style.html, "angle"_angle_style.html, -"dihedral"_dihedral_style.html, "improper"_improper_style.html, and -"long-range"_kspace_style.html force computations to run -simultaneously with the GPU pair style. If all CPU force computations -complete before the GPU, LAMMPS will block until the GPU has finished -before continuing the timestep. - [Timing output:] -As noted above, GPU accelerated pair styles can perform computations -asynchronously with CPU computations. The "Pair" time reported by -LAMMPS will be the maximum of the time required to complete the CPU -pair style computations and the time required to complete the GPU pair -style computations. Any time spent for GPU-enabled pair styles for +As described by the "package gpu"_package.html command, GPU +accelerated pair styles can perform computations asynchronously with +CPU computations. 
The "Pair" time reported by LAMMPS will be the +maximum of the time required to complete the CPU pair style +computations and the time required to complete the GPU pair style +computations. Any time spent for GPU-enabled pair styles for computations that run simultaneously with "bond"_bond_style.html, "angle"_angle_style.html, "dihedral"_dihedral_style.html, "improper"_improper_style.html, and "long-range"_kspace_style.html diff --git a/doc/Section_commands.html b/doc/Section_commands.html index 941b2a1de2..0618a5d250 100644 --- a/doc/Section_commands.html +++ b/doc/Section_commands.html @@ -338,15 +338,14 @@ of each style or click on the style itself for a full description:
These are fix styles contributed by users, which can be used if diff --git a/doc/Section_commands.txt b/doc/Section_commands.txt index 3635a753f5..f9b9b1a189 100644 --- a/doc/Section_commands.txt +++ b/doc/Section_commands.txt @@ -418,7 +418,6 @@ of each style or click on the style itself for a full description: "evaporate"_fix_evaporate.html, "external"_fix_external.html, "freeze"_fix_freeze.html, -"gpu"_fix_gpu.html, "gravity"_fix_gravity.html, "heat"_fix_heat.html, "indent"_fix_indent.html, diff --git a/doc/fix_gpu.html b/doc/fix_gpu.html deleted file mode 100644 index d48e510798..0000000000 --- a/doc/fix_gpu.html +++ /dev/null @@ -1,112 +0,0 @@ - -
LAMMPS WWW Site - LAMMPS Documentation - LAMMPS Commands - - - - - - - -
- -fix gpu command -
-Syntax: -
-fix ID group-ID gpu mode first last split --
Examples: -
-fix 0 all gpu force 0 0 1.0 -fix 0 all gpu force 0 0 0.75 -fix 0 all gpu force/neigh 0 0 1.0 -fix 0 all gpu force/neigh 0 1 -1.0 --
Description: -
-Select and initialize GPUs to be used for acceleration and configure -GPU acceleration in LAMMPS. This fix is required in order to use -any style with GPU acceleration. The fix must be the first fix -specified for a run or an error will be generated. The fix will not have an -effect on any LAMMPS computations that do not use GPU acceleration, so there -should not be any problems with specifying this fix first in input scripts. -
-The mode setting specifies where neighbor list calculations will be -performed. If mode is force, neighbor list calculation is performed -on the CPU. If mode is force/neigh, neighbor list calculation is -performed on the GPU. GPU neighbor list calculation currently cannot -be used with a triclinic box. GPU neighbor list calculation currently -cannot be used with hybrid pair styles. GPU -neighbor lists are not compatible with styles that are not -GPU-enabled. When a non-GPU enabled style requires a neighbor list, -it will also be built using CPU routines. In these cases, it will -typically be more efficient to only use CPU neighbor list builds. -
-The first and last settings specify the GPUs that will be used for -simulation. On each node, the GPU IDs in the inclusive range from -first to last will be used. -
-The split setting can be used for load balancing force calculation -work between CPU and GPU cores in GPU-enabled pair styles. If -0<split<1.0, a fixed fraction of particles is offloaded to the GPU -while force calculation for the other particles occurs simulataneously -on the CPU. If split<0, the optimal fraction (based on CPU and GPU -timings) is calculated every 25 timesteps. If split=1.0, all force -calculations for GPU accelerated pair styles are performed on the -GPU. In this case, hybrid, bond, -angle, dihedral, -improper, and long-range -calculations can be performed on the CPU while the GPU is performing -force calculations for the GPU-enabled pair style. -
-In order to use GPU acceleration, a GPU enabled style must be selected -in the input script in addition to this fix. Currently, this is -limited to a few pair styles and the PPPM kspace -style. -
-See this section of the manual for more -details about using the GPU package. -
-Restart, fix_modify, output, run start/stop, minimize info: -
-This fix is part of the "gpu" package. It is only enabled if LAMMPS -was built with that package. See the Making -LAMMPS section for more info. -
-No information about this fix is written to binary restart -files. None of the fix_modify options -are relevant to this fix. -
-No parameter of this fix can be used with the start/stop keywords of -the run command. -
-Restrictions: -
-The fix must be the first fix specified for a given run. The -force/neigh mode should not be used with a triclinic box or -hybrid pair styles. -
-The split setting must be positive when using -hybrid pair styles. -
-Currently, group-ID must be all. -
-Related commands: none -
-Default: none -
- diff --git a/doc/fix_gpu.txt b/doc/fix_gpu.txt deleted file mode 100644 index 6abf729e74..0000000000 --- a/doc/fix_gpu.txt +++ /dev/null @@ -1,102 +0,0 @@ -"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c - -:link(lws,http://lammps.sandia.gov) -:link(ld,Manual.html) -:link(lc,Section_commands.html#comm) - -:line - -fix gpu command :h3 - -[Syntax:] - -fix ID group-ID gpu mode first last split :pre - -ID, group-ID are documented in "fix"_fix.html command :ulb,l -gpu = style name of this fix command :l -mode = force or force/neigh :l -first = ID of first GPU to be used on each node :l -last = ID of last GPU to be used on each node :l -split = fraction of particles assigned to the GPU :l -:ule - -[Examples:] - -fix 0 all gpu force 0 0 1.0 -fix 0 all gpu force 0 0 0.75 -fix 0 all gpu force/neigh 0 0 1.0 -fix 0 all gpu force/neigh 0 1 -1.0 :pre - -[Description:] - -Select and initialize GPUs to be used for acceleration and configure -GPU acceleration in LAMMPS. This fix is required in order to use -any style with GPU acceleration. The fix must be the first fix -specified for a run or an error will be generated. The fix will not have an -effect on any LAMMPS computations that do not use GPU acceleration, so there -should not be any problems with specifying this fix first in input scripts. - -The {mode} setting specifies where neighbor list calculations will be -performed. If {mode} is force, neighbor list calculation is performed -on the CPU. If {mode} is force/neigh, neighbor list calculation is -performed on the GPU. GPU neighbor list calculation currently cannot -be used with a triclinic box. GPU neighbor list calculation currently -cannot be used with "hybrid"_pair_hybrid.html pair styles. GPU -neighbor lists are not compatible with styles that are not -GPU-enabled. When a non-GPU enabled style requires a neighbor list, -it will also be built using CPU routines. 
In these cases, it will -typically be more efficient to only use CPU neighbor list builds. - -The {first} and {last} settings specify the GPUs that will be used for -simulation. On each node, the GPU IDs in the inclusive range from -{first} to {last} will be used. - -The {split} setting can be used for load balancing force calculation -work between CPU and GPU cores in GPU-enabled pair styles. If -0<{split}<1.0, a fixed fraction of particles is offloaded to the GPU -while force calculation for the other particles occurs simulataneously -on the CPU. If {split}<0, the optimal fraction (based on CPU and GPU -timings) is calculated every 25 timesteps. If {split}=1.0, all force -calculations for GPU accelerated pair styles are performed on the -GPU. In this case, "hybrid"_pair_hybrid.html, "bond"_bond_style.html, -"angle"_angle_style.html, "dihedral"_dihedral_style.html, -"improper"_improper_style.html, and "long-range"_kspace_style.html -calculations can be performed on the CPU while the GPU is performing -force calculations for the GPU-enabled pair style. - -In order to use GPU acceleration, a GPU enabled style must be selected -in the input script in addition to this fix. Currently, this is -limited to a few "pair styles"_pair_style.html and the PPPM "kspace -style"_kspace_style.html. - -See "this section"_doc/Section_accerate.html of the manual for more -details about using the GPU package. - -[Restart, fix_modify, output, run start/stop, minimize info:] - -This fix is part of the "gpu" package. It is only enabled if LAMMPS -was built with that package. See the "Making -LAMMPS"_Section_start.html#2_3 section for more info. - -No information about this fix is written to "binary restart -files"_restart.html. None of the "fix_modify"_fix_modify.html options -are relevant to this fix. - -No parameter of this fix can be used with the {start/stop} keywords of -the "run"_run.html command. - -[Restrictions:] - -The fix must be the first fix specified for a given run. 
The -force/neigh {mode} should not be used with a triclinic box or -"hybrid"_pair_hybrid.html pair styles. - -The {split} setting must be positive when using -"hybrid"_pair_hybrid.html pair styles. - -Currently, group-ID must be all. - -[Related commands:] none - -[Default:] none - diff --git a/doc/package.html b/doc/package.html index 814340bc81..c1b5b0bebf 100644 --- a/doc/package.html +++ b/doc/package.html @@ -15,39 +15,136 @@package style args-
cuda args = to be determined +
cuda args = to be determined + omp args = Nthreads ++
Nthreads = # of OpenMP threads to associate with each MPI process
Examples:
-package cuda blah +package gpu force 0 0 1.0 +package gpu force 0 0 0.75 +package gpu force/neigh 0 0 1.0 +package gpu force/neigh 0 1 -1.0 +package cuda blah +package omp 4Description:
-This command invokes package-specific settings. Currently only the -USER-CUDA package uses it. +
This command invokes package-specific settings. Currently the +following packages use it: GPU, USER-CUDA, and USER-OMP.
+See this section of the manual for more +details about using these various packages for accelerating +a LAMMPS calculation. +
+
+ +The gpu style invokes options associated with the use of the GPU +package. It allows you to select and initialize GPUs to be used for +acceleration via this package and configure how the GPU acceleration +is performed. These settings are required in order to use any style +with GPU acceleration. +
+The mode setting specifies where neighbor list calculations will be +performed. If mode is force, neighbor list calculation is performed +on the CPU. If mode is force/neigh, neighbor list calculation is +performed on the GPU. GPU neighbor list calculation currently cannot +be used with a triclinic box. GPU neighbor list calculation currently +cannot be used with hybrid pair styles. GPU +neighbor lists are not compatible with styles that are not +GPU-enabled. When a non-GPU enabled style requires a neighbor list, +it will also be built using CPU routines. In these cases, it will +typically be more efficient to only use CPU neighbor list builds. +
+The first and last settings specify the GPUs that will be used for +simulation. On each node, the GPU IDs in the inclusive range from +first to last will be used. +
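As a hypothetical illustration of the first/last range (this is not the GPU package's actual assignment code; the function name and round-robin mapping are assumptions for illustration only), the MPI ranks on a node could share the selected GPUs like this:

```python
# Hypothetical sketch, NOT the GPU package's real logic: one way the
# MPI ranks on a node could share the GPUs in the inclusive range
# first..last.
def assign_gpus(ncores, first, last):
    """Map each of ncores MPI ranks on a node to a GPU ID in [first, last]."""
    gpus = list(range(first, last + 1))
    return [gpus[rank % len(gpus)] for rank in range(ncores)]

# 8 CPU cores per node sharing GPUs 0 and 1: each GPU serves 4 ranks
print(assign_gpus(8, 0, 1))
```

With one GPU per node (first = last = 0), every rank maps to device 0, matching the "both 0" case described above.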
+The split setting can be used for load balancing force calculation
+work between CPU and GPU cores in GPU-enabled pair styles. If 0 <
+split < 1.0, a fixed fraction of particles is offloaded to the GPU
+while force calculation for the other particles occurs simultaneously
+on the CPU. If split < 0, the optimal fraction (based on CPU and GPU
+timings) is calculated every 25 timesteps. If split = 1.0, all force
+calculations for GPU accelerated pair styles are performed on the
+GPU. In this case, hybrid, bond,
+angle, dihedral,
+improper, and long-range
+calculations can be performed on the CPU while the GPU is performing
+force calculations for the GPU-enabled pair style. If all CPU force
+computations complete before the GPU, LAMMPS will block until the GPU
+has finished before continuing the timestep.
+
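The effect of a fixed split can be sketched as simple arithmetic (an illustration only, not LAMMPS internals; the function name is invented for this sketch):

```python
# Illustrative sketch, not LAMMPS code: a fixed "split" value divides
# the per-timestep force work between the GPU and the CPU cores.
def partition(nparticles, split):
    """Return (ngpu, ncpu) particle counts for a fixed 0 < split <= 1."""
    ngpu = int(nparticles * split)  # fraction offloaded to the GPU
    return ngpu, nparticles - ngpu  # remainder computed on the CPU

print(partition(32000, 0.75))
```

A negative split replaces this fixed fraction with one re-estimated from CPU/GPU timings every 25 timesteps, as described above.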
+As an example, if you have two GPUs per node and 8 CPU cores per node, +and would like to run on 4 nodes (32 cores) with dynamic balancing of +force calculation across CPU and GPU cores, you could specify +
+package gpu force/neigh 0 1 -1 ++In this case, all CPU cores and GPU devices on the nodes would be +utilized. Each GPU device would be shared by 4 CPU cores. The CPU +cores would perform force calculations for some fraction of the +particles at the same time the GPUs performed force calculation for +the other particles. +
+
+The cuda style invokes options associated with the use of the -USER-CUDA package. These will be described when the USER-CUDA package -is released with LAMMPS. +USER-CUDA package. These need to be documented.
+
+ +The omp style invokes options associated with the use of the +USER-OMP package. +
+The only setting to make is the number of OpenMP threads to be
+allocated for each MPI process. For example, if your system has nodes
+with dual quad-core processors, it has a total of 8 cores per node.
+You could run MPI on 2 cores on each node (e.g. using options for the
+mpirun command), and set the Nthreads setting to 4. This would
+effectively use all 8 cores on each node, since each MPI process
+spawns 4 threads (one of which runs as part of the MPI process
+itself).
+
+For performance reasons, you should not set Nthreads to more threads +than there are physical cores, but LAMMPS does not check for this. +
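The sizing arithmetic in the example above can be checked directly (the node layout is the assumed dual quad-core case from the text):

```python
# Sketch of the sizing arithmetic described above; the node layout
# (dual quad-core = 8 physical cores) is the example's assumption.
physical_cores = 8
mpi_per_node = 2   # MPI processes started per node, e.g. via mpirun options
nthreads = 4       # the Nthreads value given to "package omp"

# Each MPI process runs nthreads OpenMP threads (one being the MPI
# process itself), so the node runs mpi_per_node * nthreads threads.
busy_cores = mpi_per_node * nthreads
print(busy_cores)

# Stay within the physical core count to avoid oversubscription,
# which LAMMPS does not check for.
assert busy_cores <= physical_cores
```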
+
+Restrictions:
+This command cannot be used after the simulation box is defined by a +read_data or create_box command. +
The cuda style of this command can only be invoked if LAMMPS was built with the USER-CUDA package. See the Making LAMMPS section for more info.
-Obviously, you must have GPU hardware and associated software to build -and use LAMMPS with either the GPU or USER-CUDA packages. +
The gpu style of this command can only be invoked if LAMMPS was built +with the GPU package. See the Making LAMMPS +section for more info.
-Related commands: +
The omp style of this command can only be invoked if LAMMPS was built +with the USER-OMP package. See the Making +LAMMPS section for more info.
-fix gpu +
Related commands: none
Default: none
diff --git a/doc/package.txt b/doc/package.txt
index 0e53f6d23e..6ac414e843 100644
--- a/doc/package.txt
+++ b/doc/package.txt
@@ -12,35 +12,127 @@ package command :h3
package style args :pre
-style = {cuda} :ulb,l
-args = 0 or more args specific to the style :l
- {cuda} args = to be determined :pre
+style = {gpu} or {cuda} or {omp} :ulb,l
+args = arguments specific to the style :l
+ {gpu} args = mode first last split
+ mode = force or force/neigh :l
+ first = ID of first GPU to be used on each node :l
+ last = ID of last GPU to be used on each node :l
+ split = fraction of particles assigned to the GPU :l
+ {cuda} args = to be determined
+ {omp} args = Nthreads :pre
+ Nthreads = # of OpenMP threads to associate with each MPI process :pre
:ule
[Examples:]
-package cuda blah :pre
+package gpu force 0 0 1.0
+package gpu force 0 0 0.75
+package gpu force/neigh 0 0 1.0
+package gpu force/neigh 0 1 -1.0
+package cuda blah
+package omp 4 :pre
[Description:]
-This command invokes package-specific settings. Currently only the
-USER-CUDA package uses it.
+This command invokes package-specific settings. Currently the
+following packages use it: GPU, USER-CUDA, and USER-OMP.
+
+See "this section"_Section_accelerate.html of the manual for more
+details about using these various packages for accelerating
+a LAMMPS calculation.
+
+:line
+
+The {gpu} style invokes options associated with the use of the GPU
+package. It allows you to select and initialize GPUs to be used for
+acceleration via this package and configure how the GPU acceleration
+is performed. These settings are required in order to use any style
+with GPU acceleration.
+
+The {mode} setting specifies where neighbor list calculations will be
+performed. If {mode} is force, neighbor list calculation is performed
+on the CPU. If {mode} is force/neigh, neighbor list calculation is
+performed on the GPU. GPU neighbor list calculation currently cannot
+be used with a triclinic box. 
GPU neighbor list calculation currently
+cannot be used with "hybrid"_pair_hybrid.html pair styles. GPU
+neighbor lists are not compatible with styles that are not
+GPU-enabled. When a non-GPU enabled style requires a neighbor list,
+it will also be built using CPU routines. In these cases, it will
+typically be more efficient to only use CPU neighbor list builds.
+
+The {first} and {last} settings specify the GPUs that will be used for
+simulation. On each node, the GPU IDs in the inclusive range from
+{first} to {last} will be used.
+
+The {split} setting can be used for load balancing force calculation
+work between CPU and GPU cores in GPU-enabled pair styles. If 0 <
+{split} < 1.0, a fixed fraction of particles is offloaded to the GPU
+while force calculation for the other particles occurs simultaneously
+on the CPU. If {split} < 0, the optimal fraction (based on CPU and GPU
+timings) is calculated every 25 timesteps. If {split} = 1.0, all force
+calculations for GPU accelerated pair styles are performed on the
+GPU. In this case, "hybrid"_pair_hybrid.html, "bond"_bond_style.html,
+"angle"_angle_style.html, "dihedral"_dihedral_style.html,
+"improper"_improper_style.html, and "long-range"_kspace_style.html
+calculations can be performed on the CPU while the GPU is performing
+force calculations for the GPU-enabled pair style. If all CPU force
+computations complete before the GPU, LAMMPS will block until the GPU
+has finished before continuing the timestep.
+
+As an example, if you have two GPUs per node and 8 CPU cores per node,
+and would like to run on 4 nodes (32 cores) with dynamic balancing of
+force calculation across CPU and GPU cores, you could specify
+
+package gpu force/neigh 0 1 -1 :pre
+
+In this case, all CPU cores and GPU devices on the nodes would be
+utilized. Each GPU device would be shared by 4 CPU cores. 
The CPU
+cores would perform force calculations for some fraction of the
+particles at the same time the GPUs performed force calculation for
+the other particles.
+
+:line
The {cuda} style invokes options associated with the use of the
-USER-CUDA package. These will be described when the USER-CUDA package
-is released with LAMMPS.
+USER-CUDA package. These need to be documented.
+
+:line
+
+The {omp} style invokes options associated with the use of the
+USER-OMP package.
+
+The only setting to make is the number of OpenMP threads to be
+allocated for each MPI process. For example, if your system has nodes
+with dual quad-core processors, it has a total of 8 cores per node.
+You could run MPI on 2 cores on each node (e.g. using options for the
+mpirun command), and set the {Nthreads} setting to 4. This would
+effectively use all 8 cores on each node, since each MPI process
+spawns 4 threads (one of which runs as part of the MPI process
+itself).
+
+For performance reasons, you should not set {Nthreads} to more threads
+than there are physical cores, but LAMMPS does not check for this.
+
+:line
[Restrictions:]
+This command cannot be used after the simulation box is defined by a
+"read_data"_read_data.html or "create_box"_create_box.html command.
+
The cuda style of this command can only be invoked if LAMMPS was built
with the USER-CUDA package. See the "Making
LAMMPS"_Section_start.html#2_3 section for more info.
-Obviously, you must have GPU hardware and associated software to build
-and use LAMMPS with either the GPU or USER-CUDA packages.
+The gpu style of this command can only be invoked if LAMMPS was built
+with the GPU package. See the "Making LAMMPS"_Section_start.html#2_3
+section for more info.
-[Related commands:]
+The omp style of this command can only be invoked if LAMMPS was built
+with the USER-OMP package. See the "Making
+LAMMPS"_Section_start.html#2_3 section for more info.
-"fix gpu"_fix_gpu.html
+[Related commands:] none
[Default:] none