- The Gaussian directories will require about 2-3 GB of disk space for the executables, depending on the computer system.
- The default memory allocation in Gaussian 16 is 800 MB. The large fixed dimensions in the program necessitate a swap space size of 1–2 GB. Of course, additional swap space will be required if more memory is requested in a job by using the %Mem Link 0 command, or via the -M- command in the Default.Route file. These requirements are for each simultaneously executing job.
- Refer to the platform list which comes with the CD. The most recent version of this document can always be found at www.gaussian.com/g16/g16_plat.pdf.
Configuring the Gaussian Execution Environment
- g16root: Indicates the directory where the g16 subdirectory resides (i.e., the directory above it).
- GAUSS_SCRDIR: Indicates the directory which should be used for scratch files.
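The Gaussian system includes initialization files to set up the user environment for running the program. These files are $g16root/g16/bsd/g16.login (for C shell users) and $g16root/g16/bsd/g16.profile (for Bourne shell and bash users).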
It is customary to include lines like the following within the .login or .profile file for Gaussian users:
.login files:
setenv g16root /location
source $g16root/g16/bsd/g16.login
setenv GAUSS_SCRDIR /path

.profile files:
export g16root=/location
. $g16root/g16/bsd/g16.profile
export GAUSS_SCRDIR=/path
Once things are set up correctly, the g16 command is used to execute Gaussian 16.
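Before the first run, it can help to confirm that the two required variables point to sensible locations. The following is a minimal sketch that assumes a POSIX shell. check_g16_env is a hypothetical helper and is not shipped with Gaussian. It only checks that $g16root/g16 exists and that GAUSS_SCRDIR is a writable directory.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check the environment Gaussian 16 expects.
check_g16_env() {
  # g16root must name the directory *above* the g16 subdirectory.
  if [ -z "${g16root:-}" ] || [ ! -d "$g16root/g16" ]; then
    echo "g16root is unset or \$g16root/g16 does not exist" >&2
    return 1
  fi
  # GAUSS_SCRDIR should be an existing directory we can write scratch files to.
  if [ -z "${GAUSS_SCRDIR:-}" ] || [ ! -d "$GAUSS_SCRDIR" ] || [ ! -w "$GAUSS_SCRDIR" ]; then
    echo "GAUSS_SCRDIR is unset or not a writable directory" >&2
    return 1
  fi
  echo "g16root=$g16root GAUSS_SCRDIR=$GAUSS_SCRDIR"
}
```

Calling check_g16_env after sourcing g16.login or g16.profile prints both settings on success and returns a nonzero status otherwise.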
Environment Variables Reference
- GAUSS_EXEDIR: Specifies the directories in which the Gaussian images are stored. By default it includes the main directory $g16root/g16 and several alternate directories.
- GAUSS_ARCHDIR: Specifies the directory in which the main site-wide archive file is kept, and into which temporary archive files should be placed if the main archive is unavailable. It defaults to $g16root/g16/arch if unset.
- G16BASIS: The directory which contains files specifying the standard Gaussian internally stored basis sets, as well as some additional basis sets in the form of general basis set input. This environment variable is provided for convenience and is designed for use with the @ include mechanism.
- Gaussian defaults may be set via environment variables of the form GAUSS_xDEF. See the documentation for the Default.Route file for details.
- Network/cluster parallel calculations using Linda may also use the GAUSS_LFLAGS environment variable to pass options to the Linda process. See Parallel Jobs for details.
Running under UNIX
Once all input and resource specifications are prepared, you are ready to run the program. Gaussian 16 may be run interactively using one of two command styles:
g16 job-name
g16 <input-file >output-file
In the first form, the program reads input from job-name.gjf and writes its output to job-name.log. When job-name is not specified, the program reads from standard input and writes to standard output, and these can be redirected or piped in the usual UNIX fashion. Either form of command can be run in the background, like any other shell command, by appending an ampersand (&).
Scripts and Gaussian
Scripts designed to run Gaussian 16 may also be created in several ways. First, g16 commands like those above may be included in a shell script. Secondly, actual Gaussian input may be included in the script using the << construct:
#!/bin/sh
g16 <<END >water.log
%Chk=water
# RHF/6-31G(d)

water energy

0 1
O
H 1 1.0
H 1 1.0 2 120.0

END
echo "Job done."
All lines between the g16 command and the line containing only the string that follows the << symbol (END in this example) are passed to the g16 command as its input.
Finally, loops may be created to run several Gaussian jobs in succession. For example, the following script runs all of the Gaussian input files specified as its command line arguments, and it maintains a log of its activities in the file Status:
#!/bin/sh
# Run each Gaussian input file named on the command line, logging progress to Status.
echo "Current Job Status:" > Status
for file in "$@"; do
  nam=$(echo "$file" | sed 's/\..*$//')
  echo "Starting file $file at $(date)" >> Status
  g16 < "$file" > "$nam.log"
  echo "$file Done with status $?" >> Status
done
echo "All Done." >> Status
The following more complex script creates Gaussian input files on-the-fly from the partial input in the files given as the script’s command line arguments. These files lack complete route sections: their route sections consist simply of a # sign, or a # line containing special keywords needed for that molecular system, but no method, basis set, or calculation type.
The script creates a two-step job for each partial input file—a Hartree-Fock optimization followed by an MP2 single point energy calculation—consisting of both the literal commands included in the script and the contents of each file specified at script execution time. It includes the latter by exploiting the Gaussian 16 @ include file mechanism:
#!/bin/sh
echo "Current Job Status:" > Status
for file in "$@"; do
  echo "Starting file $file at $(date)" >> Status
  nam=$(echo "$file" | sed 's/\..*$//')
  g16 <<END > "$nam.log"
%Chk=$nam
# HF/6-31G(d) FOpt
@$file/N
--Link1--
%Chk=$nam
%NoSave
# MP2/6-31+G(d,p) SP Guess=Read Geom=AllCheck

END
  echo "$file Done with status $?" >> Status
done    # end of for...do
echo "All Done." >> Status
Batch Execution with NQS
Gaussian may be run using the NQS batch facility on those UNIX systems that support it. The subg16 command, defined in the initialization files, submits an input file to a batch queue. It has the following syntax:
subg16 queue-name job-name [-scrdir dir1] [-exedir dir2] [-p n]
The two required parameters are the queue and job names. Input is taken from job-name.gjf and output goes to job-name.log, just as for interactive runs. The NQS log file is sent to job-name.batch-log.
The optional parameters -scrdir and -exedir are used to override the default scratch and executable directories, respectively. Any other parameters are taken to be NQS options. In particular, -p n can be used to set the priority within the queue to n. This is the priority for job initiation (1 being lowest) and does not affect the run-time priority.
To submit an NQS job from an interactive session, a file like the following should be created (with filename name.job):
# QSUB -r name -o name.out -eo
# QSUB -lt 2000 -lT 2100
# QSUB -lm 34mw -lM 34mw
g16 <name.gjf
where name should be replaced with a name appropriate to your calculation. The first line names the running job, names the output file, and causes errors to be included in the output file. The two time limits differ to leave room for job-control cleanup (for example, archiving the checkpoint file if the job exceeds its time limit). The memory parameters are used both for the initial scheduling of your job and by the program to determine dynamic memory use.
This job would then be submitted with the command:
$ qsub name.job
and the output would be placed in your current working directory.
Gaussian uses several scratch files in the course of its computation. They include:
- The Checkpoint file: name.chk
- The Read-Write file: name.rwf
- The Two-Electron Integral file: name.int (empty by default)
- The Two-Electron Integral Derivative file: name.d2e (empty by default)
- The Scratch file: name.skr
By default, these files are given a name generated from the process ID of the Gaussian process, and they are stored in the scratch directory, designated by the GAUSS_SCRDIR environment variable (UNIX). This mechanism is designed to allow multiple Gaussian jobs to execute simultaneously using a common scratch directory. If GAUSS_SCRDIR is unset, the scratch file location defaults to the current working directory of the Gaussian process. You may also see files of the form name.inp in this directory. These are the internal input files used by the program.
By default, scratch files are deleted at the end of a successful run. However, you may wish to save the checkpoint file for later use in another Gaussian job, for use by a visualization program, to restart a failed job, and so on. This may be accomplished by naming the checkpoint file, providing an explicit name and/or location for it, via a %Chk command within the Gaussian input file. Here is an example:
%Chk=water
This command, which is placed at the very beginning of the input file—before the route section—gives the checkpoint file the name water.chk, overriding the usual generated name and causing the file to be saved at job conclusion. In this case, the file will reside in the current directory. However, a command like this one will specify an alternate directory location as well as filename:
While scratch files are deleted automatically for successful jobs, they are not deleted when a job is killed externally or otherwise terminates abnormally. Consequently, leftover files may accumulate in the scratch directory. An easy method for avoiding excessive clutter is to have all users share a common scratch directory and to have that scratch directory cleared at system boot time by adding an rm command to the appropriate system boot script (e.g., /etc/rc or one of the files under /etc/rc.d/rc3.d). If the NQS batch system is in use, clearing the scratch directory should also be done before NQS is started, ensuring that no jobs are using the directory when it is cleared.
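A boot-time cleanup of this kind might look like the following minimal sketch. The function name clear_gauss_scratch and the path /scratch/gaussian are hypothetical. The find options -mindepth and -maxdepth are GNU/BSD extensions rather than POSIX. Because the function deletes everything in the directory, it should run only when no Gaussian (or NQS) jobs can be using it.

```shell
#!/bin/sh
# Hypothetical boot-time cleanup for a shared Gaussian scratch directory.
clear_gauss_scratch() {
  dir=$1
  # Refuse an empty argument, the root directory, or anything that is not a directory.
  if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
    echo "refusing to clear '$dir'" >&2
    return 1
  fi
  # Delete the directory's contents (dot-files included) but keep the directory itself.
  find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
}

# Example call from a boot script (hypothetical path):
# clear_gauss_scratch /scratch/gaussian
```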
If disk space in the scratch directory is limited, but space is available elsewhere on the system, you may want to split the various scratch files among several disk locations. The following commands allow you to specify the names and locations of the other scratch files:
|%RWF=path||Read-Write file|
|%Int=path||Integral file|
|%D2E=path||Integral Derivative file|
In general, the read-write file is by far the largest, and so it is the one for which an alternate location is most often specified.
Splitting Scratch Files Across Disks
An alternate syntax is provided for splitting the read-write file, the Integral file, and/or the Integral Derivative file among two or more disks (or file systems). Here is the syntax for the %RWF command:
%RWF=loc1,size1,loc2,size2,...
where each loc is a directory location or a file pathname, and each size is the maximum size for the file segment at that location. Gaussian will automatically generate unique filenames for any loc which specifies a directory only. On UNIX systems, directory specifications (without filenames) must include a terminal slash.
By default, the sizes are in units of 8-byte words. This value may also be followed by KB, MB, GB, TB, KW, MW, GW or TW (without intervening spaces) to specify units of kilo-, mega-, giga-, or tera-bytes or words. Note that 1 MB = $1024^2$ bytes = 1,048,576 bytes (not 1,000,000 bytes).
A value of -1 for any size parameter indicates that any and all available space may be used, and a value of 0 says to use the current size of an existing segment. -1 is useful only for the last file specified, for which it is the default.
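The size rules above can be summarized in a small conversion function. This sketch makes several assumptions: binary prefixes apply to words as well as bytes (so 1 MW = $1024^2$ words), a word is 8 bytes, only upper-case suffixes are recognized, and a bare number means words, as in %RWF. size_to_bytes is a hypothetical name. The function does not interpret the special values -1 and 0.

```shell
#!/bin/sh
# Hypothetical helper: convert a Gaussian-style size string to bytes.
size_to_bytes() {
  val=$1
  case $val in
    *KB) n=${val%KB}; mult=1024 ;;
    *MB) n=${val%MB}; mult=$((1024 * 1024)) ;;
    *GB) n=${val%GB}; mult=$((1024 * 1024 * 1024)) ;;
    *TB) n=${val%TB}; mult=$((1024 * 1024 * 1024 * 1024)) ;;
    *KW) n=${val%KW}; mult=$((8 * 1024)) ;;
    *MW) n=${val%MW}; mult=$((8 * 1024 * 1024)) ;;
    *GW) n=${val%GW}; mult=$((8 * 1024 * 1024 * 1024)) ;;
    *TW) n=${val%TW}; mult=$((8 * 1024 * 1024 * 1024 * 1024)) ;;
    *)   n=$val;      mult=8 ;;   # no suffix: 8-byte words
  esac
  # Only plain non-negative integers are accepted: -1 is rejected, and 0 comes back
  # as 0, so callers must apply the special %RWF meanings of -1 and 0 themselves.
  case $n in
    ''|*[!0-9]*) echo "unsupported size: $val" >&2; return 1 ;;
  esac
  echo $((n * mult))
}
```

For instance, size_to_bytes 100MW prints 838860800, which equals 800 MB. Both size_to_bytes 16GB and size_to_bytes 2147483648 print 17179869184.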
For example, the following directive splits the read-write file across three disks:
The maximum sizes for the file segments are 4 GB, 3 GB, and unlimited, respectively. Gaussian will generate names for the first two segments, and the third will be given the name my_job. Note that the directory specifications include terminal slashes.
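A %RWF directive is simply a sequence of loc,size pairs joined by commas, so it can be assembled from a list of locations. In this sketch, build_rwf is a hypothetical helper and the /disk1, /disk2 and /disk3 paths are invented. The helper also warns when -1 appears anywhere but in the last segment, because any later segments would then never be used.

```shell
#!/bin/sh
# Hypothetical helper: build a %RWF directive from loc/size argument pairs.
# Directory-only locations must end in "/" so that Gaussian generates the filename.
build_rwf() {
  out=""
  while [ $# -ge 2 ]; do
    loc=$1; size=$2; shift 2
    if [ "$size" = "-1" ] && [ $# -gt 0 ]; then
      echo "build_rwf: warning: -1 before the last segment leaves later segments unused" >&2
    fi
    out="${out:+$out,}$loc,$size"
  done
  if [ $# -ne 0 ]; then
    echo "build_rwf: locations and sizes must come in pairs" >&2
    return 1
  fi
  echo "%RWF=$out"
}

# Two size-limited segments with generated names, then an unlimited named file:
build_rwf /disk1/scr/ 4GB /disk2/scr/ 3GB /disk3/scr/my_job -1
# prints %RWF=/disk1/scr/,4GB,/disk2/scr/,3GB,/disk3/scr/my_job,-1
```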
Due to limitations in current UNIX implementations, -1 should be used with caution, as it will attempt to extend a file segment beyond all remaining disk capacity on these systems; using it will also have the side effect of keeping any additional file segments included in the list from ever being used.
Gaussian 16 can address single scratch files of up to 16 GB under 32-bit operating systems, so there is no need to split scratch files into 2 GB pieces. However, the 16 GB limit on total scratch space (16 GB is $2^{31}$ eight-byte words) is inherent in 32-bit integers, and splitting the scratch file will not overcome it.
Saving and Deleting Scratch Files
By default, unnamed scratch files are deleted at the end of the Gaussian run, and named files are saved. The %NoSave command may be used to change this default behavior. When this directive is included in an input file, named scratch files whose directives appear in the input file before %NoSave will be deleted at the end of a run (as well as all unnamed scratch files). However, if the % directive naming the file appears after the %NoSave directive, the file will be retained. For example, these commands specify a name for the checkpoint file, and an alternate name and directory location for the read-write file, and cause only the checkpoint file to be saved at the conclusion of the Gaussian job:
|%RWF=/chem/scratch2/water||Files to be deleted go here.|
|%NoSave||Named files above this line are deleted; those below it are saved.|
|%Chk=water||Files to be saved go here.|
Note that all files are saved when a job terminates abnormally.
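The %NoSave ordering rule can be expressed as a small filter over the Link 0 lines. This is an illustrative sketch, not Gaussian's implementation. nosave_plan is a hypothetical name, and the filter examines only %Chk, %RWF, %Int and %D2E lines. It describes a normal termination only. Unnamed scratch files, which it does not list, are always deleted, and after an abnormal termination everything is kept.

```shell
#!/bin/sh
# Hypothetical filter: read Link 0 lines on stdin and report which named
# scratch files would be saved or deleted at the end of a successful run.
nosave_plan() {
  awk '
    {
      line[NR] = $0
      if (toupper($0) ~ /^%NOSAVE *$/ && !ns) ns = NR   # remember where %NoSave is
    }
    END {
      for (i = 1; i <= NR; i++) {
        if (toupper(line[i]) !~ /^%(CHK|RWF|INT|D2E)=/) continue
        # Named files are saved by default; with %NoSave, only those after it survive.
        verdict = (ns && i < ns) ? "delete" : "save"
        print line[i] ": " verdict
      }
    }'
}

printf '%s\n' '%RWF=/chem/scratch2/water' '%NoSave' '%Chk=water' | nosave_plan
# prints:
# %RWF=/chem/scratch2/water: delete
# %Chk=water: save
```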
The %Mem command controls the amount of dynamic memory to be used by Gaussian. By default, 800 MB (100MW) are used. This can be changed by specifying a number followed by KB, MB, GB, TB, KW, MW, GW or TW (without intervening spaces) to specify units of kilo-, mega-, giga-, or tera-bytes or words. The default units are megawords.
For example, the following command sets the amount of dynamic memory to 1 GB:
%Mem=1GB
Very large direct SCF calculations may need even larger allocations: at least $3N^2$ words, where $N$ is the number of basis functions.
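As a worked example of the $3N^2$ rule, the following sketch converts it into a %Mem value in megabytes. It assumes 8-byte words and 1 MB = $1024^2$ bytes, and it rounds up. direct_scf_min_mem_mb is a hypothetical name, and the result is only the lower bound quoted above.

```shell
#!/bin/sh
# Hypothetical helper: minimum %Mem (in MB) implied by the 3*N^2-words rule,
# where N is the number of basis functions.
direct_scf_min_mem_mb() {
  n=$1
  words=$((3 * n * n))
  bytes=$((words * 8))                          # 8-byte words
  mb=$(( (bytes + 1048575) / 1048576 ))         # round up to whole MB
  echo "%Mem=${mb}MB"
}

direct_scf_min_mem_mb 4000     # prints %Mem=367MB (within the 800 MB default)
direct_scf_min_mem_mb 10000    # prints %Mem=2289MB
```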
Warning: Requesting more memory than the amount of physical memory actually available on a computer system will lead to very poor performance.
If Gaussian is being used on a machine with limited physical memory, so that the default amount is not available, the default algorithms as well as the default memory allocation will be set appropriately during installation.
Shared-Memory Multiprocessor Parallel Execution
Gaussian defaults to execution on only a single processor. If your computer system has multiple processors/cores, and parallel processing is supported in your version of Gaussian, you may specify the particular CPUs on which to run with the %CPU link 0 command. For example, the following specifies that the program should run on the first 5 cores of a hexacore system (reserving one core for other use):
%CPU=0,1,2,3,4
The node list can also be specified as a range (e.g., 0-5). Ranges can also be followed by a suffix of the form /n, which says to use every nth processor in the range (e.g., /2 specifies every second processor/core).
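To check which CPUs a given list selects, the syntax can be expanded with a short function. This sketch follows the rules just described: single numbers, ranges a-b, and a stride suffix /n that takes every nth CPU starting at a. expand_cpu_list is a hypothetical helper and is not Gaussian's own parser.

```shell
#!/bin/sh
# Hypothetical helper: expand a %CPU-style list such as "0-7/2,9".
# The body runs in a subshell ( ... ), so the IFS/set -f changes and exit stay local.
expand_cpu_list() (
  set -f            # no filename globbing while splitting
  IFS=,             # split the list on commas
  result=""
  for item in $1; do
    case $item in
      */*) range=${item%/*}; step=${item#*/} ;;
      *)   range=$item; step=1 ;;
    esac
    case $range in
      *-*) lo=${range%-*}; hi=${range#*-} ;;
      *)   lo=$range; hi=$range ;;
    esac
    # Every field must be a non-negative integer, and the stride must be positive.
    for v in "$lo" "$hi" "$step"; do
      case $v in ''|*[!0-9]*) echo "bad item: $item" >&2; exit 1 ;; esac
    done
    [ "$step" -gt 0 ] || { echo "bad stride: $item" >&2; exit 1; }
    i=$lo
    while [ "$i" -le "$hi" ]; do
      result="${result:+$result,}$i"
      i=$((i + step))
    done
  done
  echo "$result"
)

expand_cpu_list "0-7/2,9"    # prints 0,2,4,6,9
```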
The older %NProcShared link 0 command can be used to specify the total number of processors on which to execute (leaving the selection of processors to the operating system). Clearly, the number of processors requested should not exceed the number of processors available, or a substantial decrease in performance will result.
Cluster/Network Parallel Execution
Parallel jobs can run across discrete systems in a LAN or nodes within a cluster using the Linda parallel execution environment. HF, CIS=Direct, and DFT calculations are Linda parallel, including energies, optimizations, and frequencies. TDDFT energies and gradients and MP2 energies and gradients are also Linda parallel. Portions of MP2 frequency and CCSD calculations are Linda parallel, but others are only SMP-parallel.
For a Linda parallel job to execute successfully, the following conditions must be true:
- You have already executed the appropriate Gaussian 16 initialization file ($g16root/g16/bsd/g16.login or $g16root/g16/bsd/g16.profile). Test this by running a serial Gaussian 16 calculation on the master node.
- The directory $g16root/g16 is accessible on all nodes.
- The LD_LIBRARY_PATH environment variable is set (see the install notes) to locate the Linda shared libraries.
- Local scratch space is available on each node if needed (via the GAUSS_SCRDIR environment variable).
- All nodes on which the program will run are trusted by the current host. You should be able to log in remotely with the rsh or ssh command without having to give a password to each of them. Contact your system administrator about configuring security for the nodes in the network.
The Linda parallel programming model involves a master process, which runs on the current processor, and a number of worker processes which can run on other nodes of the network. So a Gaussian 16/Linda run must specify the number of processors to use, the list of processors where the jobs should be run, and occasionally other job parameters. An environment variable is generally the easiest way to specify this information (as we will see).
Each of these nodes needs to have some access to the Gaussian 16 directory tree. The recommended configuration is to have G16 on each system that will be used for the parallel job. Note that the Linda binaries need to have the same path on each machine. If this is not feasible, the G16 tree can be made accessible via an NFS-mounted disk which is mounted in an identical location on all nodes.
For MP2 calculations, each node must also have some local disk where Gaussian 16 can put temporary files. This is defined as usual via the GAUSS_SCRDIR environment variable, which should be set in the .cshrc or .profile for your account on each node.
The %LindaWorkers directive is used to specify computers where Linda worker processes should run. It has the following syntax:
%LindaWorkers=node1[:n1][,node2[:n2]]...
This lists the TCP node name for each node to use. By default, one Linda worker is started on each node, but the optional value allows this to be varied. A worker is always started on the node where the job is started (the master node) whether or not it appears in the node list. For example, the following directive causes a network parallel job to be run across the specified 5 nodes. Nodes hamlet and ophelia will each run two worker processes.
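The total number of workers implied by a %LindaWorkers list follows from the node:n form, in which each node contributes one worker unless a count is given. The sketch below assumes that form. count_linda_workers is a hypothetical helper, and node3 to node5 are invented names standing in for the example's other three nodes. The extra worker started on an unlisted master node is not counted.

```shell
#!/bin/sh
# Hypothetical helper: count workers in a %LindaWorkers value of the form node[:n],...
count_linda_workers() (
  set -f
  IFS=,
  total=0
  for entry in $1; do
    case $entry in
      *:*) n=${entry#*:} ;;   # explicit worker count for this node
      *)   n=1 ;;             # default: one worker per listed node
    esac
    case $n in ''|*[!0-9]*) echo "bad entry: $entry" >&2; exit 1 ;; esac
    total=$((total + n))
  done
  echo "$total"
)

count_linda_workers hamlet:2,ophelia:2,node3,node4,node5    # prints 7
```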
Finally, you can also create a configuration file on the master node named .tsnet.config which contains the following line:
This will cause ssh to be used instead. This file can be placed in your home directory or in the directory from which you launch calculations. The %UseRSH directive is also supported; its purpose is to specify the use of rsh when ssh has been made the default in Default.Route (see below).
Note that in all cases passwordless ssh logins must already be configured from the master to all worker nodes.
A few Linda options that are sometimes useful are:
|-v||Display verbose messages|
|-vv||Display very verbose messages|
For example, one could turn on very verbose Linda output using:
$ export GAUSS_LFLAGS="-vv"
There are many other Linda options but most of them are not used by Gaussian. Check the Linda manual. The -opt option form can be used in GAUSS_LFLAGS to invoke any valid .tsnet.config file directive. Note that Gaussian 16/Linda does not use the native Linda resources minworker and maxworker.
Combining Shared-Memory and Network/Cluster Parallelism
The following link 0 commands start a four-way parallel worker on hosts norway and italy and two such worker processes on spain:
These link 0 commands are appropriate when norway and italy are 4 processor/core computers, and spain is an 8 processor/core computer.
It is always best to use SMP parallelism within nodes and Linda only between nodes. For example, on a cluster of 4 nodes, each with a dual quad-core EM64T, one should use the following (rather than using more than one Linda worker per node):
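Because the %CPU specification is applied on every node of a cluster, this layout consists of one %CPU line covering all cores of a node plus one %LindaWorkers entry per node. The helper below is a sketch that assumes identical nodes whose cores are numbered from 0. smp_linda_link0 and the host names node1 to node4 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: one Linda worker per node, each using every core of its node.
smp_linda_link0() {
  cores=$1; shift
  if ! [ "$cores" -ge 1 ] 2>/dev/null || [ $# -lt 1 ]; then
    echo "usage: smp_linda_link0 cores-per-node host..." >&2
    return 1
  fi
  if [ "$cores" -eq 1 ]; then
    echo "%CPU=0"
  else
    echo "%CPU=0-$((cores - 1))"
  fi
  hosts=$(printf '%s,' "$@")           # "a,b,c," with a trailing comma
  echo "%LindaWorkers=${hosts%,}"      # drop the trailing comma
}

# Four dual quad-core nodes (8 cores each):
smp_linda_link0 8 node1 node2 node3 node4
# prints:
# %CPU=0-7
# %LindaWorkers=node1,node2,node3,node4
```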
Starting and Monitoring Parallel Gaussian 16 Jobs
Once the proper setup is completed, the g16 command is used as usual to initiate a parallel Gaussian 16 job:
% g16 input.gjf &
In the case of distributed parallel calculations, Gaussian 16 will start the master and worker processes as needed.
Linda parallel links have filenames of the form *.exel. Using the top or other commands on worker nodes will let you see lxxx.exel when it starts.
The relevant measure of performance for a parallel calculation is the elapsed or wall clock time. The easiest way to check this is to use an external monitor like time, times, or timex, e.g.
% times g16 input.gjf &
This command will report elapsed, CPU and system times for the Gaussian job. Note that the last two reflect only the master node; similar amounts of CPU and system time are expended on each node. The parallel speedup is the ratio of the elapsed time for a serial job to the elapsed time for the parallel job.
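The speedup computation is a single division. The sketch below performs it with awk so that fractional results are kept. speedup is a hypothetical helper, and the times in the example are invented values, not measurements.

```shell
#!/bin/sh
# Hypothetical helper: speedup = serial elapsed time / parallel elapsed time.
# Both arguments are wall-clock times in the same units (e.g. seconds).
speedup() {
  awk -v s="$1" -v p="$2" 'BEGIN {
    if (p + 0 <= 0) exit 1      # parallel time must be positive
    printf "%.2f\n", s / p
  }'
}

speedup 3600 1000    # prints 3.60
```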
Gaussian 16 can use NVIDIA K40, K80 and P100 GPUs under Linux (the latter starting with Rev. B.01). Earlier GPUs do not have the computational capabilities or memory size to run the algorithms in Gaussian 16.
Allocating Memory for Jobs
Allocating sufficient amounts of memory to jobs is even more important when using GPUs than for CPUs, since larger batches of work must be done at the same time in order to use the GPUs efficiently. The K40 and K80 units can have up to 16 GB of memory. Typically, most of this should be made available to Gaussian. Giving Gaussian 8-9 GB works well when there is 12 GB total on each GPU; similarly, allocating Gaussian 11-12 GB is appropriate for a 16 GB GPU. In addition, at least an equal amount of memory must be available for each CPU thread which is controlling a GPU.
About Control CPUs
When using GPUs, each GPU must be controlled by a specific CPU. The controlling CPU should be as physically close as possible to the GPU it is controlling. GPUs cannot share controlling CPUs. Note that CPUs used as GPU controllers do not participate as compute nodes during the parts of the calculation that are GPU-parallel.
The hardware arrangement on a system with GPUs can be checked using the nvidia-smi utility. For example, this output is for a machine with two 16-core Haswell CPU chips and four K80 boards, each of which has two GPUs:
        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  CPU Affinity
GPU0     X    PIX   SOC   SOC   SOC   SOC   SOC   SOC   0-15    <- cores on first chip
GPU1    PIX    X    SOC   SOC   SOC   SOC   SOC   SOC   0-15
GPU2    SOC   SOC    X    PIX   PHB   PHB   PHB   PHB   16-31   <- cores on second chip
GPU3    SOC   SOC   PIX    X    PHB   PHB   PHB   PHB   16-31
GPU4    SOC   SOC   PHB   PHB    X    PIX   PXB   PXB   16-31
GPU5    SOC   SOC   PHB   PHB   PIX    X    PXB   PXB   16-31
GPU6    SOC   SOC   PHB   PHB   PXB   PXB    X    PIX   16-31
GPU7    SOC   SOC   PHB   PHB   PXB   PXB   PIX    X    16-31
The important part of this output is the CPU affinity. This example shows that GPUs 0 and 1 (on the first K80 card) are connected to the CPUs on chip 0 while GPUs 2-7 (on the other two K80 cards) are connected to the CPUs on chip 1.
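On a machine with many GPUs, it can be convenient to extract just the affinity column. This sketch assumes the layout shown above, as printed by nvidia-smi topo -m. In that layout, each GPU has one row beginning with GPUn, and the CPU affinity is the last field. Newer drivers may print additional columns, so compare against your own output before relying on the function. gpu_affinity is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical filter: print "GPUn: CPUs <affinity>" for each GPU row of the matrix.
# The header row also starts with "GPU0", so rows whose second field is GPUn are skipped.
gpu_affinity() {
  awk '$1 ~ /^GPU[0-9]+$/ && $2 !~ /^GPU/ { print $1 ": CPUs " $NF }'
}

# Usage on a live system:
#   nvidia-smi topo -m | gpu_affinity
```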
Specifying GPUs & Control CPUs for a Gaussian Job
The GPUs to use for a calculation and their controlling CPUs are specified with the %GPUCPU Link 0 command. This command takes one parameter:
%GPUCPU=gpu-list=control-cpus
where gpu-list is a comma-separated list of GPU numbers, possibly including numerical ranges (e.g., 0-4,6), and control-cpus is a similarly-formatted list of controlling CPU numbers. The corresponding items in the two lists are the GPU and its controlling CPU.
For example, on a 32-processor system with 6 GPUs, a job which uses all the CPUs—26 CPUs serving solely as compute nodes and 6 CPUs used for controlling GPUs—would use the following Link 0 commands:
%CPU=0-31
%GPUCPU=0,1,2,3,4,5=0,1,16,17,18,19
These commands state that CPUs 0-31 will be used in the job. GPUs 0 through 5 will be used, with GPU0 controlled by CPU 0, GPU1 controlled by CPU 1, GPU2 controlled by CPU 16, GPU3 controlled by CPU 17, and so on. Note that the controlling CPUs are included in %CPU.
In the preceding example, the GPU and CPU lists could be expressed more tersely as:
%GPUCPU=0-5=0-1,16-19
Normally one uses consecutive processors in the obvious way, but things can be associated differently in special cases. For example, suppose the same machine already had a job using 6 CPUs, running with %CPU=16-21. Then, in order to use the other 26 CPUs with 6 GPUs, you would specify:
%CPU=0-15,22-31
%GPUCPU=0-5=0,1,22-25
This job would use a total of 26 processors, employing 20 of them solely for computation, along with the six GPUs controlled by CPUs 0, 1, 22, 23, 24 and 25 (respectively).
In Gaussian 16 Rev. B, the lists of CPUs and GPUs are both sorted and then matched up. This ensures that the lowest-numbered threads run on CPUs that control GPUs. As a result, if part of a calculation has to reduce the number of processors used (e.g., because of memory limitations), it preferentially retains the threads with GPUs, since threads are removed in reverse order.
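The Rev. B pairing rule can be illustrated as follows: sort each list numerically, then match the nth GPU with the nth CPU. This is a sketch of the rule as described, not Gaussian's code. pair_gpus_cpus is a hypothetical name, and it expects explicit comma-separated numbers, so any ranges must be expanded first.

```shell
#!/bin/sh
# Hypothetical helper: sort GPU and control-CPU lists, then pair them in order.
pair_gpus_cpus() {
  gpus=$(printf '%s\n' "$1" | tr ',' '\n' | sort -n | tr '\n' ' ')
  cpus=$(printf '%s\n' "$2" | tr ',' '\n' | sort -n | tr '\n' ' ')
  set -- $cpus                       # sorted control CPUs become $1, $2, ...
  for g in $gpus; do
    if [ $# -eq 0 ]; then echo "more GPUs than control CPUs" >&2; return 1; fi
    echo "GPU $g -> CPU $1"
    shift
  done
  if [ $# -ne 0 ]; then echo "more control CPUs than GPUs" >&2; return 1; fi
}

pair_gpus_cpus 5,3,1 17,0,16
# prints:
# GPU 1 -> CPU 0
# GPU 3 -> CPU 16
# GPU 5 -> CPU 17
```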
GPUs and Overall Job Performance
GPUs are effective for larger molecules when doing DFT energies, gradients and frequencies (for both ground and excited states), but they are not effective for small jobs. They are also not used effectively by post-SCF calculations such as MP2 or CCSD.
Each GPU is several times faster than a CPU. However, on modern machines, there are typically many more CPUs than GPUs. The best performance comes from using all the CPUs as well as the GPUs.
In some circumstances, the potential speedup from GPUs can be limited because many CPUs are also used effectively by Gaussian 16. For example, if the GPU is 5x faster than a CPU, then the speedup of using the GPU versus the CPU alone would be 5x. However, the potential speedup resulting from using GPUs on a larger computer with 32 CPUs and 8 GPUs is 2x:
Without GPUs: 32 × 1 = 32
With GPUs: (24 × 1) + (8 × 5) = 64
(Remember that control CPUs are not used for computation.)
Speedup: 64 / 32 = 2
Note that this analysis assumes that the GPU-parallel portion of the calculation dominates the total execution time.
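The same estimate applies to any machine. CPU-only throughput equals the number of CPUs. GPU-accelerated throughput equals the non-control CPUs plus each GPU weighted by how many times faster it is than one CPU. gpu_speedup is a hypothetical helper, and the speed ratio is an input you must supply. Like the analysis above, it assumes that the GPU-parallel part of the job dominates.

```shell
#!/bin/sh
# Hypothetical helper: estimated speedup = ((cpus - gpus) + gpus * ratio) / cpus,
# where ratio is how many times faster one GPU is than one CPU.
gpu_speedup() {
  awk -v c="$1" -v g="$2" -v r="$3" 'BEGIN {
    c += 0; g += 0; r += 0
    if (c <= 0 || g < 0 || g > c || r <= 0) exit 1
    # Control CPUs drive the GPUs and do no CPU computation of their own.
    printf "%.2f\n", ((c - g) + g * r) / c
  }'
}

gpu_speedup 32 8 5    # prints 2.00, matching the 64/32 example
gpu_speedup 1 1 5     # prints 5.00: one CPU versus one GPU
```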
Allocation of memory. GPUs can have up to 16 GB of memory. One typically tries to make most of this available to Gaussian. Be aware that the CPU thread running each GPU must be given at least as much memory as is allocated to that GPU for computation. Using 8-9 GB works well on a 12 GB GPU, or 11-12 GB on a 16 GB GPU (reserving some memory for the system). Since Gaussian gives equal shares of memory to each thread, the total memory allocated should be the number of threads times the memory required to use a GPU efficiently. For example, when using 4 CPUs and 2 GPUs each with 16 GB of memory, you should use 4 × 12 GB = 48 GB of total memory:
%Mem=48GB
%CPU=0-3
%GPUCPU=0-1=0,2
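Because each thread receives an equal share, the %Mem value is the thread count multiplied by the per-GPU allocation. The following is a minimal sketch. gpu_job_mem is a hypothetical name, and the per-GPU figure is your own choice, such as 11-12 GB for a 16 GB GPU.

```shell
#!/bin/sh
# Hypothetical helper: total %Mem = number of threads x memory per GPU (in GB).
gpu_job_mem() {
  threads=$1; per_gpu_gb=$2
  for v in "$threads" "$per_gpu_gb"; do
    case $v in ''|*[!0-9]*) echo "arguments must be whole numbers" >&2; return 1 ;; esac
  done
  echo "%Mem=$((threads * per_gpu_gb))GB"
}

gpu_job_mem 4 12    # prints %Mem=48GB, as in the example above
```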
You will need to analyze the characteristics of your own environment carefully when making decisions about which processors and GPUs to use and how much memory to allocate.
GPUs in a Cluster
GPUs on nodes in a cluster can be used. Since the %CPU and %GPUCPU specifications are applied to each node in the cluster, the nodes must have identical configurations (number of GPUs and their affinity to CPUs); since most clusters are collections of identical nodes, this restriction is not usually a problem.
Default values for the Link 0 commands mentioned here can be set in the Default.Route file. They can also be set via environment variables and on the command line. The following table lists these equivalences:
|Link 0||Default.Route||Option||Env. Variable||Description|
|%CPU||-C-||-c=list||GAUSS_CDEF||Processor/core list for multiprocessor parallel jobs.|
|%GPUCPU||-G-||-g=list||GAUSS_GDEF||GPUs=Cores list for GPU parallel jobs.|
|%Mem||-M-||-m=value||GAUSS_MDEF||Memory amount for Gaussian jobs.|
|%NProcShared||-P-||-p=n||GAUSS_PDEF||Number of processors/cores for multiprocessor parallel jobs. Superseded in most cases by %CPU.|
|%UseRSH||-S- rsh||-s=rsh||GAUSS_SDEF=rsh||Use rsh to start workers for network parallel jobs.|
|%UseSSH||-S- ssh||-s=ssh||GAUSS_SDEF=ssh||Use ssh to start workers for network parallel jobs.|
|%LindaWorkers||-W-||-w=list||GAUSS_WDEF||List of hostnames for network parallel jobs.|
|%DebugLinda|| || ||GAUSS_LFLAGS=-v||Enable verbose Linda output.|
Quotation marks will be required for command line argument values whenever special characters are included within them. Note that the environment variable specifications above use bash syntax.
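For example, per-user defaults could be placed in a bash startup file as shown below. The values are illustrative only. Each variable is the environment equivalent listed in the table above, and Link 0 commands in an input file still take precedence for that job.

```shell
# Illustrative bash defaults (example values, not recommendations).
export GAUSS_MDEF="16GB"     # default equivalent of %Mem=16GB
export GAUSS_CDEF="0-7"      # default equivalent of %CPU=0-7
export GAUSS_LFLAGS="-v"     # equivalent of %DebugLinda (verbose Linda output)

# Command-line form for a single run (quotes guard special characters):
#   g16 -m=16GB -c="0-7" input.gjf
```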
Full documentation for all Link 0 commands is available in Link0.
Last updated on: 19 February 2018. [G16 Rev. B.01]