Managed Services - How to Use Torque

Last Updated: 05/06/2014

Overview

Torque is a resource manager for clusters. It lets you submit jobs and track their status with a small set of command-line tools. This guide shows how to use Torque. More information can be found on the Torque website.

This tutorial is operating system independent.

Basic Usage

Torque dynamically allocates resources for your job. You submit the job, and Torque finds the processors for it.

Submit a job:

To submit a job, write a shell script for Torque to run (a Torque job file), then run "qsub job_file". Torque runs the job file with the options specified.

A basic job script example:

> cat test.sh
#!/bin/sh
date
sleep 30

This script runs date and then sleeps for 30 seconds on one processor. It only becomes useful once you replace the date and sleep commands with processor-intensive commands.

To submit the job to the queue, use qsub.

>qsub test.sh
2607.servername.colorado.edu

This job has the ID 2607 and is submitted to the default queue. When enough resources are available, the job will run. On completion, standard output and standard error are saved to files in the directory from which you submitted the job.

> ls
test.sh.o2607 test.sh.e2607
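
The output file names follow from the script name and the numeric part of the job ID. The sketch below illustrates that naming, assuming the default behavior shown above (no -o or -e options). The helper names job_number and job_output_files are hypothetical and are not part of Torque.

```shell
#!/bin/sh
# Hypothetical helpers; they only manipulate strings and do not call Torque.

# Print the numeric part of a job ID such as 2607.servername.colorado.edu.
job_number() {
    echo "${1%%.*}"
}

# Print the default stdout and stderr file names for a script and a job ID.
job_output_files() {
    num=$(job_number "$2")
    echo "$1.o$num"
    echo "$1.e$num"
}

# Usage on a cluster (not run here):
#   id=$(qsub test.sh)
#   job_output_files test.sh "$id"
```

Capturing the ID that qsub prints is what makes it possible to locate the output files, or to pass the ID to other commands, later in a script.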

Delete a job

To delete a job, use "qdel". qdel removes the job from the queue so that it will not run. If the job is already running, qdel stops it.

>qdel 2607

Check the status of a job

To check the status of a job, use "qstat". Run without options, qstat lists all queued and running jobs.

>qstat
        Job id                    Name             User            Time Use S Queue
        ------------------------- ---------------- --------------- -------- - -----
        2607.servername           STDIN            username               0 R workq  
        2608.servername           STDIN            username               0 Q workq 

The 'S' column shows the state of the job: 'R' for running, 'Q' for queued.
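
The state column can be tallied with a short filter. This is a minimal sketch that assumes the six-column default layout shown above, where the state is the second-to-last field of each job line; count_by_state is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count jobs per state from qstat output on stdin.
# Job lines are recognized by a first field that starts with "<number>.".
count_by_state() {
    awk '$1 ~ /^[0-9]+\./ { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Usage on a cluster (not run here):
#   qstat | count_by_state
```

For the two-job listing above, this would print one line for Q and one for R, each with a count of 1.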

Advanced usage:

Torque supports advanced features and customizations when running jobs. The sections below continue the sections above.

Job Submission:

There are options in the shell script that can be used to customize your job.

A Basic script

>cat test.sh
        #!/bin/bash
        #PBS -N testjob
        cat $PBS_NODEFILE
        sleep 30

$PBS_NODEFILE is the location of a file that contains a list of the nodes allocated for this job.
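
Inside a job, the node file can be summarized to see how many processor slots were granted on each node. This sketch assumes the usual Torque layout of one line per allocated processor; nodefile_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print each node named in a node file together with
# the number of lines (processor slots) listed for it.
nodefile_summary() {
    awk 'NF { slots[$1]++ }
         END { for (node in slots) print node, slots[node] }' "$1" | sort
}

# Usage inside a job script:
#   nodefile_summary "$PBS_NODEFILE"
```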

Lines beginning with #PBS pass options to Torque. Some are listed below; more can be found in the man page for qsub.

#PBS -r n                       # The job is not rerunnable.
#PBS -r y                       # The job is rerunnable
#PBS -q testq                   # The queue to submit to    
#PBS -N testjob                 # The name of the job
#PBS -o testjob.out             # The file to print the output to
#PBS -e testjob.err             # The file to print the error to

# Mail Directives

#PBS -m abe                     # The points during the execution to send an email
#PBS -M me@colorado.edu         # Who to Mail to

# Resource Directives

#PBS -l walltime=01:00:00       # Specify the walltime
#PBS -l pmem=100mb              # Memory Allocation for the Job
#PBS -l nodes=4                 # Number of nodes to Allocate
#PBS -l nodes=4:ppn=3           # Number of nodes and the number processors per node

You can use any of the above options in the script to customize your job. If the options above are used together (with one of the two -r settings, and with nodes=4:ppn=3 as the node request), the job will be named testjob and placed in the queue testq. It is limited to 1 hour of wall-clock time and mails me@colorado.edu when the job begins, ends, or is aborted. It uses 4 nodes with 3 processors per node, for a total of 12 processors, with up to 100 MB of physical memory per process.
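
The processor total is the node count multiplied by the processors per node. The sketch below works that out from a request string. It assumes a purely numeric request such as nodes=4:ppn=3 (requests that name hosts or add node properties are not handled), and total_procs is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: compute nodes * ppn from a request like "nodes=4:ppn=3".
# A missing ppn is treated as 1 processor per node.
total_procs() {
    nodes=1
    ppn=1
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose: split the request on ":" into its parts.
    for part in $1; do
        case $part in
            nodes=*) nodes=${part#nodes=} ;;
            ppn=*)   ppn=${part#ppn=} ;;
        esac
    done
    IFS=$old_ifs
    echo $((nodes * ppn))
}

# Usage:
#   total_procs nodes=4:ppn=3
#   total_procs nodes=4
```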

Check the status of a job:

Torque and Maui allow you to check the status of jobs and the queue status.

In Torque:

qstat has many options for checking job status. The basic form, shown above, runs the command without any options. Again, the man pages are the best source of information.

Other options include: -n, -f, -Q, -B, -u, -q

The -n option will show which nodes are running which jobs.

>qstat -n
        server.colorado.edu:
                                                                         Req'd  Req'd   Elap
        Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
        -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
        78.server.colorado     user     workq    STDIN              4811   --   --    --    --  R   --
           node34/0
        79.server.colorado     user     workq    STDIN              4830   --   --    --    --  R   --
           node34/1
        80.server.colorado     user     workq    STDIN              3867   --   --    --    --  R   --
           node33/0
        81.server.colorado     user     workq    STDIN              4821   --   --    --    --  R   --
           node32/0
        82.server.colorado     user     workq    STDIN              4840   --   --    --    --  R   --
           node32/1
        83.server.colorado     user     workq    STDIN              4859   --   --    --    --  R   --
           node32/2
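
The two-line layout of this listing (a job line followed by an indented node/slot line) can be flattened into one line per job. This is a rough sketch that assumes each job's host list fits on a single line, as in the output above; job_nodes is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "qstat -n" output on stdin and print
# "<job id> <node/slot>" for each job that has a host line.
job_nodes() {
    awk '$1 ~ /^[0-9]+\./ { job = $1; next }
         job != "" && NF == 1 && index($1, "/") { print job, $1; job = "" }'
}

# Usage on a cluster (not run here):
#   qstat -n | job_nodes
```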

The -f option will show the full details for a specified job.

>qstat -f 84
Job Id: 84.server.colorado.edu
    Job_Name = STDIN
    Job_Owner = username@server.colorado.edu
    resources_used.cput = 00:00:00
    resources_used.mem = 1704kb
    resources_used.vmem = 8028kb
    resources_used.walltime = 00:00:01
    job_state = R
    queue = workq
    server = server.colorado.edu
    Checkpoint = u
    ctime = Fri Apr 24 16:21:51 2009
    Error_Path = server.colorado.edu:/tmp/STDIN.e84
    exec_host = node34/0
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Fri Apr 24 16:21:53 2009
    Output_Path = server.colorado.edu:/tmp/STDIN.o84
    Priority = 0
    qtime = Fri Apr 24 16:21:51 2009
    Rerunable = True
    Resource_List.neednodes = node34
    session_id = 4877
    substate = 42
    Variable_List = PBS_O_HOME=/tmp,PBS_O_LOGNAME=username,
        PBS_O_PATH=/usr/local/bin:/usr/bin,
        PBS_O_SHELL=/bin/tcsh,PBS_SERVER=server.colorado.edu,
        PBS_O_HOST=server.colorado.edu,PBS_O_WORKDIR=/tmp,
        PBS_O_QUEUE=workq
    euser = username
    egroup = server
    hashname = 84.server.colorado.edu
    queue_rank = 83
    queue_type = E
    etime = Fri Apr 24 16:21:51 2009
    start_time = Fri Apr 24 16:21:53 2009
    start_count = 1

The -u option will show all jobs owned by the specified user.

The -Q option will show the queue information. If a specific queue is specified it will only show the information from that queue.

>qstat -Q
        Queue              Max   Tot   Ena   Str   Que   Run   Hld   Wat   Trn   Ext T
        ----------------   ---   ---   ---   ---   ---   ---   ---   ---   ---   --- -
        testing              0     0   yes   yes     0     0     0     0     0     0 E
        normal               8     1   yes   yes     0     1     0     0     0     0 E
        short                0     0   yes   yes     0     0     0     0     0     0 E
        long                 0     3   yes   yes     0     3     0     0     0     0 E
        special              0     0   yes   yes     0     0     0     0     0     0 E
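
The per-queue counts can be added up to get a cluster-wide figure. This sketch sums the Run column, assuming the column order shown above, in which Run is the seventh field; running_total is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: sum the Run column of "qstat -Q" output on stdin.
# Header and separator lines are skipped because their seventh field
# is not a number.
running_total() {
    awk '$7 ~ /^[0-9]+$/ { run += $7 } END { print run + 0 }'
}

# Usage on a cluster (not run here):
#   qstat -Q | running_total
```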

In Maui:

If Maui is installed on your system, you have access to another set of tools. One of these is showq, which, like qstat, shows queue information.

>showq
        ACTIVE JOBS--------------------
        JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
        624                   user1    Running     4    21:00:01  Fri Apr 24 13:34:17
        621                   user2    Running     2 95:21:19:49  Mon Apr 20 13:54:06
        622                   user2    Running     2 95:21:23:06  Mon Apr 20 13:57:23
        623                   user2    Running     2 96:04:13:37  Mon Apr 20 20:47:54
             4 Active Jobs      10 of   20 Processors Active (50.00%)
                                 5 of    7 Nodes Active      (71.43%)
        IDLE JOBS----------------------
        JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
        0 Idle Jobs
        BLOCKED JOBS----------------
        JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
        Total Jobs: 4   Active Jobs: 4   Idle Jobs: 0   Blocked Jobs: 0
