ParaLoop - The documentation

The copyright notice

ParaLoop is governed by the CeCILL license under French law.

The ParaLoop Quick reference card

How are switches and parameters read?

The switches and parameters are read in the order listed below. Once a switch or parameter has been set at some step, it is NEVER SET again: imposed values are therefore set at the beginning of the process, and default values at the end.

  1. The switch on the command line
  2. $LIPMLIB/../etc/paraloop.root.cfg
  3. f1.cfg (see the --cfg switch)
  4. f2.cfg
  5. f3.cfg
  6. $HOME/.paralooprc
  7. $LIPMLIB/../etc/paraloop.cfg
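
The "first value wins" rule above can be pictured with a minimal Python sketch. It is not ParaLoop's own code: the function name resolve and all the parameter values shown are hypothetical, and each source is modelled as a plain dictionary rather than a parsed file.

```python
from typing import Dict, List


def resolve(sources: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge sources in reading order; the first source to set a name wins."""
    resolved: Dict[str, str] = {}
    for source in sources:
        for name, value in source.items():
            # setdefault only stores the value if the name is not set yet
            resolved.setdefault(name, value)
    return resolved


# Hypothetical contents, listed in the reading order described above.
command_line = {"PARALOOP_ncpus": "4"}
root_cfg = {"PARALOOP_Scheduler": "PBS"}            # imposed by the administrator
user_rc = {"PARALOOP_ncpus": "16",
           "PARALOOP_Scheduler": "System",          # ignored: already imposed
           "PARALOOP_queue": "long"}
site_cfg = {"PARALOOP_queue": "short",              # ignored: the user set it
            "PARALOOP_log_level": "1"}              # default, nobody set it

settings = resolve([command_line, root_cfg, user_rc, site_cfg])
for name in sorted(settings):
    print(name, "=", settings[name])
```

In this sketch the value imposed in the root file survives the user's attempt to change it, while the value from the last file is only used when nothing earlier set the parameter.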

Switches and parameters useful for the end user

Files and directories

Parameter | Switch | Default | Meaning
- | --cfg=f1.cfg,f2.cfg,f3.cfg | - | List of configuration files, the first specified is read first
PARALOOP_max_file_size | - | 1 Gb | The max output file size. If more than 1 Gb, another file is created
PARALOOP_error_directory | - | PARALOOP_error | The error directory
PARALOOP_lock_directory | - | PARALOOP_lock | The lock directory

Messages and log files

Parameter | Switch | Default | Meaning
- | --verbose | No | Add some messages, displayed to the console
- | --quiet | No | Nearly no message
PARALOOP_log_level | - | 0 | 0=log nearly nothing, 1=log normally, 2=log more

Input, output

Parameter | Switch | Default | Meaning
- | --input | - | The name of the input file
- | --start | 0 | The start record number (0 means first record)
- | --end | End of file | The end record number
- | --interleaved | no | Distribute the data in a round-robin algorithm
- | --output | - | The name of the output file
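
The round-robin distribution selected by --interleaved can be pictured with the following Python sketch. It assumes that the record at position i is handed to child number i modulo the number of children; the function name and this exact assignment rule are illustrative assumptions, not taken from the ParaLoop sources.

```python
from typing import List


def interleave(records: List[int], ncpus: int) -> List[List[int]]:
    """Deal the records to ncpus children, one record at a time, in turn."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    # Child k receives the records at positions k, k + ncpus, k + 2*ncpus, ...
    return [records[child::ncpus] for child in range(ncpus)]


# Ten records (numbered from 0, as with --start) dealt to three children.
for child, share in enumerate(interleave(list(range(10)), 3)):
    print("child", child, "->", share)
```

With ten records and three children, the first child of this sketch receives records 0, 3, 6 and 9, the second 1, 4 and 7, and the third 2, 5 and 8.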

Plugins

Parameter | Switch | Default | Meaning
- | --plugins | - | Display the list of available plugins
- | --program | - | The plugin to use
- | --db | - | Used by some plugins (Blast)
- | --wait | - | Do not return, wait for every child to finish
- | --waitonly lock_directory | - | Just wait until every child finishes
- | --waitonly lock_file | - | Just wait until some specific child finishes

Processors and queues

Parameter | Switch | Default | Meaning
PARALOOP_ncpus | --ncpus | Set by the administrator | The number of cpus to use (the number of child processes to run)
- | --local | no | Run on the local machine, without sending the jobs to the cluster nodes
PARALOOP_fair_time_limit | - | Set by the administrator | Only implemented with queues. After this time has elapsed, the job is submitted again, then interrupted, giving your colleagues a chance to work.
PARALOOP_account | --account | - | Only implemented with PBS. The account, passed to the qsub utility.
PARALOOP_queue | --queue | Set by the administrator | The execution queue
PARALOOP_qsub_params | - | Set by the administrator | Additional parameters passed to qsub

Interrupting

The best way to interrupt the jobs is to create a file called paraloop.stp in the lock directory. You may also interrupt a single job by creating a file called paraloop.X.stp in the lock directory, where X is the number of that job's starting record.
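
As an illustration, the stop files can be created with a few lines of Python. This is only a sketch: the helper name request_stop is hypothetical, and it assumes that an empty file is enough (the text above only says that the file must exist). The demonstration uses a temporary directory in place of a real lock directory.

```python
import tempfile
from pathlib import Path
from typing import Optional


def request_stop(lock_directory: str, start_record: Optional[int] = None) -> Path:
    """Create paraloop.stp, or paraloop.X.stp when a start record X is given."""
    if start_record is None:
        name = "paraloop.stp"                    # interrupt every job
    else:
        name = f"paraloop.{start_record}.stp"    # interrupt one specific job
    stop_file = Path(lock_directory) / name
    stop_file.touch()                            # assumption: an empty file suffices
    return stop_file


# Demonstration in a throwaway directory standing in for the lock directory.
with tempfile.TemporaryDirectory() as lock_dir:
    print(request_stop(lock_dir).name)
    print(request_stop(lock_dir, 500).name)
```

In real use the first argument would be the lock directory of the run, which is PARALOOP_lock unless PARALOOP_lock_directory says otherwise.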

Restarting

Parameter | Switch | Default | Meaning
- | --restart lock_directory | - | Restart the interrupted jobs
- | --restart lock_file | - | Restart only this job

Advanced switches and parameters

Parameter | Switch | Default | Meaning
- | --monoprocessor | - | Run this job on only one processor
- | --start | - | Start record to be processed
- | --end | - | End record to be processed
- | --step | - | Step used in the main loop. If step is < 0, the end record must be less than the start record, but this does not work with every plugin
- | --slice_size | - | Instead of specifying --start and --end, it is sometimes more convenient to specify slices. In this case, the step parameter is always 1, and --slice_size is the size of each slice.
- | --slice_nb | - | The slice number
- | --slice_offset | - | Offset added when calculating the start of the slice
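
The relation between the three slice switches and an equivalent --start/--end pair might look like the Python sketch below. The formula is an assumption made for illustration: it supposes that slices are numbered from 0 and that the end record is inclusive, neither of which is stated on this card. The function name slice_bounds is hypothetical.

```python
from typing import Tuple


def slice_bounds(slice_size: int, slice_nb: int, slice_offset: int = 0) -> Tuple[int, int]:
    """Return (start, end) records for one slice, with a step of 1.

    Assumptions: slice numbers start at 0 and the end record is inclusive.
    """
    if slice_size < 1:
        raise ValueError("slice_size must be at least 1")
    start = slice_offset + slice_nb * slice_size
    end = start + slice_size - 1
    return start, end


print(slice_bounds(100, 0))        # first slice, no offset
print(slice_bounds(100, 2, 10))    # third slice, shifted by 10 records
```

Under these assumptions, slices of 100 records with an offset of 10 start at records 10, 110, 210 and so on.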

Parameters of the Shell plugin

Please see the Shell documentation for details about this plugin.

Parameter | Default | Meaning
PARALOOP_Shell_interpreter | /bin/sh | Path to the default shell interpreter

Parameters of the Bioperl plugin

Please see the Bioperl documentation for details about this plugin.

Parameter | Default | Meaning
PARALOOP_Bioperl_path | - | Path to the external script, run at each iteration
PARALOOP_Bioperl_params | '' | Parameters passed to this script
PARALOOP_Bioperl_input_format | fasta | Format of the input file, read by the external script

Parameters of the Blast plugin

Please see the Blast documentation for details about this plugin.

Parameter | Switch | Default | Meaning
PARALOOP_Blast_origin | - | ncbi | ncbi for blast ncbi, wu for wu blast
PARALOOP_Blast_path | - | blastall if Blast_origin is ncbi, blastp if Blast_origin is wu | -
PARALOOP_Blast_params | - | -p blastp if Blast_origin is ncbi, '' if Blast_origin is wu | -
PARALOOP_Blast_chunk | - | 1 | The sequences are grouped in chunks of N sequences, N is given by this parameter
PARALOOP_db | --db | - | The database

Parameters useful for the administrator

These parameters may be set in two files, with two different meanings:

.../etc/paraloop.root.cfg
Parameters set in this file cannot be overridden by the user.
.../etc/paraloop.cfg
Parameters set in this file are default values; users can override them.

Parameter | Default | Meaning
PARALOOP_Scheduler | - | The scheduler to use: System for a multiprocessor machine, PBS for a machine equipped with the PBS queuing system, Rsystem for a cluster without any queuing system
PARALOOP_no_local_mode | no | If specified, users will not be able to use the --local switch, which forces them to use the queuing system
PARALOOP_fair_time_limit | - | Set this parameter to keep users from monopolizing the processors
PARALOOP_max_file_size | - | If the output file grows beyond this size, it is closed and a new file is opened
PARALOOP_PBS_ncpus | - | The default number of cpus when using PBS
PARALOOP_System_ncpus | - | The default number of cpus when using System (or the --local switch)
PARALOOP_Rsystem_ncpus | - | The default number of cpus when using Rsystem

Parameters for the PBS Scheduler

Parameter | Default | Meaning
PARALOOP_account | - | The account name, passed to qsub
PARALOOP_qsub_params | '' | Additional parameters passed to qsub
PARALOOP_queue | - | The execution queue

Parameters for the Rsystem scheduler

Parameter | Default | Meaning
PARALOOP_Rsystem_nodes | - | The list of nodes constituting the cluster. Example: node1,node2,node3
PARALOOP_Rsystem_rsh | rsh | The program to use for sending / executing something on the nodes: may be ssh
PARALOOP_Rsystem_tmp | /tmp | The name of a temporary directory. This directory must be local to the node, it cannot be shared

The PARALOOP documentation

The main documentation

Document | Description
User documentation | The user documentation, including a tutorial for writing plugins

The plugins

Document | Description
Plugin | The abstract class at the top of the plugin hierarchy
BpInput | The abstract class used for reading files with bioperl
LnInput | The abstract class used for reading text files
Bioperl | A general plugin to execute a treatment on bioperl files
Shell | A general plugin to execute some lines of script, one line per processor
Blast | A specialized plugin to execute a blast (ncbi or wu) on every sequence found in the input file
Dummy | This dummy plugin can be used as a template for writing your own plugins

The schedulers

Document | Description
Scheduler | The abstract class at the top of the Scheduler hierarchy
PBS | A scheduler useful when you have PBS-Pro (or other systems?) installed
System | This scheduler is used with multiprocessor SMP machines, or when you use the --local switch
Rsystem | This scheduler is used with clusters which do NOT have any batch system installed

The other objects or modules

Document | Description
_Initializable | Every object should derive from this class
ParamParser | Parse a parameter file
Logger | Log in a structured way
Runner | Run an external program