Padb is a job inspection tool for examining and debugging parallel programs. Its primary purpose is to simplify gathering stack traces on compute clusters, but it also supports a wide range of other functions. Padb supports a number of parallel environments and works out of the box on the majority of clusters. It is an open-source, non-interactive, scriptable command-line tool intended for use by programmers and system administrators alike.
Padb is developed and maintained by Ashley Pittman.
Features
The following modes of operation are supported:
- Stack trace generation
- MPI Message queue display
- Deadlock detection and collective state reporting
- Process interrogation
- Signal forwarding/delivery
- MPI collective reporting
- Job monitoring
What padb can't do
Padb is a job inspection tool: it can tell you what you want to know about your job and your MPI stack. It will not, however, tell you about your cluster as a whole, and it won't diagnose problems with your wider environment, including your job launcher or runtime environment. Padb does not launch or wrap your jobs for you; it is not a job harness, but rather attaches to or targets jobs that are already running.
License
Padb is licensed under the LGPL and as such is open source and free to use and modify.
History
Padb was originally conceived by software developers at Quadrics around 2004 to solve the kind of problems facing them at the time. It has been part of the Quadrics software stack for a number of years and has recently been made available to a wider audience. It has been commercially supported for a number of years and is known to work at a scale of tens of thousands of processes.
Parallel Environments
Padb works and is supported on the following parallel environments and MPI stacks. Not all features are available on all runtimes.
- Quadrics RMS
Runs natively on clusters running RMS.
- Slurm
Runs natively on clusters running Slurm irrespective of the runtime used.
- Open MPI
Supports orte or Open MPI jobs run under Slurm.
- MPICH2
Supports mpd or MPICH2 jobs run under Slurm.
In addition, padb can be told to target individual UNIX processes.
Prerequisites
Padb requires very little support from the OS or parallel environment to run. Its main use is to assist in debugging parallel applications, so it is assumed that you have a working MPI stack or other parallel environment and that a "Hello world" application runs to completion without error.
A Linux operating system is assumed, and a working gdb is required for stack trace functionality. Work on a Solaris port is under way.
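To check the gdb prerequisite by hand, it can help to confirm that gdb is able to attach to a running process and print a backtrace. The following is a minimal sketch in Python of that check, not padb's own implementation: the function names `build_gdb_command` and `stack_trace` are hypothetical, and the only assumption about the system is that a standard `gdb` binary is on the PATH and that you have permission to attach to the target process.

```python
import shutil
import subprocess


def build_gdb_command(pid):
    """Return a gdb invocation that prints a backtrace for every thread of pid.

    Hypothetical helper for illustration; padb itself may drive gdb differently.
    """
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -batch: exit after running the commands; -p: attach to a running process;
    # -ex: execute the given gdb command.
    return ["gdb", "-batch", "-p", str(pid), "-ex", "thread apply all bt"]


def stack_trace(pid, timeout=30):
    """Attach gdb to a running process and return the backtrace text.

    Returns None when gdb is not installed.
    """
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(
        build_gdb_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return result.stdout
```

Calling `stack_trace(pid)` with the PID of one of your own processes should return text containing its stack frames. If it returns None, gdb is missing. If the output is empty or contains a permissions error, attaching is restricted on that machine, and stack trace functionality is likely to be affected in the same way.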