

This is a small utility which splits data read from its input into blocks of a chosen size, sends those blocks to parallel invocations of some program, and concatenates the output of those invocations. It was inspired by the description of dbzip2 and implements most of its useful features in a simple way that gives more flexibility.

Its primary purpose is to speed up compression, but if you find any other use for this generic program, please feel free to use it to parallelize any CPU-intensive task where it might help.


Splitjob is published under the GNU General Public License v2. For more information on the GPL, visit the GNU website.


From the README:

1. General

This program is used to split data from stdin into blocks which are sent
as input to parallel invocations of commands. The outputs from those
invocations are then concatenated in the right order and sent to stdout.
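The split/run/concatenate idea described above can be illustrated with a short Python sketch. It is not splitjob's implementation: it assumes the whole input fits in memory, starts one fresh invocation of the command per block, and uses a thread pool to run at most `jobs` invocations at once. The names `split_blocks`, `run_command` and `split_and_run` are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data: bytes, block_size: int):
    """Yield consecutive blocks of at most block_size bytes."""
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]


def run_command(cmd: list, block: bytes) -> bytes:
    """Feed one block to a fresh invocation of cmd and return its stdout."""
    result = subprocess.run(cmd, input=block, capture_output=True, check=True)
    return result.stdout


def split_and_run(data: bytes, cmd: list, jobs: int = 4,
                  block_size: int = 1 << 20) -> bytes:
    """Run cmd on every block, at most `jobs` at a time, and join the outputs."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Executor.map yields results in the order the blocks were submitted,
        # even when a later block finishes first, so the output stays in order.
        outputs = pool.map(lambda block: run_command(cmd, block),
                           split_blocks(data, block_size))
        return b"".join(outputs)
```

For example, calling `split_and_run(data, ["bzip2"], jobs=4)` on the contents of a file mirrors Example 1 in section 3, except that splitjob streams its input instead of holding everything in memory.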

Splitting up and parallelizing jobs like this might be useful to speed up
compression using multiple CPU cores or even multiple computers.

For this approach to be useful, the compressed format needs to allow multiple
compressed files to be concatenated. This is the case for gzip, bzip2 and xz.
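The concatenation property can be checked with Python's standard gzip, bz2 and lzma modules, whose decompress() functions read every concatenated member or stream in turn. This sketch compresses 4 KiB blocks independently, joins the results, and confirms that each format decompresses back to the original input. The sample data, the block size and the helper name `compress_in_blocks` are arbitrary choices for illustration.

```python
import bz2
import gzip
import lzma


def compress_in_blocks(data: bytes, compress, block_size: int) -> bytes:
    """Compress each block independently and concatenate the results."""
    return b"".join(
        compress(data[i:i + block_size]) for i in range(0, len(data), block_size)
    )


data = b"splitjob test data\n" * 10_000

for name, compress, decompress in [
    ("gzip", gzip.compress, gzip.decompress),
    ("bzip2", bz2.compress, bz2.decompress),
    ("xz", lzma.compress, lzma.decompress),
]:
    joined = compress_in_blocks(data, compress, block_size=4096)
    # Each decompressor continues through all concatenated members/streams.
    assert decompress(joined) == data, name
    print(f"{name}: concatenated streams decompress correctly")
```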

2. Installation

Step 1, unpack the archive:

tar -xJvf splitjob-*.tar.xz

(use -xjvf instead for the older .tar.bz2 archives)

Step 2, compile:

cd splitjob-*
make

Step 3, become root and install:

su (and enter the root password)
make install

3. Examples

Example 1, use multiple local cores:
splitjob -j 4 bzip2 < bigfile > bigfile.bz2

Example 2, use remote machines:
splitjob "ssh host1 gzip" "ssh host2 gzip" < f > f.gz

The above example assumes that ssh is configured to allow logins without asking
for a password. See the ssh-keygen man page, or search online for examples of
how to set this up.

Example 3, use bigger blocks to reduce overhead:
splitjob -j 2 -b 10M gzip < file > file.gz

4. Documentation

There is a man page for splitjob, and you can get some help by typing:

splitjob -h

5. Known problems

Splitjob does its best to detect and avoid problems. If a subcommand fails,
it will by default retry a few times before giving up and exiting with a
non-zero return value. However, like the authors of pbzip2, mpibzip2 and
bzip2smp, I would like to say: use at your own risk! Verify the contents of
compressed files before relying on them. If splitjob exits with a return value
other than 0, its output should be discarded!
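One way to follow this advice is sketched below in Python: `run_or_discard` runs a command such as splitjob and deletes its output file when the exit status is non-zero, and `verify_roundtrip` checks that decompressing the result reproduces the original byte for byte by comparing SHA-256 digests. Both helper names and the choice of SHA-256 are illustrative assumptions, not part of splitjob.

```python
import hashlib
import os
import subprocess

CHUNK = 1 << 20  # read 1 MiB at a time to keep memory use bounded


def run_or_discard(cmd: list, in_path: str, out_path: str) -> None:
    """Run cmd with in_path as stdin and out_path as stdout; delete out_path on failure."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        status = subprocess.run(cmd, stdin=src, stdout=dst).returncode
    if status != 0:
        os.remove(out_path)  # a non-zero exit means the output must not be trusted
        raise RuntimeError(f"{cmd[0]} exited with status {status}; output discarded")


def sha256_of_file(path: str) -> str:
    """Hash a file in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def verify_roundtrip(original_path: str, compressed_path: str, decompress_cmd: list) -> bool:
    """True if decompress_cmd turns compressed_path back into original_path exactly."""
    h = hashlib.sha256()
    with open(compressed_path, "rb") as src:
        proc = subprocess.Popen(decompress_cmd, stdin=src, stdout=subprocess.PIPE)
        # Hash the decompressed stream as it arrives instead of storing it.
        for block in iter(lambda: proc.stdout.read(CHUNK), b""):
            h.update(block)
        proc.stdout.close()
        status = proc.wait()
    return status == 0 and h.hexdigest() == sha256_of_file(original_path)
```

A typical use would be `run_or_discard(["splitjob", "-j", "4", "bzip2"], "bigfile", "bigfile.bz2")` followed by `verify_roundtrip("bigfile", "bigfile.bz2", ["bzip2", "-dc"])`.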

Performance wins and drawbacks

How much splitjob reduces compression time has been studied in these splitjob performance tests. The drawback, a larger compressed size caused by splitting the input into small blocks that are compressed independently, has been studied in this splitjob test with different block sizes and different compression programs.
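The size drawback can be explored with a small experiment: compress the same input in independent blocks of different sizes and compare the totals. This Python sketch uses the standard gzip module instead of an external command, and a synthetic, hypothetical input; it prints the sizes rather than assuming particular results, since they depend heavily on the data and the compressor. `compressed_size` is a hypothetical helper.

```python
import gzip


def compressed_size(data: bytes, block_size: int, level: int = 6) -> int:
    """Total size when each block is gzip-compressed on its own."""
    return sum(
        len(gzip.compress(data[i:i + block_size], compresslevel=level))
        for i in range(0, len(data), block_size)
    )


# Hypothetical, highly repetitive sample input; real data will behave differently.
data = b"".join(b"record %d: some log text\n" % (i % 500) for i in range(50_000))

for block_size in (1 << 10, 1 << 14, 1 << 20):
    print(f"{block_size:>8} bytes/block -> {compressed_size(data, block_size)} bytes")
```

Each independently compressed block starts without any history to refer back to and carries its own header and trailer, which is why very small blocks tend to cost more space.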


Since no questions have been asked yet, the FAQ does not exist yet. Questions will be answered on the SourceForge splitjob support page.


        splitjob [options] [commands]
Reads from stdin, splits and sends to multiple parallel invocations
of commands and concatenates their output to stdout.
  -j        Set number of parallel jobs (default number of commands)
  -b        Set block size for each job (default 1 MB)
  -r        Set number of retries for failed jobs (default 3)
  -h        Display this help and exit
  -v        Show program version and copyright
Use multiple local cores:  splitjob -j 4 bzip2 < bigfile > bigfile.bz2
Use remote machines:       splitjob "ssh h1 gzip" "ssh h2 gzip" < f > f.gz
Big block reduce overhead: splitjob -j 2 -b 10M gzip < file > file.gz
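The -r option and the retry behavior described under "Known problems" can be illustrated with the Python sketch below. It assumes that -r counts retries after the first attempt (so -r 3 would mean up to four attempts) and that the partial output of a failed attempt is simply thrown away before the same block is sent again; neither detail is confirmed by this page, and `run_with_retries` is a hypothetical name.

```python
import subprocess


def run_with_retries(cmd: list, block: bytes, retries: int = 3) -> bytes:
    """Send one block to cmd; on failure, discard its output and try again.

    Assumption: `retries` counts extra attempts after the first one.
    """
    for _ in range(retries + 1):
        result = subprocess.run(cmd, input=block, capture_output=True)
        if result.returncode == 0:
            return result.stdout  # only output from a successful run is kept
        # Output from the failed run (result.stdout) is dropped here.
    raise RuntimeError(f"{cmd!r} failed {retries + 1} times; giving up")
```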



11/11 2017  2.1        Bugfix: Fixed a copy-paste error in the code which caused
                       writes outside an allocated buffer when the output from
                       the called program was bigger than its input. This can
                       also happen during compression if the data is already
                       compressed. In theory, bugs like this can cause more or
                       less random behavior. In practice, this bug has caused
                       corrupted backup archives. All users of version 2.0
                       should upgrade to version 2.1 to avoid this bug!

15/10 2017  2.0        Added support for increasing the number of jobs with
                       SIGUSR2 and decreasing it with SIGUSR1.

9/10  2017  1.2        Might be able to recover if a subprocess fails, even if
                       some data has already been read from that subprocess.

31/1  2015  1.1        Freeing unused RAM in child processes.

14/12 2014  1.0        First stable version. No changes since 0.9.2beta, which
                       has been tested for some months without any problems.

24/8  2014  0.9.2beta  Bugfix: Took care of short reads, which could cause
                       random and suboptimal compression performance when
                       blocks sent for compression were not always as big as
                       the chosen block size.

24/7  2014  0.9beta    First public release
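Since version 2.0, splitjob changes the number of jobs when it receives SIGUSR2 (more jobs) or SIGUSR1 (fewer jobs). From a shell this is done with `kill -USR2 <pid>` or `kill -USR1 <pid>`; the Python sketch below does the same on a POSIX system. It assumes, without confirmation from this page, that each signal changes the job count by one, and it pauses between signals because ordinary POSIX signals of the same type are not queued: two sent before the first is handled may be merged into one. `adjust_jobs` is a hypothetical helper.

```python
import os
import signal
import time


def adjust_jobs(pid: int, delta: int, interval: float = 0.2) -> None:
    """Ask process `pid` for `delta` more jobs (delta > 0) or fewer (delta < 0).

    Assumption: each SIGUSR2/SIGUSR1 changes the job count by one.
    """
    sig = signal.SIGUSR2 if delta > 0 else signal.SIGUSR1
    for _ in range(abs(delta)):
        os.kill(pid, sig)
        # Standard signals do not queue, so give the receiver time to handle
        # each one before sending the next.
        time.sleep(interval)
```

For example, `adjust_jobs(pid, 2)` with the process ID of a running splitjob (as reported by `pgrep splitjob`) would request two additional jobs under the assumption above.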


Current development version

The current development version is 2.1, available for download from SourceForge. The md5sum of splitjob-2.1.tar.xz is 13452f670b8294e060a1b0de7aa609b1

Current stable version

The current stable version, 1.2, is available for download from SourceForge. The md5sum of splitjob-1.2.tar.xz is fc36de81834244f875221aeb61427c1a

Older versions

The old development version 2.0 is still available for download from SourceForge, but please do not use it! The md5sum of splitjob-2.0.tar.xz is 09acdbf1a79d60f625a7ea3955964c70

Unfortunately, version 2.0, which at the time of its release was intended to be a stable version, has proven not to be fully reliable. Because of a yet unidentified bug, data may sometimes be corrupted. The bug seems to appear even when the new feature for adjusting the number of jobs is not used, but it has so far never shown up with older versions. Please do not use version 2.0 for important things like backups, and if you have used version 2.0, please verify your result files. This bug was found on 2017-11-09, but its cause has not been identified at the time of this writing.

Old stable version 1.1 is available for download from SourceForge. The md5sum of splitjob-1.1.tar.bz2 is 524569591836405b9ee13f1ae7b8dde0

Old stable version 1.0 is available for download from SourceForge. The md5sum of splitjob-1.0.tar.bz2 is cb3eb993b69dd1821c02fe3bc87d7ab8

Version 0.9.2beta is available for download from SourceForge. The md5sum of splitjob-0.9.2beta.tar.bz2 is af7001e9e5680da24a214dafa4ae68e4

Version 0.9beta is available for download from SourceForge. The md5sum of splitjob-0.9beta.tar.bz2 is d79cd625e24f7f3d00b1ac726c65b459
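The md5sums listed above can be checked with the md5sum command (for example `md5sum splitjob-2.1.tar.xz`) or with a short Python sketch like the one below; `md5_of_file` and `check_download` are hypothetical helper names. A matching MD5 detects accidental corruption during download, but it is not a defense against deliberate tampering.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_download(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 with a published value (case and whitespace ignored)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `check_download("splitjob-2.1.tar.xz", "13452f670b8294e060a1b0de7aa609b1")` should return True for an intact download.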


Bug reports

Bugs should be reported to the SourceForge Bug Tracking System.

Feature requests

My time is limited, so I make no promises, but requests for new features can be posted on the SourceForge splitjob feature request page. Feature requests are welcome, but even more welcome are implemented features contributed as patches on the SourceForge splitjob patches page.


Questions will be answered on the SourceForge splitjob support page.


It was once possible to contact me, Henrik Carlqvist, at my SourceForge email address. Unfortunately, that address is no longer usable because of large amounts of spam.
Hosted by SourceForge.