Its primary purpose is to speed up compression, but it is a generic program: feel free to use it to parallelize any CPU-intensive task where it might help.
From the README:
1. General
----------
This program is used to split up data from stdin in blocks which are sent as input to parallel invocations of commands. The output from those are then concatenated in the right order and sent to stdout. Splitting up and parallelizing jobs like this might be useful to speed up compression using multiple CPU cores or even multiple computers. For this approach to be useful, the compressed format needs to allow multiple compressed files to be concatenated. This is the case for gzip, bzip2 and xz.

2. Installation
---------------
Step 1, unpack the archive:
    tar -xzvf splitjob*.tgz

Step 2, compile:
    cd splitjob-*
    make

Step 3, become root and install
    su (and give password)
    make install

3. Examples
-----------
Example 1, use multiple local cores:
    splitjob -j 4 bzip2 < bigfile > bigfile.bz2

Example 2, use remote machines:
    splitjob "ssh host1 gzip" "ssh host2 gzip" < f > f.gz

The above example assumes that ssh is configured to allow logins without asking for password. See the manpage for ssh-keygen or do a google search for examples on how to accomplish this.

Example 3, Use bigger blocks to reduce overhead:
    splitjob -j 2 -b 10M gzip < file > file.gz

4. Documentation
----------------
There is a man-page for splitjob, and you will get some help by typing:
    splitjob -h

5. Known problems
-----------------
Splitjob does its best to detect and avoid any problems. If some sub command fails it will by default make some retries before giving up and exiting with a non-zero return value. However, like pbzip2, mpibzip2 and bzip2smp I would like to say: Use at your own risk! Verify the contents of compressed files before relying on them. If splitjob exits with any other return value than 0 its output should be discarded!
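
The requirement in section 1 of the README, that independently compressed blocks can be concatenated into one valid file, can be checked with standard tools. The following is a minimal sketch and not how splitjob itself is implemented: it uses `split` and background `gzip` processes on temporary files as a stand-in for splitjob's block handling, and the sample data, block size and file names are arbitrary choices made for illustration.

```shell
#!/bin/sh
# Sketch only: emulates "split, compress blocks in parallel, concatenate in order"
# with standard tools. It does not call splitjob.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Sample input: several hundred kilobytes of text (arbitrary test data).
awk 'BEGIN { for (i = 0; i < 50000; i++) print "sample line", i }' > "$work/input"

# Cut the input into fixed-size blocks of 256 KiB: block.aa, block.ab, ...
split -b 262144 "$work/input" "$work/block."

# Compress every block independently, all in the background.
# The glob is expanded once, before any .gz file exists.
for b in "$work"/block.*; do
    gzip -c "$b" > "$b.gz" &
done
wait

# Concatenate the compressed blocks in their original order
# (the glob sorts block.aa.gz before block.ab.gz, and so on).
cat "$work"/block.*.gz > "$work/joined.gz"

# Verify: the result must be a valid gzip file that decompresses
# to exactly the original input.
gzip -t "$work/joined.gz"
if gzip -dc "$work/joined.gz" | cmp -s - "$work/input"; then
    result=identical
else
    result=different
fi
echo "round trip: $result"
```

The same round-trip comparison (decompress and compare against the original) is one way to follow the README's advice to verify the contents of compressed files before relying on them.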
Usage: splitjob [options] [commands]
Reads from stdin, splits and sends to multiple parallel invocations of commands and concatenates their output to stdout.
Options: -j
From the CHANGELOG:
11/11 2017 2.1 Bugfix: Fixed copy-paste error in code which caused writing outside allocated buffer when output data from called program was bigger than input data. This could happen also at compression if data is already compressed. In theory bugs like this could cause more or less random behavior. In practice this bug has caused corrupted backup archives. Any users of version 2.0 should upgrade to version 2.1 to avoid this bug!

15/10 2017 2.0 Added support for increasing number of jobs with SIGUSR2 and decreasing number of jobs with SIGUSR1.

9/10 2017 1.2 Might be able to recover if sub process fails even if some data has been read out from the sub process.

31/1 2015 1.1 Freeing unused RAM in child processes.

14/12 2014 1.0 First stable version. No changes since 0.9.2beta which has been tested for some months without any problems found.

24/8 2014 0.9.2beta Bugfix: taking care of short reads which could cause random and non optimal compression performance when blocks sent to compression not always were as big as intended.

24/7 2014 0.9beta First public release
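
The 2.0 changelog entry says the number of jobs can be raised with SIGUSR2 and lowered with SIGUSR1. The sketch below only illustrates the mechanics of sending those signals with `kill`. The `worker` function is a hypothetical stand-in written for this example, not splitjob: it keeps a counter that starts at 2 (an arbitrary value) and reacts to the same two signals. How splitjob itself reacts is not described beyond the changelog entry, so check the man page for details.

```shell
#!/bin/sh
# Sketch only: a stand-in process that adjusts a "job count" on
# SIGUSR2 (increase) and SIGUSR1 (decrease). This is NOT splitjob.
set -eu

state=$(mktemp)   # file where the stand-in publishes its current job count

worker() {
    n=2
    trap 'n=$((n + 1)); echo "$n" > "$state"' USR2
    trap '[ "$n" -le 1 ] || n=$((n - 1)); echo "$n" > "$state"' USR1
    trap 'exit 0' TERM
    echo "$n" > "$state"          # published only after the traps are installed
    while :; do
        sleep 1 &
        wait "$!" || true         # wait is interrupted when a trapped signal arrives
    done
}

# Poll the state file until it shows the expected job count (up to ~10 s).
wait_for() {
    tries=0
    while [ "$(cat "$state")" != "$1" ]; do
        tries=$((tries + 1))
        [ "$tries" -lt 10 ] || return 1
        sleep 1
    done
}

worker &
pid=$!
wait_for 2                        # stand-in is ready

kill -s USR2 "$pid"; wait_for 3   # one more job
kill -s USR2 "$pid"; wait_for 4   # one more job
kill -s USR1 "$pid"; wait_for 3   # one job fewer

final=$(cat "$state")
kill -s TERM "$pid"
wait "$pid" || true
rm -f "$state"
echo "final job count: $final"
```

Against a real run, the equivalent would presumably be `kill -s USR2` or `kill -s USR1` with the process ID of the running splitjob; which process to signal is an assumption here and is not stated in the changelog.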
Current development version 2.1 is available from SourceForge download. The md5sum of splitjob-2.1.tar.xz is 13452f670b8294e060a1b0de7aa609b1
Current stable version 1.2 is available from SourceForge download. The md5sum of splitjob-1.2.tar.xz is fc36de81834244f875221aeb61427c1a
Development version 2.0, originally intended as a stable release, is still available from SourceForge download, but please do not use it! The md5sum of splitjob-2.0.tar.xz is 09acdbf1a79d60f625a7ea3955964c70
Unfortunately, version 2.0, which at the time of its release was intended to be a stable version, has proven not to be fully reliable. Because of a bug that was still unidentified when this note was written, data might sometimes be corrupted. The bug seems to appear even when the new feature for adjusting the number of jobs is not used, but it has so far never shown up with older versions. Please do not use version 2.0 for important things such as backups, and if you have used version 2.0, please verify your result files. This bug was found 2017-11-09, but its cause had not been identified at the time of this writing. The changelog entry for version 2.1 describes a fix for writing outside an allocated buffer and advises all users of version 2.0 to upgrade.
Old stable version 1.1 is available from SourceForge download. The md5sum of splitjob-1.1.tar.bz2 is 524569591836405b9ee13f1ae7b8dde0
Old stable version 1.0 is available from SourceForge download. The md5sum of splitjob-1.0.tar.bz2 is cb3eb993b69dd1821c02fe3bc87d7ab8
Version 0.9.2beta is available from SourceForge download. The md5sum of splitjob-0.9.2beta.tar.bz2 is af7001e9e5680da24a214dafa4ae68e4
Version 0.9beta is available from SourceForge download. The md5sum of splitjob-0.9beta.tar.bz2 is d79cd625e24f7f3d00b1ac726c65b459
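
The md5sums listed above can be compared against a downloaded archive before unpacking it. The following is a small sketch that assumes the `md5sum` tool from GNU coreutils is installed. `verify_md5` is a helper name made up for this example, and the archive is assumed to be in the current directory under the file name given in the list.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED
# Succeeds only if FILE can be read and its MD5 checksum equals EXPECTED.
verify_md5() {
    actual=$(md5sum < "$1" | awk '{ print $1 }')
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# Example: check the 2.1 archive against the checksum listed above.
# Nothing happens if the file has not been downloaded yet.
archive=splitjob-2.1.tar.xz
if [ -f "$archive" ]; then
    if verify_md5 "$archive" 13452f670b8294e060a1b0de7aa609b1; then
        echo "$archive: checksum OK"
    else
        echo "$archive: checksum MISMATCH, do not use this file" >&2
    fi
fi
```

A mismatch usually means an incomplete or damaged download, so fetching the archive again is the first thing to try.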