Its primary intention is to speed up compression, but if you find any other use for this generic program, please feel free to use it to parallelize any CPU-consuming task where it might help.
From the README:
1. General
----------
This program is used to split up data from stdin into blocks, which are sent as input to parallel invocations of commands. The output from those is then concatenated in the right order and sent to stdout. Splitting up and parallelizing jobs like this might be useful to speed up compression using multiple CPU cores or even multiple computers.

For this approach to be useful, the compressed format needs to allow multiple compressed files to be concatenated. This is the case for gzip, bzip2 and xz.

2. Installation
---------------
Step 1, unpack the archive:
  tar -xJvf splitjob*.tar.xz
Step 2, compile:
  cd splitjob-*
  make
Step 3, become root and install:
  su (and give password)
  make install

3. Examples
-----------
Example 1, use multiple local cores:
  splitjob -j 4 bzip2 < bigfile > bigfile.bz2
Example 2, use remote machines:
  splitjob "ssh host1 gzip" "ssh host2 gzip" < f > f.gz
The above example assumes that ssh is configured to allow logins without asking for a password. See the man page for ssh-keygen or do a Google search for examples on how to accomplish this.
Example 3, use bigger blocks to reduce overhead:
  splitjob -j 2 -b 10M gzip < file > file.gz
Example 4, parallel decompression:
  splitjob -X -r 10 -j 10 -b 384M "xz -d -" < file.xz > file

4. Documentation
----------------
There is a man page for splitjob, and you will get some help by typing:
  splitjob -h

5. Known problems
-----------------
Splitjob does its best to detect and avoid any problems. If some sub command fails, it will by default make some retries before giving up and exiting with a non-zero return value. However, like pbzip2, mpibzip2 and bzip2smp, I would like to say: Use at your own risk! Verify the contents of compressed files before relying on them. If splitjob exits with any return value other than 0, its output should be discarded!

At parallel decompression there is a risk that the compressed data contains the magic bytes used to separate compressed blocks. This could happen by coincidence, but it is more likely when compression has been used recursively, e.g. a compressed tar file or disk image file containing files compressed with the same algorithm. Since version 3.1, splitjob tries to avoid failure when magic bytes are used to separate blocks: if a failure is detected, it merges the failing job's data with the data from the next job and retries. These attempts might still end in failure if:
* A single block of compressed data contains more occurrences of the magic bytes than the selected number of retries. This will give the error message "Failed again, giving up!" and can be avoided by increasing the number of retries with the -r switch.
* A job has already sent some of its data to stdout and no longer keeps it in its buffer. This will give the error message "Got too much data and failed!" and can be avoided by increasing the block size with the -b switch.
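The split, process and concatenate idea from the General section can be illustrated with a short Python sketch. This is not splitjob's C implementation. `split_blocks` and `parallel_compress` are hypothetical names, and a thread pool stands in for splitjob's separate command invocations. The sketch assumes, as is the case for CPython's zlib, bz2 and lzma modules, that the compressor releases the GIL while it works, so several blocks should be compressed concurrently. The point it demonstrates is the one the README relies on: independently compressed gzip, bzip2 and xz blocks, concatenated in order, still decompress to the original data.

```python
import bz2
import gzip
import io
import lzma
from concurrent.futures import ThreadPoolExecutor


def split_blocks(stream, block_size):
    """Yield consecutive blocks of at most block_size bytes from a binary stream."""
    while True:
        block = stream.read(block_size)
        if not block:  # an empty read means end of input
            return
        yield block


def parallel_compress(stream, block_size=1 << 20, jobs=4, compress=gzip.compress):
    """Compress each block independently and concatenate the results in input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Executor.map returns results in submission order even if later
        # blocks finish first, so the concatenation keeps the input order.
        return b"".join(pool.map(compress, split_blocks(stream, block_size)))


if __name__ == "__main__":
    data = b"splitjob example line\n" * 100_000
    for module in (gzip, bz2, lzma):
        packed = parallel_compress(io.BytesIO(data), block_size=256 * 1024,
                                   compress=module.compress)
        # Each module's decompress() reads all concatenated members/streams.
        assert module.decompress(packed) == data
        print(module.__name__, len(data), "->", len(packed))
```

`Executor.map` submits every block up front, so unlike a streaming tool this sketch holds all blocks in memory at once. Smaller blocks also mean more overhead in the output. Each gzip member carries at least a 10-byte header and an 8-byte trailer, and no block can reuse compression context from its neighbors. Both effects are reduced by the larger `-b` shown in Example 3.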
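The retry-by-merging strategy from the Known problems section can be sketched for xz data. The sketch cuts the input at every occurrence of the xz stream header magic (`FD 37 7A 58 5A 00`). When a chunk fails to decompress, it glues the next chunk on and tries again, up to a retry limit. Everything here is an illustrative assumption rather than splitjob's actual code:

- `split_at_magic`, `decompress_one` and `decompress_chunks` are hypothetical names.
- The loop is sequential for clarity, whereas splitjob runs the chunks as parallel jobs.
- A chunk only counts as successful if it is exactly one complete stream.

The last point matters because a false split inside a stream leaves a truncated first part, and that truncated part must be detected as a failure.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # header magic that starts every .xz stream


def split_at_magic(data, magic=XZ_MAGIC):
    """Cut data into chunks that each begin at an occurrence of magic."""
    starts = []
    pos = data.find(magic)
    while pos != -1:
        starts.append(pos)
        pos = data.find(magic, pos + 1)
    if not starts or starts[0] != 0:
        raise ValueError("input does not start with the magic bytes")
    ends = starts[1:] + [len(data)]
    return [data[a:b] for a, b in zip(starts, ends)]


def decompress_one(chunk):
    """Decompress exactly one complete .xz stream, else raise lzma.LZMAError."""
    dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    out = dec.decompress(chunk)
    if not dec.eof or dec.unused_data:
        raise lzma.LZMAError("chunk is not exactly one complete .xz stream")
    return out


def decompress_chunks(chunks, retries=10):
    """Decompress chunks in order; when one fails, merge it with the next and retry."""
    out = []
    i = 0
    while i < len(chunks):
        merged, nxt = chunks[i], i + 1
        for attempt in range(retries + 1):
            try:
                out.append(decompress_one(merged))
                break
            except lzma.LZMAError:
                # A false magic match splits one real stream in two, so the
                # failing chunk is glued to the following one and retried.
                if attempt == retries or nxt == len(chunks):
                    raise RuntimeError(f"giving up after {attempt} retries")
                merged += chunks[nxt]
                nxt += 1
        i = nxt
    return b"".join(out)
```

With this design, a chunk that merely starts at a false magic match is never tried on its own. The truncated chunk before it fails first and absorbs it. As in the README, a block containing more false matches than `retries` still ends in failure.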
Usage: splitjob [options] [commands]
Reads from stdin, splits and sends to multiple parallel invocations of commands and concatenates their output to stdout.
Options:
  -j
From the CHANGELOG:
28/8 2021 3.2 Robustness against lots of false magic bytes within compressed data, as the number of retries now might be bigger than the number of jobs.
16/7 2021 3.1 Improved handling of false magic bytes within compressed data. This often happens at parallel decompression of a compressed archive or disk image file containing files compressed with the same algorithm.
8/4 2020 3.0 Removed predefined support for gzip parallel decompression, as this might fail without it showing in the gzip return value.
9/3 2017 2.2 Added experimental support for parallel decompression.
11/11 2017 2.1 Bugfix: Fixed a copy-paste error in the code which caused writing outside the allocated buffer when the output data from the called program was bigger than the input data. This could also happen at compression if the data is already compressed. In theory, bugs like this could cause more or less random behavior. In practice, this bug has caused corrupted backup archives. Any users of version 2.0 should upgrade to version 2.1 to avoid this bug!
15/10 2017 2.0 Added support for increasing the number of jobs with SIGUSR2 and decreasing the number of jobs with SIGUSR1.
9/10 2017 1.2 Might be able to recover if a sub process fails, even if some data has been read out from the sub process.
31/1 2015 1.1 Freeing unused RAM in child processes.
14/12 2014 1.0 First stable version. No changes since 0.9.2beta, which has been tested for some months without any problems found.
24/8 2014 0.9.2beta Bugfix: Taking care of short reads, which could cause random and non-optimal compression performance when blocks sent to compression were not always as big as intended.
24/7 2014 0.9beta First public release.
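The 0.9.2beta entry is about short reads. A single read on a pipe may return fewer bytes than requested even though more data is on its way. A block filled by one read can therefore come out smaller than intended. The usual remedy is to keep reading until the block is full or the input ends. The following is a minimal Python sketch of that loop on a raw file descriptor. `read_full` is a hypothetical name, and splitjob's own C code may be structured differently.

```python
import os


def read_full(fd, size):
    """Read until size bytes have arrived or the input ends.

    A single os.read() on a pipe may return fewer bytes than requested
    (a short read) although more data will follow, so one call is not
    enough to fill a block of the intended size.
    """
    parts = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # b"" means end of input, not a short read
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)
```

For example, `read_full(sys.stdin.fileno(), block_size)` returns a full block on every call except possibly the last one before end of input.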
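The 2.0 entry lets the number of jobs change while the program runs. One way to wire this up is to adjust a counter from signal handlers, and have the scheduling loop read it before starting each new job. The following is a sketch in Python under explicit assumptions: it is POSIX only, `JobLimit` is a hypothetical class, and the lower bound of one job is an assumption rather than documented splitjob behavior.

```python
import signal


class JobLimit:
    """Hypothetical job counter that signals adjust at run time.

    SIGUSR2 adds one job and SIGUSR1 removes one, mirroring the 2.0
    changelog entry; the count is kept at one or more (an assumption).
    """

    def __init__(self, jobs):
        self.jobs = jobs
        signal.signal(signal.SIGUSR2, self._more)
        signal.signal(signal.SIGUSR1, self._fewer)

    def _more(self, signum, frame):
        self.jobs += 1

    def _fewer(self, signum, frame):
        self.jobs = max(1, self.jobs - 1)
```

From a shell, `kill -USR2 <pid>` sends SIGUSR2 to the running process and `kill -USR1 <pid>` sends SIGUSR1.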
Current stable version 3.2 is available from SourceForge download. The md5sum of splitjob-3.2.tar.xz is e11d35fced4b34de1ac5196c257d2b20
Latest development version was 2.2, and it is available from SourceForge download. The md5sum of splitjob-2.2.tar.xz is 192ac1d5062d6fe77129e1b9391774ec
Old stable version 3.1 is still available from SourceForge download. The md5sum of splitjob-3.1.tar.xz is c3d0b6779cfe54278299d607e500ac86
Old stable version 3.0 is still available from SourceForge download. The md5sum of splitjob-3.0.tar.xz is 888fc4ca36d6b59117814363a2366d65
Old development version 2.1 is still available from SourceForge download. The md5sum of splitjob-2.1.tar.xz is 13452f670b8294e060a1b0de7aa609b1
Old stable buggy version 2.0 is still available from SourceForge download, but please do not use it! The md5sum of splitjob-2.0.tar.xz is 09acdbf1a79d60f625a7ea3955964c70
Old stable version 1.2 is available from SourceForge download. The md5sum of splitjob-1.2.tar.xz is fc36de81834244f875221aeb61427c1a
Old stable version 1.1 is available from SourceForge download. The md5sum of splitjob-1.1.tar.bz2 is 524569591836405b9ee13f1ae7b8dde0
Old stable version 1.0 is available from SourceForge download. The md5sum of splitjob-1.0.tar.bz2 is cb3eb993b69dd1821c02fe3bc87d7ab8
Version 0.9.2beta is available from SourceForge download. The md5sum of splitjob-0.9.2beta.tar.bz2 is af7001e9e5680da24a214dafa4ae68e4
Version 0.9beta is available from SourceForge download. The md5sum of splitjob-0.9beta.tar.bz2 is d79cd625e24f7f3d00b1ac726c65b459
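On most Linux systems, `md5sum splitjob-3.2.tar.xz` is the usual way to compare a download with the sums listed above. The following is a minimal Python sketch doing the same. `md5_of_file` and `matches` are hypothetical helper names, and the file is read in chunks so large archives need little memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected):
    """Compare a file's MD5 digest with a published hex string (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()
```

For an intact download, `matches("splitjob-3.2.tar.xz", "e11d35fced4b34de1ac5196c257d2b20")` should return `True`.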