These tests were made to see how the splitjob block size and the choice of
compression algorithm affect the resulting size. This kind of data is nice as it
is easily reproducible. Timing performance would also be interesting, but even
though splitjob seems to scale rather well, timing measurements will differ
between machines, and even on the same machine because of differences in CPU
load. As the machine used for the tests only had two CPU cores, splitjob is only
used to spawn two jobs (-j 2).
The input data is the file linux-3.15.6.tar (the Linux kernel source), which is
571576320 bytes in size. This particular size is interesting as it can be halved
a number of times. I have also tried the splitjob default block size of 1 MB
(1048576 bytes) and the 10 MB (10485760 bytes) block size from the splitjob
documentation example, where 10 MB is used to reduce output size overhead.
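The block sizes used in the tables below follow from halving the input size repeatedly. A minimal sketch of that arithmetic, assuming a POSIX shell; the function name block_sizes is made up for this illustration and is not part of splitjob:

```shell
#!/bin/sh
# Print "number_of_blocks block_size remainder" for 1, 2, 4, ... blocks.
# A remainder of 0 means the input divides into equally sized blocks.
# block_sizes is a hypothetical helper, not a splitjob feature.
block_sizes() {
    total=$1        # input size in bytes
    max_blocks=$2   # stop after this many blocks
    blocks=1
    while [ "$blocks" -le "$max_blocks" ]; do
        echo "$blocks $(( total / blocks )) $(( total % blocks ))"
        blocks=$(( blocks * 2 ))
    done
}

# 571576320 is the size of linux-3.15.6.tar stated in the text.
block_sizes 571576320 1024
```

The second column gives the values passed to -b in the tests, from 571576320 (one block) down to 558180 (1024 blocks), and the third column stays 0 all the way.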
Command | Block size (bytes) | Resulting size (bytes) | Relative size (100% = single block) | Comment |
xz -9 < linux-3.15.6.tar | wc -c | NA | 79420204 | NA | Compression without splitjob |
splitjob -j 1 -b 571576320 "xz -9" < linux-3.15.6.tar | wc -c | 571576320 | 79420204 | 100% | Only one single process as the block size is the same as the entire file size. |
splitjob -j 2 -b 285788160 "xz -9" < linux-3.15.6.tar | wc -c | 285788160 | 79545192 | 100.15% | File split into two blocks of equal size which are compressed in parallel. |
splitjob -j 2 -b 142894080 "xz -9" < linux-3.15.6.tar | wc -c | 142894080 | 79834892 | 100.52% | File split into four blocks of equal size which are compressed two and two in parallel. |
splitjob -j 2 -b 71447040 "xz -9" < linux-3.15.6.tar | wc -c | 71447040 | 80663856 | 101.56% | ...eight blocks... |
splitjob -j 2 -b 35723520 "xz -9" < linux-3.15.6.tar | wc -c | 35723520 | 81677840 | 102.84% | ...16 blocks... |
splitjob -j 2 -b 17861760 "xz -9" < linux-3.15.6.tar | wc -c | 17861760 | 82989044 | 104.49% | ...32 blocks... |
splitjob -j 2 -b 10M "xz -9" < linux-3.15.6.tar | wc -c | 10485760 | 84392984 | 106.26% | Documentation example 10 MB block size to reduce output size overhead |
splitjob -j 2 -b 8930880 "xz -9" < linux-3.15.6.tar | wc -c | 8930880 | 84872516 | 106.86% | ...64 blocks... |
splitjob -j 2 -b 4465440 "xz -9" < linux-3.15.6.tar | wc -c | 4465440 | 86690952 | 109.15% | ...128 blocks... |
splitjob -j 2 -b 2232720 "xz -9" < linux-3.15.6.tar | wc -c | 2232720 | 89145344 | 112.24% | ...256 blocks... |
splitjob -j 2 -b 1116360 "xz -9" < linux-3.15.6.tar | wc -c | 1116360 | 92317568 | 116.23% | ...512 blocks... |
splitjob -j 2 "xz -9" < linux-3.15.6.tar | wc -c | 1048576 | 92645396 | 116.65% | Splitjob default block size |
splitjob -j 2 -b 558180 "xz -9" < linux-3.15.6.tar | wc -c | 558180 | 96290324 | 121.24% | File split into 1024 blocks of equal size which are compressed two and two in parallel. |
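The relative sizes in these tables can be recomputed from the byte counts. The sketch below assumes the percentages are truncated to two decimals rather than rounded, which matches the published values (for example 100.15% for the two-block run). The function name relative_size is hypothetical, and the arithmetic assumes a shell with 64-bit integers:

```shell
#!/bin/sh
# relative_size RESULT BASELINE
# Prints RESULT as a percentage of BASELINE, truncated to two decimals.
# Hypothetical helper for illustration; assumes 64-bit shell arithmetic,
# since RESULT * 10000 does not fit in 32 bits for these file sizes.
relative_size() {
    hundredths=$(( $1 * 10000 / $2 ))
    printf '%d.%02d\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

# Byte counts taken from the "xz -9" table above.
relative_size 79545192 79420204   # two blocks
relative_size 84872516 79420204   # 64 blocks
relative_size 96290324 79420204   # 1024 blocks
```

Integer arithmetic is used on purpose: it avoids floating point formatting differences between shells and reproduces the truncation seen in the tables.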
Command | Block size (bytes) | Resulting size (bytes) | Relative size (100% = single block) | Comment |
xz < linux-3.15.6.tar | wc -c | NA | 82223868 | 103.53% compared with xz -9 | Compression without splitjob |
splitjob -j 1 -b 571576320 xz < linux-3.15.6.tar | wc -c | 571576320 | 82223868 | 100% | Only one single process as the block size is the same as the entire file size. |
splitjob -j 2 -b 285788160 xz < linux-3.15.6.tar | wc -c | 285788160 | 82269680 | 100.05% | File split into two blocks of equal size which are compressed in parallel. |
splitjob -j 2 -b 142894080 xz < linux-3.15.6.tar | wc -c | 142894080 | 82345544 | 100.14% | File split into four blocks of equal size which are compressed two and two in parallel. |
splitjob -j 2 -b 71447040 xz < linux-3.15.6.tar | wc -c | 71447040 | 82496700 | 100.33% | ...eight blocks... |
splitjob -j 2 -b 35723520 xz < linux-3.15.6.tar | wc -c | 35723520 | 82752564 | 100.64% | ...16 blocks... |
splitjob -j 2 -b 17861760 xz < linux-3.15.6.tar | wc -c | 17861760 | 83411432 | 101.44% | ...32 blocks... |
splitjob -j 2 -b 10M xz < linux-3.15.6.tar | wc -c | 10485760 | 84431992 | 102.68% | Documentation example 10 MB block size to reduce output size overhead |
splitjob -j 2 -b 8930880 xz < linux-3.15.6.tar | wc -c | 8930880 | 84874000 | 103.22% | ...64 blocks... |
splitjob -j 2 -b 4465440 xz < linux-3.15.6.tar | wc -c | 4465440 | 86691036 | 105.43% | ...128 blocks... |
splitjob -j 2 -b 2232720 xz < linux-3.15.6.tar | wc -c | 2232720 | 89145316 | 108.41% | ...256 blocks... |
splitjob -j 2 -b 1116360 xz < linux-3.15.6.tar | wc -c | 1116360 | 92317564 | 112.27% | ...512 blocks... |
splitjob -j 2 xz < linux-3.15.6.tar | wc -c | 1048576 | 92645396 | 112.67% | Splitjob default block size |
splitjob -j 2 -b 558180 xz < linux-3.15.6.tar | wc -c | 558180 | 96290324 | 117.10% | File split into 1024 blocks of equal size which are compressed two and two in parallel. |
It might be worth noting that with a splitjob block size of 1 MB or lower,
"xz -9" gives exactly the same result as "xz", which is the same as "xz -6".
This probably says something about how xz uses different block sizes for
different compression levels. Further testing has shown that
Command | Block size (bytes) | Resulting size (bytes) | Relative size (100% = single block) | Comment |
bzip2 < linux-3.15.6.tar | wc -c | NA | 95116256 | 115.67% compared with xz, 119.76% compared with xz -9 | Compression without splitjob |
splitjob -j 1 -b 571576320 bzip2 < linux-3.15.6.tar | wc -c | 571576320 | 95116256 | 100% | Only one single process as the block size is the same as the entire file size. |
splitjob -j 2 -b 285788160 bzip2 < linux-3.15.6.tar | wc -c | 285788160 | 95162155 | 100.04% | File split into two blocks of equal size which are compressed in parallel. |
splitjob -j 2 -b 142894080 bzip2 < linux-3.15.6.tar | wc -c | 142894080 | 95179351 | 100.06% | File split into four blocks of equal size which are compressed two and two in parallel. |
splitjob -j 2 -b 71447040 bzip2 < linux-3.15.6.tar | wc -c | 71447040 | 95210226 | 100.09% | ...eight blocks... |
splitjob -j 2 -b 35723520 bzip2 < linux-3.15.6.tar | wc -c | 35723520 | 95248377 | 100.13% | ...16 blocks... |
splitjob -j 2 -b 17861760 bzip2 < linux-3.15.6.tar | wc -c | 17861760 | 95247052 | 100.13% | ...32 blocks... |
splitjob -j 2 -b 10M bzip2 < linux-3.15.6.tar | wc -c | 10485760 | 95272415 | 100.16% | Documentation example 10 MB block size to reduce output size overhead |
splitjob -j 2 -b 8930880 bzip2 < linux-3.15.6.tar | wc -c | 8930880 | 95393785 | 100.29% | ...64 blocks... |
splitjob -j 2 -b 4465440 bzip2 < linux-3.15.6.tar | wc -c | 4465440 | 95472544 | 100.37% | ...128 blocks... |
splitjob -j 2 -b 2232720 bzip2 < linux-3.15.6.tar | wc -c | 2232720 | 96200225 | 101.13% | ...256 blocks... |
splitjob -j 2 -b 1116360 bzip2 < linux-3.15.6.tar | wc -c | 1116360 | 97019953 | 102.00% | ...512 blocks... |
splitjob -j 2 bzip2 < linux-3.15.6.tar | wc -c | 1048576 | 96775192 | 101.74% | Splitjob default block size |
splitjob -j 2 -b 558180 bzip2 < linux-3.15.6.tar | wc -c | 558180 | 98207747 | 103.25% | File split into 1024 blocks of equal size which are compressed two and two in parallel. |
Command | Block size (bytes) | Resulting size (bytes) | Relative size (100% = single block) | Comment |
gzip < linux-3.15.6.tar | wc -c | NA | 121474801 | 127.71% compared with bzip2, 147.73% compared with xz, 152.95% compared with xz -9 | Compression without splitjob |
splitjob -j 1 -b 571576320 gzip < linux-3.15.6.tar | wc -c | 571576320 | 121474801 | 100% | Only one single process as the block size is the same as the entire file size. |
splitjob -j 2 -b 285788160 gzip < linux-3.15.6.tar | wc -c | 285788160 | 121479493 | 100.00% | File split into two blocks of equal size which are compressed in parallel. |
splitjob -j 2 -b 142894080 gzip < linux-3.15.6.tar | wc -c | 142894080 | 121477971 | 100.00% | File split into four blocks of equal size which are compressed two and two in parallel. |
splitjob -j 2 -b 71447040 gzip < linux-3.15.6.tar | wc -c | 71447040 | 121482091 | 100.00% | ...eight blocks... |
splitjob -j 2 -b 35723520 gzip < linux-3.15.6.tar | wc -c | 35723520 | 121487334 | 100.01% | ...16 blocks... |
splitjob -j 2 -b 17861760 gzip < linux-3.15.6.tar | wc -c | 17861760 | 121502238 | 100.02% | ...32 blocks... |
splitjob -j 2 -b 10M gzip < linux-3.15.6.tar | wc -c | 10485760 | 121521501 | 100.03% | Documentation example 10 MB block size to reduce output size overhead |
splitjob -j 2 -b 8930880 gzip < linux-3.15.6.tar | wc -c | 8930880 | 121532538 | 100.04% | ...64 blocks... |
splitjob -j 2 -b 4465440 gzip < linux-3.15.6.tar | wc -c | 4465440 | 121592245 | 100.09% | ...128 blocks... |
splitjob -j 2 -b 2232720 gzip < linux-3.15.6.tar | wc -c | 2232720 | 121699102 | 100.18% | ...256 blocks... |
splitjob -j 2 -b 1116360 gzip < linux-3.15.6.tar | wc -c | 1116360 | 121932408 | 100.37% | ...512 blocks... |
splitjob -j 2 gzip < linux-3.15.6.tar | wc -c | 1048576 | 121957733 | 100.39% | Splitjob default block size |
splitjob -j 2 -b 558180 gzip < linux-3.15.6.tar | wc -c | 558180 | 122405005 | 100.76% | File split into 1024 blocks of equal size which are compressed two and two in parallel. |
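One way to compare the four tables is to look at the average number of extra output bytes per additional block. The sketch below derives that from the 1024-block rows and the single-block rows above. It is only rough arithmetic on the published numbers: the figure mixes per-block format overhead with compression lost because each block is compressed independently of the others. The function name extra_bytes_per_block is hypothetical.

```shell
#!/bin/sh
# extra_bytes_per_block SPLIT_SIZE SINGLE_SIZE BLOCKS
# Average growth in output bytes for each block beyond the first,
# truncated to a whole number. Hypothetical helper for illustration.
extra_bytes_per_block() {
    echo $(( ($1 - $2) / ($3 - 1) ))
}

# 1024-block result and single-block result, taken from the tables above.
extra_bytes_per_block 96290324 79420204 1024     # xz -9  -> 16490
extra_bytes_per_block 96290324 82223868 1024     # xz     -> 13750
extra_bytes_per_block 98207747 95116256 1024     # bzip2  -> 3021
extra_bytes_per_block 122405005 121474801 1024   # gzip   -> 909
```

By this measure the cost of splitting is largest for xz and smallest for gzip, in line with the relative sizes in the tables.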