Parallel processing means running multiple parts of a repetitive task at the same time. Any recent CPU can run different programs on different CPU cores, so using more than one core to complete a task can greatly shorten the time it takes to finish.
Linux is fully multi-process capable and provides easy-to-use tools to parallelize processing. By default, a program runs on one CPU core at a time, and unless the program itself can use multiple cores, the time it takes to finish depends on the speed of a single core. This makes CPU-bound tasks that consist of many independent pieces the best candidates for parallelization.
Not every workload reacts well to parallel processing. For example, conventional (non-SSD) disk IO can become orders of magnitude slower if two processes read or write the disk at the same time, because the disk head has to keep jumping back and forth between different areas of the HDD to serve both of them.
Background tasks in shell, job control
In a Linux shell, a task can be started in the background by appending “&” (ampersand) to the end of the command. In a way, this is already parallel processing: the shell returns instantly while the task keeps running in the background (unless it’s waiting for user input). Once the background process completes, the shell displays a message to let you know.
$ sleep 2 &
[1] 64433
$
$
$
[1]+  Done                    sleep 2
In this example, I started a command (sleep 2, basically wait for two seconds), then pressed [ENTER] a few times until the two seconds passed. Then the shell displayed that the job was done.
While a task is running in the background, the command “fg” (short for foreground) brings it to the foreground. A task running in the foreground can be moved to the background by pressing “CTRL-Z”, which suspends it, then typing “bg” (short for background) to let it continue running in the background.
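In a script, where “fg” and “bg” are rarely useful, the same idea is usually expressed with the shell variable $! and the “wait” builtin. The following is only a minimal sketch of that pattern, with “sleep 1” standing in for real work:

```shell
#!/usr/bin/env bash
# Start a task in the background; the shell continues immediately.
sleep 1 &
pid=$!          # $! holds the PID of the most recently started background job

echo "started background job with PID $pid"

# Block until that specific job finishes, then pick up its exit status.
wait "$pid"
status=$?
echo "job $pid finished with exit status $status"
```

Because “wait” returns the exit status of the job it waited for, a script can tell whether the background task succeeded.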
Let’s start 10 programs at the same time with a loop in shell and show all of them running at the same time, then wait a bit and see how some of them are gone.
$ for ((i=1;i<=10;i++)); do (sleep $i &); done; ps ux | grep sleep; sleep 3; echo '--'; ps aux | grep sleep techtipbits 64543 0.0 0.0 5260 748 pts/0 S 15:30 0:00 sleep 1 techtipbits 64545 0.0 0.0 5260 752 pts/0 S 15:30 0:00 sleep 2 techtipbits 64547 0.0 0.0 5260 748 pts/0 S 15:30 0:00 sleep 3 techtipbits 64549 0.0 0.0 5260 744 pts/0 S 15:30 0:00 sleep 4 techtipbits 64551 0.0 0.0 5260 744 pts/0 S 15:30 0:00 sleep 5 techtipbits 64553 0.0 0.0 5260 676 pts/0 S 15:30 0:00 sleep 6 techtipbits 64555 0.0 0.0 5260 744 pts/0 S 15:30 0:00 sleep 7 techtipbits 64557 0.0 0.0 5260 676 pts/0 S 15:30 0:00 sleep 8 techtipbits 64559 0.0 0.0 5260 748 pts/0 S 15:30 0:00 sleep 9 techtipbits 64561 0.0 0.0 5260 728 pts/0 S 15:30 0:00 sleep 10 techtipbits 64563 0.0 0.0 6076 892 pts/0 S+ 15:30 0:00 grep sleep -- techtipbits 64549 0.0 0.0 5260 744 pts/0 S 15:30 0:00 sleep 4 techtipbits 64551 0.0 0.0 5260 744 pts/0 S 15:30 0:00 sleep 5 techtipbits 64553 0.0 0.0 5260 676 pts/0 S 15:30 0:00 sleep 6 techtipbits 64555 0.0 0.0 5260 744 pts/0 S 15:30 0:00 sleep 7 techtipbits 64557 0.0 0.0 5260 676 pts/0 S 15:30 0:00 sleep 8 techtipbits 64559 0.0 0.0 5260 748 pts/0 S 15:30 0:00 sleep 9 techtipbits 64561 0.0 0.0 5260 728 pts/0 S 15:30 0:00 sleep 10 techtipbits 64566 0.0 0.0 6076 884 pts/0 S+ 15:30 0:00 grep sleep
The “for” statement creates a loop from 1 to 10 and runs a “sleep X” command on each pass, with X being the loop variable. Then it displays all processes matching sleep (ps | grep), waits three seconds and does it again. You can see that after 3 seconds the first three sleep commands have already finished, so only sleep 4 to sleep 10 remain.
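One detail worth knowing: wrapping the command in parentheses, as in “(sleep $i &)”, starts it from a subshell, so the calling shell does not track it as one of its own jobs and cannot wait for it. When a script needs to continue only after every background task has finished, a common pattern is to drop the parentheses and call “wait”. This is a small sketch of that variation, not a replacement for the loop above:

```shell
#!/usr/bin/env bash
SECONDS=0               # bash built-in timer, counts whole seconds

for i in 1 2 3; do
  sleep "$i" &          # no surrounding ( ), so the shell tracks each job
done

wait                    # returns once every background job has finished
elapsed=$SECONDS
echo "all three jobs finished after about ${elapsed}s"
```

Run one after another, the three sleeps would need six seconds; started in parallel, the loop finishes roughly when the longest one does.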
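Let’s do something useful: for each JPEG file in a folder, run jpegoptim on it, all of them at the same time. Note that the “-n” option used in these examples tells jpegoptim to only report what it could save without modifying the files; drop it to actually optimize them.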
for i in *jpg; do (jpegoptim -n "$i" &); done
The greatest advantage of this method is its simplicity. Note how we’re putting quotes around $i to avoid issues with filenames that contain spaces. There are a few drawbacks, though – we can’t control the number of processes and these kinds of scripts don’t work well (if at all) for thousands of files in the same folder. It’s also difficult to monitor progress and every process spawns a new shell, so starting these processes is slow. It’s still much quicker than doing it one by one though.
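The lack of control over the number of processes can be worked around in plain bash. The sketch below assumes bash 4.3 or newer (for “wait -n”) and uses a hypothetical process_file function and made-up file names in place of a real jpegoptim run, so it can be tried without any images:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for real work such as: jpegoptim "$1"
process_file() {
  sleep 0.2
  echo "processed $1"
}

max_jobs=4       # assumed limit; tune it to the machine
running=0
log=$(mktemp)    # collect the output so it can be inspected afterwards

for f in file1.jpg file2.jpg file3.jpg file4.jpg file5.jpg file6.jpg; do
  if (( running >= max_jobs )); then
    wait -n                       # bash 4.3+: wait for any one job to finish
    running=$((running - 1))
  fi
  process_file "$f" >> "$log" &
  running=$((running + 1))
done

wait                              # wait for the remaining jobs
processed=$(grep -c '^processed ' "$log")
echo "processed $processed files"
rm -f "$log"
```

This keeps at most four jobs alive at any time, but it is already more code than most people want to type on a command line, which is where dedicated tools come in.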
Two common Linux tools make parallel processing easier: parallel and xargs. They have similar capabilities, taking a stream of data and executing a sub-process for each item (or group of items) in that stream, but parallel can also split jobs between different computers.
Xargs works by building commands from its standard input. The following example shows a file with five lines, then feeds it into xargs, which combines the lines into one command:
$ cat numbers.txt
one
two
three
four
five
$ xargs echo < numbers.txt
one two three four five
In the end it made the command “echo one two three four five”. By default it adds as many parameters to a command as possible but it’s possible to control this behavior by using the “-n” parameter. Here it’s building “echo” commands with two parameters at a time.
$ xargs -n2 echo < numbers.txt
one two
three four
five
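To try this without preparing the input by hand, the file can be generated with printf. The snippet below is a small sketch that writes numbers.txt into a temporary directory (an assumption made here to avoid cluttering the current folder) and groups the words three at a time:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Recreate the five-line input file used in the examples.
printf '%s\n' one two three four five > "$workdir/numbers.txt"

# Build echo commands with up to three parameters each.
output=$(xargs -n3 echo < "$workdir/numbers.txt")
echo "$output"

rm -rf "$workdir"
```

With five input words and “-n3”, xargs runs echo twice: once with three parameters and once with the remaining two.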
To make it work in parallel, we need to set two things: the number of parameters it should use to build each command (-n) and the number of processes to run at the same time (-P, default 1).
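The effect of “-P” is easy to see with a command whose run time is known. The following sketch feeds three “1”s to xargs so that it runs “sleep 1” three times, first one at a time and then all at once; it assumes an xargs that supports “-P”, which the GNU and BSD versions do:

```shell
#!/usr/bin/env bash
# Three "sleep 1" commands, one parameter each (-n1), one at a time (-P1).
SECONDS=0
printf '%s\n' 1 1 1 | xargs -n1 -P1 sleep
serial_time=$SECONDS

# The same three commands, up to three at once (-P3).
SECONDS=0
printf '%s\n' 1 1 1 | xargs -n1 -P3 sleep
parallel_time=$SECONDS

echo "with -P1: about ${serial_time}s, with -P3: about ${parallel_time}s"
```

The serial run should take about three seconds and the parallel one about one second, since the sleeps overlap.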
Let’s feed xargs the list of files from the previous example (jpegs) then run jpegoptim for each using xargs:
find . -type f -name '*.jpg' -print0 | xargs -0 -n1 -P8 jpegoptim -n
For each file whose name matches “*.jpg”, it builds a “jpegoptim -n filename” command and runs up to 8 of them in parallel. The “find -print0” and “xargs -0” options work together to avoid problems with spaces in filenames.
We can measure efficiency by testing different -P levels. (I’m running this on a 4-core Xeon with hyperthreading enabled, so 4 physical and 8 virtual cores.)
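To see why the NUL-separated form matters, the sketch below creates a single file with a space in its name (a made-up name, in a temporary directory that is assumed to have no spaces in its own path) and counts how many parameters xargs ends up passing along in each mode:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/holiday photo.jpg"     # one file, with a space in its name

# Newline-separated names: xargs splits on the space and sees two parameters.
plain=$(find "$workdir" -type f -name '*.jpg' | xargs -n1 echo | wc -l)

# NUL-separated names: the file name arrives as a single parameter.
safe=$(find "$workdir" -type f -name '*.jpg' -print0 | xargs -0 -n1 echo | wc -l)

echo "without -print0/-0: $plain parameter(s)"
echo "with -print0/-0: $safe parameter(s)"
rm -rf "$workdir"
```

Without the NUL separator, the one file is split into two bogus names, neither of which exists; with it, the name is passed through intact.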
$ time find . -type f -name '*jpg' -print0 | xargs -0 -n1 -P1 jpegoptim -n >/dev/null
real 0m4.108s
$ time find . -type f -name '*jpg' -print0 | xargs -0 -n1 -P4 jpegoptim -n >/dev/null
real 0m1.606s
$ time find . -type f -name '*jpg' -print0 | xargs -0 -n1 -P8 jpegoptim -n >/dev/null
real 0m1.388s
$ time find . -type f -name '*jpg' -print0 | xargs -0 -n1 -P20 jpegoptim -n >/dev/null
real 0m1.394s
This goes to show that HT cores don’t help much: going from -P4 to -P8 cuts only about 14% off the run time (1.606s to 1.388s), and adding even more processes doesn’t help at all. Bottom line: for a CPU-bound job like this one, it’s best to use the number of physical cores available.
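Rather than hard-coding the -P value, a script can look the core count up. The sketch below is one possible way to do it, assuming GNU coreutils’ nproc is available and, optionally, lscpu from util-linux; the fallback logic is an assumption of this example, not something the measurements above depend on:

```shell
#!/usr/bin/env bash
# Logical CPUs (hyperthreads included), as reported by coreutils' nproc.
logical=$(nproc)

# Physical cores: count unique core,socket pairs from lscpu, if it is installed.
if command -v lscpu >/dev/null 2>&1; then
  physical=$(lscpu -p=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l)
else
  physical=$logical     # fallback assumption: no information about hyperthreading
fi

# Guard against an empty or zero result, e.g. inside a restricted container.
(( physical >= 1 )) || physical=$logical

echo "logical CPUs: $logical, physical cores: $physical"
# Example use (not run here):
#   find . -type f -name '*.jpg' -print0 | xargs -0 -n1 -P"$physical" jpegoptim -n
```

On the machine used for the timings above, this would be expected to report 8 logical CPUs and 4 physical cores.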
In the next article, we’re going to look at GNU parallel and its advanced uses. It allows even more control and a possibility to spread tasks between remote computers, too.