Parallel processing in Linux

Parallel processing means running several instances of a repetitive task at the same time. Any recent CPU can run different programs on different CPU cores, so using more than one core to complete a task can greatly shorten the time it takes to finish.

Linux itself is fully multi-process capable and provides easy-to-use tools for parallelizing work. A single process runs on one CPU core at a time, so unless a program is itself capable of using multiple cores, the time it takes to finish depends on the speed of a single core. In other words, CPU-bound processes are the best candidates for parallelization.

Not every workload reacts well to parallel processing. For example, conventional (non-SSD) disk I/O becomes orders of magnitude slower when two processes read or write the disk at the same time, because the disk head has to keep jumping back and forth between different areas of the HDD to serve them.

Background tasks in shell, job control

In Linux, a task can be started in the background by appending “&” (ampersand) to the end of the command. In a way, this is already parallel processing: the task keeps running in the background (unless it’s waiting for user input) and the shell returns instantly. Once the background process completes, the shell displays a message to let you know it’s done.

$ sleep 2 &
[1] 64433
$ 
$ 
$ 
[1]+  Done                    sleep 2

In this example, I started a command (sleep 2, which simply waits for two seconds), then pressed [ENTER] a few times until the two seconds passed. The shell then displayed that the job was done.

While a task is running in the background, the command “fg” (short for foreground) brings it to the foreground. A running foreground task can also be suspended by pressing “CTRL-Z”, then typing “bg” (short for background) resumes it in the background.
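Related to job control, the shell built-in “wait” blocks until background jobs finish, which is handy in scripts that launch several tasks at once. A minimal sketch (the sleep durations are arbitrary):

```shell
# Start two timed tasks in the background.
sleep 1 &
sleep 2 &

# "wait" with no arguments blocks until every background
# job of this shell has exited.
wait
echo "all background jobs finished"
```

Without the “wait”, the script would reach the echo immediately, while the sleeps were still running.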

Let’s start 10 programs at the same time with a loop in shell and show all of them running at the same time, then wait a bit and see how some of them are gone.

$ for ((i=1;i<=10;i++)); do (sleep $i &); done; ps ux | grep sleep; sleep 3; echo '--'; ps ux | grep sleep
techtipbits     64543  0.0  0.0   5260   748 pts/0    S    15:30   0:00 sleep 1
techtipbits     64545  0.0  0.0   5260   752 pts/0    S    15:30   0:00 sleep 2
techtipbits     64547  0.0  0.0   5260   748 pts/0    S    15:30   0:00 sleep 3
techtipbits     64549  0.0  0.0   5260   744 pts/0    S    15:30   0:00 sleep 4
techtipbits     64551  0.0  0.0   5260   744 pts/0    S    15:30   0:00 sleep 5
techtipbits     64553  0.0  0.0   5260   676 pts/0    S    15:30   0:00 sleep 6
techtipbits     64555  0.0  0.0   5260   744 pts/0    S    15:30   0:00 sleep 7
techtipbits     64557  0.0  0.0   5260   676 pts/0    S    15:30   0:00 sleep 8
techtipbits     64559  0.0  0.0   5260   748 pts/0    S    15:30   0:00 sleep 9
techtipbits     64561  0.0  0.0   5260   728 pts/0    S    15:30   0:00 sleep 10
techtipbits     64563  0.0  0.0   6076   892 pts/0    S+   15:30   0:00 grep sleep
--
techtipbits     64549  0.0  0.0   5260   744 pts/0    S    15:30   0:00 sleep 4
techtipbits     64551  0.0  0.0   5260   744 pts/0    S    15:30   0:00 sleep 5
techtipbits     64553  0.0  0.0   5260   676 pts/0    S    15:30   0:00 sleep 6
techtipbits     64555  0.0  0.0   5260   744 pts/0    S    15:30   0:00 sleep 7
techtipbits     64557  0.0  0.0   5260   676 pts/0    S    15:30   0:00 sleep 8
techtipbits     64559  0.0  0.0   5260   748 pts/0    S    15:30   0:00 sleep 9
techtipbits     64561  0.0  0.0   5260   728 pts/0    S    15:30   0:00 sleep 10
techtipbits     64566  0.0  0.0   6076   884 pts/0    S+   15:30   0:00 grep sleep

The “for” statement loops from 1 to 10, running a “sleep X” command each time, with X being the loop variable. It then lists all processes matching sleep (ps | grep), waits three seconds and lists them again. You can see that after 3 seconds the first three sleep commands have already finished, so only sleep 4-10 remain.

Let’s do something useful: optimize every JPEG file in a folder using jpegoptim, all of them at the same time.

for i in *jpg; do (jpegoptim -n "$i" &); done

The greatest advantage of this method is its simplicity. Note how we’re putting quotes around $i to avoid issues with filenames that contain spaces. There are a few drawbacks, though – we can’t control the number of processes and these kinds of scripts don’t work well (if at all) for thousands of files in the same folder. It’s also difficult to monitor progress and every process spawns a new shell, so starting these processes is slow. It’s still much quicker than doing it one by one though.
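One of those drawbacks, the lack of control over the number of processes, can be worked around even in plain shell. A sketch using bash’s “wait -n” (available in bash 4.3 and newer; the limit of 4 is an arbitrary choice for illustration):

```shell
# Cap the number of concurrent jpegoptim processes at MAX.
MAX=4
for i in *.jpg; do
  # While MAX jobs are already running, block until any one exits.
  while [ "$(jobs -r | wc -l)" -ge "$MAX" ]; do
    wait -n
  done
  jpegoptim -n "$i" &
done
wait   # let the remaining jobs finish
```

This keeps the simplicity of the loop, but as the next section shows, xargs handles the same problem with far less ceremony.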

Xargs

Two common Linux tools make parallel processing easier: parallel and xargs. They have similar capabilities, both taking a stream of data and executing a sub-process for each item in that stream, but parallel can also split jobs between different computers.

Xargs works by building commands from its standard input. The following commands display a five-line file, then feed it into xargs, which combines the lines into a single command:

$ cat numbers.txt 
one
two
three
four
five
$ xargs echo < numbers.txt 
one two three four five

In the end it ran the command “echo one two three four five”. By default, xargs appends as many parameters to a command as possible, but this behavior can be controlled with the “-n” parameter. Here it builds “echo” commands with two parameters at a time:

$ xargs -n2 echo < numbers.txt 
one two
three four
five

To make it work in parallel, we need to set two things: the number of parameters used to build each command (-n) and the number of parallel processes (-P, default 1).

Let’s feed xargs the list of files from the previous example (jpegs) then run jpegoptim for each using xargs:

find . -type f -name '*.jpg' -print0 | xargs -0 -n1 -P8 jpegoptim -n

For each file matching “*.jpg” it builds a “jpegoptim -n filename” command and runs up to 8 of them in parallel. “find -print0” and “xargs -0” work together to avoid problems with spaces in filenames.
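To see why the null-delimited pair matters, here is a small self-contained demonstration (it creates and removes a throwaway directory; “echo got:” stands in for a real command):

```shell
# Create a scratch directory with one awkward and one plain filename.
dir=$(mktemp -d)
touch "$dir/with space.jpg" "$dir/plain.jpg"

# -print0 separates names with NUL bytes, and -0 tells xargs to
# split on NUL instead of whitespace, so "with space.jpg" stays
# a single argument.
find "$dir" -type f -name '*.jpg' -print0 | xargs -0 -n1 echo got:

rm -r "$dir"
```

Without “-print0”/“-0”, the filename containing a space would be split into two bogus arguments.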

We can measure efficiency by testing different -P levels (I’m running this on a 4-core Xeon with hyperthreading enabled, so 4 physical and 8 logical cores):

$ time find . -type f -name '*jpg' -print0 | xargs -0 -n1 -P1 jpegoptim -n >/dev/null
real	0m4.108s

$ time find . -type f -name '*jpg' -print0 | xargs -0 -n1 -P4 jpegoptim -n >/dev/null
real	0m1.606s

$ time find . -type f -name '*jpg' -print0 | xargs -0 -n1 -P8 jpegoptim -n >/dev/null
real	0m1.388s

$ time find . -type f -name '*jpg' -print0 | xargs -0 -n1 -P20 jpegoptim -n >/dev/null
real	0m1.394s

This goes to show that HT cores don’t help much: the difference between -P4 and -P8 is about 15%, and adding even more processes doesn’t help at all. Bottom line: it’s best to use the number of physical cores available.
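To pick that number without hardcoding it, the coreutils “nproc” command reports the logical CPU count, and “lscpu” (part of util-linux, assumed installed here) can be used to count physical cores:

```shell
# Logical CPUs (includes hyperthreads):
nproc

# Physical cores: count the unique (core, socket) pairs reported
# by lscpu in its parseable output, skipping the comment lines.
lscpu -p=CORE,SOCKET | grep -v '^#' | sort -u | wc -l
```

Either value can then be fed straight into xargs, e.g. -P"$(nproc)".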

In the next article, we’re going to look at GNU parallel and its advanced uses. It allows even more control and a possibility to spread tasks between remote computers, too.
