Having a proper backup solution is one of the most important tools to ensure data safety and integrity. Linux systems have no built-in backup system like “Windows Backup” in Windows 10 or “Time Machine” in macOS; however, Duplicity provides a free and reliable solution for taking automated file-system-level backups of any Linux system.
The proper backup strategy
No backup solution provides complete safety against all possible threats, so a proper backup strategy should combine several different approaches to increase the availability of data and, preferably, offer multiple ways to recover anything that’s been lost.
The golden rule of “3-2-1” says that there should be at least 3 different versions of data copies at any given time, stored on 2 different kinds of media and at least 1 of them should be off-site.
3 copies – A proper backup system stores multiple copies of the same data, taken at different points in time, so that any incident affecting multiple versions of the same file can be recovered from easily. It also increases the chances of a successful recovery should one of the copies somehow become corrupted.
2 media – Backups should not be stored on the same media as the original copy, to protect against corruption, theft or loss of the hard drive or SSD that stores the information. A backup that is destroyed at the same time as the original data is not really a backup.
1 off-site location – At least one of the backups should be stored at a different location to protect against physical damage, fire or theft.
Other backup considerations
Data integrity and recovery from backups should be tested periodically: a backup that can’t be used to recover data is completely useless, and it gives a false sense of security, which is even worse than having no backups at all.
Another important consideration is to make sure that the second or third copy of the data can’t be tampered with from the original computer. Modern ransomware specifically looks for backups on nearby computers and local networks to make sure that vulnerable systems can’t be recovered easily. It’s therefore essential that some of the backup copies be kept offline, or in an air-gapped system at a different location that can’t be accessed from the original machine.
In the golden age of tape backups this was rather easy: multiple tapes held copies of the same data and were kept in safe storage. Nowadays, in the age of cloud backups and giant hard drives, easily accessible online storage has largely replaced tapes.
Duplicity backups
Duplicity is a free cross-platform command-line software solution that can create file backups that are
- encrypted: data is protected both in transit and at rest
- incremental: only changes since the last backup need to be saved, without having to make new full backups
- versioned: multiple versions of the same file can be recovered
- local or remote: data can be stored at local or remote locations automatically using various services such as SFTP, FTP, SSH, RSYNC, Amazon S3, Google Cloud Storage
- signed: data integrity is ensured and can be checked automatically
- saves meta-information: all file meta-information (ownership, permissions, timestamps) is saved and can be restored
It is free software, published under the GNU GPL. Most Linux distributions offer Duplicity packages that can be installed using the built-in package manager. On Debian Linux, it can simply be installed by running “apt-get install duplicity” in a terminal as root.
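For other distributions, the package manager invocation differs; the Fedora and Arch package names below are assumptions to check against your distribution’s repositories:

```shell
# Debian / Ubuntu (as root)
apt-get install duplicity

# Fedora (assumed package name)
dnf install duplicity

# Arch Linux (assumed package name)
pacman -S duplicity

# Verify that the installation worked
duplicity --version
```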
It’s important to note that Duplicity provides file-level backups: it can be used to recover lost files (or even a complete system) after data corruption, but it can’t be used to recover whole partitions (and especially the boot manager and partition table of a corrupted hard drive) in case of total data loss.
Some parts of a running Linux system – e.g. database files that can change while they are being backed up – can become corrupted when backed up at the file system level, so it’s a good idea to back those up in other ways, for example by making daily or hourly database dumps into separate files and backing up those files instead. Files are copied one by one while Duplicity is running, so the whole system will not be backed up in a consistent state. This is not really a shortcoming of Duplicity but the way file backups work – it’s something to keep in mind when planning a backup strategy.
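As a sketch of that approach for a MySQL/MariaDB server, a small script can dump the database to a stable file first and tell Duplicity to skip the live database files (the database name, credentials file and paths below are hypothetical):

```shell
#!/bin/sh
# Sketch: dump the database to a consistent file, then back up
# the dump instead of the live (possibly inconsistent) data files.
set -e
DUMP_DIR=/var/backups/db-dumps
mkdir -p "$DUMP_DIR"

# --defaults-extra-file keeps credentials out of the command line;
# --single-transaction gives a consistent dump for InnoDB tables.
mysqldump --defaults-extra-file=/root/.my.cnf --single-transaction \
    mydatabase > "$DUMP_DIR/mydatabase-$(date +%F).sql"

# Back up /var, excluding the raw database files (the dump is included).
duplicity --exclude /var/lib/mysql /var file:///mnt/backup/var
```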
Making your first backup with Duplicity
To create a local encrypted backup of for example /home/win into a folder called “duplicity-test”:
$ duplicity full /home/win/ file://duplicity-test
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
GnuPG passphrase for decryption:
Retype passphrase for decryption to confirm:
--------------[ Backup Statistics ]--------------
StartTime 1619282164.80 (Sat Apr 24 18:36:04 2021)
EndTime 1619282164.87 (Sat Apr 24 18:36:04 2021)
ElapsedTime 0.07 (0.07 seconds)
SourceFiles 4
SourceFileSize 8649 (8.45 KB)
NewFiles 4
NewFileSize 8649 (8.45 KB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 4
RawDeltaSize 4553 (4.45 KB)
TotalDestinationSizeChange 2232 (2.18 KB)
Errors 0
-------------------------------------------------
Simple as that – by default, backups are encrypted and, as you can see, Duplicity asked for a password to protect our backup. This can be automated by defining the “PASSPHRASE” environment variable. It’s also possible to make unencrypted backups by adding “--no-encryption”, but it’s good practice to always keep backups encrypted – and it’s a must to save the backup password somewhere off-site.
Running it again (now with PASSPHRASE set in the environment to skip the prompt) makes a second full backup – a new “backup chain” in Duplicity terminology. Each full backup chain can have many incrementals added; see below.
$ export PASSPHRASE=test
$ duplicity full /home/win/ file://duplicity-test
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Sat Apr 24 18:41:34 2021
--------------[ Backup Statistics ]--------------
StartTime 1619282715.28 (Sat Apr 24 18:45:15 2021)
EndTime 1619282715.28 (Sat Apr 24 18:45:15 2021)
ElapsedTime 0.00 (0.00 seconds)
SourceFiles 4
SourceFileSize 8649 (8.45 KB)
NewFiles 4
NewFileSize 8649 (8.45 KB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 4
RawDeltaSize 4553 (4.45 KB)
TotalDestinationSizeChange 2233 (2.18 KB)
Errors 0
-------------------------------------------------
This time it didn’t ask for a password and automatically made a full backup. This means the command can be automated and run as a scheduled task from cron – because a backup that isn’t automatic is like having no backup at all.
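A minimal way to schedule this is a small wrapper script called from cron; the paths and the root-only passphrase file below are assumptions (reading the passphrase from a file keeps it out of the crontab itself):

```shell
#!/bin/sh
# /usr/local/bin/nightly-backup.sh (sketch)
# Read the passphrase from a file readable only by root.
export PASSPHRASE="$(cat /root/.backup-passphrase)"
duplicity /home/win/ file:///mnt/backup/duplicity-test
```

A matching cron entry, e.g. in /etc/cron.d/, could then run it every night at 02:30:

```shell
# /etc/cron.d/duplicity-backup (sketch)
30 2 * * * root /usr/local/bin/nightly-backup.sh
```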
Querying a list of backups can be done by running “duplicity collection-status URL” like this:
$ duplicity collection-status file://duplicity-test
Last full backup date: Sat Apr 24 18:45:15 2021
Collection Status
-----------------
Connecting with backend: BackendWrapper
Archive dir: /home/techtipbits/.cache/duplicity/87125eb5b4cfb4c498a3d0baab384e69
Found 2 secondary backup chains.
Secondary chain 1 of 2:
-------------------------
Chain start time: Sat Apr 24 18:36:01 2021
Chain end time: Sat Apr 24 18:36:01 2021
Number of contained backup sets: 1
Total number of contained volumes: 1
Type of backup set: Time: Num volumes:
Full Sat Apr 24 18:36:01 2021 1
-------------------------
Secondary chain 2 of 2:
-------------------------
Chain start time: Sat Apr 24 18:41:34 2021
Chain end time: Sat Apr 24 18:41:34 2021
Number of contained backup sets: 1
Total number of contained volumes: 1
Type of backup set: Time: Num volumes:
Full Sat Apr 24 18:41:34 2021 1
-------------------------
Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Sat Apr 24 18:45:15 2021
Chain end time: Sat Apr 24 18:45:15 2021
Number of contained backup sets: 1
Total number of contained volumes: 1
Type of backup set: Time: Num volumes:
Full Sat Apr 24 18:45:15 2021 1
-------------------------
No orphaned or incomplete backup sets found.
Now let’s run an incremental backup by replacing “full” with “incremental” on the command line. This will only save the files changed since the last backup – it can be run any number of times, saving only the difference since the last state.
$ duplicity incremental /home/win/ file://duplicity-test
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Sat Apr 24 18:45:15 2021
--------------[ Backup Statistics ]--------------
StartTime 1619282905.95 (Sat Apr 24 18:48:25 2021)
EndTime 1619282905.95 (Sat Apr 24 18:48:25 2021)
ElapsedTime 0.00 (0.00 seconds)
SourceFiles 4
SourceFileSize 8649 (8.45 KB)
NewFiles 0
NewFileSize 0 (0 bytes)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 0
RawDeltaSize 0 (0 bytes)
TotalDestinationSizeChange 111 (111 bytes)
Errors 0
-------------------------------------------------
Querying the collection-status once more now shows that our primary backup chain has an incremental backup, so we currently have 3 full backup chains, with 1 incremental added to the latest full.
$ duplicity collection-status file://duplicity-test
Last full backup date: Sat Apr 24 18:45:15 2021
Collection Status
-----------------
Connecting with backend: BackendWrapper
Archive dir: /home/techtipbits/.cache/duplicity/87125eb5b4cfb4c498a3d0baab384e69
Found 2 secondary backup chains.
Secondary chain 1 of 2:
-------------------------
Chain start time: Sat Apr 24 18:36:01 2021
Chain end time: Sat Apr 24 18:36:01 2021
Number of contained backup sets: 1
Total number of contained volumes: 1
Type of backup set: Time: Num volumes:
Full Sat Apr 24 18:36:01 2021 1
-------------------------
Secondary chain 2 of 2:
-------------------------
Chain start time: Sat Apr 24 18:41:34 2021
Chain end time: Sat Apr 24 18:41:34 2021
Number of contained backup sets: 1
Total number of contained volumes: 1
Type of backup set: Time: Num volumes:
Full Sat Apr 24 18:41:34 2021 1
-------------------------
Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Sat Apr 24 18:45:15 2021
Chain end time: Sat Apr 24 18:48:25 2021
Number of contained backup sets: 2
Total number of contained volumes: 2
Type of backup set: Time: Num volumes:
Full Sat Apr 24 18:45:15 2021 1
Incremental Sat Apr 24 18:48:25 2021 1
-------------------------
No orphaned or incomplete backup sets found.
Restore and verify files from backups
To restore files from a backup chain (the last one by default), simply use “duplicity restore” with the backup URL and the location (folder) to restore to:
$ duplicity restore file://duplicity-test restore-test
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Sat Apr 24 18:45:15 2021
Naturally, it’s very rare that one would want to restore the whole archive, so let’s restore only one file from it: /home/win/testfile. Note how “--file-to-restore” is relative to the backup’s root.
It’s further possible to use the “--time” parameter to set the age of the file to be restored – Duplicity will find the best match for the given point in time and restore the file to that state.
$ duplicity restore --file-to-restore testfile file://duplicity-test testfile
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Sat Apr 24 18:45:15 2021
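As a sketch of the “--time” parameter (assuming a backup from the requested time exists), it accepts both relative intervals like “3D” and explicit dates:

```shell
# Restore "testfile" as it was 3 days ago
duplicity restore --time 3D --file-to-restore testfile \
    file://duplicity-test testfile-3-days-ago

# The same, using an explicit date instead of an interval
duplicity restore --time 2021-04-21 --file-to-restore testfile \
    file://duplicity-test testfile-3-days-ago
```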
To verify our backups’ integrity, we should use “duplicity verify [url] [location]”:
$ duplicity verify file://duplicity-test /home/win
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Sat Apr 24 18:45:15 2021
Verify complete: 4 files compared, 0 differences found.
Remote backups and automatic incrementals
So far we’ve used Duplicity in local mode, saving backups on the same hard drive (in a different folder), in our example under ~/duplicity-test. The obvious shortcoming is that this doesn’t satisfy the 3-2-1 rule, so we should do better by backing up to a remote location. Duplicity supports plenty of remote file systems it can back up to. To take a complete full backup every week and incrementals daily, we should run:
$ export FTP_PASSWORD="testftppassword"
$ export PASSPHRASE="backuppassword"
$ duplicity --full-if-older-than 1W /home/win/ sftp://username@sftp.user.host/remotefolder
The FTP_PASSWORD environment variable sets the SFTP password to be used when logging in to the remote system, and the “--full-if-older-than 1W” option changes the default backup mode from full to incremental unless the last full backup is more than a week old. If, for example, this command is run once a day, it will make daily incrementals for 6 days, then 1 full backup, and repeat indefinitely.
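To keep the remote storage from growing forever, a retention step can be added after the backup. The sketch below keeps the 4 most recent full chains (roughly a month of history with weekly fulls); the chain count is an assumption to adjust to taste:

```shell
#!/bin/sh
# Sketch: daily backup with weekly fulls plus a retention policy.
export FTP_PASSWORD="testftppassword"
export PASSPHRASE="backuppassword"
TARGET=sftp://username@sftp.user.host/remotefolder

duplicity --full-if-older-than 1W /home/win/ "$TARGET"

# Delete all but the 4 newest full backup chains (and their incrementals).
# Without --force, duplicity only lists what it would delete.
duplicity remove-all-but-n-full 4 --force "$TARGET"
```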
It should be noted that this backup solution is still vulnerable to ransomware and intentional data corruption, because anyone able to read this script can log in to the remote location using SFTP and destroy the backups. To combat this, it’s best to either set up automatic snapshot versioning on the remote file system or use a provider that offers such a feature – e.g. taking an automatic snapshot at 8pm and running daily backups at 9pm. This way, previous versions of the complete backup set remain available in the event of accidental or intentional backup corruption.