Introduction
You will find here a collection of examples of use cases
for several features of the dar suite command-line tools.
Dar and remote backup
This topic aims to show the different methods available
to perform a remote backup (a backup of a system using a remote storage).
It does not describe the remote storage itself, nor the way to access it, but the
common ways to do so. For precise descriptions/recipes on how to use
dar with ssh, netcat, ftp or sftp, see the topics following this one.
Between these two hosts, we could also use NFS and use dar
as usual, possibly adding an IPsec VPN if the
underlying network is not secure (backup over the Internet, …); there is
nothing very complicated there and this is a valid solution.
We could also split the backup into very small slices (using dar's -s and
possibly -S options), slices that would be moved to/from the storage before the
backup process continues creating/reading the next one. We could even
make use of one or more of dar's -E, -F and -~ options to automate
the process and get a pretty viable backup workflow.
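The slice-handoff automation mentioned above can be sketched as a small hook script. This is a minimal sketch, not taken from the dar documentation: the function name and the STAGING variable are assumptions; dar invokes such a command through its -E option, substituting %p (slice directory), %b (archive basename) and %N (slice number).

```shell
# Hypothetical slice-handoff helper for dar's -E option.
# dar substitutes %p (slice directory), %b (archive basename) and
# %N (slice number) when calling the command given to -E, e.g.:
#   dar -c backup -s 100M -E 'handoff_slice %p %b %N'
# (written as a shell function here so the logic is easy to read)
handoff_slice() {
    dir="$1"; base="$2"; num="$3"
    staging="${STAGING:-/var/tmp/dar_staging}"   # assumed local staging area
    mkdir -p "$staging"
    # move the finished slice out of the way so dar can write the next
    # one while this one is shipped to the remote storage
    mv "$dir/$base.$num.dar" "$staging/"
    # a real setup would now transfer it, for example with scp or rsync
}
```

With small slices (-s), each completed slice is moved away immediately, so the local disk only ever needs room for a couple of slices at a time.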
But what if, for any reason, these previous methods are not acceptable in
our use context?
As a last resort, we can leverage the fact that dar can use its standard input
and output to work, and pipe these to any arbitrary command giving us the
greatest freedom available.
In the following we list two different ways to do so:
- single pipe
- dual pipes
Single pipe
Full Backup
dar can output its archive to its standard output instead of a given
file. To activate it, use “-” as basename. Here is an example:
dar -c - -R / -z | some_program
or
dar -c - -R / -z > named_pipe_or_file
Note that slicing is not available, as it does not have much meaning
when writing to a pipe. At the other end of the pipe (on the remote
host), the data can be redirected to a file with a proper filename
(something that matches "*.1.dar"):
some_other_program > backup_name.1.dar
It is also possible to redirect the output to dar_xform
which can in turn, on the
remote host, split the data flow into several slices, pausing between them
if necessary, exactly as dar is able to do:
some_other_program | dar_xform -s 100M - backup_name
This will create backup_name.1.dar, backup_name.2.dar
and so on. The resulting archive is totally
compatible with those directly generated by dar.
some_program and some_other_program can be anything you want.
Restoration
For restoration, the process implies that dar reads the archive from a pipe,
which is possible by adding the --sequential-read option. This
however has a drawback compared to the normal way dar behaves: dar can no
longer seek to where a given file's data is located but has to read
the whole backup sequentially (the way tar behaves). The only consequence
is a longer processing time, especially when restoring only a few files.
On the storage host, we would use:
dar_xform backup_name - | some_other_program
# or if archive is composed of a single slice
some_other_program < backup_name.1.dar
While on the host to restore we would use:
some_program | dar -x - --sequential-read ...other options...
Differential/incremental Backup
Here, with a single pipe, the only possible way is to rely on
catalogue isolation. This operation can be performed on the storage host
and the resulting isolated catalogue can then be transferred through a pipe
back to the host to back up. But there is a better way: on-fly isolation.
dar -c - -R / -z -@ isolated_full_catalogue | some_program
This will produce a small file named isolated_full_catalogue.1.dar
on the local host (the host to back up), something we can then use to
create a differential/incremental backup:
dar -c - -R / -z -@ isolated_diff_catalogue -A isolated_full_catalogue | some_program
We can then remove isolated_full_catalogue.1.dar
and keep the new isolated_diff_catalogue
to proceed further with incremental backups. For differential backups, we
would instead keep isolated_full_catalogue.1.dar and use the -@ option
to create an on-fly isolated catalogue only when creating the full backup.
The restoration process here is no different from what we saw above
for the full backup: we restore the full backup, then the differential
and incrementals, following their order of creation.
Dual pipes
To overcome the limited performance met when reading an archive through
a single pipe, we can use a pair of pipes instead and rely on
dar_slave on the remote storage host.
If we specify "-" as the backup basename for a reading operation
(-l, -t, -d, -x, or -A when used with -C or -c),
dar and dar_slave will use their standard input
and output to communicate. The input of the first is expected to
receive the output of the second and vice versa.
We could test this with a pair of named pipes todar
and toslave, using shell redirections on dar and dar_slave
to make the glue. But this will not work due to the shell behavior:
dar and dar_slave would get blocked upon opening the first named pipe,
waiting for the peer to open it too, even before they have started
(deadlock at shell level).
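The deadlock, and the usual way around it, can be seen with any two programs, not just dar and dar_slave. The sketch below uses cat and echo as stand-ins: starting one end of each fifo in the background lets both opens complete (all paths and data are illustrative).

```shell
# Opening a named pipe blocks until the other end is opened too.
# If both peers are started in the foreground, each waits for the
# other and neither ever runs. Backgrounding one side breaks the tie.
tmp=$(mktemp -d)
mkfifo "$tmp/todar" "$tmp/toslave"
# "dar_slave" side, in the background: consume orders, emit data
cat "$tmp/toslave" > /dev/null &
echo "archive data" > "$tmp/todar" &
# "dar" side, in the foreground: send an order, read the data back
echo "order" > "$tmp/toslave"
read data < "$tmp/todar"
rm -r "$tmp"
```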
To overcome this issue met with named pipes, the -i and -o options
help: they receive a filename as argument, which may be a named pipe.
The argument provided to -i is used instead of stdin and the one
provided to -o is used instead of stdout. Note that the -i and -o options are only
available when "-" is used as basename. Let's take an example:
Let's assume we want to restore an archive from the remote backup server.
There, we have to run dar_slave this way:
mkfifo /tmp/todar /tmp/toslave
dar_slave -o /tmp/todar -i /tmp/toslave backup_name
some_program_remote < /tmp/todar
some_other_program_remote > /tmp/toslave
We assume some_program_remote reads the data from /tmp/todar
and makes it available to the host we want to restore, for dar to be able
to read it, while some_other_program_remote
receives the output of dar and writes it to /tmp/toslave.
On the local host you have to run dar this way:
mkfifo /tmp/todar /tmp/toslave
dar -x - -i /tmp/todar -o /tmp/toslave ...other options...
some_program_local > /tmp/todar
some_other_program_local < /tmp/toslave
Here some_program_local communicates with
some_program_remote and writes the data received from dar_slave
to the /tmp/todar named pipe, while in the other direction
dar's output is read by some_other_program_local from
/tmp/toslave and then sent (by a way that is out of the scope
of this document) to some_other_program_remote, which in turn
makes it available to dar_slave as seen above.
This applies also to differential backups when it comes to reading the archive of
reference by means of the -A option. In the previous single-pipe context, we used
an isolated catalogue. We can still do the same here, but we can also leverage this
feature, especially when it comes to binary delta, which implies reading the delta
signatures in addition to the metadata, something not possible in
--sequential-read mode. We then come to the following architecture:
LOCAL HOST REMOTE HOST
+-----------------+ +-----------------------------+
| filesystem | | backup of reference |
| | | | | |
| | | | | |
| V | | V |
| +-----+ | backup of reference | +-----------+ |
| | DAR |--<-]=========================[-<--| DAR_SLAVE | |
| | |-->-]=========================[->--| | |
| +-----+ | orders to dar_slave | +-----------+ |
| | | | +-----------+ |
| +--->---]=========================[->--| DAR_XFORM |--> backup|
| | saved data | +-----------+ to slices|
+-----------------+ +-----------------------------+
with dar on the local host using the following syntax, reading the reference
archive from a pair of fifos (-A option) and producing the differential backup
to its standard output:
mkfifo /tmp/todar /tmp/toslave
some_program_local > /tmp/todar
some_other_program_local < /tmp/toslave
dar -c - -A - -i /tmp/todar -o /tmp/toslave ...other options... | some_third_program_local
While dar_slave is run this way on the remote host:
mkfifo /tmp/todar /tmp/toslave
some_program_remote < /tmp/todar
some_other_program_remote > /tmp/toslave
dar_slave -o /tmp/todar -i /tmp/toslave backup_of_reference
Last, dar_xform receives the differential backup and here
splits it into 1 giga slices, adding a sha1 hash to each:
some_third_program_remote | dar_xform -s 1G -3 sha1 - diff_backup
dar and netcat
The netcat (nc) program is a simple but insecure (no authentication,
no data ciphering) way to link dar with dar_slave or dar_xform
as presented in the previous topic.
The context in which the following examples take place is the one of a
"local" host named "flower" to be backed up or restored from/to a
remote host called "honey" (OK, the names of the machines are silly...).
Creating a full backup
on honey:
nc -l -p 5000 > backup.1.dar
then on flower:
dar -c - -R / -z | nc -w 3 honey 5000
But this will produce only one slice; instead, you could use the
following to have several slices on honey:
nc -l -p 5000 | dar_xform -s 10M -S 5M -p - backup
By the way, note that dar_xform
can also launch a user script between slices, exactly the same way
as dar does, thanks to the -E and -F options.
Testing the archive
Testing the archive can be done
on honey, but diffing (comparison) implies reading the filesystem
of flower, thus it must be run there. Both operations, as well as archive
listing and other read-only operations, can leverage what follows:
on honey:
nc -l -p 5000 | dar_slave backup | nc -l -p 5001
then on flower:
nc -w 3 honey 5001 | dar -t - | nc -w 3 honey 5000
Note that here too dar_slave can
run a script between slices: if, for example, you need to load slices
from a tape robot, this can be done automatically; or if you just want to
mount/unmount a removable medium, eject or load it and ask the user to
change it, or whatever else is your need.
Comparing with original filesystem
this is very similar to the previous example:
on honey:
nc -l -p 5000 | dar_slave backup | nc -l -p 5001
while on flower:
nc -w 3 honey 5001 | dar -d - -R / | nc -w 3 honey 5000
Making a differential backup
Here the problem
is that dar needs two pipes to send orders to and read data coming from
dar_slave, and a third pipe to write out the new archive. This cannot
be realized with only stdin and stdout as previously. Thus we will need
a named pipe (created by the mkfifo command).
On honey, in two different terminals:
nc -l -p 5000 | dar_slave backup | nc -l -p 5001
nc -l -p 5002 > diff_backup.1.dar
Then on flower:
mkfifo toslave
nc -w 3 honey 5000 < toslave &
nc -w 3 honey 5001 | dar -c - -A - -o toslave -R / -z | nc -w 3 honey 5002
With netcat the
data goes in clear over the network. You could use ssh instead if you
want encryption over the network. The principles are the same;
let's see this now:
Dar and ssh
Creating full backup
We assume you have an sshd daemon running on flower. We can
then run the following on honey:
ssh flower dar -c - -R / -z > backup.1.dar
Or still on honey:
ssh flower dar -c - -R / -z | dar_xform -s 10M -S 5M -p - backup
Testing the archive
On honey:
dar -t backup
Comparing with original filesystem
On flower:
mkfifo todar toslave
ssh honey dar_slave backup > todar < toslave &
dar -d - -R / -i todar -o toslave
Important: depending on the
shell you use, it may be necessary to invert the order in which "> todar" and
"< toslave" are given on the command line. The problem is that the shell
hangs trying to open the pipes. Thanks to "/PeO" for his feedback.
Or on honey:
mkfifo todar toslave
ssh flower dar -d - -R / > toslave < todar &
dar_slave -i toslave -o todar backup
Making a differential backup
On flower:
mkfifo todar toslave
ssh honey dar_slave backup > todar < toslave &
and on honey:
ssh flower dar -c - -A - -i todar -o toslave > diff_linux.1.dar
Or
ssh flower dar -c - -A - -i todar -o toslave | dar_xform -s 10M -S 5M -p - diff_linux
Integrated ssh support
Since release 2.6.0, you can use a URL-like archive basename. Assuming
you have slices test.1.dar, test.2.dar ... available in the directory
Archive of an FTP or SFTP (ssh) server, you could read, extract, list, test, ... that
archive using the following syntax:
dar -t ftp://login@ftp.server.some.where/Archive/example1 ...other options
dar -t sftp://login:pass@sftp.server.some/where/Archive/example2 ...other options
dar -t sftp://sftp.server.some/where/Archive/example2 -afile-auth ...other options
The same applies to the -l, -x, -A and -@ options. Note that you still need to
provide the archive basename, not a slice name as usually done with dar.
This option is also compatible with slicing and slice hashing, which will be
generated on remote server beside the slices:
dar -c sftp://login:password@secured.server.some.where/Archive/day2/incremental
-A ftp://login@ftp.server.some.where/Archive/CAT_test --hash sha512
-@ sftp://login2:password2@secured.server.some.where/Archive/day2/CAT_incremental
By default, if no password is given, dar asks the user interactively. If
no login is used, dar assumes the login to be "anonymous". When you add
the -afile-auth option, in the absence of a password on the command line, dar
checks for a password in the file ~/.netrc for both FTP and SFTP
protocols, to avoid exposing passwords on the command line while still
allowing non-interactive backups. See man netrc for this common file's syntax.
Using -afile-auth also activates public key authentication if
all is set for that (~/.ssh/id_rsa ...)
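For illustration, a ~/.netrc entry matching the ftp.server.some.where example above could look like the following (the login and password are of course placeholders); note that most tools require this file to be readable by its owner only (chmod 600 ~/.netrc):

```
machine ftp.server.some.where
login mylogin
password mysecret
```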
Comparing the different way to perform remote backup
Since release 2.6.0, dar can directly use ftp or sftp to operate remotely.
This new feature sometimes has advantages over the
ssh-based methods described above, and sometimes not;
the objective here is to clarify the pros and cons of each method.
Operation | dar + dar_slave/dar_xform through ssh | dar alone | embedded sftp/ftp in dar |
---|---|---|---|
Underlying mode of operation | direct access mode | sequential read mode | direct access mode |
Backup | | | |
Testing / Diffing / Listing | | | |
Restoration | | | |
Merging (should be done locally rather than over the network if possible!!!) | | | |
Isolation | | | |
Repairing (should be done locally rather than over the network if possible!!!) | | | |
Bytes, bits, kilo, mega etc.
Sorry in advance for the following school-like introduction to the size prefixes available
with dar, but it seems that the metric system is (still) not taught in all countries,
leading some to ugly/erroneous writings... so let me recall what
I was taught at school...
You probably know the metric system a bit: a dimension is expressed by a base unit
(the meter for distance, the liter for volume, the Joule for energy,
the Volt for electrical potential, the bar for pressure, the Watt for
power, the second for time, etc.), all combined with prefixes:
prefix (symbol) = ratio
================
deci (d) = 0.1
centi (c) = 0.01
milli (m) = 0.001
micro (μ) = 0.000,001
nano (n) = 0.000,000,001
pico (p) = 0.000,000,000,001
femto (f) = 0.000,000,000,000,001
atto (a) = 0.000,000,000,000,000,001
zepto (z) = 0.000,000,000,000,000,000,001
yocto (y) = 0.000,000,000,000,000,000,000,001
deca (da) = 10
hecto (h) = 100
kilo (k) = 1,000 (yes, this is a lower case letter, not an
upper case! Uppercase letter 'K' is the Kelvin: temperature unit)
mega (M) = 1,000,000
giga (G) = 1,000,000,000
tera (T) = 1,000,000,000,000
peta (P) = 1,000,000,000,000,000
exa (E) = 1,000,000,000,000,000,000
zetta (Z) = 1,000,000,000,000,000,000,000
yotta (Y) = 1,000,000,000,000,000,000,000,000
Not all prefixes were introduced at the same time. The oldest
(c, d, m, da, h, k) have existed since 1795, which explains why
they are all lowercase and not all powers of 1000. The others
are much more recent (1960, 1975, 1991 according to
Wikipedia).
Some other rules I was taught at school are:
- the unit follows the number
- a space has to be used between the number and the unit
Thus instead of writing "4K hour" the correct writing is "4 kh" for
four kilohours.
This way, two milliseconds (noted "2 ms") are 0.002 second,
and 5 kilometers (noted "5 km") are 5,000 meters. All
was fine and nice up to the recent time when computer science appeared.
In that discipline, the need to measure the size of information storage
arose. The smallest size is the bit
(contraction of binary digit), binary because
it has two possible states: "0" and "1". Grouping bits by 8, computer
scientists called the result a byte, also known as an octet.
A byte has 256 different states (2 to the power of 8). When the ASCII (American
Standard Code for Information Interchange) code arrived, it assigned a letter,
or more generally a character, to each of the different values of a byte ('A' is
assigned to 65, space to 32, etc.), and as most text is composed of a
set of characters, people started to count information size in bytes.
Time after time, following technology evolution, memory sizes approached
1,000 bytes.
But as memory is accessed through a bus made of a fixed number
of cables (or integrated circuits), on which only two possible
voltages are allowed (to mean 0 or 1), the total number of bytes a
bus can address is always a power of 2 here too. With a two-cable bus,
you can have 4 values (00, 01, 10 and 11, where each digit is the state
of one cable), so you can address 4 bytes.
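The addressing arithmetic above is easy to check in a shell: an n-cable bus distinguishes 2 to the power of n addresses.

```shell
# Number of distinct addresses an n-cable bus can encode is 2^n:
echo $((2 ** 2))    # 4 addresses with two cables (00, 01, 10, 11)
echo $((2 ** 10))   # 1024 -- the power of two closest to 1000
```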
Giving a value to each cable defines an address to read or write
in the