Introduction
You will find here a collection of examples of use cases
for several features of the dar suite command-line tools.
Dar and remote backup
This topic aims to show the different methods available
to perform a remote backup (a backup of a system using a remote storage).
It does not describe the remote storage itself, nor the way to access it, but the
common ways to do so. For precise descriptions/recipes on how to use
dar with ssh, netcat, ftp or sftp, see the topics following this one.
Between these two hosts, we could also use NFS and use dar
as usual, possibly adding an IPsec VPN if the
underlying network is not secure (backup over the Internet, …); there is
nothing very complicated there and this is a valid solution.
We could also split the backup into very small slices (using dar's -s and
possibly -S options), slices that would be moved to/from the storage before the
backup process continues creating/reading the next one. We could even
make use of one or more of dar's -E, -F and -~ options to automate
the process and get a pretty viable backup workflow.
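The slice-handoff automation mentioned above can be sketched as a small hook script. This is a minimal sketch, not taken from the dar documentation: the function name and the STAGING variable are assumptions; dar invokes such a command through its -E option, substituting %p (slice directory), %b (archive basename) and %N (slice number).

```shell
# Hypothetical slice-handoff helper for dar's -E option.
# dar substitutes %p (slice directory), %b (archive basename) and
# %N (slice number) when calling the command given to -E, e.g.:
#   dar -c backup -s 100M -E 'handoff_slice %p %b %N'
# (written as a shell function here so the logic is easy to read)
handoff_slice() {
    dir="$1"; base="$2"; num="$3"
    staging="${STAGING:-/var/tmp/dar_staging}"   # assumed local staging area
    mkdir -p "$staging"
    # move the finished slice out of the way so dar can write the next
    # one while this one is shipped to the remote storage
    mv "$dir/$base.$num.dar" "$staging/"
    # a real setup would now transfer it, for example with scp or rsync
}
```

With small slices (-s), each completed slice is moved away immediately, so the local disk only ever needs room for a couple of slices at a time.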
But what if, for any reason, these previous methods are not acceptable in
our use context?
As a last resort, we can leverage the fact that dar can use its standard input
and output to work, and pipe these to any arbitrary command giving us the
greatest freedom available.
In the following we list two different ways to do so:
- single pipe
- dual pipes
Single pipe
Full Backup
dar can output its archive to its standard output instead of a given
file. To activate it, use “-” as basename. Here is an example:
dar -c - -R / -z | some_program
or
dar -c - -R / -z > named_pipe_or_file
Note that slicing is not available, as it does not have much meaning
when writing to a pipe. At the other end of the pipe (on the remote
host), the data can be redirected to a file with a proper filename
(something that matches "*.1.dar"):
some_other_program > backup_name.1.dar
It is also possible to redirect the output to dar_xform
which can in turn, on the
remote host, split the data flow into several slices, pausing between them
if necessary, exactly as dar is able to do:
some_other_program | dar_xform -s 100M - backup_name
This will create backup_name.1.dar, backup_name.2.dar
and so on. The resulting archive is totally
compatible with those directly generated by dar.
some_program and some_other_program can be anything you want.
Restoration
For restoration, the process implies that dar reads the archive from a pipe,
which is possible by adding the --sequential-read option. This
however has a drawback compared to the normal way dar behaves: dar can no
longer seek to where a given file's data is located but has to read
the whole backup sequentially (the way tar behaves). The only consequence
is a longer processing time, especially when restoring only a few files.
On the storage host, we would use:
dar_xform backup_name - | some_other_program
# or if archive is composed of a single slice
some_other_program < backup_name.1.dar
While on the host to restore we would use:
some_program | dar -x - --sequential-read ...other options...
Differential/incremental Backup
Here, with a single pipe, the only possible way is to rely on
catalogue isolation. This operation can be performed on the storage host
and the resulting isolated catalogue can then be transferred through a pipe
back to the host to back up. But there is a better way: on-fly isolation.
dar -c - -R / -z -@ isolated_full_catalogue | some_program
This will produce a small file named isolated_full_catalogue.1.dar
on the local host (the host to back up), something we can then use to
create a differential/incremental backup:
dar -c - -R / -z -@ isolated_diff_catalogue -A isolated_full_catalogue | some_program
We can then remove isolated_full_catalogue.1.dar
and keep the new isolated_diff_catalogue
to proceed further with incremental backups. For differential backups, we
would instead keep isolated_full_catalogue.1.dar and use the -@ option
to create an on-fly isolated catalogue only when creating the full backup.
The restoration process here is no different from what we saw above
for the full backup: we restore the full backup, then the differential
and incrementals, following their order of creation.
Dual pipes
To overcome the limited performance met when reading an archive through
a single pipe, we can use a pair of pipes instead and rely on
dar_slave on the remote storage host.
If we specify "-" as the backup basename for a reading operation
(-l, -t, -d, -x, or -A when used with -C or -c),
dar and dar_slave will use their standard input
and output to communicate. The input of the first is expected to
receive the output of the second and vice versa.
We could test this with a pair of named pipes todar
and toslave, using shell redirections on dar and dar_slave
to make the glue. But this will not work due to the shell behavior:
dar and dar_slave would get blocked upon opening the first named pipe,
waiting for the peer to open it too, even before they have started
(deadlock at shell level).
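The deadlock, and the usual way around it, can be seen with any two programs, not just dar and dar_slave. The sketch below uses cat and echo as stand-ins: starting one end of each fifo in the background lets both opens complete (all paths and data are illustrative).

```shell
# Opening a named pipe blocks until the other end is opened too.
# If both peers are started in the foreground, each waits for the
# other and neither ever runs. Backgrounding one side breaks the tie.
tmp=$(mktemp -d)
mkfifo "$tmp/todar" "$tmp/toslave"
# "dar_slave" side, in the background: consume orders, emit data
cat "$tmp/toslave" > /dev/null &
echo "archive data" > "$tmp/todar" &
# "dar" side, in the foreground: send an order, read the data back
echo "order" > "$tmp/toslave"
read data < "$tmp/todar"
rm -r "$tmp"
```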
To overcome this issue met with named pipes, the -i and -o options
help: they receive a filename as argument, which may be a named pipe.
The argument provided to -i is used instead of stdin and the one
provided to -o is used instead of stdout. Note that the -i and -o options are only
available when "-" is used as basename. Let's take an example:
Let's assume we want to restore an archive from the remote backup server.
There, we have to run dar_slave this way:
mkfifo /tmp/todar /tmp/toslave
dar_slave -o /tmp/todar -i /tmp/toslave backup_name
some_program_remote < /tmp/todar
some_other_program_remote > /tmp/toslave
We assume some_program_remote reads the data from /tmp/todar
and makes it available to the host we want to restore, for dar to be able
to read it, while some_other_program_remote
receives the output of dar and writes it to /tmp/toslave.
On the local host you have to run dar this way:
mkfifo /tmp/todar /tmp/toslave
dar -x - -i /tmp/todar -o /tmp/toslave ...other options...
some_program_local > /tmp/todar
some_other_program_local < /tmp/toslave
Here some_program_local communicates with
some_program_remote and writes the data received from dar_slave
to the /tmp/todar named pipe, while in the other direction
dar's output is read by some_other_program_local from
/tmp/toslave and then sent (by a way that is out of the scope
of this document) to some_other_program_remote, which in turn
makes it available to dar_slave as seen above.
This applies also to differential backups when it comes to reading the archive of
reference by means of the -A option. In the previous single-pipe context, we used
an isolated catalogue. We can still do the same here, but we can also leverage this
feature, especially when it comes to binary delta, which implies reading the delta
signatures in addition to the metadata, something not possible in
--sequential-read mode. We then come to the following architecture:
LOCAL HOST REMOTE HOST
+-----------------+ +-----------------------------+
| filesystem | | backup of reference |
| | | | | |
| | | | | |
| V | | V |
| +-----+ | backup of reference | +-----------+ |
| | DAR |--<-]=========================[-<--| DAR_SLAVE | |
| | |-->-]=========================[->--| | |
| +-----+ | orders to dar_slave | +-----------+ |
| | | | +-----------+ |
| +--->---]=========================[->--| DAR_XFORM |--> backup|
| | saved data | +-----------+ to slices|
+-----------------+ +-----------------------------+
with dar on the local host using the following syntax, reading the reference
archive from a pair of fifos (-A option) and producing the differential backup
to its standard output:
mkfifo /tmp/todar /tmp/toslave
some_program_local > /tmp/todar
some_other_program_local < /tmp/toslave
dar -c - -A - -i /tmp/todar -o /tmp/toslave ...other options... | some_third_program_local
While dar_slave is run this way on the remote host:
mkfifo /tmp/todar /tmp/toslave
some_program_remote < /tmp/todar
some_other_program_remote > /tmp/toslave
dar_slave -o /tmp/todar -i /tmp/toslave backup_of_reference
Last, dar_xform receives the differential backup and here
splits it into 1 giga slices, adding a sha1 hash to each:
some_third_program_remote | dar_xform -s 1G -3 sha1 - diff_backup
dar and netcat
The netcat (nc) program is a simple but insecure (no authentication,
no data ciphering) way to link dar with dar_slave or dar_xform
as presented in the previous topic.
The context in which the following examples take place is the one of a
"local" host named "flower" to be backed up or restored from/to a
remote host called "honey" (OK, the names of the machines are silly...).
Creating a full backup
on honey:
nc -l -p 5000 > backup.1.dar
then on flower:
dar -c - -R / -z | nc -w 3 honey 5000
But this will produce only one slice; instead, you could use the
following to have several slices on honey:
nc -l -p 5000 | dar_xform -s 10M -S 5M -p - backup
By the way, note that dar_xform
can also launch a user script between slices, exactly the same way
as dar does, thanks to the -E and -F options.
Testing the archive
Testing the archive can be done
on honey, but diffing (comparison) implies reading the filesystem
of flower, thus it must be run there. Both operations, as well as archive
listing and other read-only operations, can leverage what follows:
on honey:
nc -l -p 5000 | dar_slave backup | nc -l -p 5001
then on flower:
nc -w 3 honey 5001 | dar -t - | nc -w 3 honey 5000
Note that here too dar_slave can
run a script between slices: if, for example, you need to load slices
from a tape robot, this can be done automatically; or if you just want to
mount/unmount a removable medium, eject or load it and ask the user to
change it, or whatever else is your need.
Comparing with original filesystem
this is very similar to the previous example:
on honey:
nc -l -p 5000 | dar_slave backup | nc -l -p 5001
while on flower:
nc -w 3 honey 5001 | dar -d - -R / | nc -w 3 honey 5000
Making a differential backup
Here the problem
is that dar needs two pipes to send orders to and read data coming from
dar_slave, and a third pipe to write out the new archive. This cannot
be realized with only stdin and stdout as previously. Thus we will need
a named pipe (created by the mkfifo command).
On honey, in two different terminals:
nc -l -p 5000 | dar_slave backup | nc -l -p 5001
nc -l -p 5002 > diff_backup.1.dar
Then on flower:
mkfifo toslave
nc -w 3 honey 5000 < toslave &
nc -w 3 honey 5001 | dar -c - -A - -o toslave -R / -z | nc -w 3 honey 5002
With netcat the
data goes in clear over the network. You could use ssh instead if you
want encryption over the network. The principles are the same;
let's see this now:
Dar and ssh
Creating full backup
We assume you have an sshd daemon running on flower. We can
then run the following on honey:
ssh flower dar -c - -R / -z > backup.1.dar
Or still on honey:
ssh flower dar -c - -R / -z | dar_xform -s 10M -S 5M -p - backup
Testing the archive
On honey:
dar -t backup
Comparing with original filesystem
On flower:
mkfifo todar toslave
ssh honey dar_slave backup > todar < toslave &
dar -d - -R / -i todar -o toslave
Important: depending on the
shell you use, it may be necessary to invert the order in which "> todar" and
"< toslave" are given on the command line. The problem is that the shell
hangs trying to open the pipes. Thanks to "/PeO" for his feedback.
Or on honey:
mkfifo todar toslave
ssh flower dar -d - -R / > toslave < todar &
dar_slave -i toslave -o todar backup
Making a differential backup
On flower:
mkfifo todar toslave
ssh honey dar_slave backup > todar < toslave &
and on honey:
ssh flower dar -c - -A - -i todar -o toslave > diff_linux.1.dar
Or
ssh flower dar -c - -A - -i todar -o toslave | dar_xform -s 10M -S 5M -p - diff_linux
Integrated ssh support
Since release 2.6.0, you can use a URL-like archive basename. Assuming
you have slices test.1.dar, test.2.dar ... available in the directory
Archive of an FTP or SFTP (ssh) server, you could read, extract, list, test, ... that
archive using the following syntax:
dar -t ftp://login@ftp.server.some.where/Archive/example1 ...other options
dar -t sftp://login:pass@sftp.server.some/where/Archive/example2 ...other options
dar -t sftp://sftp.server.some/where/Archive/example2 -afile-auth ...other options
The same applies to the -l, -x, -A and -@ options. Note that you still need to
provide the archive basename, not a slice name as usually done with dar.
This option is also compatible with slicing and slice hashing, which will be
generated on remote server beside the slices:
dar -c sftp://login:password@secured.server.some.where/Archive/day2/incremental
-A ftp://login@ftp.server.some.where/Archive/CAT_test --hash sha512
-@ sftp://login2:password2@secured.server.some.where/Archive/day2/CAT_incremental
By default, if no password is given, dar asks the user interactively. If
no login is used, dar assumes the login to be "anonymous". When you add
the -afile-auth option, in the absence of a password on the command line, dar
checks for a password in the file ~/.netrc for both FTP and SFTP
protocols, to avoid exposing passwords on the command line while still
allowing non-interactive backups. See man netrc for this common file's syntax.
Using -afile-auth also activates public key authentication if
all is set for that (~/.ssh/id_rsa ...)
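For illustration, a ~/.netrc entry matching the ftp.server.some.where example above could look like the following (the login and password are of course placeholders); note that most tools require this file to be readable by its owner only (chmod 600 ~/.netrc):

```
machine ftp.server.some.where
login mylogin
password mysecret
```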
Comparing the different way to perform remote backup
Since release 2.6.0, dar can directly use ftp or sftp to operate remotely.
This new feature sometimes has advantages over the
ssh-based methods described above, and sometimes not;
the objective here is to clarify the pros and cons of each method.
Operation | dar + dar_slave/dar_xform through ssh | dar alone | embedded sftp/ftp in dar |
---|---|---|---|
Underlying mode of operation | direct access mode | sequential read mode | direct access mode |
Backup | | | |
Testing / Diffing / Listing | | | |
Restoration | | | |
Merging (should be done locally rather than over the network if possible!!!) | | | |
Isolation | | | |
Repairing (should be done locally rather than over the network if possible!!!) | | | |
Bytes, bits, kilo, mega etc.
Sorry in advance for the following school-like introduction to the size prefixes available
with dar, but it seems that the metric system is (still) not taught in all countries,
leading some to ugly/erroneous writings... so let me recall what
I was taught at school...
You probably know the metric system a bit: a dimension is expressed by a base unit
(the meter for distance, the liter for volume, the Joule for energy,
the Volt for electrical potential, the bar for pressure, the Watt for
power, the second for time, etc.), all combined with prefixes:
prefix (symbol) = ratio
================
deci (d) = 0.1
centi (c) = 0.01
milli (m) = 0.001
micro (μ) = 0.000,001
nano (n) = 0.000,000,001
pico (p) = 0.000,000,000,001
femto (f) = 0.000,000,000,000,001
atto (a) = 0.000,000,000,000,000,001
zepto (z) = 0.000,000,000,000,000,000,001
yocto (y) = 0.000,000,000,000,000,000,000,001
deca (da) = 10
hecto (h) = 100
kilo (k) = 1,000 (yes, this is a lower case letter, not an
upper case! Uppercase letter 'K' is the Kelvin: temperature unit)
mega (M) = 1,000,000
giga (G) = 1,000,000,000
tera (T) = 1,000,000,000,000
peta (P) = 1,000,000,000,000,000
exa (E) = 1,000,000,000,000,000,000
zetta (Z) = 1,000,000,000,000,000,000,000
yotta (Y) = 1,000,000,000,000,000,000,000,000
Not all prefixes were introduced at the same time. The oldest
(c, d, m, da, h, k) have existed since 1795, which explains why
they are all lowercase and not all powers of 1000. The others
are much more recent (1960, 1975, 1991 according to
Wikipedia).
Some other rules I was taught at school are:
- the unit follows the number
- a space has to be used between the number and the unit
Thus instead of writing "4K hour" the correct writing is "4 kh" for
four kilohours.
This way, two milliseconds (noted "2 ms") are 0.002 second,
and 5 kilometers (noted "5 km") are 5,000 meters. All
was fine and nice up to the recent time when computer science appeared.
In that discipline, the need to measure the size of information storage
arose. The smallest size is the bit
(contraction of binary digit), binary because
it has two possible states: "0" and "1". Grouping bits by 8, computer
scientists called the result a byte, also known as an octet.
A byte has 256 different states (2 to the power of 8). When the ASCII (American
Standard Code for Information Interchange) code arrived, it assigned a letter,
or more generally a character, to each of the different values of a byte ('A' is
assigned to 65, space to 32, etc.), and as most text is composed of a
set of characters, people started to count information size in bytes.
Time after time, following technology evolution, memory sizes approached
1,000 bytes.
But as memory is accessed through a bus made of a fixed number
of cables (or integrated circuits), on which only two possible
voltages are allowed (to mean 0 or 1), the total number of bytes a
bus can address is always a power of 2 here too. With a two-cable bus,
you can have 4 values (00, 01, 10 and 11, where each digit is the state
of one cable), so you can address 4 bytes.
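The addressing arithmetic above is easy to check in a shell: an n-cable bus distinguishes 2 to the power of n addresses.

```shell
# Number of distinct addresses an n-cable bus can encode is 2^n:
echo $((2 ** 2))    # 4 addresses with two cables (00, 01, 10, 11)
echo $((2 ** 10))   # 1024 -- the power of two closest to 1000
```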
Giving a value to each cable defines an address to read or write
in the