
Parallelism and deduplication

Posted by Anonymous 
Parallelism and deduplication
May 04, 2016 05:32PM
On Wed, May 4, 2016 at 4:20 PM, Christopher Barry <christopher.r.barry < at > gmail.com> wrote:

[quote]On Wed, 4 May 2016 10:30:58 -0700, Scott Hess <scott < at > doubleu.com> wrote:
[quote]Personally, I wouldn't mind having a .timestamp file in the snapshot,
which would contain a printed form of the timestamp rsnapshot used for
the touch, and perhaps with the same timestamp as the directory using
touch -r.  That file would never need to be renamed after the snapshot
[/quote]

in cron, do a:

date > /etc/rsnapshot/.timestamp && rsnapshot <increment>

where /etc/rsnapshot is one of the backed up directories in every
backup.

and bada-bing, Bob's your Uncle. :)[/quote]

Derp.

Like:

/usr/bin/rsnapshot sync && date -R -r .sync >.sync/.timestamp && touch -r .sync .sync/.timestamp

Then the next rotation will just carry it along, forever. Create .sync/.timestamp ahead of time so that the first creation doesn't screw up the timestamp on .sync.

Like I said, two-line change :-).

Or use cmd_postexec, which runs after the backup for the lowest interval; the backup itself finishes by updating the directory timestamp.
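A minimal sketch of the same .timestamp idea as a shell function, assuming GNU coreutils (date -R -r, touch -r, mktemp) and that it runs from the snapshot root right after rsnapshot sync. One hedged caveat: if the lowest interval is created by hard-linking .sync (cp -al), then rewriting .sync/.timestamp in place with ">" would also change the copies in older snapshots, because they share its inode. This version therefore writes a new file and renames it over the old one, then restores the directory's mtime, which creating and renaming entries would otherwise bump. stamp_snapshot is a hypothetical name, not part of rsnapshot.

```shell
#!/bin/sh
# stamp_snapshot DIR  (hypothetical helper, not part of rsnapshot)
#   Store DIR's mtime, in RFC 2822 form, in DIR/.timestamp without changing
#   DIR's own mtime. Assumes GNU coreutils (date -R -r, touch -r, mktemp).
stamp_snapshot() {
    dir=$1
    [ -d "$dir" ] || return 1
    ref=$(mktemp) || return 1
    # Remember DIR's mtime (full precision) on a scratch file outside DIR.
    touch -r "$dir" "$ref"
    # Write the new stamp beside the old one, then rename it into place: the
    # rename gives .timestamp a fresh inode, so hard-linked copies of the
    # previous .timestamp in older snapshots are left untouched.
    date -R -r "$ref" > "$dir/.timestamp.new" &&
        touch -r "$ref" "$dir/.timestamp.new" &&
        mv -f "$dir/.timestamp.new" "$dir/.timestamp"
    status=$?
    rm -f "$dir/.timestamp.new"
    # Creating and renaming entries bumped DIR's mtime; put it back.
    touch -r "$ref" "$dir"
    rm -f "$ref"
    return "$status"
}

# Example use from the snapshot root:
#   /usr/bin/rsnapshot sync && stamp_snapshot .sync
```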

-scott
Parallelism and deduplication
May 04, 2016 08:27PM
On Wed, May 4, 2016 at 8:29 PM, Scott Hess <scott < at > doubleu.com> wrote:
[quote]On Wed, May 4, 2016 at 4:20 PM, Christopher Barry
<christopher.r.barry < at > gmail.com> wrote:
[quote]
On Wed, 4 May 2016 10:30:58 -0700, Scott Hess <scott < at > doubleu.com> wrote:
[quote]Personally, I wouldn't mind having a .timestamp file in the snapshot,
which would contain a printed form of the timestamp rsnapshot used for
the touch, and perhaps with the same timestamp as the directory using
touch -r. That file would never need to be renamed after the snapshot
[/quote]
in cron, do a:

date > /etc/rsnapshot/.timestamp && rsnapshot <increment>

where /etc/rsnapshot is one of the backed up directories in every
backup.

and bada-bing, Bob's your Uncle. :)
[/quote][/quote]
Which does you *zero* good if you've got a tar, an rsync, or a simple
NFS-mount cp going on from the "daily.0" snapshot and it rotates out
from under you in the middle of the replication. This is an old
problem. Stuffing a legible timestamp inside the numbered snapshot
does not help much when the path to that version of the file changes
out from under you.

[quote]

Derp.

Like:
[/quote]

[quote]/usr/bin/rsnapshot sync && date -R -r .sync >.sync/.timestamp && touch -r
.sync .sync/.timestamp

Then next rotation will just carry it along, forever. Create
.sync/.timestamp ahead of time so that the first creation doesn't screw up
the timestamp on .sync.

Like I said, two-line change :-).
[/quote]
Which does not solve the underlying problem I referred to.

------------------------------------------------------------------------------
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
_______________________________________________
rsnapshot-discuss mailing list
rsnapshot-discuss < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rsnapshot-discuss
Parallelism and deduplication
May 04, 2016 08:35PM
Nico, the horse is dead.
Seriously.

[quote]On May 4, 2016, at 21:25, Nico Kadel-Garcia <nkadel < at > gmail.com> wrote:

[...]
[/quote]
Parallelism and deduplication
May 04, 2016 10:26PM
On Wed, May 4, 2016 at 8:25 PM, Nico Kadel-Garcia <nkadel < at > gmail.com> wrote:
[quote]On Wed, May 4, 2016 at 8:29 PM, Scott Hess <scott < at > doubleu.com> wrote:
[quote]On Wed, May 4, 2016 at 4:20 PM, Christopher Barry
<christopher.r.barry < at > gmail.com> wrote:
[quote]On Wed, 4 May 2016 10:30:58 -0700, Scott Hess <scott < at > doubleu.com> wrote:
[quote]Personally, I wouldn't mind having a .timestamp file in the snapshot,
which would contain a printed form of the timestamp rsnapshot used for
the touch, and perhaps with the same timestamp as the directory using
touch -r.  That file would never need to be renamed after the snapshot
[/quote]
in cron, do a:

date > /etc/rsnapshot/.timestamp && rsnapshot <increment>

where /etc/rsnapshot is one of the backed up directories in every
backup.

and bada-bing, Bob's your Uncle. :)
[/quote][/quote]
Which does you *zero* good if you've got a tar, an rsync, or a simple
NFS-mount cp going on from the "daily.0" snapshot and it rotates out
from under you in the middle of the replication. This is an old
problem. Stuffing a legible timestamp inside the numbered snapshot
does not help much when the path to that version of the file changes
out from under you.
[/quote]

Write up a patch. Pull the thread and find out how long it is. Maybe it's a lot shorter than the nay-sayers think it is likely to be. Maybe it's a lot longer than the people who request the feature think it should be. Either way we'd learn something.

-scott
Parallelism and deduplication
May 04, 2016 10:39PM
On Wed, May 04, 2016 at 09:45:10AM -0400, Nico Kadel-Garcia (nkadel < at > gmail.com) wrote:

[quote]Copying a large daily.3 snapshot aside, and making sure it's
consistent, can be tricky if the copy rotates under you.
[/quote]
One solution is to use a filesystem snapshot, if you're using LVM
or a filesystem that supports snapshots by itself (ZFS, Btrfs):
create a snapshot, copy daily.3 (or whatever) out, then destroy the
snapshot.
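A minimal sketch of that approach for LVM. Every name in it is hypothetical: the rsnapshot root is assumed to live on the logical volume /dev/vg0/backup, the mount point /mnt/rsnap-frozen is assumed to exist, and the volume group is assumed to have 5G free for the snapshot's copy-on-write space. Setting DRY_RUN=1 only prints the commands, which is a safe way to review them first. It is safest to take the snapshot while no rsnapshot run is active, since the frozen view captures whatever state the tree is in at that moment. ZFS and Btrfs have their own snapshot commands, not shown here.

```shell
#!/bin/sh
# All names below are hypothetical; adjust to your system.
LV=/dev/vg0/backup            # logical volume holding the rsnapshot root
SNAP_NAME=rsnap-frozen        # name for the temporary LVM snapshot
MNT=/mnt/rsnap-frozen         # existing, empty mount point

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# copy_frozen SNAPSHOT DEST, e.g. copy_frozen daily.3 /srv/offsite
#   Copy one rsnapshot snapshot out of a read-only LVM snapshot, so that
#   rsnapshot renaming daily.3 mid-copy cannot affect what is read.
copy_frozen() {
    vg_dir=${LV%/*}
    run lvcreate --snapshot --size 5G --name "$SNAP_NAME" "$LV" || return 1
    if ! run mount -o ro "$vg_dir/$SNAP_NAME" "$MNT"; then
        run lvremove -f "$vg_dir/$SNAP_NAME"
        return 1
    fi
    # cp -a keeps hard links between the files it copies.
    run cp -a "$MNT/$1" "$2"
    status=$?
    run umount "$MNT"
    run lvremove -f "$vg_dir/$SNAP_NAME"
    return "$status"
}
```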

[quote]This is why I've long wanted to change the numbering scheme from
"daily.0", "daily.1", etc. to "daily.20160401010203",
"daily.20160402113433", to use full UTC-compatible YYYYMMDDhhmmss
date-stamped names.
[/quote]
That would have its advantages, but it would not be a trivial change.
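For illustration only (rsnapshot does not name snapshots this way), names in the proposed format can be derived from a snapshot directory's mtime with GNU date. The helpers stamped_name and newest_stamped are hypothetical. Because every name carries the same fixed-width UTC digits, a plain lexical sort is also chronological, and unlike daily.N a name would not need to change at each rotation.

```shell
#!/bin/sh
# Hypothetical helpers for date-stamped snapshot names; assumes GNU date.

# stamped_name PREFIX DIR: print PREFIX.YYYYMMDDhhmmss from DIR's mtime in UTC.
stamped_name() {
    printf '%s.%s\n' "$1" "$(date -u -r "$2" +%Y%m%d%H%M%S)"
}

# newest_stamped PREFIX: print the newest PREFIX.<digits> entry in the
# current directory. Fixed-width digits make lexical order chronological.
newest_stamped() {
    for name in "$1".[0-9]*; do
        [ -e "$name" ] && printf '%s\n' "$name"
    done | LC_ALL=C sort | tail -n 1
}
```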

--
Tapani Tarvainen

Parallelism and deduplication
May 05, 2016 02:17AM
Hello, Patrick,

You wrote on 04.05.16:

[quote][quote]Rsnapshot sets the directory timestamp using touch, I think as the
last thing it does. It's intentional, not some sort of side effect.
[/quote][/quote]
[...]

[quote]Perhaps, but that behaviour is not documented as far as I know.
[/quote]
It is, in every log file.

Best regards!
Helmut

Parallelism and deduplication
May 05, 2016 02:55AM
2016-05-01 20:30 GMT+02:00 Scott Hess <scott < at > doubleu.com>:
[quote]Each snapshot should be a complete copy of the directory structure, with all
unique files uniquely present, and all files shared with the previous backup
hardlinked. So in my snapshot root, I can do:

# ls -li daily.?/nbackup/bin/ls
564308 -rwxr-xr-x 17 root root 110080 Mar 10 11:10 daily.0/nbackup/bin/ls
564308 -rwxr-xr-x 17 root root 110080 Mar 10 11:10 daily.1/nbackup/bin/ls
564308 -rwxr-xr-x 17 root root 110080 Mar 10 11:10 daily.2/nbackup/bin/ls
564308 -rwxr-xr-x 17 root root 110080 Mar 10 11:10 daily.3/nbackup/bin/ls
564308 -rwxr-xr-x 17 root root 110080 Mar 10 11:10 daily.4/nbackup/bin/ls
564308 -rwxr-xr-x 17 root root 110080 Mar 10 11:10 daily.5/nbackup/bin/ls
564308 -rwxr-xr-x 17 root root 110080 Mar 10 11:10 daily.6/nbackup/bin/ls

They're all the same file (other refs are in 4x hourly.?, 5x weekly.?, and
.sync). I could copy or move daily.3 to a different disk, and the files in
all the others would still be present. There are no incrementals.
[/quote]

Let's assume that everything goes very badly: a careless sysadmin deleted
multiple backups, and so on.
I end up with only a single, older backup, for example daily.5.
No ".sync" or anything else. Only daily.5.

"daily.5" should still be fully available, as all hardlinked files are
still retained (their link count is still > 0, so the files are still there).
What would happen if I copied daily.5 to a brand-new ".sync" dir and
ran rsnapshot again? Only the differences between daily.5 and the current
data would be transferred, right?
So in this case I would still be able to do an incremental backup
starting from a very old "snapshot". Is this true?

This could be very useful with larger servers. If a full backup
is 300GB and an incremental is just a few hundred MB, the
ability to run an incremental from an older backup would let me
transfer a few hundred MB instead of 300GB.

Parallelism and deduplication
May 05, 2016 06:19AM
Hello, Gandalf,

You wrote on 05.05.16:

[quote]I'll end with having only a single, older, backup, in example:
daily.5. No ".sync" or anything else. Only daily.5
[/quote]
[quote]"daily.5" should be totally available as all hardlinked files are
still retained (due to link count > 0 thus files are still there)
[/quote]
Take a look at the "rsnapshot.log".

There you can see that you can rename any backup to a name that
"rsnapshot" will not touch, and that you can make a "hard linked"
copy with the simple command "rsync ...": just copy such a logged
line to the command line and adjust the source directory and the
target directory.

By the way: hard-linked files and directories need to be on the same
disk/partition. Copying such a directory to another partition does not
create hard links back to the source partition.
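Gandalf's scenario is easy to try on a scratch directory before relying on it. The sketch below assumes GNU cp, where -a archives and -l makes hard links instead of copying data; seed_sync and the paths in the example are hypothetical. A file in daily.5 keeps its data as long as any link to it remains, and a hard-linked copy of daily.5 costs directory entries but no extra file data. Whether the next rsnapshot run then transfers only the differences is up to rsync (by default it skips files whose size and modification time already match), so check that in a test setup as suggested in this thread.

```shell
#!/bin/sh
# seed_sync ROOT SNAPSHOT  (hypothetical helper; assumes GNU cp)
#   Recreate ROOT/.sync as a hard-linked copy of ROOT/SNAPSHOT, e.g.
#   seed_sync /backup daily.5. cp -l links files instead of copying their
#   data, which only works within one filesystem, so .sync must live on the
#   same partition as the snapshot. An existing .sync is never overwritten.
seed_sync() {
    if [ -e "$1/.sync" ]; then
        echo "seed_sync: $1/.sync already exists" >&2
        return 1
    fi
    cp -al "$1/$2" "$1/.sync"
}
```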

Best regards!
Helmut

Parallelism and deduplication
May 05, 2016 07:36AM
I'm thinking that at some point you're just going to have to dig in there and figure out how rsync works. You've thrown out a number of hypothetical situations on this thread, and at this point I'm worried that in three months you'll come back and say "But you said _this_ would work, and you said _that_ would work, but we lost our backups!"

The basic operation of rsnapshot is not super complicated; you can see what it's doing right in the log files. I have in the past manually constructed a new backup target out of pieces of other backup targets, because I wanted to move a volume between servers and couldn't afford the space to back up the volume in two places. It's not that hard. If you're worried about a particular case, spin up a Linux server, build a trivial example structure or three to back up, set up cron to run backups every five minutes for a few hours, then get in there and experiment.

-scott

On Thu, May 5, 2016 at 2:52 AM, Gandalf Corvotempesta <gandalf.corvotempesta < at > gmail.com> wrote:
[quote]2016-05-01 20:30 GMT+02:00 Scott Hess <scott < at > doubleu.com>:
[...]
[/quote]
Parallelism and deduplication
May 06, 2016 01:56AM
2016-05-05 16:33 GMT+02:00 Scott Hess <scott < at > doubleu.com>:
[quote]I'm thinking that at some point you're just going to have to dig in there
and figure out how rsync works. You've thrown out a number of hypothetical
situations on this thread, and at this point I'm worried that in three
months you'll come back and say "But you said _this_ would work, and you
said _that_ would work, but we lost our backups!"
[/quote]
Absolutely not. This is not the case.

I'm trying to replace a Bacula backup environment (too many things to
check: huge databases (mine is 980GB), too many backup levels, too
prone to failures, ...) with
a smarter system like rsnapshot, where everything can be summarized
as "rsync + cp + mv". No backup levels, no databases, files always
available for restore (and even for browsing, like "cat my_lost_file").

I would like to copy the weekly backups to a tape library. Obviously I
need to resolve all hardlinks when copying, or I'll end up with
inconsistent data when restoring.
My question is: how can I distinguish a hardlink made by rsnapshot from a
hardlink that was already on the source server? rsnapshot's hardlinks must be
dereferenced when copying to tape, while the original hardlinks must be
preserved to restore the system to its original state.

For example:

on source server:
$ ln file1 file2
$ echo 'test' > file3

When running multiple backups, file3 would be hardlinked between
".sync" and "daily.N". "file1" and "file2" are already hardlinked.

When copying, I have to copy the *content* of "file3", dereferencing
it, but still preserve the hardlink between file1 and file2.

I hope this is clear.
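A hedged sketch of one way to get that behaviour, assuming GNU tar and assuming the snapshot actually contains the file1/file2 link (by default rsnapshot does not keep such links; see the replies about rsync -H further down). When tar meets a second path to an inode it has already archived, it stores a hard-link entry; rsnapshot's links to other snapshots point outside the archived tree, so tar never sees them and stores that file's content once. archive_snapshot is a hypothetical name, and a real tape job would write to the tape device rather than a file.

```shell
#!/bin/sh
# archive_snapshot SNAPSHOT_DIR ARCHIVE  (hypothetical helper; assumes GNU tar)
#   Archive a single snapshot. Hard links between files *inside*
#   SNAPSHOT_DIR are stored as link entries and come back as links on
#   extraction; files whose other links live in other snapshots are stored
#   as ordinary content.
archive_snapshot() {
    tar -C "$1" -cf "$2" .
}

# Example (the tape device name is an assumption):
#   archive_snapshot /backup/weekly.0 /dev/nst0
```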

Parallelism and deduplication
May 06, 2016 02:32AM
2016-05-06 10:53 GMT+02:00 Gandalf Corvotempesta
<gandalf.corvotempesta < at > gmail.com>:
[quote]My question is: how can I detect an hardlink made by rsnapshot from an
hardlink that was on the source server ? rsnapshot's hardlink must be
[/quote]
You can not: rsnapshot does not preserve hard links _in_ the data to backup.

Best
Martin

Parallelism and deduplication
May 06, 2016 02:55AM
2016-05-06 11:30 GMT+02:00 Martin Schröder <martin < at > oneiros.de>:
[quote]You can not: rsnapshot does not preserve hard links _in_ the data to backup.
[/quote]
So, in my example, "file1" and "file2" would be copied as files and
not hardlinked together ?
Thus, all hardlinks that I can see in a backup directory, are from
rsnapshot and not from the source server.

Doing a restore would result in much more used space than the original
server, as all hardlinked files are resolved and not preserved. Right
?

Parallelism and deduplication
May 06, 2016 02:58AM
2016-05-06 11:52 GMT+02:00 Gandalf Corvotempesta
<gandalf.corvotempesta < at > gmail.com>:
[quote]2016-05-06 11:30 GMT+02:00 Martin Schröder <martin < at > oneiros.de>:
[quote]You can not: rsnapshot does not preserve hard links _in_ the data to backup.
[/quote]
So, in my example, "file1" and "file2" would be copied as files and
not hardlinked together ?
Thus, all hardlinks that I can see in a backup directory, are from
rsnapshot and not from the source server.
[/quote]
AFAIK yes.

[quote]Doing a restore would result in much more used space than the original
server, as all hardlinked files are resolved and not preserved. Right
?
[/quote]
s/would/could/, but yes.

Best
Martin

Parallelism and deduplication
May 06, 2016 03:21AM
On 6 May 2016 at 10:52, Gandalf Corvotempesta <gandalf.corvotempesta < at > gmail.com> wrote:
[quote]2016-05-06 11:30 GMT+02:00 Martin Schröder <martin < at > oneiros.de>:
[quote]You can not: rsnapshot does not preserve hard links _in_ the data to backup.
[/quote]
So, in my example, "file1" and "file2" would be copied as files and
not hardlinked together ?
Thus, all hardlinks that I can see in a backup directory, are from
rsnapshot and not from the source server.

Doing a restore would result in much more used space than the original
server, as all hardlinked files are resolved and not preserved. Right
?

[/quote]

Yes. I suspect this is a problem with all backup systems that rely on hard links to eliminate duplicates (including Apple's Time Machine AFAIK). They work reasonably well as long as the source filesystem doesn't have a lot of hard links in it, but restores will produce additional data copies unless you run a separate dedupe process.
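Such a "separate dedupe process" can be sketched naively in POSIX shell, as an illustration rather than a replacement for a dedicated tool. Assumptions: GNU coreutils (sha256sum), file names without newlines, and everything under one filesystem. cmp confirms the bytes match before a file is replaced, but owners, permissions and mtimes are not compared: after linking, every path shares the metadata of the copy that was kept. dedupe_hardlink is a hypothetical name.

```shell
#!/bin/sh
# dedupe_hardlink DIR  (hypothetical helper)
#   Replace files under DIR that have identical content with hard links to
#   one kept copy. Assumes GNU coreutils, no newlines in file names, and a
#   single filesystem under DIR.
dedupe_hardlink() {
    tab=$(printf '\t')
    # Emit "checksum<TAB>path" for every regular file.
    find "$1" -type f -exec sh -c '
        for f do
            printf "%s\t%s\n" "$(sha256sum < "$f" | cut -d " " -f 1)" "$f"
        done
    ' sh {} + |
    # Sorting puts files with equal checksums next to each other.
    LC_ALL=C sort |
    {
        keep_sum=
        keep_path=
        while IFS=$tab read -r sum path; do
            if [ "$sum" = "$keep_sum" ]; then
                # Skip paths already linked to the kept copy, and compare
                # the actual bytes before replacing anything.
                if ! [ "$path" -ef "$keep_path" ] && cmp -s "$keep_path" "$path"; then
                    ln -f "$keep_path" "$path"
                fi
            else
                keep_sum=$sum
                keep_path=$path
            fi
        done
    }
}
```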

poc
Parallelism and deduplication
May 06, 2016 03:34AM
On Fri, May 6, 2016 at 5:30 AM, Martin Schröder <martin < at > oneiros.de> wrote:
[quote]2016-05-06 10:53 GMT+02:00 Gandalf Corvotempesta
<gandalf.corvotempesta < at > gmail.com>:
[quote]My question is: how can I detect an hardlink made by rsnapshot from an
hardlink that was on the source server ? rsnapshot's hardlink must be
[/quote]
You can not: rsnapshot does not preserve hard links _in_ the data to backup.

Best
Martin
[/quote]
I beg your pardon? That is the "-H" option for rsync. It's not enabled
by default, but it's certainly available in the rsync_short_args.

Parallelism and deduplication
May 06, 2016 03:36AM
On Fri, May 6, 2016 at 6:17 AM, Patrick O'Callaghan
<pocallaghan < at > gmail.com> wrote:
[quote]
On 6 May 2016 at 10:52, Gandalf Corvotempesta
<gandalf.corvotempesta < at > gmail.com> wrote:
[quote]
2016-05-06 11:30 GMT+02:00 Martin Schröder <martin < at > oneiros.de>:
[quote]You can not: rsnapshot does not preserve hard links _in_ the data to
backup.
[/quote]
So, in my example, "file1" and "file2" would be copied as files and
not hardlinked together ?
Thus, all hardlinks that I can see in a backup directory, are from
rsnapshot and not from the source server.

Doing a restore would result in much more used space than the original
server, as all hardlinked files are resolved and not preserved. Right
?
[/quote]

Yes. I suspect this is a problem with all backup systems that rely on hard
links to eliminate duplicates (including Apple's Time Machine AFAIK). They
work reasonably well as long as the source filesystem doesn't have a lot of
hard links in it, but restores will produce additional data copies unless
you run a separate dedupe process.
[/quote]
A "restore" of an individual snapshot would have few internal
snapshots, anywhere. Restoring multiple snapshots... gets a bit
adventuresome, trying to recreate the hardlink infrastructure.

Parallelism and deduplication
May 06, 2016 03:40AM
2016-05-06 12:32 GMT+02:00 Nico Kadel-Garcia <nkadel < at > gmail.com>:
[quote]I beg your pardon? That is the "-H" option for rsync. It's not enabled
by default, but it's certainly available in the rsync_short_args.
[/quote]
And how would it differentiate between hardlinks because of multiple
versions and because they are in the original data?

If we have three generations of file1 and file2 (which are hardlinked in
the original data), do file1 and file2 have three or six links?

Best
Martin

Parallelism and deduplication
May 06, 2016 03:48AM
On May 06 12:38, Martin Schröder (martin < at > oneiros.de) wrote:

[quote]2016-05-06 12:32 GMT+02:00 Nico Kadel-Garcia <nkadel < at > gmail.com>:
[quote]I beg your pardon? That is the "-H" option for rsync. It's not enabled
by default, but it's certainly available in the rsync_short_args.
[/quote]
And how would it differentiate between hardlinks because of multiple
versions and because they are in the original data?
[/quote]
Links in original data appear in the same directory tree
(.../daily.0/... &c), links created by rsnapshot are outside it.
Nothing else is needed to distinguish them.

[quote]If we have three generations of file1 and file2 (which are hardlinked in
the original data), do file1 and file2 have three or six links?
[/quote]
Six, of course.

But when you restore from there, it'll only create two - those
that are in the directory tree you're restoring from.

Suggestion: try it.
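A minimal way to try it on a scratch tree, assuming GNU find, stat and cp (directory names made up): build three generations in which file1 and file2 are hard-linked in the source data, observe the link count of 6, then count, for one snapshot, how many paths *inside that snapshot* share each inode. A count above 1 marks a hard link that came from the source data; the remaining links point into other snapshots. in_tree_links is a hypothetical helper.

```shell
#!/bin/sh
# in_tree_links SNAPSHOT_DIR  (hypothetical helper; assumes GNU find)
#   Print "inode count" for every inode reachable by more than one path
#   inside SNAPSHOT_DIR, i.e. hard links that were present in the source
#   data. Links created by rsnapshot point into other snapshots, outside
#   SNAPSHOT_DIR, so they raise a file's link count but are not counted here.
in_tree_links() {
    # -links +1 skips files that have no other names at all.
    find "$1" -type f -links +1 -printf '%i\n' |
        sort | uniq -c |
        awk '$1 > 1 { print $2, $1 }'
}
```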

--
Tapani Tarvainen

Parallelism and deduplication
May 06, 2016 03:57AM
On May 06 11:30, Martin Schröder (martin < at > oneiros.de) wrote:

[quote]rsnapshot does not preserve hard links _in_ the data to backup.
[/quote]
With default settings it does not, but you can explicitly ask it
to do so with, e.g.,

rsync_short_args	-aH

in the configuration file (rsnapshot.conf requires a tab, not spaces or "=", between the parameter name and its value).

--
Tapani Tarvainen

Parallelism and deduplication
May 06, 2016 04:15AM
On 6 May 2016 at 11:25, Tapani Tarvainen <rsnapshot < at > tapanitarvainen.fi> wrote:
[quote]On May 06 11:30, Martin Schröder (martin < at > oneiros.de) wrote:

[quote]rsnapshot does not preserve hard links _in_ the data to backup.
[/quote]
With default settings it does not, but you can explicitly ask it
to do so with, e.g.,

rsync_short_args	-aH

in the configuration file.[/quote]

However I recommend reading the rsync man page on the -H option. It lists several cases in which the result may not be what you expect (not sure if any of them apply to rsnapshot).

poc
Parallelism and deduplication
May 06, 2016 04:24AM
2016-05-06 13:12 GMT+02:00 Patrick O'Callaghan <pocallaghan < at > gmail.com>:
[quote]However I recommend reading the rsync man page on the -H option. It lists
several cases in which the result may not be what you expect (not sure if
any of them apply to rsnapshot).
[/quote]
"If you specify a --link-dest directory that contains hard links, the
linking of the destination files against the --link-dest files can
cause some paths in the destination to become linked together due to
the --link-dest associations."

That would apply to rsnapshot, right?

Best
Martin

Parallelism and deduplication
May 06, 2016 04:41AM
On 6 May 2016 at 12:22, Martin Schröder <martin < at > oneiros.de> wrote:
[quote]2016-05-06 13:12 GMT+02:00 Patrick O'Callaghan <pocallaghan < at > gmail.com>:
[quote]However I recommend reading the rsync man page on the -H option. It lists
several cases in which the result may not be what you expect (not sure if
any of them apply to rsnapshot).
[/quote]
"If you specify a --link-dest directory that contains hard links, the
linking of the destination files against the --link-dest files can
cause some paths in the destination to become linked together due to
the --link-dest associations."

That would apply to rsnapshot, right?

[/quote]

I would imagine so, but what's not clear is whether this is a bug or a feature.

Note that none of this addresses the problem of *restoring* hard links. A restore would have to use rsync -H again (or a dedupe process) to recover the exact same state, and I suspect someone could find a counterexample even then. Whether this matters or not depends on what we mean by "restore".

poc
Parallelism and deduplication
May 06, 2016 05:33AM
Hello, Tapani,

You wrote on 06.05.16:

[quote][quote]rsnapshot does not preserve hard links _in_ the data to backup.
[/quote][/quote]
[quote]With default settings it does not, but you can explicitly ask it
to do so with, e.g.,
[/quote]
[quote]rsync_short_options=-aH
[/quote]
[quote]in the configuration file.
[/quote]
On a tape?

Best regards,
Helmut

Parallelism and deduplication
May 06, 2016 05:33AM
Hello, Gandalf,

You wrote on 06.05.16:

[quote]So, in my example, "file1" and "file2" would be copied as separate files and
not hardlinked together?
[/quote]
Surely - a tape doesn't use hard links between backups.

Best regards,
Helmut

Parallelism and deduplication
May 06, 2016 05:51AM
On May 06 14:13, Helmut Hullen (Hullen < at > t-online.de) wrote:

[quote]Hello, Tapani,

You wrote on 06.05.16:

[quote][quote]rsnapshot does not preserve hard links _in_ the data to backup.
[/quote][/quote]
[quote]With default settings it does not, but you can explicitly ask it
to do so with, e.g.,
[/quote]
[quote]rsync_short_options=-aH
[/quote]
[quote]in the configuration file.
[/quote]
On a tape?
[/quote]
:-)

Obviously not, but then it would not be rsnapshot that'd
write it to the tape, would it?

Though... I have seen a system that _could_ have hard links
on a tape, along with everything else one could have on a disk:
recovery system on some ancient HP-UX box.
It was rather amusing to watch it swap on tape. :-)
(OK, it stopped being amusing after a few hours.)

And actually I think 'tar' could write and read hard links
to and from tape, although I don't recall ever trying that.
So if you use the -H option with rsnapshot and then archive
the backup tree with tar, hard links could be preserved
in restore.
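A quick way to check this on a scratch tree, sketched under assumptions: GNU tar and coreutils, and a snapshot taken with -H so that the source's hard links exist inside it. tar stores the first path of a hard-linked file with its data and later paths as link entries, so extracting the archive recreates the link. The function names are hypothetical, and the archive argument could be a tape device instead of a file.

```shell
#!/usr/bin/env bash
# Sketch: write one snapshot tree to a tar archive and restore it.
# tar stores the first path of a hard-linked file with its data and later
# paths as link entries, so links *inside* the archived tree come back.
# Function names are hypothetical; ARCHIVE may be a file or a tape device.
set -euo pipefail

# archive_tree SNAPDIR ARCHIVE
archive_tree() {
  tar -C "$(dirname "$1")" -cf "$2" "$(basename "$1")"
}

# restore_tree ARCHIVE DESTDIR
restore_tree() {
  mkdir -p "$2"
  tar -C "$2" -xf "$1"
}

# Usage (hypothetical paths):
#   archive_tree /backup/rsnapshot/weekly.0 /backup/weekly.0.tar
#   restore_tree /backup/weekly.0.tar /restore
```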

--
Tapani Tarvainen

Parallelism and deduplication
May 06, 2016 06:27PM
On Fri, May 6, 2016 at 8:48 AM, Tapani Tarvainen
<rsnapshot < at > tapanitarvainen.fi> wrote:
[quote]On May 06 14:13, Helmut Hullen (Hullen < at > t-online.de) wrote:

[quote]Hello, Tapani,

You wrote on 06.05.16:

[quote][quote]rsnapshot does not preserve hard links _in_ the data to backup.
[/quote][/quote]
[quote]With default settings it does not, but you can explicitly ask it
to do so with, e.g.,
[/quote]
[quote]rsync_short_options=-aH
[/quote]
[quote]in the configuration file.
[/quote]
On a tape?
[/quote]
:-)

Obviously not, but then it would not be rsnapshot that'd
write it to the tape, would it?

Though... I have seen a system that _could_ have hard links
on a tape, along with everything else one could have on a disk:
recovery system on some ancient HP-UX box.
It was rather amusing to watch it swap on tape. :-)
(OK, it stopped being amusing after a few hours.)

And actually I think 'tar' could write and read hard links
to and from tape, although I don't recall ever trying that.
So if you use the -H option with rsnapshot and then archive
the backup tree with tar, hard links could be preserved
in restore.
[/quote]
tar, and dump, do hardlinks just fine.

Parallelism and deduplication
May 06, 2016 10:50PM
Hello, Nico,

You wrote on 07.05.16:

[quote][quote][quote][quote][quote]rsnapshot does not preserve hard links _in_ the data to backup.
[/quote][/quote][/quote][/quote][/quote]
[...]

[quote]tar, and dump, do hardlinks just fine.
[/quote]
And that's far away from "rsnapshot".

Best regards,
Helmut

Parallelism and deduplication
May 07, 2016 06:42AM
On Sat, May 7, 2016 at 1:24 AM, Helmut Hullen <Hullen < at > t-online.de> wrote:
[quote]Hello, Nico,

You wrote on 07.05.16:

[quote][quote][quote][quote][quote]rsnapshot does not preserve hard links _in_ the data to backup.
[/quote][/quote][/quote][/quote][/quote]
[...]

[quote]tar, and dump, do hardlinks just fine.
[/quote]
And that's far away from "rsnapshot".

Best regards,
Helmut
[/quote]
One of our previous commenters referred to writing the rsnapshot
copies to tape, and whether internal hardlinks would be preserved.
Both technologies work just fine for just that purpose.

In fact, "dump" is deprecated these days for lots of reasons. If you
need to back up rsnapshot copies to tape, I've had good success
using the "AMANDA" software, backing up the "weekly" and a properly
date-labeled "cp -al" copy of the "daily" that I move aside
specifically to prevent accidental rsnapshot rotation during the
backup.
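A minimal sketch of the "move aside a date-labeled cp -al copy" step, assuming GNU cp and GNU date (`date -r FILE` prints FILE's modification time). The helper name, the `tape-` prefix and the paths are hypothetical. The label is taken from the snapshot directory's own mtime, which, per the discussion earlier in this thread, rsnapshot updates when a backup finishes, rather than from the moment the copy is made. Because `cp -al` only creates hard links, the copy costs little space, and a later rotation renaming daily.N should not disturb it.

```shell
#!/usr/bin/env bash
# Sketch: hard-link a snapshot aside under a date label before a long tape
# run, so rsnapshot rotation cannot rename it underneath the backup.
# Assumes GNU cp and GNU date (date -r FILE prints FILE's mtime).
# freeze_snapshot and the "tape-" prefix are hypothetical.
set -euo pipefail

# freeze_snapshot SNAPDIR -> prints the path of the frozen copy
freeze_snapshot() {
  local snap="${1%/}" parent label dest
  parent=$(dirname "$snap")
  # Label with the snapshot's own mtime (when the sync finished),
  # not with the time this copy happens to be made.
  label=$(date -r "$snap" +%Y-%m-%d_%H%M%S)
  dest="$parent/tape-$(basename "$snap").$label"
  if [ -e "$dest" ]; then
    echo "freeze_snapshot: $dest already exists" >&2
    return 1
  fi
  # -a: recurse and keep metadata; -l: hard-link files instead of copying them.
  cp -al "$snap" "$dest"
  printf '%s\n' "$dest"
}

# Usage (hypothetical paths):
#   frozen=$(freeze_snapshot /backup/rsnapshot/daily.0)
#   ...let AMANDA (or tar) back up "$frozen"...
#   rm -rf "$frozen"   # removes only these extra links; snapshots keep their data
```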

I'm also reaching back into the "WayBack" machine about this stuff:
that was way back when I first encountered rsnapshot and when I ported
AMANDA to SunOS....

Parallelism and deduplication
May 07, 2016 08:12AM
On 07/05/2016 15:39, Nico Kadel-Garcia wrote:
[quote] [quote]

One of our previous commenters referred to writing the rsnapshot
copies to tape, and whether internal hardlinks would be preserved.
Both technologies work just fine for just that purpose.
[/quote] [/quote]
OK, I admit I've created a lot of confusion.
Let me try to explain (keep in mind that English is not my native language).

Currently I'm backing up tons of hosts with Bacula. I would like to replace Bacula
with something better (for reasons I won't repeat) and easier to maintain.

I'm trying rsnapshot with a subset of these hosts and it is working very well.
Now I have 4 issues to address:

1) How do I restore hardlinks that were present on the source host? This is not a big
deal, just a question, because I'm unable to differentiate between existing hardlinks
and hardlinks created by rsnapshot during the "cp -al" phase.

2) I have to copy one or more backups to a tape library for disaster recovery.
Obviously, the tapes will be used only when everything has gone bad; they are a very
last resort. There is no need (for me) to preserve hardlinks. If I have to restore from
tapes, everything else is lost, so preserving the exact hardlinks doesn't matter.

3) With Bacula there are many, many, many points to check and administer (MySQL DB,
full-level retention, and so on), so many things could go wrong.
For example, if you lose or corrupt one full backup, you lose ALL backups for that host.
I would like to be sure that this is impossible with rsnapshot, since all backup levels are independent:
deleting everything except one directory doesn't lose all backups, only the deleted ones.

4) Hardlinks must be on the same filesystem. This is clear. So something like this:
$ cp -r daily.4 /mnt/other_file_system
would [b]resolve[/b] all hardlinks and copy the whole backup (without hardlinks) to the other filesystem, right?
This is something I could try on my own next Monday at work.
The same should apply when using tar to tape: I need to save the file contents, not a broken hardlink pointer.
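Points 1, 3 and 4 all come down to how hard links behave, and they can be checked on a scratch directory before touching real backups. A sketch, assuming GNU find and coreutils; the helper name `intra_snapshot_links` is hypothetical. For point 1, it lists paths that share an inode within a single snapshot: links made by rotation point into other snapshots and are ignored, while links copied from the source (only present if -H was used) show up. Per the rsync man page passage quoted above, --link-dest can occasionally link extra paths together, so treat the output as a hint rather than proof. The comments at the end cover points 3 and 4.

```shell
#!/usr/bin/env bash
# Sketch for points 1, 3 and 4 (GNU find/coreutils assumed).
# intra_snapshot_links is a hypothetical helper name.
set -euo pipefail

# intra_snapshot_links SNAPDIR
# Print groups of paths *inside one snapshot* that share an inode.
# Links created by rotation point into other snapshots (daily.1, weekly.0,
# ...) and are not listed, because only this one tree is walked.
intra_snapshot_links() {
  # -xdev: inode numbers are only unique within one filesystem.
  # %i = inode number, %p = path.
  find "$1" -xdev -type f -links +1 -printf '%i %p\n' |
    awk '{ ino = $1; sub(/^[0-9]+ /, "")
           n[ino]++; paths[ino] = paths[ino] "  " $0 "\n" }
         END { for (i in n) if (n[i] > 1) printf "inode %s:\n%s", i, paths[i] }'
}

# Point 3: removing one snapshot only removes its links; the data stays as
# long as another snapshot still links to it:
#   rm -rf daily.0; cat daily.1/file1        # still readable
#
# Point 4: copying a single snapshot elsewhere (GNU cp):
#   cp -r daily.4 /mnt/other_file_system     # every link becomes a full copy
#   cp -a daily.4 /mnt/other_file_system     # links *within* daily.4 are kept;
#                                            # sharing with other snapshots is not
#   tar -cf "$TAPE" daily.4                  # $TAPE is a hypothetical device; files
#                                            # linked to other snapshots are stored in full
```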
Parallelism and deduplication
May 07, 2016 08:26AM
[quote]On May 7, 2016, at 07:39, Nico Kadel-Garcia <nkadel < at > gmail.com> wrote:

and moving
aside a properly date-labeled
[/quote]
You're going to push this forever, aren't you? It's so small-minded and not scalable.

My snapshots can take weeks to run over InfiniBand. What date would you propose is "proper": when it starts or when it completes?