
relinking (deduping) disconnected rsnapshot trees

Posted by Anonymous 
relinking (deduping) disconnected rsnapshot trees
June 03, 2015 07:50AM
Because of an undetected disk overflow, I have fragmented copies of
partial rsnapshot backups on a RAID.
I'd rather not just go back to the last intact backup, but instead find a
way to merge the new data with the existing tree. In other words: scan
directories A and B, and if files A/subpathK/fileX and B/subpathK/fileX
both exist and are identical, link them together; otherwise do nothing.
rsync (3.1.1) doesn't seem to be the tool for this, or at least I can't
figure out how to make it do it.
Has anyone else run across this issue, and if so, how did you resolve it?

Thanks,
Rich
------------------------------------------------------------------------------
_______________________________________________
rsnapshot-discuss mailing list
rsnapshot-discuss < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rsnapshot-discuss
relinking (deduping) disconnected rsnapshot trees
June 03, 2015 09:29AM
On Wed, 3 Jun 2015 14:49:10 +0000
"Winkel, Richard J." <winkelr < at > missouri.edu> wrote:

[quote]Because of an undetected disk overflow I have fragmented copies of
partial rsnapshot backups on a raid.
[/quote]
Can you go into more detail here? Disk overflow? Do you mean you ran
out of disk space without noticing, and backups have been failing for
some period?

How have you proceeded to correct this problem? e.g. did you replace/add
disks and rebuild, and now you have a larger RAID (5?) and wish to
resume backing up to this? Did you make room by deleting a bunch of
older stuff? Or, do you now have another additional RAID device to add
new backups to? The more detail the better.


relinking (deduping) disconnected rsnapshot trees
June 03, 2015 10:19AM
If it's a limited amount of unlinked data, then one approach you could use is:

mv weekly.0 weekly.0.broken
cp -al weekly.1 weekly.0
rsync -aHS weekly.0.broken/ weekly.0/
rm -rf weekly.0.broken

This takes an existing correct backup (weekly.1), makes a hard-linked copy of it as weekly.0, then rsyncs the broken snapshot _over_ that copy, so files that did not change stay linked to weekly.1.  Depending on how faithfully you want to reproduce the broken snapshot, you might add --delete to the rsync command.  Basically just read the rsnapshot.log and figure out a script which will replicate the gist of it.

BUT!  Obviously you need to be really careful, and you would be well advised to spend a bit of time thinking about how this is going to affect your overall backup.  Normally I would not expect rsnapshot to leave you with fragmentary unlinked backups.  If that really is what you have, it implies that your tip-of-stream is disconnected from your older backups, and hot-patching one or two directories isn't going to resolve that.  In that case I'd think even harder about the problem, and maybe even fake up a simple rsnapshot sandbox to experiment with, to make sure I'm making things better rather than worse.
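
For what it's worth, a throwaway sandbox for that kind of experiment might look something like the sketch below. The helper names (make_sandbox, count_linked) and the toy file names are invented for illustration, and it assumes a Linux box with the usual coreutils and find. It only builds a tiny fake snapshot root and counts how many same-path files share an inode, so the count can be compared before and after a repair attempt.

```shell
#!/bin/bash
# Hypothetical helpers for experimenting on toy data, never on real backups.

# Build a toy snapshot root: weekly.1 is a good backup, weekly.0 is a
# "disconnected" copy (same content, separate inodes) plus one changed file.
# Prints the path of the new sandbox directory.
make_sandbox() {
    local root
    root=$(mktemp -d) || return 1
    mkdir -p "$root/weekly.1/etc"
    echo "same" > "$root/weekly.1/etc/unchanged.conf"
    echo "old"  > "$root/weekly.1/etc/changed.conf"
    cp -a "$root/weekly.1" "$root/weekly.0"    # plain copy: no shared inodes
    echo "new"  > "$root/weekly.0/etc/changed.conf"
    echo "$root"
}

# Print how many regular files under tree $1 are hard-linked to the file
# at the same relative path under tree $2.
# Limitation: file names containing newlines are not handled.
count_linked() {
    local a="${1%/}" b="${2%/}"
    (cd "$a" && find . -type f) | {
        n=0
        while IFS= read -r f; do
            if [ "$a/$f" -ef "$b/$f" ]; then n=$((n + 1)); fi
        done
        echo "$n"
    }
}
```

Usage would be along the lines of root=$(make_sandbox), then count_linked "$root/weekly.1" "$root/weekly.0", then run the repair steps inside $root and count again.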

Also ... IMHO you might be best served by shifting those broken directories aside and setting a calendar entry to manually delete them after an appropriate time, rather than treating them as part of your normal rsnapshot stream.  Those broken snapshot directories are misleading, and I'd worry about mistakes made in six months when something else comes up and you have to manually intervene.

-scott

relinking (deduping) disconnected rsnapshot trees
June 03, 2015 10:30AM
[quote]I guess I'm just being lazy. But if anyone else already has something
in hand, it seems like it would be useful to a lot of people.
Otherwise I guess I'll have to invent it.

[/quote]
relinking (deduping) disconnected rsnapshot trees
June 03, 2015 10:30AM
On 06/03/2015 11:56 AM, Rich Winkel wrote:
[quote]Here's a first draft:
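#!/bin/bash
# Hard-link identical files that sit at the same relative path in two trees.
if [ $# -ne 2 ]; then
    echo "Syntax: $0 tree1 tree2"
    echo "Scans 2 trees for identically path'd and named files and, if they are identical, links them together."
    exit 1
fi
# Hard links cannot cross filesystems, so both trees must be on the same one.
if ! [ -d "$1" ] || ! [ -d "$2" ] ||
   [ $(df -P "$1" "$2" | awk 'NR > 1 {print $1}' | sort -u | wc -l) -ne 1 ]; then
    echo "Arguments must be directories on the same partition! Exiting..."
    exit 2
fi
src=${1%/}
dst=${2%/}
# NUL-delimited names and read -r keep spaces, backslashes and newlines intact.
(cd "$src" && find . -type f -print0) | while IFS= read -r -d '' f; do
    f=${f#./}
    # Skip files missing from tree2, symlinks, and pairs that are already linked.
    [ -f "$dst/$f" ] && [ ! -L "$dst/$f" ] || continue
    [ "$src/$f" -ef "$dst/$f" ] && continue
    # Only content is compared; owner, mode and mtime of tree2's copy are lost.
    if cmp -s "$src/$f" "$dst/$f"; then
        echo "Linking $src/$f to $dst/$f"
        # Link under a temporary name first, so $dst/$f is never lost if ln fails.
        ln "$src/$f" "$dst/$f.relink.$$" && mv -f "$dst/$f.relink.$$" "$dst/$f"
    fi
done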

[/quote]
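
Since a script like the draft above replaces files in place, it may be worth doing a dry run first. The following is only a sketch under the same assumptions as the draft (two trees on one filesystem, content compared with cmp); the function name relink_dry_run and the output wording are made up for illustration. It lists what would be linked and adds up the sizes, without touching either tree. The byte count is an upper bound, because a file in tree2 that has other hard links elsewhere frees no space when it is replaced.

```shell
#!/bin/bash
# Hypothetical dry-run helper: reports what a relinking pass WOULD link.
# Nothing is modified. File names containing newlines are not handled.
relink_dry_run() {
    local a="${1%/}" b="${2%/}"
    (cd "$a" && find . -type f) | {
        count=0
        bytes=0
        while IFS= read -r f; do
            f=${f#./}
            # Candidate: regular file on both sides, not a symlink in tree2,
            # not already the same inode, and byte-for-byte identical.
            [ -f "$b/$f" ] && [ ! -L "$b/$f" ] || continue
            [ "$a/$f" -ef "$b/$f" ] && continue
            if cmp -s "$a/$f" "$b/$f"; then
                echo "would link: $f"
                count=$((count + 1))
                bytes=$((bytes + $(wc -c < "$b/$f")))
            fi
        done
        echo "$count file(s), up to $bytes byte(s) reclaimable"
    }
}
```

It would be called the same way as the draft script, e.g. relink_dry_run tree1 tree2.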
relinking (deduping) disconnected rsnapshot trees
June 03, 2015 10:30AM
[quote]Thanks for the reply!
I moved one of the backup trees elsewhere, so I have some free space.
The overflow had been happening for about a month.
I obviously need to check what happened to the syslog message.
Also, I had use_lazy_deletes turned on. I think this interfered with the
rollback procedure: it left _delete* directories lying around that were
never cleaned up.

[/quote]
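
For anyone else chasing leftover _delete* directories like the ones mentioned above, a quick way to list them might be the sketch below. The function name list_stale_deletes is invented here, the name pattern is simply the one from the post, and the argument is assumed to be whatever snapshot_root points to in your rsnapshot.conf. It only lists; it does not delete anything.

```shell
#!/bin/bash
# Hypothetical helper: list leftover lazy-delete directories that sit
# directly under a snapshot root (not inside the snapshots themselves).
# -mindepth/-maxdepth are extensions found in GNU and BSD find.
list_stale_deletes() {
    find "${1%/}" -mindepth 1 -maxdepth 1 -type d -name '_delete*' -print
}
```

Once the list has been reviewed, and no rsnapshot run is in progress, the listed directories could be removed by hand.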
relinking (deduping) disconnected rsnapshot trees
June 03, 2015 11:23AM
Hi Scott,
Thanks for the thoughtful reply. You have a good approach. I don't think anything would be lost, but I think files could end up linked in that shouldn't be there.
Unfortunately I can't just set the broken stuff aside: I have about 20TB of data here and the RAID is about 95% full.

relinking (deduping) disconnected rsnapshot trees
June 03, 2015 11:25PM
You may want to take a look at hardlink:

[url=http://jak-linux.org/projects/hardlink/]http://jak-linux.org/projects/hardlink/[/url]

Best,

Rasmus

relinking (deduping) disconnected rsnapshot trees
June 04, 2015 02:05AM
Hello, Rasmus,

You wrote on 04.06.15:

[quote]You may want to take a look at hardlink:
http://jak-linux.org/projects/hardlink/
[/quote]
Or you can use the original "hardlink" (written in C and compiled into
a binary):

http://arktur.shuttle.de/CD/beta/slack/ap1/hardlink-1.2-i486-1hln.tgz

This "hardlink" program is really small, and really quick.
Authors: Jakub Jelinek, Dag Wieers, Dries Verachtert etc.

Best regards!
Helmut

relinking (deduping) disconnected rsnapshot trees
June 04, 2015 10:40AM
This is a good starting point:  [url=http://en.wikipedia.org/wiki/List_of_duplicate_file_finders]http://en.wikipedia.org/wiki/List_of_duplicate_file_finders[/url]

The primary issue with these tools in an rsnapshot root is that they can become memory-constrained unless special care is taken.  I ended up writing my own Perl script which runs a find that emits various metadata, runs that through sort to group the files which could be the same, and then processes the result.
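
The find/sort/group idea described above could be sketched in shell roughly as follows. The actual Perl script is not shown in this thread, so this is only a guess at the general shape, not that script. The function name dup_candidates is made up, GNU find is assumed for -printf, and file names containing newlines are not handled. sort does the heavy lifting and can spill to disk, while awk only ever holds one same-size group in memory.

```shell
#!/bin/bash
# Hypothetical sketch: print groups of files that have the same size but
# different inodes. These are only CANDIDATES; their contents still need
# to be compared (e.g. with cmp) before anything is linked.
# Output lines are: size <TAB> inode <TAB> path, groups separated by a blank line.
dup_candidates() {
    find "$@" -type f -printf '%s\t%i\t%p\n' |
    sort -t "$(printf '\t')" -k1,1n -k2,2n |
    awk -F '\t' '
        # Print the buffered group only if it spans more than one inode.
        function emit_group(   i) {
            if (inodes > 1) {
                for (i = 1; i <= n; i++) print line[i]
                print ""
            }
            n = 0; inodes = 0; last_inode = ""
        }
        $1 != size { emit_group(); size = $1 }
        {
            if ($2 != last_inode) { inodes++; last_inode = $2 }
            line[++n] = $0
        }
        END { emit_group() }
    '
}
```

For example, dup_candidates weekly.0 weekly.1 would print only the groups where linking might still save space; files that are already hard-linked to each other share an inode and are skipped.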

-scott

relinking (deduping) disconnected rsnapshot trees
June 04, 2015 01:08PM

This may be useful:
http://serverfault.com/questions/618735/can-i-use-rsync-to-create-a-list-of-only-changed-files

It shows a way to leverage rsync to create a list of the files that
would be transferred from the <src> to the <dest>, without actually
doing anything. Armed with this list, you can infer that all other
files present in both trees can be hardlinks.
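
A minimal sketch of that idea, assuming a reasonably recent rsync: the wrapper name would_transfer is made up, and the flag combination is one plausible choice rather than necessarily the one from the linked answer. -r recurses, -c compares by checksum instead of size and mtime, -n makes it a dry run, and --out-format='%n' prints just the name of each item that would be sent. Note that -c reads every file on both sides, which will take a while on a large snapshot root.

```shell
#!/bin/bash
# Hypothetical wrapper: list the files rsync WOULD transfer from $1 to $2.
# Nothing is copied (-n). Directory entries (names ending in /) are dropped.
would_transfer() {
    rsync -rcn --out-format='%n' "${1%/}/" "${2%/}/" | grep -v '/$' || true
}
```

Anything under <src> that does not show up in this list, and exists in <dest>, has identical content on both sides.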

HTH
--
Regards,
Christopher Barry

