
Reaching the limit of rsnapshot: Too many files, only few ch

Posted by Anonymous 
Reaching the limit of rsnapshot: Too many files, only few ch
August 01, 2016 02:15AM
For several years we have been using rsnapshot for backups.

On some systems we are reaching its limits. There are too many files, and only very few of them change and need to be backed up.

Only about 0.1% of all files change on one day!

Unfortunately the changed files are scattered across a deep directory tree. Rsync takes a very long time to discover the changes.

I think sooner or later we will need to use a different tool or change the way we use rsnapshot.

Up to now the application which creates the files can't handle a storage API. We need something which can be mounted
like a file system.

Which tool could fit?

Environment:

* 17M files (number of files)
* 2.2TBytes of data.
* one host accessing the data via RAID.

--
Thomas Guettler http://www.thomas-guettler.de/

------------------------------------------------------------------------------
_______________________________________________
rsnapshot-discuss mailing list
rsnapshot-discuss < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rsnapshot-discuss
Reaching the limit of rsnapshot: Too many files, only few ch
August 01, 2016 04:49AM
On Mon, Aug 1, 2016 at 4:56 AM, Thomas Güttler
<guettliml < at > thomas-guettler.de> wrote:
[quote]For several years we have been using rsnapshot for backups.

On some systems we are reaching its limits. There are too many files, and only very few of them change and need to be backed up.

Only about 0.1% of all files change on one day!

Unfortunately the changed files are scattered across a deep directory tree. Rsync takes a very long time to discover the changes.

I think sooner or later we will need to use a different tool or change the way we use rsnapshot.
[/quote]
Well, *that* part would be easy. The deep and unchanging section can
be backed up separately, and possibly less frequently, than the more
dynamic section, for most configurations. That's easy to set up with
appropriate "--exclude" targets for the bulkier backup, and a more
directory-targeted backup for the parts you mention that have
specific requirements.

[quote]Up to now the application which creates the files can't handle a storage API. We need something which can be mounted
like a file system.
[/quote]
This is unclear. The files created by your software need to be on a
filesystem, which is then backed up by rsnapshot?

[quote]Which tool could fit?

Environment:

* 17M files (number of files)
* 2.2TBytes of data.
* one host accessing the data via RAID.
[/quote]
*Why* is it scattering data among 17 Million files on multiple
Terabytes? Or are the updates more focused?

Reaching the limit of rsnapshot: Too many files, only few ch
August 01, 2016 05:17AM
Am 01.08.2016 um 13:46 schrieb Nico Kadel-Garcia:
[quote]On Mon, Aug 1, 2016 at 4:56 AM, Thomas Güttler
<guettliml < at > thomas-guettler.de> wrote:
[quote]For several years we have been using rsnapshot for backups.

On some systems we are reaching its limits. There are too many files, and only very few of them change and need to be backed up.

Only about 0.1% of all files change on one day!

Unfortunately the changed files are scattered across a deep directory tree. Rsync takes a very long time to discover the changes.

I think sooner or later we will need to use a different tool or change the way we use rsnapshot.
[/quote]
Well, *that* part would be easy. The deep and unchanging section can
be backed up separately, and possibly less frequently, than the more
dynamic section, for most configurations. That's easy to set up with
appropriate "--exclude" targets for the bulkier backup, and a more
directory-targeted backup for the parts you mention that have
specific requirements.
[/quote]
Maintaining include/exclude lists by hand does not work in this context.
Updates can happen in any directory.

[quote][quote]Up to now the application which creates the files can't handle a storage API. We need something which can be mounted
like a file system.
[/quote]
This is unclear. The files created by your software need to be on a
filesystem, which is then backed up by rsnapshot?
[/quote]
Yes. The software needs a big filesystem, and this filesystem gets
backed up by rsnapshot.

[quote][quote]Which tool could fit?

Environment:

* 17M files (number of files)
* 2.2TBytes of data.
* one host accessing the data via RAID.
[/quote]
*Why* is it scattering data among 17 Million files on multiple
Terabytes? Or are the updates more focused?
[/quote]
This is outside my sphere of influence. I am here
to improve the backup; changing the system which creates
these files is out of scope.

Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/

Reaching the limit of rsnapshot: Too many files, only few ch
August 01, 2016 07:15AM
On Mon, 1 Aug 2016 14:15:19 +0200
Thomas Güttler <guettliml < at > thomas-guettler.de> wrote:

[quote]Am 01.08.2016 um 13:46 schrieb Nico Kadel-Garcia:
[quote]On Mon, Aug 1, 2016 at 4:56 AM, Thomas Güttler
<guettliml < at > thomas-guettler.de> wrote:
[quote]For several years we have been using rsnapshot for backups.

On some systems we are reaching its limits. There are too many files
and only very few of them change and need to be backed up.

Only about 0.1% of all files change on one day!

Unfortunately the changed files are scattered across a deep directory
tree. Rsync takes a very long time to discover the changes.

I think sooner or later we will need to use a different tool or change
the way we use rsnapshot.
[/quote]
Well, *that* part would be easy. The deep and unchanging section can
be backed up separately, and possibly less frequently, than the more
dynamic section, for most configurations. That's easy to set up with
appropriate "--exclude" targets for the bulkier backup, and a more
directory-targeted backup for the parts you mention that have
specific requirements.
[/quote]
Maintaining include/exclude lists by hand does not work in this
context. Updates can happen in any directory.

[quote][quote]Up to now the application which creates the files can't handle a
storage API. We need something which can be mounted like a file
system.
[/quote]
This is unclear. The files created by your software need to be on a
filesystem, which is then backed up by rsnapshot?
[/quote]
Yes. The software needs a big filesystem, and this filesystem gets
backed up by rsnapshot.

[quote][quote]Which tool could fit?

Environment:

* 17M files (number of files)
* 2.2TBytes of data.
* one host accessing the data via RAID.
[/quote]
*Why* is it scattering data among 17 Million files on multiple
Terabytes? Or are the updates more focused?
[/quote]
This is outside my sphere of influence. I am here
to improve the backup; changing the system which creates
these files is out of scope.

Regards,
Thomas Güttler

[/quote]

I can think of two ideas off the top of my head:

1. Increase the time between rsnapshot backups.

2. Use something like inotify to monitor and identify directories where
files have been changed.

In the former case, simply change the cron schedule.

In the latter case, do something like exclude everything, then selectively
include the changed directories via a script that modifies an rsnapshot
config file, and run rsnapshot with that config. This is of course a
simplistic overview, but at first blush it seems doable. How it plays
with hard-linking, etc., I don't immediately know; others can add reasons
why this idea can or cannot work...
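A rough sketch of that "exclude all, include the changed directories" idea (a hypothetical helper, assuming the list of changed directories has already been collected somehow, e.g. from inotify). Note that rsync filter rules only match a deep path if every parent directory is also included, so the script must emit the ancestor chain as well:

```python
def rsync_filter_rules(changed_dirs):
    """Build rsync --include/--exclude arguments that copy only the
    given directories, plus the parent chain needed to reach them.
    The trailing --exclude=* drops everything not explicitly included."""
    rules, seen = [], set()
    for d in changed_dirs:
        path = ""
        for part in d.strip("/").split("/"):
            path += "/" + part
            if path not in seen:            # include each ancestor once
                seen.add(path)
                rules.append("--include=" + path + "/")
        rules.append("--include=" + path + "/***")  # the dir and its contents
    rules.append("--exclude=*")
    return rules
```

These arguments could then go into the rsync options of a generated rsnapshot config; whether rsnapshot's hard-link rotation behaves well with a config that changes on every run is exactly the open question above.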

Just food for thought...

-C

--
Regards,
Christopher

Reaching the limit of rsnapshot: Too many files, only few ch
August 01, 2016 07:20AM
On Mon, Aug 01, 2016 at 02:15:19PM +0200, Thomas G?ttler wrote:

[quote]Maintaining include/exclude lists by hand does not work in this context.
Updates can happen in any directory.
[/quote]
Some OSes have mechanisms to tell a piece of code when changes happen
in a filesystem, e.g. inotify on Linux or FSEvents on OS X. Could you
use that to maintain include/exclude lists automagically?

--
David Cantrell

Reaching the limit of rsnapshot: Too many files, only few ch
August 01, 2016 07:24AM
[quote]I can think of two ideas off the top of my head;

1. Increase the time between rsnapshot backups.
[/quote]
Yes, this is a work-around. The drawback is clear: more data gets lost
if there is a crash ... (yes, we have RAID, but that is not enough ...)

[quote]2. Use something like inotify to monitor and identify directories where
files have been changed.
[/quote]
We thought about this, too. But inotify has a limitation: you need to watch
every directory. So far I have found no inotify mechanism in Linux which
can watch a whole filesystem.

Regards,
Thomas

--
Thomas Guettler http://www.thomas-guettler.de/

Reaching the limit of rsnapshot: Too many files, only few ch
August 01, 2016 07:27AM
On Monday 01 August 2016 10:56:37 Thomas Güttler wrote:
[quote]For several years we have been using rsnapshot for backups.

On some systems we are reaching its limits. There are too many files, and only
very few of them change and need to be backed up.

Only about 0.1% of all files change on one day!

Unfortunately the changed files are scattered across a deep directory tree.
Rsync takes a very long time to discover the changes.

I think sooner or later we will need to use a different tool or change the way
we use rsnapshot.

Up to now the application which creates the files can't handle a storage
API. We need something which can be mounted like a file system.

Which tool could fit?

Environment:

* 17M files (number of files)
* 2.2TBytes of data.
* one host accessing the data via RAID.
[/quote]
The only time I ever came across something that big, they were using Tivoli
Storage Manager, and there was a daemon running in the background tracking
every file change. When the backup ran it would use the daemon's journal
instead of trawling the filesystem.
Of course they also ran traditional incremental filesystem-trawling backups
on the weekends, just in case their daemon missed something.

--
Arne Hüggenberg
System Administrator
_______________________
Sports & Bytes GmbH
Rheinlanddamm 207-209
D-44137 Dortmund
Fon: +49-231-9020-6655
Fax: +49-231-9020-6989

Geschäftsführer: Thomas Treß, Carsten Cramer

Sitz und Handelsregister: Dortmund, HRB 14983
Finanzamt Dortmund - West
Steuer-Nr.: 314/5763/0104
USt-Id. Nr.: DE 208084540

Reaching the limit of rsnapshot: Too many files, only few ch
August 01, 2016 07:48AM
On Mon, 1 Aug 2016 16:22:28 +0200
Thomas Güttler <guettliml < at > thomas-guettler.de> wrote:

[quote][quote]I can think of two ideas off the top of my head;

1. Increase the time between rsnapshot backups.
[/quote]
Yes, this is a work-around. The drawback is clear: more data gets lost
if there is a crash ... (yes, we have RAID, but that is not enough ...)

[quote]2. Use something like inotify to monitor and identify directories
where files have been changed.
[/quote]
We thought about this, too. But inotify has a limitation: you need to watch
every directory. So far I have found no inotify mechanism in Linux which
can watch a whole filesystem.

Regards,
Thomas

[/quote]
This may prove useful to you:
http://www.ibm.com/developerworks/library/l-ubuntu-inotify/index.html

It's a bit older, but the concept is the same. You'll need to handle the
recursive watch registration and updates in user space yourself, but it's
definitely doable.
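To illustrate the user-space bookkeeping involved (a toy model only, with no real inotify calls): inotify watches are per-directory, so every directory that appears needs its own watch, and deleting a directory must drop the watches for its whole subtree.

```python
class WatchSet:
    """Toy model of recursive inotify bookkeeping. A real implementation
    would call inotify_add_watch()/inotify_rm_watch() where noted."""
    def __init__(self):
        self.watched = set()

    def dir_created(self, path):
        # real code: inotify_add_watch(fd, path, mask)
        self.watched.add(path)

    def dir_deleted(self, path):
        # drop the directory itself and everything beneath it
        self.watched = {w for w in self.watched
                        if w != path and not w.startswith(path + "/")}
```

With 17M files this set itself is large, which is precisely the scaling problem Thomas describes.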

Good luck, and I (and probably everyone on this list) would love to see
what you come up with.

--
Regards,
Christopher

Reaching the limit of rsnapshot: Too many files, only few ch
August 01, 2016 07:57AM
On Mon, Aug 01, 2016 at 10:56:37AM +0200, Thomas Güttler wrote:
[quote]For several years we have been using rsnapshot for backups.

On some systems we are reaching its limits. There are too many files, and only very few of them change and need to be backed up.

Only about 0.1% of all files change on one day!

Unfortunately the changed files are scattered across a deep directory tree. Rsync takes a very long time to discover the changes.

I think sooner or later we will need to use a different tool or change the way we use rsnapshot.

Up to now the application which creates the files can't handle a storage API. We need something which can be mounted
like a file system.

Which tool could fit?

Environment:

* 17M files (number of files)
* 2.2TBytes of data.
* one host accessing the data via RAID.
[/quote]
Check whether you can reduce the interval levels, in other words how many
daily/weekly/monthly backups you keep on the backup server:
too many scattered hard links may slow down the backup.

In the best case, switch the backup server to ZFS, reduce the intervals to a
single level, and create a ZFS snapshot using the rsnapshot cmd_preexec[¹].
Rotating backups also takes a lot of time; this way you remove that work by
moving the logic to ZFS.

[¹] https://github.com/zfsnap/zfsnap
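For reference, the hook might look roughly like this in rsnapshot.conf (a sketch; the pool and script names are made up, and rsnapshot requires tabs, not spaces, between parameter and value):

```
# rsnapshot.conf fragment: keep a single retain level and let ZFS
# snapshots provide the history
retain	daily	1
cmd_preexec	/usr/local/bin/zfs-snapshot-backups.sh

# /usr/local/bin/zfs-snapshot-backups.sh (hypothetical pool name):
#   #!/bin/sh
#   zfs snapshot backuppool/rsnapshot@$(date +%Y-%m-%d_%H%M)
```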

--
Oliver PETER oliver < at > gfuzz.de 0x456D688F

Reaching the limit of rsnapshot: Too many files, only few ch
August 01, 2016 03:06PM
On Mon, Aug 01, 2016 at 10:56:37AM +0200, Thomas Güttler wrote:
[quote]For several years we have been using rsnapshot for backups.

On some systems we are reaching its limits. There are too many files, and only very few of them change and need to be backed up.

Rsync takes a very long time to discover the changes.
[/quote]
[quote]Environment:

* 17M files (number of files)
* 2.2TBytes of data.
* one host accessing the data via RAID.
[/quote]
Are you using -H (rsync argument to preserve hard links)?

When the backup source has many files in it, rsync -H will use a lot
of memory. If rsync uses too much memory to fit in RAM, then some
of that working set will be paged out to disk, and rsync will run
slowly. Other processes running at the time may be affected too.

If this is your situation and you are not too concerned about hard
linked files from the source being duplicated (because you don't have
many hard links in the source or you have plenty of space to backup to)
then you might want to try turning off -H and see whether rsnapshot
finishes more quickly and uses less RAM.
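In rsnapshot terms, -H would typically come in via rsync_short_args in rsnapshot.conf, so dropping it might look like this (a sketch; -a without -H is rsnapshot's usual default):

```
# rsnapshot.conf: rsync flags (tab-separated).
# Before (preserving hard links; memory-hungry with 17M source files):
#rsync_short_args	-aH
# After (hard-linked source files become independent copies in the backup):
rsync_short_args	-a
```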

--
___________________________________________________________________________
David Keegel <djk < at > cyber.com.au> Cyber IT Solutions Pty. Ltd.
http://www.cyber.com.au/~djk/ Linux & Unix Systems Administration
