
Reaching the limit of rsnapshot ... inotify does not help

Posted by Anonymous 
Reaching the limit of rsnapshot ... inotify does not help
August 02, 2016 12:25AM
On 01.08.2016 at 16:46, Christopher Barry wrote:
...
[quote]
This may prove useful to you:
http://www.ibm.com/developerworks/library/l-ubuntu-inotify/index.html
[/quote]

inotify does not scale the way we need it to. There are too many directories.

Nevertheless, thank you for trying to help.

Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/

Reaching the limit of rsnapshot ... inotify does not help
August 02, 2016 06:53AM
On Tue, 2 Aug 2016 09:22:45 +0200
Thomas Güttler <guettliml < at > thomas-guettler.de> wrote:

[quote]On 01.08.2016 at 16:46, Christopher Barry wrote:
...
[quote]
This may prove useful to you:
http://www.ibm.com/developerworks/library/l-ubuntu-inotify/index.html
[/quote]

inotify does not scale the way we need it to. There are too many directories.

Nevertheless, thank you for trying to help.

Regards,
Thomas Güttler

[/quote]
OK. Lists are for trying to help :)

Sounds like a hard problem to solve with anything other than simply
more and more time for backing up, unless you change the underlying
filesystem type (zfs sounded like an interesting idea to explore) or
replace the existing hardware to be much faster (or both).

What options have you come up with so far?
What do you think you'll end up doing to solve this problem?

--
Regards,
Christopher

Reaching the limit of rsnapshot ... inotify does not help
August 02, 2016 07:31AM
On 02.08.2016 at 15:51, Christopher Barry wrote:
[quote]On Tue, 2 Aug 2016 09:22:45 +0200
Thomas Güttler <guettliml < at > thomas-guettler.de> wrote:

[quote]On 01.08.2016 at 16:46, Christopher Barry wrote:
...
[quote]
This may prove useful to you:
http://www.ibm.com/developerworks/library/l-ubuntu-inotify/index.html
[/quote]

inotify does not scale the way we need it to. There are too many directories.

Nevertheless, thank you for trying to help.

Regards,
Thomas Güttler

[/quote]
OK. Lists are for trying to help :)

Sounds like a hard problem to solve with anything other than simply
more and more time for backing up, unless you change the underlying
filesystem type (zfs sounded like an interesting idea to explore) or
replace the existing hardware to be much faster (or both).

What options have you come up with so far?
[/quote]
So far, we have increased the backup interval.

[quote]What do you think you'll end up doing to solve this problem?
[/quote]
I am unsure. At the moment I see these solutions:

- Wrapping the filesystem with an overlay filesystem which logs all changed files, and using that log as the
include list for rsnapshot (see the sketch after this list)

- Moving the data to object storage and mounting it with a tool like s3fs.

- Using a filesystem which can report all changed files.
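
For the first option, the hand-off could look roughly like this. A minimal sketch only, assuming the overlay layer already writes one changed path per line into a file; the file name is made up, and rsnapshot would need its rsync_long_args pointed at it:

{{{
# changed.txt: one path per line, relative to /data, produced by the
# (hypothetical) overlay layer that logs writes
rsync -a --files-from=/var/log/changed.txt /data/ backuphost:/backup/daily.0/
}}}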

Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/

Reaching the limit of rsnapshot ... inotify does not help
August 02, 2016 08:00AM
[quote]On Aug 2, 2016, at 05:51, Christopher Barry <christopher.r.barry < at > gmail.com> wrote:

OK. Lists are for trying to help :)

Sounds like a hard problem to solve with anything other than simply
more and more time for backing up, unless you change the underlying
filesystem type (zfs sounded like an interesting idea to explore) or
replace the existing hardware to be much faster (or both).

What options have you come up with so far?
What do you think you'll end up doing to solve this problem?
[/quote]

.......And I'm curious what the data is and what creates it. That many new/changed files and only 2.x TB has to be interesting

Reaching the limit of rsnapshot ... inotify does not help
August 02, 2016 08:29AM
On 02 Aug 2016 17:01, "Ken Woods" <kenwoods < at > gmail.com> wrote:
[quote].......And I'm curious what the data is and what creates it. That many new/changed files and only 2.x TB has to be interesting

[/quote]Any mass-hosting environment where you have thousands of websites and email accounts can create millions of small files every day.
For example, if you host tons of PrestaShop e-commerce or WordPress sites, both with disk cache enabled, you will see millions of small files changed every night, plus the email accounts used by these sites/customers.
Reaching the limit of rsnapshot ... inotify does not help
August 02, 2016 09:28AM
On 2 August 2016 at 16:27, Gandalf Corvotempesta <gandalf.corvotempesta < at > gmail.com> wrote:
[quote]
On 02 Aug 2016 17:01, "Ken Woods" <kenwoods < at > gmail.com> wrote:
[quote].......And I'm curious what the data is and what creates it. That many new/changed files and only 2.x TB has to be interesting

[/quote]Any mass-hosting environment where you have thousands of websites and email accounts can create millions of small files every day.
For example, if you host tons of PrestaShop e-commerce or WordPress sites, both with disk cache enabled, you will see millions of small files changed every night, plus the email accounts used by these sites/customers. [/quote]

In such a case the web front end would be the place to log changed files for later backup.

poc
Reaching the limit of rsnapshot ... inotify does not help
August 02, 2016 12:04PM
On 02 Aug 2016 8:36 PM, "Ken Woods" <kenwoods < at > gmail.com> wrote:
[quote]
If 200 sites are generating 2 TB of changes in a day, I'd think there are bigger issues at hand, but....... I'm not in the web hosting world, so perhaps you're right.

[/quote]I wrote less than 1 GB, not 2 TB.
Reaching the limit of rsnapshot ... inotify does not help
August 02, 2016 12:49PM
On Tue, Aug 02, 2016 at 04:28:46PM +0200, Thomas Güttler wrote:
[quote]...
What do you think you'll end up doing to solve this problem?

I am unsure. At the moment I see these solutions:

- Wrapping the filesystem with an overlay filesystem which logs all changed files, and using that log as the
include list for rsnapshot
[/quote]
Or use ZFS, do filesystem-based snapshots and transfer the incremental
changes to a remote site (zfs send | zfs receive).
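
A minimal sketch of that idea (pool, dataset and host names are made up; assumes SSH access to the backup box):

{{{
# one-time full transfer of an initial snapshot
zfs snapshot tank/data@2016-08-01
zfs send tank/data@2016-08-01 | ssh backuphost zfs receive backup/data

# afterwards, only the delta between two snapshots travels over the wire
zfs snapshot tank/data@2016-08-02
zfs send -i tank/data@2016-08-01 tank/data@2016-08-02 \
    | ssh backuphost zfs receive backup/data
}}}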

[quote]- Moving the data to object storage and mounting it with a tool like s3fs.
[/quote]
Or use ZFS and enjoy other benefits like transparent compression and
checksumming.

[quote]- Using a filesystem which can report all changed files.
[/quote]
Or use ZFS and...

I might sound like a sales drone and/or a one-trick pony, but all of the
stuff you described sounds over-engineered and carries a lot of potential to
fail sooner or later.
ZFS is the most pragmatic and easiest solution I can think of.

--
Oliver PETER oliver < at > gfuzz.de 0x456D688F

Reaching the limit of rsnapshot ... inotify does not help
August 03, 2016 12:51AM
On 02.08.2016 at 21:47, Oliver Peter wrote:
[quote]On Tue, Aug 02, 2016 at 04:28:46PM +0200, Thomas Güttler wrote:
[quote]...
What do you think you'll end up doing to solve this problem?

I am unsure. At the moment I see these solutions:

- Wrapping the filesystem with an overlay filesystem which logs all changed files, and using that log as the
include list for rsnapshot
[/quote]
Or use ZFS, do filesystem-based snapshots and transfer the incremental
changes to a remote site (zfs send | zfs receive).
[/quote]
What do you mean by "incremental changes" in the line above?

... let me search ... I guess you mean this: https://en.wikipedia.org/wiki/ZFS#Sending_and_receiving_snapshots

{{{
ZFS file systems can be moved to other pools, also on remote hosts over the network, as the send command creates a
stream representation of the file system's state. This stream can either describe complete contents of the file system
at a given snapshot, or it can be a delta between snapshots. Computing the delta stream is very efficient, and its size
depends on the number of blocks changed between the snapshots. This provides an efficient strategy, e.g. for
synchronizing offsite backups or high availability mirrors of a pool.
}}}

Yes, this sounds good. Sorry, I did not understand it the first time. At first I thought
I should run rsnapshot on the snapshot. That would not help, since the huge
directory tree would still have to be scanned for changes.
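
If I read that right, the size of the delta can even be estimated up front with a dry run. A small sketch (dataset and snapshot names are made up):

{{{
# two snapshots taken at different points in time
zfs snapshot tank/data@monday
zfs snapshot tank/data@tuesday

# -n = dry run (nothing is sent), -v = print the estimated stream size
zfs send -n -v -i tank/data@monday tank/data@tuesday
}}}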

[quote][quote]- Moving the data to object storage and mounting it with a tool like s3fs.
[/quote]
Or use ZFS and enjoy other benefits like transparent compression and
checksumming.

[quote]- Using a filesystem which can report all changed files.
[/quote]
Or use ZFS and...
[/quote]
Is ZFS the only open-source filesystem available for Linux that
can do this?

[quote]I might sound like a sales drone and/or a one-trick pony, but all of the
stuff you described sounds over-engineered and carries a lot of potential to
fail sooner or later.
[/quote]
Yes, you are right. I am more of a programmer than an admin. But I know
that if I start coding here, I am definitely going in the wrong direction.

Thank you Oliver,

Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/

Reaching the limit of rsnapshot ... inotify does not help
August 03, 2016 12:55AM
On 02.08.2016 at 16:57, Ken Woods wrote:
[quote]

[quote]On Aug 2, 2016, at 05:51, Christopher Barry <christopher.r.barry < at > gmail.com> wrote:

OK. Lists are for trying to help :)

Sounds like a hard problem to solve with anything other than simply
more and more time for backing up, unless you change the underlying
filesystem type (zfs sounded like an interesting idea to explore) or
replace the existing hardware to be much faster (or both).

What options have you come up with so far?
What do you think you'll end up doing to solve this problem?
[/quote]

.......And I'm curious what the data is and what creates it. That many new/changed files and only 2.x TB has to be interesting
[/quote]

We develop and maintain archive, workflow and issue systems.

Data: scans, mails and PDFs coming into or leaving companies.

Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/

Reaching the limit of rsnapshot ... inotify does not help
August 03, 2016 12:58AM
On 02.08.2016 at 16:57, Ken Woods wrote:
[quote]

[quote]On Aug 2, 2016, at 05:51, Christopher Barry <christopher.r.barry < at > gmail.com> wrote:

OK. Lists are for trying to help :)

Sounds like a hard problem to solve with anything other than simply
more and more time for backing up, unless you change the underlying
filesystem type (zfs sounded like an interesting idea to explore) or
replace the existing hardware to be much faster (or both).

What options have you come up with so far?
What do you think you'll end up doing to solve this problem?
[/quote]

.......And I'm curious what the data is and what creates it. That many new/changed files and only 2.x TB has to be interesting
[/quote]
.. there are only a few changes per day. The trouble is detecting these changes. rsync needs to crawl for a very long
time to find these few changes. The current solution will work for the next months, but sooner or later
I want to switch.

--
Thomas Guettler http://www.thomas-guettler.de/

Reaching the limit of rsnapshot ... inotify does not help
August 03, 2016 01:45AM
On 3 August 2016 at 08:49, Thomas Güttler <guettliml < at > thomas-guettler.de> wrote:
[quote]Is ZFS the only open-source filesystem available for Linux that
can do this?

[/quote]

Before committing to ZFS, you should know that it belongs to Oracle and is licensed under the CDDL, i.e. it is "open source" but not "free", which is why it's not officially supported in many Linux distros (e.g. Fedora). That may or may not matter to you.

poc
Reaching the limit of rsnapshot ... inotify does not help
August 03, 2016 03:14AM
On Wed, Aug 03, 2016 at 09:49:50AM +0200, Thomas Güttler wrote:
[quote]On 02.08.2016 at 21:47, Oliver Peter wrote:
[quote]On Tue, Aug 02, 2016 at 04:28:46PM +0200, Thomas Güttler wrote:
[quote]...
What do you think you'll end up doing to solve this problem?

I am unsure. At the moment I see these solutions:

- Wrapping the filesystem with an overlay filesystem which logs all changed files, and using that log as the
include list for rsnapshot
[/quote]
Or use ZFS, do filesystem-based snapshots and transfer the incremental
changes to a remote site (zfs send | zfs receive).
[/quote]
What do you mean by "incremental changes" in the line above?

... let me search ... I guess you mean this: https://en.wikipedia.org/wiki/ZFS#Sending_and_receiving_snapshots

{{{
ZFS file systems can be moved to other pools, also on remote hosts over the network, as the send command creates a
stream representation of the file system's state. This stream can either describe complete contents of the file system
at a given snapshot, or it can be a delta between snapshots. Computing the delta stream is very efficient, and its size
depends on the number of blocks changed between the snapshots. This provides an efficient strategy, e.g. for
synchronizing offsite backups or high availability mirrors of a pool.
}}}

Yes, this sounds good. Sorry, I did not understand it the first time. At first I thought
I should run rsnapshot on the snapshot. That would not help, since the huge
directory tree would still have to be scanned for changes.
[/quote]
What I meant in my first mail is that having, for example, daily.0 to
daily.14, weekly.0 to weekly.4 and monthly.0 to monthly.6 as rsnapshot
intervals will cause fragmented hardlinks all over the place, which, as
someone already mentioned here, consumes a lot of RAM and access time.

My idea is to have ZFS on the backup server and only keep daily.0 and
nothing else for the backup. Before rsnapshot fetches the next backup, it
creates a transparent snapshot of daily.0 and then runs rsync. This
way you use the efficient snapshot technology of ZFS and don't waste your
time resolving fragmented hardlinks and rotating daily backups.
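
Roughly like this; a sketch only, with made-up dataset and host names (the snapshot step replaces rsnapshot's rotation):

{{{
# on the backup server, before the next sync run:
zfs snapshot backup/daily0@$(date +%Y-%m-%d)

# then pull the data as usual; only daily.0 is kept as a live copy
rsync -a --delete liveserver:/data/ /backup/daily0/

# older states stay reachable read-only via the hidden snapshot directory
ls /backup/daily0/.zfs/snapshot/
}}}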

If you don't like this idea - and this is going to be off-topic now - you
could switch from rsnapshot to ZFS snapshots, which would mean that you
have to migrate your live server to another filesystem and/or operating
system.

[quote][quote][quote]- Moving the data to a storage and mount it with a tool like s3fs.
[/quote]
Or use ZFS and enjoy other benefits like transparent compression and
checksumming.

[quote]- Use a filesystem which can output all changed files.
[/quote]
Or use ZFS and...
[/quote]
Is ZFS the only open-source filesystem available for Linux that
can do this?
[/quote]
There is also Btrfs, which has a similar snapshot technology, but I would
recommend giving ZFS on Linux (ZoL) a try:
http://zfsonlinux.org/
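
For completeness, the Btrfs counterpart would look roughly like this (paths are made up):

{{{
# create a read-only snapshot of a subvolume
btrfs subvolume snapshot -r /data /data/.snapshots/2016-08-03

# send only the difference to a previous snapshot to another btrfs filesystem
btrfs send -p /data/.snapshots/2016-08-02 /data/.snapshots/2016-08-03 \
    | btrfs receive /mnt/backup/
}}}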

--
Oliver PETER oliver < at > gfuzz.de 0x456D688F

Reaching the limit of rsnapshot ... inotify does not help
August 03, 2016 09:45AM
On Tuesday 02 August 2016 09:51:11 Christopher Barry wrote:
[quote]On Tue, 2 Aug 2016 09:22:45 +0200

Thomas Güttler <guettliml < at > thomas-guettler.de> wrote:
[quote]On 01.08.2016 at 16:46, Christopher Barry wrote:
...

[quote]This may prove useful to you:
http://www.ibm.com/developerworks/library/l-ubuntu-inotify/index.html
[/quote]
inotify does not scale the way we need it to. There are too many directories.

Nevertheless, thank you for trying to help.

Regards,

Thomas Güttler
[/quote]
OK. Lists are for trying to help :)

Sounds like a hard problem to solve with anything other than simply
more and more time for backing up, unless you change the underlying
filesystem type (zfs sounded like an interesting idea to explore) or
replace the existing hardware to be much faster (or both).

What options have you come up with so far?
What do you think you'll end up doing to solve this problem?
[/quote]
One thing that might buy you some time is tuning the VFS layer to spend more
RAM on caching directory/inode objects. The assumption is that, with more
cached directory/inode objects, rsync will be able to build its file list
faster. On the rsnapshot server it will also speed up the cp/rm steps.

To make Linux prefer the buffer cache (directories/inodes) over the pagecache
(file contents), you can tune /proc/sys/vm/vfs_cache_pressure.
Obviously, this can have severe impacts on other workloads, so you will have
to use your own judgement on whether/how far you can tune this for your
specific use case, especially as regards the servers being backed up.
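
For example (the value 50 is arbitrary; measure before and after):

{{{
# temporarily prefer dentry/inode caches over pagecache
sysctl vm.vfs_cache_pressure=50

# equivalent:
echo 50 > /proc/sys/vm/vfs_cache_pressure

# persist across reboots
echo 'vm.vfs_cache_pressure = 50' >> /etc/sysctl.conf
}}}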

Here's the kernel doc for that proc parameter:

###
/proc/sys/vm/vfs_cache_pressure
------------------

This percentage value controls the tendency of the kernel to reclaim
the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative
performance impact. Reclaim code needs to take various locks to find freeable
directory and inode objects. With vfs_cache_pressure=1000, it will look for
ten times more freeable objects than there are.
###

So, assuming you have enough RAM to make it worthwhile and don't need the
pagecache for throughput reasons, you could incrementally lower that value and
see whether it does anything good for you.
It is obviously best if you can do it on both ends, but it might be worthwhile
even just on the rsnapshot server (especially if it is a dedicated server, just
to keep Linux autotuning from reclaiming needed inode buffers for useless
pagecache).

I have no experience with *BSD or Solaris etc., but I would assume that they
have similar knobs you can fiddle with.

Regards,
--
Arne Hüggenberg
System Administrator
_______________________
Sports & Bytes GmbH
Rheinlanddamm 207-209
D-44137 Dortmund
Fon: +49-231-9020-6655
Fax: +49-231-9020-6989

Geschäftsführer: Thomas Treß, Carsten Cramer

Sitz und Handelsregister: Dortmund, HRB 14983
Finanzamt Dortmund - West
Steuer-Nr.: 314/5763/0104
USt-Id. Nr.: DE 208084540
