Unattended off-site replication
Post Unattended off-site replication 
Hi,

I know many people have discussed how to achieve an offsite archive of
a backuppc pool. During a discussion last February [1], Timothy Massey [2]
and Jeffrey Kosowsky [3] summarized the options as follows:

1) Run two BackupPC servers and have both back up the hosts
directly. No replication at all: it just works.
2) Use some sort of block-based method of replicating the data
3) Scripts that understand the special structure of the pool and pc
trees and efficiently create lists of all hard links in pc directory.

I'll be replicating over a thin residential ISP connection (rules out
option #1) and I want it to be completely unattended (no option #2).
As for Option #3, I tried J. Kosowsky's script BackupPC_copyPcPool but
stopped it after 12 hours without completing.

One thing that all these methods have in common is that they scan the
entire pool filesystem. I accept that I will have to do that at
least initially. However, to send daily updates, it seems unnecessary
to re-scan the filesystem when backuppc itself already computes the
information needed:

* the set of files added to the pool
* the set of hardlinks in __TOPDIR__/pc/$host/$backup
* the set of files expired

It strikes me that backuppc could be taught to write all this out to
one or more journal files that could be replayed on the remote system
after the new files are transferred.
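
A minimal sketch of what such a journal could look like, in Python purely for illustration. BackupPC does not write anything like this today; the one-JSON-record-per-line layout, the operation names (`add`, `link`, `expire`), and the `write_journal`/`replay` helpers are all hypothetical.

```python
import json
import os


def write_journal(path, records):
    """Write one JSON record per line (hypothetical journal format)."""
    with open(path, "w") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")


def replay(journal_path, topdir):
    """Replay a journal against a replica rooted at topdir.

    Assumes the pool files named in 'add' records were already
    transferred to the replica before replay starts.
    """
    with open(journal_path) as fh:
        for line in fh:
            rec = json.loads(line)
            if rec["op"] == "add":
                # New pool file: only verify that it arrived.
                if not os.path.exists(os.path.join(topdir, rec["pool"])):
                    raise FileNotFoundError(rec["pool"])
            elif rec["op"] == "link":
                # Recreate a hard link from the pc tree into the pool.
                dst = os.path.join(topdir, rec["path"])
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                os.link(os.path.join(topdir, rec["pool"]), dst)
            elif rec["op"] == "expire":
                # Remove a file that the primary server expired.
                os.unlink(os.path.join(topdir, rec["path"]))
            else:
                raise ValueError("unknown op: %r" % rec["op"])
```

The point of the sketch is that replay touches only the paths named in the journal, so the replica never has to walk the whole pool.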

Does this make sense? Has anyone investigated this approach?

Thanks,
-Steve


[1] http://www.mail-archive.com/backuppc-users@lists.sourceforge.net/msg20839.html
[2] http://www.mail-archive.com/backuppc-users@lists.sourceforge.net/msg20853.html
[3] http://www.mail-archive.com/backuppc-users@lists.sourceforge.net/msg20854.html

------------------------------------------------------------------------------
The demand for IT networking professionals continues to grow, and the
demand for specialized networking skills is growing even more rapidly.
Take a complimentary Learning < at > Cisco Self-Assessment and learn
about Cisco certifications, training, and career opportunities.
http://p.sf.net/sfu/cisco-dev2dev
_______________________________________________
BackupPC-users mailing list
BackupPC-users < at > lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/

Post Unattended off-site replication 
On Mon, Oct 24, 2011 at 10:19 PM, Steve M. Robbins <steve < at > sumost.ca> wrote:

> I know many people have discussed how to achieve an offsite archive of
> a backuppc pool. During a discussion last February [1], Timothy Massey [2]
> and Jeffrey Kosowsky [3] summarized the options as follows:
>
> 1) Run two BackupPC servers and have both back up the hosts
>    directly. No replication at all: it just works.
> 2) Use some sort of block-based method of replicating the data
> 3) Scripts that understand the special structure of the pool and pc
>    trees and efficiently create lists of all hard links in pc directory.
>
> I'll be replicating over a thin residential ISP connection (rules out
> option #1)

Unless you have several hosts that hold duplicate data, after you get
the initial fulls, option #1 with rsync transport over ssh or a vpn
with compression enabled won't be moving more data than other ways you
might attempt it.

--
Les Mikesell
lesmikesell < at > gmail.com


Post Unattended off-site replication 
Steve M. Robbins wrote at about 22:19:44 -0500 on Monday, October 24, 2011:
> Hi,
>
> One thing that all these methods have in common is that they scan the
> entire pool filesystem. I accept that I will have to do that at
> least initially. However, to send daily updates, it seems unnecessary
> to re-scan the filesystem when backuppc itself already computes the
> information needed:
>
> * the set of files added to the pool
> * the set of hardlinks in __TOPDIR__/pc/$host/$backup
> * the set of files expired
>
> It strikes me that backuppc could be taught to write all this out to
> one or more journal files that could be replayed on the remote system
> after the new files are transferred.
>
> Does this make sense? Has anyone investigated this approach?

I and others have considered such approaches before. The problem is
that this would require modifying the BackupPC program itself to record
such journals. You also left out pool chain renumbering, which is a
consequence of file/pool expiry.
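
To illustrate why renumbering matters for a journal, here is a rough sketch. It assumes pool files whose hashes collide are stored as a chain named `<hash>`, `<hash>_0`, `<hash>_1`, and so on, and that a gap left by an expired file is closed by shifting later entries down. Both the naming and the shift-down strategy are assumptions for illustration, and `renumber_ops` is a hypothetical helper, not BackupPC code.

```python
def renumber_ops(base, chain_len, removed):
    """Return (old, new) renames that close the gap left by one removal.

    base      -- name of the first file in the chain
    chain_len -- number of files in the chain before the removal
    removed   -- position of the expired file (0 is the base file itself)
    """
    def name(i):
        # Position 0 is the bare hash; later positions carry a suffix.
        return base if i == 0 else "%s_%d" % (base, i - 1)

    # Every file after the removed one moves down by one position.
    return [(name(i), name(i - 1)) for i in range(removed + 1, chain_len)]
```

Whatever the real strategy is, the consequence is the same: a journal would need rename records in addition to add, link, and expire records, or the replica's pool names would drift from the primary's.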

Most of us have been reluctant to modify BackupPC other than to
diagnose/fix bugs because the program itself is both quite stable and
critical. So, the inclination has been to do things outside of
BackupPC to avoid unintended consequences that could potentially
destabilize the program. Additionally, this would in a sense 'fork'
backuppc itself unless Craig would buy into such changes.

Also, the v4.x version that Craig is working on will reportedly make a
lot of these archive issues moot.


Post Unattended off-site replication 
Option 2: run BackupPC on Solaris (Nexenta). Use snapshots and zfs send to replicate the data to another server. Very simple, very easy. And we've found BackupPC performance on ZFS much increased over ext3.

Win-win, except for having to deal with Solaris.
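
A hedged sketch of the snapshot-and-send cycle described above, in Python. It only builds the command lines; it does not run them. The dataset names, snapshot names, and remote host are placeholders, and the helper names are made up for this example. The `zfs snapshot`, `zfs send -i`, and `zfs receive -F` invocations are standard ZFS commands.

```python
import shlex


def replication_commands(dataset, prev_snap, new_snap, remote_host, remote_dataset):
    """Build the three commands for one incremental replication cycle."""
    snapshot = ["zfs", "snapshot", "%s@%s" % (dataset, new_snap)]
    # Incremental stream containing only the changes between the two snapshots.
    send = ["zfs", "send", "-i",
            "%s@%s" % (dataset, prev_snap), "%s@%s" % (dataset, new_snap)]
    # -F rolls the target back to its latest snapshot before receiving.
    receive = ["ssh", remote_host, "zfs", "receive", "-F", remote_dataset]
    return snapshot, send, receive


def as_shell(snapshot, send, receive):
    """Render the commands as one shell line suitable for a cron job."""
    def quote(cmd):
        return " ".join(shlex.quote(arg) for arg in cmd)
    return "%s && %s | %s" % (quote(snapshot), quote(send), quote(receive))


print(as_shell(*replication_commands(
    "tank/backuppc", "day1", "day2", "offsite", "backup/backuppc")))
```

Because the stream is computed at the block level from snapshot metadata, neither side has to walk the hard-link tree.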




--

Chris Parsons
System / Network Administrator
Petrosys Pty Ltd
Level 4 North, 191 Pulteney Street
Adelaide SA 5000 AUSTRALIA
Ph: +61 8 8227 2799 | Direct: +61 8 8418 1922 | Fax: +61 8 8227 2626
www.petrosys.com.au


Post Unattended off-site replication 
On Tue, Oct 25, 2011 at 1:27 AM, Jeffrey J. Kosowsky
<backuppc < at > kosowsky.org> wrote:

>> It strikes me that backuppc could be taught to write all this out to
>> one or more journal files that could be replayed on the remote system
>> after the new files are transferred.
>>
>> Does this make sense? Has anyone investigated this approach?
>
> I and others have considered such approaches before. The problem is
> that this would require modifying the BackupPC program itself to record
> such journals. You also left out pool chain renumbering, which is a
> consequence of file/pool expiry.

Another piece of this puzzle is that a network that isn't sufficient
for making remote backups probably isn't going to work for restores
either. I've found that a UPS truck has a remarkable amount of
bandwidth when you only need it occasionally - like shipping your
initial fulls off to a location that can only manage the subsequent
rsync runs.
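
A back-of-the-envelope version of the shipping argument, as a Python sketch. The 550 GB figure is the host size quoted elsewhere in this thread; the 1 Mbit/s uplink and the two-day delivery time are assumptions picked only to make the arithmetic concrete.

```python
def transfer_days(size_gb, mbit_per_s):
    """Days needed to push size_gb (decimal gigabytes) through a link."""
    bits = size_gb * 8e9
    return bits / (mbit_per_s * 1e6) / 86400


def effective_mbit(size_gb, days):
    """Average bandwidth achieved by shipping a disk that takes `days` to arrive."""
    return size_gb * 8e9 / (days * 86400) / 1e6


print("upload: %.0f days" % transfer_days(550, 1))
print("courier: %.1f Mbit/s" % effective_mbit(550, 2))
```

Under those assumptions the upload takes roughly seven weeks, while the courier averages around 25 Mbit/s, which is why shipping the initial fulls and using the link only for later rsync runs is attractive.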

--
Les Mikesell
lesmikesell < at > gmail.com


Post Unattended off-site replication 
On Tue, Oct 25, 2011 at 12:54 AM, Chris Parsons <Chris.Parsons < at > petrosys.com.au> wrote:
> Option 2: run BackupPC on Solaris (Nexenta). Use snapshots and zfs send to replicate the data to another server. Very simple, very easy. And we've found BackupPC performance on ZFS much increased over ext3.



Well, that may not be necessary for too much longer. There are a few Linux distros that already have the zfsonlinux release candidate available, which allows mounting. I've also got Review Requests at RPM Fusion waiting on a reviewer. Unfortunately, they are not allowed in Fedora proper because Fedora doesn't allow external kernel modules.


Richard

Post Unattended off-site replication 
On Tue, Oct 25, 2011 at 8:13 AM, Richard Shaw <hobbes1069 < at > gmail.com> wrote:

>> Option 2: run BackupPC on Solaris (Nexenta). Use snapshots and zfs send to
>> replicate the data to another server. Very simple, very easy. And we've
>> found BackupPC performance on ZFS much increased over ext3.
>
> Well, that may not be necessary for too much longer. There are a few Linux
> distros that already have the zfsonlinux release candidate available, which
> allows mounting. I've also got Review Requests at RPM Fusion waiting on a
> reviewer. Unfortunately, they are not allowed in Fedora proper because Fedora
> doesn't allow external kernel modules.

Does anyone consider Fedora usable over the lifespan that you would
want for backup data?

--
Les Mikesell
lesmikesell < at > gmail.com


Post Unattended off-site replication 
On Tue, Oct 25, 2011 at 9:36 AM, Les Mikesell <lesmikesell < at > gmail.com> wrote:
> On Tue, Oct 25, 2011 at 8:13 AM, Richard Shaw <hobbes1069 < at > gmail.com> wrote:
>
>>> Option 2: run BackupPC on Solaris (Nexenta). Use snapshots and zfs send to
>>> replicate the data to another server. Very simple, very easy. And we've
>>> found BackupPC performance on ZFS much increased over ext3.
>>
>> Well, that may not be necessary for too much longer. There are a few Linux
>> distros that already have the zfsonlinux release candidate available, which
>> allows mounting. I've also got Review Requests at RPM Fusion waiting on a
>> reviewer. Unfortunately, they are not allowed in Fedora proper because Fedora
>> doesn't allow external kernel modules.
>
> Does anyone consider Fedora usable over the lifespan that you would
> want for backup data?

It depends. Many people do use Fedora as a server and some of the
releases have been quite stable. I use it because I only back up stuff
at home and my BackupPC server is also my desktop. I'm an engineer for
my day job :)

The good thing about RPM Fusion is that it also supports enterprise
linux, similar to the Fedora EPEL repositories. In fact my source RPM
contains logic to differentiate between EL and Fedora based systems
since the guidelines are slightly different in some areas.

Richard


Post Unattended off-site replication 
On Mon, Oct 24, 2011 at 11:21:26PM -0500, Les Mikesell wrote:
> On Mon, Oct 24, 2011 at 10:19 PM, Steve M. Robbins <steve < at > sumost.ca> wrote:
>
>> I know many people have discussed how to achieve an offsite archive of
>> a backuppc pool. During a discussion last February [1], Timothy Massey [2]
>> and Jeffrey Kosowsky [3] summarized the options as follows:
>>
>> 1) Run two BackupPC servers and have both back up the hosts
>>    directly. No replication at all: it just works.
>> 2) Use some sort of block-based method of replicating the data
>> 3) Scripts that understand the special structure of the pool and pc
>>    trees and efficiently create lists of all hard links in pc directory.
>>
>> I'll be replicating over a thin residential ISP connection (rules out
>> option #1)
>
> Unless you have several hosts that hold duplicate data, after you get
> the initial fulls, option #1 with rsync transport over ssh or a vpn
> with compression enabled won't be moving more data than other ways you
> might attempt it.

At the risk of exposing my ignorance of BackupPC internals, I don't see
how this is possible. For a full back-up, isn't it true that all the
files are transferred to the backup host, then compared to the pool?

One host of mine has 1.4M files totalling 550 GB, but the last full
backup recorded 1400 new files totalling 57GB. Thus option #1 would
transfer all 550 GB, whereas my proposal would transfer a tenth of
that. No?
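
The ratio behind "a tenth" can be checked with a few lines of Python; the numbers are the ones quoted in the paragraph above, and `percent` is just a throwaway helper.

```python
def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


# Figures from the last full backup of the host described above.
print("%.1f%% of the bytes were new" % percent(57, 550))
print("%.1f%% of the files were new" % percent(1400, 1400000))
```

So about 10% of the bytes changed even though only about 0.1% of the files did, which suggests the new data sits in a small number of large files.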

-Steve



Post Unattended off-site replication 
On Tue, Oct 25, 2011 at 8:13 PM, Steve M. Robbins <steve < at > sumost.ca> wrote:

>> Unless you have several hosts that hold duplicate data, after you get
>> the initial fulls, option #1 with rsync transport over ssh or a vpn
>> with compression enabled won't be moving more data than other ways you
>> might attempt it.
>
> At the risk of exposing my ignorance of BackupPC internals, I don't see
> how this is possible. For a full back-up, isn't it true that all the
> files are transferred to the backup host, then compared to the pool?

With tar/smb xfers, that is true. With rsync/rsyncd, both fulls and
incrementals walk the tree of the previous full run for that host,
transferring only the differences - and after that any duplicates that
are found are linked to the pool. The difference between an rsync
full and an incremental is that incrementals quickly skip files whose
timestamp and length in the directory listing match, whereas fulls do a
read and block checksum comparison of everything, but that does not use
a lot of bandwidth.
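
A simplified model of that difference, as a Python sketch. This is not BackupPC's or rsync's actual logic; `FileAttrs` and `must_examine` are hypothetical names, and the model ignores everything except modification time and size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileAttrs:
    mtime: int  # modification time, seconds since the epoch
    size: int   # length in bytes


def must_examine(kind: str, previous: Optional[FileAttrs], current: FileAttrs) -> bool:
    """Return True if the file's contents have to be read and compared.

    kind     -- "full" or "incr"
    previous -- attributes recorded in the previous backup, or None if new
    current  -- attributes seen on the client now
    """
    if previous is None:
        return True  # new file: nothing to compare against
    if kind == "full":
        return True  # fulls checksum everything, changed or not
    if kind == "incr":
        # Incrementals trust matching metadata and skip the file.
        return (previous.mtime, previous.size) != (current.mtime, current.size)
    raise ValueError("unknown backup kind: %r" % kind)
```

In both cases an unchanged file costs little on the wire: an incremental skips it outright, and a full exchanges checksums instead of file contents.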

> One host of mine has 1.4M files totalling 550 GB, but the last full
> backup recorded 1400 new files totalling 57GB. Thus option #1 would
> transfer all 550 GB, whereas my proposal would transfer a tenth of
> that. No?

Depends on the xfer mode. The one place where doing some smart copy
of the pool might help a lot would be if you have a large number of
hosts and do an OS update or something similar that creates duplicate
new files on all of them. Even with rsync, a remote run will
transfer all the changes separately for each host even though they end
up pooled after the copy.
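
The many-hosts case can be put in numbers with a small Python sketch. The host names, content hashes, and sizes below are invented for illustration, and `bytes_sent` is a hypothetical helper.

```python
def bytes_sent(changes_by_host):
    """Compare per-host transfer volume with a deduplicated pool copy.

    changes_by_host maps host name -> {content_hash: size_in_bytes}
    for the files that changed on that host.
    Returns (bytes moved backing up each host remotely,
             bytes moved copying the pool once).
    """
    # A remote run sends every host's changes separately.
    per_host = sum(sum(files.values()) for files in changes_by_host.values())
    # A pool copy sends each distinct piece of content only once.
    unique = {}
    for files in changes_by_host.values():
        unique.update(files)
    pooled = sum(unique.values())
    return per_host, pooled


# Three hosts that all received the same two updated files.
update = {"hash-a": 100, "hash-b": 50}
print(bytes_sent({"host1": update, "host2": update, "host3": update}))
```

With three identical updates the per-host total is three times the pooled total, and the gap grows linearly with the number of hosts.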

--
Les Mikesell
lesmikesell < at > gmail.com

