When do files get re-transferred?
Post When do files get re-transferred? 
Hi,
I have a query about file transfer while taking backups. I understand
that backuppc uses de-duplication i.e. only a single copy of the file is
stored even if multiple copies of it exist on different machines.
However, what I would like to know is when a file is transferred after
being backed up once. Is it during the next full-backup? Is it after a
certain duration has elapsed? Or is it that once a file is copied it is
never transferred again (even for full-backup) unless it is changed?

On a side note, backuppc has so far proved to be the best disk based
backup solution I have come across. I was so impressed with it that I
even configured it on dd-wrt based router
https://rahul.amaram.name/blog/2009/12/28/backuppc-lighttpd-dd-wrt .
Keep up the great work.

Looking forward to a response.

Regards,
Rahul.

------------------------------------------------------------------------------
Write once. Port to many.
Get the SDK and tools to simplify cross-platform app development. Create
new or port existing apps to sell to consumers worldwide. Explore the
Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join
http://p.sf.net/sfu/intel-appdev
_______________________________________________
BackupPC-users mailing list
BackupPC-users < at > lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/

Post When do files get re-transferred? 
On Fri, Dec 23, 2011 at 5:09 AM, Rahul Amaram <rahul < at > synovel.com> wrote:
> I have a query about file transfer while taking backups. I understand
> that backuppc uses de-duplication i.e. only a single copy of the file is
> stored even if multiple copies of it exist on different machines.
> However, what I would like to know is when a file is transferred after
> being backed up once. Is it during the next full-backup? Is it after a
> certain duration has elapsed? Or is it that once a file is copied it is
> never transferred again (even for full-backup) unless it is changed?

The de-dup and xfer are mostly unrelated. Only the rsync and rsyncd
xfer methods avoid subsequent transfers and they do it by comparing
against the previous full of the same host. The other xfer methods
will copy everything for full backups and only files with newer
timestamps on incrementals. On a fast local network it doesn't make
that much difference, but if bandwidth is restricted you'll probably
want to use rsync/rsyncd.
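
The rule described above can be written down as a small decision function. This is only a sketch of the behaviour as stated in this post, not BackupPC code: the `RemoteFile` type, the `changed` flag (standing in for whatever rsync's comparison against the reference backup detects) and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteFile:
    path: str
    mtime: float    # modification time on the client
    changed: bool   # assumption: True if rsync would find a difference


def files_transferred(method, backup_type, files, last_backup_time):
    """Return the paths for which data crosses the network."""
    if method in ("rsync", "rsyncd"):
        # rsync/rsyncd compare against the reference backup, so unchanged
        # files are skipped on fulls and incrementals alike.
        return [f.path for f in files if f.changed]
    if backup_type == "full":
        # The other xfer methods copy everything on a full backup.
        return [f.path for f in files]
    # The other xfer methods, incremental: only files with newer timestamps.
    return [f.path for f in files if f.mtime > last_backup_time]
```

Note that in this model a file whose content changed but whose timestamp is old is only picked up by the non-rsync methods at the next full.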

--
Les Mikesell
lesmikesell < at > gmail.com


Post When do files get re-transferred? 
On Friday 23 December 2011 17:54:40 Les Mikesell wrote:
> The de-dup and xfer are mostly unrelated. Only the rsync and rsyncd
> xfer methods avoid subsequent transfers and they do it by comparing
> against the previous full of the same host.

Well, actually the comparison is done against the last backup of a lower
level. Full dumps are always level 0, while for incrementals you can freely
cascade any levels greater than zero.
But although we do that on some of our backups, we haven't yet found any
logical or technical reason to actually do so. It only really matters when you
save incremental tapes...
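
The "last backup of a lower level" rule for incrementals can be sketched as follows. This only illustrates the rule as stated in this post; the `(backup_number, level)` history format and the function name are assumptions, not BackupPC internals.

```python
def reference_backup(history, new_level):
    """Pick the backup an incremental of `new_level` is compared against.

    `history` is a list of (backup_number, level) tuples, oldest first.
    Returns the number of the most recent backup whose level is lower
    than `new_level`, or None if there is no such backup.
    """
    if new_level < 1:
        raise ValueError("incremental levels start at 1; level 0 is a full")
    for number, level in reversed(history):
        if level < new_level:
            return number
    return None
```

With a history of a full (level 0) followed by incrementals at levels 1, 2 and 1, a new level-1 incremental is compared against the full, while a new level-2 incremental is compared against the latest level-1 backup.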

Have fun,

Arnold


Post When do files get re-transferred? 
On Sat, Dec 24, 2011 at 5:35 AM, Arnold Krille <arnold < at > arnoldarts.de> wrote:
> Well, actually the comparison is done against the last backup of a lower level.

Actually actually <g>, from my understanding there isn't any difference
at all in BackupPC's filesystem between the two if the file hasn't been
modified. In fact you can't even say that one instance of the file is
"in" the last full backup as opposed to "in" the incremental sets. The
only thing that is "in" those sets is the hardlink, and at the
underlying OS level hardlinks are identical structures, with no
distinction of one being the "master" as you would have with, say, symlinks.
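
The point about hardlinks having no "master" can be checked on any filesystem that supports them. The sketch below is a generic illustration with made-up file names; it does not touch or model BackupPC's actual pool layout.

```python
import os
import tempfile


def hardlink_demo():
    """Create a file plus one hardlink to it and inspect both names."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "pool_entry")      # hypothetical name
        second = os.path.join(tmp, "backup_entry")   # hypothetical name
        with open(first, "w") as fh:
            fh.write("same content\n")
        os.link(first, second)  # second directory entry for the same inode

        a, b = os.stat(first), os.stat(second)
        same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
        link_count = b.st_nlink

        # Removing the name created first does not affect the other one:
        # neither directory entry is the "original".
        os.remove(first)
        with open(second) as fh:
            content = fh.read()
        return same_inode, link_count, content
```

Both names resolve to the same inode, and the data stays readable through the remaining name after the first one is deleted.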

Now in the case of a file having been modified since the last full, of
course the two are different, and of course it only makes sense
to compare to the newest one, since in BPC's storage model there isn't
any benefit in distinguishing between "incremental" and "differential"
sets.

I seem to recall this may be an issue with non-rsync transports,
however; since I never use them, I don't know.


Confirmation of the above would be appreciated; I think not
understanding these issues is a source of confusion for newcomers to
BPC used to thinking in terms defined by traditional backup software
regimes.


Post When do files get re-transferred? 
On Fri, Dec 23, 2011 at 6:03 PM, <hansbkk < at > gmail.com> wrote:
> Actually actually <g>, from my understanding there isn't any difference
> at all in BackupPC's filesystem between the two if the file hasn't been
> modified. In fact you can't even say that one instance of the file is
> "in" the last full backup as opposed to "in" the incremental sets. The
> only thing that is "in" those sets is the hardlink, and at the
> underlying OS level hardlinks are identical structures, with no
> distinction of one being the "master" as you would have with, say, symlinks.
>
> Now in the case of a file having been modified since the last full, of
> course the two are different, and of course it only makes sense
> to compare to the newest one, since in BPC's storage model there isn't
> any benefit in distinguishing between "incremental" and "differential"
> sets.
>
> I seem to recall this may be an issue with non-rsync transports,
> however; since I never use them, I don't know.
>
> Confirmation of the above would be appreciated; I think not
> understanding these issues is a source of confusion for newcomers to
> BPC used to thinking in terms defined by traditional backup software
> regimes.

The distinction is between the contents of the file and the directory
entries pointing to it. The contents of hardlinked files are all the
same, but rsync doesn't know anything about the hashed filenames for
the pool links. It strictly follows the directory tree established
by the last full run (by default). The concept of incremental vs.
differential sort of relates to the "incremental level" setting, which
lets the comparison merge in previous incrementals back to the
last full, finding the latest version of each file. That involves a
trade-off: more server-side work traversing multiple directory trees
vs. likely transferring less changed data.
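
The trade-off can be illustrated with a toy model in which each backup tree is a dict mapping a path to a version label. Everything here is an assumption made for illustration (the dict representation, using `None` to mark a deleted file, the function names); it is not how BackupPC stores or merges its trees.

```python
def merged_view(full_tree, incrementals):
    """Overlay incrementals (oldest first) on the last full backup.

    Each tree maps path -> version label. In an incremental, a value of
    None is used here (an assumption) to mark a file that was deleted.
    """
    view = dict(full_tree)
    for tree in incrementals:          # one extra tree walk per incremental
        for path, version in tree.items():
            if version is None:
                view.pop(path, None)
            else:
                view[path] = version
    return view


def paths_to_transfer(client_tree, reference):
    """Paths whose version on the client differs from the reference."""
    return sorted(p for p, v in client_tree.items() if reference.get(p) != v)
```

Comparing the client against the last full alone re-sends every file changed since that full, while comparing against the merged view re-sends only what changed since the latest incremental, at the cost of walking more trees on the server.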

--
Les Mikesell
lesmikesell < at > gmail.com


Post When do files get re-transferred? 
On Sat, Dec 24, 2011 at 8:20 AM, Les Mikesell <lesmikesell < at > gmail.com> wrote:
> On Fri, Dec 23, 2011 at 6:03 PM, <hansbkk < at > gmail.com> wrote:
>> it only makes sense to compare to the newest one, since in BPCs storage
>> model there isn't any benefit to distinguish between "incremental" vs
>> "differential" sets.
>
> The distinction is between the contents of the file and the directory
> entries pointing to it. The contents of hardlinked files are all the
> same, but rsync doesn't know anything about the hashed filenames for
> the pool links. It strictly follows the directory tree established
> by the last full run (by default). The concept of incremental vs.
> differential sort-of relates to the "incremental level" setting that
> permits the comparison to merge in previous incrementals back to the
> last full, finding the latest version of each file. That involves a
> trade-off of more server side work traversing multiple directory trees
> vs. likely transferring less changed data.

Thanks Les. So my snip above does hold when trying to conserve
bandwidth (say over a WAN), but at the potential cost of increasing
the time the backup session requires. In a high-speed local
environment, processing time can be reduced by always using
"differential" between fulls (by not enabling the "incremental"
option).

This only becomes a question if I got it wrong.


Post When do files get re-transferred? 
On Fri, Dec 23, 2011 at 7:52 PM, <hansbkk < at > gmail.com> wrote:
> Thanks Les. So my snip above does hold when trying to conserve
> bandwidth (say over a WAN), but at the potential cost of increasing
> the time the backup session requires. In a high-speed local
> environment, processing time can be reduced by always using
> "differential" between fulls (by not enabling the "incremental"
> option).
>
> This only becomes a question if I got it wrong.

The more significant difference may be the wall-clock time for a
full rsync run, which always does a full read of all the data on the
remote side for a block checksum comparison, and may need to
read/uncompress on the server side. If that isn't an issue you can
just do frequent fulls and not worry about doing rsyncs against
incremental levels. If it is an issue, or you want to use the least
bandwidth possible, then you might use incremental levels and less
frequent fulls.

--
Les Mikesell
lesmikesell < at > gmail.com


Post When do files get re-transferred? 
On Sat, Dec 24, 2011 at 9:34 AM, Les Mikesell <lesmikesell < at > gmail.com> wrote:
> The more significant difference may be the wall-clock time for a
> full rsync run, which always does a full read of all the data on the
> remote side for a block checksum comparison, and may need to
> read/uncompress on the server side. If that isn't an issue you can
> just do frequent fulls and not worry about doing rsyncs against
> incremental levels. If it is an issue, or you want to use the least
> bandwidth possible, then you might use incremental levels and less
> frequent fulls.

Yes, in my current usage, I've only been doing fulls since figuring
out it didn't impact storage space usage. I just wanted to clarify
understanding the trade-offs between the "other flavors" for future
reference in possible other contexts.


Post When do files get re-transferred? 
Thanks for the responses. I am slightly lost in all the technical
discussion. My requirement is simple. I do backups over a WAN using
rsync over SSH. I have over 100 GB of files to be synced over the WAN.
Transferring a couple of GB of data over WAN is fine but 100 GB of data
might take a really long time.

Generally, when transferring data, rsync compares the remote
files with the local files using some checksum algorithm and transfers
a file only when the two differ. From what I know, the checksum
comparison uses only a fraction of the bandwidth required to transfer the
whole file. So when BackupPC performs a full backup, does it compare each
file with the local copy stored during the previous full backup? Or does
it just blindly copy the entire file?

Also, from your response, it seems that going with more frequent
incremental backups is what you suggest. However is there any downside
to this? For instance, let us say the full-backup is about 6 months old,
and some file in it gets corrupted. Then will this be identified by the
incremental backups?

Thanks,
Rahul.

On Saturday 24 December 2011 08:18 AM, hansbkk < at > gmail.com wrote:
> Yes, in my current usage, I've only been doing fulls since figuring
> out it didn't impact storage space usage. I just wanted to clarify
> understanding the trade-offs between the "other flavors" for future
> reference in possible other contexts.


Post When do files get re-transferred? 
On Sat, Dec 24, 2011 at 8:48 PM, Rahul Amaram <rahul < at > synovel.com> wrote:

> Generally when transferring data using rsync, it compares the remote
> files with the local files using some checksum algorithm and transfers
> it only when they are different. From what I know, comparison using the
> checksum uses only a fraction of the bandwidth required to transfer the
> whole file. So while Backuppc performs a full backup, does it compare it
> with the local file stored during previous full backup? Or just blindly
> copy the entire file.

Simple answer is that only the differences are transferred, even in full runs.

> Also, from your response, it seems that going with more frequent
> incremental backups is what you suggest. However is there any downside
> to this? For instance, let us say the full-backup is about 6 months old,
> and some file in it gets corrupted. Then will this be identified by the
> incremental backups?

Actually I recommend fairly frequent fulls, with the default weekly
being reasonable for most situations. The details to consider if you
need additional tuning are that the comparison base defaults to the
last full, copying increasingly large differences with each subsequent
incremental. You can change it to merge previous incrementals for the
comparison with the $Conf{IncrLevels} setting. Fulls take much longer
than incrementals to complete because they do a read and block
checksum comparison of all the data, but rsync does not use much
bandwidth for this.
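
Why a block checksum comparison reads all the data yet moves little of it can be seen in a deliberately simplified sketch. It is not the real rsync algorithm: it only matches blocks at fixed offsets, uses SHA-256 in place of rsync's rolling and strong checksums, and the function names and tiny block size are made up for illustration.

```python
import hashlib


def split_blocks(data, block_size):
    """Cut `data` into consecutive blocks of at most `block_size` bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def blocks_to_send(old, new, block_size=4):
    """Indices of blocks in `new` that have no matching block in `old`.

    Both sides must read their whole file to compute the digests, but
    only the digests and the non-matching blocks need to be exchanged.
    """
    known = {hashlib.sha256(b).digest() for b in split_blocks(old, block_size)}
    return [
        index
        for index, block in enumerate(split_blocks(new, block_size))
        if hashlib.sha256(block).digest() not in known
    ]
```

For an unchanged file the result is an empty list, so nothing but checksums is exchanged even though every byte was read on both ends.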

--
Les Mikesell
lesmikesell < at > gmail.com

