Bacula to the Cloud
Post Bacula to the Cloud 
Hello, two-year Bacula user but first-time poster. I’m currently dumping about 1.6TB to LTO2 tapes every week and I’m looking to migrate to a new storage medium.

The obvious answer, I think, is a direct-attached disk array (which I would be able to put in a remote gigabit-attached datacenter before too long). However, is anyone currently doing large (or what seem to me to be large) backups to the cloud in some way? Assuming I have a gigabit connection to the Internet from my datacenter, how feasible would it be either to use something like Amazon S3 with s3fs (I’m guessing there is way too much overhead for that to be efficient), or to run bacula-sd on an EC2 node, using Elastic Block Store (EBS) as “local” disk, with a VPN (Amazon VPC) between my datacenter and the SD?

Substitute your favorite cloud provider for Amazon above; I don’t use any right now, so I’m not tied to any particular provider. It just seems like Amazon has all the necessary pieces today.

To do this, and keep customers comfortable with the idea of data in the cloud, we would need to encrypt, so I’m also wondering whether the SD could encrypt the backup volume, rather than having the FD encrypt the data before sending it to the SD (which is what we do now). It would be easier to manage if we handled encryption in one place for all clients.

I would love to hear what other people are either doing with Bacula and the cloud, or why you have decided not to.

Thanks

Peter Zenge
Pzenge .at. ilinc .dot. com

Post Bacula to the Cloud 
On 02.03.2010 22:56, Peter Zenge wrote:
To do this, and keep customers comfortable with the idea of data in the cloud, we would need to encrypt, so I’m also wondering whether the SD could encrypt the backup volume, rather than having the FD encrypt the data before sending it to the SD (which is what we do now). It would be easier to manage if we handled encryption in one place for all clients.

Sending unencrypted data to the SD for encryption would be OK for tape-based backups, where the concern is losing the tapes. I would advise against sending your unencrypted backup data to someone else and trusting the receiving system to encrypt it before someone reads it from RAM.

Depending on your needs that might be acceptable, but AFAIK Bacula does not support this mode (yet?). AFAIK your options are transport encryption (for the connection and data between the systems) and data encryption (for the data leaving the system, so that the receiving SD does not hold the key and cannot do a restore by itself).

I personally use both transport and data encryption when saving data to offsite SDs in "untrusted" (meaning not directly accessible) datacenters. If the double encryption takes too much CPU time, you *might* want to try data encryption combined with transport encryption that is dropped after authentication. I am not sure about this, though: metadata can still be read despite the data encryption, and control structures are sent over that connection, so I would not suggest doing it either.

Using data encryption with Bacula, especially (IMHO) with Windows, is a pain because of all the certificate management, but for me it is a requirement.

Post Bacula to the Cloud 
Following up on my own post: I had a little free time the other day and decided to investigate whether this was feasible. Setting up the necessary services on Amazon was trivial, including access control and block storage. I tried s3fs first, and it worked, but there seemed to be far too much I/O going on for that kind of data (which is pretty much what I expected). Then I tried putting my bacula-sd on an EC2 node, writing to files on EBS, and it worked great (spooling first to the “local” drive on EC2). Throughput, however, was somewhat lower than I had hoped: approximately 25% of what I get locally when spooling and then writing to tape. On the other hand, I found there was NO performance penalty for running two jobs concurrently. I didn’t try larger numbers, but my guess is that you can run many concurrent jobs to get a pretty good effective throughput, assuming you have lots of clients with similar data sizes.
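A rough model of the concurrency observation above, as a sketch only: it assumes each job keeps its own rate as more jobs are added (which was observed here only for two jobs) until a shared ceiling is reached. The gigabit uplink from the first post, about 125 MB/s before protocol overhead, is used as that ceiling; the 10 MB/s per-job figure is a hypothetical placeholder, not a measurement from this thread.

```python
# Sketch: aggregate throughput of N concurrent Bacula jobs, assuming linear
# scaling per job until a shared ceiling (here the uplink) is reached.

GIGABIT_MB_S = 1000 / 8  # 1 Gbit/s is about 125 MB/s before protocol overhead


def aggregate_throughput_mb_s(per_job_mb_s: float, jobs: int, link_cap_mb_s: float) -> float:
    """Combined rate of `jobs` concurrent jobs, capped by the shared link."""
    return min(per_job_mb_s * jobs, link_cap_mb_s)


if __name__ == "__main__":
    per_job = 10.0  # hypothetical per-job rate, not a figure from the thread
    for n in (1, 2, 5, 20):
        print(n, "jobs:", aggregate_throughput_mb_s(per_job, n, GIGABIT_MB_S), "MB/s")
```

Under these assumptions the job count matters more than the per-job rate until the uplink (or the SD's disks or CPU) becomes the bottleneck.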

Our problem is that 80% of our data is on one client; a full backup would take 130 hours, and our backup window simply isn’t that long. So I thought I could break the FileSets into smaller pieces and run multiple backup jobs in parallel (I’m assuming my client is not the bottleneck). However, Bacula wouldn’t run more than one job on that client concurrently. Since I can run multiple clients concurrently, I’m pretty sure my bacula-dir.conf and bacula-sd.conf settings are correct, and my bacula-fd.conf specifies “Maximum Concurrent Jobs = 20”. Is there any other reason why I couldn’t run, say, 5 parallel jobs with different FileSets from the same client?
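The backup-window arithmetic can be sketched as follows. Assumptions: 1 TB = 10^12 bytes, 80% of the weekly 1.6 TB sits on the one client, the FileSets split into equal pieces, and parallel jobs scale linearly as in the two-job test. The concurrency side is modelled as a simple minimum: Bacula has a Maximum Concurrent Jobs directive in several resources that all five jobs would share (the Director, Client and Storage resources in bacula-dir.conf, FileDaemon in bacula-fd.conf, Storage in bacula-sd.conf), and roughly speaking a job only runs in parallel if every one of them allows it. The limit values below are hypothetical, chosen only to show how a single limit of 1 keeps the window at 130 hours; they are not the poster's configuration.

```python
from typing import Dict

TB = 10**12  # assumption: decimal terabytes


def implied_rate_mb_s(data_bytes: float, hours: float) -> float:
    """Average rate (decimal MB/s) needed to move data_bytes within `hours`."""
    return data_bytes / (hours * 3600) / 10**6


def effective_concurrency(requested: int, limits: Dict[str, int]) -> int:
    """Simplified model: the smallest Maximum Concurrent Jobs limit shared
    by all the jobs caps how many of them actually run at once."""
    return min([requested, *limits.values()])


def window_hours(single_job_hours: float, parallel_jobs: int) -> float:
    """Wall-clock time if the work splits evenly and jobs scale linearly."""
    return single_job_hours / parallel_jobs


if __name__ == "__main__":
    client_bytes = 0.8 * 1.6 * TB  # 80% of the weekly 1.6 TB
    print(f"implied rate: {implied_rate_mb_s(client_bytes, 130):.2f} MB/s")

    # Hypothetical limit values for the resources shared by all five jobs.
    limits = {
        "Director (bacula-dir.conf)": 20,
        "Client (bacula-dir.conf)": 1,
        "Storage (bacula-dir.conf)": 10,
        "FileDaemon (bacula-fd.conf)": 20,
        "Storage (bacula-sd.conf)": 20,
    }
    n = effective_concurrency(5, limits)
    print(n, "job(s) at once ->", window_hours(130, n), "hours")  # a single 1 keeps it serial

    limits["Client (bacula-dir.conf)"] = 5
    n = effective_concurrency(5, limits)
    print(n, "job(s) at once ->", window_hours(130, n), "hours")
```

With five equal FileSets and linear scaling the window would drop to about 26 hours; whether the client's disks and the uplink keep up is the separate assumption the poster already flagged.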

From: Peter Zenge [mailto:pzenge < at > ilinc.com]
Sent: Tuesday, March 02, 2010 2:57 PM
To: bacula-users < at > lists.sourceforge.net
Subject: [Bacula-users] Bacula to the Cloud
Post Bacula to the Cloud 
On 3/11/2010 4:31 PM, Peter Zenge wrote:
Following up on my own post, I had a little free time the other day and
decided to investigate whether this was feasible. Setting up the
necessary services on Amazon was trivial, including access control and
block storage. I tried s3fs first, and it worked, but it felt like there
was way too much i/o going on for that kind of data (which is pretty
much what I expected). Then I tried putting my bacula-sd on an EC2 node,
writing to files on EBS, and it worked great (spooling first to the
“local” drive on EC2). Throughput however was somewhat less than I was
hoping for, approx. 25% of what I get locally to spool and then to tape.
However, I found that there was NO performance penalty for running two
jobs concurrently. I didn’t try larger numbers, but my guess is you can
run a large number of concurrent jobs to get a pretty good effective
throughput, assuming you have lots of clients with similar data sizes.

Would you care to add the steps to the wiki, and then post the URL here?

--
Dan Langille - http://langille.org/

_______________________________________________
Bacula-users mailing list
Bacula-users < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
