Include All Files; Reject Some

I had a twitter chat with @JLivens the other day where the question was "what do you back up?"  My first response was to, of course, say that I back up everything – thrice – cause I'm me. If you're curious, my critical personal data is synced on multiple computers with history using Dropbox (which I'm reconsidering based on how things have been going over there lately), then it's backed up with the free version of CrashPlan to another computer that isn't at my house, AND I can't resist the urge to throw in a Time Machine backup every once in a while.  You know what?  I haven't done one of those in a week or so.  Just a second.  My little Time Machine icon is spinning now. Ah, there I feel better.

Side note: for all my talk about tape lately, you'll notice that I don't have any tape in my setup for now.  I am about to embark on a project that may make me reconsider that as I might have an archiving need soon.  Can't keep it all on spinning disk!

Alright, back to the topic at hand.  What do I back up?  I actually do back up everything, but that is not the point I wanted to get across in this post. 

It's easy to come up with a list of directories you don't want to back up.  Your /tmp folder, your "Temporary Internet Files," your folder on your work laptop that contains the illegally downloaded movies that you should have never downloaded in the first place.  Yeah, I'm talking to you.  Pay for the media/software you consume.

But what I wanted to talk about was how to make your backup selection if you want to exclude things.  What I've found is that the human tendency is to say "just back up the Documents folder," or something like that.  And that is what I really want to talk you out of.  There is too much risk in doing it this way.  You could accidentally put some important data in a directory you're not backing up.  You could create a whole other directory that contains really important data and forget to add it to the list.  The risk outweighs the benefit of excluding the other data.

If your backup software has the ability, please have it autoselect both filesystems/drives and folders/directories.  If it supports it and if you want to do so, you can also create an exclude list of the directories you definitely don't want to back up.
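Here is a minimal sketch of that idea using plain tar, which is an assumption on my part; your backup product will have its own syntax for this.  It archives everything under a root directory and skips only what is listed in an exclude file.  The function name backup_all_except and the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: back up EVERYTHING under a root, minus an explicit exclude list.
# backup_all_except and the file names used with it are hypothetical.

# usage: backup_all_except ROOT ARCHIVE EXCLUDE_FILE
backup_all_except() {
    root=$1
    archive=$2
    exclude_file=$3
    # -X reads one exclude pattern per line from EXCLUDE_FILE.
    # -C makes the archived names relative to ROOT (./Documents, ./tmp, ...).
    tar -c -f "$archive" -X "$exclude_file" -C "$root" .
}
```

With an exclude file containing the single line ./tmp, running backup_all_except "$HOME" /mnt/backup/home.tar exclude.txt would pick up a brand-new directory automatically, because nothing has to be added to an include list.  Keep the archive itself outside the directory being backed up.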

And that's what I came to say: back up everything, but exclude what you don't want.  Hopefully the title makes sense now.


Schedule Tweets from the command-line

We at Truth in IT have several events that we need to invite people to, and twitter is one of the ways we do that.  Scheduling such tweets in advance is a great way to make sure you send the right tweet at the right time, and Twuffer.com (short for Twitter Buffer) is an easy way to make such automated tweets happen.  The only problem is that each scheduled tweet in twuffer.com takes several mouse clicks, each of which is followed by a screen refresh.

I wondered if there was an easier way.  I'm proficient in old-style Bourne shell programming in Unix/Linux (never did get very good at Perl, but I rock at Bourne Shell) and I know how to use cron, so if I could just find a way to tweet from the Linux command-line I figured I could make my own twuffer.

An Internet search for "tweet from the command line" turned up this and this article.  I got all excited, then disappointed once I realized those were using basic authentication, which was disabled in June of last year.  It was replaced by oauth authentication, allowing you to authorize an app to use your twitter account without giving them your twitter password.

A google search for "oauth twitter commandline post" turned up this post from Joe Chung's "Nothing of Value" blog called "Twitter OAuth Example."  He explains a series of separate PHP scripts that, if run and edited in the proper order, will result in you having a script called tweet.php that is actually your own properly registered and authorized twitter app that can send tweets from the command line.

While I was able to figure out Joe Chung's instructions (and I'm incredibly thankful for them and the code that comes with them), I wanted to adapt his code and instructions a little bit for those who may not be as adept at coding.  And I've also added my own code around the final tweet.php script to support scheduled tweets.

Before You Start

If you want to understand more about Oauth and how it works, you should read the original blog post.  Each major step below is also a link to the original instructions from twitter.

What You'll Need

You will need a Unix/Linux command line (or something like it), php and cron to make all of this code work.  If you don't have cron or something like it, you won't be able to send scheduled tweets, but you will still be able to send tweets from the command line.  You'll also need to have a basic understanding of the command line.  Unlike the original code from Joe, though, you won't have to edit any of the PHP scripts.

Step 0: Download my modified code

You can download all of my source files here: http://www.backupcentral.com/twitterapp.zip
Unzip them into a directory, then cd into that directory.  The first six steps of my post follow the ones from the original post.  I again urge you to read the original post, as he really deserves all the credit for figuring this out.  All I did was hack his scripts to behave differently.  If you want even more information, each step is a link to the original oauth spec from twitter.com.

Step 1: Register an application with twitter

Only registered apps can send tweets via Twitter's API.  So in order to send a tweet on the command line, you need to be your own app.  (Don't worry; the code is already written.  You just need to register the code you just downloaded as your own app.)  The first step in this process is to go to twitter.com and register your app.

Here are some pointers to help you fill out the form:

  1. Whatever you put as the name of the Twitter App is what will show up when you send tweets in the "via" column.  For example, we named ours TruthinITApp, so our scheduled tweets say "via TruthinITApp" at the end.  You can name the app whatever you want, except that the name cannot have the word "twitter" in it.
  2. It doesn't matter what you put in the rest of the fields, although you should probably put a valid website, and a description of what you're up to.
  3. I put Browser as my application type, but I'm not sure if that matters.
  4. Specify Read & Write or Read, Write & DM access.
  5. Use twitter for login.

Once you have clicked Save, you will be presented with a results page.  You need to get two values from that page: Consumer Key and Consumer Secret.  (Record these values somewhere for later.)

Step 2: Get a request token

Now you're going to do the equivalent of a user using the app for the first time.  You will log in to twitter, then try to use the app.  Twitter will ask if you authorize the app.  After you do that, it gives you another value you need.

1. Log in to twitter as the user you wish to send tweets as.
2. Run the following command, substituting the two values of consumer_key and consumer_secret you got in Step 1:

$ php getreqtok.php consumer_key consumer_secret

This will display a URL followed by a command.  You will use those two strings in the next two steps.

Step 3: Authenticate the user and authorize the app to tweet for the user

Cut and paste the URL from the previous step into your browser.  (This is the equivalent of using the app for the first time as the user you want to tweet as.)  Once you click Authorize App, it will display a seven-digit number that you will then append to the command displayed in the results of the previous command.  (Record the value for later.)

Step 4: Get the access token and secret

Now that the app has been authorized to tweet for the user, the app needs to establish a special key and secret (think username and password, but without actually giving them your password) that it will use each time it tweets on your behalf.  The command will look something like the following command, where consumer_key and consumer_secret are the values that you got when you registered your app, oauth_token and oauth_token_secret are the values the app was given when the app was authorized by the user, and authkey is the seven-digit value from the web page.

$ php getacctok.php consumer_key consumer_secret oauth_token oauth_token_secret authkey

This command will display the next command that must be run, which is the actual tweet.php command, along with all the arguments you need to pass to it.  It will look something like the following, where access_token and access_token_secret are the values the previous command obtained; together they act as the unique username/password combo for this app and this user.  (Notice that the access token actually starts with your numeric twitter user ID, not your user name.)

$ php tweet.php "Hello World…" access_token access_token_secret consumer_key consumer_secret

Step 5: Post a tweet on the command line

Start your twitter client or monitor twitter.com for the user you're going to send the tweet as.

Run the command above, and you should see a bunch of text fly by.  As long as you don't see errors like "Invalid Token" or anything like that, your tweet should have gone through.  

You just sent your first command-line tweet!

Scheduling tweets using cron and tweet.sh

In addition to the code above that was written by Joe Chung, I wrote tweet.sh, which uses tweet.conf and tweet.txt to automate the sending of tweets using cron.  The rest of this blog post is about how to use those tools, which are also in the code you downloaded in Step 0.

Step 6: Edit tweet.conf with the appropriate keys and secrets

Put the values of consumer_key and consumer_secret as the second and third fields in the consumer_key line:

consumer_key:<consumer_key>:<consumer_key_secret>

Create a line for each user that you have authorized using the steps above and insert the appropriate values for:

username:<access_key>:<access_key_secret>
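To make the file layout concrete, here is a small sketch of how a script could pull a key/secret pair out of a colon-separated file like this one.  It is not taken from tweet.sh; the function name conf_lookup is hypothetical, and it assumes the keys themselves never contain a colon.

```shell
#!/bin/sh
# Sketch only; conf_lookup is a hypothetical helper, not part of the download.

# usage: conf_lookup FILE NAME
# Prints "key secret" from the first line whose first field equals NAME.
# Returns non-zero if no such line exists.
conf_lookup() {
    awk -F: -v name="$2" '
        $1 == name { print $2, $3; found = 1; exit }
        END { exit !found }
    ' "$1"
}
```

For example, conf_lookup tweet.conf consumer_key would print the consumer key and secret separated by a space, and conf_lookup tweet.conf testuser would print that user's access key and secret.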

Step 7: Add a cron job that will run tweet.sh every minute:

* * * * * /workingdirectory/tweet.sh workingdirectory >/tmp/tweet.out 2>&1

Where workingdirectory is the directory where you installed the code.
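If you already have other cron jobs, you will want to add this line without clobbering them.  The following is one possible way to do that, offered as a sketch: the helper name with_cron_line is made up, and it only filters text, so nothing is installed until you pipe its output to crontab yourself.

```shell
#!/bin/sh
# Sketch only; with_cron_line is a hypothetical helper.

# usage: existing-crontab-on-stdin | with_cron_line 'CRON LINE'
# Copies stdin to stdout and appends CRON LINE unless an identical
# line is already present, so it is safe to run more than once.
with_cron_line() {
    NEW_LINE=$1 awk '
        { print }
        $0 == ENVIRON["NEW_LINE"] { seen = 1 }
        END { if (!seen) print ENVIRON["NEW_LINE"] }
    '
}
```

A typical invocation would be crontab -l 2>/dev/null | with_cron_line '* * * * * /workingdirectory/tweet.sh workingdirectory >/tmp/tweet.out 2>&1' | crontab -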

Step 8: Edit tweet.txt and add a tweet scheduled for sometime in the near future.

The format for tweets is as follows (where "|" is the field separator):

MON DD HH:MM|username|Tweet goes here

Here's an example.  First, get the current date

$ date
Tue Jun 21 03:20:22 EDT 2011

(Yes, I'm up a little late working on this post…)

Second, add a tweet to the file for a few minutes from now

$ echo "Jun 21 03:22|testuser|Test tweet1" >>tweet.txt

Please note that I used "|" as the field separator.  This means you cannot use the "|" character in any of your tweets.  One other note: Twitter will not let you send the same tweet twice, so you will need to change your tweet phrase if you want to do more testing.
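Since a stray "|" would break the format, a small wrapper around that echo can catch the mistake before it lands in the file.  This is only a sketch; queue_tweet is a hypothetical name and is not part of the code you downloaded.

```shell
#!/bin/sh
# Sketch only; queue_tweet is a hypothetical helper.

# usage: queue_tweet FILE "MON DD HH:MM" USERNAME "Tweet text"
# Appends one line in the MON DD HH:MM|username|text format,
# refusing any tweet text that contains the "|" field separator.
queue_tweet() {
    file=$1
    when=$2
    user=$3
    text=$4
    case $text in
        *'|'*)
            echo "queue_tweet: tweet text may not contain '|'" >&2
            return 1
            ;;
    esac
    printf '%s|%s|%s\n' "$when" "$user" "$text" >> "$file"
}
```

Calling queue_tweet tweet.txt "Jun 21 03:22" testuser "Test tweet1" appends the same line as the echo command above.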

When Jun 21, 03:22 rolls around, tweet.sh will send your tweet.  If tweet.php returns successfully (indicating a successful tweet), tweet.sh removes the line from tweet.txt and appends it to completedtweets.txt.  If there was a problem sending your tweet (such as it being a duplicate), the line stays in tweet.txt.
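For readers who want to see the send-or-keep behavior spelled out, here is a rough sketch of that logic.  It is not the actual tweet.sh from the download; process_due and its arguments are hypothetical, and the command that really sends the tweet is passed in so the logic can be tried without touching twitter.

```shell
#!/bin/sh
# Sketch of the send-or-keep logic described above; NOT the actual tweet.sh.

# usage: process_due QUEUE DONE NOW SEND_CMD
#   QUEUE     pending tweets, one "MON DD HH:MM|username|text" per line
#   DONE      file that receives the lines that were sent successfully
#   NOW       current time in the same "MON DD HH:MM" form
#   SEND_CMD  called as: SEND_CMD username text  (exit status 0 means sent)
process_due() {
    queue=$1
    done_file=$2
    now=$3
    send=$4
    keep=$(mktemp) || return 1
    while IFS= read -r line || [ -n "$line" ]; do
        when=${line%%|*}      # text before the first "|"
        rest=${line#*|}
        user=${rest%%|*}      # text between the first and second "|"
        text=${rest#*|}       # everything after the second "|"
        if [ "$when" = "$now" ] && "$send" "$user" "$text" </dev/null; then
            printf '%s\n' "$line" >> "$done_file"
        else
            # Not due yet, not a tweet line, or the send failed: keep it.
            printf '%s\n' "$line" >> "$keep"
        fi
    done < "$queue"
    cat "$keep" > "$queue"
    rm -f "$keep"
}
```

In a cron-driven script, NOW would typically come from date '+%b %d %H:%M', which prints the month abbreviation in the current locale.  Blank lines and comments never match NOW, so they are simply carried over.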

That's it.  All you need to do to send tweets in the future is to add them to tweet.txt and they will magically happen.  You can put blank lines, comments, or whatever other formatting you want in tweet.txt, as long as the actual tweet lines follow the format in step 8.

Please let me know if this post was helpful.  Also please post any suggestions on how to make the code better.  If I can make it work, I'll update the code and the post.


Other tape considerations

I've posted and talked quite a bit about tape lately.  I asked if we've put it out to pasture too soon. After participating in a LinkedIn thread from hell, I said that tape was a more reliable medium for long term storage.  I talked about that last post on Infosmack 102, which should be on The Register any day now.  I've also spoken about tape at my Backup Central Live! shows.  (Quick plug: We have announced the dates for Toronto, NYC, Seattle, Denver, Atlanta, Austin, Phoenix, Los Angeles, San Francisco, and Washington DC.  Click your favorite city to register!)

First let's talk about backup and recovery

Anyone who has heard me speak knows that I do not recommend using tape as the primary target for backups.  The main problem with tape and backups is that most backups are incremental backups and provide <1MB/s of performance, and modern tape drives want at least 40-50 MB/s after compression and really want much more than that.  This speed mismatch is impossible to overcome without bringing disk into the picture. I think that disk (especially deduped disk)  offers so many advantages for backup and recovery that it just makes sense to use it as your primary target for backup and recovery.  Even if you plan to build your backup system primarily out of tape (usually due to cost), you need to solve the speed mismatch problem using disk staging.  Stage to disk, then destage to tape.  You don't get the recovery benefits that disk provides, but at least you solve the shoe-shining problem.  (BTW, I read on

It also makes a lot of sense to replicate deduped backups to another device offsite, although I still believe tape is a cheaper way to accomplish the offsite requirement.  It also comes with the "air gap" feature.

However,

What I do think


Tape more reliable than disk for long term storage

Tape is inherently a more stable magnetic medium than disk when used to store data for long periods of time.  This is simply "recording physics 101," according to Joe Jurneke of Applied Engineering Science, Inc. 

I had heard rumblings of this before, but it was Joe that finally explained it in almost plain English in a post to this thread from hell on LinkedIn.  Here's the core of his argument:

By the way, the time dependent change in magnetization of any magnetic recording is exponentially related to a term known as KuV/kt. This relates the "blocking energy" (KuV) which attempts to keep magnetization stable, driven by particle volume (V) and particle anisotropy (Ku) to the destabilizing force (kt) the temperature in degrees kelvin (t) and Boltzmans constant (k).  Modern disk systems have KuV/kt ratios of approximately 45-60. Modern production tape systems have ratios between 80 and 150. As stated earlier, it is exponentially related. The higher the ratio, the longer the magnetization is stable, and the more difficult it is to switch state…..Recording Physics 101….

I had to call him to get more information.  He explained how this came about.  Disk drives have been pushed for greater and greater densities, which caused their vendors to create a much tighter "areal density."  Tape, on the other hand, mainly got longer and fatter to accommodate more data in the same physical space.  (Yes, it increased areal density, too, but nowhere near as much as the disk drive folks did.)  The result is that the tape folks have more room to play, allowing them to use magnetic particles with a bigger particle volume (the V in the equation).  The bigger the particle volume, the more stable the magnetism is, according to the KuV/kt equation.  In addition, tapes are generally stored outside of the drive, which means their temperature is lower than that of disk drives.  That means they have a lower t value (temperature in degrees kelvin), which is one of the "bad" numbers in the KuV/kt equation.  Having a higher V value and a lower t value is what translates into tape systems having ratios of 80-150, vs disk systems that have ratios of approximately 45-60. While I don't have an exact cite to point to in order to show these exact values, what he's describing makes perfect sense to me.
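To get a feel for what "exponentially related" means here, the relation can be written in the standard Néel-Arrhenius form, which I am assuming is the one Joe has in mind.  In it, τ is the characteristic time over which the magnetization stays put and τ0 is a material-dependent constant; for the comparison I assume τ0 is the same for both media, and I take the top of the quoted disk range (60) against the bottom of the quoted tape range (80).

```latex
\tau = \tau_0 \exp\!\left(\frac{K_u V}{k\,t}\right),
\qquad
\frac{\tau_{\mathrm{tape}}}{\tau_{\mathrm{disk}}}
  = \exp(80 - 60) = e^{20} \approx 4.9 \times 10^{8}
```

So even the smallest gap between the two quoted ranges works out to a factor of several hundred million in this simple model.  Treat that as an illustration of how sensitive the exponential is, not as a shelf-life prediction for any particular product.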
 

Add to this the fact that tape drives also have a lower bit error rate than disk.  SATA disk is 1:10^14, FC disk is 1:10^15, LTO is 1:10^16, and IBM 3xx0 and Oracle T10000s are 1:10^17.
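To put those rates in more familiar units, here is a quick bit of arithmetic, assuming each figure means one bad bit per that many bits read and using decimal terabytes.  The function name tb_per_error is made up for this example.

```shell
#!/bin/sh
# Sketch only; tb_per_error is a hypothetical helper.

# usage: tb_per_error EXPONENT
# For an error rate of 1 bit in 10^EXPONENT, prints how many decimal
# terabytes are read, on average, per bad bit.
tb_per_error() {
    LC_ALL=C awk -v e="$1" 'BEGIN { printf "%.1f\n", (10 ^ e) / 8 / 1e12 }'
}

tb_per_error 14   # SATA disk                    -> 12.5
tb_per_error 15   # FC disk                      -> 125.0
tb_per_error 16   # LTO                          -> 1250.0
tb_per_error 17   # IBM 3xx0 / Oracle T10000     -> 12500.0
```

In other words, under that reading a SATA disk averages one bad bit per 12.5 TB read, while the enterprise tape drives average one per 12,500 TB.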

Add to this the fact that tape drives always do a read after write, whereas disk drives do not always do so.

Sooo…

Tape drives:

  1. Write data more reliably than disk
  2. Read it after they've written it to make sure they did (where disks often don't do that)
  3. Have significantly less "bit rot" or "bit flip" than disk drives over time.

Like I said in a previous post, I think we've put these guys out to pasture a little too soon.
