Source and target de-duplication are very different, both in how they work and in how you use them. Let's take a look at each.
Target de-duplication is what's found in Intelligent Disk Targets (IDTs), most of which are virtual tape libraries (VTLs). You keep using whatever backup software floats your boat (as long as the VTL supports it) and send your backups to the de-dupe IDT/VTL, which de-dupes them for you. This reduces the amount of disk needed to store your backups, but it does not change the amount of bandwidth needed to get the backups from the clients to the backup server. Target de-dupe can still reduce bandwidth usage in one place: if the IDT/VTL can replicate its de-duped data to another IDT/VTL in another location, only the de-duped data has to travel between sites. Now you have an on-site and an off-site copy without making an actual tape. (If you want to make an actual tape, you can make it from the on-site or off-site IDT/VTL.)
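To make the target-side behavior concrete, here is a minimal sketch in Python. It is a toy model, not how any particular IDT/VTL works: the `DedupeTarget` class, its method names, the fixed 4-byte chunk size, and the use of SHA-256 are all assumptions chosen to keep the example small. It shows the two points above: the target still receives every byte of every backup, but it stores (and replicates) each unique chunk only once.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical and tiny, so the example is easy to follow


class DedupeTarget:
    """Toy stand-in for a de-dupe IDT/VTL (all names here are hypothetical)."""

    def __init__(self):
        self.chunks = {}   # hash -> chunk bytes; each unique chunk is stored once
        self.backups = {}  # backup name -> ordered list of chunk hashes

    def ingest(self, name, data):
        """Receive a full backup stream; return the bytes that were sent to us."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        self.backups[name] = recipe
        return len(data)  # the whole stream arrived, duplicates included

    def stored_bytes(self):
        """Disk actually consumed by unique chunks."""
        return sum(len(c) for c in self.chunks.values())

    def replicate_to(self, other):
        """Send only the chunks the other target lacks; return bytes sent."""
        sent = 0
        for digest, chunk in self.chunks.items():
            if digest not in other.chunks:
                other.chunks[digest] = chunk
                sent += len(chunk)
        other.backups.update(self.backups)
        return sent

    def restore(self, name):
        """Rebuild a backup from its list of chunk hashes."""
        return b"".join(self.chunks[d] for d in self.backups[name])


onsite, offsite = DedupeTarget(), DedupeTarget()
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBCCCCEEEE"  # only the last chunk changed

received = onsite.ingest("monday", monday) + onsite.ingest("tuesday", tuesday)
replicated = onsite.replicate_to(offsite)
print(received, onsite.stored_bytes(), replicated)  # 32 20 20
```

In this sketch the on-site target receives all 32 bytes but stores only 20, and only those 20 bytes are replicated off-site, where both backups can still be restored in full.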
Source de-duplication requires you to use different backup software on the client(s) where you want to use it. Those clients may be in your data center, in a remote data center, or they can even be laptops. The client software talks to the backup server (which is also running the de-dupe backup software) and says "hey, I've got this piece of data here with this hash. Have you seen that hash before?" (This piece of data is a piece of a file, not the whole file.) If the server has seen that piece of data before, the client doesn't send it again; the server just notes that there's another copy of that block of data at that client. That way, if a file has already been backed up by the backup server (such as the same file being stored by multiple people), the client won't transfer that file across the LAN/WAN. In addition, if a previous version of a file has already been backed up, de-dupe will notice the parts of the file it has seen (and not back them up again) and the parts of the file it hasn't seen (and back them up). This reduces both the amount of disk required to store your data AND the amount of bandwidth necessary to send the data.
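The "have you seen that hash before?" conversation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `BackupServer`, `backup_file`, the fixed 4-byte chunk size, and SHA-256 are hypothetical choices, real products typically use much larger (and often variable-size) pieces, and the small cost of sending the hashes themselves is ignored here.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; chosen only to keep the example readable


class BackupServer:
    """Toy de-dupe backup server (all names here are hypothetical)."""

    def __init__(self):
        self.chunks = {}   # hash -> chunk bytes
        self.catalog = {}  # (client, path) -> ordered list of chunk hashes

    def has_chunk(self, digest):
        """Answer the client's question: have you seen this hash before?"""
        return digest in self.chunks

    def store_chunk(self, digest, chunk):
        self.chunks[digest] = chunk

    def record(self, client, path, recipe):
        """Note which chunks make up this client's copy of the file."""
        self.catalog[(client, path)] = recipe

    def restore(self, client, path):
        return b"".join(self.chunks[d] for d in self.catalog[(client, path)])


def backup_file(server, client, path, data):
    """Client side: hash each piece, send only unseen ones; return data bytes sent."""
    sent = 0
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if not server.has_chunk(digest):   # ask before sending any data
            server.store_chunk(digest, chunk)
            sent += len(chunk)
        recipe.append(digest)              # the server still notes every piece
    server.record(client, path, recipe)
    return sent


server = BackupServer()
report_v1 = b"AAAABBBBCCCCDDDD"
report_v2 = b"AAAABBBBCCCCEEEE"  # same file with the last piece edited

first = backup_file(server, "laptop-1", "report.doc", report_v1)   # brand new
copy = backup_file(server, "laptop-2", "report.doc", report_v1)    # same file, other client
edited = backup_file(server, "laptop-1", "report.doc", report_v2)  # new version
print(first, copy, edited)  # 16 0 4
```

The first backup sends all 16 bytes, the identical file on a second client sends nothing, and the edited version sends only the 4 bytes that changed, which is why source de-dupe saves bandwidth as well as disk.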
Source de-dupe advantages:
Reduces bandwidth usage all over
Can protect a remote office without any hardware installed there (up to a certain amount of data)
Designed to use disk
Design incorporates automated on-site & off-site (and even really off-site) copies
Source de-dupe disadvantages:
Requires a change of backup software
Typically slower than target de-dupe on large volumes of data (many TB)
Target de-dupe advantages:
Some implementations are very fast (100s of MB/s to 1000s of MB/s)
Does not require a change in backup software
Target de-dupe disadvantages:
Considered a "band-aid" by some to help backup software that was not designed to use disk
Requires hardware at each remote site to be protected via de-dupe
On-site & off-site copies may be outside the knowledge of the backup software
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technologist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.