There have been a lot of announcements in the last year or so where VMware is available on various cloud or cloud-like platforms. If you’re wondering what the difference is between these various offerings, you’re not alone.
What is VCF?
VMware Cloud Foundation (VCF) is the first thing you need to understand when considering this topic. VCF is a software stack offered by VMware that bundles vSphere (compute), vSAN (storage) and NSX (networking) into a single platform. This gives those who would like to deploy it – for themselves or by offering it as a service for customers – a complete solution to draw from. If you know a solution is based on VCF, you also know that it will cover all compute, storage, and networking needs that you may have.
What is VMC?
VMC is an unofficial designation for VMware Cloud on some platform, the first example of which was VMware Cloud on AWS. The second example of this was announced at Dell Technologies World and is called VMware Cloud on Dell EMC.
Note to the wise: These products are not called “VMC,” at least not by anyone from VMware, AWS, or Dell. The product name is “VMware Cloud on AWS” or “VMware Cloud on Dell EMC.” Think about it: the real names have both brands in them; “VMC” has neither.
“VMware Cloud on X” is VMware as a service, offered by VMware on the platform in question, and it is based on VCF. Customers can manage it via vSphere, and they can provision all the VMs and storage they like without having to worry about where the hardware will come from.
There are two important things to note in the previous paragraph: the service is based on VCF, and it is being offered by VMware itself. If you see “VMware Cloud on X,” your bill and support will come from VMware.
VMware SDDC on IBM Cloud does not appear to use VCF, while Azure VMware Solutions is based on VCF. Either way, the important differentiator is that the service is offered by the platform vendor, not by VMware. Your bill and support will come from IBM or Microsoft, not from VMware.
Over one billion email address/password combos were recently leaked under the name “Collection #1,” a name that suggests there may be more collections to come. When something like this happens, I just take a deep breath, change my passwords on any sites that were affected, and move on with my day.
Why can I do that? Because I use a password manager that notifies me of the hack and any affected sites. It then assists me with changing the passwords in question, and I go back to work. (I also keep track of affected sites using Have I Been Pwned?)
Defense in Depth
Defending against cyberattacks requires a multi-faceted approach. Here’s a quick list off the top of my head. These will be very brief, because I want to focus on the last one.
Back up your data
I really don’t understand people who pay ransomware demands. Why don’t they just restore from backup? Oh, right. They don’t have a backup. Please back up your mobile phone and laptop data. And of course, back up your company’s servers.
Secure your physical devices
If someone gets hold of your physical device, all bets are off. Use a strong password on every device you have. If you lose a device and then get it back, do not just start using it again. You need to wipe it clean and re-install everything, because a hacker may have installed a key-logger that could steal the master password to the password manager I’m going to tell you to use in a minute.
Make sure you’re using secure sites if you’re logging in. Check anything you download for viruses. Don’t visit sketchy sites. I know you want to see that latest episode of Star Trek: Discovery and you don’t want to pay for the CBS All Access pass. But downloading a torrent is risky and may have other problems.
Watch for phishing & other social engineering attacks
Watch for those emails from companies you do business with that warn you of something and tell you that you need to login and fix it. Login manually to the real site; do not follow the link. (BTW, a password manager fixes this, too, because it won’t enter your password at the wrong site.)
Use an anti-malware product
And whatever you happen to pay for, also run a free checker once in a while. (I run a Malwarebytes free scan whenever it comes to mind.)
Use multi-factor authentication whenever you can
The more important the account is, the more important it is that you use MFA. Also, one of your “important” accounts is the email address that you use everywhere. Make sure that account is protected with MFA. That way someone can’t hack it and then use it to reset all your other passwords.
Use a unique password for every single site where you login
And finally we come to the biggie. Make sure you do not reuse passwords on the sites you do business on. If any of them are hacked, you’re vulnerable everywhere that email address and password have been reused.
Doing this without a password manager is impossible if you have more than a few accounts. (I have 329 accounts in Dashlane, my password manager.)
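Generating those unique passwords is exactly what a password manager automates. As a rough sketch (not any particular product’s actual algorithm), here is how a unique random password per site can be produced with Python’s standard-library `secrets` module:

```python
import secrets
import string

def generate_password(length: int = 20) -> str:
    """Return a random password drawn from letters, digits, and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

# A different password for every site -- never reused anywhere.
vault = {site: generate_password() for site in ("bank", "email", "store")}
```

Note the use of `secrets` rather than `random`: the former is designed for cryptographic use, which is what you want for passwords.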
Please use a password manager
I don’t know how anyone doesn’t use a password manager. It makes things so much more secure and so much easier. How often do you see the words secure and easier in the same sentence?
I chose Dashlane years ago for a unique combination of features that I no longer remember, but there are other password managers, like 1Password and LastPass, that are quite popular as well. They use one master password to give you access to all your encrypted passwords.
In the “How is Dashlane safe?” article, they have several answers.
They enforce a strong Master Password, and if you lose it, you’re toast. So don’t do that. But honestly, if you use it regularly, you will be typing that password many times a day, so I don’t know how you would forget it.
Your Master Password is never stored on their servers, even though they support multi-device syncing.
All data is encrypted locally with AES-256 encryption.
They use AWS servers for added security.
They continually audit their system for vulnerabilities.
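Dashlane doesn’t publish every implementation detail, but the general pattern behind “the master password is never stored” is standard key derivation: the encryption key is recomputed locally from the master password every time, so nothing secret ever has to live on a server. A sketch using Python’s standard library (the iteration count is illustrative, not Dashlane’s actual parameter):

```python
import hashlib
import os

def derive_key(master_password: str, salt: bytes) -> bytes:
    """Derive a 256-bit vault key from the master password via PBKDF2.

    The key is recomputed locally on each device; only the salt (which is
    not secret) needs to be stored alongside the encrypted vault.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", master_password.encode(), salt, iterations=200_000
    )

salt = os.urandom(16)                       # stored with the vault
key = derive_key("correct horse battery staple", salt)
assert len(key) == 32                       # 256 bits, suitable for AES-256
```

The same derivation run on a second device with the same master password and salt yields the same key, which is how multi-device syncing works without the password ever leaving your machine.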
No security system will ever be 100%, but I say NONSENSE to those who think that keeping passwords in your head is more secure than a password manager. How exactly is a typical user, who has dozens of online accounts, supposed to create a unique password for each account and store it in their head?
The average user is going to use the “remember password” feature of their web browser, and that’s not secure at all.
Like I said, use a password manager so that when you’re hacked, all you have to do is change one password. But please, someone leave a comment about how password managers are less secure than your brain.
If you care about your data, back it up. If you don’t care about your data enough to back it up, don’t tell me it’s your vendor’s fault when something goes awry.
This is what came to mind when I read the article about the Adobe Premiere Pro user who lost what he described as $250,000 worth of videos due to a bug in Adobe’s software. He said the video cost him more than $250,000 to create, so he is suing Adobe for that amount plus additional damages. Besides the fact that I am pretty sure that Adobe – and every other software and hardware vendor – has a clause in its contract specifying that data loss is not its responsibility, it’s just common sense. Software and hardware products make mistakes – that’s why we make backups.
Apparently, there was a bug in Adobe Premiere Pro that manifested itself when you stored your original video and Adobe’s cache directory on the same hard drive. If you cleared your cache, it would delete the original video as well.
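I don’t know Adobe’s actual code, but the defensive check that would have prevented this is simple: before deleting anything during a cache clear, verify that the file really lives under the cache directory. A hypothetical sketch in Python:

```python
from pathlib import Path

def clear_cache(cache_dir: Path) -> None:
    """Delete files from the cache, but only files that truly live there."""
    cache_dir = cache_dir.resolve()
    for path in cache_dir.rglob("*"):
        resolved = path.resolve()
        # Guard: skip anything that escapes the cache directory
        # (e.g. a symlink pointing at the user's original media).
        if cache_dir not in resolved.parents:
            continue
        if resolved.is_file():
            resolved.unlink()
```

One extra `if` statement, and clearing the cache can never reach outside it, no matter how the original media and the cache are arranged on disk.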
The lawsuit alleges massive negligence on the part of Adobe during their software development and testing process. That’s a really high bar if you ask me. Even if he is able to prove that they were negligent during the development process, they would easily be able to prove that he was also negligent during his system management process.
If your job is to create video, backup the video. If your job is to create anything, backup whatever it is. I don’t care how reliable your hardware or software is, things happen. That’s why we make backups. I hate to blame the victim here, but as far as I’m concerned he is a victim of himself.
Back up anything important to you
As I’ve already said, this should go without saying. There are too many easy ways to back up your data. Use a cloud-based data protection service, or use one of the many open source products and a portable hard drive. I’m not personally a fan of the latter, but it’s still better than nothing. Having all your data stored on a single hard drive is simply asking for trouble.
Use a drive repair service
If you didn’t listen to the last paragraph and you find yourself with data that matters to you and no backups of that data, don’t touch the drive. Immediately look into a drive repair service. They are expensive, but they are your only choice if you didn’t back up your data. And the more you play around with the drive trying to find your data, the less likely they are to be successful. So if you are sitting there in a world of hurt with no backups, turn off your computer and start searching for a drive repair company. Some of them have flat-fee services, and others charge based on the number of gigabytes they recover for you. But again, if you have no backups, they are your only choice.
Go check your data
If you have computers where you store your data, and you have not yet lost any data, consider yourself lucky. My advice to you is to go check that your backup systems are in working order. If you don’t have a backup system, now is the time to get one.
But lack of backups on your part does not constitute negligence on your vendor’s part. That’s my story and I’m sticking to it.
This was the first time we decided to make our own funny music video to go along with the song. An early effort, of course. Please note the Guitar Hero guitars the band is playing, along with the rusty guitar set. (It makes an appearance in another video.)
One of the things we learned over time was that the video ALSO needs to be funny. I think we accomplished that with this one.
Fans of my books and websites may not be aware of my music parody hobby, partly because I never put them all in one place. So I recently uploaded all of them to Youtube, and am going to post them here as separate articles.
This was the very first one we did, and we didn’t have the budget to do a full video production. I wrote the lyrics, the very talented Lindsay Romney did the vocals, and her brother did the music and mixing. For the video, we used sections of Lady Gaga’s video interspersed with sections of other videos.
There is a French portion, since the original song had a French portion. It simply says “I want my files or I’m going to get fired,” in French. (Or something like that.)
I hope you enjoy this one. There are more (and better) videos to come.
It seems to me that source dedupe is the most efficient way to back up data, so why do very few products do it? This is what I found myself thinking about today.
Source dedupe is the way to go
This is my opinion and always has been, ever since I first learned about Avamar in 1998 (when it was called Undoo). If you can eliminate duplicate data across your enterprise – even before it’s sent – why wouldn’t you want to do that? It saves bandwidth and storage. Properly done, it makes backups faster and does not slow down restores. It’s even possible to use dedupe in reverse to speed up restores.
If properly done, it also reduces the CPU load on the client. A typical incremental backup (without dedupe) and a full backup both use far more compute cycles than are needed to generate the hashes used for dedupe.
You save bandwidth, storage, and CPU cycles. So why don’t all products do this?
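The mechanics are easy to sketch. Assume a shared hash index (in real products this is a purpose-built store, not a Python set): the client hashes each chunk of data and transmits only chunks the index has never seen, from any client. A minimal illustration:

```python
import hashlib

def dedupe_chunks(data: bytes, index: set, chunk_size: int = 4096) -> list:
    """Split data into fixed-size chunks; return only chunks not yet indexed."""
    to_send = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in index:        # never seen anywhere in the enterprise
            index.add(digest)
            to_send.append(chunk)
    return to_send

index = set()
first = dedupe_chunks(b"A" * 4096 + b"B" * 4096, index)   # both chunks new
second = dedupe_chunks(b"A" * 4096 + b"B" * 4096, index)  # all duplicates
```

The second backup of the same data sends nothing at all. (Real implementations typically use variable-size, content-defined chunking rather than the fixed-size chunks shown here.)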
Products that have been around a while have a significant code base to maintain. Changing to source dedupe requires massive architectural changes that can’t be easily added into the mix with an existing customer. It might require a “rip and replace” from the old to the new, which isn’t what you want to do with a customer.
Update: An earlier version of the post said some things about specific products that turned out to be out of date. I’ve removed those references. My question still remains, though.
None of the source dedupe products have torn up the market. For example, if Avamar became so popular that it was displacing the vast majority of backup installations, competitors would have been forced to come up with an answer. (The same could be true of CDP products, which could also be described as a much better way to do backups and restores. Very few true CDP products have had significant success.) But the market did not create a mandate for source dedupe, and I’ve often wondered why.
Many of the source dedupe implementations had limitations that made some think that it wasn’t the way to go. The biggest one I know of is that restore speeds for larger datasets were often slower than what you would get if you used traditional disk or a target dedupe disk. It seemed that developers of source dedupe solutions had committed that venerable sin of making the backup faster and better at the expense of restore speed.
Another limitation of both source and target dedupe – but ostensibly more important in source dedupe implementations – is that the typical architectures used to hold the hash index topped out at some point. The “hash index,” as it’s called, could only handle datasets of a certain size before it could no longer reliably keep up with the backup speed customers needed.
The only solution to this problem was to create another hash index, which creates a dedupe island. This reduces the effectiveness of dedupe, because apps backed up to one dedupe island will not dedupe against another dedupe island. This increases bandwidth usage and the overall cost of things, since it will store more data as well.
This is one limitation my current employer worked around by using a massively scalable NoSQL database – DynamoDB – that is available to us in AWS. Where typical dedupe products top out at 100 TB or so, we have customers with over 10 PB of data in a single environment, all being deduped against each other. And this implementation doesn’t slow down backups or restores.
What do you think?
Did I hit the nail on the head, or is there something else I’m missing? Why didn’t the whole world go to source dedupe?
The answer is absolutely yes, and anyone who thinks you don’t need to do so should not be put in charge of your data. Also, anyone who thinks I’m saying this just because I work for a company that backs up Office 365 should read this blog post from seven years ago, where I said essentially the same thing: cloud services need to be backed up.
I was reading a Spiceworks thread on this topic and was shocked at some of the anti-backup recommendations I saw there. One person pointed to a TechEd article that talks about how redundant the storage is for Office 365. That has absolutely nothing to do with this topic. That’s the equivalent of saying “I have RAID, so I don’t need backups.”
I saw another post where someone explained that the recycle bin is sufficient for “oops” recovery needs, and that vendors just try to scare people with things like rogue admins to get them to buy their products. He/she went on to say that nothing like that had ever happened to them, so… It’s not just rogue admins, people. There are all sorts of things that can corrupt your entire datastore that can only be addressed via a good third-party backup solution.
Backups aren’t included
Take a look at the feature page for Office 365. You will find that backups aren’t included. The references to data protection features are more about loss prevention and the like; they have nothing to do with recovering corrupted data.
Microsoft MVP Brien Posey points out that “the Office 365 service-level agreement addresses availability, not recoverability.” So if you or someone else messes up your Office 365 data, Microsoft is under no obligation to help you.
Microsoft experts think so
Microsoft MVP Brien Posey says that “you might not have as many options for restoring your data as you might think. As such, it is critically important to understand your options for disaster recovery in an Office 365 environment.”
“Microsoft says they also perform traditional backups of Office 365 servers. However, those backups are used for internal purposes only if they experienced a catastrophic event that wiped out large volumes of customer data…”
He also points out that there is no “provision for reverting a mailbox server to an earlier point in time (such as might be necessary if a virus corrupted all the mailboxes on a server).”
You can delete your primary & secondary recycle bin
A lot of people talk about using the recycle bin to recover accidentally deleted or corrupted folders. It is true that it can keep such items for up to 90 days, depending on your settings. However, it is also true that a well-meaning or malicious person can easily clean out both the primary and secondary recycle bins. And a malicious person would indeed do just that.
Litigation hold doesn’t protect public folders
Some say that litigation hold protects you from such things. It keeps a copy of most messages forever; however, it does not protect public folders. Someone could easily delete everything in a public folder and then empty the recycle bin, and you would have no recourse if you did not have a third-party tool.
Litigation hold has no separation of powers
An important concept in many environments is the separation of powers between a person like the Exchange admin, and a backup person. That protects the organization from rogue admins doing very bad things and then covering them up by deleting the backups as well.
But litigation hold has no such protection. Office 365 administrators could (rightly or wrongly) assign themselves eDiscovery Manager rights and have full access to search and export from Exchange mailboxes, SharePoint folders, and OneDrive locations. They could even modify the Litigation Hold policies. One way to describe this is that it helps a good person to do the right thing, but it does not stop a bad or incompetent person from doing the wrong thing.
The OneDrive restore feature is all or nothing
The OneDrive restore feature is a bit puzzling. It can only restore things that are in the recycle bin, and it is all or nothing, meaning you have to restore your entire OneDrive to a single point in time; you cannot restore just parts of it. That has to be the most worthless restore feature I’ve ever heard of.
You need to back up Office 365
You need to back up Exchange, OneDrive, and SharePoint. Microsoft isn’t doing it for you, and the features that protect you against accidents do not go far enough. Look into a third-party solution, such as what my employer (Druva) provides.
Disaster recovery experts do not agree whether you should have one-and-only-one recovery time objective (RTO) and recovery point objective (RPO) for each application, or two of them. What am I talking about? Let me explain.
In case you’re not familiar with RTO & RPO, I’ll define them. RTO is the amount of time it should take to restore your data and return the application to a ready state (e.g. “This server must be up within four hours”). RPO is the amount of data, measured in time, that you can afford to lose (e.g. “You must restore this app to within one hour of when the outage occurred”).
Please note that no one is suggesting you have one RTO/RPO for your entire site. What we’re talking about is whether or not each application should have one RTO/RPO or two. We’re also not talking about whether or not to have different values for RTO and RPO (e.g. 12-hour RPO and 4-hour RTO). Most people do that. Let me explain.
In defense of two RTOs/RPOs (for each app)
If you lose a building (e.g. via a bomb blast or major fire) or a campus (e.g. via an earthquake or tsunami), it’s going to take a lot longer to get up and running than if you just have a triple-disk failure in a RAID 6 array. In addition, you might have an onsite solution that gets you a nice RPO or RTO as long as the building is still intact. But when the building ceases to exist, most people are left with the latest backup tape they sent to Iron Mountain. This is why most people feel it’s acceptable to have two RTOs/RPOs: one for onsite “disasters” and another for true, site-wide disasters.
In defense of one RTO/RPO (for each app)
It is an absolute fact that RTOs and RPOs should be based on the needs of the business unit that is using any given application. Those who feel that there can only be one RTO/RPO say that the business can either be down for a day or it can’t (24-hour RTO). It can either lose a day of data or it can’t (24-hour RPO). If they can only afford to be down for one hour (1-hour RTO), it shouldn’t matter what the cause of the outage is — they can’t afford one longer than an hour.
I’m with the first team
While I agree with the second team that the business can either afford (or not) a certain amount of downtime and/or data loss, I also understand that backup and disaster recovery solutions come with a cost. The shorter the RTO & RPO, the greater the cost. In addition, solutions that are built to survive the loss of a datacenter or campus are more expensive than those that are built to survive a simple disk or server outage. They cost more in terms of the software and hardware to make it possible — and especially in terms of the bandwidth required to satisfy an aggressive RTO or RPO. You can’t do an RPO of less than 24-36 hours with trucks; you have to do it with replication.
This is how it plays out in my head. Let’s say a given business unit says that one hour of downtime costs $1M, after considering all of the factors, including loss of revenue, damage to the brand, and so on. So they decide that they can’t afford more than one hour of downtime. No problem. Now we go and design a solution to meet a 1-hour RTO. Suppose that solution costs $10M. After hearing this, the IT department looks at alternatives and finds that a 12-hour RTO can be done for $100K and a 6-hour RTO for $2M.
So for $10M, we are assured that we will lose only $1M in an outage; for $2M we can have a 6-hour RTO; and for $100K we can have a 12-hour RTO. That means a severe outage would cost $11M ($10M + 1 hour of downtime at $1M), $8M ($2M + 6 hours of downtime), or $12.1M ($100K + 12 hours of downtime).
A gambler would say that you’re looking at definitely spending $10M, $2M, or $100K, and possibly losing another $1M, $6M, or $12M. I would probably take option two or three – probably three. I’d then put the $9.9M I saved to work, and hopefully make more for the company with that $9.9M than the $12M we would lose if we have a major outage.
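Plugging the numbers from this hypothetical scenario into a few lines of Python makes the tradeoff concrete (the option names are mine; the figures are the ones above):

```python
DOWNTIME_COST_PER_HOUR = 1_000_000   # the business unit's own estimate

# Hypothetical options: name -> (solution cost, RTO in hours)
options = {
    "option 1": (10_000_000, 1),
    "option 2": (2_000_000, 6),
    "option 3": (100_000, 12),
}

def worst_case(solution_cost: int, rto_hours: int) -> int:
    """Total exposure: what you spend up front plus a full-RTO outage."""
    return solution_cost + rto_hours * DOWNTIME_COST_PER_HOUR

totals = {name: worst_case(cost, rto) for name, (cost, rto) in options.items()}
# option 1: $11.0M, option 2: $8.0M, option 3: $12.1M
```

The cheapest option up front has the highest worst-case exposure, and the most expensive option is never the cheapest in total: exactly the kind of comparison the business unit should see before picking an RTO.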
Now what if I told you that I could also give you an onsite 1-hour RTO for another $10K. Wouldn’t you want to spend another $10K to prevent a loss greater than $1M, knowing full well that this solution will only work if the datacenter remains intact? Of course you would.
So we’ll have a 12-hour RTO for a true disaster that takes out my datacenter, but we’ll have a 1-hour RTO as long as the outage is local and doesn’t take out the entire datacenter.
Guess what. You just agreed to have two RTOs. (All the same logic applies to RPOs, by the way.)
If everything cost the same, then I’d agree that each application should have one – and only one – RTO and RPO. However, things do not cost the same. That’s why I’m a firm believer in having two completely different sets of RTOs and RPOs: one that you will live up to in most situations (e.g. a dead disk array) and another that you hope you never have to live up to (loss of an entire building or campus).
What do you think? Weigh in on this in the comments section.
One of the most valuable resources your company has is probably not being backed up properly – if at all. Like a lot of cloud services, the ability of Salesforce customers to recover from big mistakes or a malicious attack is a bit overstated. Let’s take a look at that.
Big, bad update
Say, for example, that someone wants to change how phone numbers are stored in Salesforce. (I know this because I wanted to do this once with a large number of records.) Let’s say they are tired of the inconsistent way phone numbers are stored and want to go to a standard format. They have chosen to get rid of all parentheses and spaces, and just use dashes. (800) 555-1212 becomes 800-555-1212.
They download a CSV of all the Salesforce IDs and accompanying phone numbers. They do their magic on the phone numbers and change everything to dashes. But they accidentally sort one column, completely disassociating the numbers from their Salesforce IDs. They then update every single one of your leads with incorrect phone numbers. Little by little, salespeople notice that some phone numbers are wrong and fix them. But it’s days before anyone realizes that it was this update that broke everything.
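The safe way to do this kind of mass update is to transform each row in place, so a phone number never leaves the side of its Salesforce ID. A sketch (the IDs and field layout are illustrative):

```python
import re

def normalize_phone(raw: str) -> str:
    """(800) 555-1212 -> 800-555-1212"""
    digits = re.sub(r"\D", "", raw)
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

rows = [("001A000001", "(800) 555-1212"), ("001A000002", "800.555.1313")]
# Transform row by row: each number stays paired with its Salesforce ID.
fixed = [(sfid, normalize_phone(phone)) for sfid, phone in rows]
# Sorting only the phone column, as in the mishap above, would silently
# re-pair every number with the wrong ID before the upload.
```

Even so, a one-character mistake in a script like this can still mangle every record it touches, which is the whole point: you want a backup before you run it.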
This would also be a great way for a salesperson to get even with your company for not giving him the bonus he wanted. Download a bunch of records, do a quick sort on only one column, then use Data Loader to upload nonsense back to Salesforce.
Recycle bin cannot fix updated records
The recycle bin contains deleted records, not updated records. So fixing even a few mistakenly (or maliciously) updated records is not possible with the recycle bin. It can only fix things if you accidentally delete records – as long as it’s not more records than will fit in your recycle bin. (Its capacity is the number of megabytes of storage you have × 25 records.)
You really need to back up Salesforce
Without an external Salesforce backup, you are literally one bad update away from being forced to use their “recovery service,” which may be the worst service ever. It’s so bad they don’t want you to use it. They call it a “last resort,” and they tell you it’s going to take 6–8 weeks and cost $10,000. And after six weeks, all you have is a bunch of CSV files that represent your Salesforce instance at a particular point in time. It will be your job to determine what needs to be uploaded, updated, replaced, etc. That process will be complicated and likely take a long time as well.
Please look into an automated way to back up your Salesforce data.
No one knows for sure whether backups are going to be included in the right to be forgotten (RTBF). Even the GDPR ICO isn’t being entirely clear about it yet. But that hasn’t stopped people from expressing very strong opinions on the subject.
I’ve now written several articles on this topic, and I’ve seen a variety of responses to my comments so far, especially comments to The Register article that Chris Mellor wrote that mentioned my articles. The ones that crack me up are the people that are absolutely sure about how GDPR works – even if the ICO isn’t. This is especially true when they’re trying to sell me something to fix a GDPR problem. Be wary of GDPR fear mongers trying to sell you something.
Just in case you’re wondering, even though I work at a data management as a service (DMaaS) vendor, I’ve got no axe to grind here. I’m aware of no products in this space that have a stronger GDPR story than Druva, so I have no reason to convince you this isn’t a problem. (We are able to help you find files that match certain parameters, and can help you delete them from backups. Like everyone else, however, we are not yet able to delete data from within a backup of a structured database. I am still unaware of any products that solve this problem.)
My only goal here is to start a conversation about this difficult topic. That I’ve clearly done. So I’m going to keep talking about it until we know better.
There are what I would call the “GDPR purists,” whose position sounds something like “what part of forgotten do you not understand?” Clearly these people feel that backups and archives are included in the RTBF, and they really don’t care how much it would cost a company to comply. Most importantly, they’re certain any companies not agreeing with them will be out of compliance and subject to huge fines.
I also get comments that are completely opposite of that, where people are certain that backups are not (and will not be) included in RTBF requests. While my opinion is that these people are closer to what is likely to happen, they are just as dangerous as those who are certain they are wrong.
It’s all conjecture
The only thing I know for certain is that anyone who says they “know” the answer to this question is definitely wrong. Consider what happened when Chris Mellor contacted the ICO for his article. At first, their response seems to favor the “definitely included” folks. They said, “Merely because it may be considered ‘technically difficult’ to comply with some of its requirements does not mean organisations can ignore their obligations.”
The ICO knows that the RTBF is going to be hard (even without the backup part of the problem), and they want you to know that you can’t just say “erasing every reference to a given person is really hard” as a defense for not doing it. Don’t even think about trying that one, they’re saying. They need everyone to know they mean business.
But I also don’t think we’re talking about technically difficult; we’re talking technically impossible. After 25 years of experience in this field, I can easily say it is technically impossible to erase data from inside a structured database inside a backup without corrupting the backup. Almost all backups are image based, not record based. Even if you could identify the blocks pertaining to a given record inside a database, deleting those blocks would corrupt the rest of the backup. You literally cannot do that.
I also want to say that the idea that you would restore every copy of the database you have, delete the record in question, and then re-back up the database is simply ludicrous. And you would do that every time you got such a request? That’s just nonsense. Besides the fact that the process would be so expensive that it would be cheaper to pay the fine, there’s a huge risk element: a mistake in the process could corrupt all backups of every database. That would place you in possible violation of another part of the GDPR – that you must be able to safely restore personal data that was deleted or corrupted.
There is another proposed solution of converting your backups to archives by scanning & indexing them, and then deleting the backup tapes. This process sounds interesting until you learn it doesn’t solve the problem I keep bringing up – personal data stored inside an RDBMS. So it might help, but it’s not a full solution to the problem.
My opinion is that erasing all references to a given person in your production system – while also having a process in place to make sure said person never “resurrects” from the backup system – accomplishes the goal of RTBF without placing the backup data at risk or attempting to do the impossible. If Vegas had odds on this topic, that’s where I’d place my bet. I think the ICO is going to say that as long as data is not being used to support any current business decisions, and isn’t directly accessible to production systems, it can be excluded from the RTBF process. But you need to have a process to make sure it never comes back.
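Such a “never resurrects” process could be as simple as filtering every restore through a suppression list of honored RTBF requests. The structure here is hypothetical, not any product’s actual API, but it shows the shape of the idea:

```python
# Hypothetical suppression list of subject IDs whose erasure requests
# have already been honored in production.
FORGOTTEN_IDS = {"cust-1017", "cust-2203"}

def suppress_forgotten(restored_records: list) -> list:
    """Drop forgotten subjects' records as they come out of a restore,
    so an erased individual never reappears from backup data."""
    return [r for r in restored_records if r["id"] not in FORGOTTEN_IDS]

backup = [
    {"id": "cust-1017", "name": "Erased Person"},
    {"id": "cust-9999", "name": "Active Customer"},
]
restored = suppress_forgotten(backup)   # only the active customer survives
```

The backup itself is never modified, so it stays safe and restorable, yet the forgotten person never makes it back into production.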
No one knows for sure, though, and that’s my point. Anyone who tries to tell you they know the answer for sure either has no idea what they’re talking about, is trying to sell you something, or both.
The ICO says they’re going to make it clearer soon
The ICO could have said “backups are included. Period.” in their comment to Chris’ article. They didn’t. They said, “The key point is that organisations should be clear with individuals as to what will happen to their data when their erasure request is fulfilled, including in respect of both production environments and backup systems. We will be providing more information on backups and the right to erasure soon.”
I for one am looking forward to that guidance.
What do you think?
I’m really curious. Anyone else want to make a guess as to how this all shakes out?
What about this? Do you think that adopting a wait-and-see approach is risky? Should you spend millions now even if we’re not sure how this is going to end up?