Carbonite Loses The Data Of 7,500 Customers
Posted by JamesB in Business News, Technology, tags: Data Backup
As the finger pointing starts, around 7,500 Carbonite customers have lost the online data they stored with Carbonite. In a report published this weekend, the Boston Globe says that Carbonite has filed suit against Promise Technology and Interactive Digital Systems for breach of contract, fraud, and deceptive acts and practices.
In a lawsuit filed in Suffolk Superior Court this week, Carbonite said it suffered "substantial damage" to its business and reputation from products manufactured by Promise Technology Inc. and marketed to Carbonite by Interactive Digital Systems Inc.
Carbonite’s complaint charged Promise Technology with breach of contract, fraud, and unfair and deceptive acts and practices. The complaint charged Interactive Digital Systems with breach of warranty. It seeks unspecified damages against the two companies.
Carbonite, a Boston company, recently opened a data center in China; however, the information we could find did not identify where the loss occurred, or when, other than that the issues started as early as 2007. In response to the action, Promise has some pretty harsh words:
Promise responded that the suit was without merit. "Our investigation indicates that our products were neither implemented nor managed using industry best practices," the company said in a statement.
I would like to make sure that your readers understand two points with regard to Carbonite’s lawsuit against Promise Technology:
1) This event happened over a year ago. We do not say this to minimize the matter. But we do want to point out that this has not happened in a long time and is not an ongoing problem.
2) The total number of Carbonite customers who were unable to retrieve their data was 54, not 7,500.
Here is what happened: The Promise servers that we were purchasing in 2006 and 2007 use RAID technology to spread data redundantly across 15 disk drives so that if any one disk drive fails, you don’t lose any data. The RAID software that makes all this work is embedded as “firmware” in the storage servers. In this case, we believe that the firmware on the servers had bugs that caused the servers to crash. Carbonite automatically restarted all 7,500 backups and more than 99% of these were completely restored without incident. Statistically, about 2 out of every 1,000 consumer hard drives will crash every week, so 54 of these customers had their PCs crash before their re-started backups were complete. Since they weren’t completely backed up when their PCs crashed, these customers were unable to restore all of their files from Carbonite. Most of the 54 got some or most of their data back. We took full responsibility for what happened and I did my best to call each of these customers personally to apologize.
As a result of our problems with the Promise servers, we switched to a popular Dell server that uses RAID6 – an improved RAID that allows for the loss of 3 of the 15 drives simultaneously before you lose any data. This configuration is in theory 36 million times more reliable than a single disk drive — the chances of 3 out of 15 drives failing at the same time are almost nil.
So far, Promise has refused to accept responsibility for their equipment’s failures, so now we are suing them to get our money back. The Dell RAID servers have been flawless and we’re extremely happy with them.
Dave Friend, CEO
Carbonite, Inc.
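The crash-rate arithmetic in the comment above can be checked with a short sketch. It uses only the figures quoted there (2 crashes per 1,000 consumer drives per week, 7,500 restarted backups, 54 affected customers). The assumption that the rate applies uniformly and independently to every customer is ours, and the variable names are illustrative.

```python
# Figures quoted in the comment above.
weekly_crash_rate = 2 / 1000      # consumer hard drive crashes per drive per week
restarted_backups = 7500          # backups that had to start over
unrecoverable_customers = 54      # customers who could not restore everything

# Expected PC crashes per week while all 7,500 backups are still incomplete
# (assumes every customer PC fails independently at the same rate).
expected_crashes_per_week = restarted_backups * weekly_crash_rate

# Average time a restarted backup would have to stay incomplete
# for 54 crashes to occur at that rate.
implied_weeks = unrecoverable_customers / expected_crashes_per_week

print(f"Expected PC crashes per week: {expected_crashes_per_week:.1f}")
print(f"Implied average exposure in weeks: {implied_weeks:.1f}")
```

Under these assumptions the quoted numbers are consistent with the restarted backups taking, on average, a few weeks to finish.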
First off, thank you for posting to our Blog. The Blog post itself was more or less just a reposting of the information that was out on the web at the time, with what quotes from both parties we could find.
Now, in reply to your comment, I wonder a bit about some of the statements you make. I don’t doubt that “7,500 Customers lost data” was an oversell of the story; however, you admit the number of customers that had data corruption or loss at your data centers was higher than 54 and was, in fact, 7,500. Just because all but 54 customers didn’t have a system emergency that required them to run an offsite restore to completion does not mean you only lost 54 customers’ data. From your comment I take it that you had corruption of 7,500 customers’ data and, to fix that, you forced all those customers into a full backup so they would send you a new and updated copy of their data. If that is true, then yes, you lost the data of 7,500 customers.
Next, you state that you were using “Promise servers”, which I doubt was the case. Promise Technology, as far as I know and as far as their website states, makes storage products such as RAID cards and storage systems. Neither I nor the industry would classify a storage system as a server. I would also note that if you were indeed using Promise Technology’s enterprise products such as the VTE610, they are RAID 6. If you were using older products such as the M500, then the description of that product should have said enough: “The VTrak M500f–small in price, big on capability”. In either case I would point the finger at installation, maintenance and product selection as a more likely reason for any failures than a faulty product. As I have no clue what product failed and what was done to it, I can only make a weak guess.
Lastly, I’ll lump the rest into one paragraph. I do not understand your math on how RAID 6 is better than what you had prior to switching to Dell. I would hope your storage systems were configured with RAID 5 or RAID 10 in the past, as the only other RAID level that could even come close to being 36 million times less reliable would be RAID 0, which has zero fault tolerance. As to multiple drive failures occurring at the same time, your math is again in question, as it is very possible to have multiple drives fail at or near the same time. I’ve seen it, I’ve dealt with it. This is one primary reason not to load any storage array with drives from the same batch number, to have proper maintenance and monitoring in place, and to have a dedicated hot spare or spares in the array.
Again, thanks very much for posting to our Blog, and please feel free to follow up with further details as to what products failed, why you think they failed, and what you’re doing to make sure no such failure can occur in the future other than changing hardware vendors. As with all things hardware, things break, and there is a reason to hire qualified server engineers to monitor and maintain any system.
JamesB
Microsoft Certified Systems Administrator
MSBS, MCP, MCTS: Vista, MCTS: SharePoint, MCITP: Vista Consumer, MCITP: Vista Enterprise
Watchguard Certified Professional
James: Yes, you have it exactly right. When the Promise server crashed, all the data on that server was corrupted and unrecoverable – a mortal sin, in my opinion, for such a piece of hardware. The server crash is what caused our systems to automatically restart all 7500 backups. I think we were lucky that only 54 of our customers ended up losing any data – every hour counts in such a situation. I was very upset with this failure and called each of those 54 people personally.
With respect to the RAID servers, yes, it is possible to have multiple drives fail at nearly the same time. But to have 3 drives out of 15 fail within a few hours of each other is extremely unlikely. That is especially true if you use predictive software to replace drives before they actually fail. Statistically, it is highly unlikely that in the entire life of Carbonite we will ever lose data because of simultaneous drive failure. As we learned in this case, software or firmware bugs are really the weak links.
Dave
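To put a rough number on the "3 drives out of 15 within a few hours" claim in the reply above, here is a minimal sketch using a binomial model. It rests on assumptions that are not in the thread: drives fail independently, the 2-per-1,000-per-week consumer rate quoted earlier applies to the array's drives, and "a few hours" means a 6-hour window. The function name `prob_at_least` is illustrative.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability that at least k of n independent drives fail,
    when each fails with probability p in the window."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

weekly_rate = 0.002     # failure rate quoted earlier for consumer drives
window_hours = 6        # assumed meaning of "a few hours"
hours_per_week = 168

# Convert the weekly failure probability to a per-window probability.
p_window = 1 - (1 - weekly_rate) ** (window_hours / hours_per_week)

p_three_or_more = prob_at_least(3, 15, p_window)
print(f"P(at least 3 of 15 fail within {window_hours} h): {p_three_or_more:.2e}")
```

Under the independence assumption the result is on the order of 1e-10 per window, which supports the "extremely unlikely" wording. The model says nothing about correlated failures, such as drives from the same batch or a firmware bug hitting every drive at once, which is the objection raised earlier in the thread.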
I find it rather disconcerting that an online backup company doesn’t understand the difference between RAID and Backup. That’s somewhat like a farmer not knowing that uncontaminated dirt is important for his livelihood.
It appears to me that not only does Carbonite not understand RAID (nor how to calculate hardware failure rates), but that they also don’t have any backups of their customers’ data at all. If a “Promise server” (which is really a system they built using a Promise RAID card, and therefore not in any way a Promise server) failed and all of the data was corrupt and unrecoverable, then this is a failure of the Carbonite backup and DR processes more than it is any fault of any hardware provider. If they were using only RAID 5, as it seems, then they were asking for trouble if they didn’t have a hot spare drive in the array and a number of spare hard drives sitting ready to swap in for a failed drive. RAID 6 will provide redundancy against only one more hard drive failure; it still isn’t a backup or DR process in any way.
Now, DaveFriend claims that 3 out of 15 drives failed within a few hours of each other and seems to blame Promise for this. Promise make RAID controllers, not hard drives. Blame the HDD manufacturers. And given Carbonite’s previous claims of HDD failure rates on “consumer” drives, why were they using consumer drives in the first place and not enterprise drives? In addition, what are they doing so poorly that they are seeing this number of failures, probably well over 100 times higher than anyone else has seen?
Carbonite seems to be a company who does not understand the main point around RAID – RAID is not a backup strategy.
Hilton Travis
Director
Quark IT
http://www.QuarkIT.com.au/
SBSC PAL – Australia
http://www.sbscpal.com/