About 10 years ago I heard a story about the LDS Church and the problem it had backing up what was then several petabytes of data. Even with the fastest tape technology available, a backup would have taken weeks, and by the time it completed so much data would have changed that the backup would be all but worthless. It was an interesting challenge, and ultimately the decision was made to simply mirror the entire data set to another giant storage array. However, a simple mirror without multiple read-only recovery points is not a good protection strategy on its own. Data corruption, deletions, and other integrity problems are faithfully replicated to the other side of the mirror, which can result in permanent data loss.
A problem that affected only a small subset of companies a decade ago now affects almost every business. Backup windows keep shrinking as companies run operations around the clock, while the amount of data keeps growing. Where the LDS Church was managing millions of genealogy records, today's enterprises are managing massive file growth, terabytes of email, hundreds or thousands of virtual machines, and ever-growing, demanding databases that are critical to business operations. IT departments are left with the daunting task of not only deploying and supporting all of this, but also figuring out how to protect it. Couple that with the need to find meaningful business intelligence in those massive data reserves, and it's easy to arrive at the following conclusions:
- Data has become too large to back up using traditional methods
- Businesses require access to their data all the time, so restores have to be fast
- Any lost data can have a huge business impact, so more frequent backups are better than fewer
- Companies need to find ways to monetize the data they are storing
IBM identified the first problem many years ago when it launched TSM (Tivoli Storage Manager). It was clear that running full backups to tape or disk was not going to be an effective strategy going forward. Most modern backup solutions agree, and employ some form of "incremental forever" backup. The basic idea is to copy that large data set once and capture only changes afterward, but how those changes are captured depends entirely on the backup software. Generally, block-level incrementals are more effective than file-level ones, as are methods that do not have to scan file systems to identify changes. Crawling massive file shares and hashing through large databases puts unnecessary strain on systems, and storing entire changed files rather than just the modified blocks makes inefficient use of storage.
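To make the block-level versus file-level difference concrete, here is a toy Python sketch. It assumes a hypothetical `TrackedVolume` whose write path marks which fixed-size blocks have changed (a simplified form of changed-block tracking), so an incremental can collect the modified blocks without crawling or hashing anything. The 4 KiB block size and the single 400 KiB "file" are illustrative assumptions, not a description of how any particular product works.

```python
from __future__ import annotations

BLOCK_SIZE = 4096  # illustrative block size (assumption)


class TrackedVolume:
    """Toy volume that remembers which blocks were written since the last backup.

    Hypothetical simplification of changed-block tracking: writes mark blocks
    dirty, so an incremental never has to scan or hash the whole data set.
    """

    def __init__(self, size_blocks: int) -> None:
        self.data = bytearray(size_blocks * BLOCK_SIZE)
        self.dirty: set[int] = set()

    def write(self, offset: int, payload: bytes) -> None:
        if not payload:
            return
        end = offset + len(payload)
        if offset < 0 or end > len(self.data):
            raise ValueError("write outside volume")
        self.data[offset:end] = payload
        # Mark every block touched by this write as changed.
        self.dirty.update(range(offset // BLOCK_SIZE, (end - 1) // BLOCK_SIZE + 1))

    def block_incremental(self) -> dict[int, bytes]:
        """Return only the changed blocks, then reset tracking for the next cycle."""
        changed = {
            i: bytes(self.data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
            for i in sorted(self.dirty)
        }
        self.dirty.clear()
        return changed


# Treat the whole volume as one large file (for example, a database file).
vol = TrackedVolume(size_blocks=100)
vol.write(123_456, b"hello world")  # an 11-byte update

block_level_bytes = sum(len(b) for b in vol.block_incremental().values())
file_level_bytes = len(vol.data)  # a file-level incremental re-copies the whole changed file

print(block_level_bytes, file_level_bytes)  # 4096 409600
```

The 11-byte update costs one 4 KiB block at the block level but the entire 400 KiB file at the file level, and the gap only widens as files grow into the hundreds of gigabytes. Real systems keep this tracking in their own metadata rather than a Python set; the sketch only shows why the size of a block-level incremental follows the bytes changed rather than the size of the files they live in.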
The second issue is surprisingly often overlooked. We all know the old line, "Backups are worthless, but restores are priceless," and yet most people I speak with aren't thinking about restores. In an outage, how quickly access is restored will determine whether the IT administrator is about to experience an RGE (résumé-generating event). If you use hardware-based snapshots and take them frequently as backups, you can most likely recover quickly regardless of the size of the data, since the recovery points already live on the production array. If, however, you send your backups to a separate device (tape or disk), the data has to be copied back to the production array, and that takes time.
This is important to be aware of, because backup vendors love to talk about how fast they back up data but rarely tout their restore speeds. Frankly, I don't blame them; many factors determine how quickly a restore can complete. Even if you accept your vendor's fastest quoted restore speeds, the biggest variable is how quickly your existing storage can ingest the data. To get the business running again you will have to push data down the pipe as fast as possible, and that will ripple out as degraded performance for everything else running on your production system.
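A back-of-the-envelope way to see why ingest matters: a restore can only run as fast as its slowest stage. The Python sketch below assumes three stages (backup-device read, network link, production-array ingest) plus a hypothetical `ingest_share` parameter for the fraction of the array's write capacity you can hand to the restore without starving other workloads. All throughput figures are placeholders chosen for illustration, not vendor numbers.

```python
def restore_hours(data_tb: float, backup_read_mbps: float, network_mbps: float,
                  array_ingest_mbps: float, ingest_share: float = 1.0) -> float:
    """Estimate restore time in hours, limited by the slowest stage.

    Throughputs are in MB/s using decimal units (1 TB = 1,000,000 MB).
    ingest_share is the fraction of array ingest the restore may consume.
    """
    if not 0 < ingest_share <= 1:
        raise ValueError("ingest_share must be in (0, 1]")
    effective_mbps = min(backup_read_mbps, network_mbps, array_ingest_mbps * ingest_share)
    seconds = data_tb * 1_000_000 / effective_mbps
    return seconds / 3600


# Hypothetical: 50 TB, 2,000 MB/s backup read, 1,250 MB/s (10 GbE line rate),
# 800 MB/s production-array ingest.
full_speed = restore_hours(50, 2000, 1250, 800)
shared = restore_hours(50, 2000, 1250, 800, ingest_share=0.5)
print(f"{full_speed:.1f} h at full ingest, {shared:.1f} h sharing the array")
# 17.4 h at full ingest, 34.7 h sharing the array
```

In this made-up example the backup device and network are both faster than the array can absorb writes, so the quoted backup-device speed never comes into play, and reserving half the array for running workloads doubles the outage.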
What about taking more frequent backups? Most enterprises are still "stuck" in the nightly-backup mentality. They may have found ways to shorten how long a backup takes, but they aren't taking any steps to shrink their recovery point objective (RPO), the amount of recent work they can afford to lose. Few companies would accept users working all day only to lose hours of that work because the restore has to come from a backup that is 12 hours old. A backup solution that can take frequent, no-impact backups throughout the business day should be a requirement, not a nice-to-have.
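The cost of a long backup interval is easy to put in numbers: the work at risk is the time elapsed since the most recent recovery point. This sketch assumes backups complete instantly on a fixed schedule starting at midnight (a hypothetical schedule and date), and compares a nightly job with snapshots every 15 minutes for a failure at 17:40.

```python
from datetime import datetime, timedelta


def recovery_point_age(failure: datetime, first_backup: datetime,
                       interval: timedelta) -> timedelta:
    """Time elapsed since the latest backup on a fixed schedule.

    Assumes backups complete instantly at first_backup + k * interval.
    """
    if failure < first_backup:
        raise ValueError("no recovery point exists before the first backup")
    return (failure - first_backup) % interval


midnight = datetime(2024, 1, 8, 0, 0)
failure = datetime(2024, 1, 8, 17, 40)

nightly = recovery_point_age(failure, midnight, timedelta(days=1))
every_15 = recovery_point_age(failure, midnight, timedelta(minutes=15))
print(nightly, every_15)  # 17:40:00 0:10:00
```

With the nightly job, a late-afternoon failure puts most of the working day at risk; with 15-minute snapshots the exposure is never more than 15 minutes.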
This brings us to the last issue. Managed service offerings are putting economic and operational pressure on internal IT, and that trend shows no sign of relenting. IT departments need to find ways to drive even more value out of the infrastructure they have on the floor. Turning a throwaway budget item like backups into a revenue-generating arm of the business can shine a positive light on the out-of-sight, out-of-mind internal IT shop. The biggest challenge here is typically the backup device or appliance. Look closely and these devices often resemble an enterprise disk array: front-end and back-end ports for connectivity, lots of disks configured in some form of RAID, and software that ties it all together. While these systems handle backup feeds well, they make very poor use of the hardware, because they have simply substituted disk for tape and store backups in a proprietary format that applications cannot access directly. It is imperative to drive efficiency everywhere in the enterprise, especially since internal IT is competing with the massive scale and cost advantages of the cloud, and expensive disk arrays that can serve only as a tape substitute do not qualify.

The good news is that NetApp has multiple ways to address all of the issues outlined above. We can even provide stand-alone, turnkey backup solutions for environments that have no NetApp storage on the floor. A NetApp backup solution can cut backup times from hours or days to minutes, bring near-instant restores to large data sets, shrink recovery points to minutes, and provide native access to protected data for business initiatives such as data mining and reporting. The next time you are looking for an enterprise-class backup solution that can provide real value to your business, just reach out to your preferred local partner or NetApp sales team.