The length of downtime had nothing to do with us not having backups -- in fact we had backups right down to the last minute we were online. It had everything to do with hardware failure and response times in first diagnosing and then rectifying the problem at the DC end of the equation, and I can assure you that we are taking this up with our provider. [snip]
...you can't restore data if you have nothing to restore it to...
Disaster recovery is a thorny topic and a troublesome thing to implement. Few will spend the time or money on a fail-over system, given the cost and complexity involved. It takes this kind of problem to motivate people and show the value of a fail-over solution.
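For what it's worth, the health-check half of fail-over is the easy part; it's the surrounding infrastructure that costs the money. Here's a minimal sketch in Python just to show the idea. Everything in it is a made-up assumption (the endpoint URLs, the failure threshold, the poll interval), and a real setup would flip DNS or a load balancer rather than a local variable:

```python
"""Minimal fail-over health-check sketch (hypothetical endpoints)."""
import urllib.request
import urllib.error
import time

# Hypothetical endpoints -- substitute your real primary/standby hosts.
PRIMARY = "https://primary.example.com/health"
STANDBY = "https://standby.example.com/health"
FAIL_THRESHOLD = 3          # consecutive failures before failing over
CHECK_INTERVAL_SECS = 30    # how often to probe the primary

def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def monitor() -> None:
    failures = 0
    active = PRIMARY
    while True:
        if is_healthy(PRIMARY):
            failures = 0
            active = PRIMARY
        else:
            failures += 1
            if failures >= FAIL_THRESHOLD and active != STANDBY:
                active = STANDBY
                # A real system would update DNS or the load balancer here.
                print(f"failing over to {STANDBY}")
        print(f"active endpoint: {active}")
        time.sleep(CHECK_INTERVAL_SECS)

if __name__ == "__main__":
    monitor()
```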
This appears a classic case where it takes a series of failures to expose the shortcomings of the infrastructure (the data center). It sounds like the core issue is that the data center was not quick to identify or resolve its hardware problems and, from what you wrote, didn't have a ready solution. Added to that, the site's management did not plan for, or expect, the data center to let them down.
The good news is that the backups worked (HURRAY!), so little or nothing was lost but time, and the episode gave the site's management the opportunity to see where the recovery scheme could be improved.
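One cheap improvement, since "you can't restore data if you have nothing to restore it to" cuts both ways: periodically prove the backups restore at all, before an outage forces the question. A rough sketch of a scheduled test-restore, assuming a gzipped tarball at a hypothetical path:

```python
"""Backup sanity-check sketch: confirm an archive actually restores."""
import tarfile
import tempfile

BACKUP_PATH = "/backups/site-latest.tar.gz"  # hypothetical location

def test_restore(path: str) -> int:
    """Extract the archive to a throwaway directory; return file count."""
    with tempfile.TemporaryDirectory() as scratch:
        # Only point this at archives you created yourself; tar members
        # with hostile paths are a known hazard of extractall().
        with tarfile.open(path, "r:gz") as archive:
            members = archive.getmembers()
            archive.extractall(scratch)  # fails loudly if the backup is corrupt
            return len(members)

if __name__ == "__main__":
    count = test_restore(BACKUP_PATH)
    print(f"restored {count} entries OK from {BACKUP_PATH}")
```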
Bravo on the diligence, and on getting the site back up and running in short order!