
Author Topic: Backups...  (Read 7622 times)

tommm

  • Jr. Member
  • **
  • Offline
  • Posts: 78
Backups...
« on: February 29, 2012, 01:26:50 pm »

Hi,

I'm after a bit of advice regarding image file backups.

Currently, for my Mac HD I use external hard drives (x3, one off-site), each partitioned with a duplicate (bootable disc image) on one partition and a Time Machine archive on the other. For image files I store the originals on one hard drive and mirror them to three others (one off-site) with ChronoSync; deletions are also mirrored, and deleted files are removed immediately.

Now, this means I have four exact copies of my image files, and it's easy to swap between them should one disc be lost, etc. What I'm starting to wonder / get nervous about is this: if the files on the original disc became corrupted somehow and I didn't notice before carrying out a synchronisation, I might end up with all discs corrupted. Should I instead:

a. Use ChronoSync to back up incrementally instead of mirroring, archiving changed files. This would mean I had previous versions of files to go back to, and it should prevent disaster if the originals became corrupted, but it would increase storage requirements and make it harder to swap between original and backup discs for use in Lightroom (I think... I would need to use ChronoSync to rebuild a copy of the original disc, as the backup would hold multiple versions - I'm not sure how ChronoSync arranges these).

b. Use Time Machine to carry out an incremental backup of the original image file disc (I'm more used to how Time Machine does incremental backups). As above, should the original disc die I'd need to use Time Machine to rebuild a copy rather than just plugging in one of the current duplicates.

c. Do one of the above, but also keep a duplicate on a separate partition / disc synchronised by ChronoSync as I currently do; this would effectively replicate what I do for my Mac HD. The downside is doubled storage requirements and the extra time required.

My questions are:

1. Are my original worries justified?

2. If so, which of the above strategies sounds like the best plan, or is there a better one?

3. Is there any advantage / disadvantage to using Time Machine over ChronoSync for incremental backups? (Does anyone know how ChronoSync arranges the data on a backup disc, as opposed to the mirror (duplicate) I currently make?)

Many thanks for any pearls of wisdom / insights into what anyone else is doing in this regard.

Tom
« Last Edit: March 01, 2012, 10:20:32 am by tommm »
Logged

tommm

  • Jr. Member
  • **
  • Offline
  • Posts: 78
Re: Backups...
« Reply #1 on: March 04, 2012, 10:08:50 am »

Anybody?
Logged

Paul Williamson

  • Jr. Member
  • **
  • Offline
  • Posts: 75
    • http://www.mustbeart.com
Re: Backups...
« Reply #2 on: March 04, 2012, 12:25:41 pm »

The traditional solution to this quandary is to introduce hierarchy. Perhaps a daily backup that's guaranteed to be current, a weekly backup that's less current, and maybe a monthly backup too. As long as you're making regular updates to a given backup, you can't be absolutely certain that backup isn't getting corrupted. The key is to make a backup and leave it alone.

Your scheme seems very robust to disk failures (because you have multiple copies) and local disasters (because you have a copy off-site). You want to add robustness to data corruption. The perfect solution is to record every change and every intermediate version, never erasing anything, on some medium that, once written, cannot be erased or changed. Unfortunately, that's not practical, so you have to compromise somewhere. Time Machine is one such compromise: it records many versions (but not every single one), though it only saves them for a finite time (old versions are pruned once the disk fills up), and the medium (a connected hard disk) is still vulnerable to accidental or malicious corruption.

To design a practical system for your needs, you have to make some assumptions. How likely is data corruption? When data corruption happens, how long will it take you to notice? How much unrecoverable data corruption are you willing to accept in the worst case?

For instance, if you fear that it might take you a very long time to notice data corruption, you might want to make a snapshot each month and store it permanently. That way, no matter how long it takes to notice, you can always go back to a version from before the corruption occurred. At least, until the backup drives start to fail on the shelf. There are of course infinite variations to choose from.

Chronosync (by default) uses file dates to detect changes. That's great as long as the changes were made on purpose by well-behaved software. But if you're trying to protect against accidental or malicious corruption, it's not good enough. Time Machine is better in that it tries to track changes to the actual data, but it still relies on the cooperation of the software and the reliability of the hardware. If you're really worried about data corruption, don't rely on either of these tools. You have to actually read all the data off the source disk, and either copy it all or at least calculate a secure checksum of files that are purported to be unchanged.
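That last point - reading every byte back and checksumming files that are purported to be unchanged - can be sketched in shell. This is an illustrative sketch, not a ChronoSync or Time Machine feature; the paths and manifest location are made up, and `md5sum` is the GNU coreutils tool (on a Mac, `md5 -r` or `shasum` plays the same role):

```shell
# Build a checksum manifest of an image folder, then verify it later.
# SRC stands in for your image disk, e.g. /Volumes/Images.
SRC=$(mktemp -d)
printf 'raw bytes' > "$SRC/IMG_0001.dng"

# 1. Record a checksum for every file (run this right after import).
( cd "$SRC" && find . -type f -exec md5sum {} + > /tmp/manifest.md5 )

# 2. Months later, re-read every byte and compare against the manifest.
#    Any silently corrupted file shows up as FAILED.
( cd "$SRC" && md5sum -c /tmp/manifest.md5 )
```

The important part is that verification forces the disk to deliver every byte again, rather than trusting file dates or sizes.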

You have to decide what extreme measures are justified for your data. There is no perfect system. Don't go crazy or broke trying to make one.
Logged

ErikKaffehr

  • Sr. Member
  • ****
  • Offline
  • Posts: 11311
    • Echophoto
Re: Backups...
« Reply #3 on: March 04, 2012, 12:45:48 pm »

Hi,

You don't become a hero by making backups; you become a hero by doing a restore. Sad but true.

My strategy is threefold:

1) Time Machine backup to internal disks

2) Backup to an external RAID using rsync with cron. That's the UNIX way to do things.

3) An off-site backup with Carbon Copy Cloner to two external disks, kept at my office, 26 km away.

Now, does this work? I don't know. If I were to lose both the original files and the Time Machine backups, I would lose a lot of data but probably no images. This backup scheme doesn't protect against degradation of data, though.

Best regards
Erik




Logged
Erik Kaffehr
 

tommm

  • Jr. Member
  • **
  • Offline
  • Posts: 78
Re: Backups...
« Reply #4 on: March 05, 2012, 03:52:25 pm »

Thanks for the feedback. I guess my main concern is losing old photos by having them overwritten should the main disk holding my images become corrupted somehow. I'm not too worried about keeping multiple versions of images as they change over time (this shouldn't happen with the vast majority anyway, as they are raw images and I use Lightroom, so all adjustments etc. are stored in the Lightroom catalogue and its backups on each hard drive).

I'm thinking the best solution is either using Time Machine and, when the disk becomes full, archiving it and starting a new disk, or doing the same with ChronoSync using backup-with-archive until the disk becomes full. The other alternative is to keep using ChronoSync mirroring as I do at the moment, but to be very careful to do a trial sync first each time and check manually that it hasn't decided to erase all my images.

Will think some more but thanks for the feedback.

Tom
Logged

John.Murray

  • Sr. Member
  • ****
  • Offline
  • Posts: 886
    • Images by Murray
Re: Backups...
« Reply #5 on: March 05, 2012, 09:32:47 pm »

The advantage of an image backup is rapid restoration (Windows) / an alternative boot volume (Mac).  Think of it as saving your system's state.

The advantage of a file-based backup is that it will not propagate filesystem corruption to the target; it will simply fail on any file(s) affected.

A great strategy is to do both.  I personally keep as little data on my system drives as possible, making a restore/reboot much faster.  

In the case of Windows, the built-in Windows Backup is block-based, as opposed to older file-based backup schemes (e.g. NTBackup).  The advantage is that backups to a given disk device are incremental and very quick.  (Microsoft's thinking is that any failure of the backup drive itself invariably invalidates all backups on the volume - since the backup is disk blocks, not files - and I agree with that.)  Using Windows Backup to a network share will replace the previous backup.

On my Macs, I do not use Time Machine (which is file-based) due to the overhead, relying instead on Disk Utility to create an image backup that replaces the previous one; I have two drives in rotation for this.

I schedule system backups once a week, performing a manual backup any time I install mission-critical software or a major OS update between weekly backups (not often).

On both platforms, I push file-based data to external drives on a network share on a daily basis.
« Last Edit: March 05, 2012, 09:44:01 pm by John.Murray »
Logged

sbay

  • Full Member
  • ***
  • Offline
  • Posts: 225
    • http://stephenbayphotography.com/
Re: Backups...
« Reply #6 on: March 05, 2012, 11:26:17 pm »

I keep three copies of my files on separate hard disks, with at least one off-site at all times (using rsync to back up the working disks). These get swapped and rotated as needed. However, in this setup it is possible for the main file to become corrupted and then work its way out to the two other copies. To minimise this risk I do two other things: (1) keep a hash of my raw files and check them every once in a while, and (2) maintain a write-once copy of my raws. Every year (or so) I take a dump of my raw files, write it to a hard disk once, put the disk in storage, and never reuse it. For this purpose I usually use my older hard disks, which are too small to be convenient for daily use. My image library is only 1.5 TB (750 GB of raw), so this is feasible at low / insignificant cost (what else am I going to do with all my older drives?)
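The yearly write-once archive step, with a verification pass before the disk goes on the shelf, might be sketched like this (the paths are stand-ins for real volumes, e.g. /Volumes/Raws and /Volumes/ShelfDisk/raws-2012, and `md5sum` is the GNU tool):

```shell
# Copy the raws to the shelf disk, then prove the copy is byte-identical
# before it goes into storage.
SRC=$(mktemp -d); DST=$(mktemp -d)
printf 'raw bytes' > "$SRC/IMG_0001.dng"

cp -Rp "$SRC/." "$DST/"

# Re-read both sides and compare checksums; only shelve the disk if this passes.
( cd "$SRC" && find . -type f -exec md5sum {} + | sort ) > /tmp/src.md5
( cd "$DST" && find . -type f -exec md5sum {} + | sort ) > /tmp/dst.md5
diff /tmp/src.md5 /tmp/dst.md5 && echo "archive verified"
```

Keeping /tmp/src.md5 alongside the shelved disk also gives you the manifest for the yearly hash check.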

Farmer

  • Sr. Member
  • ****
  • Offline
  • Posts: 2848
Re: Backups...
« Reply #7 on: March 06, 2012, 02:34:50 am »

John - I've never tried it, but in theory you can create a Windows VHD and boot to it.  Any experience or comments?
Logged
Phil Brown

John.Murray

  • Sr. Member
  • ****
  • Offline
  • Posts: 886
    • Images by Murray
Re: Backups...
« Reply #8 on: March 06, 2012, 10:47:48 am »

Phil:  Yes you can, by running a hypervisor (Windows Server Hyper-V - free).  Desktop versions of Windows currently only support 32-bit VMs; Windows 8 will support 64-bit, but (to my knowledge) will still require a full host OS to load before launching any VM instance....

Here's an interesting tool:
http://archive.msdn.microsoft.com/vhdtool

It allows you to convert a raw .img to a VHD.
« Last Edit: March 06, 2012, 11:46:54 am by John.Murray »
Logged

Farmer

  • Sr. Member
  • ****
  • Offline
  • Posts: 2848
Re: Backups...
« Reply #9 on: March 06, 2012, 06:37:17 pm »

Cool - thanks, John.
Logged
Phil Brown

chrismurphy

  • Jr. Member
  • **
  • Offline
  • Posts: 77
Re: Backups...
« Reply #10 on: March 27, 2012, 06:24:27 pm »

Short answer for question 1 is, yes.

Insanely long answer:

Do a search for silent data corruption. There are lots of articles on it, and it is the reason why the more modern file systems being worked on are integrating chunk-based checksums, RAID 1, and background scrubbing. It's known that SDC can corrupt any data on the disk: file system data, metadata, or actual data.

File system corruption can exhibit itself in all sorts of ways depending on its nature. Files, directories, or an entire disk could become inaccessible. But it's also possible for the file system to point to bogus allocation blocks - so when you access a file, it appears corrupt even though its data blocks are fine on disk.

Metadata and data corruption can likewise manifest in variable ways.

JHFS+/X does not use any checksumming for data, metadata, or the journal. It depends entirely on the drive mechanism's error detection and correction. If the disk doesn't detect an error, or thinks it has corrected one but hasn't, the file system accepts the bogus data as completely valid. It has no way of verifying that it's correct.

Once corruption sneaks into a primary system, with a cascading backup scheme without archiving like the one you describe, it's just a matter of time before all recycled backups contain the corruption too. Mind you, this likely affects a small number of files. That's why it's insidious: it's not like a disk failure, where you have a clear-cut indicator that you need to fall back on backups.

Consider that 3-4 corrupt files that aren't ever accessed get migrated to the next primary storage system, not just to your backups. Over several years a few more files become silently corrupted, and you keep copying all the files, including the previously corrupted ones.

Here's an example of using the DNG 1.2 spec's checksum feature. It cannot fix corruption, but it can most likely detect it, using MD5 hashes computed at the time the DNG is created.

http://dpbestflow.org/data-validation/dng-validation

The ZFS and Btrfs file systems use checksumming to identify whether a file is corrupt/altered (even an embedded virus would be flagged as a change), and use their chunk-based RAID 1 feature to retrieve a good copy and replace the bad one automatically.

It's a huge reason why I'm not a fan of RAID 0 with conventional file systems - it's asking for trouble. Yes, you get speed, but RAID 0 systems should be strictly for speed, not longer-term storage. I'd consider them useful for active working files and scratch space. Therefore they don't need to be particularly large, except for video people.

A ZFS-based NAS (or eventually Btrfs, once it is stable), such as a FreeNAS or TrueNAS product, is affordable. It can do periodic snapshots, akin to rolling Time Machine backups, so you can go back in time to retrieve earlier versions, and it can also have those snapshots rsync'd to a second NAS off-site.
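On such a ZFS system the snapshot-and-replicate workflow is built in. A sketch of the administration commands, with hypothetical pool/dataset names and backup host (these need a real ZFS pool to run):

```shell
# Take a read-only, point-in-time snapshot of the photo dataset.
# Copy-on-write makes this cheap; earlier versions stay retrievable.
zfs snapshot tank/photos@2012-03-27

# Replicate that snapshot to a second machine off-site.
zfs send tank/photos@2012-03-27 | ssh backuphost zfs receive backup/photos

# Periodically re-read every block and verify it against its checksum.
zpool scrub tank
```

Subsequent replications can use `zfs send -i` to transfer only the blocks changed since the previous snapshot.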

A smaller-scale solution might be one of the products from Ten's Complement, which is a ZFS product for Mac OS X using DAS (direct-attached storage).

Really, the problem is that short of per-file checksums you don't have a practical way of detecting corruption, and you need detection before you can correct anything (by replacing the bad copy with a known good one).

I have done a very rudimentary, intentional *deletion* of a byte from a DNG, and Camera Raw will refuse to open the image. If I change (corrupt) a single byte instead, I get a "file is damaged" message, but the file can still be opened and edited. I haven't tried this in Lightroom, but I suspect similar results.

Last, I would only buy Advanced Format (so-called AF) disks from now on. The larger physical sector size of 4096 bytes (compared with the 512 bytes of conventional disks for the past 20+ years) allows a more efficient and effective error detection and correction scheme. Most new disks should be AF disks by now. (There are two kinds, 512e and 4Kn. 512e disks are common and work just like conventional hard drives; 4Kn disks are only just about to start shipping, and I'm not sure whether Mac OS X supports them yet.)

This probably raises more questions than it provides answers. But you have a couple of ways of at least attempting to keep some handle on detecting errors in DNG by using the optional MD5 hash checksumming.
Logged

alain

  • Sr. Member
  • ****
  • Offline
  • Posts: 465
Re: Backups...
« Reply #11 on: March 28, 2012, 04:17:35 am »

Hi

I'm using Win7, so not everything here will apply equally.

I use multiple (4x) external disks (just SATA disks with a separate docking station), mostly for separate file backups.  At least one disk is always off-site.

I do use hash checking to verify whether files have changed (and yes, it has happened on my main disk, and I was able to get correct copies back from one of my backup disks).  I use hashdeep for MD5 checking, but it has its limitations.

I also use ln.exe for "delorean backups", which gives me snapshots made with hard links.  Very nice and fast, but hashdeep then checks all the files in every snapshot --> a very long time, though most files end up verified 10-20 times ;-).

I also try to set a disk aside to get a frozen state from a certain date (mostly a backup disk that's no longer big enough for the regular rolling backup system), and I do a hash check on it at least every year.  This is very useful because disks can repair small degradation, and they do this even when just reading: as far as I understand, the drive will reallocate or rewrite a sector when it has to work its error-correcting functions too hard.

BTW, I do make Win7 images of my system disk, but that's the absolute minimum one can do.  For me that's just to recover from a defective SSD (disk), not long-term data backup.
Logged