
Author Topic: Backups - how long?  (Read 541 times)

Jonathan Cross

  • Full Member
  • ***
  • Offline
  • Posts: 104
Backups - how long?
« on: October 06, 2017, 06:57:24 AM »

Everyone realises the value of backups.  I have been reading about 12TB HD announcements, and wonder about backing up such a disc.  One company is quoting 250MB/s sustained data rates.  In practice, transferring many files could reduce this figure.  Even if only 9TB is used, a full backup at 250MB/s would take about 10 hours.  So what intrigues me is how people back up now that HD capacity is getting larger and many of us are accumulating more data (images, I guess, for most on this forum).

I use backup software that does 4 incremental backups, then a full one, and then repeats the cycle.  Even though I have a 2TB HD at 7200rpm, and only have about 1TB on it, an incremental backup can take an hour and a full one is an overnight job (all bar my images for this year and last year are on other hard drives).  I like to back up to 2 external HDs, but have been put off a NAS.  I have an i7 PC using USB3 as the connection method.  I tried putting a 250GB memory stick on my router, but soon found out that transfer speeds were dire!
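The 10-hour figure quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 TB = 1,000,000 MB) and a perfectly sustained rate, and the function name `transfer_hours` is made up for illustration.

```python
def transfer_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at rate_mb_per_s megabytes per second."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = size_mb / rate_mb_per_s
    return seconds / 3600


print(f"{transfer_hours(9, 250):.1f} hours")   # 9 TB at 250 MB/s  -> 10.0 hours
print(f"{transfer_hours(12, 250):.1f} hours")  # 12 TB at 250 MB/s -> 13.3 hours
```

Real-world copies of many small files will usually run slower than the quoted sustained rate, so these numbers are best read as a lower bound.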

Given that I need to do regular backups, has anyone solved the time it takes, or am I missing a trick? If I go to SSD, will my processor and connection method not make best use of it?

Jonathan



Farmer

  • Sr. Member
  • ****
  • Offline
  • Posts: 2236
Re: Backups - how long?
« Reply #1 on: October 06, 2017, 07:20:51 AM »

USB 3.1, Thunderbolt (TB) or eSATA.  Those are your fastest options.  You can improve on that by using a DAS (direct attached storage) with RAID to improve write performance, so long as the source drive/RAID is fast enough to supply it.

Also, if you're on Windows and you're not using Directory Opus then you should be.  It's faster than the OS, and conservatively about a zillion times more powerful, useful, and just downright better than any other file management app around (it will replace File Explorer for you).  Seriously, I know some Mac users who have a Windows partition just to use DirOpus.

Anyway, automation is your friend, and/or multiple (split) backups so you can push the data to multiple devices at once (but again that depends on the source being fast enough).

Personally, I'm just using an Orico DAS RAID via USB 3.1 - it sits around the 250MB/s mark sustained and it uses negligible processor time (hex core, dodeca thread).  I either have it running automated at night or, if I do it manually, nothing else is slowed down other than access to the images themselves.  I use hybrid HDDs (so normal spinning media with flash RAM that helps the speed a little bit).

It's not overly expensive, it's simple and reliable and it has redundancy.  On top of that is a USB 3.1 HDD dock so I can (and do) also make single drive backups so I'm not reliant on needing a RAID.  And, on top of that, Crashplan, which just keeps updating as I add or remove stuff.  Slow, of course, but it just trickles away in the background - sometimes for weeks after I add a lot of images.

Did I mention that everyone on Windows who wants to do file management should be using Directory Opus?  I have no affiliation, other than having used the software since the Amiga days in the early 90's.
Phil Brown

Eric Myrvaagnes

  • Sr. Member
  • ****
  • Offline
  • Posts: 13366
    • http://myrvaagnes.com
Re: Backups - how long?
« Reply #2 on: October 06, 2017, 10:23:54 AM »

Another vote for Directory Opus. I couldn't survive without it.
-Eric Myrvaagnes    (A sampler of my new book is on my website.)
http://myrvaagnes.com  Visit my photo website (Server is back up). New images each season. Also visit my new website: http://ericneedsakidney.org

Joe Towner

  • Sr. Member
  • ****
  • Offline
  • Posts: 731
Re: Backups - how long?
« Reply #3 on: October 06, 2017, 04:05:00 PM »

So this is where backup & replication come into play.  Backup is great for your system, where versioning & recovery is big.  Backup isn't ideal for photo archives, in that you've got TBs of data, most of which won't change much.  You want a second/third/fourth copy, but the only time 'backup' is the phrase to use is when dealing with online backup services.
t: @PNWMF

Farmer

  • Sr. Member
  • ****
  • Offline
  • Posts: 2236
Re: Backups - how long?
« Reply #4 on: October 06, 2017, 05:02:27 PM »

So this is where backup & replication come into play.  Backup is great for your system, where versioning & recovery is big.  Backup isn't ideal for photo archives, in that you've got TBs of data, most of which won't change much.  You want a second/third/fourth copy, but the only time 'backup' is the phrase to use is when dealing with online backup services.

That's very relevant.  My entire archives aren't re-written all the time, just the differences (yay again for DirOpus) and Crashplan does the same.

It is worth doing entire re-writes from time to time from a technical point of view to ensure that data on the backups is valid, but you shouldn't be copying 12TB a night (although I can easily add 250GB after a holiday of shooting, but only those increments need updating).
Phil Brown

BobShaw

  • Sr. Member
  • ****
  • Offline
  • Posts: 998
    • Aspiration Images
Re: Backups - how long?
« Reply #5 on: October 06, 2017, 05:40:36 PM »

I wrote a blog post a long time ago on backups called Your Digital Legacy.
Look for it among the earliest ones here:
http://aspirationimages.com/blog/

Essentially it follows the Chase Jarvis system.

Time Machine on a server. Mine is a Mac Mini and backs up every night at 1AM.
I have 3 rotating disks and a NAS as the backup of last resort.
The backups go back about two years.
You need that as you don't know when a file gets corrupted until you try to open it.
Website - http://AspirationImages.com
Blog - http://AspirationImages.com/blog
Landscape, Portrait, Product Photography - Sydney, Australia

FabienP

  • Jr. Member
  • **
  • Offline
  • Posts: 75
Re: Backups - how long?
« Reply #6 on: October 09, 2017, 02:41:12 PM »

That's very relevant.  My entire archives aren't re-written all the time, just the differences (yay again for DirOpus) and Crashplan does the same.

It is worth doing entire re-writes from time to time from a technical point of view to ensure that data on the backups is valid, but you shouldn't be copying 12TB a night (although I can easily add 250GB after a holiday of shooting, but only those increments need updating).

Phil,

if I understand you correctly, the way of ensuring that your backups are valid could actually lead to data corruption and does not check the validity of backups at all! Assuming that your data source is no longer clean, doing what you describe will only propagate data corruption from a corrupted data source to healthy copies. After some time, you would end up with no remaining valid data.

I think this is why Joe's advice about replication (as opposed to backup, see his message above) is so important: you only want to copy/synchronise/replicate what was willingly changed since the last replication, as expressed by the last update attribute of files. This is not only for performance reasons, but also to ensure that you do not replicate silent data corruption which could have occurred on the original files.
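To make the idea concrete, here is a minimal sketch of replication driven by the last-modified attribute. It is an illustration only, not how any of the tools named in this thread work internally; the function name `replicate_changed` is hypothetical, and it assumes both file systems store timestamps at the same resolution (on FAT-formatted targets, for instance, the comparison would need a tolerance).

```python
import shutil
from pathlib import Path


def replicate_changed(source: Path, target: Path) -> list[Path]:
    """Copy files that are missing on target or have a newer modification time on source.

    Files whose timestamp is unchanged are never re-copied, so silent corruption
    of an untouched source file is not pushed onto the existing copy.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        dst = target / src.relative_to(source)
        if dst.exists() and src.stat().st_mtime <= dst.stat().st_mtime:
            continue  # not modified since the last replication
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also copies the modification time
        copied.append(src.relative_to(source))
    return copied
```

Because `shutil.copy2` carries the modification time across, a second run over an unchanged archive copies nothing. Note that this sketch never deletes anything on the target and does not verify what it wrote; that is a separate step.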

In order to check the quality of replicated files in copies of your data, I would recommend using an application (Hashdeep, SyncBack or FreeFileSync, there might be others) that does recursive bitwise hashing of files to audit that the files are identical on both source and target(s). This is done without overwriting the files on the target(s). This way you could detect mismatched files, and provided you have multiple copies, could identify where the corruption happened (at source or in one of the copies).
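For readers who want to see what such an audit amounts to, the following is a rough sketch using only Python's standard library. It is not the implementation of Hashdeep, SyncBack or FreeFileSync; the function names `file_sha256` and `audit` are made up, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large images never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(source: Path, target: Path) -> dict[str, list[str]]:
    """Compare two trees by content hash; nothing is written to either side."""
    report = {"missing": [], "mismatch": []}
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = target / rel
        if not dst.is_file():
            report["missing"].append(rel.as_posix())
        elif file_sha256(src) != file_sha256(dst):
            report["mismatch"].append(rel.as_posix())
    return report
```

A mismatch only says that the two copies differ, not which one is damaged; running the same audit against a third copy is what lets you work out where the corruption happened. If the audit is run immediately after a copy, the operating system may serve the reads from its cache rather than from the disk, so it is safer to run it later or after remounting the drive.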

I would also verify files when migrating data to a new drive or disk array. Applications that verify file integrity on the fly as part of the copy process are not so great since you don't know if what is "verified" is the permanent copy on disk at target (what is really wanted) or the transient copy in the cache of your computer or your NAS before the file was committed to disk.

Cheers,

Fabien

Farmer

  • Sr. Member
  • ****
  • Offline
  • Posts: 2236
Re: Backups - how long?
« Reply #7 on: October 09, 2017, 05:55:53 PM »

if I understand you correctly, the way of ensuring that your backups are valid could actually lead to data corruption and does not check the validity of backups at all! Assuming that your data source is no longer clean, doing what you describe will only propagate data corruption from a corrupted data source to healthy copies. After some time, you would end up with no remaining valid data.

Verify your master is correct, and then re-write the entire backup from time to time.  Why?  Because hard disks just sitting around do develop errors or lose data and it's better than piecemeal checking and updating (which you do in between the big re-writes).
Phil Brown

BobShaw

  • Sr. Member
  • ****
  • Offline
  • Posts: 998
    • Aspiration Images
Re: Backups - how long?
« Reply #8 on: October 09, 2017, 06:35:50 PM »

In practice eventually your backup disk becomes full and you need to start a new one anyway. My 1TB ones become 2TB, then 3TB and now 4TB. The full ones eventually end up in a Drobo.

Rather than overthink it, it is better to just check it is happening and rotate the disks.
If you encrypt or modify backups then you may lose one of the best features (in Time Machine at least), which is to be able to selectively drag and drop to recover a file.
Website - http://AspirationImages.com
Blog - http://AspirationImages.com/blog
Landscape, Portrait, Product Photography - Sydney, Australia

FabienP

  • Jr. Member
  • **
  • Offline
  • Posts: 75
Re: Backups - how long?
« Reply #9 on: October 10, 2017, 05:10:29 PM »

Verify your master is correct, and then re-write the entire backup from time to time.  Why?  Because hard disks just sitting around do develop errors or lose data and it's better than piecemeal checking and updating (which you do in between the big re-writes).

Out of curiosity, how do you verify your master data?

If you exclusively use DNG files, I think those can be validated against an internal verification hash, which would be close enough to an ideal solution. However, in the absence of such a solution, for native RAW files and derivative work stored in TIFF or PSD/PSB files, you would need another solution, a kind of reference repository to compare each file against a known good duplicate. Doing a check on the file system with utilities such as chkdsk (on Windows) or fsck (BSD and Unix, possibly including Macs) will only verify file metadata and not file integrity (unless using ZFS or BTRFS as a file system). There might be other solutions which I am not aware of, hence my question how you verify your master data.
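One way to picture the "reference repository" idea for formats without an internal hash is a manifest of known-good checksums recorded when the files are first imported. The sketch below is an assumption-laden illustration, not an existing tool: the function names and the JSON manifest format are made up for this example.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a hash for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest: dict[str, str], manifest_file: Path) -> None:
    """Store the manifest as JSON; keep this file outside the tree it describes."""
    manifest_file.write_text(json.dumps(manifest, indent=2))


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose content no longer matches."""
    bad = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(rel)
    return bad
```

The manifest is only as trustworthy as the files were when it was built, so it needs to be created while the data is known to be good, and an entry has to be refreshed whenever a file is deliberately edited. Files added after the manifest was built are not reported by `verify_manifest` in this sketch.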

I still don't see how rewriting your backups would enable you to successfully thwart the danger of file corruption on copies, since you do not mention a verification of the copy action afterwards. You might have a fresher copy of your data, but as long as it hasn't been validated against the master data, the fact that it is identical to the original is still an untested assumption.

Bit rot can happen at the time when the files are copied (defective sector on target disk, defective cable, lousy power supply unit, etc.) and might not be detectable unless the file is read again and compared against a known good duplicate. There might be disk sectors that will progressively lose their magnetic polarisation, but this can be solved by rereading the sectors (provided sector values can be read, the disk will rewrite individual bits that have a weak polarisation, so that further degradation will not continue unnoticed), so here again, there is no benefit to a rewrite done by the end user. An SSD would be another story with the proprietary internal wear leveling algorithms which might move data to other cells, but the same behaviour for preventing bit rot would be expected.

A verification that the copy is identical to the master data would however ensure that the data is there and is valid. Which is why I suggested performing an actual in place validation of the data instead of a rewrite. Such a rewrite would IMO only make sense when data needs to be migrated to a different disk, as suggested by Bob.

Cheers,

Fabien

Farmer

  • Sr. Member
  • ****
  • Offline
  • Posts: 2236
Re: Backups - how long?
« Reply #10 on: October 10, 2017, 06:09:03 PM »

Yeah, I haven't gone into excruciating detail here :-)

On Windows, you can use the MS File Checksum Integrity Verifier Tool to compute MD5 or SHA1 crypto hashes, or you can download any number of similar utilities or even purchase higher end products.  You can then use Directory Opus to do the copy/sync and then verify the end data again.  Similar options exist for OS X/macOS.
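If no dedicated utility is to hand, the same two hashes can be computed with Python's standard `hashlib` module. This is just a sketch of the idea and not the Microsoft tool itself; the function name `md5_and_sha1` is made up for illustration.

```python
import hashlib


def md5_and_sha1(path: str, chunk_size: int = 1 << 20) -> tuple[str, str]:
    """Compute MD5 and SHA-1 of a file in a single pass over its contents."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Both algorithms are adequate for spotting accidental corruption, which is the concern here, although neither is considered collision resistant for security purposes.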

The purpose of the refresh/full rewrite is to give an opportunity to check the disk (format it, check it), then write cleanly, then verify it.
Phil Brown