
Author Topic: Now we're talkin STORAGE  (Read 26796 times)

jani

« Reply #60 on: January 13, 2007, 01:32:42 pm »

Quote
I backed up to DVD for about six months then lost interest.  There just got to be too many of the darn things.  And as much of a PITA as backing up to DVD is, I don't even want to think about having to restore half a TB or more from DVD.  Yikes!
Yes, that is a bummer, isn't it?

Considering that most people don't want a backup, they want a restore when their data is gone, that may be pretty significant.

Quote
And I still don't understand the appeal of a mirroring RAID system (e.g., RAID10) over, say, RAID5 plus redundant on- and off-site HDD backups.  Yes it provides more protection against some failure in the RAID itself, but not against all the other stuff that can bite you... OS burps, viruses, surges, operator headspace errors, etc.  Anything that writes a bad file to, or deletes a good file from, one side of the mirror does the same to the other side, does it not?  What good is that?  I'm sure I'm missing something here.
That is correct, but what good is RAID 5 or any kind of redundancy in the same system?

Sure, in-system redundancy only takes care of problems with the physical drives themselves, not with software or users. PEBCAK can't be solved with hardware.

However, OS "burps" that destroy already stored data are extremely rare, even with Windows, at least as long as you use NTFS as your file system instead of FAT. Malware that destroys data is perhaps a bit more frequent.

But you can be certain that one of your hard disks will fail sometime in the future; you just don't know when. Using several hard drives significantly increases the risk that at least one of them fails, of course.

Surges? You don't have a surge protector? That's irresponsible, and it's such a cheap measure, too.

You might also want to consider a UPS to protect against brown-outs and brief black-outs.

Quote
My live data is on a 1.1TB RAID5.  It backs up, automatically and nightly, to external firewire drives.  Those get backed up, manually and weekly or so, to another external drive that otherwise stays at my next door neighbor's.

Hmmm... I probably shouldn't be backing up from my backups, should I.  Maybe I'll change that.
As long as you verify that the data is the same in both backups (use e.g. MD5 checksums of each file, stored to a list, and an MD5 checksum of that file, too), there is nothing inherently wrong with that.

There is also a benefit with that kind of setup; you can make sure your main system remains available and unencumbered while making the second backup.
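
A minimal sketch of that verification, assuming GNU md5sum, find, sort, diff and mktemp are available (Linux or cygwin). The function names and the backup paths are hypothetical and not taken from the setup described above:

```shell
#!/bin/sh
# Sketch only: checksum_list and compare_backups are made-up helper names.

# Print "hash  ./relative/path" for every file under directory $1, sorted by path.
checksum_list() (
    cd "$1" || exit 2
    find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2
)

# Exit status 0 if both trees hold the same files with the same MD5 sums;
# otherwise diff prints the lines that differ and the status is non-zero.
compare_backups() (
    list_a=$(mktemp) || exit 2
    list_b=$(mktemp) || exit 2
    if ! checksum_list "$1" > "$list_a" || ! checksum_list "$2" > "$list_b"; then
        rm -f "$list_a" "$list_b"
        exit 2
    fi
    diff "$list_a" "$list_b"
    status=$?
    rm -f "$list_a" "$list_b"
    exit "$status"
)
```

Something like `compare_backups /mnt/firewire_backup /mnt/offsite_backup && echo "backups match"` would then report whether the two copies agree (both paths are placeholders). A match only shows that the two backups are identical to each other, not that either is free of errors inherited from the live data.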
Jan

narikin

« Reply #61 on: January 13, 2007, 02:26:31 pm »

Quote
If your RAID card fries, so does your data (most likely and unless you want to spend thousands on data recovery service which might or might not recover anything). So you still need backups even if you're running a redundant RAID setup
No, it doesn't.
A RAID card can go down, and you can just put in a new (same model) RAID card and it's all back as it was.

Hope this clears that up.

(+ another reason to use a 3rd-party RAID card, rather than onboard RAID)
« Last Edit: January 13, 2007, 02:26:51 pm by narikin »

feppe

« Reply #62 on: January 13, 2007, 02:44:12 pm »

Quote
You might also want to consider a UPS to protect against brown-outs and brief black-outs.
As long as you verify that the data is the same in both backups (use e.g. MD5 checksums of each file, stored to a list, and an MD5 checksum of that file, too), there is nothing inherently wrong with that.

What's the checksum of a checksum for? I don't know much about the MD5 algorithm itself, but doesn't it include data in itself to verify its own integrity? This would render a checksum of a checksum useless.

Nill Toulme

« Reply #63 on: January 13, 2007, 04:34:04 pm »

Quote
That is correct, but what good is RAID 5 or any kind of redundancy in the same system?
Reliability.  In the nine months I've been running that RAID I've had a drive fail twice.  Now maybe that's just exceptionally bad luck, but it happened.  But when it happened, I lost absolutely nothing other than the small amount of time it took to yank out one drive and plug in another.  Then the RAID rebuilt itself.  No downtime, no restore, no "Yikes, is my backup good?" — just roll on.


Quote
However, OS "burps" that destroy already stored data are extremely rare, even with Windows, at least as long as you use NTFS as your file system instead of FAT. Malware that destroys data is perhaps a bit more frequent.

...Surges? You don't have a surge protector? That's irresponsible, and it's such a cheap measure, too.

You might also want to consider a UPS to protect against brown-outs and brief black-outs.
I have those things, of course.  User stupidity is my greatest exposure, and as I am the user, that exposure is exceptionally high in my case.  So I need systems that protect against that exposure as well as all the others.  ;-)


Quote
As long as you verify that the data is the same in both backups (use e.g. MD5 checksums of each file, stored to a list, and an MD5 checksum of that file, too), there is nothing inherently wrong with that.
That makes sense.  I am using checksums on my compares, but not against a stored list.  I'll have to figure out how to do that.

Thanks,

Nill
~~
www.toulme.net

jani

« Reply #64 on: January 13, 2007, 04:45:18 pm »

Quote
What's the checksum of a checksum for? I don't know much about the MD5 algorithm itself, but doesn't it include data in itself to verify its own integrity? This would render a checksum of a checksum useless.
No, it's not a checksum of a checksum.

It's a checksum of the file containing all the other checksums.

That way, you can check -- with a reasonable degree of reliability -- whether your list of checksums is correct or corrupt.

Of course, that doesn't help you much if you find that the list is corrupt, but if you have another copy of the list, ... Storage is cheap for this kind of thing, so having multiple copies of the checksum list is an easy and prudent practice.
Jan

jani

« Reply #65 on: January 13, 2007, 04:51:56 pm »

Quote
Reliability.  In the nine months I've been running that RAID I've had a drive fail twice.  Now maybe that's just exceptionally bad luck, but it happened.  But when it happened, I lost absolutely nothing other than the small amount of time it took to yank out one drive and plug in another.  Then the RAID rebuilt itself.  No downtime, no restore, no "Yikes, is my backup good?" — just roll on.
In that case I misunderstood you; I thought you objected to this. It is the point of a stripe of mirrors (RAID 10), too, only there you'll have better write and read performance, as well as better protection against failures (up to half the disks may fail without data loss, as long as no mirror pair loses both of its disks).

The downside is that RAID 10 doesn't scale very well, since half of the raw capacity always goes to the mirrors, whereas e.g. RAID 6 gives up only two drives' worth of capacity however many drives you add.
Jan

Nill Toulme

« Reply #66 on: January 13, 2007, 04:59:28 pm »

Quote
In that case I misunderstood you; I thought you objected to this. It is the point of a stripe of mirrors (RAID 10), too, only there you'll have better write and read performance, as well as better protection against failures (up to half the disks may fail without data loss, as long as no mirror pair loses both of its disks).
Well I suppose I'm thinking also from the standpoint of cost/benefit at the very bottom end of the scale (where I dwell).  Of course if cost is no object then RAID10 or even RAID6 affords additional reliability.  It just seems to me that RAID5, at the cost of only one additional drive's capacity, provides a very significant reliability enhancement.  Beyond that, I'd personally rather spend my few available dollars on redundant backups that protect me better against my biggest threat — my own screw-ups.  ;-)

Nill
~~
www.toulme.net

feppe

« Reply #67 on: January 13, 2007, 05:27:32 pm »

Quote
No, it's not a checksum of a checksum.

It's a checksum of the file containing all the other checksums.

That way, you can check -- with a reasonable degree of reliability -- whether your list of checksums is correct or corrupt.

Of course, that doesn't help you much if you find that the list is corrupt, but if you have another copy of the list, ... Storage is cheap for this kind of thing, so having multiple copies of the checksum list is an easy and prudent practice.

Well, that's splitting hairs.

But I'm still under the impression that an MD5 checksum already includes integrity verification in itself, i.e. if the MD5 checksum file is corrupted, it will neither work nor give false results with the actual backed-up data. Therefore getting checksums of checksums isn't necessary or even useful - just having multiple copies is enough. Correct me if I'm wrong.

Ray

« Reply #68 on: January 13, 2007, 06:29:18 pm »

Quote
Since you were so fond of quoting only a small part of a pretty long Wikipedia article, only to support your personal view, here's another bit of a Wikipedia article for you:

It's not a personal view, Jani. More a personal summary of the thrust of that documentary, which reminds me in some ways of our current predicament regarding the reliability of recorded optical media.

We don't really know how reliable they are, statistically. We have the occasional report of discoloration on the surface of the disc and unplayability associated with 'bit rot'. We probably have many, many more instances of software and hardware incompatibility resulting from either the disc not being recorded properly in the first instance, or not being playable on all systems, perhaps due to someone not ticking the 'full compatibility' box, or perhaps not. That's an unfortunate condition of the state of computers that we have to live with.

What I'm suggesting here is, if it were possible to collect the facts on all the DVD discs that have ever been produced, remove all the scams from the equation where known rejects have been sold, and remove all the instances where people have unwittingly applied adhesive labels to their discs which have chemically attacked the disc etc, we might find that only, say, 0.0001% of those billions of discs have suffered from physical deterioration that makes them now unreadable, i.e. one in a million. Those could be considered pretty good odds against things going wrong. But we simply don't know what the actual figure is. In the absence of the facts, people's fears often seem to take over.

From my personal experience, I'm very happy with the readability and longevity of all my optical discs. However, there have been many occasions, more frequently in the past than at present, when I have not been happy with the burn success rate. I have a stack of failed recordings about 9" high. I've kept the discs because I thought I might find a decorative use for them.

larryg

« Reply #69 on: January 13, 2007, 07:00:03 pm »

Quote
RAID 0 isn't really RAID.

RAID = Redundant Array of Inexpensive Disks

There is no redundancy in RAID 0.


Exactly, this is why I also back up to external Maxtors and to DVDs.

Raid 0 (as best I can surmise) is mostly good when one hard drive crashes.  You then have all the data on another mirror drive.  This (at least in my situation) probably will not help if your system crashes.

kaelaria

« Reply #70 on: January 13, 2007, 07:06:46 pm »

OMG you guys are too much!

raid 0 is for performance ONLY!  If one drive crashes, you lose everything.  You do have ~40% better transfer rates though - it makes a nice difference in some applications!

Some of you guys are great free entertainment!  

jani

« Reply #71 on: January 13, 2007, 07:25:15 pm »

Quote
Well, that's splitting hairs.
Eh, no, it isn't.

There's a pretty big difference between doing a checksum of each checksum, and doing a checksum of a huge list of checksums.

Quote
But I'm still under the impression that an MD5 checksum already includes integrity verification in itself, i.e. if the MD5 checksum file is corrupted, it will neither work nor give false results with the actual backed-up data. Therefore getting checksums of checksums isn't necessary or even useful - just having multiple copies is enough. Correct me if I'm wrong.
You're wrong; the checksum isn't self-correcting.

It's only a checksum of other data.

My point is that the list of checksums -- which for 500 GB of data can very well mean a pretty large file -- is also exposed to the same risk of bit rot per byte as any other kind of data.

Bit rot may change the printed hash of d41d8cd98f00b204e9800998ecf8427e to d41d8cd98f00b204e98%0998ecf8427e (obviously wrong) or d41d8cd98f00b204e9810998ecf8427e (possibly correct for some file).
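
The first kind of damage can be caught by a plain format check, since an MD5 hash is always 32 hexadecimal digits; the second kind cannot. A rough sketch, assuming the default output format of GNU md5sum and a hypothetical helper name:

```shell
# Sketch only: malformed_lines is a made-up helper name.
# Prints every line of checksum list $1 that does not look like
# "<32 lowercase hex digits><space><space or *><file name>".
# (md5sum prefixes lines for oddly named files with a backslash;
# this simple pattern would flag those lines as well.)
malformed_lines() {
    grep -v -E '^[0-9a-f]{32} [ *].+$' "$1"
}
```

Run against a list containing the two damaged hashes above, this would print the line with the `%` in it and stay silent about the other one, which is still a well-formed hash.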

Now if you have a checksum that's wrong, how can you know if it's the checksum or the file that's wrong?

Knowledge of the MD5 hash algorithm may rescue you, as may software coded by someone with knowledge of the algorithm.

I know that if I change _one_ bit in the middle of a JPEG, it will probably still display seemingly identically to the original, but that the checksum will be very different:

91f4fa04f6a82e7ec87e664eef1e3165  9366_Kurt_Maflin.jpg
823375454caf2a0a3670a858c8cfc87e  9366_Kurt_Maflin2.jpg

However, checking this for a large file of checksums may be impractical, and it's really cheap to make a checksum of the file itself, and just store that, too.

There are several ways of doing it; I kind of like storing the md5sum of the checksum file in the file name of the checksum file (example is based on the bash shell in a unix or cygwin environment):

1) Create a list of checksums of JPEGs, raw files and Photoshop files in the file md5sums:
$ md5sum *.jpg *.CR2 *.psd > md5sums
2) Generate a checksum of the file md5sums:
$ sum=$(md5sum md5sums|cut -f1 -d' ')
3) Rename the file:
$ mv md5sums md5sums.$sum
4) Check the result:
$ ls -l md5*
-rw-r--r--  1 jani jani 168 Jan 14 01:14 md5sums.8d11a346ebd32e93a5a6adcebd55dee6

Now if I do a checksum of the file, and I get something else than what follows the dot, I know I have a problem.
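
That final check could be scripted roughly as follows. This is a sketch assuming GNU md5sum and the md5sums.<hash> naming from the listing above; verify_sumfile is a hypothetical helper name:

```shell
# Sketch only: verify_sumfile is a made-up helper name.
# $1 is a checksum list named like md5sums.8d11a346ebd32e93a5a6adcebd55dee6
verify_sumfile() (
    file=$1
    [ -r "$file" ] || exit 2                   # list is missing or unreadable
    expected=${file##*.}                       # the part of the name after the last dot
    actual=$(md5sum < "$file" | cut -f1 -d' ') # hash of the list's current contents
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (contents hash to $actual)"
        exit 1
    fi
)
```

If the list itself checks out, `md5sum -c md5sums.8d11a346ebd32e93a5a6adcebd55dee6` then verifies each listed file against its stored hash.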
Jan

jani

« Reply #72 on: January 13, 2007, 07:29:03 pm »

Quote
Exactly, this is why I also back up to external Maxtors and to DVDs.

Raid 0 (as best I can surmise) is mostly good when one hard drive crashes.  You then have all the data on another mirror drive.
No, that's exactly what you haven't got.

If you run RAID 0, and one hard drive crashes, you've lost, because with RAID 0, the data is striped across several drives, with -- I repeat myself here -- no redundancy.

So if you have a RAID 0 of three drives, and one drive crashes, you suddenly have 0 drives worth of data.

However, if you run RAID 1 -- which is a real RAID solution -- you will get the behaviour you describe; one disk may crash, but the mirror drive is (probably) okay.


Nobody should be calling "RAID 0" RAID at all, as it just confuses you and many others who don't really know what's going on behind the scenes. But it's too late for that.
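
The capacity and failure arithmetic for the levels mentioned in this thread can be sketched as below. It assumes equal-sized drives and two-way mirrors for RAID 10; raid_summary is a hypothetical helper, and "tolerated" means the number of drive failures the array is guaranteed to survive:

```shell
# Sketch only: raid_summary is a made-up helper name.
# $1 = RAID level (0, 1, 5, 6 or 10), $2 = number of equal-sized drives.
# "usable" is capacity in drives' worth; no check is made that $2 is a
# sensible drive count for the level.
raid_summary() {
    case $1 in
        0)  echo "usable=$2 tolerated=0" ;;             # striping only, no redundancy
        1)  echo "usable=1 tolerated=$(( $2 - 1 ))" ;;  # every drive holds a full copy
        5)  echo "usable=$(( $2 - 1 )) tolerated=1" ;;  # one drive's worth of parity
        6)  echo "usable=$(( $2 - 2 )) tolerated=2" ;;  # two drives' worth of parity
        10) echo "usable=$(( $2 / 2 )) tolerated=1" ;;  # best case $2/2, if every
                                                        # failure hits a different pair
        *)  echo "unknown level: $1" >&2; return 1 ;;
    esac
}
```

`raid_summary 0 3` prints `usable=3 tolerated=0`, which is the three-drive example above: all of the capacity, and the first failure takes everything with it.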
Jan

jani

« Reply #73 on: January 13, 2007, 07:32:34 pm »

Quote
What I'm suggesting here is, if it were possible to collect the facts on all the DVD discs that have ever been produced, remove all the scams from the equation where known rejects have been sold, and remove all the instances where people have unwittingly applied adhesive labels to their discs which have chemically attacked the disc etc, we might find that only, say, 0.0001% of those billions of discs have suffered from physical deterioration that makes them now unreadable, i.e. one in a million.
While it's unlikely that the figures would be this good -- one in ten thousand or one in a thousand seems more likely -- I agree with the principle.

Quote
I have a stack of failed recordings about 9" high. I've kept the discs because I thought I might find a decorative use for them.
You could, I suppose, run them carefully (hehe) through your microwave; the patterns will be different for each disc, and they might actually be cool to look at.

Caveat lector: I cannot take responsibility for what may happen to you, your microwave, kitchen, furniture, apartment and/or house if you actually try this.
Jan

Ray

« Reply #74 on: January 13, 2007, 07:39:11 pm »

I see that I've been exaggerating when describing my first Kodak Photo CD disc as being 14 to 15 years old. I've just pulled out the first disc I ever had burned, still in its original case, and notice a date of 28th January 1995 on the cover, just a few days short of 12 years. It seems I've been using computers for not much more than 11 years.

I popped the disc in my Pioneer DVD drive and opened the first image on the disc. On my first computer, it took 2 minutes to open the full-res file in 8-bit mode (18 MB). Today it took just 10 seconds to read the file and display it on the monitor in 16-bit mode. Below is the image just as Kodak scanned it, unmodified apart from JPEG compression. This is the very first Kodachrome slide I had transferred to CD-ROM.

Image shot 42 years ago (Nepal); transferred to CD 12 years ago; displayed on LL today.

[attachment=1535:attachment]

Nill Toulme

« Reply #75 on: January 13, 2007, 08:00:07 pm »

Nice image — glad it still loads.  ;-)

Nill
~~
www.toulme.net

Ray

« Reply #76 on: January 13, 2007, 08:59:47 pm »

Quote
Nice image — glad it still loads.  ;-)

Nill
~~
www.toulme.net

Thanks! I've got a suspicion my photographic skills haven't improved much in the past 42 years.

If this image had refused to load, and refused to load on every other DVD drive I have, it wouldn't be a major concern, because I've rescanned this image more than once with higher resolution scanners than Kodak used at that time.

I see here in this thread, concerns that might not apply to the amateur. If an image is half decent, I'll convert it and play around with it. It will exist in both RAW and TIFF formats on probably 2 or more different optical media discs, as well as a hard drive or two. If I consider the image to be really good, there'll be even more duplicates of it in one format or another, one rendition or another.

It really would be paranoia for me to worry about losing digitised images that are important to me. I'm far more worried about the physical deterioration of unscanned slides and negatives that go back to the days when my father was a school boy. Is it worth scanning them to preserve the memory for future generations that might not give a stuff? Should I scan them purely for the experience of nostalgia I might get and the satisfaction of doing the best job I can in extracting the most detail? Have I got better things to do?
« Last Edit: January 13, 2007, 09:11:07 pm by Ray »

feppe

« Reply #77 on: January 13, 2007, 09:31:22 pm »

Quote
Eh, no, it isn't.

There's a pretty big difference between doing a checksum of each checksum, and doing a checksum of a huge list of checksums.
You're wrong; the checksum isn't self-correcting.

I never said anything about MD5 being self-correcting; I know the difference between them and PAR files.

I know how checksums work, but I don't know if the software can tell if it's the checksum itself that's corrupt, or the data. If the MD5 algorithm were designed cleverly - to have a checksum of itself in it - your practice would be unnecessary.

Ray

« Reply #78 on: January 13, 2007, 09:58:21 pm »

Here's image no. 14 on the same disc. Taken with a Pentax Spotmatic and 135mm lens. Unfortunately the image is not as sharp as I would have liked it to be, a result of the slow film speed and telephoto lens (not my inability to hold the camera steady, I'll add).

Notice the equality of the sexes in this underdeveloped country. Both men and women carrying the same load.

[attachment=1536:attachment]

ps. I don't bother with checksums. I'm a caveman.
« Last Edit: January 13, 2007, 10:32:16 pm by Ray »

Nill Toulme

« Reply #79 on: January 13, 2007, 10:03:49 pm »

Quote
...Is it worth scanning them to preserve the memory for future generations that might not give a stuff? Should I scan them purely for the experience of nostalgia I might get and the satisfaction of doing the best job I can in extracting the most detail? Have I got better things to do?
Yes; yes; possibly, right at the moment, but at some point...

Nill
~~
www.toulme.net