
Author Topic: Drive capacity/fullness vs. performance  (Read 37013 times)

dreed

  • Sr. Member
  • ****
  • Offline
  • Posts: 1715
Re: Drive capacity/fullness vs. performance
« Reply #60 on: April 29, 2012, 03:07:13 am »

Not in the paragraph you quoted. I already said what my primary business is. Storage and networking are presently subjects of R&D, because I find present practices deficient. The challenge is how to apply enterprise best practices to photographers without high cost or knowledge requirements. But even Linux/BSD hobbyists have storage technology (software) that's more robust than what most people are using, even in small businesses.

Present your methodology so others can reproduce your results. What platform (Solaris, OpenSolaris, FreeBSD, OpenIndiana)? What ZFS filesystem and pool versions? What controllers and drives, how many of each, and were port multipliers used? What benchmarking tools did you use? What's the breakdown of the individual read/write, random/sequential testing? Preferably this is already documented on the opensolaris ZFS discussion list.

Who did you ask, what did you ask, and what was their response? ZFS code is licensed under the CDDL; it's developed by a community who monitor the opensolaris ZFS discuss list. Your findings and methods should be raised there, and they would reply there. If you have a URL for this conversation, please provide it.

That entry was written by Roch Bourbonnais, who is a principal engineer at Oracle working on ZFS. Oracle is the maintainer of ZFS, having acquired it via Solaris when they bought Sun. Do you care to be more specific about your complaint about what he wrote in the article? You've "asked those who wrote [ZFS]" and yet you're going to describe one of them as waving hands about it? Who are you?

Yes, I'm going to say "those who wrote ZFS" because I don't think it would be very cool to name drop. I still don't understand the ranting above unless you're chest-beating. Mentioning the details of the hardware used for testing would just see this thread descend into debate over the pros and cons of various hardware, drivers, etc. Further, correcting you on various technical points above would not help anyone. But it seems like you've definitely had lots of ZFS Kool-Aid to drink.

Quote
You're confused, again. For me to want NAS as just backup, the DAS would have to be large enough to be primary storage. I have not suggested that. Nor have I suggested any fault tolerance for DAS, so it can hardly be primary storage for important things.[1]

The purpose of scripting (directly or with a GUI wrapper interface) is merely to get work-in-progress files automatically "swept" (pushed) to primary storage featuring resilience, fault tolerance, and (ideally) data replication. It's not a requirement. Instead, you could certainly copy your WIP files to the NAS yourself with a conventional drag-and-drop file/folder copy, whenever you like. Maybe you've had a bad day and don't even like the WIP files, so you delete them instead.


[1] One can create a setup where DAS is fast, large, resilient, fault tolerant, replicated and primary, with NAS as both online secondary and backup. It would be expensive, and there are alternatives to consider, per usual.

Ok, the way you're talking here makes it clear that you're completely unfamiliar with Lightroom and its workflow. Since you're an open source freak, I'll suggest that you look at using darktable (which I'm unfamiliar with) since I believe that it is similar in workflow and design to Lightroom. If I were only working with photoshop (or gimp) then what you describe would be relevant.
Logged

alain

  • Sr. Member
  • ****
  • Offline
  • Posts: 465
Re: Drive capacity/fullness vs. performance
« Reply #61 on: April 29, 2012, 04:28:43 am »

There is no parity used in RAID 1. Conventional RAID 1 involves making identical block copies between two block devices (physical or virtual).[1] There is no error correction offered by RAID 1 beyond what the disks' own firmware provides. If the disks each return what they consider valid data, but the copies conflict, neither the RAID implementation nor the file system can resolve the ambiguity to determine which block of data is correct.

The same applies to RAID 5: while there is parity, it's ambiguous whether the data or the parity is correct. With RAID 6's dual parity, it's unambiguous.

The advantage of resilient file systems is they can resolve these ambiguities, and self-heal.


[1] Btrfs implements a chunk-based RAID 1 that doesn't require disk pairs. So it means "at least two copies of data and metadata on separate disks" rather than identical disk pairs.

Indeed
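
To illustrate the self-healing point with a toy sketch (my own illustration, not real ZFS/Btrfs code): the file system stores a checksum per block separately from the data, so when mirror copies disagree it knows which copy to trust and rewrites the other:
---
# Toy illustration of checksum-based self-healing on a two-way mirror.
# Not real ZFS/Btrfs code; just the decision logic in miniature.
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def read_with_heal(copies: list[bytearray], expected: str) -> bytes:
    """Return the copy whose checksum matches; overwrite any bad copy."""
    good = next((bytes(c) for c in copies if digest(bytes(c)) == expected), None)
    if good is None:
        raise IOError("all copies corrupt: unrecoverable")
    for c in copies:
        if digest(bytes(c)) != expected:
            c[:] = good                     # self-heal the rotted copy
    return good

block = b"image data block"
mirror = [bytearray(block), bytearray(block)]
stored = digest(block)                      # kept in file system metadata
mirror[1][0] ^= 0xFF                        # simulate silent corruption
assert read_with_heal(mirror, stored) == block
assert bytes(mirror[1]) == block            # the bad copy was repaired
---
Plain RAID 1 keeps no such stored checksum, so when the two copies disagree it can only guess.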
Logged

alain

  • Sr. Member
  • ****
  • Offline
  • Posts: 465
Re: Drive capacity/fullness vs. performance
« Reply #62 on: April 29, 2012, 05:10:19 am »

If you look at the majority of photographers, I dare say that asking them to become heavily involved in alternate file systems, other operating systems, etc. is folly. Many don't implement currently available techniques, nor have any desire to spend more time in front of the computer than necessary (let alone spending it on non-photography related tasks).

Simple products that are reasonably fast and require minimal maintenance and setup are ideal. This is the big picture that often seems to get missed by technical folks in these discussions (and whilst I'm not at the level of some of the people here with the technology, I am able to follow it and could implement it reasonably easily; I have a long background with computers, including various *nix flavours etc.). Most photogs, and most users generally, don't have that level of knowledge, nor do they want it. Offering solutions that require an IT admin usually won't be accepted. So we're looking at compromises that will actually be used.
Yes, I know too many photographers who (almost) never make backups on an external drive. In the "best" case they have a single external USB drive.
They need simple solutions.

Unfortunately most DAM programs (including LR) don't have a simple way to check all the files they use. Adding an MD5 checksum for each file to the DAM database and scrubbing all files isn't that difficult to do.
The only program I know of that's specific to photos is ImageVerifier.
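
The scrub itself is nearly trivial. A rough sketch of the idea (paths are hypothetical placeholders, and a real DAM would keep the sums in its own database rather than a JSON file):
---
# Rough sketch: build an MD5 manifest for an image tree, then "scrub" it.
import hashlib
import json
from pathlib import Path

ROOT = Path("/photos")                      # hypothetical image library root
MANIFEST = ROOT / "checksums.json"

def md5sum(path: Path) -> str:
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            h.update(chunk)
    return h.hexdigest()

def build_manifest() -> None:
    sums = {str(p.relative_to(ROOT)): md5sum(p)
            for p in sorted(ROOT.rglob("*")) if p.is_file() and p != MANIFEST}
    MANIFEST.write_text(json.dumps(sums, indent=1))

def scrub() -> None:
    for rel, expected in json.loads(MANIFEST.read_text()).items():
        if md5sum(ROOT / rel) != expected:
            print("CORRUPT:", rel)

if __name__ == "__main__":
    scrub() if MANIFEST.exists() else build_manifest()
---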

Logged

dreed

  • Sr. Member
  • ****
  • Offline
  • Posts: 1715
Re: Drive capacity/fullness vs. performance
« Reply #63 on: April 29, 2012, 11:35:55 am »

Unfortunately most DAM programs (including LR) don't have a simple way to check all the files they use. Adding an MD5 checksum for each file to the DAM database and scrubbing all files isn't that difficult to do.
The only program I know of that's specific to photos is ImageVerifier.

This is not 100% true: DNG files have a built-in checksum of the image data, which ACR and LR will validate when you work on an image.
Logged

alain

  • Sr. Member
  • ****
  • Offline
  • Posts: 465
Re: Drive capacity/fullness vs. performance
« Reply #64 on: April 29, 2012, 01:07:08 pm »

This is not 100% true: DNG files have a built-in checksum of the image data, which ACR and LR will validate when you work on an image.
Are you suggesting that opening all DNG files one by one is a valid alternative to an automatic check of all files?

BTW, this is only for DNG files, not raws.
Logged

dreed

  • Sr. Member
  • ****
  • Offline
  • Posts: 1715
Re: Drive capacity/fullness vs. performance
« Reply #65 on: April 29, 2012, 01:17:37 pm »

Are you suggesting that opening all DNG files one by one is a valid alternative to an automatic check of all files?

No, I'm not. I'm just saying that LR has a way to check one particular type of file that it supports - DNG. The checksum is present in the DNG file, so that part of the equation is solved. What it doesn't allow you to do is verify all image checksums, only the one you're currently working with.

Quote
BTW, this is only for DNG files, not raws.

DNGs are raw files.

Adobe supplies a converter so that you can convert your CR2 or NEF files to DNG, including allowing you to embed the original file if you so desire.

So if you're worried about bit-rot, moving to DNG will allow you to detect when bit rot occurs and potentially (for example, if you put the original file inside the DNG) correct it (assuming that only part of the DNG data, and not the original data, has rotted).

This is one of those things that makes me wish every camera generated DNG files, because that way the data is protected from the moment the camera creates it.
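
Until then, batch verification can be scripted. Adobe's DNG SDK ships a command-line dng_validate tool; a sketch that walks a library with it (assuming the tool is built and on your PATH, and that it exits nonzero on a bad file) might look like:
---
# Sketch: batch-check every DNG with dng_validate from Adobe's DNG SDK.
# Assumes dng_validate is on PATH and returns a nonzero exit code on
# failure; the library root is a hypothetical placeholder.
import subprocess
from pathlib import Path

for dng in sorted(Path("/photos").rglob("*.dng")):
    result = subprocess.run(["dng_validate", str(dng)], capture_output=True)
    if result.returncode != 0:
        print("FAILED:", dng)
---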
Logged

chrismurphy

  • Jr. Member
  • **
  • Offline
  • Posts: 77
Re: Drive capacity/fullness vs. performance
« Reply #66 on: April 29, 2012, 05:43:31 pm »


Yes, I'm going to say "those who wrote ZFS" because I don't think it would be very cool to name drop.

Right, because the ZFS developers are wildly popular, famous people, and by saying their names your credibility will actually go down as a result.

I don't think it's cool to appeal to a false (unnamed) authority. None of how ZFS works is some proprietary secret; it's an open source project.

For such "very well known issues with ZFS and RAIDZ performance" it's curious you had to go all the way to those who wrote it for clarification.

Quote
Mentioning the details of the hardware used for testing would just see this thread descend into debate over the pros and cons of various hardware, drivers, etc.

Peer review. It's actually just to understand the problem better. Most important is what platform and pool version, what benchmarking tools, and what *relative* values the various tests produce between single-disk and multi-disk RAIDZ.

My strong suspicion is that you've confused the importance of small-file random IO in a photographer's context, and diminished the value of large-file sequential IO, which is where RAIDZ performs quite well. Your data might suggest a net neutral result of RAIDZ for Lightroom catalog performance, which entails small random IO.

But while I'm not suggesting the Lightroom catalog go on a RAIDZ array, it might be a valuable test to confirm/deny this, because ZFS is a reality on Mac OS X, and soon RAIDZ will be too, as a commercial product.

Quote
Further, correcting you on various technical points above would not help anyone. But it seems like you've definitely had lots of ZFS Kool-Aid to drink.

Except, I thoroughly enjoy being corrected on technical points because I have an innate affinity for being technically correct. The thing is, you haven't provided a single reference or explanation at all for any of your claims. But you appear to have an ample supply of memes and non-responses.

Quote
Ok, the way you're talking here makes it clear that you're completely unfamiliar with Lightroom and its workflow.

You're right, that's why I keep the 1.0 beta email list archive handy for easy reference. But did you have a question? Or do you think these potshots you take are adequate distractions from not answering the questions you've been asked?

Quote
Since you're an open source freak, I'll suggest that you look at using darktable (which I'm unfamiliar with) since I believe that it is similar in workflow and design to Lightroom.

Pass, I'm reasonably pleased with LR, but thanks for the suggestion. It's the most useful data you've provided so far, insofar as it's the only data you've provided so far. I'd never heard of darktable before.

Quote
If I were only working with photoshop (or gimp) then what you describe would be relevant.

Perhaps. But you're welcome to explain why you think so. I will submit the following:

a.) Lightroom benefits more than Photoshop from disks that perform well at small-file random IO. This includes high-RPM disks, RAID 0 arrays, and SSDs. Such a fast disk may not be fault tolerant.

b.) The LR Catalog contains image file metadata. In normal (default) operation, metadata is not automatically saved to image XMP (sidecar or in DNG).[1] So the catalog is an important file to back up, and Lightroom has a feature to do this. But if you want a more frequent backup than those options provide, it might be nice to automate pushing the file to more reliable storage (see the sketch below).

c.) The LR preview data is expendable. If your fast disk dies or needs to be reformatted, you can rebuild the previews and wait it out.

However, Lightroom stores them as individual files in a database, so a syncing program (such as those mentioned) would only push new and changed previews from fast media to more reliable storage. Totally optional. Might save you a few hours one day. But certainly not the end of the world to not back these up.
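
A minimal sketch of such a one-way push (my own illustration with hypothetical paths; rsync or the syncing programs mentioned do the same job):
---
# Minimal one-way "push" sync: new and changed files move from fast local
# media to more reliable storage. All paths are hypothetical placeholders.
import shutil
from pathlib import Path

SRC = Path.home() / "Lightroom"             # catalog, backups, previews
DST = Path("/Volumes/nas/lightroom")        # mounted NAS share

def push(src: Path, dst: Path) -> None:
    for f in src.rglob("*"):
        if not f.is_file():
            continue
        target = dst / f.relative_to(src)
        # Copy only what is new or has changed since the last run.
        if not target.exists() or f.stat().st_mtime > target.stat().st_mtime:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)         # copy2 preserves timestamps

if __name__ == "__main__":
    push(SRC, DST)
---
Run it on a schedule (cron, launchd, Task Scheduler), ideally while Lightroom is closed so the catalog isn't copied mid-write.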


DNGs are raw files.

I think it's confusing to conflate DNG and Raw. DNG can contain entirely non-Raw data: e.g. JPEG and TIFF can be converted into DNG. Thus DNG could contain output-referred data, or camera-referred linear-demosaiced data, or camera-referred mosaiced data.[2]

Quote
So if you're worried about bit-rot, moving to DNG will allow you to detect when bit rot occurs and potentially (for example, if you put the original file inside the DNG) correct it (assuming that only part of the DNG data, and not the original data, has rotted).

There's a potential for ambiguity in that the GUI may not distinguish which data is corrupt, in which case you don't have an easy way to correct the problem. I think the error detection and correction needs to be more automated than this. Since we need duplicates anyway for fault tolerance, it makes sense for the file system to simply manage the error detection and correction, including removal/replacement of the corrupt image with a known good copy, and leave me alone.

Quote
This is one of those things that makes me wish every camera generated DNG files, because that way the data is protected from the moment the camera creates it.

It would be nice. But it's important to distinguish between the ability to detect an error and the ability to correct it. A camera-generated DNG with a checksum would allow for error detection but not correction (of that particular DNG). Detection may be better than nothing, but I think we should expect better than just being notified of a problem.


[1] This is why it's a good idea to periodically "Save Metadata to Files".
[2] Some call it scene-referred. I distinguish between the dynamic range of the scene vs the camera, but it's often a smallish distinction.
Logged

dreed

  • Sr. Member
  • ****
  • Offline
  • Posts: 1715
Re: Drive capacity/fullness vs. performance
« Reply #67 on: April 29, 2012, 08:24:17 pm »

... Garbage deleted ...
... even more garbage deleted ...

Again, I strongly recommend buying and watching the entire video from March 2009 (or even one of the current tutorials) on managing your image files. What you're unfamiliar with here is what many consider to be the best-practice workflow for Lightroom, as documented in videos available on this website. If you don't want to make them available to yourself, then I can't help you further.

Quote
... more textbook comments ...

Rather than comment on Lightroom and how it works from reading about it, I would suggest that you get some operational experience, including when you've got a terabyte or two of images in your Lightroom library.

Oh, except for one thing:
Quote
a.) Lightroom benefits more than Photoshop from disks that perform well at small-file random IO. This includes high-RPM disks, RAID 0 arrays, and SSDs. Such a fast disk may not be fault tolerant.

Excuse me while I recover from my state of shock as you've recommended a RAID solution that wasn't RAIDZ.

Finally:
Quote
Except, I thoroughly enjoy being corrected on technical points because I have an innate affinity for being technically correct.

At least now I understand why your posts revolve around specific measurements you've made and minute detail of such.
Logged

chrismurphy

  • Jr. Member
  • **
  • Offline
  • Posts: 77
Re: Drive capacity/fullness vs. performance
« Reply #68 on: April 29, 2012, 08:48:14 pm »

If you don't want to make them available to yourself, then I can't help you further.

You're unwilling or unable to provide cites for statements you've made purporting to be fact. This has absolutely nothing to do with Lightroom or Lightroom videos.

Quote
Rather than comment on Lightroom and how it works from reading about it, I would suggest that you get some operational experience, including when you've got a terabyte or two of images in your Lightroom library.

I've been using it since before it was made public, and I'm amused at your arbitrary metric for determining who is qualified to use or comment on it. Coming from someone who makes false statements, references unnamed authorities, and cites no data to back up claims, this is laughable.

Quote
Oh, except for one thing:
Excuse me while I recover from my state of shock as you've recommended a RAID solution that wasn't RAIDZ.

You're clueless. I recommended RAID 0 or 10 in my very first post in this thread:
And I'd keep the data on them limited (operating system, applications and short term data such as scratch/working disks including a smallish RAID 0 or 10).

Quote
Finally:
At least now I understand why your posts revolve around specific measurements you've made and minute detail of such.

Better than being wrong.
Logged

dreed

  • Sr. Member
  • ****
  • Offline
  • Posts: 1715
Re: Drive capacity/fullness vs. performance
« Reply #69 on: April 30, 2012, 11:05:47 am »

You're unwilling or unable to provide cites for statements you've made purporting to be fact.

That's because the information being relayed comes from face-to-face conversations for which there are no URLs.
Logged

chrismurphy

  • Jr. Member
  • **
  • Offline
  • Posts: 77
Re: Drive capacity/fullness vs. performance
« Reply #70 on: April 30, 2012, 12:34:02 pm »

That's because the information being relayed comes from face-to-face conversations for which there are no URLs.

I'll propose your ears were clogged during a portion of the conversation, because your conclusion, and the total lack of explanation for it, are incongruent with the established understanding of how RAIDZ works and what it does and does not do well. In the context you chose, the behavior is rather exhaustively explained in the Oracle blog post I cited, by a ZFS engineer.
Logged

chrismurphy

  • Jr. Member
  • **
  • Offline
  • Posts: 77
Re: Drive capacity/fullness vs. performance
« Reply #71 on: April 30, 2012, 12:44:53 pm »

Will an SSD Improve Adobe Lightroom Performance?

Ian Lyons did some benchmarking a year ago, and updated it last month. It focuses on the generation of Library and Develop module previews, which are CPU- and RAM-bound processes. Since this is when Raw/DNG images would be most aggressively read off disk, it's interesting to note the minimal performance difference of SSD versus even FW800. It suggests that a GigE NAS containing the image library (the Raw/DNGs) would not negatively impact performance in a significant way. The raw performance numbers imply a differential that the application may not be fully utilizing anyway.

The last paragraph suggests more frequent day-to-day usage tests that are probably difficult to design objectively. But that's where I'd expect faster local storage for lrcat and lrdata to make a difference.

A great suggestion in the article relates to boosting the Camera Raw (CR) cache value, which is otherwise an easy-to-miss setting.
Logged

dreed

  • Sr. Member
  • ****
  • Offline
  • Posts: 1715
Re: Drive capacity/fullness vs. performance
« Reply #72 on: April 30, 2012, 02:07:50 pm »

I'll propose your ears were clogged during a portion of the conversation, because your conclusion, and the total lack of explanation for it, are incongruent with the established understanding of how RAIDZ works and what it does and does not do well. In the context you chose, the behavior is rather exhaustively explained in the Oracle blog post I cited, by a ZFS engineer.

Nope. Roch's blog confirms what I said - that RAIDZ is the slowest way to use ZFS.

What's more, it demonstrates that RAIDZ groups do not overcome the performance penalty associated with RAIDZ.

I really don't understand your argument as you've jumped around all over the place to try and say otherwise.

When I mentioned that RAIDZ was slow [#26] (and why), you then mentioned that RAIDZ groups should be used [#28] (even though this doesn't bring you up to par). When I mentioned that this required more disks [#36], you complained that I was changing things [#40], whereas the more appropriate comment is that in mentioning RAIDZ groups, you'd forgotten to recognise that there is a minimum configuration, somewhat larger than normal, before it becomes relevant. It was rather bad of you to exclude that rather pertinent information.

Although your posts have made one thing clear - you've never actually had to deal with disk data corruption with ZFS (I'm also given to wonder if you've actually used ZFS at all.)
Logged

chrismurphy

  • Jr. Member
  • **
  • Offline
  • Posts: 77
Re: Drive capacity/fullness vs. performance
« Reply #73 on: April 30, 2012, 03:35:16 pm »

Nope. Roch's blog confirms what I said - that RAIDZ is the slowest way to use ZFS.

Now you're just making things up.

From the article:
an N-disk RAID-Z group will behave as a single device in terms of delivered random input IOPS. Thus a 10-disk group of devices each capable of 200-IOPS, will globally act as a 200-IOPS capable RAID-Z group.

Since when is 200 less than 200? That's "the same" as in "performance neutral." Exactly where do you get "slow" let alone "slowest"?

Further, it's directly stated in the quote that this neutral performance of adding disks affects random input. The whole article's premise is that random input IOPS don't scale by adding disks; to get more random IOPS you need to stripe RAIDZ groups. Even without striping, RAIDZ sequential reads and writes work "extremely well" and help "streaming input loads significantly".

What planet are you on?

Quote
What's more, it demonstrates that RAIDZ groups do not overcome the performance penalty associated with RAIDZ.

Where? Be exact please.

In the grid, about 1/3 of the way down, it shows how striped RAIDZ groups overcome the random IO performance NEUTRALITY of RAIDZ. Two groups, double the IOPS (including random input). Five groups, five times the IOPS (including random input).

What are you smoking? No need to be exact, I don't want any. Somehow you don't understand how RAID 0 works, because that's all that striped RAIDZ groups employ to scale random IO.
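
The arithmetic behind the grid, in miniature (illustrative only, using the article's 200-IOPS disks):
---
# Illustrative arithmetic following the article's model: each RAID-Z group
# delivers the random IOPS of a single disk, and striping groups (plain
# RAID 0 across them) scales that linearly.
DISK_IOPS = 200                             # per-disk figure from the article

def pool_random_iops(groups: int, disk_iops: int = DISK_IOPS) -> int:
    return groups * disk_iops               # one disk's worth per group

for groups in (1, 2, 5, 10, 20, 33):
    print(f"Z {groups:2d} groups: {pool_random_iops(groups):5d} random IOPS")
---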

Quote
I really don't understand your argument as you've jumped around all over the place to try and say otherwise.

Projection.

Here you are now citing the very article that a few posts ago you were trying to discredit as hand-waving. And you're completely misrepresenting everything about the article on top of it. It's willful deception at this point.
« Last Edit: April 30, 2012, 03:40:11 pm by chrismurphy »
Logged

dreed

  • Sr. Member
  • ****
  • Offline
  • Posts: 1715
Re: Drive capacity/fullness vs. performance
« Reply #74 on: April 30, 2012, 05:11:06 pm »

Now you're just making things up.

From the article:
an N-disk RAID-Z group will behave as a single device in terms of delivered random input IOPS. Thus a 10-disk group of devices each capable of 200-IOPS, will globally act as a 200-IOPS capable RAID-Z group.

Since when is 200 less than 200? That's "the same" as in "performance neutral." Exactly where do you get "slow" let alone "slowest"?

Because RAIDZ is not the only way in which ZFS can be used.

Quote
In the grid, about 1/3 of the way down, it shows how striped RAIDZ groups overcome the random IO performance NEUTRALITY of RAIDZ. Two groups, double the IOPS (including random input). Five groups, you get five times IOPS (including random input).

Yes, and compare the results with the use of ZFS without RAIDZ.

To make it easy for you, I'll cut-n-paste the numbers here:
---
Config            Blocks Available   FS Blocks /sec
---------------   ----------------   --------------
Z 1  x (99+1)      9900 GB              200
Z 2  x (49+1)      9800 GB              400
Z 5  x (19+1)      9500 GB             1000
Z 10 x (9+1)       9000 GB             2000
Z 20 x (4+1)       8000 GB             4000
Z 33 x (2+1)       6600 GB             6600

M  2 x (50)        5000 GB            20000
S  1 x (100)      10000 GB            20000
---
Z# = ZFS RAIDZ with # groups
M = ZFS Mirror
S = ZFS Simple striping

All of the above results are with ZFS; the only difference is how the zpool is created for the filesystem. I don't know how there's any way to interpret the above results as meaning that RAIDZ is not the slowest way to use ZFS.
Logged

alain

  • Sr. Member
  • ****
  • Offline
  • Posts: 465
Re: Drive capacity/fullness vs. performance
« Reply #75 on: April 30, 2012, 07:09:41 pm »

Because RAIDZ is not the only way in which ZFS can be used.

Yes, and compare the results with the use of ZFS without RAIDZ.

To make it easy for you, I'll cut-n-paste the numbers here:
---
Config            Blocks Available   FS Blocks /sec
---------------   ----------------   --------------
Z 1  x (99+1)      9900 GB              200
Z 2  x (49+1)      9800 GB              400
Z 5  x (19+1)      9500 GB             1000
Z 10 x (9+1)       9000 GB             2000
Z 20 x (4+1)       8000 GB             4000
Z 33 x (2+1)       6600 GB             6600

M  2 x (50)        5000 GB            20000
S  1 x (100)      10000 GB            20000
---
Z# = ZFS RAIDZ with # groups
M = ZFS Mirror
S = ZFS Simple striping

All of the above results are with ZFS; the only difference is how the zpool is created for the filesystem. I don't know how there's any way to interpret the above results as meaning that RAIDZ is not the slowest way to use ZFS.
It's extremely unlikely that a ZFS system in use by a photographer will be in need of massive random input IOPS. Certainly not when it's run over "current" LAN speeds.

But for input IOPS inside a ZFS zpool there's the possibility to add a -small- ZIL device (which can itself be a mirror or RAIDZ), for example using an SSD made for caching (several thousand IOPS sustained) or even a battery-backed RAM drive (>100,000 IOPS).

 
Logged

chrismurphy

  • Jr. Member
  • **
  • Offline
  • Posts: 77
Re: Drive capacity/fullness vs. performance
« Reply #76 on: April 30, 2012, 07:14:31 pm »

This section is RAIDZ only; all except the first are striped:

Config            Blocks Available   Random FS Blocks /sec
---------------   ----------------   ---------------------
Z 1  x (99+1)      9900 GB              200
Z 2  x (49+1)      9800 GB              400
Z 5  x (19+1)      9500 GB             1000
Z 10 x (9+1)       9000 GB             2000
Z 20 x (4+1)       8000 GB             4000
Z 33 x (2+1)       6600 GB             6600
You said:
it demonstrates that RAIDZ groups do not overcome the performance penalty associated with RAIDZ

I said on April 22:
Solvable by striped RAIDZ groups.

Anyone can see that as you add striped groups, performance goes up linearly, contrary to your assertion.

The "penalty" you keep referring to happens with random IO. These graphs are "Random FS Blocks" not sequential, and are not bandwidth.

In #42 I asked you to distinguish clearly your performance "penalty" claim. I asked whether scaled-out IOPS is important for a photographer, versus bandwidth. I asked whether you likewise disqualify RAID 5, which actually has a similar random IO scaling issue for the same basic reason as RAIDZ. You deleted all of those questions and refused to answer them.

Quote
All of the above results are with ZFS; the only difference is how the zpool is created for the filesystem.

The entire frigging blog post is expressly about *RANDOM* IO. It's at the top of the columns you copy-pasted (but conveniently forgot to paste the word random - nice touch by the way).

Quote
I don't know how there's any way to interpret the above results as meaning that RAIDZ is not the slowest way to use ZFS.

That's because you refuse to distinguish between random and sequential IO. And because you refuse to acknowledge the distinction between IOPS and bandwidth.

And every time I've asked you to acknowledge these distinctions you delete the questions and requests, and don't respond. Yet you keep writing bunk that's simply not true and not relevant at all to a photographer even if it were true.
Logged

chrismurphy

  • Jr. Member
  • **
  • Offline
  • Posts: 77
Re: Drive capacity/fullness vs. performance
« Reply #77 on: April 30, 2012, 07:22:32 pm »

It's extremely unlikely that a ZFS system in use by a photographer will be in need of massive random input IOPS. Certainly not when it's run over "current" LAN speeds.

Right, exactly. Raw, DNG, large PSD and TIFF files hardly qualify for random IO. These will be sequential IO, the IO requests will be aggregated by ZFS, and RAIDZ will parallelize the data streams from all disks in the RAIDZ group. It would even be fine for 10 GigE usage, but most certainly the entire conversation is insanely moot for 1 GigE, when a single disk saturates GigE.
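
Back-of-envelope (my own numbers; the disk rate is an assumed, typical sequential figure, not a measurement):
---
# Why the 1 GigE case is moot: the link tops out around 125 MB/s before
# protocol overhead, which one modern drive's sequential rate can match.
LINK_MB_PER_SEC = 1_000_000_000 / 8 / 1e6   # gigabit Ethernet ~= 125 MB/s
DISK_MB_PER_SEC = 130                       # assumed 7200 rpm sequential rate

print(f"GigE link: {LINK_MB_PER_SEC:.0f} MB/s, one disk: {DISK_MB_PER_SEC} MB/s")
print("disk saturates link:", DISK_MB_PER_SEC >= LINK_MB_PER_SEC)
---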

Quote
But for input IOPS inside a ZFS zpool there's the possibility to add a -small- ZIL device (which can itself be a mirror or RAIDZ), for example using an SSD made for caching (several thousand IOPS sustained) or even a battery-backed RAM drive (>100,000 IOPS).

Yeah, for large file writing the ZIL may not matter much; it mainly accelerates small synchronous writes. For caching data reads, that's the L2ARC, where you actually get a sort of automated "hot file" and "cold file" distinction at the file system level. Very cool.
Logged

dreed

  • Sr. Member
  • ****
  • Offline
  • Posts: 1715
Re: Drive capacity/fullness vs. performance
« Reply #78 on: April 30, 2012, 11:16:40 pm »

This section is RAIDZ only; all except the first are striped:

Config            Blocks Available   Random FS Blocks /sec
---------------   ----------------   ---------------------
Z 1  x (99+1)      9900 GB              200
Z 2  x (49+1)      9800 GB              400
Z 5  x (19+1)      9500 GB             1000
Z 10 x (9+1)       9000 GB             2000
Z 20 x (4+1)       8000 GB             4000
Z 33 x (2+1)       6600 GB             6600

Anyone can see that as you add striped groups, performance goes up linearly, contrary to your assertion.

So what?

That's not contrary to what I'm saying, which is that RAIDZ is slower than non-RAIDZ.

How many times do I have to repeat this?

RAIDZ is the slowest method of using ZFS.

I'm not saying that some forms of RAIDZ aren't faster than others; I'm saying RAIDZ (in any configuration) is slower than the other methods supported by ZFS.

None of the RAIDZ performance figures come even close to mirror or striping.

I don't know what more I can say here. That you deleted the lines from the performance results that mentioned mirroring and striping is probably significant; I don't know. If you think that RAIDZ is the only way to use ZFS then you're wrong.

For a NAS connected via Gigabit ethernet, sure, RAIDZ vs non-RAIDZ may not be noticeable. But for locally hosted storage connected via eSATA or faster, it can be.

Quote
The "penalty" you keep referring to happens with random IO. These graphs are "Random FS Blocks" not sequential, and are not bandwidth.

No, it happens with all IO. For any given operation, using RAIDZ will be slower than any non-RAIDZ equivalent. Again, this is because it treats all of the disks within a given group as a single disk. I'm not interested in comparing RAIDZ with 2 RAIDZ groups, I'm interested in comparing RAIDZ with the non-RAIDZ equivalents (mirroring and striping).

Which is to say that RAIDZ will also be slower in terms of the raw bandwidth available than ZFS with mirroring or striping.

Quote
In #42 I asked you to distinguish clearly your performance "penalty" claim. I asked whether scaled-out IOPS is important for a photographer, versus bandwidth. I asked whether you likewise disqualify RAID 5, which actually has a similar random IO scaling issue for the same basic reason as RAIDZ. You deleted all of those questions and refused to answer them.

That's because RAID 5 isn't RAIDZ, and thus discussion of it (RAID 5) is not relevant in a discussion thread on RAIDZ except to serve as a distraction and a destination for more rat-holing. As for what's important to the photographer, that's likely a subjective thing. The most pertinent message on that topic has been the desire for external disks to perform at the same speed as internal disks. That will mean both IOPS and bandwidth. What they are interested in is connecting an external disk to their system and getting performance similar to their internal disks. For me, that says they want eSATA, USB 3.0 or better.

Quote
You deleted all of those questions and refused to answer them.

Let he who has not deleted questions and thus refused to answer them cast the first stone.

Quote
The entire frigging blog post is expressly about *RANDOM* IO. It's at the top of the columns you copy-pasted (but conveniently forgot to paste the word random - nice touch by the way).

Ah, that exclusion wasn't deliberate, but having said that, you won't believe it.

But to take this a step further: unless you're streaming video content or working with really large files (100s of megabytes in size), random IO is the correct model to use for determining whether something is good or bad. It's definitely the right model to use for any disk that has images on it.

Quote
That's because you refuse to distinguish between random and sequential IO. And because you refuse to acknowledge the distinction between IOPS and bandwidth.

See above.

Quote
And every time I've asked you to acknowledge these distinctions you delete the questions and requests, and don't respond. Yet you keep writing bunk that's simply not true and not relevant at all to a photographer even if it were true.

OK, so now you've resorted to insults. At this point I'm not going to engage any further after posting this, because I don't want to be involved in discussions that are clearly going downhill, and I also don't want to further encourage you.
« Last Edit: April 30, 2012, 11:20:31 pm by dreed »
Logged

John.Murray

  • Sr. Member
  • ****
  • Offline
  • Posts: 886
    • Images by Murray
Re: Drive capacity/fullness vs. performance
« Reply #79 on: May 01, 2012, 02:03:09 am »

Logged