
Author Topic: Drive capacity/fullness vs. performance  (Read 37022 times)

chrismurphy

Re: Drive capacity/fullness vs. performance
« Reply #40 on: April 23, 2012, 06:18:56 pm »

This only works in configurations where you have more than 6 disks. Otherwise you do not have enough disks to create more than one pool with a pair of disks plus parity. I suspect that nearly everyone here is looking at something with around half that number of disks.

Now you're shifting the goal posts. If you're going to complain about performance, but then take away the whole reason for the problem in the first place, it obviates the problem. Let's leave it at this: the problem you are talking about is not really a problem. People can get plenty of performance from ZFS over either GigE or 10 GigE. That's a configuration question. It's not a reason to not use ZFS.

Unless you're a full-on computer geek, ZFS should not be part of the equation. There are other aspects of the solution that will be far more important, such as what interfaces the product provides, how you connect it, manage it, etc. ...

Probably the best solution is to find an external RAID thing that does USB 3.0

It's interesting that you discount the file system in favor of something like USB, where the vast majority of chipsets do not pass through ATA commands, meaning no ATA Secure Erase, no hardware-based full disk encryption, and no SMART monitoring. That is, it's important to check the capabilities of your USB controller if such things matter to you.

If one is concerned primarily with performance, DAS with a native file system for your operating system is the way to go. But the combination of DAS with these file systems, without battery backup, is a recipe for data loss. The file systems are not guaranteed to be consistent through power failures; they weren't designed with that in mind. ZFS, Btrfs, and ReFS are.

In contrast, NAS is better suited for high(er) availability, better file systems for data integrity, UPS integration, and SMART support. If you also want speed, you can get it, but it'll cost you a 10 GigE network. And in reality, ZFS is no more difficult to set up in a NAS than any other file system, so it's hardly the realm of the "full-on computer geek" any more than a NAS itself is.

Quote
http://www.lian-li.com/v2/en/product/product06.php?pr_index=549&cl_index=12&sc_index=42&ss_index=115&g=f

The external eSATA interface is 3Gb/s, which is 300MB/s. Adequate, but for something that holds five disks, it's bandwidth-limited by the external interface. Each disk will push between 120MB/sec and 150MB/sec, so interface saturation occurs at two disks, sustained. Not just burst.
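A quick back-of-the-envelope check, using the figures above (assumed round numbers, not measurements):

Code:
# How many disks saturate a single eSATA 3Gb/s link?
# Figures are the ones quoted above; real numbers vary by drive and controller.
ESATA_LINK_MBPS = 300          # 3 Gb/s link is ~300 MB/s after 8b/10b encoding
DISK_MBPS_RANGE = (120, 150)   # sustained throughput of one disk

for disk_mbps in DISK_MBPS_RANGE:
    disks_to_saturate = ESATA_LINK_MBPS / disk_mbps
    print(f"At {disk_mbps} MB/s per disk, {disks_to_saturate:.1f} disks fill the link")

# 2.5 disks at 120 MB/s, 2.0 at 150 MB/s -- a five-bay enclosure behind one
# eSATA port is interface-limited, not disk-limited.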
« Last Edit: April 23, 2012, 10:19:58 pm by chrismurphy »

dreed

Re: Drive capacity/fullness vs. performance
« Reply #41 on: April 24, 2012, 11:44:06 am »

Now you're shifting the goal posts. If you're going to complain about performance, but then take away the whole reason for the problem in the first place, it obviates the problem. Let's leave it at this: the problem you are talking about is not really a problem. People can get plenty of performance from ZFS over either GigE or 10 GigE. That's a configuration question. It's not a reason to not use ZFS.

No, I'm not shifting the goal posts; you did, by suggesting that RAIDZ groups be used while forgetting to mention that there are minimum requirements to make that work. There are very well known issues with ZFS and RAIDZ performance, and overcoming them isn't trivial because it requires specific configurations. RAIDZ just isn't suitable for small disk configurations, such as those most people here will use.

Quote
It's interesting that you discount the file system in favor of something like USB, where the vast majority of chipsets do not pass through ATA commands, meaning no ATA Secure Erase, no hardware-based full disk encryption, and no SMART monitoring. That is, it's important to check the capabilities of your USB controller if such things matter to you.

Replacing USB with eSATA fixes all of the above.

Quote
...
External interface eSATA is 3Gb/s which is 300MB/s. Adequate but for something that holds five disks, it's bandwidth limited by the external interface. Each disk will push between 120MB/sec and 150MB/sec. So your interface saturation occurs at two disks, sustained. Not just burst.

So what?

As mentioned, what people want is to access disk at the speeds they are used to. The USB and FireWire that most people have today are slower than that. Further, they're not likely to be running a huge DB or serving files to lots of clients, just their own PC. Thus eSATA is perfect because it delivers local-disk speed to something external. Whether or not there is interface saturation is beside the point. They're not architecting enterprise storage solutions, just trying to move beyond USB because it is perceptibly slower than internal disk. Similarly, eSATA is going to be quicker for them to access than any NAS (Network Attached Storage) that's connected to Gigabit ethernet (3Gb/s > 1Gb/s). And by the time 10GigE is affordable, they'll want something new anyway.

chrismurphy

Re: Drive capacity/fullness vs. performance
« Reply #42 on: April 24, 2012, 03:28:55 pm »

No, I'm not shifting the goal posts; you did, by suggesting that RAIDZ groups be used while forgetting to mention that there are minimum requirements to make that work.

The problem you referred to, from the outset, occurs with many drives.

Quote
There are very well known issues with ZFS and RAIDZ performance, and overcoming them isn't trivial because it requires specific configurations. RAIDZ just isn't suitable for small disk configurations, such as those most people here will use.

For someone claiming very well known issues with ZFS and RAIDZ performance, why did you forget to mention that this RAIDZ performance problem is one of IOPS, not bandwidth? In the case of large files and sequential reads or writes, RAIDZ parallelizes available bandwidth very well. I do not consider Raw/DNG or working PSD/TIFFs for most photographers to be small files. JPEGs might be, it depends.
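The distinction matters. A rough sketch of the usual rule of thumb — a RAIDZ vdev delivers roughly one disk's worth of random IOPS because every stripe spans all members, while sequential bandwidth scales with the data disks (the per-disk numbers below are illustrative assumptions, not benchmarks):

Code:
# Rough model of a RAIDZ group, per the common rule of thumb:
# sequential bandwidth scales with the data disks, random IOPS do not.
# disk_mbps/disk_iops are illustrative single-disk figures.
def raidz_estimate(n_disks, parity=1, disk_mbps=130, disk_iops=100):
    data_disks = n_disks - parity
    return {
        "seq_bandwidth_MBps": data_disks * disk_mbps,  # parallelized across disks
        "random_iops": disk_iops,                      # ~one disk's worth per vdev
    }

print(raidz_estimate(5))  # 5-disk RAIDZ1: ~520 MB/s sequential, ~100 random IOPS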

Do you suppose scaling out IOPS is important for a photographer?

Do you likewise disqualify RAID 5? If not, why not?


Quote
So what?

What's the advantage of this enclosure over sticking disks inside the desktop computer, which would invariably provide better striping performance? By a lot. And would be cheaper.

Also, the specs are confusing: Is it 15TB, 10TB, or 6TB capacity? Is this built-in or host RAID?

Quote
Thus eSATA is perfect because it delivers local-disk speed to something external.

Whether it's perfect or not depends on the other hardware in the workflow, which we either don't know, or I missed it even though I went back and looked. I think on the one hand this enclosure is overkill on the quantity of storage, but it's bottlenecked by interface. I would consider a different approach, but again it depends on other hardware.

DAS is for performance, it's for hot files. NAS is for availability, it's for cold files.

Hot files are: scratch space, preview/cache files, work-in-progress PSDs and TIFFs.

Cold files are: those pixels that haven't been touched in a month, let alone today.

It really does not make sense to spend extra money on fast, large DAS for cold files. At least not until we have more reliable file systems that can be both resilient and fast, by pooling (aggregating) those disks together.

So I would bias the budget for DAS to be small, but as fast as practical for the size I need for daily work: hot files.

And I'd bias the budget for NAS to be large, not as fast, but higher availability, for the cold files. Plus I get SMART and UPS monitoring built-in, a more reliable file system, and automated replication features (to either an on-site or off-site duplicate NAS or cloud storage).

I could even have a "sweep" script that moves all fast DAS files to the NAS once or twice a day. And then after 7 days of aging, deletes them off the NAS. This, in case I "forget" to move my hot files to more reliable storage.
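A minimal sketch of such a sweep in Python (the paths and exact policy are hypothetical; a GUI sync tool or an rsync wrapper could do the same job):

Code:
# Sketch of the "sweep": copy everything from fast DAS to a landing area on
# the NAS, then prune NAS copies older than 7 days. Paths are hypothetical.
import shutil
import time
from pathlib import Path

DAS_HOT = Path("/Volumes/FastDAS/work")   # hot files: scratch, WIP PSD/TIFF
NAS_SWEEP = Path("/Volumes/NAS/sweep")    # safety-net landing area on the NAS
MAX_AGE_DAYS = 7

def sweep():
    NAS_SWEEP.mkdir(parents=True, exist_ok=True)
    for src in DAS_HOT.rglob("*"):
        if src.is_file():
            dst = NAS_SWEEP / src.relative_to(DAS_HOT)
            dst.parent.mkdir(parents=True, exist_ok=True)
            # plain copy() stamps the NAS copy with the sweep time, so aging
            # is measured from the last sweep, not from the file's own mtime
            shutil.copy(src, dst)
    cutoff = time.time() - MAX_AGE_DAYS * 86400
    for f in NAS_SWEEP.rglob("*"):
        if f.is_file() and f.stat().st_mtime < cutoff:
            f.unlink()   # the 7-day safety net expires

if __name__ == "__main__":
    sweep()   # run once or twice a day via cron/launchd

Note that files still on the DAS get re-copied each sweep, so their NAS copies never expire; only copies of files you've since removed from the DAS age out.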


Quote
Similarly, eSATA is going to be quicker for them to access than any NAS (Network Attached Storage) that's connected to Gigabit ethernet (3Gb/s > 1Gb/s).

GigE NAS comes close to single local disk performance; ~90MB/s is reasonable to expect, although I've seen them push 110MB/s.

Farmer

Re: Drive capacity/fullness vs. performance
« Reply #43 on: April 24, 2012, 07:22:16 pm »

The advantage of the Lian Li enclosure is that compared to a regular NAS, it's much, much faster (it's both eSATA and USB 3).  Many users don't have the knowledge, or don't want to deal with setting up RAID on their own system, or perhaps that system is already saturated for local performance or storage; they just want more storage at a decent price that's easy to set up and, most importantly, doesn't give them a huge performance hit.

This particular enclosure is also suitable for people who want to increase storage but don't have room in their own computer (perhaps it's a laptop) and want more speed than a NAS.

If you really want high performance in your desktop you'd add something like a RevoDrive (one of the larger, workstation-class devices), but that would also empty a large portion of your bank account :-)
Phil Brown

dreed

Re: Drive capacity/fullness vs. performance
« Reply #44 on: April 24, 2012, 11:50:28 pm »

Quote from: chrismurphy
For someone claiming very well known issues with ZFS and RAIDZ performance, why did you forget to mention that this RAIDZ performance problem is one of IOPS, not bandwidth?

Go back and read my comments, specifically where I state that RAIDZ works all drives in sync rather than independently.

Quote
What's the advantage of this enclosure over sticking disks inside the desktop computer, which would invariably provide better striping performance? By a lot. And would be cheaper.

Ok, given your other comments, I'm going to call "troll" on a bunch of what you said here.

Quote
DAS is for performance, it's for hot files. NAS is for availability, it's for cold files.

... other comments deleted ...

I'd recommend that you go back and either buy or re-watch some of the Luminous Landscape videos that talk on this topic. One of the early ones by Seth, which recommends keeping all of your images on one device, comes to mind. He's a professional photographer who clearly has his workflow and IT issues sorted without all of this nonsense about DAS/NAS.

The complexity required to use shell scripts to do what you suggest is far beyond what I'd expect for any photographer and on top of that, you're managing photographs outside of an application such as Lightroom, which then requires manual steps inside that application (or whatever is being used) too. If you feel comfortable doing that, fine, but be aware that not everyone is a technical expert in these areas, nor should they have to be.

The advantage of the Lian Li enclosure is that compared to a regular NAS, it's much, much faster (it's both eSATA and USB 3).  Many users don't have the knowledge, or don't want to deal with setting up RAID on their own system, or perhaps that system is already saturated for local performance or storage; they just want more storage at a decent price that's easy to set up and, most importantly, doesn't give them a huge performance hit.

This particular enclosure is also suitable for people who want to increase storage but don't have room in their own computer (perhaps it's a laptop) and want more speed than a NAS.

Exactly! The only question I had was whether there were any inbuilt limitations on the hard drive sizes it would work with, but once you've got 8TB of storage, filling that is going to take time, and by the time it is full, you may want to replace the storage solution entirely with something newer.

chrismurphy

Re: Drive capacity/fullness vs. performance
« Reply #45 on: April 25, 2012, 03:51:00 pm »

The advantage of the Lian Li enclosure is that compared to a regular NAS, it's much, much faster . . . easy to set up and, most importantly, doesn't give them a huge performance hit.

Scroll halfway down; there are USB 3.0 benchmarks putting it at 160MB/s.
http://homeservershow.com/lian-li-ex-503b.html

The 4th comment covers RAID5 performance with eSATA.
http://forums.whirlpool.net.au/archive/1667234

You're saying ~160 MB/s is "much much" faster than ~100 MB/s, and ~100MB/s is a "huge performance hit" compared to ~160MB/s? OK.

One internal WD VelociRaptor is 200+MB/s sustained.

One internal Mercury Aura Pro is 511+MB/s sustained.

I still question what the RAID implementation is. If it's proprietary, the data on the array is locked behind that. If you plan to do RAID 5, depending on the enclosure's RAID implementation, some consumer disks may not work. WDC specifically says RAID 1 and 0 only for their Green, Blue and Black consumer drives. There are other combinations that can work, but it depends on the RAID implementation. The wrong combinations lead to array failures (not disk failures).

So what's the backup plan with the Lian Li? Get two? Or is the data expendable?

Go back and read my comments, specifically where I state that RAIDZ works all drives in sync rather than independently.

If you meant IOPS, you should have said it. But, setting aside the vague and confusing phraseology, your blanket proscription of RAIDZ doesn't make sense. IOPS is important for small files and random access performance. Photographers working on Raw, DNG, PSD, and TIFF files need bandwidth, and RAIDZ parallelizes bandwidth across disks quite well.

Quote
Ok, given your other comments, I'm going to call "troll" on a bunch of what you said here.

Please proceed with the name calling at your convenience.

Quote
I'd recommend that you go back and either buy or re-watch some of the Luminous Landscape videos that talk on this topic. One of the early ones by Seth, which recommends keeping all of your images on one device, comes to mind. He's a professional photographer who clearly has his workflow and IT issues sorted without all of this nonsense about DAS/NAS.

The March 2009 article? What were SSDs going for then, price/performance-wise?

The suggested storage solution is a compromise that gets you Jack of some trades, master of none. The aim is ostensibly speed, but while somewhat better than NAS, it's not a master of speed. I keep hearing the "NAS is slow" complaint, and the resulting myopia leads to a solution that is deficient in every other category.

My customers with high volume workflows, with the exception of video, use 1 GigE for designer/retoucher stations. The designer copies project files from the NAS, works on them locally, and then files are pushed back to the NAS. These folks have dedicated IT, but they do not support one, let alone each designer, having their own array attached. Why? Because it's a support nightmare; the data is not high availability; the data tends to not get backed up (in particular when the computer has been shut down); they get no notifications until it's too late.

And there is no contradiction at all with the suggestion you keep all images in one storage pool, while "checking out" files onto much faster media for work in progress, and then "checking in" those files back to storage.

Quote
The complexity required to use shell scripts to do what you suggest is far beyond what I'd expect for any photographer and on top of that,

Try to see this as a continuum rather than as binary. There are apps with a GUI that will script for you. Carbon Copy Cloner uses rsync and Super Duper uses ditto. The web GUI interface on a NAS produces scripts to automate replication features.

Farmer

Re: Drive capacity/fullness vs. performance
« Reply #46 on: April 25, 2012, 06:41:15 pm »

The backup plan?  Whatever you want it to be.  It might be part of your backup, or you might use a second one, or you might use a NAS, etc. etc.  Many options.

Comparing the Lian Li enclosure to a raptor?  Are you kidding?  Do you have an 8TB raptor?

There are other reviews showing the USB3 connection performing a bit higher, but let's take 160.  That's 60% better and you're assuming that you'll actually get 100MB/s - the reality is you won't get full saturation, particularly if you're doing anything else over that network.  If you think 60% isn't significant, well, so be it (but you're wrong).

And for speed, a raptor is OK (I use some myself - scratch disk), but as I said, if you are really talking about speed there are much better options (that blow away your Mercury Aura Pro).  But we're not talking about the highest possible performance - we're talking about connected/additional storage, reasonable performance, reasonable cost and so on.
Phil Brown

dreed

Re: Drive capacity/fullness vs. performance
« Reply #47 on: April 26, 2012, 05:14:59 pm »

If you meant IOPS, you should have said it. But, setting aside the vague and confusing phraseology, your blanket proscription of RAIDZ doesn't make sense.

To anyone that is familiar with RAIDZ, what I said is neither vague nor confusing.

I'm happy for your customers and I hope that they're happy with your work, but it doesn't sound like they're individual photographers each with their own library of photographs and video to work with.

Quote
Quote
I'd recommend that you go back and either buy or re-watch some of the Luminous Landscape videos that talk on this topic. One of the early ones by Seth, which recommends keeping all of your images on one device, comes to mind. He's a professional photographer who clearly has his workflow and IT issues sorted without all of this nonsense about DAS/NAS.
The March 2009 article?

Yes, but watch the entire video, not just the free preview.

Quote
Quote
The complexity required to use shell scripts to do what you suggest is far beyond what I'd expect for any photographer and on top of that,
Try to see this as a continuum rather than as binary. There are apps with a GUI that will script for you. Carbon Copy Cloner uses rsync and Super Duper uses ditto. The web GUI interface on a NAS produces scripts to automate replication features.

Both of those solve a problem that is unrelated to the problem of managing your files outside of Lightroom and needing to synchronise Lightroom manually.

chrismurphy

Re: Drive capacity/fullness vs. performance
« Reply #48 on: April 27, 2012, 12:17:42 pm »

The backup plan?  Whatever you want it to be.

It's easier to take shots at it if you're more specific. Better for the idea to leak on the drawing board than in the actual implementation.

Quote
Comparing the Lian Li enclosure to a raptor?  Are you kidding?  Do you have an 8TB raptor?

Consider how you can take advantage of very fast storage to speed up the thing you do most, rather than worrying about the performance of things you don't do very often.

And there are open questions remaining on the enclosure. The enclosure incentivizes the use of RAID 5, and consumer disks mostly proscribe RAID 5. WDC Green, Blue and Black disks are expressly RAID 1 and 0 disks, not RAID 5. If the data is expendable, things are much easier and you don't need to know such things.

Quote
That's 60% better and you're assuming that you'll actually get 100MB/s - the reality is you won't get full saturation, particularly if you're doing anything else over that network.  If you think 60% isn't significant, well, so be it (but you're wrong).

I've pushed data at 100MB/s. I've seen it pushed to 115MB/s, although I'm pretty sure that involved jumbo frames. Gotta have a clean network though, this is true.

Depending on what you mean by "doing anything else over that network" the entire strategy may be altered. It might make sense to use LACP for a 2Gb/s connection NAS to the switch; or 10 GigE from NAS to the switch, for multiple workstations.

Now then, do you consider 60% slower transfers, occurring twice a day (check-out & check-in) with a GigE NAS, to be a real problem if you increase the performance of something you do 24 times a day (Photoshop save to a fast local disk) by 100%? Or 300%? I don't think that's an unreasonable trade off.
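Putting illustrative numbers on that trade-off (all sizes and speeds below are assumptions for the sake of the arithmetic):

Code:
# Illustrative arithmetic for the trade-off: slower twice-a-day NAS transfers
# vs. faster two-dozen-a-day local Photoshop saves. All figures assumed.
FILE_GB = 2.0      # hypothetical working set moved at check-out/check-in
SAVE_GB = 0.5      # hypothetical size of one PSD save

nas_mbps, das_mbps = 100, 160    # GigE NAS vs. eSATA DAS transfer speed
old_mbps, new_mbps = 130, 260    # save target: ordinary disk vs. 100%-faster disk

transfer_penalty = 2 * FILE_GB * 1000 * (1 / nas_mbps - 1 / das_mbps)
save_benefit = 24 * SAVE_GB * 1000 * (1 / old_mbps - 1 / new_mbps)

print(f"extra seconds/day spent on NAS transfers: {transfer_penalty:.0f}")  # ~15
print(f"seconds/day saved on Photoshop saves:     {save_benefit:.0f}")      # ~46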

To anyone that is familiar with RAIDZ, what I said is neither vague nor confusing.

OK, so then you're either confused, or unfamiliar with RAIDZ, to have categorically warned against its usage due to the inapplicable performance reason you were referencing.

WHEN TO (AND NOT TO) USE RAID-Z
https://blogs.oracle.com/roch/entry/when_to_and_not_to

I quote from the article a relevant point:
Because of  the  ZFS Copy-On-Write (COW) design, we actually do expect this reduction in number of device level I/Os to work extremely well for just about any write intensive workloads. We also expect it to help streaming input loads significantly. The situation of random inputs is one that needs special attention when considering RAID-Z.

Quote
I'm happy for your customers and I hope that they're happy with your work, but it doesn't sound like they're individual photographers each with their own library of photographs and video to work with.

My business remains primarily color management related, while my expertise extends substantially beyond it for historical, as well as interest, reasons. I come across a wide assortment of customers from enterprise to individual photographers. And I'm privy to their various success and failure scenarios regarding networking and storage.

Consider that even individual photographers are approaching, and many individual pros are already at, enterprise-level storage requirements. Yet they don't have enterprise-level IT staff on hand. The solutions I commonly see in non-enterprise environments work fine until they don't. And I don't just mean outright failures or data corruption, but also problems with performance, migration, expansion, and even backup restoration. It's as if data restoration hasn't been modeled, let alone practiced.

Quote
Both of those solve a problem that is unrelated to the problem of managing your files outside of Lightroom and needing to synchronise Lightroom manually.

The example task being automated is a unidirectional push of files/folders from fast local storage to a NAS: today's Photoshop files, as well as the Lightroom preview cache and catalog files.

Farmer

Re: Drive capacity/fullness vs. performance
« Reply #49 on: April 27, 2012, 06:18:37 pm »

You still don't get it.  You're taking the worst possible usage for a particular thing and comparing that against the best possible usage of another (your talk of 115MB/s is a classic).

Personally, the Lian Li would be a very cost effective way of adding external storage that was significantly faster than NAS (at any comparable price point).  My workstation is essentially full, physically.  I will be upgrading in the next few months to an Ivy Bridge system and at that time I may shuffle things around.

You're just looking to find ways to shoot things down (as you say yourself), but you're not actually looking at the same picture as everyone else because you're apparently only interested in your own vision.  Doing individual point comparison and failing to comprehend the big picture is a real problem.
Phil Brown

chrismurphy

Re: Drive capacity/fullness vs. performance
« Reply #50 on: April 27, 2012, 06:37:18 pm »

You still don't get it.

Feel free to explain in more verbose and simple terms then, rather than just repeating yourself.

Quote
You're taking the worst possible usage for a particular thing and comparing that against the best possible usage of another (your talk of 115MB/s is a classic).

I cited two benchmarks for the product you mentioned, and used the better of the two. You said you found better benchmarks, but you didn't cite them, and you accepted 160MB/s. Conversely, the other cite I provided benchmarked 110MB/s reads and 40MB/s writes. If I were actually being fair, instead of giving the Lian Li the benefit of the doubt, I'd have averaged the two, bringing its performance to 135MB/s reads, and 100MB/s writes.

Quote
Personally, the Lian Li would be a very cost effective way of adding external storage that was significantly faster than NAS (at any comparable price point).

Speculation. You have no evidence.

The benchmark data submitted thus far doesn't support your contention. You persist in ignoring every deficiency of the product or its specs except this hypothetical performance difference.

Quote
You're just looking to find ways to shoot things down (as you say yourself), but you're not actually looking at the same picture as everyone else because you're apparently only interested in your own vision.  Doing individual point comparison and failing to comprehend the big picture is a real problem.

And what is it you're doing when you ignore all unknowns and deficiencies of a product except the "individual point comparison" of speed, which really isn't even that significant?

dreed

Re: Drive capacity/fullness vs. performance
« Reply #51 on: April 28, 2012, 04:43:03 am »

Consider how you can take advantage of very fast storage to speed up the thing you do most, rather than worrying about the performance of things you don't do very often.

And there are open questions remaining on the enclosure. The enclosure incentivizes the use of RAID 5, and consumer disks mostly proscribe RAID 5. WDC Green, Blue and Black disks are expressly RAID 1 and 0 disks, not RAID 5. If the data is expendable, things are much easier and you don't need to know such things.

I've pushed data at 100MB/s. I've seen it pushed to 115MB/s, although I'm pretty sure that involved jumbo frames. Gotta have a clean network though, this is true.

Depending on what you mean by "doing anything else over that network" the entire strategy may be altered. It might make sense to use LACP for a 2Gb/s connection NAS to the switch; or 10 GigE from NAS to the switch, for multiple workstations.

Now then, do you consider 60% slower transfers, occurring twice a day (check-out & check-in) with a GigE NAS, to be a real problem if you increase the performance of something you do 24 times a day (Photoshop save to a fast local disk) by 100%? Or 300%? I don't think that's an unreasonable trade off.

Are you trying to sell yourself or your services?

Quote
OK, so then you're either confused, or unfamiliar with RAIDZ, to have categorically warned against its usage due to the inapplicable performance reason you were referencing.

No, I've benchmarked it (run various tests) and found it to be the slowest way to use ZFS. I then asked those who wrote it why that was so. Case closed.

Quote
WHEN TO (AND NOT TO) USE RAID-Z
https://blogs.oracle.com/roch/entry/when_to_and_not_to

I quote from the article a relevant point:
Because of  the  ZFS Copy-On-Write (COW) design, we actually do expect this reduction in number of device level I/Os to work extremely well for just about any write intensive workloads. We also expect it to help streaming input loads significantly. The situation of random inputs is one that needs special attention when considering RAID-Z.

Why don't you find a blog entry where they actually measure and report on the performance difference between RAIDZ and other things, rather than one that just hand-waves about it?

Quote
The example task being automated is a unidirectional push of files/folders from fast local storage to a NAS: today's Photoshop files, as well as the Lightroom preview cache and catalog files.

Previously you were talking about using DAS (hot storage) as a local cache of things that you are working on, and the NAS (cold storage) was where everything was held. Now it seems like you want the NAS just to be a backup solution?

Anyway, I still highly recommend watching the entire video from the March 2009 article. At least then you'll have a common point to talk to with everyone else.

alain

Re: Drive capacity/fullness vs. performance
« Reply #52 on: April 28, 2012, 06:23:46 pm »

...
The March 2009 article?


Yes, but watch the entire video, not just the free preview.
...
There are very good lessons inside the complete video:
- Take at least one backup offline, meaning disconnect it physically from the LAN and power connectors.
- Don't trust an online backup provider.
- Take separate backups of system (make those bootable, maybe with a little bit of trouble) and data.
- Remember that everything in the current location can be lost. --> Keep a backup in a separate place "far" away.
  (Think about burglars, lightning, fire,...)


chrismurphy

Re: Drive capacity/fullness vs. performance
« Reply #53 on: April 28, 2012, 06:32:57 pm »

Are you trying to sell yourself or your services?

Not in the paragraph you quoted. I already said what my primary business is. Storage and networking are presently subjects of R&D, because I find present practices deficient. The challenge is how to apply enterprise best practices to photographers, without high cost or knowledge requirements. But even Linux/BSD hobbyists have storage technology (software) that's more robust than what most people are using. Even in small businesses.

Quote
No, I've benchmarked it (run various tests) and found it to be the slowest way to use ZFS.

Present your methodology so others can reproduce your results. What platform (Solaris, OpenSolaris, FreeBSD, OpenIndiana)? What version of ZFS filesystem and pool? What controllers and drives, how many of each, and were port multipliers used? What benchmarking tools did you use? What's the breakdown of the individual read/write, random/sequential testing? Preferably this is already documented on the opensolaris ZFS discussion list.
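For example, a reproducible report needs at least a random-vs-sequential split. A minimal sketch of that shape (a toy only; real pool testing would use a tool like fio or filebench against RAIDZ and mirrored pools built from the same disks, with caching accounted for):

Code:
# Toy sketch of a reportable benchmark: sequential vs. random reads of the
# same file. Assumes a pre-made multi-GB test file; caches are ignored here,
# which a real methodology must not do.
import os
import random
import time

PATH = "/pool/testfile"     # hypothetical file on the pool under test
BLOCK = 128 * 1024          # 128 KiB, matching the default ZFS recordsize
N_READS = 1000

def bench(sequential):
    size = os.path.getsize(PATH)
    if sequential:
        offsets = [i * BLOCK for i in range(N_READS)]
    else:
        offsets = [random.randrange(0, size - BLOCK) for _ in range(N_READS)]
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return (N_READS * BLOCK / 2**20) / (time.perf_counter() - start)  # MB/s

print(f"sequential: {bench(True):.1f} MB/s, random: {bench(False):.1f} MB/s")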

Quote
I then asked those who wrote it why that was so. Case closed.

Who did you ask, what did you ask, and what was their response? ZFS code is licensed under the CDDL; it's developed by a community who monitor the opensolaris ZFS discuss list. Your findings and methods should be posted there, and they would reply there. If you have a URL for this conversation, please provide it.

Quote
Why don't you find a blog entry where they actually measure and report on the performance difference between RAIDZ and other things, rather than one that just hand-waves about it?

That entry was written by Roch Bourbonnais, who is a principal engineer at Oracle working on ZFS. Oracle is the maintainer of ZFS, as they acquired it via Solaris when they bought Sun. Do you care to be more specific about your complaint about what he wrote in the article? You've "asked those who wrote [ZFS]" and yet you're going to describe one of them as waving hands about it? Who are you?

Quote
Previously you were talking about using DAS (hot storage) as a local cache of things that you are working on, and the NAS (cold storage) was where everything was held. Now it seems like you want the NAS just to be a backup solution?

You're confused, again. For me to want NAS as just backup, the DAS would have to be large enough to be primary storage. I have not suggested that. Nor have I suggested any fault tolerance for DAS, so it can hardly be primary storage for important things.[1]

The purpose of scripting (directly or with a GUI wrapper interface) is merely to get work in progress files automatically "swept" (pushed) to primary storage featuring resilience, fault tolerance, and (ideally) data replication. It's not a requirement. Instead you could certainly copy your WIP files to the NAS yourself with a conventional drag-drop file/folder copy, whenever you like. Maybe you've had a bad day and don't even like the WIP files so you delete them instead.


[1] One can create a setup where DAS is fast, large, resilient, fault tolerant, replicated and primary, with NAS as both online secondary and backup. It would be expensive, and there are alternatives to consider, per usual.
« Last Edit: April 28, 2012, 06:43:33 pm by chrismurphy »

alain

Re: Drive capacity/fullness vs. performance
« Reply #54 on: April 28, 2012, 06:37:22 pm »

Personally, the Lian Li would be a very cost effective way of adding external storage that was significantly faster than NAS (at any comparable price point).  My workstation is essentially full, physically.  I will be upgrading in the next few months to an Ivy Bridge system and at that time I may shuffle things around.
This is a valid reason, but I would stay away from RAID-5.
- The implementation is probably enclosure specific.  You probably need the same exact enclosure to recover from an enclosure defect.
- Rebuilding a RAID-5 with large disks runs a very long time, and an error then is probably fatal for your RAID-5 data.

Two 3TB disks in RAID 1 are fast, and the second drive protects you from some disk errors.  If you're using USB 3, I would think about two separate 2-disk enclosures.  Those two will cost less than one bigger enclosure and can be faster.

It's simple and fast but it won't have: snapshots, testing the data (scrubbing), using two parity drives, easy use from several computers,... [EDIT added:] automatic error correction for "bit rot"
« Last Edit: April 28, 2012, 06:50:09 pm by alain »

alain

Re: Drive capacity/fullness vs. performance
« Reply #55 on: April 28, 2012, 06:47:35 pm »

...
Hot files are: scratch space, preview/cache files, work-in-progress PSDs and TIFFs.

Cold files are: those pixels that haven't been touched in a month, let alone today.

It really does not make sense to spend extra money on fast, large DAS for cold files. At least not until we have more reliable file systems that can be both resilient and fast, by pooling (aggregating) those disks together.

So I would bias the budget for DAS to be small, but as fast as practical for the size I need for daily work: hot files.

And I'd bias the budget for NAS to be large, not as fast, but higher availability, for the cold files. Plus I get SMART and UPS monitoring built-in, a more reliable file system, and automated replication features (to either an on-site or off-site duplicate NAS or cloud storage).

I could even have a "sweep" script that moves all fast DAS files to the NAS once or twice a day. And then after 7 days of aging, deletes them off the NAS. This, in case I "forget" to move my hot files to more reliable storage.

GigE NAS comes close to single local disk performance; ~90MB/s is reasonable to expect, although I've seen them push 110MB/s.

This is indeed a good reason for a NAS.  The more I read on the web, the more ZFS seems to be getting better. 


chrismurphy

Re: Drive capacity/fullness vs. performance
« Reply #56 on: April 28, 2012, 07:00:55 pm »

This is a valid reason, but I would stay away from RAID-5.

I agree. It can be used in small arrays, with not a lot of data, and good (frequent) backup plans. But TB drives can take days to restripe, as you point out. It might be better than no fault tolerance, but in enterprise situations it's borderline malfeasance to set up RAID 5 for important data these days. Especially big data, many disks, or large disks.

Quote
Two 3TB disks in raid1 are fast and the parity protects you from some disk errors.

There is no parity used in RAID 1. Conventional RAID 1 involves making identical block copies between two block devices (physical or virtual).[1] There is no error correction offered by RAID 1 except what's offered by the disk firmware itself. If the disks each return what they consider valid data, but the copies actually conflict, neither the RAID implementation nor the file system can resolve the ambiguity to determine which block of data is correct.

Same with RAID 5: while there is parity, it's ambiguous whether the data is correct or the parity is. With RAID 6's dual parity, it's unambiguous.

The advantage of resilient file systems is they can resolve these ambiguities, and self-heal.


[1] Btrfs implements RAID 1 that is chunk based, and doesn't require pairs. So it means "at least two copies of data and metadata on separate disks" rather than identical disk pairs.
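A toy illustration of that ambiguity, using plain XOR parity as in RAID 5 (hypothetical values, not real on-disk formats):

Code:
# Why single parity detects but cannot locate corruption. RAID 5 parity is
# the XOR of the data blocks; values here are toy examples.
d1, d2 = 0b1010, 0b0110
parity = d1 ^ d2              # parity block computed at write time

# Silent corruption: one device returns bad data while reporting no error.
d1_read = 0b1110              # one flipped bit

if (d1_read ^ d2) != parity:
    # The stripe is inconsistent -- but is d1 wrong? d2? the parity itself?
    # Single parity can't say. RAID 6's second, independent parity (or a
    # file system's per-block checksum) is what identifies the bad member.
    print("mismatch: corruption detected, but not locatable")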

chrismurphy

Re: Drive capacity/fullness vs. performance
« Reply #57 on: April 28, 2012, 07:28:55 pm »

This is indeed a good reason for a NAS.  The more I read on the web, the more ZFS seems to be getting better. 

I'm actually substantially more familiar with Btrfs than ZFS, although they have similarities. I've been using Btrfs for about three years, and recently have committed one of my backups to it. But it is not yet considered production ready, as development is still heavy. Of the resilient copy-on-write file systems, ZFS is the most mature and is production stable.

Another plus of such file systems is very fast file system checking and repair. With today's journaled file systems, the journal doesn't actually make the file system more reliable; it simply makes it much faster to check for consistency after a crash or power failure. If there is any inconsistency in the journal, a fully traversed file system check and repair is required, and on the multi-TB file systems found on arrays this can take hours if you're lucky, or days if the file system is very large.

Copy-on-write file systems write updated data and file system metadata to new locations rather than overwriting in place, so they are considered always consistent (except when they aren't). ZFS doesn't even have an fsck tool to this day. So their consistency by design aids in getting back online quickly in the event of crash or power failure. The ZFS equivalent to a full traversal of the file system is scrubbing, which is an online background process - meaning your data is fully available while the file system is being checked and any corrupted files are repaired. Btrfs scrub works similarly. I'd expect ReFS to have similar features.
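A toy sketch of what a scrub pass does, assuming checksums recorded at write time plus a redundant mirror copy (this mimics the idea behind 'zpool scrub'; it is nothing like the actual ZFS implementation):

Code:
# Toy scrub: verify each block against its stored checksum; when one mirror
# copy is bad, repair it from a copy that still matches the checksum.
import hashlib

def checksum(block):
    return hashlib.sha256(block).hexdigest()

# Two mirrored copies per block, plus the checksum recorded at write time.
blocks = [
    {"copies": [b"pixel data A", b"pixel data A"], "sum": checksum(b"pixel data A")},
    {"copies": [b"pixel dXta B", b"pixel data B"], "sum": checksum(b"pixel data B")},
]

def scrub(blocks):
    for i, blk in enumerate(blocks):
        good = next((c for c in blk["copies"] if checksum(c) == blk["sum"]), None)
        for j, copy in enumerate(blk["copies"]):
            if checksum(copy) != blk["sum"]:
                if good is not None:
                    blk["copies"][j] = good             # self-heal from good copy
                    print(f"block {i}: copy {j} repaired")
                else:
                    print(f"block {i}: unrecoverable")  # no valid copy anywhere

scrub(blocks)   # runs online; data stays readable during the check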

Farmer

Re: Drive capacity/fullness vs. performance
« Reply #58 on: April 28, 2012, 08:14:01 pm »

FWIW, I have local drives, direct-attached (eSATA) backups, a local NAS (RAID5, but I don't mind if it takes time to rebuild or has to be completely recopied; it's just for uptime convenience), offsite single disks (multiples, but meaning no RAID) and cloud (Crashplan - via a hard disk upload and online maintenance).

I'm not a pro photog, but my images are important to me and this is a cost effective system that works for me (and has for many years, excluding Crashplan which is new).

With a new PC on the nearby horizon, I'll likely indulge in a vanity build and get some technology that I don't need, and at the same time I'll have a look at my redundancy, availability/accessibility and backup options.

My data is in original raw, copies in DNG (which has a checksum) and derived works in varying formats over the years (TIFF, PSD, .xmp etc).  I use PS and LR (not being a pro, I can afford to dabble and test and experiment and play and <insert appropriate variation here> etc).

If you look at the majority of photographers, I dare say that asking them to become heavily involved in alternate file systems and other operating systems etc is folly.  Many don't implement currently available techniques nor have any desire to spend more time in front of the computer than necessary (let alone spending it on non-photography related tasks).

Simple products that are reasonably fast and require minimal maintenance and setup are ideal.  This is the big picture that seems to often get missed by technical folks in these discussions (and whilst I'm not at the level of some of the people here with the technology, I am able to follow it and could implement it reasonably easily; I have a long background with computers, including various *nix flavours etc).  Most photogs and most users generally don't have that level of knowledge, nor do they want it.  Offering solutions that require an IT admin usually won't be accepted.  So, we're looking at compromises that will actually be used.
Phil Brown

chrismurphy

Re: Drive capacity/fullness vs. performance
« Reply #59 on: April 28, 2012, 11:43:37 pm »

I appreciate the conversation, and the sharing of your setup. Off-site backup (e.g. cloud) is missing from a significant majority of photographers' plans, and there's much to consider: cost, upload times, what to back up (if not everything), encryption and privacy control, and so on.

If you look at the majority of photographers, I dare say that asking them to become heavily involved in alternate file systems and other operating systems etc is folly.

I understand the concern. But I'm not asking what you must think I'm asking.

For example, most everyone has a wireless router. These routers have alternate file systems and operating systems on them (routers don't run Mac OS or Windows). Do users get heavily involved in either the router's file system or operating system? No. Why not? Default behavior and the web interface abstract them away from such things.

Quote
Many don't impliment currently available techniques nor have any desire to spend more time in front of the computer than necessary (let alone spending it on non-photography related tasks).

I agree. But in some ways I don't blame people for not wanting to implement current ad hoc techniques that even the geeks and pros don't universally agree on.

Why not make your printed editions and throw the digital files away? All of them. Print files, Raws, DNGs. It's not a new concept. Artists destroyed strike plates well before digital came along. So I'll even refuse the premise that backing up and archiving is an inherently good or required behavior. It's only good if you particularly value those photos beyond producing a printed edition, enough to protect them with a constant supply of cash rather than buying something else.

Quote
Simple products that are reasonably fast and require minimal maintenance and setup are ideal.  This is the big picture that seems to often get missed by technical folks in these discussions (and whilst I'm not at the level of some of the people here with the technology, I am able to follow it and could implement it reasonably easily; I have a long background with computers, including various *nix flavours etc).

Not every conversation is a recommendation, or one designed to call people to action rather than simply to consider alternatives.

Technical folks come in lazy versions. I don't like fixing the same thing over and over again. I'm not looking for high maintenance and setup. I'm not eager to suggest overly complicated solutions, as they tend to be fragile. No one likes that.

Quote
Most photogs and most users generally don't have that level of knowledge, nor do they want it.  Offering solutions that require an IT admin usually won't be accepted.  So, we're looking at compromises that will actually be used.

And that's sensible. I can't imagine it working any other way. The dilemma is really about choosing the familiar old versus the unfamiliar new, not about choosing simple versus complicated. Not many people would intentionally (or rationally) choose complicated over simple if the outcomes were the same.