Jim Kasson


« Reply #80 on: May 29, 2014, 12:00:52 PM » 
Reply

Here are the MTF10 (or MTFNyquist, whichever is higher) plots for a simulated Otus 55mm f/1.4 lens from f/2.8 through f/16 and sensel pitch 2 um through 5.7 um. The MTF units are cycles/picture height, assuming a 24x36mm sensor. In 3D: As a contour plot, with sensel pitch across the bottom and fstop the vertical axis: As a family of curves in 2D. You can see the places where the MTF10 occurs at a greater spatial frequency than the Nyquist frequency on the two coarsest sensel pitches  they're the flat spots on the curves: And as a quiver plot: Going from MTF50 to MTF10 as the metric makes it look like sensor resolution is relativly more valuable than lens resolution. Jim


« Last Edit: May 29, 2014, 12:12:36 PM by Jim Kasson »

Logged




BartvanderWolf


« Reply #81 on: May 29, 2014, 12:56:19 PM » 
Reply

Going from MTF50 to MTF10 as the metric makes it look like sensor resolution is relativly more valuable than lens resolution.
Hi Jim, Thanks for the effort. Indeed, it's relatively easier to achieve more resolution by denser sampling, unless lens quality or diffraction throw a spanner in the works. Interesting to see my observations confirmed that the (steepest ascent) improvements also point towards narrower apertures than with MTF50 as a quality metric. It also seems to confirm that a diffraction pattern diameter of 1.5  2 x the sensel pitch is where diffraction kicks in at the highest spatial frequencies, which was of course already explainable from more sensels joining in to generate resolution even with perfect phase alignment. Cheers, Bart



Logged




Jim Kasson


« Reply #82 on: May 29, 2014, 01:05:28 PM » 
Reply

The cascaded kernel will also grow significantly with each additional convolution, but ultimately it should produce fewer multiplication+addition operations because you end up with a single convolution kernel for all image inputs. Also, given that the resulting kernel may turn out looking pretty simple, it would be possible to truncate or filter it to a smaller kernel support size which would also speed up things once the cascaded kernel is available in floating point precision. And once you indeed find that you can replace the compound result with a separable Gaussian, things can be sped up hugely.
But the difficulty is in finding the fastest way to cascade the kernels. Maybe a Symbolic solver like Mathematica can assist, although having to deal with discrete sampled sensel apertures instead of continuous point samples does make life more difficult.
Bart, that helps, but I'm still a bit confused. Associativity applies to convolution, right? So, if I can appropriate "*" as the symbol for convolution, and I is an image, and k1, k2, and k3 are kernels, here's what I'm doing now: FilteredImage = k3 * (k2 * (k1 * I)) But if I did this, I should get the same result: FilteredImage = I * (k3 * (k1 * k2)) However, when I use discrete convolution filtering on an image, the outer pixels are not accurate, so to make this work, I need to pad the kernels to larger sizes by adding zeros, and then crop the result back down after each convolution. So I don't need the symbolic approach. I must be missing something here. Any help is appreciated. Jim



Logged




BartvanderWolf


« Reply #83 on: May 29, 2014, 02:09:51 PM » 
Reply

Bart, that helps, but I'm still a bit confused. Associativity applies to convolution, right? So, if I can appropriate "*" as the symbol for convolution, and I is an image, and k1, k2, and k3 are kernels, here's what I'm doing now:
FilteredImage = k3 * (k2 * (k1 * I))
But if I did this, I should get the same result:
FilteredImage = I * (k3 * (k1 * k2))
However, when I use discrete convolution filtering on an image, the outer pixels are not accurate, so to make this work, I need to pad the kernels to larger sizes by adding zeros, and then crop the result back down after each convolution. Hi Jim, That's correct, the kernels need to be padded, the resulting kernel grows with each convolution. I think you only crop the final result, to keep full accuracy. Of course, when the final kernel edges don't add enough to be significant, that kernel can be cropped or windowed a bit more smoothly. So I don't need the symbolic approach. Indeed, it's not needed, but it sometimes allows to simplify calculations (like separation of a Gaussian). In this case, there are so many variables involved that there may be little simplification possible, unless the final kernel can be relatively accurately approximated by a Gaussian. I'm just hoping for some speedup to reduce the waiting for a result to happen, that's all. Cheers, Bart



Logged




Jim Kasson


« Reply #84 on: May 29, 2014, 02:19:44 PM » 
Reply

That's correct, the kernels need to be padded, the resulting kernel grows with each convolution. I think you only crop the final result, to keep full accuracy. Of course, when the final kernel edges don't add enough to be significant, that kernel can be cropped or windowed a bit more smoothly.
Indeed, it's not needed, but it sometimes allows to simplify calculations (like separation of a Gaussian). In this case, there are so many variables involved that there may be little simplification possible, unless the final kernel can be relatively accurately approximated by a Gaussian. I'm just hoping for some speedup to reduce the waiting for a result to happen, that's all. Thanks, Bart. That helps. I don't think I'll have to worry about the time necessary to convolve the kernels. Even if I pad the heck out of them, they'll still be a lot smaller than my 12000x8000 pixel target. Jim



Logged




eronald


« Reply #85 on: May 30, 2014, 06:54:17 AM » 
Reply

Thanks, Bart. That helps. I don't think I'll have to worry about the time necessary to convolve the kernels. Even if I pad the heck out of them, they'll still be a lot smaller than my 12000x8000 pixel target.
Jim
I think it's much faster to apply the convolution as a product in the fourier domain because of the FFT speedup. In other words if F is the filter and I the image, paradoxically instead of doing F*I you go much faster doing invFourier (Fourier(F) Fourier(I) ) . Edmund


« Last Edit: May 30, 2014, 06:56:43 AM by eronald »

Logged




BartvanderWolf


« Reply #86 on: May 30, 2014, 08:02:10 AM » 
Reply

I think it's much faster to apply the convolution as a product in the fourier domain because of the FFT speedup. In other words if F is the filter and I the image, paradoxically instead of doing F*I you go much faster doing invFourier (Fourier(F) Fourier(I) ) .
For large images that's correct. Of course there are some potential pitfalls when going from continuous (optical) input signals such as diffraction, to a discrete representation. But once we have a final cascaded set of discrete kernel values, and if the image is large enough to benefit from requiring fewer multiplications with added overhead of conversions to and from the frequency domain, then multiplication in Fourier space is a useful speedup. It would certainly be an optimization worth considering once the spatial domain approach is working as intended. That would also allow a comparison to see whether errors are made in the conversion algorithms (which may require padding with certain values, and/or windowing), or whether machine precision calculations create issues for values near zero. Cheers, Bart


« Last Edit: May 30, 2014, 08:04:35 AM by BartvanderWolf »

Logged




eronald


« Reply #87 on: May 30, 2014, 02:23:38 PM » 
Reply

Jim has 12K by 8K images, the speedup should be huge. But of course he knows this already Edmund For large images that's correct. Of course there are some potential pitfalls when going from continuous (optical) input signals such as diffraction, to a discrete representation. But once we have a final cascaded set of discrete kernel values, and if the image is large enough to benefit from requiring fewer multiplications with added overhead of conversions to and from the frequency domain, then multiplication in Fourier space is a useful speedup.
It would certainly be an optimization worth considering once the spatial domain approach is working as intended. That would also allow a comparison to see whether errors are made in the conversion algorithms (which may require padding with certain values, and/or windowing), or whether machine precision calculations create issues for values near zero.
Cheers, Bart



Logged




Jim Kasson


« Reply #88 on: May 31, 2014, 05:45:07 PM » 
Reply

Jim has 12K by 8K images, the speedup should be huge. But of course he knows this already Edmund and Bart, I had already started coding up the kernelconvolving approach, so I thought I'd finish it. I set up a class for color kernels that could have different values for each plane and knew how to expand and contract themselves under operations like this: a = aColorKernel.convolveWith(anotherColorKernel); It took a while to get that coded and debugged. When I ran it, I found that it was slower than the original code, and used one virtual core most of the time versus the original's using at least 12. So I wrote a little test of the FFT way of doing things: And when I ran it, I saw this: So I'm going to rerecode. And to Edmund I say the three sweetest words in the English language: "You were right." Jim



Logged




eronald


« Reply #89 on: May 31, 2014, 10:07:10 PM » 
Reply

Edmund and Bart, I had already started coding up the kernelconvolving approach, so I thought I'd finish it. I set up a class for color kernels that could have different values for each plane and knew how to expand and contract themselves under operations like this: a = aColorKernel.convolveWith(anotherColorKernel); It took a while to get that coded and debugged. When I ran it, I found that it was slower than the original code, and used one virtual core most of the time versus the original's using at least 12. So I wrote a little test of the FFT way of doing things: And when I ran it, I saw this: So I'm going to rerecode. And to Edmund I say the three sweetest words in the English language: "You were right." Jim Jim, I'm like a broken clock: precisely right twice a day, and quite exasperating all day long I must have missed something  what's the speedup factor on a 16Kx16K image ? I'm just seeing an absolute time for the Fourier multiplier method. Always have trouble understanding the work that other people do  so I hardly understand what you guys are saying. You're like virtuosos and I'm trying to figure out where the notes are on the piano. But I'm going to be writing some simple code meself. Just wish I could code Matlab fluently like you. Edmund PS. I suspect there is some simple optimisation which you can do to reflect the fact that your image data and filter kernel are real, so there were will be symmetries or adjuncts in fourier space. I suspect you only need to compute the multiplication in one quadrant etc. In fact all of these optimisations are probably done in the image processing toolbox, but by taking a hard look at the equations I believe you should get an immediate speedup of 4. Edmund


« Last Edit: May 31, 2014, 10:48:39 PM by eronald »

Logged




hjulenissen


« Reply #90 on: June 01, 2014, 03:11:11 AM » 
Reply

For large images that's correct.
The cost of (1d) convolution is bigO (N*M), for N samples and M coefficients The cost of doing this in the DFTdomain is bigO (K*log2(K)), where K is some number >=max(N,M) depending on padding requirements, poweroftwo performance etc. If e.g. N=M=K=512, then N*M > K*log2(K) and FFT convolution seems like the right thing. If M < log2(N) then it might not be worth it to work in the transformed domain. I guess this could be the case for an moderate or large size image and a compact convolution kernel? For large data sets (16000x16000 pixels x 3 channels x 8 byte), memory might become the bottleneck. If you work in the spatial domain, you might be able to load tiles of pixels from the source file, process, and save the partial result. I have an intuitive understanding of how to do padding in the spatial domain: repeat, reflect, etc signal samples so as to reduce discontinuity in the derivative (or 2nd derivative). In the frequency domain the right thing is not so intuitive (for me at least), as the transform does an inherent spatial "wraparound". h


« Last Edit: June 01, 2014, 03:22:09 AM by hjulenissen »

Logged




hjulenissen


« Reply #91 on: June 01, 2014, 03:15:37 AM » 
Reply

Did you try replacing var(16384,16384,3) = 0; With something like: var = zeros(16384,16384,3,'single'); If done carefully, this should force MATLAB to do its calculations in singleprecision float. In principle, the x86 hardware and the FFTW library should do singleprecision calculations at approximately twice the speed of doubleprecision calculations, but it has been my experience that MATLAB often does not behave like one might hope in this respect. h



Logged




Jim Kasson


« Reply #92 on: June 01, 2014, 11:37:25 AM » 
Reply

Did you try replacing var(16384,16384,3) = 0;
With something like: var = zeros(16384,16384,3,'single');
The first statement pads var with zeros to the large square matrix. The second fills var with zeros, wiping out the image that's already in var. At least, that's the way it appears to me; maybe I'm missing something. As to doing the operations in single precision floating point, that's a good idea, but it's down the road for me. My current FFT implementation has some odd results from sfrmat3 in the spatial frequency region approaching twice the Nyquist frequency; there are small ripples in the MTF curve, and I don't want to reduce precision until I'm getting the same results from the FFT and the convolution approaches. Thanks, Jim



Logged




Jim Kasson


« Reply #93 on: June 01, 2014, 12:03:51 PM » 
Reply

The cost of (1d) convolution is bigO (N*M), for N samples and M coefficients
The cost of doing this in the DFTdomain is bigO (K*log2(K)), where K is some number >=max(N,M) depending on padding requirements, poweroftwo performance etc.
If e.g. N=M=K=512, then N*M > K*log2(K) and FFT convolution seems like the right thing. If M < log2(N) then it might not be worth it to work in the transformed domain. I guess this could be the case for an moderate or large size image and a compact convolution kernel? I'll run some tests at various sized kernels. For large data sets (16000x16000 pixels x 3 channels x 8 byte), memory might become the bottleneck. If you work in the spatial domain, you might be able to load tiles of pixels from the source file, process, and save the partial result. I'm running this on a 192 GB machine; actually, it's a 256 GB machine, but Win 7 only recognizes the smaller number. The FFT approach is more memoryintensive than convolution, especially when I keep the old FFT'd kernels around, but at no time in this series of tests have I seen it go over 100 GB. I have an intuitive understanding of how to do padding in the spatial domain: repeat, reflect, etc signal samples so as to reduce discontinuity in the derivative (or 2nd derivative). In the frequency domain the right thing is not so intuitive (for me at least), as the transform does an inherent spatial "wraparound".
That's a good point; maybe my padding by filling with zeros is a bad idea; it's the approach recommended in the Matlab Image Toolbox support materials. Having the matrix dimensions an integer power of two is also recommended for speed. Jim



Logged




Jim Kasson


« Reply #94 on: June 01, 2014, 03:54:56 PM » 
Reply

I set up a test of convolution vs multiplying in the frequency domain wrt compute time for a 12000x8000 pixel image.. The convolution program: Yes, I know it would run faster if I just let Matlab do the three color planes together, but I need the ability to have different kernels for each color plane to simulate the wavelength dependency of diffraction. Note that I'm not blowing up the image and mirroring it at the edges, which gives convolution somewhat of an unfair advantage. The FFT program: The results: Jim


« Last Edit: June 01, 2014, 04:16:31 PM by Jim Kasson »

Logged




Jim Kasson


« Reply #95 on: June 01, 2014, 05:33:06 PM » 
Reply

In trying to figure out why the FFT version of the camera simulator gave me different results than the convolution version, I discovered that I was getting what looked to be differential shifting among the color planes after diffraction with the FFT version, giving rise to color fringing that, after a point got worse as a diffractionlimited lens was stopped down, causeing wierd MTFs curves above the Nyquist frequency. So I went back to the images from the timing test program that I posted immediately above and looked at them. They're different from each other, and the differences get greater as the kernel size increases. Here's the output of convolving the 12000x8000 slanted edge test image with a 1201x1201 pixel averaging kernel: And here's what you get with the FFT code with the same kernel: Oops! I'm padding with zeros to the right and down before I do the FFT. Should I be centering the image in the 16384x16384 pixel field? Um, maybe I've got it: the FFT method returns an image that's larger than the input image by the size of the kernel, and shifts it down and to the right. How to make the shift independent of the kernel size? Jim


« Last Edit: June 01, 2014, 06:00:21 PM by Jim Kasson »

Logged




Jim Kasson


« Reply #96 on: June 02, 2014, 03:28:04 PM » 
Reply

I've done enough experimenting to understand that right/lower zero padding of the image and the kernels prior to FFT yields, after multiplication and inverse FFT, a rightward and downward shift of the synthetically convolved image of (kernelSize  1) / 2 pixels. This would be no problem if the kernels were the same in each color plane of the image, since the output image could be brought back into register by cropping.
However, in my case, since I'm using different kernels for different planes to simulate diffraction, the shifting of the round trip through the FFT causes the planes of the output image to be out of registration with each other, which causes errors in the SFR analysis.
As I see it, I have two choices. I could keep track of the kernel sizes in each plane while I'm multiplying in the frequency domain, and shift the planes of the image after the inverse FFT. I've tested this in isolation, and it works. Alternatively, I could set up the Airy disk kernels so they're all the same pixel size, the size of the red one. The blue one will just have more rings. Right now they all have the same number of rings: 10.
I'm liking the latter approach better, since it seems less susceptible to programming errors.
I'll report back when I've results.
If anybody thinks I'm on the wrong track, sing out, please.
Jim



Logged




Jim Kasson


« Reply #97 on: June 02, 2014, 05:52:17 PM » 
Reply

As I see it, I have two choices. I could keep track of the kernel sizes in each plane while I'm multiplying in the frequency domain, and shift the planes of the image after the inverse FFT. I've tested this in isolation, and it works. Alternatively, I could set up the Airy disk kernels so they're all the same pixel size, the size of the red one. The blue one will just have more rings. Right now they all have the same number of rings: 10.
I'm liking the latter approach better, since it seems less susceptible to programming errors.
I implemented the second approach, and I now get very similar (but not identical) results for convolution and fft models. And the speed? At a target to sensor resolution ratio of 32, at f/8 and 4.77 um pixel pitch I get the same execution times. Narrower fstops and/or finer pixel pitches favor the FFT approach. With the FFT I'll be able to take the pitches to under 2 um, which was my limit before because of execution times. Matlab is known to be a rapacious consumer of memory, and the FFT has made it more so: Jim



Logged




eronald


« Reply #98 on: June 02, 2014, 09:37:48 PM » 
Reply

what I really would like to have instructions for is how to do some filtering on a synthetic image and then stuff it back in a raw converter.
Edmund



Logged




Jim Kasson


« Reply #99 on: June 02, 2014, 10:49:52 PM » 
Reply

what I really would like to have instructions for is how to do some filtering on a synthetic image and then stuff it back in a raw converter.
Me, too. Is the DNG SDK the only game in town? Jim



Logged




