Topic: Camera calibration using a neural network: questions (Read 11245 times)

GWGill · « **Reply #20 on:** May 19, 2019, 09:45:39 pm »

Quote from: Guillermo Luijk on May 17, 2019, 06:17:59 am

A question for GWGill: in LUT based ICC profiles, who takes the decision of how many samples are those LUT made of?

I can't speak for other profiling software, but in ArgyllCMS it is set by the "quality" parameter, or by an explicit override parameter. (I've often considered changing the name of "quality" to "speed" or "slowness", since people often misinterpret the tradeoffs being made.)

Quote

And secondly: how/where is later decided the way to interpolate between the LUT values? for example when opening profiled image data in Photoshop and assigning it to a given ICC profile. What kind of interpolation used here, just linear or something more sophisticated?

By convention it is linear interpolation, although the implementation has discretion on exactly how this is done. Typical choices are multi-linear or simplex interpolation. The latter is faster, and will have better accuracy when the colorspace has the neutral axis along the diagonal (i.e. device spaces). The former is often better when the output space has the neutral parallel to an axis (i.e. L*a*b* output space). I guess if an implementation wanted to use higher order interpolation it could, but speed would suffer greatly, and memory consumption may get high. (Although it's not usual, consider that the interpolation could be in up to 15 dimensional space. The higher order terms will be very numerous.)
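To make the multi-linear case concrete, here is a minimal sketch of tri-linear interpolation in a 3D table. It is not how ArgyllCMS or any particular CMM implements it; `make_lut` and `trilinear` are hypothetical helper names, and the table is a toy identity LUT on a regular grid over [0, 1]. Simplex (tetrahedral) interpolation would blend only 4 of the 8 cell corners and is not shown.

```python
from math import floor

def make_lut(n, fn):
    """Sample fn(r, g, b) -> tuple on an n x n x n grid over [0, 1]^3."""
    step = 1.0 / (n - 1)
    return [[[fn(i * step, j * step, k * step) for k in range(n)]
             for j in range(n)] for i in range(n)]

def trilinear(lut, r, g, b):
    """Multi-linear (here tri-linear) lookup of an RGB triplet in [0, 1]."""
    n = len(lut)
    idx, frac = [], []
    for v in (r, g, b):
        pos = min(max(v, 0.0), 1.0) * (n - 1)
        i0 = min(int(floor(pos)), n - 2)  # keep i0 + 1 inside the grid
        idx.append(i0)
        frac.append(pos - i0)
    channels = len(lut[0][0][0])
    out = [0.0] * channels
    # Blend the 8 corners of the enclosing cell, weighted by proximity.
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((frac[0] if di else 1 - frac[0]) *
                     (frac[1] if dj else 1 - frac[1]) *
                     (frac[2] if dk else 1 - frac[2]))
                node = lut[idx[0] + di][idx[1] + dj][idx[2] + dk]
                for c in range(channels):
                    out[c] += w * node[c]
    return tuple(out)

# A 5-point identity table: linear interpolation should return the input.
identity_lut = make_lut(5, lambda r, g, b: (r, g, b))
result = trilinear(identity_lut, 0.3, 0.55, 0.9)
```

The cost grows as 2^N corners per lookup in N dimensions, which is one reason simplex interpolation (N + 1 corners) is attractive for high-dimensional tables.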

Jack Hogan · « **Reply #21 on:** May 20, 2019, 03:59:41 am »

Quote from: 32BT on May 17, 2019, 04:30:11 am

In that respect it would probably be a better experiment to let the NN solve the entire path from camera RGB to perceptual Lab.

Reading through the thread, this was also my first thought. Possibly stopping at XYZ because of the rudimentary way that Lab deals with adaptation. Although Lab would be useful for a perceptual DE2000-like cost function.

Thinking out loud: get the spectral reflectances of each patch and generate corresponding XYZ/Lab values for a large set of synthetic illuminants (e.g. P1800, P1900, ..., A, P3000, ..., D4000, D4200, D4400, ..., D10000, or maybe just concentrate on daylight or whatever). Then take it from there.

Jack

Jack Hogan · « **Reply #22 on:** May 20, 2019, 04:18:39 am »

Quote from: Guillermo Luijk on May 17, 2019, 06:17:59 am

Since the XYZ to Lab conversion (or to any RGB output colour space) is a well-known deterministic formula (much like WB is), I don't see the point in making it part of the NN.

One of the problems with Lab is that it is not bad but also not good: it is not very perceptually 'accurate' and not very good at adapting for different illuminants (the values are normalized to the white point of the illuminant, i.e. XYZ scaling, which is considered to be the worst kind of adaptation). When you solve for a matrix using the normal equation with reference values in XYZ, results are not nearly as good as when using DE2000 as a cost function, which gives more weight to perceptual effects.

And so called 'WB' (area under SSFs normalized to 1) is a completely arbitrary convention: only one matrix is needed to project to wherever one wants to go, though it may be broken down as the product of 2 or more (for instance M = diag(WB)*rgb_2_xyzW1*xyzW1_2_xyzW2*xyzW2_2_sRGB, shown in reversed order for clarity).

So it would be good if the learning algorithm were able to incorporate the latter while dealing with the non-linear perceptual issues of the former.

Jack

Jack Hogan · « **Reply #23 on:** May 20, 2019, 05:16:11 am »

Quote from: Guillermo Luijk on May 17, 2019, 06:17:59 am

in LUT based ICC profiles, who takes the decision of how many samples are those LUT made of? And secondly: how/where is later decided the way to interpolate between the LUT values? for example when opening profiled image data in Photoshop and assigning it to a given ICC profile. What kind of interpolation used here, just linear or something more sophisticated?

FYI in the DNG/dcp world the look-up tables can be any size but are often 90x30x30 (Hue, Saturation, Value, with Value often gamma encoded). HSV is a cylinder reached from XYZ via ProPhotoRGB. Resulting values are interpolated 'tri-linear'ly.

Jack

Guillermo Luijk · « **Reply #24 on:** May 21, 2019, 03:28:49 pm »

Quote from: Jack Hogan on May 20, 2019, 05:16:11 am

FYI in the DNG/dcp world the look-up tables can be any size but are often 90x30x30 (Hue, Saturation, Value, with Value often gamma encoded). HSV is a cylinder reached from XYZ via ProPhotoRGB. Resulting values are interpolated 'tri-linear'ly.

Jack

That's quite a compact definition Jack! Thanks for all your comments.

I have started trying some NNs for the WB-RAW RGB to XYZ conversion. Brute force for now, without reserving a validation set or worrying about overfitting.

First I tried a simple NN with no hidden layers. This is equivalent to a 3x3 matrix plus three bias terms (want to investigate if the bias terms can be set to 0 before training to get a real 3x3 camera matrix conversion):

Correlations show the good intentions of the linear transform but are far from perfect. I have no clear interpretation of the fact that the largest errors take place along the gray patches. Any idea?

These are the weights of the NN (i.e. the 3x3 RGB to XYZ matrix):
[array([[ 0.79471269, 0.35876139, 0.06018896],
[ 0.27784208, 0.99636603, -0.26390342],
[ 0.12707532, -0.11484679, 1.21830966]])]

and here the bias terms (0..1 range), very close to 0 as expected:
[array([-0.01081357, -0.01022717, -0.00390697])]

Then I tried a dense 2 hidden layers NN with 200 neurons each. The result improves a lot (gray patches get their colour right), but I must confess I expected a perfect fit for such a dense NN and there're still some deviations:

Will work on the XYZ to Lab conversion to measure DeltaE's before training more NN structures.

The code here.

Regards

Jack Hogan · « **Reply #25 on:** May 22, 2019, 03:04:45 am »

Quote from: Guillermo Luijk on May 21, 2019, 03:28:49 pm

Correlations show the good intentions of the linear transform but are far from perfect. I have no clear interpretation of the fact that the largest errors take place along the gray patches. Any idea?

These are the weights of the NN (i.e. the 3x3 RGB to XYZ matrix):
[array([[ 0.79471269, 0.35876139, 0.06018896],
[ 0.27784208, 0.99636603, -0.26390342],
[ 0.12707532, -0.11484679, 1.21830966]])]

and here the bias terms (0..1 range), very close to 0 as expected:
[array([-0.01081357, -0.01022717, -0.00390697])]

Assuming that the input to the matrix is white balanced data and that the matrix is in the form shown in Figure 1 here, when the rgb input is a neutral tone the xyz output will be proportional to the sum of the matrix rows. For instance, with rgb = [1,1,1] you should get the illuminant white point in xyz; rgb = [0.18,0.18,0.18] should result in 0.18 of those coordinates, etc. If you know the white point of the illuminant in XYZ, you only have to solve for 6 variables (vs 9).

The matrix is a compromise, which means that some tones will be better achieved through it than others. If you use the normal equation to solve for the matrix you get the maximum likelihood solution. Your network found another solution, one where the vertex of the cube (white point) is not very accurate: with an overdetermined system like this one if you don't specify a criterion you can end up with an infinite number of them.

In your matrix the white point in XYZ (the sum of rows, normalized so that green is 1) is [1.2013 1.0000 1.2180] , which corresponds to a CCT of 4327K with large Duv of -0.036. Does this sound plausible - in other words, what was the illuminant at the time of capture? Have you tried the normal equation as a reference?
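The row-sum argument can be checked directly on the posted weights. This is only a sketch: it assumes, as stated above, that the matrix is laid out so that XYZ = M · rgb (rows produce outputs), and it ignores the small bias terms. `apply_matrix` is a hypothetical helper.

```python
# Weights as posted in the thread, assumed to be laid out so that XYZ = M @ rgb.
M = [
    [0.79471269, 0.35876139, 0.06018896],
    [0.27784208, 0.99636603, -0.26390342],
    [0.12707532, -0.11484679, 1.21830966],
]

def apply_matrix(m, rgb):
    """Multiply a 3x3 matrix (list of rows) by an rgb column vector."""
    return [sum(c * v for c, v in zip(row, rgb)) for row in m]

white_xyz = apply_matrix(M, [1.0, 1.0, 1.0])     # equals the row sums
gray_xyz = apply_matrix(M, [0.18, 0.18, 0.18])   # 0.18 times the row sums

# Normalize so that Y = 1, as done in the post.
white_norm = [v / white_xyz[1] for v in white_xyz]
print([round(v, 4) for v in white_norm])
```

The normalized triplet agrees with the [1.2013 1.0000 1.2180] quoted above; converting it to CCT and Duv needs a separate routine that is not sketched here.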

If you make the rgb data before white balance available with their reference values I have routines to easily calculate a 'normal' matrix and one based on a de2000 cost function. I am not used to python but use Matlab instead.

Jack

32BT · « **Reply #26 on:** May 22, 2019, 04:26:35 am »

Quote from: Guillermo Luijk on May 21, 2019, 03:28:49 pm

First I tried a simple NN with no hidden layers. This is equivalent to a 3x3 matrix plus three bias terms (want to investigate if the bias terms can be set to 0 before training to get a real 3x3 camera matrix conversion):

Are the output nodes linear functions or sigmoid functions? In the latter case the NN tries to overcome the curve, which it likely can't in such a small NN. Clearly, you don't want that, unless you output to Lab.

32BT · « **Reply #27 on:** May 22, 2019, 04:32:05 am »

Also: are you comparing your specific sample of the IT8 with an average of several IT8? Or with a well measured version of your specific sample?

Guillermo Luijk · « **Reply #28 on:** May 27, 2019, 08:47:54 pm »

Quote from: Jack Hogan on May 22, 2019, 03:04:45 am

If you make the rgb data before white balance available with their reference values I have routines to easily calculate a 'normal' matrix and one based on a de2000 cost function. I am not used to python but use Matlab instead.

Thanks for such valuable information Jack, all this is new to me, will look at it closely. I'll send you the RGB file over the weekend in case you want to have a look at it.

Quote from: 32BT on May 22, 2019, 04:26:35 am

Are the output nodes linear functions or sigmoid functions? In the latter case the NN tries to overcome the curve, which it likely can't in such a small NN. Clearly, you don't want that, unless you output to Lab.

Also: are you comparing your specific sample of the IT8 with an average of several IT8? Or with a well measured version of your specific sample?

For regression with MLP I always use linear output (identity function). In the hidden layers of the second NN, after trying different activation functions I stayed with ReLU (it's also linear in the positive range but clips negative values).
I am comparing the IT8 shot vs an accurate measurement of that precise chart (the author of the chart and of the measurement claims a measurement error below DeltaE=0.1).

I calculated the deltaE for the two NN, and the results seem nice. This is the Delta E distribution over the 288 patches for the linear NN and the dense NN (2 hidden layers with 200 neurons each):

If this is not wrong:
""" ΔE Quality:
<1 = Excellent (imperceptible)
1-2 = Good
2-4 = Normal
4-5 = Sufficient
>5 = Bad
"""

Even the linear solution produces good results for all 288 patches. The deep NN has a Max(DeltaE)=0.064, which is considered really good, right?

Regards

Guillermo Luijk · « **Reply #29 on:** May 27, 2019, 09:35:40 pm »

I just realised I wrote a superb XYZ to Lab routine, but didn't use it before calculating the Delta E values.

So the previous histogram represents euclidean XYZ, not Lab, distances. Will calculate it right tomorrow.
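For readers following along, this is roughly what the missing step looks like. It is a minimal sketch, not the routine from the linked code: it uses the standard CIE formulas, assumes the ICC D50 reference white with Y normalized to 1, and the sample XYZ triplets are made up for illustration.

```python
from math import sqrt

D50_WHITE = (0.9642, 1.0, 0.8249)  # assumed reference white (ICC D50, Y = 1)

def xyz_to_lab(xyz, white=D50_WHITE):
    """CIE XYZ -> CIE L*a*b* relative to the given white point."""
    eps = 216 / 24389    # (6/29)^3
    kappa = 24389 / 27   # (29/3)^3

    def f(t):
        return t ** (1 / 3) if t > eps else (kappa * t + 16) / 116

    fx, fy, fz = (f(v / w) for v, w in zip(xyz, white))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e76(lab1, lab2):
    """dE76: plain Euclidean distance in Lab."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

# Made-up reference and prediction, only to compare the two distance scales.
ref_xyz = (0.20, 0.18, 0.10)
pred_xyz = (0.21, 0.19, 0.11)
xyz_distance = sqrt(sum((a - b) ** 2 for a, b in zip(ref_xyz, pred_xyz)))
lab_distance = delta_e76(xyz_to_lab(ref_xyz), xyz_to_lab(pred_xyz))
```

With XYZ in the 0..1 range and L* in 0..100, a Euclidean distance in XYZ comes out far smaller than the dE76 of the same pair, so the two histograms are not comparable.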

Regards

32BT · « **Reply #30 on:** May 28, 2019, 03:09:52 am »

My advice would be to stop using the 2x200 NN. It really obscures a lot of potentially interesting results. Use instead something like 2x4 with a sigmoid function. (Or at least add it as an additional configuration.)

Guillermo Luijk · « **Reply #31 on:** May 28, 2019, 03:16:43 pm »

Back to reality, when properly calculating Delta E over Lab values the scale gets around 2 orders of magnitude higher:

() NN:
ΔE_max = 29.579874621089253 and ΔE_mean = 3.4460693382616014

(200, 200) NN:
ΔE_max = 7.035319348235826 and ΔE_mean = 1.035222151090219

Yes, I want to try sigmoids and also less complex NN structures. And I'll train the NN to produce straight Lab output; I guess this makes sense for powerful NNs that can model the non-linear transformations while minimizing the loss (error) in the final space where the Delta E values are going to be measured. The ideal training should minimize the Delta E formula, but unfortunately the loss function can't be set arbitrarily.
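One point worth spelling out: if the network is trained on Lab targets with an ordinary squared-error loss (an assumption, since the thread does not state the loss), then the loss is already the mean squared dE76 up to a constant factor of 3. The sketch below checks that identity on hypothetical Lab triplets.

```python
from math import sqrt

def mse(pred, target):
    """Mean squared error over every channel of every patch."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)

def delta_e76(lab1, lab2):
    """dE76: Euclidean distance in Lab."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

# Hypothetical Lab triplets, for illustration only.
target = [(50.0, 10.0, -20.0), (70.0, -5.0, 30.0), (20.0, 0.0, 0.0)]
pred = [(51.0, 9.0, -18.0), (69.5, -4.0, 31.0), (22.0, 0.5, -0.5)]

mean_sq_de = sum(delta_e76(p, t) ** 2 for p, t in zip(pred, target)) / len(pred)
lab_mse = mse(pred, target)

# The two quantities differ only by the number of channels (3).
print(abs(3 * lab_mse - mean_sq_de) < 1e-9)
```

Minimizing the mean of squared dE76 is not the same as minimizing the mean or the maximum dE76, and it says nothing about dE2000, so this only partly addresses the wish for a Delta E loss.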

Regards

Guillermo Luijk · « **Reply #32 on:** May 28, 2019, 07:26:58 pm »

Tested (200, 200) Lab output NN with sigmoid function in hidden layers and identity output:

All three variables (L, a, b) show very low errors, although colour (a, b) is more accurate than lightness (L), which shows higher variance.

(200, 200) NN:
ΔE_max = 5.659828256166601 and ΔE_mean = 0.8287066247766278

It definitely improves the result. Now the mean error is under 1, which I've been told is an excellent result. The max error is also reduced to less than 6.

I also tested the 3x3 matrix model for RGB to Lab conversion, but as expected the linear model is totally unusable in approximating such a non-linear transformation.

Regards

32BT · « **Reply #33 on:** May 28, 2019, 11:56:20 pm »

Ha, now it would be interesting to know the results for 2x3, then 2x4, 2x5, 2x6 etc and see whether it yields an optimum.

Additionally it remains interesting to know how a 1x3 or 1x4 sigmoid would solve rgb to lab...

Guillermo Luijk · « **Reply #34 on:** June 02, 2019, 11:19:31 am »

I have trained several NNs in a nested loop to save time. I don't much like this way of doing things (just trying a grid search over hyperparameter combinations and seeing which one performs best, without knowing why), but here it is. The format is: XYZ/Lab output, NN hidden layers, hidden layers activation function, output activation function:

MLP_XYZ_()_relu_identity : ΔE_max = 29.5782 , ΔE_mean = 3.4459 , ΔE_median = 2.4032
MLP_Lab_()_relu_identity : ΔE_max = 82.2025 , ΔE_mean = 28.3684 , ΔE_median = 21.2329
MLP_XYZ_()_logistic_identity : ΔE_max = 29.5782 , ΔE_mean = 3.4459 , ΔE_median = 2.4032
MLP_Lab_()_logistic_identity : ΔE_max = 82.2578 , ΔE_mean = 28.3821 , ΔE_median = 21.2219
MLP_XYZ_(3, 3)_relu_identity : ΔE_max = 108.3430 , ΔE_mean = 41.8331 , ΔE_median = 37.0897
MLP_Lab_(3, 3)_relu_identity : ΔE_max = 112.5269 , ΔE_mean = 42.6017 , ΔE_median = 39.5577
MLP_XYZ_(3, 3)_logistic_identity : ΔE_max = 23.9131 , ΔE_mean = 4.5645 , ΔE_median = 2.9977
MLP_Lab_(3, 3)_logistic_identity : ΔE_max = 77.6024 , ΔE_mean = 25.7275 , ΔE_median = 20.5809
MLP_XYZ_(50, 50)_relu_identity : ΔE_max = 13.7616 , ΔE_mean = 2.1762 , ΔE_median = 1.6024
MLP_Lab_(50, 50)_relu_identity : ΔE_max = 12.9015 , ΔE_mean = 3.6170 , ΔE_median = 3.1430
MLP_XYZ_(50, 50)_logistic_identity : ΔE_max = 22.5918 , ΔE_mean = 4.0708 , ΔE_median = 2.6891
MLP_Lab_(50, 50)_logistic_identity : ΔE_max = 6.0237 , ΔE_mean = 0.9943 , ΔE_median = 0.6923
MLP_XYZ_(200, 200)_relu_identity : ΔE_max = 7.0373 , ΔE_mean = 1.0364 , ΔE_median = 0.6827
MLP_Lab_(200, 200)_relu_identity : ΔE_max = 7.4150 , ΔE_mean = 1.1333 , ΔE_median = 0.8822
MLP_XYZ_(200, 200)_logistic_identity : ΔE_max = 14.8826 , ΔE_mean = 2.7814 , ΔE_median = 1.8480
MLP_Lab_(200, 200)_logistic_identity : ΔE_max = 5.6598 , ΔE_mean = 0.8287 , ΔE_median = 0.4912
MLP_XYZ_(200, 200, 200)_relu_identity : ΔE_max = 6.3270 , ΔE_mean = 1.2530 , ΔE_median = 0.7609
MLP_Lab_(200, 200, 200)_relu_identity : ΔE_max = 7.3421 , ΔE_mean = 0.9603 , ΔE_median = 0.7042
MLP_XYZ_(200, 200, 200)_logistic_identity : ΔE_max = 14.4747 , ΔE_mean = 2.7297 , ΔE_median = 1.9047
MLP_Lab_(200, 200, 200)_logistic_identity : ΔE_max = 5.6715 , ΔE_mean = 0.7346 , ΔE_median = 0.3988

I find that the best tradeoff between complexity and performance is:
MLP_Lab_(50, 50)_logistic_identity : ΔE_max = 6.0237 , ΔE_mean = 0.9943 , ΔE_median = 0.6923
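The grid listed above can be enumerated as follows. This is only a sketch of the bookkeeping, under the assumption that the loop order was layers, then activation, then output space (which reproduces the order of the list); it does not train anything, and the dictionary keys are hypothetical names.

```python
from itertools import product

outputs = ("XYZ", "Lab")
hidden_layers = ((), (3, 3), (50, 50), (200, 200), (200, 200, 200))
activations = ("relu", "logistic")

# One entry per configuration; each would be handed to the model of choice.
configs = [
    {
        "name": f"MLP_{out}_{layers}_{act}_identity",
        "output": out,
        "hidden_layers": layers,
        "activation": act,
    }
    for layers, act, out in product(hidden_layers, activations, outputs)
]

for cfg in configs:
    print(cfg["name"])
```

Keeping the configuration in a dictionary next to its name makes it easy to store the resulting ΔE statistics alongside it and sort afterwards.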

Training loss:

Prediction vs Real correlation:

Again L seems to contain more errors than colour (a, b). I haven't yet checked which patches worked best and worst.

Delta E distribution:

I also defined and checked some needed conversion functions:

XYZ (D50) to Lab conversion
Lab to XYZ (D50) conversion
XYZ (D50) to sRGB (D65) conversion
XYZ (D50) to ProPhoto RGB (D50) conversion
Delta E calculation

I did a complete prediction with the NN over the input RAW values, and compared it to the expected theoretical values (right half rectangle on each patch). Obviously something went wrong because large errors are clearly visible. I need to check where the fault is.

Regards

32BT · « **Reply #35 on:** June 02, 2019, 11:54:53 am »

Quote from: Guillermo Luijk on June 02, 2019, 11:19:31 am

I did a complete prediction with the NN over the input RAW values, and compared it to the expected theoretical values (right half rectangle on each patch). Obviously something went wrong because large errors are clearly visible. I need to check where the fault is.

Good progress, and well done.

A useful observation is that gray at least remains neutral. If you figure out why it is too bright, it may show you why the colors are off. Something as simple as clipping may be the culprit.

What surprises me though is that (3, 3) doesn't give better results already. Is the logistic node positive output only? I'm pretty sure that something like (3, 4, 4, 3) should absolutely suffice. The additional layers may be necessary for the NN to scale the values internally. The 4-node layers should be optimal in the same way that affine transforms are 4x4 matrices. If you allow the NN to contort the colour cube using 4 degrees of freedom (if you will) then it should absolutely be able to come close to perfect.

Jack Hogan · « **Reply #36 on:** June 03, 2019, 11:28:46 am »

Quote from: Guillermo Luijk on May 28, 2019, 03:16:43 pm

Back to reality, when properly calculating Delta E over Lab values the scale gets around 2 orders of magnitude higher:

() NN: ΔE_max = 29.579874621089253 and ΔE_mean = 3.4460693382616014

Thanks for the iT8 capture and relative XYZ reference values Guillermo. The matrix that takes white-balanced raw data to XYZ suggested by the Normal Equation is

0.5740 0.2535 0.0404
0.2023 0.7195 -0.1887
0.0769 -0.0907 0.8439

Note that feeding it white balanced raw white [1,1,1] results in values lower than expected (if everything is balanced there should be at least one entry near 1), [0.8680 0.7331 0.8301]. CCT of the matrix is 4141K, -.032Duv, not far from what your NN found. RawTherapee and FRV white balance readings report about 4400K/14 CCT, suggesting that the lighting at the time of capture was most likely not a good approximation of D50.
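For readers unfamiliar with the term, the Normal Equation here is ordinary least squares without a bias term: with one patch per row, solve (RGBᵀ·RGB)·W = RGBᵀ·XYZ and take M = Wᵀ. The sketch below is not Jack's Matlab routine; it is a plain-Python illustration on synthetic patches generated from a made-up matrix, and every function name is hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def solve(a, b):
    """Solve a @ x = b (a square) by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def normal_equation_matrix(rgb, xyz):
    """Least-squares M such that xyz ~= M @ rgb for every patch (no bias)."""
    rgb_t = transpose(rgb)
    # Each patch is a row, so xyz = rgb @ W and M is the transpose of W.
    w = solve(matmul(rgb_t, rgb), matmul(rgb_t, xyz))
    return transpose(w)

# Synthetic demonstration: a made-up matrix and made-up patches, not thread data.
true_m = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.0, 0.1, 0.9]]
rgb = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.5, 0.5, 0.5], [0.2, 0.6, 0.1]]
xyz = [[sum(c * v for c, v in zip(row, p)) for row in true_m] for p in rgb]

fitted = normal_equation_matrix(rgb, xyz)
```

Because the synthetic XYZ values are exactly linear in rgb, the fit recovers the made-up matrix; with real chart data the result is the compromise matrix that minimizes the squared XYZ error, which is not the same as minimizing a perceptual metric.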

If I feed the raw data, white balanced on most neutral patch GS11 around mid-gray and normalized so that it is the same value as the relative reference Y, to the dE76 optimizing routine using the XYZ spectral measurements you provided - but White Point provided by a blackbody at 4400K (that's as good a guess as any at this point) - we get the first attachment, which is comparable to what you did. If I change the WP to a blackbody at 5000K little changes. Most of the action is at less than 10 dE76, I have never played with a target with so many patches so I don't know whether this is good or poor performance: Max dE76 21.63, mean dE76 3.59, SMI 80.5.

I prefer dE2000 as a metric, so using it and a P5000K WP results in the second attachment. This last one (or one with the correct WP) is the one I would use in practice, and this is the resulting matrix from wbraw to sRGB(D65) under this lighting:

1.6054 -0.5544 -0.0510
-0.1025 1.3965 -0.2940
0.0109 -0.4450 1.4341

But back to the point about raw matrix values being low: roughly 85% of expected from the normal equation and k of about 86% in the attachments suggests uneven lighting. I did not adjust the image for light gradients, did you? There is apparently about a 6% gradient from the white square near A1 (9770 DN) to the one near L22 (10370 DN). Also when the image fills so much of the frame it is likely that there is some light falloff that could mess with captured values compared to the well controlled spectro measurements even at f/16. Lack of linearity in the capture may be a hint as to the errors you are seeing in your side-by-side comparison.

Jack

Jack Hogan · « **Reply #37 on:** June 03, 2019, 01:05:00 pm »

Comparison Adobe DNG Forward Matrix + HSV table rendition vs the raw data rendered by the wbraw->xyzD50 matrix above only, this time to Adobe RGB (so gotta look at it with a properly color managed viewer). Some differences, mostly in the darker tones, mine are the darker ones.

Guillermo Luijk · « **Reply #38 on:** June 03, 2019, 05:27:21 pm »

Thanks for the good stuff Jack. I definitely need to better understand the implications of white balance and capture lighting in the whole process. Will read your article and posts carefully.
I didn't correct for light gradients but checked the four gray squares in the RAW data and they seemed OK to me (top left, top right / bottom left, bottom right values):
(35,35,34) (35,35,35)
(36,36,35) (35,35,35)

Regarding the appropriate patches to calculate WB, I used the average of all of them, discarding GS0 and GS1, which according to the measurements were by far the least accurate (GS7 seems best):

Surprisingly, I ran the prediction again from scratch and got this:

I think in the earlier wrong prediction I applied the NN over an already converted ProPhoto RGB version (DCRAW output). Anyway, the neutral patches are clearly the weak point of the prediction. I need to understand this, especially why only the gray patches seem to have large L errors. I could understand it if all patches did (L has a different scale from a/b after all, and I didn't normalise the Lab data to train the NN), but only the neutral ones?

Maybe the key is that the predicted GS patches are darker than the exact ones, while the prediction for column 16 is lighter than the exact values, and this could be fooling the NN if the measured chart values mismatch. I.e. for some reason (spectrophotometer vs camera behaviour differences vs printed inks) lower captured RAW values correspond to brighter patches and vice versa:

Patch L16 has higher L values than GS23 in RAW_WB, but a lower L value in the theoretical chart:
- RAW_WB: L16=12.94 GS23=11.38
- Lab: L16=3.66 GS23=6.72

There is no continuous solution for such a crossover. I will train the NN alternately dropping column 16 and then the GS patches.

Regarding the Delta E calculation, I looked at the dE2000 metric and got lost in the formulation. It would be great if it could be used as a loss function for the NN training, but just for testing purposes I'll stick with the primitive dE76.

Regards!

32BT · « **Reply #39 on:** June 04, 2019, 02:19:14 am »

Could it be an indexing problem? Some array index off by 1?
