Topic: Camera calibration using a neural network: questions (Read 11252 times)

Jack Hogan · « **Reply #60 on:** June 06, 2019, 12:11:19 pm »

Thanks for the new reference data Guillermo. For completeness the linear 3x3 matrix fit to the V3 reference spectro measurements results in the following:
dE76 mean = 3.569, dE76 median = 2.461, dE76 max = 21.709
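
For anyone wanting to reproduce this kind of summary, here is a minimal sketch of how dE76 mean, median and max can be computed from paired Lab values. It is not Jack's actual code, and the Lab triplets are toy values for illustration only, not the chart data.

```python
import math
import statistics

def delta_e76(lab1, lab2):
    """CIE76 colour difference: plain Euclidean distance in Lab."""
    return math.dist(lab1, lab2)

def de76_summary(predicted, reference):
    """Return (mean, median, max) dE76 over paired Lab triplets."""
    errors = [delta_e76(p, r) for p, r in zip(predicted, reference)]
    return statistics.mean(errors), statistics.median(errors), max(errors)

# Toy Lab values (hypothetical, NOT the IT8 measurements).
reference = [(50.0, 0.0, 0.0), (60.0, 10.0, -10.0), (30.0, -5.0, 5.0)]
predicted = [(50.0, 3.0, 4.0), (60.0, 10.0, -10.0), (31.0, -5.0, 5.0)]

mean_de, median_de, max_de = de76_summary(predicted, reference)
print(mean_de, median_de, max_de)
```

A large gap between mean and max, as in the figures quoted here, usually points at a few badly fitted patches rather than a uniformly poor fit.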

The k value is now 0.961, meaning that we no longer have the earlier lightness red flag. SMI is 80.4, low by CC24 standards. The matrix from wbraw to D50 is
0.6756 0.2306 0.0514
0.2928 0.8662 -0.1590
0.0270 -0.1711 0.9726

The result of the Normal Equation is
0.6245 0.2767 0.0452
0.2213 0.8011 -0.2165
0.0851 -0.1089 0.9724
which yields a CCT of 4441K with a Duv of -0.03186.
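
For readers unfamiliar with the term: the Normal Equation gives the least-squares matrix in closed form, X = (AᵀA)⁻¹AᵀB, where the rows of A are the raw triplets and the rows of B the target triplets. The sketch below is a pure-Python illustration with made-up data; the helper names and the toy matrix are assumptions, not the fit that produced the numbers in this post.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b (a square) by Gauss-Jordan with partial pivoting."""
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_matrix(raw, target):
    """Least-squares 3x3 matrix M such that target ~= M @ raw for each patch."""
    at = transpose(raw)
    # Normal equation: (A^T A) X = A^T B, where X is the transpose of M.
    x = solve(matmul(at, raw), matmul(at, target))
    return transpose(x)

# Toy data (hypothetical): targets generated from a known matrix, no noise.
true_m = [[0.6, 0.3, 0.1], [0.2, 0.9, -0.1], [0.0, -0.2, 1.0]]
raw = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
       [0.5, 0.5, 0.5], [0.2, 0.7, 0.4]]
target = [[sum(m * c for m, c in zip(row, rgb)) for row in true_m] for rgb in raw]

fitted = fit_matrix(raw, target)  # recovers true_m up to rounding
```

Note that this minimises squared error in the target space (e.g. XYZ), which is not the same as minimising dE in Lab.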

The matrix from wbraw->sRGB is
1.6410 -0.5781 -0.0629
-0.0998 1.4205 -0.3207
0.0191 -0.4207 1.4015
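
A quick sanity check on the matrix above: each row sums to roughly 1, so a white-balanced neutral stays neutral after conversion. The sketch assumes the matrix is applied to white-balanced raw triplets scaled to 0..1 and that the output is linear sRGB (before any tone curve); `apply_matrix` is just an illustrative helper.

```python
# wbraw -> sRGB matrix as posted above.
WBRAW_TO_SRGB = [
    [1.6410, -0.5781, -0.0629],
    [-0.0998, 1.4205, -0.3207],
    [0.0191, -0.4207, 1.4015],
]

def apply_matrix(matrix, rgb):
    """Multiply a 3x3 matrix by an RGB column vector."""
    return [sum(m * c for m, c in zip(row, rgb)) for row in matrix]

# A white-balanced neutral: equal raw channels.
neutral = apply_matrix(WBRAW_TO_SRGB, [1.0, 1.0, 1.0])
print(neutral)  # each component comes out within 0.001 of 1.0
```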

Jack

Guillermo Luijk · « **Reply #61 on:** June 07, 2019, 07:19:35 pm »

Hi guys, I did a quick test to check how well the NN interpolates unseen colours. I took the 24 gray patches and linearly interpolated 20 values between every pair of patches in their RGB_WB values. Then I ran the prediction on the extended data set (i.e. original gray patches + their interpolations) and plotted everything (in this case RAW_WB G values vs L):

blue is the exact L value
black is the NN prediction for seen patches (predictions are a bit lower than exact L for the brightest patches, and a tiny bit higher for the darkest patches, something we already noticed in the comparison chart)
red are the NN predictions for the interpolated patches

If I did it right, this is very good news. I cannot actually say whether the L output follows the linearly interpolated patches exactly as it should, but one can say there is no ringing or any other unstable behaviour between seen patches.
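
The interpolation step described above can be sketched as follows. This is a minimal illustration, assuming "every pair" means consecutive patches once sorted by brightness; `lerp_patches` and the three toy gray values are hypothetical, not the code or data used for the plot.

```python
def lerp_patches(patches, steps=20):
    """Insert `steps` linearly interpolated triplets between each consecutive pair."""
    out = []
    for a, b in zip(patches, patches[1:]):
        out.append(a)
        for i in range(1, steps + 1):
            t = i / (steps + 1)  # strictly between 0 and 1
            out.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    out.append(patches[-1])
    return out

# Toy RGB_WB gray patches (hypothetical values).
gray = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
extended = lerp_patches(gray, steps=20)
print(len(extended))  # 3 originals + 2 * 20 interpolated = 43
```

Under the same assumption, 24 patches would give 24 + 23 * 20 = 484 samples to feed to the network.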

Tomorrow I will do the same with some more colorful patches, and will answer your comments.

Regards

Guillermo Luijk · « **Reply #62 on:** June 08, 2019, 06:51:36 pm »

Quote from: Jack Hogan on June 06, 2019, 03:20:40 am

I am curious as to how such a network would perform with a non-linear output activation function, say tanh since it seems to work well. I am asking because the neutrals are still not quite right - perhaps because L is not linear (identity)?

As far as I know, the preferred output activation function (this is the function used in the last layer, the one that provides the output values; L, a and b in our case) for numerical regression is the identity. In other words, it is the rest of the network which is in charge of modelling the non-linearities, the output layer just performs a linear combination over its inputs. Non-linear output activation functions (like sigmoid or tanh) are preferred for logistic regression problems (i.e. classification problems). But I can do a quick test using sigmoid and tanh in the output layer.
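
To make the point about the output layer concrete, here is a small forward pass with tanh hidden units and an identity output. The weights are hand-picked for illustration (not trained, and not taken from any model in this thread); the sketch only shows that an identity output can reach values such as L=100 or negative a/b, while a tanh or sigmoid output would be squeezed into a fixed range.

```python
import math

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def identity(v):
    return v

def mlp_forward(rgb, layers):
    """layers: list of (weights, biases). tanh on hidden layers, identity on the last."""
    values = rgb
    for i, (w, b) in enumerate(layers):
        act = identity if i == len(layers) - 1 else math.tanh
        values = dense(values, w, b, act)
    return values

# Hypothetical weights: one hidden layer (3 -> 2), output layer (2 -> 3).
layers = [
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0]),
    ([[200.0, 0.0], [0.0, 200.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
]

out = mlp_forward([10.0, 0.0, 0.0], layers)
# out[0] is close to 200: the identity output just rescales the bounded tanh feature.
```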

Quote from: Jack Hogan on June 06, 2019, 12:11:19 pm

For completeness the linear 3x3 matrix fit to the V3 reference spectro measurements results in the following:
dE76 mean = 3.569, dE76 median = 2.461, dE76 max = 21.709

That is clearly better, but in the same order of magnitude as the linear NN (MLP_XYZ_()).
With a fairly simple NN (MLP_XYZ_(3, 3)_tanh_identity) the NN beats the optimum linear fit.

MLP_XYZ_()_tanh_identity : ΔE_max = 33.6562 , ΔE_mean = 3.2097 , ΔE_median = 1.8020
MLP_XYZ_(3, 3)_tanh_identity : ΔE_max = 15.3840 , ΔE_mean = 2.4645 , ΔE_median = 1.8418
MLP_XYZ_(4, 4)_tanh_identity : ΔE_max = 11.2728 , ΔE_mean = 2.0082 , ΔE_median = 1.5386

Quote from: 32BT on June 06, 2019, 03:49:22 am

What we might be seeing is a combination of overfitting and the inability of the NN to properly represent the Lab gamma curve.

1. Overfitting
If you look at the attached annotation on your L graph, you can see we have outliers (the arrows) but not what seems as random deviation.

2. Gamma curve
In the same attachment in the circle you can see something that looks like ringing. I suspect that this is a result of an inability to properly represent Lab gamma. The tanh activation curve looks somewhat like the gamma curve, but isn't. (Nor is it a linear transition in case of XYZ).

Now, in my never humble opinion I would assess the results as follows:
(50, 50) allows too much variation in curves and fitting. There are several reasons you should NOT want to make the layers that large. One vitally important reason is that NN is supposed to encode patterns compactly that are either too large for us to comprehend or too hard for us to understand, or both. By applying large NN layers for what is essentially a really simple linear matrix conversion, we are not making the solution elegantly small and succinct.

So, in this case I would ask myself what would be necessary for the NN to better match the gamma curve (or the linear curve) which I suspect will better match the overall model without overfitting? Keeping it elegantly small?

My answer would be: add another hidden layer. The NN probably just needs another step for better matching the gamma curves. And, to keep it as small as possible, I would first try (4, 4, 4) and then if it confirms the suspicion, reduce to (4, 4, 3), (3, 4, 3), and maybe (3, 3, 3).

Those are very interesting insights, I will give them a try. But with deep NNs like (200,200,200) there was no improvement, so surely the complexity of the NN was far beyond the complexity of the problem.

What we might be having here is just an inaccurate gamma curve fit in the low end, with the undesired overfitting maybe caused by 'noise': samples that are less accurate because of noise and the influence of undesired reflections in the IT8 capture. If we look at the somewhat gamma-like curve I plotted in my previous post (output L values vs input RAW_WB G), the curve doesn't converge softly to (G=0, L=0), and it should. Instead, low G values correspond to even lower than expected L values, so the NN seems to be clipping the shadows. This makes me think the RAW file in the dark shadows could be contaminated by some degree of reflection on the chart. A possible solution would be to drop the darkest patches from the training set (they are not respecting the sensor's linear response), and synthetically introduce (R=0, G=0, B=0) -> (L=0, a=0, b=0) examples in the training set, because we really need L=0 in the absence of light, but not before that.
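
The idea of dropping the darkest patches and adding a synthetic black can be sketched like this. The threshold `min_g`, the function name and the sample values are all hypothetical; which patches actually count as "too dark" would have to be decided by looking at the data.

```python
def prepare_training_set(samples, min_g=0.02):
    """samples: list of ((R, G, B), (L, a, b)) pairs.

    Drops patches whose raw G is below min_g (assumed contaminated by
    reflections) and appends one synthetic pure-black example.
    """
    kept = [(rgb, lab) for rgb, lab in samples if rgb[1] >= min_g]
    kept.append(((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
    return kept

# Toy samples (hypothetical values, not the IT8 capture).
samples = [
    ((0.010, 0.012, 0.011), (2.0, 0.1, -0.2)),   # very dark patch, dropped
    ((0.200, 0.210, 0.190), (50.0, 1.0, 2.0)),
    ((0.800, 0.820, 0.790), (90.0, 0.5, 1.5)),
]

training = prepare_training_set(samples)
```

If a single synthetic sample carries too little weight during training, repeating it several times is one simple (untested here) way to increase its influence.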

A similar kind of issue may be taking place in the highlights: the NN has not been trained with a (R=255, G=255, B=255) -> (L=100, a=0, b=0) example, nor with partial saturations (some channel clipped while the others are fine). This may explain this undesired behaviour in partially clipped highlights (this is the NN RAW RGB_WB to Lab output, later to ProPhotoRGB):

Just using input values with -0.5EV exposure, the problem is not there:

Anyway, I think this problem is more complex to fix than the low end one, and solving all possible cases of clipped highlights is out of the scope of the exercise. In fact RAW developers need to implement complex highlight strategies to deal with this problem.

---

Before doing more simulations or picking some patch pairs to predict the interpolated colours between them, I did a brute force exercise feeding the NN with all possible RGB 8-bit combinations in a synthetic image by Bruce Lindbloom, which shows smooth gradients:

After being transformed by the NN, we again get smooth gradients in the output, which makes me think again that the NN is not oscillating because of overfitting when predicting in-between colours:

Maybe I'm oversimplifying my conclusions here, but if the NN were generating unstable outputs for unseen colours, I think we should see that behaviour here, do you agree?

Regards

Guillermo Luijk · « **Reply #63 on:** June 08, 2019, 08:38:57 pm »

Regarding the white and black point definition, reading here it seems common when creating profiles to synthetically introduce black and white points:

Add perfect synthetic D50 white and black color patches to the ti3 file
We don't want colprof to use any of the target chart color patches to set the media white and black points. Rather we want colprof to use D50 white as the media white point, and solid black as the media black point. When using a scale from 0 to 100, D50 white has the XYZ values (96.4200, 100.000, 82.4910), and solid black has the XYZ values (0.0, 0.0, 0.0). So we'll add two lines to the ti3 file:

00W 96.4200 100.000 82.4910 100.000 100.000 100.000 0.000000 0.000000 0.000000
00B 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.000000 0.000000 0.000000

I am a bit confused about which XYZ values correspond to the exact white point using the D50 illuminant:
D50 0.96422 1.00000 0.82521
or
ICC 0.9642 1.0000 0.8249

With my conversion formula taken from Bruce Lindbloom:
Lab=(100, 0, 0) -> XYZ=(0.96422, 1, 0.82521)

But on many sites I read the D50 illuminant is: 0.9642, 1.0000, 0.8249

They are very close but which one is the genuine XYZ (D50) reference white?
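
A short sketch of the Lab to XYZ conversion shows where the difference comes from: Lab=(100, 0, 0) always maps back to whatever reference white is plugged into the formula. The code follows the usual CIE formulation with the exact rational constants; the function and constant names are illustrative.

```python
EPSILON = 216 / 24389   # CIE threshold
KAPPA = 24389 / 27      # CIE slope of the linear segment

def lab_to_xyz(lab, white):
    """Convert Lab to XYZ relative to the given reference white."""
    lightness, a, b = lab
    fy = (lightness + 16) / 116
    fx = fy + a / 500
    fz = fy - b / 200
    xr = fx ** 3 if fx ** 3 > EPSILON else (116 * fx - 16) / KAPPA
    yr = fy ** 3 if lightness > KAPPA * EPSILON else lightness / KAPPA
    zr = fz ** 3 if fz ** 3 > EPSILON else (116 * fz - 16) / KAPPA
    return tuple(r * w for r, w in zip((xr, yr, zr), white))

# The two candidate whites quoted above.
WHITE_D50 = (0.96422, 1.0, 0.82521)
WHITE_ICC = (0.9642, 1.0, 0.8249)

print(lab_to_xyz((100.0, 0.0, 0.0), WHITE_D50))
print(lab_to_xyz((100.0, 0.0, 0.0), WHITE_ICC))
```

So the formula itself does not favour either triplet; what matters is using the same white that the reference Lab data was computed with.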

Regards

32BT · « **Reply #64 on:** June 09, 2019, 02:42:57 am »

The ICC white is the correct version, because all other data will be using it. i.e. the reference data will have been measured and stored with the icc version.

32BT · « **Reply #65 on:** June 09, 2019, 03:11:23 am »

I'm not really sure what the ti3 does, but for your purposes here, you can safely ignore the pure black and "pure white" references. First of all, your experiment is not about those details, and secondly it won't make a difference in training, considering the relative contribution of 1 sample in the entire training set.

To mitigate the clipping effect, simply apply channel clipping after conversion.
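
Channel clipping in this sense can be as simple as clamping each output channel independently. A minimal sketch, assuming output RGB values are scaled to 0..1 (the range and the function name are assumptions):

```python
def clip_channels(rgb, low=0.0, high=1.0):
    """Clamp each channel independently to [low, high]."""
    return tuple(min(max(c, low), high) for c in rgb)

print(clip_channels((1.2, 0.5, -0.1)))  # (1.0, 0.5, 0.0)
```

Clamping channels independently can shift the hue of partially clipped colours, which is one reason RAW developers use more elaborate highlight strategies.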

I have been fooling around with TensorFlow and Colab to create a sheet for a simple Linear to Perceptual test case. We might be able to see what minimum complexity is required for a Y to L match. Will post later today.

32BT · « **Reply #66 on:** June 09, 2019, 03:18:47 am »

Quote from: Guillermo Luijk on June 08, 2019, 06:51:36 pm

As far as I know, the preferred output activation function (this is the function used in the last layer, the one that provides the output values; L, a and b in our case) for numerical regression is the identity. In other words, it is the rest of the network which is in charge of modelling the non-linearities, the output layer just performs a linear combination over its inputs. Non-linear output activation functions (like sigmoid or tanh) are preferred for logistic regression problems (i.e. classification problems). But I can do a quick test using sigmoid and tanh in the output layer.

The output layer is usually meant to move and scale the result back to desired output range. Introducing non-linearities limited to unity is mostly not helpful. (In classification problems it can act as an additional filter slope, in which case it is helpful.)

32BT · « **Reply #67 on:** June 09, 2019, 03:39:41 am »

Quote from: Guillermo Luijk on June 08, 2019, 06:51:36 pm

Maybe I'm oversimplifying my conclusions here, but if the NN would be generating unstable outputs for unseen colours, I think we should see that behaviour here, do you agree?

Yes, agree. But we have to define what undesirable output means. The ringing in our case here is not excessive. What it might generate is small bands of slightly off color steps in a gradient. So you could perhaps try a granger rainbow to see what it does in that case.

You may be right that what appears as ringing occurs as a result of overfitting noise or unstable dark patches. At least for L. For the experiment it is interesting to note that: IF we don't know the actual model, how do we assess our results? It is precisely because we try to model a smooth curve that we know that the fluttering is undesirable. But, if we want to additionally model non-linearities that may occur in camera capture (non-linearities that would ordinarily not be covered by the normal matrix conversions), how do we know our measure of smoothness?

Jack Hogan · « **Reply #68 on:** June 09, 2019, 03:51:05 am »

Quote from: Guillermo Luijk on June 08, 2019, 08:38:57 pm

They are very close but which one is the genuine XYZ (D50) reference white?

As always with color the answer is less than obvious. The White Point in XYZ is given by the Spectral Power Distribution of the illuminant times the XYZ Color Matching Functions. So one has a few choices:

1) What range of wavelengths should this be limited to? (normally 380:780nm)
2) How frequent is the sampling of the SPD/CMF? (normally 1 or 5nm, but 10 is also used)
3) What XYZ CMF should be used (1931 2 deg, vs more recent/accurate, vs ...)?
4) What SPD should be used? (normally the standard, related to the chromaticity of the WP, which depends on the CMF. It is sampled every 10nm)

Normally I use the 1931 2 deg CMFs for consistency with other published data (e.g. xy coordinates), interestingly they also provide slightly better fits than the CIE2006 version. Therefore the biggest variation comes from 1), the range of wavelengths used. Since the spectrometer I use provides data in the 400:730nm range, oversampled every 3.33333nm, it only makes sense to me to calculate all values (including recalculating Lab references) in that range only for best results, interpolating all curves linearly down to 1nm. In this case

XYZ_D50 = [0.9638 1.0000 0.8229]

But wait, now xy WP has changed, resulting in a slightly different D50 SPD... So as you can see there is a certain amount of wiggle room and perhaps it is not worthwhile to worry too much about the last couple decimal places.
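
The dependence on wavelength range can be illustrated with a bare-bones version of the "SPD times CMF" sum, normalised so that Y = 1. The three-sample arrays are invented toy numbers, NOT CIE tables or a real D50 SPD; they only show that truncating the range changes the result.

```python
def white_point(spd, cmf_x, cmf_y, cmf_z):
    """XYZ of an illuminant from equally spaced samples, normalised to Y = 1.

    All four lists must be sampled at the same wavelengths.
    """
    x = sum(s * c for s, c in zip(spd, cmf_x))
    y = sum(s * c for s, c in zip(spd, cmf_y))
    z = sum(s * c for s, c in zip(spd, cmf_z))
    return (x / y, 1.0, z / y)

# Toy data (hypothetical, three wavelength samples only).
spd   = [1.0, 2.0, 1.0]
cmf_x = [0.2, 0.5, 0.1]
cmf_y = [0.1, 0.6, 0.2]
cmf_z = [0.7, 0.1, 0.0]

full = white_point(spd, cmf_x, cmf_y, cmf_z)
# Same illuminant, but with the shortest-wavelength sample cut off.
truncated = white_point(spd[1:], cmf_x[1:], cmf_y[1:], cmf_z[1:])
```

With real tables the effect is far smaller than in this toy case, since the CMFs are close to zero at the ends of the visible range.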

Jack

Jack Hogan · « **Reply #69 on:** June 09, 2019, 04:06:22 am »

Quote from: 32BT on June 09, 2019, 03:18:47 am

The output layer is usually meant to move and scale the result back to desired output range. Introducing non-linearities limited to unity is mostly not helpful. (In classification problems it can act as an additional filter slope, in which case it is helpful.)

You obviously know NNs, Oscar, while I am merely an interested observer. My suggestion of using a non-linear output layer stems from having seen that the visibly worst offenders seem to be in the brighter neutral tones. Perhaps this is due to the fact that the network has learned to deal with the much more numerous changes further down the curve - but not up there. Since the identity activation function in the output layer is a fixed weighted sum of learned features perhaps it does not have enough oomph to make it up there (and possibly not enough un-oomph to make it down to the deepest shadows Guillermo), hence the suggestion of a non-linear activation function to help it along. Does this make sense?

It would be interesting to see whether some of the other networks that did not do so well by the generic metrics actually generalize better with extreme tones.

Jack

Jack Hogan · « **Reply #70 on:** June 09, 2019, 04:27:35 am »

Quote from: 32BT on June 09, 2019, 03:39:41 am

But, if we want to additionally model non-linearities that may occur in camera capture (non-linearities that would ordinarily not be covered by the normal matrix conversions), how do we know our measure of smoothness?

Well, thinking aloud, current imaging systems can pretty well be considered to be linear. Any resulting non-linearities are due to the fact that the camera's SSFs typically are not a simple linear transformation away from the standard observer's eye SSFs (or CMFs in XYZ if one wants to stick with CIE conventions). So our measure of smoothness is by definition smoothness as perceived by the standard observer, which brings us back to known metrics of just noticeable color differences (like MacAdam Ellipses or similar).

Jack

32BT · « **Reply #71 on:** June 09, 2019, 04:39:24 am »

Quote from: Jack Hogan on June 09, 2019, 03:51:05 am

As always with color the answer is less than obvious. The White Point in XYZ is given by the Spectral Power Distribution of the illuminant times the XYZ Color Matching Functions. So one has a few choices:

1) What range of wavelengths should this be limited to? (normally 380:780nm)
2) How frequent is the sampling of the SPD/CMF? (normally 1 or 5nm, but 10 is also used)
3) What XYZ CMF should be used (1931 2 deg, vs more recent/accurate, vs ...)?
4) What SPD should be used? (normally the standard, related to the chromaticity of the WP, which depends on the CMF. It is sampled every 10nm)

Normally I use the 1931 2 deg CMFs for consistency with other published data (e.g. xy coordinates), interestingly they also provide slightly better fits than the CIE2006 version. Therefore the biggest variation comes from 1), the range of wavelengths used. Since the spectrometer I use provides data in the 400:730nm range, oversampled every 3.33333nm, it only makes sense to me to calculate all values (including recalculating Lab references) in that range only for best results, interpolating all curves linearly down to 1nm. In this case

XYZ_D50 = [0.9638 1.0000 0.8229]

But wait, now xy WP has changed, resulting in a slightly different D50 SPD... So as you can see there is a certain amount of wiggle room and perhaps it is not worthwhile to worry too much about the last couple decimal places.

Jack

That's not the issue. The spectral distribution of D50 is very precisely defined by the CIE. The difference is a result of emissive vs reflective conversion.

However, with nr 3 you are touching on an interesting point: I always thought that white point issues are very obviously not a 2-degree observer problem, so you could say that perhaps the 10-degree matching functions are better suited, but then we neither have the corresponding tri-color models, nor do we know whether the results can be mixed indiscriminately with 2-degree observer logic.

Isn't colormanagement a wonderful swamp of uncertainties? It's a small wonder that it works at all...

32BT · « **Reply #72 on:** June 09, 2019, 04:48:11 am »

Quote from: Jack Hogan on June 09, 2019, 04:27:35 am

Well, thinking aloud, current imaging systems can pretty well be considered to be linear. Any resulting non-linearities are due to the fact that the camera's SSFs typically are not a simple linear transformation away from the standard observer's eye SSFs (or CMFs in XYZ if one wants to stick with CIE conventions). So our measure of smoothness is by definition smoothness as perceived by the standard observer, which brings us back to known metrics of just noticeable color differences (like MacAdam Ellipses or similar).

Jack

Yes, but I mean: how do we know we have achieved the proper smoothness relative to the noise of observations and capture? It's easy to define linearity and gamma, and then provide a stable match. We know the desired smoothness (mathematically). Once achieved, it is transferable to other cases. However, with NNs, once we exceed a certain complexity, this is no longer guaranteed.

It's a bit like interpolation with polynomials. We know cubic works really well and is very stable. Higher order polynomials are not.

Jack Hogan · « **Reply #73 on:** June 09, 2019, 04:52:43 am »

Quote from: Guillermo Luijk on June 08, 2019, 06:51:36 pm

Before doing more simulations or picking some patch pairs to predict the interpolated colours between them, I did a brute force exercise feeding the NN with all possible RGB 8-bit combinations in a synthetic image by Bruce Lindbloom, which shows smooth gradients:

After being transformed by the NN, we get again smooth gradients in the output what makes me think again that the NN is not oscillating because of overfitting when predicting intermedium colours:

Maybe I'm oversimplifying my conclusions here, but if the NN would be generating unstable outputs for unseen colours, I think we should see that behaviour here, do you agree?

I may be oversimplifying, but since the perceivable issues seem to be with the deepest shadows and brightest highlights, isn't one of the symptoms of overfitting the fact that it tends to busily (though perhaps non-perceptually) stick fairly close to the curve in the middle of the range but goes wild towards the extremes, especially beyond the range of the training set?

32BT · « **Reply #74 on:** June 09, 2019, 05:02:08 am »

Quote from: Jack Hogan on June 09, 2019, 04:52:43 am

I may be oversimplifying but since the perceivable issues seem to be with the deepest shadows and brightest highlights, isn't one of the symptoms of overfitting the fact that it tends to busily (though perhaps non perceptually) stick fairly close to the curve in the middle of the range but goes wild towards the extremes, especially beyond the range of the training set?

Yes and no. NNs are not particularly suited for extrapolation, so the edges may exhibit more problematic behavior. However, what happens in case of overfitting is that the NN can exactly reproduce all patches in the chart, because it's complex enough and because we are feeding it all patches. But the chart is both noisy and possibly inconsistently lit, so we are actually reproducing those problems.

The captured chart should be checked for lightness against the measured Y values. Plot G vs Y in perceptual space and you might see whether the captured data already exhibits any lightness issues.
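
One way to read "perceptual space" here is to push both the captured G and the measured Y through the CIE L* curve before plotting, so that a well-behaved capture falls on the diagonal. A minimal sketch of that curve, assuming values are scaled so that white is 1.0 (the function name is illustrative):

```python
def y_to_lstar(y):
    """CIE L* from relative luminance y in 0..1 (white at y = 1)."""
    eps = 216 / 24389
    kappa = 24389 / 27
    return 116 * y ** (1 / 3) - 16 if y > eps else kappa * y

print(y_to_lstar(1.0))   # 100.0
print(y_to_lstar(0.18))  # about 49.5 for an 18% gray
```

Plotting `y_to_lstar(G)` against `y_to_lstar(Y)` per patch would then show directly whether the darkest patches sit below the diagonal.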

32BT · « **Reply #75 on:** June 09, 2019, 06:30:57 am »

Okay, seems to work.

I created a small test sheet for testing the interpolation capabilities of a NN. For a reasonably programming savvy person this should be relatively easy to follow. (If you can do matlab, you can do this).

1. Open a google colab sheet: https://colab.research.google.com
2. Choose File -> upload notebook (remove the .txt extension from the attached file)
3. On the right in the toolbar choose Connect
4. Runtime -> run all

Play around with the last two code blocks for different NN configurations. When you change something in the configuration you only have to rerun the last two blocks.

Note that we are trying to interpolate samples from a curve with a limited NN complexity. There is no issue with training size vs validation size etc. because under- and overfitting, or ringing or any other effect is exactly what we are trying to see.

32BT · « **Reply #76 on:** June 09, 2019, 01:04:19 pm »

Okay, so that turns out interesting.

Apparently a single layer with several nodes is better able to approximate the perceptual curve than several layers with a single node.

In the picture below, the left-most graph is a single layer with a single node, tanh activation function.

Incidentally, use the Nadam optimizer for better results:
model.compile(optimizer='Nadam', loss='mean_squared_error')

Guillermo Luijk · « **Reply #77 on:** June 09, 2019, 02:13:30 pm »

OK I have performed the 'rainbow test' to look for overfitting indications. I picked the following 8 patches (plus the dark patch for begin-end reference):

Taking the RAW RGB_WB values on them, I interpolated linearly the transitions between those patches (100 interpolated samples). Then I predicted the ProPhoto RGB output values for the whole range:

This is how the Lab models evolve with increasing complexity (blue cross = exact value, red ball = prediction over the patch, red line = gradient prediction):

This is how the XYZ models evolve with increasing complexity (blue cross = exact value, red ball = prediction over the patch, red line = gradient prediction):

To me it is clear now that, even if Lab models performed better on DeltaE for the seen patches when using high complexity NN's (50, 50), it was simply because the loss function was much closer to the output space where DeltaE are to be measured. And there was clear overfitting (low error on seen patches, but at a cost of oscillations for unseen colours).

So if I were to decide, I'd choose a much lower complexity NN and use the XYZ model. Making a summary:

Linear optimum fit (Jack's): ΔE_max = 21.709, ΔE_mean = 3.569, ΔE_median = 2.461
MLP_XYZ_(4, 4)_tanh_identity: ΔE_max = 11.2728, ΔE_mean = 2.0082, ΔE_median = 1.5386 -> APPARENTLY NO OVERFITTING
MLP_Lab_(50, 50)_tanh_identity: ΔE_max = 3.9451, ΔE_mean = 0.6966, ΔE_median = 0.5313 -> BUT WITH OVERFITTING

So the MLP_XYZ_(4, 4)_tanh_identity would be a good candidate to improve the linear fit keeping compactness.
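
To put "compactness" into numbers, the count of trainable parameters for each configuration can be worked out directly. The sketch assumes 3 inputs, 3 outputs and one bias per unit in every layer (a common default, but an assumption about the models used here); `mlp_param_count` is an illustrative helper.

```python
def mlp_param_count(hidden, n_in=3, n_out=3):
    """Weights plus biases of a fully connected network with the given hidden sizes."""
    sizes = [n_in, *hidden, n_out]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

print(mlp_param_count(()))        # 12   (linear model: 3x3 matrix + 3 offsets)
print(mlp_param_count((4, 4)))    # 51
print(mlp_param_count((50, 50)))  # 2903
```

With a few hundred chart patches of three values each, a network with thousands of parameters has ample room to memorise individual patches, which fits the overfitting seen with (50, 50).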

This is how MLP_XYZ_(4, 4)_tanh_identity compares to MLP_Lab_(50, 50)_tanh_identity:

The Lab (50,50) model gets closer to the exact values (blue crosses), but at the cost of oscillations (=variance) for unseen colours.

And this is how differences are rendered in the ProPhotoRGB rainbow (the white lines are the precise patch median RAW values):

The ringing on the overfitted model is not noticeable, but we know it's there.
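
Since the ringing is hard to see by eye, a simple number can help compare models on the interpolated gradient. One possible measure (a hypothetical metric, not something used in this thread) is the total variation of a predicted channel between two adjacent patches minus its net end-to-end change: it is zero for a monotonic transition and grows with every back-and-forth wiggle.

```python
def excess_variation(values):
    """Total variation of a sampled curve minus its net end-to-end change."""
    total = sum(abs(b - a) for a, b in zip(values, values[1:]))
    return total - abs(values[-1] - values[0])

# Toy predictions along one patch-to-patch segment (hypothetical values).
smooth = [0.0, 0.25, 0.5, 0.75, 1.0]
wiggly = [0.0, 0.4, 0.3, 0.8, 1.0]

print(excess_variation(smooth))  # 0.0
print(excess_variation(wiggly))  # about 0.2
```

The measure assumes the true transition between adjacent patches is monotonic in the channel being checked, which will not hold for every channel and every pair of colours.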

Regards

32BT · « **Reply #78 on:** June 09, 2019, 02:36:06 pm »

Very revealing. You aced this!!!

Guillermo Luijk · « **Reply #79 on:** June 09, 2019, 03:34:50 pm »

And here the same animations, but for the output values represented in Lab.

This is how the Lab models evolve with increasing complexity (blue cross = exact value, red ball = prediction over the patch, red line = gradient prediction):

This is how the XYZ models evolve with increasing complexity (blue cross = exact value, red ball = prediction over the patch, red line = gradient prediction):

For me it's clear that going beyond (16,16) NNs for Lab models and beyond (4,4) for XYZ models doesn't add anything, so it's just invoking: "overfitting, come to me". This makes sense looking at the DeltaE evolution we already saw, which was actually very insightful in finding out where to stop regarding NN complexity:

Regards
