Topic: Camera calibration using a neural network: questions (Read 11250 times)

Guillermo Luijk · « **on:** May 07, 2019, 05:53:16 pm »

In order to practice with NNs, I have thought of doing a camera calibration exercise, to find out whether a simple NN can perform similarly to, or even better than, classical ICC profiles based on LUTs. I don't have much idea about how ICC profiles and calibration in general work, so here are some questions:

1. In a regular calibrated pipeline, I guess white balance is applied BEFORE the ICC profiling conversions take place. Is that correct?
2. To measure the performance of a calibration workflow, a standard procedure is to measure L*a*b deviations (deltaE) over different patches using some colour card, correct?
3. Let's assume we use an IT8 card because it is the card with the most patches (which is good for training the NN): are the L*a*b values against which we need to compare our calibrated output standard (synthetic) expected values, or Lab values measured (with a spectrophotometer) on the specific IT8 card used?
4. To do proper WB before training the network, I guess we just need to use the gray patches on the card, correct? Later on, the white-balanced RAW RGB values will be used to train the network.

This is the structure of the NN I plan to implement. I will try different numbers of nodes, since overfitting could lead to very precise colour reproduction for the patches but undesired behaviour for colours unseen by the NN. From an ML point of view, we should reserve some patches on the card as a validation set, but on the other hand even an IT8 has very few samples to properly train a NN, so I'm not sure what the best approach is here to prevent overfitting.

Regards

GWGill · « **Reply #1 on:** May 07, 2019, 10:36:07 pm »

Quote from: Guillermo Luijk on May 07, 2019, 05:53:16 pm

In order to practice with NNs, I have thought of doing a camera calibration exercise, to find out whether a simple NN can perform similarly to, or even better than, classical ICC profiles based on LUTs. I don't have much idea about how ICC profiles and calibration in general work, so here are some questions:

I wouldn't recommend using camera profiling as a good choice for such an exercise. The images captured by a camera of a test chart are subject to a lot of interfering factors that take skill and effort to minimize, such as uneven lighting and flare. Commercially available test charts have a limited number of test patches, and a true validation requires an independent test set, so you really have to be prepared to manufacture and spectrally measure your own test charts. A profile is typically never a perfect fit, due to the spectral differences between the camera sensor and the standard observer. These are all reasons why serious camera profiling takes the approach of measuring the sensor spectral sensitivity curves, rather than taking photos of test charts.

You could use printer profiling as a test exercise, since the repeatability is much better, and it's relatively easy to generate different test sets. (There has been a paper or two in the CIC proceedings on using neural nets for this type of thing.)

The basic nature of the task is scattered data interpolation. There are many possible techniques that can be used for this, and it's fair to say that I'm not much of a fan of NN's, although you may get acceptable results out of them with such a low dimensional model as printer profiling, and at least you can comprehensively explore continuity behavior etc., something that's basically impossible at higher dimensions. For regularization, check out "neural network dropout regularization", which seems to be a note of sanity amongst the craziness of sparsely trained, overfitted NN-based modelling systems that are currently all the rage (turtles being recognized as guns is a classic).
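For readers unfamiliar with the dropout idea mentioned above, here is a minimal sketch of the common "inverted dropout" variant applied to one layer's activations. The function name and parameters are illustrative assumptions, not taken from any particular library.

```python
import random

def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each activation with probability p_drop while
    training, and rescale the survivors so the expected value is unchanged."""
    if not training or p_drop == 0.0:
        # At inference time the layer is left untouched.
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)              # seeded only to make the sketch repeatable
hidden = [1.0] * 1000               # hypothetical hidden-layer activations
dropped = dropout(hidden, 0.5, rng)
```

Because a different random subset of nodes is silenced on every training step, the network cannot rely on any single node, which tends to discourage overfitting.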

32BT · « **Reply #2 on:** May 08, 2019, 02:07:35 am »

Quote from: Guillermo Luijk on May 07, 2019, 05:53:16 pm

I will try different numbers of nodes, since overfitting could lead to very precise colour reproduction for the patches but undesired behaviour for colours unseen by the NN. From an ML point of view, we should reserve some patches on the card as a validation set, but on the other hand even an IT8 has very few samples to properly train a NN, so I'm not sure what the best approach is here to prevent overfitting.

Just to be sure: "number of nodes" and "overfitting" are not related. Non-uniformly distributed learning samples are a problem. In linear equations we need as many equations as there are unknowns to solve the system. In ML we just need representative equations (samples) from the entire set while learning, and preferably a different set of samples while verifying.

The trick in this case is to not bias or overfit a certain color because that color sample happened to be overrepresented during the learning phase.

Your assessment of WB before color matching is indeed correct, since you generally want to apply a single color response in different WB situations.

WB is interesting in this case: what NN configuration would you design to solve just the WB multipliers?

sandymc · « **Reply #3 on:** May 08, 2019, 05:42:19 am »

If you do go ahead, please keep us updated. I'd be very interested in the results of such an exercise.

Guillermo Luijk · « **Reply #4 on:** May 08, 2019, 03:30:30 pm »

Quote from: GWGill on May 07, 2019, 10:36:07 pm

I wouldn't recommend using camera profiling as a good choice for such an exercise. The images captured by a camera of a test chart are subject to a lot of interfering factors
(...)
The basic nature of the task is scattered data interpolation. There are many possible techniques that can be used for this, and it's fair to say that I'm not much of a fan of NN's, although you may get acceptable results out of them with such a low dimensional model as printer profiling, and at least you can comprehensively explore continuity behavior etc., something that's basically impossible at higher dimensions. For regularization, check out "neural network dropout regularization"

I understand the limitations of camera profiling using color charts, but the goal here is not to profile any camera for future use; it's just to practice with NNs and find out whether a NN (defined in a much simpler way than LUTs) can be as good as or better than classical ICC profiles. I chose a camera because I know a person whose company makes IT8 cards and who will provide a high-quality shot, as well as measured Lab values for the chart.

My concern is how the NN will perform at interpolating unseen colours. If it behaves softly between seen patches the result should be good, but if it does this:

colours will be unpredictable. That is why I plan to start by using simple structures (few nodes), and try dropout in the training process if the interpolation starts to display undesired responses.

Regards

Guillermo Luijk · « **Reply #5 on:** May 08, 2019, 03:38:55 pm »

Quote from: 32BT on May 08, 2019, 02:07:35 am

Just to be sure: "number of nodes" and "overfitting" are not related. Non-uniformly distributed learning samples are a problem. In linear equations we need as many equations as there are unknowns to solve the system. In ML we just need representative equations (samples) from the entire set while learning, and preferably a different set of samples while verifying.

The trick in this case is to not bias or overfit a certain color because that color sample happened to be overrepresented during the learning phase.

Your assessment of WB before color matching is indeed correct, since you generally want to apply a single color response in different WB situations.

WB is interesting in this case: what NN configuration would you design to solve just the WB multipliers?

I expect many nodes could more easily lead to overfitting because it provides the NN with more non-linear mapping capabilities, while I'm looking for soft interpolations. But I'll try an increasing number of nodes. The IT8 chart tries to have a good representation of colours evenly distributed in the CIELAB space:

Of course these 200-300 patches are a minimal set for NN training, but I want to give it a try anyway. My input (RGB)/output(Lab) correspondences will be nearly noise free thanks to patch averaging, that's an advantage here.

Regarding the WB, I'll just set a linear scaling of the input RAW RGB values to make a middle gray patch on the IT8 chart become neutral (R=G=B), just like many RAW developers do. It's what I tried to represent in the previous scheme by connecting the RGB to R'G'B' values by a single scaling. It will not be a part of the NN itself to make the process more flexible and adequate to use the same profiling with an arbitrary WB.
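A minimal sketch of that scaling, assuming the green channel is kept as the reference and that the RAW values are linear; the gray patch values here are made up for illustration:

```python
def wb_multipliers(gray_rgb):
    """Per-channel gains that make the given gray patch neutral (R = G = B).
    Green is left untouched and used as the reference (an assumption)."""
    r, g, b = gray_rgb
    return (g / r, 1.0, g / b)

def apply_wb(rgb, multipliers):
    """Scale one linear RAW RGB triplet by the white balance gains."""
    return tuple(c * m for c, m in zip(rgb, multipliers))

# Hypothetical median RAW values of the middle gray patch.
gray_patch = (0.20, 0.40, 0.25)
gains = wb_multipliers(gray_patch)
balanced_gray = apply_wb(gray_patch, gains)
```

Keeping the gains outside the network means the same trained NN can be reused with gains computed from any other neutral reference.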

Quote from: sandymc on May 08, 2019, 05:42:19 am

If you do go ahead, please keep us updated. I'd be very interested in the results of such an exercise.

Sure!

Regards

32BT · « **Reply #6 on:** May 08, 2019, 04:05:01 pm »

Quote from: Guillermo Luijk on May 08, 2019, 03:38:55 pm

I expect many nodes could more easily lead to overfitting because it provides the NN with more non-linear mapping capabilities, while I'm looking for soft interpolations. But I'll try an increasing number of nodes.

Yes, "less is more" in the case of NN.

Quote from: Guillermo Luijk on May 08, 2019, 03:38:55 pm

Of course these 200-300 patches are a minimal set for NN training, but I want to give it a try anyway. My input (RGB)/output(Lab) correspondences will be nearly noise free thanks to patch averaging, that's an advantage here.

The number of patches is not so much the problem: each run may consist of only 3 patches, as long as those patches are either an (R,G,B) variant or (C,M,Y) variant. That is unfortunately some prior pattern logic you need to feed it to ensure a solution that converges.

Quote from: Guillermo Luijk on May 08, 2019, 03:38:55 pm

Regarding the WB, I'll just set a linear scaling of the input RAW RGB values to make a middle gray patch on the IT8 chart become neutral (R=G=B), just like many RAW developers do. It's what I tried to represent in the previous scheme by connecting the RGB to R'G'B' values by a single scaling. It will not be a part of the NN itself to make the process more flexible and adequate to use the same profiling with an arbitrary WB.

That's fine of course, I merely mentioned it as a perhaps simpler "exercise" to design and learn NN.

As for your output: you might want to consider XYZ as output, or normal conversion from that to Lab. Otherwise you might want to design a separate NN first to convert XYZ to Lab. (Another interesting exercise...).

Guillermo Luijk · « **Reply #7 on:** May 08, 2019, 05:02:14 pm »

Quote from: 32BT on May 08, 2019, 04:05:01 pm

I merely mentioned it as a perhaps simpler "exercise" to design and learn NN.

As for your output: you might want to consider XYZ as output, or normal conversion from that to Lab. Otherwise you might want to design a separate NN first to convert XYZ to Lab. (Another interesting exercise...).

Actually this is already a second NN exercise. In the first (IMAGE PROCESSING REVERSE ENGINEERING USING NEURAL NETWORKS) I brute-force trained a NN to mimic an arbitrary image processing (including non-linear curves, desaturation and hue rotation), and the result was amazing:

http://guillermoluijk.com/datosimagensonido/fotografia_mlp_64_64.jpg

Here things are more complicated, since the IT8 card provides just a few samples, so a huge number of mapping correspondences have to be interpolated. I plan to do some deltaE calculations to compare the performance of the NN vs an ICC profile obtained using the same IT8, and a "standard" best-effort RAW developer such as DCRAW; that's why Lab makes things simpler. What would be the advantage of using XYZ?
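For reference, the simplest deltaE variant (CIE76) is just the Euclidean distance between two Lab triplets. A minimal sketch, with made-up patch values and hypothetical function names:

```python
import math

def delta_e76(lab_a, lab_b):
    """CIE76 colour difference: Euclidean distance in L*a*b*."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab_a, lab_b)))

def summarize(predicted, measured):
    """Mean and worst-case deltaE over a list of patches."""
    errors = [delta_e76(p, m) for p, m in zip(predicted, measured)]
    return sum(errors) / len(errors), max(errors)

# Hypothetical predicted vs measured Lab values for two patches.
predicted = [(50.0, 3.0, 4.0), (70.0, 10.0, -5.0)]
measured = [(50.0, 0.0, 0.0), (70.0, 10.0, -5.0)]
mean_de, max_de = summarize(predicted, measured)
```

Later formulas such as CIEDE2000 weight lightness, chroma and hue differently, but CIE76 is enough for a first comparison between methods.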

Regards

32BT · « **Reply #8 on:** May 08, 2019, 05:59:42 pm »

Quote from: Guillermo Luijk on May 08, 2019, 05:02:14 pm

What would be the advantage of using XYZ?

Regards

It might make things simpler or more insightful since it gets closer to known relations, but perhaps you're right and you might as well go directly to Lab since NN logic is based on non-linear relations anyway.

GWGill · « **Reply #9 on:** May 08, 2019, 10:32:19 pm »

Quote from: Guillermo Luijk on May 08, 2019, 05:02:14 pm

Actually this is already a second NN exercise. In the first (IMAGE PROCESSING REVERSE ENGINEERING USING NEURAL NETWORKS) I brute-force trained a NN to mimic an arbitrary image processing (including non-linear curves, desaturation and hue rotation), and the result was amazing:

http://guillermoluijk.com/datosimagensonido/fotografia_mlp_64_64.jpg

There's nothing magical about NN's - like all models they have to exist within the constraints of logic and mathematics.

Take ICC profile models for example. In the context of modelling a device's RGB->XYZ behavior, the simplest realistic model would be a gamma curve common to the device values followed by a 3x3 matrix. Such a model has 10 free parameters. To fit the model ("train") to test values gathered from the real world logically requires at least 10 patches. In practice it is not that simple though, since real world samples have uncertainty, and the fitting function from samples to model parameters may well be ill-conditioned, which means that uncertainty in the test values could result in wildly erroneous model behavior in areas of color space that are not near the test values. So you either have to increase the number of test samples and their coverage to the point where the fit is not ill-conditioned, and/or add regularization constraints that push the poorly constrained parameters in the direction of realism.

At the other end of ICC profile complexity would be using a cLUT. This is a model that is basically unconstrained except for the chosen resolution of the table. For instance, a 33x33x33 cLUT has 107811 free parameters, but can model any function that has a continuity at a scale of less than 3% of the input value. Given that it's generally unrealistic to expect a test set containing on the order of 33000 test points uniformly distributed through the device space, an approach has to be taken to make it work with far fewer test points. Typically a regularization constraint of some form of continuity is applied, such as continuity of value or slope. This is effectively making assumptions about typical device behavior.
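The parameter counts quoted for the two extremes (gamma plus 3x3 matrix, and a 33x33x33 cLUT) can be checked with a few lines of arithmetic. The function names are illustrative only:

```python
def gamma_matrix_params():
    """One gamma exponent shared by all channels plus a 3x3 matrix."""
    return 1 + 3 * 3

def clut_params(grid_points, in_channels=3, out_channels=3):
    """Every grid node of the table stores one value per output channel."""
    return grid_points ** in_channels * out_channels

def clut_grid_step(grid_points):
    """Spacing between adjacent nodes as a fraction of the input range."""
    return 1.0 / (grid_points - 1)

simple_model = gamma_matrix_params()
clut_33 = clut_params(33)
step_33 = clut_grid_step(33)
```

With 33 nodes per axis the node spacing is 1/32, or about 3% of the input range, which is where the continuity scale mentioned above comes from.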

So given a realistic number of test points, there is always a trade-off between how closely an unknown device behavior can be modeled, and how well behaved it is at all the points in the gamut that are not at test points.

In color profiling, many other types of models have been applied that pick some other point of trade-off between assumption about how a device behaves, and freedom to fit to the actual test patch values.

Exactly the same constraints apply to a NN model. Depending on its non-linearity and construction, the fitting function could be ill-conditioned. Depending on its size, it may have more free parameters than test points. To really know its final performance, you need to be able to check the model against the ground truth in fine detail throughout the whole of the gamut. And you haven't shown how you intend to obtain the ground truth of your camera behavior to carry out such a performance verification. Splitting your test chart points and using some for training and some for verification will give you an indication of how well your model is working, but having a ground truth is far better. Comparing against other modelling approaches may tell you whether you are in the ball park, but doesn't give you any indication as to whether the differences are in the direction of better or worse compared to the ground truth.

If I were attempting to work on developing models for camera profiling, then the approach I would take would be to first construct a realistic mathematical model of camera behavior based on known devices. This would involve spectral response, channel linearity, sensor noise characteristics, channel cross talk etc. If the intention was to fit the model from test charts, then lens distortions of all sorts would have to be added to the model too. The photographing of any sort of test chart can then be realistically simulated, and the resulting NN model compared in fine detail against the ground truth.

Guillermo Luijk · « **Reply #10 on:** May 09, 2019, 01:51:18 pm »

Quote from: GWGill on May 08, 2019, 10:32:19 pm

(...)
which means that uncertainty in the test values could result in wildly erroneous model behavior in areas of color space that are not near the test values.
(...)
given a realistic number of test points, there is always a trade-off between how closely an unknown device behavior can be modeled, and how well behaved it is at all the points in the gamut that are not at test points.
(...)
the same constraints apply to a NN model. (...) Depending on its size, it may have more free parameters than test points. To really know its final performance, you need to be able to check the model against the ground truth (...). Splitting your test chart points and using some for training and some for verification will give you an indication of how well your model is working, but having a ground truth is far better.

Thanks for your helpful insights GWGill. You're totally right: no matter how few patches the IT8 chart has, some of them need to be reserved as a validation set, at least until I find out the complexity of the NN that can be used, as well as the degree of training needed to avoid loss of accuracy in unseen colours. This leads me to think about methodological flaws in the usual workflow chart makers sell: shoot the chart, make an ICC profile, and if the patches get small deltaE's you have a good profile. Later on the ICC profile is applied to unseen images (test sets), and colours are assumed to be correct without bearing in mind the strong interpolations taking place.

I could reserve half the patches as a validation set, and even interchange the training and validation sets to compare the results. My concern is to find a NN that interpolates softly unseen colours between seen patches, something that can be checked as well.
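A minimal sketch of that half-and-half split with the roles swapped in a second round (two-fold cross-validation). The patch identifiers and the seed are placeholders:

```python
import random

def two_fold_split(patch_ids, seed=0):
    """Shuffle the patch identifiers and cut them into two halves."""
    ids = list(patch_ids)
    random.Random(seed).shuffle(ids)   # seeded so the split is repeatable
    half = len(ids) // 2
    return ids[:half], ids[half:]

fold_a, fold_b = two_fold_split(range(10))

# Round 1 trains on fold_a and validates on fold_b; round 2 swaps the roles.
rounds = [(fold_a, fold_b), (fold_b, fold_a)]
```

Every patch is then used exactly once for validation, and a large gap between the two rounds' errors would suggest the result depends heavily on which patches were seen.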

I think the ratio between network complexity (no. of layers and nodes) and patches in the training set is a very important hyperparameter to tune. And another one will be the number of iterations. The good thing about these patches is that they will be noise free, so there are no outliers or wrong samples that could fool the model.
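To put numbers on that ratio, here is a small sketch that counts the weights and biases of a fully connected network and compares them with the number of target values a chart provides. The layer sizes and the figure of 250 patches are hypothetical examples:

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network, where layer_sizes
    lists the node counts from input to output, e.g. [3, 8, 3]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

n_patches = 250                 # hypothetical, within the 200-300 range
n_targets = n_patches * 3       # three output values per patch

small_net = mlp_param_count([3, 8, 3])
large_net = mlp_param_count([3, 64, 64, 3])
```

The small network has far fewer parameters than target values, while the large one has several times more, so it could in principle memorise the patches.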

Regarding the more rigorous ways to proceed, I agree this is not ideal, but what I'm trying to do is precisely to compare the NN with the also far from ideal procedure that is supported by so many colour consultants: shoot the IT8, build the ICC profile, apply it to all your images.

Regards

Guillermo Luijk · « **Reply #11 on:** May 16, 2019, 02:52:21 pm »

Making some progress this week:

- Hugo Rodríguez kindly provided me with an optimal RAW file shot of his personally produced IT8 card
- I have decided that the NN won't output Lab values but XYZ values (you were right about this, 32BT! I had no idea XYZ was the best PCS to convert to Lab or any RGB afterwards)
- The maths come from the Colour Bible and have been checked vs Lindbloom's and EasyRGB's colour calculators, working fine. I don't need Photoshop for any stage of the process: RAW development + WB + profiling + sRGB conversion

This would be the scheme (sorry for the Spanish):

Colour distributions from Hugo's IT8: a-b for colour patches and L for gray patches:

I'll use median rather than mean over the patches to avoid the bias introduced by some scratches
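A small illustration of why the median is the safer choice here, using made-up pixel values in which two pixels fall on a scratch:

```python
from statistics import mean, median

# Hypothetical 16-bit values from one patch; the last two lie on a scratch.
patch_pixels = [1000, 1002, 998, 1001, 999, 1000, 200, 150]

patch_mean = mean(patch_pixels)      # dragged down by the two outliers
patch_median = median(patch_pixels)  # stays close to the true patch level
```

The mean is pulled far below the undamaged level by just two bad pixels, while the median barely moves.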

I'll reserve around 20% of the patches as a validation set. If the NN performs fine, I'll probably use all the patches in a final training round.

Regards

GWGill · « **Reply #12 on:** May 16, 2019, 08:05:08 pm »

Quote from: Guillermo Luijk on May 16, 2019, 02:52:21 pm

I have decided that the NN won't output Lab values but XYZ values (you were right at this 32BT! I had no idea XYZ was the best PCS to convert to Lab or any RGB afterwards)

A camera typically has an additive characteristic (the three channels don't interact much), so of course if inputs and outputs of the model are proportional to light level (as RAW input and XYZ output are), the model tends to look highly linear (closest to a 3x3 matrix).
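As a baseline for that linear behaviour, a 3x3 matrix can be fitted to the patches by ordinary least squares. The sketch below uses only the standard library and solves the normal equations directly; the patch data are synthetic, generated from a made-up matrix, so it only demonstrates the mechanics.

```python
def solve3(a, b):
    """Solve the 3x3 system a.x = b by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_matrix(rgb, xyz):
    """Least-squares 3x3 matrix M such that xyz is close to M.rgb per patch."""
    # Normal equations: (sum of rgb.rgb^T) . row_k = sum of rgb * xyz_k.
    ata = [[sum(p[i] * p[j] for p in rgb) for j in range(3)] for i in range(3)]
    rows = []
    for k in range(3):
        atb = [sum(p[i] * q[k] for p, q in zip(rgb, xyz)) for i in range(3)]
        rows.append(solve3(ata, atb))
    return rows

def apply_matrix(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

# Synthetic example: a made-up "true" matrix and noiseless patches.
true_m = [[0.5, 0.3, 0.2], [0.2, 0.7, 0.1], [0.0, 0.1, 0.9]]
rgb_patches = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0.5, 0.5, 0.5), (0.2, 0.8, 0.4)]
xyz_patches = [apply_matrix(true_m, p) for p in rgb_patches]
fitted = fit_matrix(rgb_patches, xyz_patches)
```

Any NN would have to beat the residual error of such a matrix on held-out patches to justify its hidden layers.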

32BT · « **Reply #13 on:** May 17, 2019, 01:23:17 am »

Quote from: GWGill on May 16, 2019, 08:05:08 pm

A camera typically has an additive characteristic (the three channels don't interact much), so of course if inputs and outputs of the model are proportional to light level (as RAW input and XYZ output are), the model tends to look highly linear (closest to a 3x3 matrix).

Begging the question of course what advantage to expect from the hidden layers???

GWGill · « **Reply #14 on:** May 17, 2019, 01:43:15 am »

Quote from: 32BT on May 17, 2019, 01:23:17 am

Begging the question of course what advantage to expect from the hidden layers???

A camera matrix is generally not a perfect fit due to the spectral sensitivity difference to the standard observer. So a better fit is possible using something like a 2 dimensional LUT for the chromatic plane based on weighting the mapping by the typical scene spectral statistics, plus 1D curves to model any sensor luminance non-linearity. [ As already stated, I'm not a fan of NN and wouldn't use them for something like this. In fact, having wrestled with the pitfalls of attempting to model higher dimensional functions for some time now, I don't think I'd choose to use NN for anything, much less call it "AI". ]

32BT · « **Reply #15 on:** May 17, 2019, 04:30:11 am »

Quote from: GWGill on May 17, 2019, 01:43:15 am

A camera matrix is generally not a perfect fit due to the spectral sensitivity difference to the standard observer. So a better fit is possible using something like a 2 dimensional LUT for the chromatic plane based on weighting the mapping by the typical scene spectral statistics, plus 1D curves to model any sensor luminance non-linearity. [ As already stated, I'm not a fan of NN and wouldn't use them for something like this. In fact, having wrestled with the pitfalls of attempting to model higher dimensional functions for some time now, I don't think I'd choose to use NN for anything, much less call it "AI". ]

In that respect it would probably be a better experiment to let the NN solve the entire path from camera RGB to perceptual Lab.

An additional experiment which might be far more relevant and interesting: design the smallest NN config to convert XYZ to Lab, and then teach this config on Munsell samples. Then see if it results in an actual hue constant perceptual space instead of an error-based version. Then see if it can better predict color compression for out-of-gamut color. (NNs and extrapolation not exactly being best friends...)

Guillermo Luijk · « **Reply #16 on:** May 17, 2019, 06:17:59 am »

Quote from: 32BT on May 17, 2019, 04:30:11 am

In that respect it would probably be a better experiment to let the NN solve the entire path from camera RGB to perceptual Lab.

An additional experiment which might be far more relevant and interesting: design the smallest NN config to convert XYZ to Lab, and then teach this config on Munsell samples. Then see if it results in an actual hue constant perceptual space instead of an error-based version. Then see if it can better predict color compression for out-of-gamut color. (NNs and extrapolation not exactly being best friends...)

Since the XYZ to Lab conversion (or to any RGB output colour space) is a well-known deterministic formula (much like WB), I don't see the point in making it a part of the NN.

I know this is not a typical usage of NN: first we have very few samples, and secondly we are modelling subtle non-linearities when NN's stronghold is modelling non-linear behaviours.

I still want to try it. An advantage here is that samples can be considered noiseless, and by using a simple NN I think we can still get soft interpolations between the patches.

A question for GWGill: in LUT-based ICC profiles, who decides how many samples those LUTs are made of? And secondly: how and where is the way to interpolate between the LUT values decided later on, for example when opening profiled image data in Photoshop and assigning it a given ICC profile? What kind of interpolation is used here, just linear or something more sophisticated?

Regards

32BT · « **Reply #17 on:** May 17, 2019, 06:33:42 am »

Quote from: Guillermo Luijk on May 17, 2019, 06:17:59 am

Since the XYZ to Lab conversion (or to any RGB output colour space) is a well-known deterministic formula (much like WB), I don't see the point in making it a part of the NN.

It's an extremely simple, non-linear deterministic formula, which makes it ideal for NN experiments.
You only need a simple NN, and once it seems to work for standard Lab conversion, you could reset it, and perhaps train it on Munsell samples, at which point it becomes less deterministic, but all the more relevant. You could design a small NN that actually spits out a truly perceptually uniform space with constant hue characteristics.

Quote from: Guillermo Luijk on May 17, 2019, 06:17:59 am

I know this is not a typical usage of NN: first we have very few samples, and secondly we are modelling subtle non-linearities when NN's stronghold is modelling non-linear behaviours.

I still want to try it. An advantage here is that samples can be considered noiseless, and by using a simple NN I think we can still get soft interpolations between the patches.

Absolutely, it is just why I also mentioned that going directly to Lab may have an advantage over XYZ because of the non-linearities. So you might as well train it to do linear camera input to perceptual output (Lab, or even sRGB directly). Note that I am primarily making suggestions for potential experimentations. I'm certainly not trying to dismiss this particular experiment in its current form, au contraire.

Guillermo Luijk · « **Reply #18 on:** May 17, 2019, 08:58:52 am »

Quote from: 32BT on May 17, 2019, 06:33:42 am

Note that I am primarily making suggestions for potential experimentations.

Understood! Your suggestions are very much appreciated, thanks! Looking at the formulation, I think the conversion from XYZ to Lab is in fact quite trivial for a NN.
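For reference, the standard CIE formulation is a cube root with a short linear segment near black, applied per channel after dividing by the reference white. A minimal sketch, assuming the D50 white point used by the ICC PCS with Y normalised to 1:

```python
# Assumed reference white: D50 as used by the ICC PCS, Y normalised to 1.
D50_WHITE = (0.9642, 1.0, 0.8249)

def _f(t):
    """Cube root above the (6/29)^3 threshold, linear segment below it."""
    delta = 6 / 29
    if t > delta ** 3:
        return t ** (1 / 3)
    return t / (3 * delta ** 2) + 4 / 29

def xyz_to_lab(xyz, white=D50_WHITE):
    """Convert one XYZ triplet to CIE L*a*b* relative to the given white."""
    fx, fy, fz = (_f(c / w) for c, w in zip(xyz, white))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

lab_white = xyz_to_lab(D50_WHITE)
lab_gray = xyz_to_lab(tuple(0.18 * w for w in D50_WHITE))  # 18% neutral gray
```

The only non-linearity is that single per-channel curve, which is why a very small network should be able to reproduce it.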

Regards

Guillermo Luijk · « **Reply #19 on:** May 19, 2019, 06:34:02 pm »

Little progress over the weekend:
- Play with image library to read/save 16-bit images
- White balanced RAW extraction (DCRAW)
- Read chart, crop and calculate median over each patch

The code here.

Regards
