Actually, this is already my second NN exercise. In the first (IMAGE PROCESSING REVERSE ENGINEERING USING NEURAL NETWORKS), I brute-force trained a NN to mimic an arbitrary image processing operation (including non-linear curves, desaturation and hue rotation), and the result was amazing:
http://guillermoluijk.com/datosimagensonido/fotografia_mlp_64_64.jpg
There's nothing magical about NNs - like all models, they have to exist within the constraints of logic and mathematics.
Take ICC profile models for example. In the context of modelling a device's RGB->XYZ behavior, the simplest realistic model would be a gamma curve common to all the device channels, followed by a 3x3 matrix. Such a model has 10 free parameters (one gamma value plus nine matrix entries). To fit the model ("train") to test values gathered from the real world logically requires at least 10 patches. In practice it is not that simple though, since real world samples have uncertainty, and the fitting function from samples to model parameters may well be ill-conditioned, which means that uncertainty in the test values could result in wildly erroneous model behavior in areas of color space that are not near the test values. So you either have to increase the number of test samples and their coverage to the point where the fit is not ill-conditioned, and/or add regularization constraints that push the poorly constrained parameters in the direction of realism.
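To make the "gamma plus matrix" model concrete, here is a minimal Python sketch of its forward direction. The function names are my own, and the gamma of 2.2 and the rounded sRGB (D65) matrix are only illustrative stand-ins for whatever a real fit would produce.

```python
def gamma_matrix_model(rgb, gamma, matrix):
    """Map device RGB (0..1) to XYZ: one shared gamma curve, then a 3x3 matrix."""
    linear = [channel ** gamma for channel in rgb]
    return [sum(matrix[row][col] * linear[col] for col in range(3))
            for row in range(3)]


def count_free_parameters():
    """One gamma value plus the nine matrix entries."""
    return 1 + 3 * 3


# Illustrative values only (assumption): the sRGB primaries matrix, rounded.
SRGB_TO_XYZ = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]

white = gamma_matrix_model([1.0, 1.0, 1.0], 2.2, SRGB_TO_XYZ)
mid_grey = gamma_matrix_model([0.5, 0.5, 0.5], 2.2, SRGB_TO_XYZ)
print(count_free_parameters(), white, mid_grey)
```

Fitting means running this in reverse: finding the gamma and matrix that best reproduce the measured XYZ of each patch, which is where the conditioning problem shows up.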
At the other end of ICC profile complexity would be using a cLUT. This is a model that is basically unconstrained except for the chosen resolution of the table. For instance, a 33x33x33 cLUT has 107811 free parameters, but can model any function that is smooth at the scale of the grid spacing, which is about 3% of the input range. Given that it's generally unrealistic to expect a test set containing on the order of 33000 test points uniformly distributed through the device space, an approach has to be taken to make it work with far fewer test points. Typically a regularization constraint of some form of continuity is applied, such as continuity of value or slope. This is effectively making assumptions about typical device behavior.
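The cLUT numbers are simple arithmetic, sketched here in Python. The helper names are hypothetical and three output channels are assumed.

```python
def clut_free_parameters(grid_points, output_channels=3):
    """Every node of a 3D table stores one value per output channel."""
    return grid_points ** 3 * output_channels


def clut_grid_spacing(grid_points):
    """Distance between adjacent nodes as a fraction of the 0..1 input range."""
    return 1.0 / (grid_points - 1)


nodes = 33 ** 3                        # 35937 table nodes
params = clut_free_parameters(33)      # 107811 free parameters
spacing = clut_grid_spacing(33)        # 0.03125, i.e. about 3% of the range
print(nodes, params, spacing)
```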
So given a realistic number of test points, there is always a trade-off between how closely an unknown device behavior can be modeled, and how well behaved it is at all the points in the gamut that are not at test points.
In color profiling, many other types of models have been applied that pick some other point of trade-off between assumptions about how a device behaves, and freedom to fit to the actual test patch values.
Exactly the same constraints apply to a NN model. Depending on its non-linearity and construction, the fitting function could be ill-conditioned. Depending on its size, it may have more free parameters than test points. To really know its final performance, you need to be able to check the model against the ground truth in fine detail throughout the whole of the gamut. And you haven't shown how you intend to obtain the ground truth of your camera behavior to carry out such a performance verification. Splitting your test chart points and using some for training and some for verification will give you an indication of how well your model is working, but having a ground truth is far better. Comparing against other modelling approaches may tell you whether you are in the ball park, but doesn't give you any indication as to whether the differences are in the direction of better or worse compared to the ground truth.
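As a rough illustration of the parameter count issue, the Python sketch below counts the weights and biases of a fully connected network and compares them with the number of measured values from a chart. The 3-64-64-3 layout is only my guess from the "mlp_64_64" file name, the 24-patch chart is a hypothetical example, and the helper names are made up. It also shows a simple training/verification split of the patches.

```python
import random


def mlp_free_parameters(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def split_patches(patch_ids, verify_fraction=0.25, seed=0):
    """Shuffle the patches and hold some back for verification."""
    shuffled = list(patch_ids)
    random.Random(seed).shuffle(shuffled)
    n_verify = max(1, round(len(shuffled) * verify_fraction))
    return shuffled[n_verify:], shuffled[:n_verify]


layers = [3, 64, 64, 3]                 # assumed: RGB in, two hidden layers, 3 out
params = mlp_free_parameters(layers)    # 256 + 4160 + 195 = 4611
patches = 24                            # hypothetical small test chart
measurements = patches * 3              # three measured values per patch
underdetermined = params > measurements

train, verify = split_patches(range(patches))
print(params, measurements, underdetermined, len(train), len(verify))
```

With these assumed numbers the network has far more free parameters than measurements, so the held-back patches only say how well it does at those few points, not in between them.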
If I were attempting to work on developing models for camera profiling, then the approach I would take would be to first construct a realistic mathematical model of camera behavior based on known devices. This would involve spectral response, channel linearity, sensor noise characteristics, channel cross talk etc. If the intention was to fit the model from test charts, then lens distortions of all sorts would have to be added to the model too. The photographing of any sort of test chart can then be realistically simulated, and the resulting NN model compared in fine detail against the ground truth.
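A very reduced Python sketch of such a simulation might look like the following. Everything in it is a toy assumption: five wavelength bands, made-up sensitivity curves, a flat illuminant, a simple mixing matrix for cross talk and Gaussian noise. A realistic model would need measured spectral data and the lens effects as well.

```python
import random

# Toy data (assumptions): 5 wavelength bands from blue to red.
ILLUMINANT = [1.0, 1.0, 1.0, 1.0, 1.0]
SENSITIVITIES = [
    [0.0, 0.1, 0.2, 0.6, 0.9],   # R channel
    [0.1, 0.5, 0.9, 0.4, 0.1],   # G channel
    [0.9, 0.6, 0.1, 0.0, 0.0],   # B channel
]
NO_CROSSTALK = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def simulate_patch(reflectance, crosstalk=NO_CROSSTALK, noise_sigma=0.0, rng=None):
    """Simulated raw RGB of one patch with known (ground truth) behavior."""
    rng = rng or random.Random(0)
    # Spectral integration: sum of illuminant * reflectance * sensitivity.
    signal = [sum(i * r * s for i, r, s in zip(ILLUMINANT, reflectance, channel))
              for channel in SENSITIVITIES]
    # Channel cross talk as a 3x3 mix, then additive Gaussian sensor noise.
    mixed = [sum(crosstalk[row][col] * signal[col] for col in range(3))
             for row in range(3)]
    return [value + rng.gauss(0.0, noise_sigma) for value in mixed]


white = simulate_patch([1.0] * 5)
grey = simulate_patch([0.5] * 5)
noisy = simulate_patch([1.0] * 5, noise_sigma=0.01)
print(white, grey, noisy)
```

Because the simulator can be evaluated for any reflectance, the fitted model can be checked at any point in the gamut, not just at the chart patches.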