Actually, I didn't mind this "oversimplification" all that much.
While I agree that in practice we use more than object perception to look at (judge) an image, I also believe that "object perception" is exactly the similarity between art perception and language coding.
If I am not able to distinguish different elements in perception, I will not be able to code these different elements in words. It also means I will not be able to understand a coding of these elements by someone else. In other words, I won't be able to "relate to" someone else's experience.
I believe this is true for differentiating any type of experience, be it rational or intuitive, sensible or emotional. The latter is especially important, since most art attempts to evoke some kind of emotional response, or at least tries to appeal to some kind of emotional experience.
We use the rules to order the words, but if we are unable to delineate the different meanings of coding elements, then even knowing the rules will not help.
That's why a child prodigy playing a musical instrument does not usually evoke the emotions as "envisioned" by the composer, even though the execution can be brilliant. So brilliant, in fact, that it can obscure the lack of interpretation.
For further thought, here is a coded rule of Photography and/or Art:
- A piece of Art necessarily has a frame.
Meaning: there is always a difference between the object and its context. If there weren't, we simply wouldn't be able to communicate the concept...
- The frame is both spatial and temporal.
If it weren't, then we wouldn't know where the experience ends, and therefore we wouldn't know how to differentiate the experience.
I don't think I am confusing the two, but I am guilty of over-simplification in this case. Looking at an image involves more than just object perception.