Photonic noise is the square root of the total number of photons impinging upon the photodetector. Of 16 photons impinging upon the 8 micron photodetector, 4 will be noise. Total noise for that area of sensor is 25%.
If we cover the same area with four 4x4 micron photodetectors, each photodetector will receive 4 photons, two of which are noise. Total noise for that area of sensor is 4x2=8 photons. I.e., 50% noise.
Wrong, or at least irrelevant, because you ignore that noise is a mixture of positive and negative variations around the "true" value. When signals are merged, there is some cancellation of positive and negative noise values, so total noise increases less than in proportion to the number of signals combined. For the common and simple case of uncorrelated noise, noise levels combine in root-mean-square fashion, and so total noise grows as the square root of the number of values combined.
Let me rework your example of using either one big photosite or four smaller photosites to gather light from a given part of the image, and then combining (binning?) the four small photosite signals to get the same resolution as the big-photosite sensor.
Say each larger photosite receiving light from a subject of a certain illumination level should gather 16 photons, but due to noise the resulting electron count is "16 plus or minus 4": the photosites receive an average of 16 photons each, with fluctuations above and below that value of standard deviation sqrt(16) = 4.
If the subject is instead photographed with a sensor whose photosites are one quarter the area, each will give an average count of four, with standard deviation sqrt(4) = 2. If the signals from the four small photosites covering the same part of the subject as one big photosite are combined (binned), the average signals simply add, to a total of 16, while the four standard deviations (noise levels) of 2 combine as follows:
sqrt(2^2+2^2+2^2+2^2) = sqrt(16)=4,
EXACTLY the same signal and noise standard deviation as if you had used one bigger photosite to start with.
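The quadrature (root-mean-square) combination above can be checked in a few lines of Python; this is just the 16-photon example restated, not anything sensor-specific:

```python
import math

# Shot-noise standard deviations of the four small photosites: sqrt(4) = 2 each.
small_sigmas = [2.0, 2.0, 2.0, 2.0]

# Uncorrelated noise combines in root-mean-square (quadrature) fashion.
binned_sigma = math.sqrt(sum(s ** 2 for s in small_sigmas))

# One big photosite collecting the same 16 photons: sigma = sqrt(16) = 4.
big_sigma = math.sqrt(16)

print(binned_sigma, big_sigma)  # 4.0 4.0
```

Since the summed signal is 16 in both cases, the signal-to-noise ratio 16/4 = 4 is identical.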
Actually, this fancy mathematics is not needed in the case of photon noise, which, remember, is variation in the light arriving at the sensor, not something caused by the sensor itself. Clearly, whether you produce each "big pixel" by counting the light arriving at a certain part of the sensor with one big photosite, or use four smaller photosites and then combine their totals into a single output "big pixel" value, the total light received will be the same, and thus the variation between neighboring big-pixel values will also be the same.
Thus, as far as photon noise goes, aggregating data from more, smaller photosites gives the same S/N ratio as if fewer, larger ones were used.
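This counting argument can be verified numerically: photon arrivals follow Poisson statistics, and summing four mean-4 Poisson counts is statistically identical to drawing one mean-16 count. A minimal stdlib-only simulation (the Poisson sampler is Knuth's classic algorithm; the trial count and seed are arbitrary choices for this sketch):

```python
import math
import random
import statistics

random.seed(42)

def poisson(lam):
    """Sample a Poisson-distributed photon count (Knuth's algorithm)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        p *= random.random()
        k += 1
    return k - 1

trials = 20000

# One big photosite: mean 16 photons per exposure.
big = [poisson(16) for _ in range(trials)]

# Four small photosites binned: mean 4 photons each, then summed.
binned = [sum(poisson(4) for _ in range(4)) for _ in range(trials)]

for counts in (big, binned):
    mean = statistics.mean(counts)
    sigma = statistics.stdev(counts)
    print(f"mean ~ {mean:.2f}, sigma ~ {sigma:.2f}, S/N ~ {mean / sigma:.2f}")
```

Both cases come out with a mean near 16, a standard deviation near 4, and hence the same S/N of about 4.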
This also works if the aggregation is done by printing the smaller pixels at higher pixel density to get the same image size, and viewing from a distance at which the lower-pixel-count image is not visibly pixelated: the smaller pixels will then be too small to resolve, and so get visually averaged by the eye. Conversely, if the smaller pixels are big enough to resolve, their worse per-pixel noise might be detected, but the alternative evil with bigger pixels is visible pixelation!