Here is a stab at comparing sensors with different pixel sizes, keeping everything else the same.
Suppose some manufacturer decides to make two cameras, using the same lens and sensor area, but different pixel sizes. Camera A has large pixels, while camera B has small pixels. Let's say the small pixels of camera B are made by subdividing each large pixel of camera A into four sub-pixels.
Clearly both cameras need the same lens, aperture, and shutter speed to capture equivalent images. What, then, are the differences between the resulting images?
The camera with smaller pixels will have an advantage in system resolution. The magnitude of this effect depends on whether the system resolution is dominated by the lens or by the sensor. If the lens can resolve 100 lp/mm and the sensor has 10 micron pixels (a Nyquist limit of 50 lp/mm), the two resolutions are comparable, and pixel size has a significant effect on system resolution. If the lens is not this spectacular (or is stopped down beyond f/8, where diffraction limits it), or if the sensor pixels are already smaller, then shrinking the pixels further improves system resolution less.
The camera with larger pixels will have an advantage in system noise. Since not everyone seems to agree with this, let's go through the argument slowly.
Let's agree to expose the image correctly in each camera, to just barely saturate the brightest pixels. The large pixels capture 4 times as many photons as the small pixels, but the sensor charge capacity (full well depth) is correspondingly 4 times as great, so this requires the same exposure (aperture and shutter speed) in both cases. Let's say this exposure produces digitized values of 256 in the saturated small pixels and 1024 in the saturated large pixels, at the same electronics gain. (Of course camera A may subsequently multiply all values by 1/4 to provide a common intensity scale, but this does not affect signal/noise considerations.)
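This arithmetic can be checked with a minimal sketch. The well depths and the gain value are hypothetical, chosen only to reproduce the 256 and 1024 counts quoted above:

```python
# Hypothetical numbers illustrating the paragraph above.
gain = 0.25                  # counts per electron (same electronics gain for both)
well_small = 1024            # full-well capacity of a small pixel, in electrons
well_large = 4 * well_small  # 4x the area -> 4x the full-well capacity

# Saturated digitized values: 256 counts (small) and 1024 counts (large).
print(well_small * gain)
print(well_large * gain)
```

Because the full well scales with pixel area, the same exposure saturates both sensors at the same time; only the digitized ceiling differs.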
Now what about noise? Sensor noise is generally dominated by the readout electronics. Let's say the readout noise is a fixed number of electrons, corresponding to a digitized value of 1 count. Since every pixel is subject separately to this readout noise, it will be the same for the small pixels and the large pixels, at the same electronics gain. But remember the large pixels produce 4 times more signal per pixel, so the signal/noise ratio will be 4x better for the large pixels than for the small pixels.
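A quick Monte Carlo sketch of the read-noise-limited case, using the hypothetical counts from above (1024 and 256 counts at saturation, 1 count of readout noise per pixel):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000             # simulated pixels per sensor

signal_large = 1024.0   # counts in a saturated large pixel
signal_small = 256.0    # counts in a saturated small pixel (1/4 the photons)
read_noise = 1.0        # counts of readout noise, identical for every pixel

img_large = signal_large + rng.normal(0.0, read_noise, n)
img_small = signal_small + rng.normal(0.0, read_noise, n)

snr_large = img_large.mean() / img_large.std()
snr_small = img_small.mean() / img_small.std()
print(snr_large / snr_small)   # ~4: the signal ratio carries straight through
```

With the noise fixed per readout, the signal/noise ratio scales directly with the per-pixel signal, hence the factor of 4.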
The situation is not quite as bad as this, because the small pixels are bunched together closer in the final displayed image, so some averaging occurs. This averaging may occur digitally, if the image is displayed small, as pixels are down-sampled to match the printer or screen resolution. If the four small pixels are averaged back into one large pixel, the noise will improve by a factor of 2 (the square root of 4) by simple statistics. This still leaves the noise a factor of 2 worse than if the averaging had been done in the sensor before digitization.
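The square-root-of-4 averaging claim can be sketched the same way, summing four independently read small pixels into one binned pixel:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
signal_small = 256.0    # hypothetical counts per small pixel, as above
read_noise = 1.0        # counts of readout noise per readout

# Four small-pixel readouts per output pixel, each with its own readout noise.
quads = signal_small + rng.normal(0.0, read_noise, (n, 4))
binned = quads.sum(axis=1)   # digital 2x2 binning after readout

# Four independent 1-count noises add in quadrature to 2 counts:
# twice better than four unaveraged pixels, but still twice worse than
# a true large pixel read out once with 1 count of noise.
print(binned.std())          # ~2.0
```

Binning after digitization averages the noise but cannot undo the fact that the noise was injected four times instead of once.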
Photon statistics (shot noise) is another noise source. It is the dominant noise in bright pixels, where signal/noise is already good, but not in dark pixels, where signal/noise is poor and therefore matters most. If sensor readout noise were somehow made negligible and photon statistics became the dominant noise source, the situation would change. The small pixels would have 1/4 the signal and hence 1/2 the shot noise of the large pixels (shot noise scales as the square root of the signal), and by the averaging argument given above this factor of 2 would cancel out. In that situation, the small pixels would be equivalent to the large pixels in signal/noise.
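The shot-noise-limited case can be sketched too, with readout noise set to zero and a hypothetical photon count per large pixel:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
photons_large = 40_000              # hypothetical electrons in a large pixel
photons_small = photons_large // 4  # each small pixel collects 1/4 the photons

large = rng.poisson(photons_large, n).astype(float)
small4 = rng.poisson(photons_small, (n, 4)).astype(float)
binned = small4.sum(axis=1)         # combine the four small pixels again

snr_large = large.mean() / large.std()
snr_binned = binned.mean() / binned.std()
print(snr_large, snr_binned)        # nearly equal: the factor of 2 cancels
```

Summing four independent Poisson counts gives another Poisson count with the full signal, so with no readout noise the two sensors end up with the same signal/noise.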