Yup... and isn’t it interesting that we’re only bothered by convergence in the vertical, while railroad tracks converging toward the horizon don’t bother us at all? It’s exactly the same thing. If you want to freak out your brain, take a photo of such tracks and make them perfectly parallel all the way to the horizon (the way they “actually” are, i.e. they remain parallel).
There seems to be a lot of confusion on this issue. The farther an observed object is from the viewer, the smaller it appears. This is one of the main ways we estimate the distance of an object. From birth through early childhood, the human mind gradually learns to interpret this automatically: the known size of recognizable objects lets us judge how far away they are.
The average house viewed from a distance of a few hundred metres looks as small as a child's doll's house. However, I doubt that anyone would confuse such a distant house with a doll's house, or with an accurate small model of a house placed much closer. This is because there are so many recognizable objects between the viewer and the distant house, such as grass, trees, fences, or railway lines, and each of them also looks smaller in proportion to its distance from the viewer.
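A rough sketch of the size-distance relationship described above, assuming a simple pinhole model of the eye or camera: the angle an object subtends is 2·atan(h / 2d), so its height h and distance d only matter through their ratio. The heights and distances below (an 8 m house at 300 m, a 0.4 m doll's house at 15 m) are hypothetical values chosen for illustration, and the function names are made up for this example.

```python
import math

def angular_size_deg(height_m: float, distance_m: float) -> float:
    """Angle (degrees) subtended by an object of the given height at the given distance."""
    return math.degrees(2 * math.atan(height_m / (2 * distance_m)))

def distance_from_size(height_m: float, angle_deg: float) -> float:
    """Invert angular_size_deg: estimate distance from a known height and an observed angle."""
    return height_m / (2 * math.tan(math.radians(angle_deg) / 2))

# Hypothetical objects: a real house far away and a doll's house much closer.
house = angular_size_deg(8.0, 300.0)
dolls_house = angular_size_deg(0.4, 15.0)
print(f"house: {house:.3f} deg, doll's house: {dolls_house:.3f} deg")

# Knowing the house is about 8 m tall turns the observed angle back into a distance.
print(f"estimated distance: {distance_from_size(8.0, house):.1f} m")
```

Both objects subtend the same angle, so the angle alone cannot tell them apart; it is the known size of the object (or of the grass, trees and fences around it) that turns the angle back into a distance.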
Both the camera and the natural eye will capture these effects, provided there are enough recognizable objects in the foreground. This means the camera lens must have an appropriate focal length to include those foreground objects. However, an extremely wide-angle lens, such as 12 mm or 14 mm, will tend to unnaturally exaggerate the size of close objects and unnaturally diminish the size of distant objects. Likewise, a long telephoto lens or a macro lens, which excludes most of what the natural eye sees in the foreground and surrounding area, can make small objects appear huge. However, if we recognize those objects, such as a particular species of insect or bird, we can deduce that they are not huge monsters. Also, adjacent objects, such as the leaves next to a bird in a telephoto shot, provide a clue to the bird's size when we are unfamiliar with the species.
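To put rough numbers on the focal lengths mentioned above, here is a sketch of the horizontal angle of view of a rectilinear lens. It assumes a full-frame (36 × 24 mm) sensor held in landscape orientation and focus at infinity; the 50 mm and 300 mm values are included only for comparison.

```python
import math

SENSOR_WIDTH_MM = 36.0  # assumption: full-frame sensor, landscape orientation

def horizontal_aov_deg(focal_mm: float, sensor_width_mm: float = SENSOR_WIDTH_MM) -> float:
    """Horizontal angle of view of a rectilinear lens focused at infinity."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_mm)))

for f in (12, 14, 50, 300):
    print(f"{f:>3} mm lens: about {horizontal_aov_deg(f):.1f} degrees across")
```

On those assumptions a 12 mm lens takes in roughly 113 degrees horizontally, while a 300 mm lens takes in under 7 degrees, which is why a telephoto shot leaves out most of the foreground the eye would use to judge size.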
The issue of objects appearing to lean to one side in a photograph is quite different from the one I've tried to explain above. In the real world I get no sense of tall buildings leaning to one side when I walk through a city centre full of skyscrapers. Perhaps I am only speaking for myself. Perhaps there really are people who get a sense of leaning skyscrapers when they walk through a city. If so, please say so, and explain how you can tell the difference between something that merely appears to lean and something that actually leans, like the Leaning Tower of Pisa.
As I sit here in my house, at my computer desk, I am surrounded by dozens of verticals and horizontals, such as door frames, window frames, cupboards, TV frame, wall edges, table legs, and so on. Whether I'm standing up, sitting down, or lying on the floor, all those verticals remain vertical as I view them naturally with my eyes.
However, when I look through the viewfinder of a camera fitted with a wide-angle lens, those verticals change. If I kneel on the floor and tilt the camera slightly upwards to include the tops of the door and window frames, the verticals all lean towards the centre, so that each door and window frame takes on the shape of a pyramid. If I stand and point the camera down, the reverse happens, like a pyramid standing on its head. And of course, if I tilt the camera to one side, the horizontals cease to be horizontal. It's a terrible distortion, and it's why Photoshop has introduced the 'distort, warp and perspective' controls, and a crop that can be tilted.
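The convergence described above can be reproduced with an idealised pinhole camera. This is an illustration under stated assumptions, not a model of any particular camera: the camera sits at the origin (X right, Y up, Z forward), is pitched up or down by `pitch_deg`, and looks at a hypothetical door frame 0.9 m wide, 3 m away, spanning from 0.5 m below to 1.5 m above the lens. It prints the frame's projected width at its bottom and top edges, in normalised image units (focal length 1).

```python
import math

def project(point, pitch_deg, focal=1.0):
    """Pinhole projection of a world point (X right, Y up, Z forward) for a camera at the
    origin pitched up by pitch_deg (negative means pointing down). Returns image (x, y)."""
    X, Y, Z = point
    t = math.radians(pitch_deg)
    xc = X                                   # pitching does not change the left-right axis
    yc = Y * math.cos(t) - Z * math.sin(t)   # height relative to the tilted image plane
    zc = Y * math.sin(t) + Z * math.cos(t)   # depth along the tilted optical axis
    return focal * xc / zc, focal * yc / zc

def frame_widths(pitch_deg, half_width=0.45, depth=3.0, bottom=-0.5, top=1.5):
    """Projected width of a hypothetical door frame at its bottom and top edges."""
    left_b = project((-half_width, bottom, depth), pitch_deg)
    right_b = project((half_width, bottom, depth), pitch_deg)
    left_t = project((-half_width, top, depth), pitch_deg)
    right_t = project((half_width, top, depth), pitch_deg)
    return right_b[0] - left_b[0], right_t[0] - left_t[0]

for pitch in (0, 15, -15):
    b, t = frame_widths(pitch)
    print(f"pitch {pitch:+3d} deg: bottom width {b:.3f}, top width {t:.3f}")
```

With the camera level the two widths are equal, so the frame's edges stay parallel. Tilted up by 15 degrees, the top projects narrower than the bottom (the pyramid); tilted down, the top projects wider (the pyramid on its head). In this model the difference comes from the top and bottom of the frame lying at different depths along the tilted optical axis.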
Now you might wonder what causes this effect. After all, the camera has a reputation for capturing what the eye sees in reality.
Here's my explanation, but please feel free to correct me if you think my argument is not sound.
Einstein's Theory of Relativity is relevant here. A vertical is only vertical in relation to something else which is not vertical. The camera always introduces its own verticals and horizontals, separate from the scene, and imposes them on every scene it captures. In other words, it's the camera format, whether square or rectangular, which is the source of the distortion, and such format restrictions are always reproduced in the print.
The natural eye does not impose such external format restrictions. We have a central area of sharp focus, but our view also includes a wide 'out-of-focus' area on all sides of the object we are looking at. We can also move our eyes left and right, and up and down, without moving our head, in order to shift that focus.
Being able to distinguish between what is really vertical and what is really leaning is an essential part of human evolution.
To finish my long-winded post, I'll include an image of the Leaning Tower of Pisa (from the internet, not mine) in which the camera's inherent distortions make it look as though the tower is leaning no more than the cathedral next to it, and possibly less.