It depends on the f-stop you are shooting at and how close your starting focus point is. The lower the f-stop (shallower DOF) and the closer the starting focus, the smaller the step size you need. You just need to run some experiments to get a feel for the step size.
You always have to keep in mind two things...
1) The closer a region is, the more sensitive it is to a change in focus. For example, if you are focusing on a region that is 2 feet away, a very slight turn of the manual focus ring might move the focus point an inch, while if the region is 20 feet away, the same slight turn might move the focus point a couple of feet. This slight turn of the focus ring corresponds to a "step" when using auto focus bracketing. A smaller step size corresponds to a smaller change in focus.
2) The f-stop of the lens affects DOF, so you need to make sure that the bracketing steps are small enough, given the available DOF, that every portion of the region being scanned by the lens is in focus somewhere in the collection of bracketed frames.
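To put rough numbers on both points, here is a minimal sketch using the standard thin-lens depth-of-field formulas. The 40mm focal length, f/5.6, and the 0.015mm circle of confusion (a value commonly quoted for Micro Four Thirds) are my assumptions for illustration, and `dof_limits` is just a helper name I made up, not anything in the camera.

```python
import math

MM_PER_FOOT = 304.8

def dof_limits(focus_mm, focal_mm, f_number, coc_mm=0.015):
    """Return (near, far) limits of acceptable sharpness in mm.

    coc_mm is an assumed circle of confusion; far is math.inf once the
    focus distance reaches the hyperfocal distance.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = focus_mm * (hyperfocal - focal_mm) / (hyperfocal + focus_mm - 2 * focal_mm)
    if focus_mm >= hyperfocal:
        return near, math.inf
    far = focus_mm * (hyperfocal - focal_mm) / (hyperfocal - focus_mm)
    return near, far

# Compare a close subject with a distant one at the same aperture.
for feet in (2, 20):
    near, far = dof_limits(feet * MM_PER_FOOT, focal_mm=40, f_number=5.6)
    print(f"focus {feet} ft: sharp from {near / MM_PER_FOOT:.2f} ft "
          f"to {far / MM_PER_FOOT:.2f} ft")
```

Under these assumptions the sharp zone at 2 feet is only a little over an inch deep, while at 20 feet it spans many feet, which is why close starting points need smaller steps.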
After you select the step size, set the number of shots to 999, focus on the nearest point, and press the shutter. The bracketing sequence starts where you initially focused the lens, then works its way out according to the step size you specified until infinity is reached. Setting the number of shots to 999 ensures this happens. The sequence will almost never get close to 999 shots because it stops automatically when it reaches infinity.
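As a rough illustration of why the sequence ends long before 999 shots, the sketch below walks the focus point outward so that each frame's sharp zone begins where the previous one ends, and stops once a frame reaches infinity. This is an idealized gap-free model, not the camera's actual step logic (I don't know how the step size maps to focus distance); the 0.015mm circle of confusion and the `bracket_sequence` name are my assumptions.

```python
import math

MM_PER_FOOT = 304.8

def bracket_sequence(start_mm, focal_mm, f_number, coc_mm=0.015, max_shots=999):
    """Focus distances (mm) for an idealized gap-free sequence out to infinity."""
    hyper = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    shots = []
    focus = start_mm
    while len(shots) < max_shots:
        shots.append(focus)
        if focus >= hyper:
            break  # this frame's far limit is already at infinity
        far = focus * (hyper - focal_mm) / (hyper - focus)
        if far >= hyper - focal_mm:
            # A frame focused at infinity is sharp from (hyper - focal) outward.
            focus = math.inf
        else:
            # Next focus distance whose near limit equals this frame's far limit.
            focus = far * (hyper - 2 * focal_mm) / (hyper - focal_mm - far)
    return shots

far_start = bracket_sequence(10 * MM_PER_FOOT, focal_mm=40, f_number=5.6)
close_start = bracket_sequence(1.5 * MM_PER_FOOT, focal_mm=40, f_number=2.8)
print(len(far_start), "frames starting at 10 ft, f/5.6")
print(len(close_start), "frames starting at 1.5 ft, f/2.8")
```

The counts this prints are theoretical minimums with no overlap between frames. A real sequence overlaps its frames, so expect more files than this, but the pattern holds: a closer start and a wider aperture mean many more frames.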
In the example you provide, if I were shooting at f/5.6 with the 12-40 Pro lens, I would use a step size of about four or five, given that the nearest object in your example is about ten feet away. How did I know the step size should be four or five? Experience. If I use a larger step size, I might find that some parts of the image were never in focus in any frame. A smaller step size will work as well, but I will get a lot more frames than I need. If you shoot a lot of focus-bracketed sequences in RAW with a small step size, you can fill up a 32GB card pretty fast and be faced with a mammoth editing job deleting all the redundant frames.
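For a feel of how fast a card fills, here is a back-of-the-envelope sketch. The 18 MB RAW file size and the 100 frames per sequence are assumed figures for illustration only; check your own files for real numbers.

```python
def frames_per_card(card_gb, raw_mb):
    """Whole RAW frames that fit on a card (using 1 GB = 1000 MB)."""
    return card_gb * 1000 // raw_mb

def sequences_per_card(card_gb, raw_mb, frames_per_sequence):
    """Whole bracketing sequences that fit on a card."""
    return frames_per_card(card_gb, raw_mb) // frames_per_sequence

# Assumed: 32 GB card, 18 MB per RAW file, 100 frames per sequence.
print(frames_per_card(32, 18))          # → 1777
print(sequences_per_card(32, 18, 100))  # → 17
```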
If the shot starts just a foot or two away and I am shooting with the lens wide open, so DOF is at its minimum, I might set the step size as small as two. With a step size that small, combined with a starting focus that close, you may end up with over 100 files.
The reason you are getting so many files with everything out of focus is that most of the time the lens was focused on the empty space between your close foreground and infinity. The example you posted is typical of this situation. There is nothing you can do to avoid that other than increasing the step size. Increase the step size until you find parts of the foreground that are never in focus in any of the frames, then back off. Do this for an afternoon and you will soon get a feel for the appropriate step size for a given situation.
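The trial-and-error routine above can be written down as a simple loop. This is only a sketch of the procedure: `has_gaps` is a hypothetical stand-in for you shooting a sequence at a given step size and checking the frames for foreground that is never sharp, and the 1 to 10 step range is my assumption about what the camera offers.

```python
def largest_gap_free_step(has_gaps, max_step=10):
    """Raise the step size until gaps appear, then return the last good one.

    has_gaps(step) is a hypothetical check: True if some part of the
    foreground is never in focus in any frame at that step size.
    Returns None if even the smallest step leaves gaps.
    """
    best = None
    for step in range(1, max_step + 1):
        if has_gaps(step):
            break  # went one step too far, so back off
        best = step
    return best

# Toy stand-in for an afternoon of testing: pretend gaps show up above step 5.
print(largest_gap_free_step(lambda step: step > 5))  # → 5
```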
I don't use the camera for focus stacking, only focus bracketing. I peruse the files in LR first and throw out the ones I don't need (the ones that are out of focus because the lens was focused on empty space) and then combine what's left in PS.