I know it can be hard to explain, but can anyone explain why and how Vinod sees the boxes differently? (Does he have a vision problem, haha?)

The first row from the right, which is among the cubes that Sara observes in the middle, has 3 cubes based on what Vinod sees.
So the middle row can't have fewer than 3 cubes.
So I think the answer is D.