You're commenting in response to a specific research paper, so I interpret the comment in the context of that paper. That means you're implying that the paper is "focusing on a narrow weakness which may not actually be that relevant for the general case". I disagree.
Any 2D image is an optical illusion, so it makes no sense to criticize human image recognition for being 'fooled' by illusions. The real criterion for whether image recognition works well is altogether different.