This is not so different from recognizing images in their Fourier frequency domain: the frequency features, and how they map back to the spatial domain, can be very unintuitive.
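A toy illustration of that unintuitive mapping (my own sketch, not from the paper being discussed): a single bright pixel, which is perfectly localized in space, spreads evenly over every frequency. In the other direction, a single frequency coefficient becomes a wave that touches every pixel. The helper `dft2` is a hypothetical naive 2D DFT written only for this example.

```python
import cmath

def dft2(img, inverse=False):
    """Naive 2D DFT (or inverse DFT) of a small square grid; illustrative only."""
    n = len(img)
    sign = 1 if inverse else -1
    scale = 1 / (n * n) if inverse else 1
    return [[scale * sum(img[x][y] * cmath.exp(sign * 2j * cmath.pi * (u * x + v * y) / n)
                         for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

N = 8

# One bright pixel: localized in the spatial domain.
spike = [[0.0] * N for _ in range(N)]
spike[3][5] = 1.0
spike_mags = [abs(c) for row in dft2(spike) for c in row]

# One frequency coefficient: localized in the frequency domain.
coeff = [[0j] * N for _ in range(N)]
coeff[1][2] = 1.0
wave_mags = [abs(p) for row in dft2(coeff, inverse=True) for p in row]

# Every frequency has magnitude ~1, and every pixel has magnitude ~1/64:
# neither feature stays local when viewed in the other domain.
print(min(spike_mags), max(spike_mags))
print(min(wave_mags), max(wave_mags))
```

The naive quadruple loop is O(N^4) and only practical for tiny grids, but it keeps the example dependency-free.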
But I'm not sure how important this phenomenon really is to the practice of computer vision, since 1) 'spoofed' images are highly specific to each DNN being used, and 2) a trivial reality check of the image can always 'out' examples like these.
Your second point is critical: you can easily filter these images out before even running them through the DNN. However, researchers are also interested in why it is possible to spoof NNs at all. The typical explanation, 'overfitting', is being questioned.
It also raises the question of whether new spoofing methods exist that aren't so easily detectable.