It is fairly well established that neurons in these artificial neural networks are polysemantic, and that information is represented as directions in the activation embedding space rather than by individual neurons independently (which is why Anthropic is doing things like training sparse autoencoders). I haven't read the paper in depth, but it seems to be based on a fundamental misunderstanding about neurons in ANNs vs the brain.
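For anyone unfamiliar, the sparse autoencoder idea is roughly this: train an overcomplete encoder/decoder on a model's activations with an L1 sparsity penalty, so each activation vector gets explained by a few learned directions instead of individual neurons. A minimal toy sketch (synthetic activations, made-up sizes and hyperparameters, not Anthropic's actual setup):

```python
import torch
import torch.nn as nn

d_model = 64        # width of the activation vectors being decomposed
d_features = 256    # overcomplete dictionary of candidate "directions"

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model, bias=False)

    def forward(self, acts: torch.Tensor):
        # ReLU + L1 pressure pushes each activation to be explained
        # by a few feature directions rather than individual neurons.
        features = torch.relu(self.encoder(acts))
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder(d_model, d_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3  # sparsity penalty strength (illustrative)

acts = torch.randn(1024, d_model)  # stand-in for collected activations
for _ in range(100):
    recon, features = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```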
It may be that there are just a few misfirings that dramatically degrade the results. Strokes in the human brain that kill only a few neurons can have outsized effects on the body...