If you want to keep your site's contents from being used to train AI without making your website unavailable to the open web, then blocking Common Crawl (among others) is absolutely mandatory.
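For reference, a minimal robots.txt sketch that asks Common Crawl's crawler (it identifies as CCBot) to stay out, along with a couple of other documented AI-training bots; keep in mind that, as noted below, compliance with robots.txt is entirely voluntary:

```
# Common Crawl's crawler
User-agent: CCBot
Disallow: /

# OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Google's opt-out token for AI training
User-agent: Google-Extended
Disallow: /
```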
We have very few tools to protect ourselves here, and need to make the most of the ones we have.
This is kind of a question of etiquette and legality more than it is a question of what is technically possible. People can and do ignore robots.txt. That doesn't mean copyright infringement is now legal.
People steal cars and break into houses, too. Both have their versions of locks and alarms, and against a skilled "attacker" those are about as useless as disabling right-click on a website.
Some people want to make their stuff freely available to people, but don't want it to be used to train AI. That's the group my comment was talking about. No monopolistic megalomaniacs here.
Personally, I think that such efforts are pointless: all it takes is one crawler making it through your defenses and you may as well have not done a thing. The alternative, though, is to remove your sites from the open web entirely. Either way, Common Crawl needs to be excluded if you want to avoid your stuff being used to train AI.