Tokenization is purely an implementation detail. If OpenAI had cared, they could have removed those obviously glitched tokens from their tokenizer; they simply didn't inspect it carefully enough, or didn't care.
GPT-4 does not suffer from the same glitched tokens as GPT-3, presumably because it uses a different tokenizer (the cl100k_base vocabulary rather than GPT-3's older r50k/p50k BPE).
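You can see the difference yourself with the tiktoken library. Here's a minimal sketch comparing how the GPT-3-era and GPT-4 tokenizers handle " SolidGoldMagikarp", one of the widely reported glitch tokens; the exact ids printed will depend on your tiktoken version.

```python
import tiktoken

text = " SolidGoldMagikarp"  # note the leading space

gpt3_enc = tiktoken.get_encoding("r50k_base")    # used by the original GPT-3 models
gpt4_enc = tiktoken.get_encoding("cl100k_base")  # used by GPT-4 / GPT-3.5-turbo

print("r50k_base:  ", gpt3_enc.encode(text))   # a single token id -> glitch-prone
print("cl100k_base:", gpt4_enc.encode(text))   # split into ordinary sub-tokens
```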
Furthermore, there are LLMs that operate on single bytes instead of multi-character tokens, which sidesteps the problem entirely.
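For a byte-level model (ByT5 is one example), the "tokens" are just the raw UTF-8 bytes of the input, so there is no learned vocabulary that could contain a glitched entry. A rough sketch of what the model sees (ByT5 additionally offsets byte values to reserve ids for special tokens, which is omitted here):

```python
text = " SolidGoldMagikarp"
byte_ids = list(text.encode("utf-8"))
print(byte_ids)       # one id per byte, each in the range 0-255
print(len(byte_ids))  # 18 bytes -> 18 "tokens", no learned vocabulary involved
```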