The argument in the paper has nothing to do with cardinality.
Review my comment a few steps above about Big-O. The reasoning there shows that the real numbers, despite having the cardinality of the continuum, are inadequate for measuring even the following countable set of Big-O complexity classes: O(2^n) together with O(n), O(n^2), ..., O(n^i), ...
Is there something special about intelligence that implies we only need to consider RL-environments with real-valued rewards, and not, say, big-O-complexity-class rewards? Maybe there is. But if there is, the proof will have to be nontrivial---at a bare minimum, it should make some reference to what the reals are (e.g., the unique complete ordered field). This would be quite world-shattering, to real analysts if to no-one else. But none of the RL literature seems to make any reference to what the reals actually are, instead just taking it for granted that the reals are a magical number system as flexible as anybody could ever want. (Other authors essentially pointed this out before I did; see footnote 10 in my paper.)
I re-read your Big-O comment, and it still does not make sense. The fact that mapping all n^i to arctan(i) and all i^n to arctan(i) + pi is somehow "misleading" does not make it theoretically impossible for RL algorithms to find an optimal solution under that metric. If, for a specific instance, you're saying n^1e300 is wrong because it is impractical and 2^n would be preferred, then you simply posed the original task incorrectly by asking for the wrong metric.
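To spell it out (a quick sanity check in Python; the `encode` helper and the "poly"/"exp" labels are just my proposed mapping made explicit, not anything from your paper):

```python
import math

def encode(kind, i):
    # The proposed encoding: O(n^i) -> arctan(i), O(i^n) -> arctan(i) + pi.
    if kind == "poly":                 # O(n^i)
        return math.atan(i)
    if kind == "exp":                  # O(i^n)
        return math.atan(i) + math.pi
    raise ValueError(kind)

# The polynomial classes stay internally ordered...
polys = [encode("poly", i) for i in range(1, 100001)]
assert all(a < b for a, b in zip(polys, polys[1:]))

# ...and every exponential class encodes above every polynomial one,
# since arctan(i) < pi/2 < arctan(2) + pi for all i.
assert encode("exp", 2) > max(polys)
```

The ordering of the classes is preserved, which is all an argmax-style learner needs.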
It's not about whether the agent will or will not find an optimal solution (no agent can possibly find good solutions in every environment). It's about whether the agent could even understand the true environment based on your real-valued-reward description of it.
Suppose the true environment has various tasks the agent can do, each giving a Big-O-complexity-valued reward: one task gives a reward of O(2^n), and others give rewards of O(n^i) for various i. For concreteness, say Task A rewards O(2^n), Task B rewards O(n^10000), and Task C rewards O(n^20000).
Now suppose you present this to the agent using real-valued rewards as you suggest, replacing O(n^i) by arctan(i) and O(2^n) by arctan(2)+pi. Then the agent will be deluded into thinking, e.g., that Task B and Task C give almost identical rewards (Task B gives reward 1.57069633 and Task C gives reward 1.57074633, which barely differ from each other at all). This is misleading because in the true environment, Task C gives much more reward than Task B. Yes, the agent understands that Task C gives a bigger reward, but the agent totally misunderstands how much bigger :)
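Just to make the compression vivid, here is the arithmetic spelled out (illustrative Python; Tasks A, B, C are from my example above):

```python
import math

reward_B = math.atan(10000)        # encoding of O(n^10000)
reward_C = math.atan(20000)        # encoding of O(n^20000)
reward_A = math.atan(2) + math.pi  # encoding of O(2^n)

print(f"Task A: {reward_A:.8f}")             # ~4.24874134
print(f"Task B: {reward_B:.8f}")             # ~1.57069633
print(f"Task C: {reward_C:.8f}")             # ~1.57074633
print(f"C - B:  {reward_C - reward_B:.2e}")  # ~5.00e-05

# The encoded gap between B and C is a fixed ~5e-5, but in the true
# environment O(n^20000) exceeds O(n^10000) by a factor of n^10000,
# which grows without bound in n.
```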
Your paper does not prove anything regarding what the agent "misunderstands". "Understanding" is not a mathematical concept in this case. The agent would still progressively find more and more optimal solutions to this task, so I see no problem there.
Suppose I claimed it's impossible to understand Shakespeare with all the consonants deleted, leaving only vowels. Most people would accept that claim without pedantically asking me to define "understand". Same story here. I'm arguing that certain environments, when shoe-horned into the traditional RL model, are like Shakespeare with all the consonants removed.
The agent might (or might not) find progressively more and more optimal solutions to the interpreted environment, but not to the true environment (unless by dumb luck), because it does not see what the true environment actually is. For example, in the concrete environment described above, you could suppose completing Task C takes more steps than Task B. Then the agent is likely to conclude, on the basis of the misleading arctan rewards, that Task C is not worth the extra steps. Depending on the extra steps, this conclusion could be badly false.
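Here is a sketch of that failure mode (illustrative Python; the per-step cost and step counts are hypothetical, chosen only to make the comparison concrete):

```python
import math

# Hypothetical numbers, not from the paper: suppose each step costs the
# agent 0.001 encoded reward, Task B takes 10 steps, Task C takes 20.
step_cost = 0.001
steps_B, steps_C = 10, 20

value_B = math.atan(10000) - step_cost * steps_B   # ~1.5607
value_C = math.atan(20000) - step_cost * steps_C   # ~1.5507

# The encoded reward gap C - B is only ~5e-5, so almost any extra step
# cost makes Task C look like a bad deal, and the agent picks Task B:
assert value_B > value_C

# In the true environment, though, Task C's reward class O(n^20000)
# dominates Task B's O(n^10000) by an unbounded factor, so this
# conclusion can be badly false there.
```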