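Some of the comments on that posting give concrete examples where the formula fails. For example, an item with 1000 upvotes and 2000 downvotes will be ranked above one with 1 upvote and 2 downvotes, even though both have exactly one third of their votes as upvotes. This happens because the formula sorts by the lower bound of the Wilson score interval. With only 3 votes the interval is wide, so its lower bound falls far below 1/3. With 3000 votes the interval is narrow, so its lower bound stays close to 1/3.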
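To make the example concrete, here is a minimal sketch of ranking by the Wilson lower bound. It assumes the standard Wilson score formula with z = 1.96 (a 95% confidence level), which the comments do not specify. The function name `wilson_lower_bound` and the item labels are hypothetical.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote proportion.

    z = 1.96 (95% confidence) is an assumed default, not taken from the posting.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # no votes yet: treat as the lowest possible score
    phat = upvotes / n  # observed share of upvotes
    z2 = z * z
    centre = phat + z2 / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


# Hypothetical items: both have exactly one third of their votes as upvotes.
items = {"large": (1000, 2000), "small": (1, 2)}

# Sort by the lower bound, highest first.
ranked = sorted(items, key=lambda name: wilson_lower_bound(*items[name]), reverse=True)
for name in ranked:
    print(name, round(wilson_lower_bound(*items[name]), 3))
```

Under these assumptions the large item scores about 0.32 and the small one about 0.06, so the item with 2000 downvotes comes first. The small sample is penalised for its uncertainty, not for its ratio.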