Chalk it up to poker being something you can be skilled at without really having...

eutectic · on Jan 9, 2017

The charitable interpretation of his argument is that the outcome is only a statistical 'tie' if you accept the arbitrary significance threshold of p=5%. Even relatively weak evidence is still evidence.

Unfortunately, some of the examples which he claims don't pass this significance threshold actually do. This does not help his credibility.

conistonwater · on Jan 9, 2017

FWIW, the win rate was -91mbb/g, and so they reported the probability of being equal in skill somewhere between 5% and 10%, hence the "tie". The introduction to the other poker paper on HN now (https://arxiv.org/pdf/1701.01724v1.pdf), by a different team, from Alberta, is rather less gracious and says it was a "huge margin of victory". I think that whatever the interpretation, his video makes it clear he didn't understand any of this.

hnkain · on Jan 9, 2017

Significance works the other way round from how you described it -- if the two players were equal in skill, then a winrate of -91mbb/g or more extreme would happen 5-10% of the time.
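To make the direction of that statement concrete, here is a rough sketch of the calculation, assuming a normal approximation for the average winrate. Only the 91mbb/g margin comes from the thread; the hand count and per-hand standard deviation in the usage lines are placeholder values I made up for illustration, not the figures from the actual match, and `one_sided_p_value` is just a name I picked.

```python
import math


def one_sided_p_value(observed_margin_mbb, n_hands, sd_per_hand_mbb):
    """Probability of a margin at least this large if both players are equally skilled.

    Assumes (as a simplification) that per-hand results are independent and that
    the average over n_hands is approximately normal with mean 0 under the null.
    All quantities are in milli-big-blinds (mbb).
    """
    if n_hands <= 0 or sd_per_hand_mbb <= 0:
        raise ValueError("n_hands and sd_per_hand_mbb must be positive")
    # Standard error of the mean winrate per hand.
    standard_error = sd_per_hand_mbb / math.sqrt(n_hands)
    z = observed_margin_mbb / standard_error
    # Upper-tail probability of a standard normal: P(Z >= z).
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the call; not the real match data.
    HYPOTHETICAL_N_HANDS = 80_000
    HYPOTHETICAL_SD_MBB = 17_000
    p = one_sided_p_value(91, HYPOTHETICAL_N_HANDS, HYPOTHETICAL_SD_MBB)
    print(f"P(margin >= 91 mbb/g | equal skill) = {p:.3f}")
```

Note what the number is: the chance of seeing the data given equal skill, not the chance of equal skill given the data. Getting the latter would need a prior over the skill difference.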

I don't know if Doug Polk understands that or not, but I agree with his criticism and I think his analogy with sports reporting is sound. While "statistical tie" is true in a technical sense, it's not usually how matches are reported, and it's somewhat disingenuous and self-serving to use that language. It would be more honest to say that it's very unlikely that the bot is better than the humans -- given the advantages the bot had (a gruelling 2-week schedule, and pressure on the humans to get through N hands per day) it still lost by a significant margin.

conistonwater · on Jan 10, 2017

You're right, I would have liked them to report posterior probabilities, but those were p-values instead.

> It would be more honest to say that it's very unlikely that the bot is better than the humans

I don't think the analogy is sound. Consider the alternative: over a long match, the humans failed to beat the bot convincingly. But in any case, the examples he gives are so wide of the mark that it's clear this isn't the sort of argument he's making—he talks about "spin" and stuff.