Hacker News

This is because the benchmark uses the cuDNN LSTM kernel in CNTK, but in TensorFlow it was chosen not to use the cuDNN LSTM kernel, which might be a bit unfair because it could have been used. The same goes for Torch. See here for more details: https://news.ycombinator.com/item?id=14473234


While it is true that CNTK can use the cuDNN LSTM, if the recurrence does not fall into the four recurrence types that cuDNN supports, CNTK is still much faster. The simplest way to verify this is to take a Keras script that uses whatever recurrent network you want and run it (on a GPU) with the TensorFlow backend and with the CNTK backend. Anecdotal evidence suggests an easy 3x speedup.
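For anyone who wants to try that comparison: Keras reads its backend from `~/.keras/keras.json` (or from the `KERAS_BACKEND` environment variable, which overrides the file), so the same script can be timed under both backends without any code changes. A minimal config sketch; the `floatx`, `epsilon`, and `image_data_format` values shown are just the common Keras defaults, not anything specific to this benchmark:

```json
{
    "floatx": "float32",
    "epsilon": 1e-07,
    "image_data_format": "channels_last",
    "backend": "cntk"
}
```

Swapping `"cntk"` for `"tensorflow"` (or exporting `KERAS_BACKEND=tensorflow` before launching the script) gives the back-to-back runs described above, assuming both backends are installed.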

Disclaimer: I work at Microsoft.


Why would that be "a bit unfair"? It sounds like the TensorFlow team just hasn't been able to make as many, or as effective, optimizations as the CNTK team. If they could have but chose not to, what tradeoff makes it fair to say the comparison is "a bit unfair"?


The binding of the word "they" is unclear in the comment you're replying to.

The "they" who chose not to use the cuDNN bindings were the authors of the benchmark. Some of the Torch folks filed an issue with the dlbench authors for the same error, but with respect to Torch: https://github.com/hclhkbu/dlbench/issues/14


thanks, that makes more sense



