Hacker News

This is because the benchmark uses the cuDNN LSTM kernel in CNTK, but in TensorFlow it was chosen not to use the cuDNN LSTM kernel, which might be a bit unfair because it could have been used. The same goes for Torch. See here for more details: https://news.ycombinator.com/item?id=14473234


While it is true that CNTK can use the cuDNN LSTM, if the recurrence does not fall into the four recurrence types that cuDNN supports, CNTK is still much faster. The simplest way to verify this is to take a Keras script that uses whatever recurrent network you want and run it (on a GPU) with the TensorFlow backend and with the CNTK backend. Anecdotal evidence suggests an easy 3x speedup.
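For anyone who wants to try that comparison: Keras reads its backend from `~/.keras/keras.json` (or from the `KERAS_BACKEND` environment variable, which overrides the file), so the same script can be timed under both backends without any code changes. A minimal config sketch; the `floatx`, `epsilon`, and `image_data_format` values shown are just the common Keras defaults, not anything specific to this benchmark:

```json
{
    "floatx": "float32",
    "epsilon": 1e-07,
    "image_data_format": "channels_last",
    "backend": "cntk"
}
```

Swapping `"cntk"` for `"tensorflow"` (or exporting `KERAS_BACKEND=tensorflow` before launching the script) gives the back-to-back runs described above, assuming both backends are installed.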

Disclaimer: I work at Microsoft.


Why would that be "a bit unfair"? It sounds like the TensorFlow team just hasn't been able to make as many, or as effective, optimizations as the CNTK team. If they could have but chose not to, what tradeoff makes it fair to say the comparison is "a bit unfair"?


The binding of the word "they" is unclear in the comment you're replying to.

The "they" who chose not to use the cuDNN bindings were the authors of the benchmark. Some of the Torch folks filed an issue with the dlbench authors for the same error, but with respect to Torch: https://github.com/hclhkbu/dlbench/issues/14


thanks, that makes more sense



