Today I read an interesting post about dropping RNNs for sequential models.
Unfortunately, the post didn't go into much detail on how one would study attention-based models and start experimenting with them.
The only useful link was to this paper, which uses convolutional networks.
The paper is very dense, which makes it hard for me to follow, and I couldn't find any books on implementing these attention models.
Does anyone have any suggestions?