Check out my Medium blog post with an overview of deep learning attention mechanisms in NLP! We start with the basic attention mechanism and move on to its variations (think global vs. local attention). Then we introduce co-attention and self-attention (the Transformer model). Key-Value(-Predict), Hierarchical, and Nested attention models are also covered. The post goes into more depth on the machine comprehension and question answering tasks. You can find a list of open-source Python implementations at the end of the post. Enjoy and share your feedback!
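
If you want a quick taste of the basic mechanism before diving into the post, here is a minimal NumPy sketch of a single (scaled dot-product) attention step. It is illustrative only and not taken from the post or the linked implementations; the function name and toy shapes are my own.

```python
import numpy as np

def scaled_dot_product_attention(query, keys, values):
    """Basic attention: weight each value by the softmax-normalized
    similarity between the query and the corresponding key."""
    # Similarity scores, scaled by sqrt(d) for numerical stability
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)      # shape: (seq_len,)
    # Softmax over the sequence gives the attention weights
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: attention-weighted sum of the values
    return weights @ values                 # shape: (d_v,)

# Toy example: 4 timesteps with 8-dimensional keys and values
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 8))
values = rng.normal(size=(4, 8))
query = rng.normal(size=(8,))
context = scaled_dot_product_attention(query, keys, values)
print(context.shape)  # (8,)
```

The variations discussed in the post (global vs. local, key-value, self-attention) mostly change where the queries, keys, and values come from; this weighted-sum core stays the same.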