Tag: attention-mechanism

44 What is the positional encoding in the transformer model? 2019-04-28T14:43:17.090

15 Can BERT do the next-word prediction task? 2019-02-28T08:37:42.190

14 Gumbel-Softmax trick vs Softmax with temperature 2019-08-29T10:30:50.857

12 How does attention mechanism learn? 2020-01-23T06:05:27.383

10 Variable input/output length for Transformer 2019-02-13T03:43:48.647

10 In a Transformer model, why does one sum positional encoding to the embedding rather than concatenate it? 2019-07-18T08:34:46.710

9 How do attention mechanisms in RNNs learn weights for a variable length input 2018-01-30T00:35:51.420

8 What's the difference between Attention and Self-Attention? What problems does each solve that the other can't? 2019-04-17T10:39:34.037

6 Transformer model: Why are word embeddings scaled before adding positional encodings? 2021-01-13T10:10:24.257

4 Transformer-based architectures for regression tasks 2020-05-26T18:03:35.377

3 Why do Position Embeddings work? 2018-11-08T16:05:00.290

3 Keras Attention Guided CNN problem 2018-12-23T10:54:39.563

3 How to train tensorflow's transformer model on my own data? 2018-12-31T13:53:38.913

3 SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors 2019-12-08T17:29:11.803

3 Why is the decoder not a part of BERT architecture? 2019-12-21T17:09:07.040

3 Attention mechanism in Tensorflow 2 2020-01-29T11:43:45.157

3 Why does the transformer positional encoding use both sine and cosine? 2020-02-23T12:54:49.263

3 Transformer decoder output - how is it linear? 2020-05-20T13:43:21.157

3 Attention for time-series in neural networks 2020-11-28T12:32:13.717

2 Keras value error: Operands could not be broadcast with shapes (100,100) - GRU 2019-02-21T10:25:07.967

2 ValueError: Cannot convert a partially known TensorShape to a Tensor: (?, 256) 2019-04-28T23:24:10.627

2 Any good implementations of Bi-LSTM Bahdanau attention in Keras? 2019-12-02T21:22:22.810

2 How do Bahdanau and Luong attentions use Query, Value, and Key vectors? 2020-03-03T08:56:55.490

2 Does BERT use GloVe? 2020-04-28T21:23:47.850

2 Explanation of i//2 in the positional encoding in the TensorFlow tutorial on transformers 2020-08-08T22:29:26.477

2 Transformer masking during training or inference? 2020-08-26T17:35:55.427

2 Nutritional image classification task 2021-01-26T14:58:47.977

1 Attention Mechanism: Why use context vector instead of attention weights? 2019-05-16T01:49:18.067

1 How to Visualize Graph Attention 2019-10-29T17:26:18.253

1 What is the advantage of positional encoding over one hot encoding in a transformer model? 2019-11-12T05:49:57.037

1 Training a model for Single Image Super Resolution 2019-11-30T06:40:43.387

1 In Deep Learning, how many kinds of Attention exist? And what is the history of Attention models? 2019-12-04T11:47:44.703

1 Weight matrices in transformers 2019-12-05T10:34:50.910

1 What is difference between attention mechanism and cognitive function? 2019-12-14T12:15:21.937

1 What is the feedforward network in a transformer trained on? 2020-02-13T14:09:33.973

1 Attention to multiple areas of same sentence 2020-03-08T09:01:26.097

1 In "Attention Is All You Need", why are the FFNs in (2) the same as two convolutions with kernel size 1? 2020-04-03T03:15:50.413

1 What are good toy problems for testing Transformer architectures? 2020-04-09T12:15:28.220

1 Can BERT be used for predicting words? 2020-04-16T09:03:52.267

1 Attention network without hidden state? 2020-04-27T22:22:29.207

1 How to add attention mechanism to my sequence-to-sequence architecture in Keras? 2020-05-17T19:11:57.523

1 How to add a Decoder & Attention Layer to a Bidirectional Encoder with TensorFlow 2.0 2020-05-18T05:15:42.043

1 What is the difference between the positional vector and the attention vector used in the transformer model? 2020-07-03T19:43:11.267

1 Splitting into multiple heads -- multihead self attention 2020-08-22T16:19:20.303

1 What is the difference between additive and multiplicative attention? 2020-09-07T18:24:41.417

1 What would be the target input for Transformer Decoder during test phase? 2020-09-15T10:23:15.087

1 Is a dense layer required for implementing Bahdanau attention? 2020-10-17T12:13:46.423

1 Why does an attention layer in a transformer learn context? 2020-11-12T15:31:37.863

1 Working Behavior of BERT vs Transformers vs Self-Attention+LSTM vs Attention+LSTM on the scientific STEM data classification task? 2020-11-17T11:42:53.930

1 Role of decoder in Transformer? 2020-11-23T20:29:43.453

1 Pytorch Luong global attention: what is the shape of the alignment vector supposed to be? 2020-12-29T06:05:01.693

1 How does the linear relation between positional encodings help attention? 2021-01-16T21:00:52.613

1 How do attention mechanisms work in CNNs for images? 2021-01-27T23:45:28.163

0 Why and how can BERT learn different attention patterns for each head? 2018-12-21T03:34:42.383

0 What is the reason for the speedup of transformer-xl? 2019-02-25T02:24:10.967

0 Two different attention methods for seq2seq 2019-11-05T02:44:41.240

0 How are Q, K, and V Vectors Trained in a Transformer Self-Attention? 2020-02-17T09:55:54.033

0 Attention model with seq2seq over sequence 2020-03-16T11:35:26.063

0 How many spectrogram frames per input character does the text-to-speech (TTS) system Tacotron-2 generate? 2020-05-14T21:31:06.163

0 Is the number of bidirectional LSTMs in an encoder-decoder model equal to the maximum length of the input text/characters? 2020-05-20T05:10:08.047

0 How to understand inconsistent and ambiguous dimensions of matrices used in the attention layer? 2020-06-02T16:51:58.300

0 Predicting point sequence in image 2020-06-17T15:28:09.483

0 NLP Transformers - understanding the multi-headed attention visualization (Attention is all you need) 2020-08-07T11:44:00.363

0 What are the hidden states in the Transformer-XL? Also, what does the recurrence wiring look like? 2020-08-19T21:00:18.737

0 Practical attention models 2020-10-09T11:15:12.913

0 Question about Relative-Position-Representation code 2020-10-14T03:36:25.010

0 Understanding Transformer's Self attention calculations 2020-11-09T13:00:18.900

0 Using Transcoder Model for language to language conversion 2020-11-17T16:48:08.783

0 Why does this TensorFlow Transformer model have a Linear output instead of Softmax? 2020-11-22T15:08:08.553

0 Predict customer behaviour with Transformer (Attention Is All You Need) 2020-12-05T11:24:21.987

0 Basics of the attention mechanism 2020-12-27T04:47:00.767

0 How do the linear layers in the attention mechanism work? 2021-01-22T11:45:58.957

0 Transformer architecture question 2021-01-26T15:01:45.863