Tag: transformer

44 What is the positional encoding in the transformer model? 2019-04-28T14:43:17.090
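
A minimal NumPy sketch of the sinusoidal scheme from "Attention Is All You Need" (the `max_len`/`d_model` names are illustrative); the `dims // 2` term is the same `i//2` indexing asked about further down this list:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    positions = np.arange(max_len)[:, None]                 # (max_len, 1)
    dims = np.arange(d_model)[None, :]                      # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                        # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                   # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=128)
print(pe.shape)  # (50, 128)
```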

15 Can BERT do the next-word prediction task? 2019-02-28T08:37:42.190

10 In a Transformer model, why does one sum positional encoding to the embedding rather than concatenate it? 2019-07-18T08:34:46.710

8 Preprocessing for Text Classification in Transformer Models (BERT variants) 2019-11-08T06:28:48.750

7 What is the first input to the decoder in a transformer model? 2019-05-11T08:36:07.907

6 Why does the vanilla transformer have a fixed-length input? 2020-03-08T16:28:59.357

6 Is BERT a language model? 2020-05-13T12:22:22.470

6 Transformer model: Why are word embeddings scaled before adding positional encodings? 2021-01-13T10:10:24.257
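
One common explanation is that word embeddings initialized with small values would otherwise be swamped by the positional encodings, whose entries lie in [-1, 1]. A rough numerical sketch, assuming an illustrative 0.02 initialization scale and a uniform stand-in for the sinusoids:

```python
import numpy as np

d_model, seq_len = 512, 10
word_emb = np.random.randn(seq_len, d_model) * 0.02                  # small-init embeddings
pos_enc = np.random.uniform(-1.0, 1.0, (seq_len, d_model))           # stand-in for sinusoids in [-1, 1]

# Scaling by sqrt(d_model) brings the word signal up to a magnitude comparable
# with the positional encodings before the two are added.
x = word_emb * np.sqrt(d_model) + pos_enc
print(np.abs(word_emb).mean(), np.abs(word_emb * np.sqrt(d_model)).mean(), np.abs(pos_enc).mean())
```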

5 Proper masking in the transformer model 2019-12-18T11:18:32.987
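
For reference, the two masks usually combined during training are the causal (look-ahead) mask and the padding mask; a NumPy sketch with an assumed pad id of 0:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """True where query position i would attend to a future key position j > i."""
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

def padding_mask(token_ids: np.ndarray, pad_id: int = 0) -> np.ndarray:
    """True where the key position is padding; broadcast over query positions."""
    return (token_ids == pad_id)[:, None, :]                 # (batch, 1, seq_len)

def masked_attention_logits(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Set masked logits to a large negative value before the softmax."""
    return np.where(mask, -1e9, scores)

scores = np.random.randn(2, 5, 5)                            # (batch, query, key)
ids = np.array([[7, 8, 9, 0, 0], [3, 4, 5, 6, 0]])           # 0 = padding
combined = causal_mask(5)[None, :, :] | padding_mask(ids)
print(masked_attention_logits(scores, combined)[0])
```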

4 Test dataset with a categorical variable value not present in the train dataset & transformer 2019-05-28T04:53:34.053

4 Why is the Decision Tree Classifier not working with categorical values? 2019-12-22T18:54:00.187

4 Transformer-based architectures for regression tasks 2020-05-26T18:03:35.377

4 BPE vs WordPiece Tokenization - when to use / which? 2020-06-02T14:21:40.983
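
A toy illustration of one BPE merge step (pure Python, made-up corpus); WordPiece differs mainly in picking the merge that most increases corpus likelihood rather than the most frequent symbol pair:

```python
from collections import Counter

# Toy corpus: each word is a tuple of symbols with its frequency.
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
         ("n", "e", "w", "e", "s", "t"): 6, ("w", "i", "d", "e", "s", "t"): 3}

def most_frequent_pair(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(vocab, pair):
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1]); i += 2
            else:
                out.append(word[i]); i += 1
        merged[tuple(out)] = freq
    return merged

pair = most_frequent_pair(vocab)     # e.g. ('e', 's') or ('s', 't'); both occur 9 times here
vocab = merge_pair(vocab, pair)
print(pair, vocab)
```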

4 BERT for QuestionAnswering: input exceeds 512 tokens 2020-09-14T12:59:36.467

3 How is the Transformer bidirectional? - Machine Learning 2019-03-16T07:12:36.477

3 Problem trying to build my own sklearn transformer 2019-05-05T17:39:44.033
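
The usual pattern for a custom scikit-learn transformer is to inherit from BaseEstimator and TransformerMixin and to return self from fit; a minimal sketch with an invented LogScaler step:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

class LogScaler(BaseEstimator, TransformerMixin):
    """Illustrative custom transformer: log1p, then center on the training mean."""

    def fit(self, X, y=None):
        # fit() must return self so the step composes inside a Pipeline.
        self.mean_ = np.log1p(X).mean(axis=0)
        return self

    def transform(self, X):
        return np.log1p(X) - self.mean_

X = np.abs(np.random.randn(100, 3))
y = (X[:, 0] > X[:, 1]).astype(int)
pipe = Pipeline([("logscale", LogScaler()), ("clf", LogisticRegression())])
pipe.fit(X, y)
print(pipe.score(X, y))
```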

3 Pyspark Pipeline Custom Transformer 2019-05-17T13:02:53.887

3 What is Bits Per Character? 2019-07-22T10:11:13.170

3 Bi-directionality in BERT model 2019-08-05T17:14:17.467

3 Why does the transformer positional encoding use both sine and cosine? 2020-02-23T12:54:49.263

3 Transformer decoder output - how is it linear? 2020-05-20T13:43:21.157

3 German Chatbot or conversational AI 2020-05-30T10:55:41.627

3 Effect of Stop-Word Removal on Transformers for Text Classification 2020-12-03T20:24:23.693

3 Trained BERT models perform unpredictably on test set 2020-12-11T10:23:51.530

3 Layer normalization details in GPT-2 2021-01-27T14:24:12.767

2 Incrementally train BERT with minimal QnA records to get improved results 2019-03-16T10:30:21.220

2 How to prepare the data for text generation task 2019-03-23T00:43:54.160

2 What is the use of [SEP] in the BERT paper? 2019-05-07T04:53:18.680

2 Pytorch: How to implement nested transformers: a character-level transformer for words and a word-level transformer for sentences? 2019-06-14T18:44:26.740

2 Does it make sense to use Transformer encoders on top of a pretrained Word2Vec embedding for a classification task? 2019-08-28T14:45:43.833

2 What is auxiliary loss in Character-level Transformer model? 2019-09-30T04:00:46.460

2 How do Bahdanau and Luong attentions use Query, Value, Key vectors? 2020-03-03T08:56:55.490

2 Custom functions and pipelines 2020-04-06T16:21:06.033

2 Does BERT use GLoVE? 2020-04-28T21:23:47.850

2 Explanation of i//2 in the positional encoding in the TensorFlow tutorial on transformers 2020-08-08T22:29:26.477

2 Overfitting while fine-tuning pre-trained transformer 2020-08-12T18:03:26.737

2 Transformer masking during training or inference? 2020-08-26T17:35:55.427

2 Loss first decreases and then increases 2020-08-29T09:30:52.917

2 How to treat data transformation choices as hyperparameters? 2020-09-14T04:25:32.563

2 Why is 10000 used as the denominator in Positional Encodings in the Transformer Model? 2020-10-01T21:27:52.227

2 How to train a model on top of a transformer to output a sequence? 2020-10-30T11:37:22.500

2 What is the difference between the BERT architecture and the vanilla Transformer architecture? 2020-11-30T03:34:44.230

2 How to evaluate the quality of speech-to-text data without access to the true labels? 2021-01-24T01:12:46.660

1 The principle of LM deep model 2019-03-22T09:35:50.487

1 How does Byte Pair Encoding work on the byte sequence? 2019-09-06T13:51:02.740

1 BERT for non-textual sequence data 2019-11-14T08:55:22.413

1 In Deep Learning, how many kinds of Attention exist? And what is the history of Attention models? 2019-12-04T11:47:44.703

1 Weight matrices in transformers 2019-12-05T10:34:50.910

1 Measuring quality of answers from QnA systems 2019-12-21T15:21:19.663

1 How do I implement Dual-encoder model in Pytorch? 2019-12-30T10:04:03.700

1 Why does BERT classification do worse with longer sequence lengths? 2019-12-31T18:08:12.690

1 What is the feedforward network in a transformer trained on? 2020-02-13T14:09:33.973

1 Should weight distribution change more when fine-tuning transformers-based classifier? 2020-02-24T20:08:45.703

1 Does the transformer decoder reuse previous tokens' intermediate states like GPT2? 2020-03-25T15:44:14.860

1 In "Attention Is All You Need", why are the FFNs in (2) the same as two convolutions with kernel size 1? 2020-04-03T03:15:50.413

1 What are good toy problems for testing Transformer architectures? 2020-04-09T12:15:28.220

1 Can BERT be used for predicting words? 2020-04-16T09:03:52.267

1 TensorFlow 1.15, multi-GPU on one machine: how to set batch_size? 2020-06-01T05:23:51.590

1 Calculating key and value vector in the Transformer's decoder block 2020-06-20T18:13:17.497

1 Next sentence prediction in RoBERTa 2020-06-29T20:55:34.947

1 What is the difference between the positional vector and the attention vector used in the transformer model? 2020-07-03T19:43:11.267

1 How to improve text generation results with a transformer? 2020-08-19T04:09:42.670

1 Splitting into multiple heads -- multi-head self-attention 2020-08-22T16:19:20.303
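
Splitting into heads is only a reshape plus transpose of the projected activations, with no new parameters at that step; a NumPy sketch with illustrative sizes:

```python
import numpy as np

batch, seq_len, d_model, n_heads = 2, 6, 512, 8
d_head = d_model // n_heads      # 64: each head sees one slice of the model dimension

x = np.random.randn(batch, seq_len, d_model)

# View the single (d_model-wide) projection output as n_heads chunks of size d_head.
heads = x.reshape(batch, seq_len, n_heads, d_head).transpose(0, 2, 1, 3)
print(heads.shape)               # (2, 8, 6, 64): (batch, head, seq_len, d_head)

# Merging back after attention reverses the two operations.
merged = heads.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)
print(np.allclose(merged, x))    # True
```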

1 Does fine-tuning BERT involve updating all of the parameters or just the final classification layer? 2020-09-04T20:54:25.300

1 Question about BERT embeddings with high cosine similarity 2020-09-10T15:13:03.027

1 What would be the target input for Transformer Decoder during test phase? 2020-09-15T10:23:15.087

1 SVM on BERT-Embeddings with very small dataset does not converge 2020-10-29T17:52:03.753

1 Can I fine-tune the BERT on a dissimilar/unrelated task? 2020-10-30T07:20:30.487

1 Why does an attention layer in a transformer learn context? 2020-11-12T15:31:37.863

1 How to use paraphrase_mining using sentence transformers pre-trained model 2020-11-13T20:56:43.320

1 Role of decoder in Transformer? 2020-11-23T20:29:43.453

1 Why transform embedding dimension in sin-cos positional encoding? 2020-11-24T00:12:56.410

1 Understanding the XLNet model for a concrete case 2020-12-08T14:03:11.337

1 Can Transformer Models be used for Training Chatbots? 2020-12-27T03:29:55.030

1 How do I handle class imbalance for text data when using pretrained models like BERT? 2020-12-31T14:09:09.513

1 Inference order in BERT masking task 2020-12-31T20:33:17.627

1 What does attention weights output from Transformer network do? 2021-01-05T06:35:18.377

1 Using numpy.ndarray in machine learning sklearn.preprocessing model 2021-01-09T17:23:18.730

1 How do I get word embeddings for out-of-vocabulary words using a transformer model? 2021-01-13T07:02:51.217

1 How does the linear relation between positional encodings help attention? 2021-01-16T21:00:52.613

1 Train a final model with the full data 2021-01-26T13:38:31.013

1 Unigram tokenizer: how does it work? 2021-02-02T13:28:18.273

1 Why does my manual derivative of Layer Normalization imply no gradient flow? 2021-02-19T21:43:34.940
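
The analytic backward pass has to route gradient through the mean and the variance as well as the direct path, which is where hand derivations usually go wrong; a NumPy sketch (normalizing over the last axis, biased variance, illustrative shapes):

```python
import numpy as np

def layernorm_backward(dy, x, gamma, eps=1e-5):
    """Gradient of y = gamma * (x - mu) / sqrt(var + eps) + beta over the last axis.
    Shapes: dy, x -> (N, D); gamma -> (D,)."""
    mu = x.mean(axis=-1, keepdims=True)
    std = np.sqrt(x.var(axis=-1, keepdims=True) + eps)
    xhat = (x - mu) / std

    dgamma = (dy * xhat).sum(axis=0)
    dbeta = dy.sum(axis=0)
    dxhat = dy * gamma
    # Three terms: direct path, gradient through the mean, gradient through the variance.
    dx = (dxhat - dxhat.mean(axis=-1, keepdims=True)
          - xhat * (dxhat * xhat).mean(axis=-1, keepdims=True)) / std
    return dx, dgamma, dbeta

x, dy, gamma = np.random.randn(4, 10), np.random.randn(4, 10), np.ones(10)
dx, dgamma, dbeta = layernorm_backward(dy, x, gamma)
print(dx.shape, dgamma.shape, dbeta.shape)
```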

0 What is the reason for the speedup of transformer-xl? 2019-02-25T02:24:10.967

0 Which is better: GPT or RelGAN for text generation? 2019-03-26T08:39:57.427

0 Transformer for neural machine translation: is it possible to predict each word in the target sentence in a single forward pass? 2019-06-30T02:44:25.183

0 NMT: what if we do not pass input to the decoder? 2019-09-16T06:52:32.240

0 When do you use FunctionTransformer instead of .apply()? 2019-10-19T09:06:26.237
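
FunctionTransformer mainly buys you the ability to put a stateless function inside a Pipeline, so it is cross-validated and serialized together with the model, which a bare DataFrame.apply() call is not; a small sketch using np.log1p:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge

# Wrap a stateless transformation so it becomes a Pipeline step.
log_step = FunctionTransformer(np.log1p)

X = np.abs(np.random.randn(200, 4))
y = X @ np.array([1.0, 2.0, 0.5, 0.0]) + 0.1 * np.random.randn(200)

pipe = Pipeline([("log", log_step), ("model", Ridge())])
pipe.fit(X, y)
print(pipe.score(X, y))
```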

0 Why are seq2seq models superior to simple LSTMs? 2019-11-29T14:24:55.463

0 Does the 'transformers' library also work with older versions of TensorFlow? 2019-12-11T16:22:26.750

0 Pretrained Models for Keyword-Based Text Generation 2020-02-12T16:12:59.570

0 How are Q, K, and V Vectors Trained in a Transformer Self-Attention? 2020-02-17T09:55:54.033
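
Q, K and V are not trained directly: the learned parameters are the projection matrices that produce them from the layer input, updated by backpropagation like any other weights. A single-head NumPy sketch with illustrative sizes:

```python
import numpy as np

def scaled_dot_product_attention(X, W_q, W_k, W_v):
    """W_q, W_k, W_v are the trainable parameters; Q, K, V are just projections of X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over key positions
    return weights @ V

seq_len, d_model, d_k = 5, 16, 8
X = np.random.randn(seq_len, d_model)
W_q, W_k, W_v = (np.random.randn(d_model, d_k) for _ in range(3))
print(scaled_dot_product_attention(X, W_q, W_k, W_v).shape)   # (5, 8)
```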

0 Seeking your advice on XLM-R for NMT 2020-03-07T10:31:35.323

0 Transformer seq2seq model and loading embeddings from XLM-RoBERTa 2020-03-11T18:01:15.257

0 Transformer-XL architecture 2020-03-17T17:05:17.550

0 How to detokenize a BertTokenizer output? 2020-03-25T21:26:39.813
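
With the Hugging Face tokenizer, detokenization is typically convert_tokens_to_string or decode; a short sketch assuming bert-base-uncased (note that WordPiece round-tripping is lossy for casing and some whitespace):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "Transformers are surprisingly effective."
ids = tokenizer.encode(text)                              # adds [CLS]/[SEP], applies WordPiece
tokens = tokenizer.convert_ids_to_tokens(ids)

print(tokens)                                             # e.g. ['[CLS]', 'transformers', ...]
print(tokenizer.convert_tokens_to_string(tokens))         # rejoins '##' WordPiece fragments
print(tokenizer.decode(ids, skip_special_tokens=True))    # same, minus the special tokens
```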

0 Transformers and BERT: dealing with possessives and apostrophes when encode 2020-04-02T20:29:18.710

0 What is "position" in CNN (im2latex) for Positional Encoding? 2020-04-03T15:49:15.353

0 Overfitting with text classification using Transformers 2020-04-23T12:43:10.007