Difference between zero-padding and character-padding in Recurrent Neural Networks


For RNNs to work efficiently, we vectorize the problem, which results in an input matrix of shape

    (m, max_seq_len) 

where m is the number of examples, e.g. sentences, and max_seq_len is the maximum length a sentence can have. Some sentences are shorter than max_seq_len, so they must be padded to fit the matrix.

One method is "zero-padding", where each sequence is padded with zeros. For example, given a vocabulary that maps each word to an index, we can represent a sentence of length 4,

    "I am very confused" 

as the index sequence

    [23, 455, 234, 90] 

Padding it to achieve a max_seq_len=7, we obtain a sentence represented by:

    [23, 455, 234, 90, 0, 0, 0] 

The index 0 is not part of the vocabulary.
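For concreteness, here is a minimal sketch of zero-padding in plain Python; the helper name `zero_pad` and the index values are just illustrative:

```python
def zero_pad(indices, max_seq_len):
    """Pad a list of token indices with 0s up to max_seq_len."""
    return indices + [0] * (max_seq_len - len(indices))

# "I am very confused" as token indices (illustrative values)
sentence = [23, 455, 234, 90]
padded = zero_pad(sentence, max_seq_len=7)
print(padded)  # [23, 455, 234, 90, 0, 0, 0]
```

In practice, libraries provide this directly (e.g. Keras' `pad_sequences`), but the idea is the same: 0 is reserved for padding and never assigned to a real word.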

Another method is to append a dedicated padding token, e.g. "<<pad>>", to the sentence:

    "I am very confused <<pad>> <<pad>> <<pad>>"

to achieve max_seq_len=7. We also add "<<pad>>" to the vocabulary; let's say its index is 1000. Then the sentence is represented by

    [23, 455, 234, 90, 1000, 1000, 1000]
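Character-padding can be sketched the same way; the vocabulary below, including the index 1000 for "<<pad>>", is hypothetical:

```python
def pad_with_token(words, max_seq_len, vocab, pad_token="<<pad>>"):
    """Append pad_token until max_seq_len, then map words to vocab indices."""
    padded = words + [pad_token] * (max_seq_len - len(words))
    return [vocab[w] for w in padded]

# Hypothetical vocabulary; "<<pad>>" is an ordinary entry with index 1000.
vocab = {"I": 23, "am": 455, "very": 234, "confused": 90, "<<pad>>": 1000}
print(pad_with_token(["I", "am", "very", "confused"], 7, vocab))
# [23, 455, 234, 90, 1000, 1000, 1000]
```

Here the padding token is looked up like any other word, so the model sees 1000 rather than a reserved out-of-vocabulary index.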

I have seen both methods used, but why is one used over the other? Are there any advantages or disadvantages comparing zero-padding with character-padding?


Posted 2021-02-27T10:13:26.473
