## WAV - Bytes per video frame

I hope this question will be interpreted more as the nature of maths than the coding examples provided within:

Using the ffmpeg-fluent library for Node.js I'm able to extract PCM (WAV) audio. I've enabled the .native() capability to the chain of commands for ffmpeg which, for both video and audio, transcodes the input data at the same rate as the FPS of the video feed, 20.

I want to find out the amount of bytes that is contained within a "frame" of audio, correlating it to a frame of video data.

My script delivers me a chunk size of 4096 bytes.

Trying to calculate this ahead of time has been problematic for me since I thought the formula was:

48000 / 20 * sample size

By the logic that samples per second / frames per second * size of a sample should give me the number of bytes per frame. However, the math does not add up.
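For what it's worth, here is that formula spelled out (variable names are mine; this assumes 16-bit stereo PCM, so each sample frame is 4 bytes):

```javascript
// Sanity-checking the formula: samples per second / video frames per second
// gives samples per video frame; multiplying by the size of one sample
// frame across all channels (block align) gives bytes per video frame.
const sampleRate = 48000; // samples per second
const fps = 20;           // video frames per second
const blockAlign = 4;     // 2 channels * 2 bytes (16-bit) per sample

const bytesPerVideoFrame = (sampleRate / fps) * blockAlign;
console.log(bytesPerVideoFrame); // 9600, which indeed doesn't match 4096
```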

Attached below you'll find the WAV header of the audio stream extracted using a hex dump:

  Offset  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000  52 49 46 46 FF FF FF FF 57 41 56 45 66 6D 74 20  RIFFÿÿÿÿWAVEfmt
00000010  10 00 00 00 01 00 02 00 80 BB 00 00 00 EE 02 00  ........»...î..
00000020  04 00 10 00 4C 49 53 54 1A 00 00 00 49 4E 46 4F  ....LIST....INFO
00000030  49 53 46 54 0E 00 00 00 4C 61 76 66 35 38 2E 32  ISFT....Lavf58.2
00000040  39 2E 31 30 30 00 64 61 74 61 FF FF FF FF        9.100.dataÿÿÿÿ


When parsing this with Node.js I confirm that my block align is 4 bytes and my sample rate is 48000, using offsets from the WAV header documentation found at: http://soundfile.sapp.org/doc/WaveFormat/

console.log(chunk.slice(24, 28).readUInt32LE()); // Sample rate.
console.log(chunk.slice(32, 34).readUInt16LE()); // Block align.
console.log(chunk.slice(34, 36).readUInt16LE()); // Bits per sample.

48000
4
16


Note: in case I screwed something up with the ffmpeg-fluent commands by combining native processing with a new FPS, the original FPS of the input signal is 25.

Firstly, the RIFF file example you are showing is corrupt, which is probably why your math isn't working out. A correct RIFF file has the following chunk format:

4-bytes: {TAG {RIFF}}
4-bytes: {PAYLOAD-Length} (little-endian representation) - size of entire file minus 8 bytes
4-bytes: {TAG {WAVE}}


The "PAYLOAD" portion of the above contains a number of chunks of the following format:

4-bytes: {TAG}
4-bytes: {Chunk-Length} (little-endian representation) - size of the chunk payload
{Chunk-Length} bytes: {Chunk payload}


The "FMT " chunk will tell you about the format of the data held in the payload of the "DATA" chunk.

The "FMT " and "DATA" chunks are mandatory, although a valid RIFF file can contain other chunks.
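To make that layout concrete, here is a minimal Node.js sketch of walking those chunks (the `findChunks` name is my own; it assumes the header bytes are already in a Buffer):

```javascript
// Walk the RIFF payload, recording each chunk's tag, payload offset,
// and payload size. Chunk payloads are padded to an even byte count.
function findChunks(buf) {
  const chunks = {};
  let offset = 12; // skip "RIFF", the 4-byte length, and "WAVE"
  while (offset + 8 <= buf.length) {
    const tag = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    chunks[tag] = { offset: offset + 8, size };
    offset += 8 + size + (size % 2); // advance past padded payload
  }
  return chunks;
}
```

On a streamed WAV like the one above, the "data" size is a 0xFFFFFFFF placeholder, so the walk simply stops once the offset runs past the buffered bytes.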

You will need to parse the "FMT " to work out the size of each sample and also to work out how many samples are required to fill a frame of video.

The "FMT " chunk contains the following information:

4-bytes   Subchunk1ID      Contains the letters "fmt " (0x666d7420 big-endian form).
4-bytes   Subchunk1Size    16 for PCM.  This is the size of the rest of the Subchunk which follows this number.
2-bytes   AudioFormat      PCM = 1 (i.e. Linear quantization) Values other than 1 indicate some form of compression.
2-bytes   NumChannels      Mono = 1, Stereo = 2, etc.
4-bytes   SampleRate       8000, 44100, etc.
4-bytes   ByteRate         == SampleRate * NumChannels * BitsPerSample/8
2-bytes   BlockAlign       == NumChannels * BitsPerSample/8 The number of bytes for one sample including all channels. I wonder what happens when this number isn't an integer?
2-bytes   BitsPerSample    8 bits = 8, 16 bits = 16, etc.
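Reading those fields out of a Node.js Buffer is straightforward, provided the "fmt " chunk starts at offset 12 as it does in the dump above (the `parseFmt` name is mine; a robust parser would locate the chunk by tag first):

```javascript
// Extract the "fmt " fields from a Buffer holding the start of a WAV
// file, assuming the layout RIFF + size + WAVE + "fmt " at offset 12.
// All values are little-endian.
function parseFmt(buf) {
  return {
    audioFormat:   buf.readUInt16LE(20), // 1 = PCM
    numChannels:   buf.readUInt16LE(22),
    sampleRate:    buf.readUInt32LE(24),
    byteRate:      buf.readUInt32LE(28),
    blockAlign:    buf.readUInt16LE(32),
    bitsPerSample: buf.readUInt16LE(34),
  };
}
```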


Parsing out the "FMT " chunk should reveal the ByteRate and BlockAlign values (little-endian) which should give you what you need in one easy hit.

{ByteRate}/{Frames Per Second}


should give you the number of bytes per frame.

This should be equal to

{BlockAlign} * {SampleRate} / {Frames Per Second}.


Just be aware that all these values are little-endian representation.

In your example, the {ByteRate} value is 00 EE 02 00 which is 0x0002EE00, which is 192000 Decimal. 192000/25 gives you a BytesPerFrame rate of 7680.
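Plugging the numbers from the header in, both formulae agree:

```javascript
// Header values from the hex dump in the question.
const byteRate = 0x0002ee00; // bytes 00 EE 02 00, little-endian = 192000
const blockAlign = 4;        // bytes per sample frame, all channels
const sampleRate = 48000;
const fps = 25;              // original input frame rate

const bytesPerFrame = byteRate / fps;            // 192000 / 25
const check = (blockAlign * sampleRate) / fps;   // 4 * 48000 / 25
console.log(bytesPerFrame, check); // 7680 7680
```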

Hi Mark! First of all, thank you for your elaborate answer. Appreciate that. In your concluding remarks you mention that ByteRate / FPS is one of the formulae I can use to derive my answer, yet you end up using the BlockAlign instead. Did you secretly multiply the BlockAlign with the SampleRate?

I can't get it to match the 4096 bytes per videoframe I'm receiving. :) – Syncretic – 2019-08-31T10:12:01.520

Yeah sorry about that - typo. I've updated it. Should have been {ByteRate} there at the end. Your {BlockAlign} is 0x04 and {SampleRate} is 48000 so 4*48000/25 = 7680 which is a match. – Mark – 2019-08-31T12:59:00.653

As for why you are receiving 4096 bytes per frame, I don't have enough information to work out what's going on there. – Mark – 2019-08-31T12:59:58.347

Maybe try modifying your script to pull down more data for each frame? – Mark – 2019-08-31T13:11:48.277

Thanks Mark. The thing is, I'm pretty sure 4096 is the correct amount of bytes since I actually feed the chunks into SoX as they are processed by Node.js using Unix pipes. I do the exact same thing with my video frames simultaneously and they stay perfectly synchronized this way. – Syncretic – 2019-09-01T14:06:53.360

I suspect you are probably getting audio dropouts then. I don't have any visibility of your script or the sox configuration so I don't know what Sox is expecting nor what the script is doing. The output from the FMT chunk is correct though. – Mark – 2019-09-02T00:20:56.423