How can these wav/mpeg files be the exact same duration—down to the (apparent) millionth of a second?


I'm working on a project that involves many different .wav files. In checking out some attributes of these files, I noticed something strange: The duration of all the files looks to be chunked in some way:

To verify, I downloaded the files and ran ffprobe -i pol.wav -show_entries format=duration -v quiet -of csv="p=0" and each one spat out the number 1.959250—which coincides exactly with the length I was seeing in the first place.

Is there something about wav or mpeg files that forces them into a particular bucket of length (e.g. if two files are 2.6 seconds and 2.4 seconds, they both get changed in such a way as to be 2500 milliseconds)? Or is this something probably done very deliberately by whoever recorded these sounds in the first place (i.e. every sound file was specifically trimmed to some exact length)?

Even if it is the case that these files were trimmed to this exact length, how is it that they could be trimmed to the millionth of a second? Is it even possible to know the duration of a file to that precision?


Posted 2018-08-06T16:36:55.807




Digital audio is sampled, i.e. made up of discrete samples, with each sample representing a small sliver of time. The duration is therefore quantized to multiples of 1/(sampling frequency). For example, if the sampling rate is 22050 Hz (samples/sec) and there are 37646 samples (per channel), then the duration will be 37646 x 1/22050 seconds = 1.7073015873... seconds. Most apps will round this value and show a lower-precision reading. FFmpeg goes up to microseconds, so it will show 1.707302 seconds.
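As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function name duration_seconds is made up for this example; the sample count and rate are the ones used in the paragraph above.

```python
from fractions import Fraction


def duration_seconds(num_samples: int, sample_rate: int) -> Fraction:
    """Exact duration of num_samples (per channel) at sample_rate Hz."""
    return Fraction(num_samples, sample_rate)


# Values taken from the example in the text.
exact = duration_seconds(37646, 22050)

# Rounding to six decimals mimics a microsecond-precision display.
print(f"{float(exact):.6f}")  # 1.707302
```

The six decimals come from how the number is displayed, not from how the file was trimmed: any whole number of samples divided by the sample rate can be printed to that many digits.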

The above applies directly to uncompressed audio, like that found in WAV files. Audio in MPEG files is compressed and encapsulated into frames. Each frame holds a fixed number of samples, with the number depending on the codec. AAC (the audio codec found in iTunes) typically uses 1024 samples/frame. The audio encoder will also add some extra samples at the start and end of the stream, for technical reasons. So, assuming the same 22050 Hz, the stored duration in this case will be in steps of 1024/22050 seconds = 0.0464399093... seconds. The MP4 file contains information about the padding added, so the decoder will not render those extra samples and will restore the original duration.
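The frame-based quantization can be sketched the same way. This is only an illustration under simplifying assumptions: the helper name padded_duration is hypothetical, the 1024 samples/frame figure is the typical AAC value mentioned above, and the extra samples an encoder adds at the start of the stream are ignored.

```python
SAMPLES_PER_FRAME = 1024  # typical AAC frame size, as stated in the text


def padded_duration(num_samples: int, sample_rate: int,
                    samples_per_frame: int = SAMPLES_PER_FRAME) -> float:
    """Duration after rounding the sample count up to whole frames.

    Simplification: encoder padding at the start of the stream is ignored.
    """
    # Integer ceiling division: number of frames needed to hold all samples.
    frames = -(-num_samples // samples_per_frame)
    return frames * samples_per_frame / sample_rate


# Size of one duration step at 22050 Hz.
step = SAMPLES_PER_FRAME / 22050
print(f"{step:.6f}")  # 0.046440

# 37646 samples need 37 frames, i.e. 37888 stored samples.
print(f"{padded_duration(37646, 22050):.6f}")
```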

As for the identical durations of those files, they have probably been made to contain the same number of samples.
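One way to check this for uncompressed WAV files is to compare the sample counts directly. The sketch below uses Python's standard wave module, which only reads PCM WAV files; the function names are made up for this example, and the file names in the usage comment are placeholders.

```python
import wave


def wav_sample_info(path: str) -> tuple[int, int]:
    """Return (samples per channel, sample rate) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes(), wav.getframerate()


def same_length(paths: list[str]) -> bool:
    """True if every file has the same sample count and sample rate."""
    infos = {wav_sample_info(p) for p in paths}
    return len(infos) <= 1


# Example usage (placeholder file names):
# print(same_length(["pol.wav", "other.wav"]))
```

If this returns True for the whole set, the files share one sample count, which would explain why their durations match to the microsecond.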


Posted 2018-08-06T16:36:55.807
