Does converting mp3 to wav make sense with regard to timing accuracy?


Think about a program where it is mission critical to sync audio playback with text being displayed on the screen, and the timing of this sync must be perfect.

Currently, I am having all sorts of trouble with this, as browsers are not accurately reporting the playback position within mp3 files, specifically long .mp3 files (.ogg files too), perhaps those with variable bit rates (as opposed to constant bit rates)... still experimenting with all the combinations. They are often off by half a second or more, especially when "scrubbing" through a long track.

I have been led to believe that these inaccuracies are the result of the compression techniques used to shrink files and that I may be able to completely sidestep this issue by making use of .wav files within my application (not ideal, but the timing MUST be accurate). I still need to do more testing on this, but I want to ask here to get the opinion of people who understand audio compression/codecs, etc. better than I do.

This application makes use of thousands of mp3s. If a file is already an mp3, is it feasible that converting to wav after the fact could address these timing issues? Or is it a situation more like with the quality of the audio itself, where once it is downsized to mp3, the information is lost and converting back to wav will not "get that lost data back"?

Thanks for your insight.

**EDIT:** As it turns out, variable bit rate encodings on your mp3 files are indeed the problem here. If you need accurate timing, make sure you use FIXED/CONSTANT bitrate encoding.

Brian FitzGerald

Posted 2016-06-14T15:37:31.210

Reputation: 151

Voting to close as, although a very interesting question, it is not related to sound design but to html/javascript development. Maybe would be better fitted.

– audionuma – 2016-06-14T17:11:51.650


Well, I think it falls somewhere in the middle. I have already posted over there but my take is that most developers (such as myself) may not have a solid enough understanding of audio compression/codecs etc to really know what all may be going on here, specifically how different types of files/codecs/bitrates/sample rates/etc. might relate to inaccurate time reporting by the browser... :(

– Brian FitzGerald – 2016-06-14T17:21:22.057

The thread you are pointing to seems quite comprehensive to me. As for checking whether you get better results with wav files, if you have code running with mp3 files it shouldn't be too difficult to test. As it is, your question doesn't fit into

– audionuma – 2016-06-14T17:42:16.773

That thread is not comprehensive, as it doesn't solve the problem, so I have to go deeper... to the source (audio knowledge). Anyhow, I understand your point. I will ask another question that gets more directly to what I want to know regarding the audio. Thanks! – Brian FitzGerald – 2016-06-15T14:20:19.483

Unfortunately, many (most?) sound people (production mixers, dubbing mixers, sound editors, ...) are not experts in the theory and implementation of audio encoding/decoding. Most of them will nevertheless consider that when frame/sample accuracy is needed, PCM audio is the only way to go, as lossy codecs will inherently induce some issues. There are nevertheless examples of web-based games where sufficient sync is achieved, even using the Web Audio API. You didn't tell us if tests with wav files led to better results. Note that instead of asking another question, you might edit this one. – audionuma – 2016-06-15T17:45:49.077

Is it a lot of work to just try it? You already have the system built that tries to follow MP3s, can you adjust it to run on wave files instead? In terms of audio nerds being able to answer your question, one challenge is we don't know why the MP3s could not sync. If the sync is trying to track the position in the audio file by "counting" words or bits, then wave should work better than MP3 because word length and sample rate are fixed in wave files, but not always fixed in MP3s. You might also look for formats that can include position/clock metadata. – Todd Wilcox – 2016-06-15T23:06:03.833

Could you say a bit more about the timing of the images/words, how that is being run? I'm having trouble visualizing what you are doing. How are you playing back the sound file and what exactly has to correspond with what? Why are you not playing back video files where the images and sounds are fixed in position rather than having two separate entities? – Phil Freihofner – 2016-06-16T01:33:59.567

Thanks a lot for the response and thoughts guys. Based on some additional tests I have done, it does indeed seem to come down to an issue with variable bit rate encoded mp3's (versus constant bit rate). At the end of the day, VBR encodings just seem to be VERY unreliable with regards to the timing data. Googling, I have seen developers struggle with this, video guys, etc. Anyone needing to sync audio with text or video for example, can run into this issue when dealing with VBR. I will test wav for my own personal interest, but my most evident conclusion is that VBR encoding is the devil. – Brian FitzGerald – 2016-06-16T13:13:27.443

Regarding your question Phil, there is no video file, only audio and text. Think about an ebook reader platform where you can click on the words to save them, add annotations, etc. Can't really do that if they are "baked into" the video. – Brian FitzGerald – 2016-06-16T14:36:41.003

Yes, I do recall VBR MP3s having timing issues on online playback.

Whenever I play them back in something like, if I skip through the track, and then go back to a particular time stamp, it's not the same spot in the audio file. Sometimes, I get the issue where the time display for the file goes past its original length but the song is not over yet.

However, when I re-render the file as CBR (constant bit rate), I don't get this issue. – timaeus222 – 2016-06-18T09:25:37.497

The problem is not one of 'reliability' as you say, but a simple matter of how the format works. The mp3 stream is divided into frames, and each frame holds exactly the same length (time) of audio: approx. 26 milliseconds. But the size (bytes) of each frame will vary based on the bitrate of that frame. In a VBR file, the bitrate, and thus the frame size, varies, so there's no way, by looking at the number of bytes of data consumed/left, to know exactly where you are in terms of time elapsed/remaining. You can only approximate. With CBR and WAV, the relationship between bytes and time is linear. – little_birdie – 2016-06-22T21:26:18.780
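To make the bytes-versus-time point in the comment above concrete, here is a rough sketch in Python. It assumes MPEG-1 Layer III at 44.1 kHz (1152 samples per frame, hence the ~26 ms figure) and uses the standard frame-length formula while ignoring padding bits. The two-segment "VBR" stream and the function names are made up for illustration; this is not how any particular browser computes its position.

```python
SAMPLE_RATE = 44_100          # Hz (assumed)
SAMPLES_PER_FRAME = 1152      # MPEG-1 Layer III
FRAME_SECONDS = SAMPLES_PER_FRAME / SAMPLE_RATE   # about 0.0261 s per frame


def frame_bytes(bitrate_bps: int) -> int:
    """Frame length in bytes for MPEG-1 Layer III, ignoring the padding bit."""
    return 144 * bitrate_bps // SAMPLE_RATE


def estimate_time(byte_offset: int, total_bytes: int, total_seconds: float) -> float:
    """Naive guess: assume bytes and time are proportional (only true for CBR)."""
    return total_seconds * byte_offset / total_bytes


def exact_time(byte_offset: int, frame_sizes: list[int]) -> float:
    """Walk the frames one by one to find which frame contains byte_offset."""
    consumed = 0
    for index, size in enumerate(frame_sizes):
        if consumed + size > byte_offset:
            return index * FRAME_SECONDS
        consumed += size
    return len(frame_sizes) * FRAME_SECONDS


# Hypothetical VBR stream: 1000 quiet frames at 64 kbps, then 1000 busy frames at 320 kbps.
vbr_frames = [frame_bytes(64_000)] * 1000 + [frame_bytes(320_000)] * 1000
vbr_total_bytes = sum(vbr_frames)
vbr_total_seconds = len(vbr_frames) * FRAME_SECONDS

# Byte offset of the first 320 kbps frame, i.e. the true halfway point in time.
offset = sum(vbr_frames[:1000])
vbr_exact = exact_time(offset, vbr_frames)
vbr_guess = estimate_time(offset, vbr_total_bytes, vbr_total_seconds)

# Same duration as CBR: every frame has the same size, so the naive guess is right.
cbr_frames = [frame_bytes(320_000)] * 2000
cbr_offset = sum(cbr_frames[:500])
cbr_exact = exact_time(cbr_offset, cbr_frames)
cbr_guess = estimate_time(cbr_offset, sum(cbr_frames), len(cbr_frames) * FRAME_SECONDS)

print(f"VBR: exact {vbr_exact:.2f} s, naive guess {vbr_guess:.2f} s")
print(f"CBR: exact {cbr_exact:.2f} s, naive guess {cbr_guess:.2f} s")
```

In this made-up stream the naive byte-proportional guess lands many seconds away from the true position for VBR, while for CBR the two agree. A player can only avoid that error for VBR by walking the frames or by using an index stored with the file.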

Following the discussion above, I'm wondering what exactly it is that seems to make VBR problematic in some cases. VBR ITSELF cannot be regarded as a problem, since it works on many players without timing problems. VBR means that the precision of sound encoding varies over time. So the exact reason could maybe be an insufficient allocation of processing resources while needing to calculate with a new bit rate at a decisive point in time. (But that's definitely just a guess.) – philburns – 2017-05-11T12:09:25.293

The issue isn't VBR encoding per se. .mp3 is close to a raw bitstream i.e. no index or timestamps, so the inconstant frame sizes don't allow for a naive seek-by-byte-offset. The audio data should be stored in a container with a global header and timestamps. MP4 is one. – Gyan – 2018-06-09T15:35:15.460



It does not make sense. You get much bigger file sizes: roughly 620 MB per hour of audio (which translates to 20 3-minute-long songs). For thousands of songs, you're looking at 60 GB of music or more. A 3-minute song of uncompressed audio is about 31 MB. For an average American (source), that's 13 seconds of loading for a single song. Not to mention 31 MB of data from a mobile plan if your users happen to be on a phone, where the situation would get worse.
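The size figures are easy to sanity-check with a few lines of Python. This sketch assumes CD-quality PCM (44.1 kHz, 16-bit, stereo), which the answer does not state explicitly, and a hypothetical 20 Mbit/s connection; WAV header overhead is ignored. With those assumptions, uncompressed audio works out to roughly 10.6 MB per minute, so the ~31 MB figure corresponds to a 3-minute track.

```python
def pcm_bytes(seconds: int, sample_rate: int = 44_100,
              channels: int = 2, bits_per_sample: int = 16) -> int:
    """Size of raw PCM audio in bytes (header overhead ignored)."""
    return seconds * sample_rate * channels * bits_per_sample // 8


def download_seconds(size_bytes: int, link_mbps: float) -> float:
    """Transfer time over a link of the given speed in megabits per second."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


minute_mb = pcm_bytes(60) / 1e6      # one minute of audio
song_mb = pcm_bytes(180) / 1e6       # one 3-minute song
hour_mb = pcm_bytes(3600) / 1e6      # one hour of audio
library_gb = 2000 * pcm_bytes(180) / 1e9   # 2000 songs, an assumed library size
song_load_s = download_seconds(pcm_bytes(180), 20)   # 20 Mbit/s is an assumption

print(f"1 minute:      {minute_mb:.1f} MB")     # 10.6 MB
print(f"3-minute song: {song_mb:.1f} MB")       # 31.8 MB
print(f"1 hour:        {hour_mb:.1f} MB")       # 635.0 MB
print(f"2000 songs:    {library_gb:.1f} GB")    # 63.5 GB
print(f"load one song: {song_load_s:.1f} s")    # 12.7 s
```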

VBR (variable bitrate) seems to be a common issue, as seen here and here. There are some bug reports, like this 2-year-old bug in Firefox which got marked as "works for me". There are also similar questions on Stack Overflow (see "HTML5 audio starts from the wrong position in Firefox" and "Inconsistent seeking in HTML5 Audio player"). You might have better luck with Web Audio or a library that uses it, such as SoundJS.

You also need to properly configure your server to send X-Content-Duration headers.


Posted 2016-06-14T15:37:31.210

Reputation: 121