Is there a losslessly compressed audio file?

11

1

Anyone who works with audio either as a hobby or as a professional knows about MP3 and WAV, and the advantages and disadvantages of each. Namely:

  • MP3 (most commonly MPEG-1 Audio Layer III)
  • WAV (Waveform Audio File Format)
    • disadvantage: audio data not compressed
      • relatively large file size
    • disadvantage: 4 GiB (2^32 - 1 bytes) file size limit
    • advantage: crystal clear quality

Being an audio "producer", I know to use WAV for everything. However, I'm in a dilemma. As stated above, WAV files are large. A 4 minute stereo song is about 50 megabytes. All those songs add up. In addition, WAV files don't support metadata like MP3 does as far as I've seen.
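For reference, the size figure can be sanity-checked with a few lines of Python. This is only a rough sketch: the 44.1 kHz / 16 bit stereo parameters are an assumption (the question doesn't state them), and `pcm_size_bytes` is a made-up helper name.

```python
def pcm_size_bytes(seconds, sample_rate=44100, bits_per_sample=16, channels=2):
    """Raw PCM payload size; the small WAV header is ignored."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels

# Assumed CD-quality parameters: 44.1 kHz, 16 bit, stereo.
song_bytes = pcm_size_bytes(4 * 60)
print(song_bytes / 1e6)           # megabytes for a 4 minute song

# Longest recording that still fits under the 2^32 - 1 byte limit.
bytes_per_second = pcm_size_bytes(1)
max_seconds = (2**32 - 1) // bytes_per_second
print(max_seconds / 3600)         # hours
```

Under these assumptions a 4 minute song comes to roughly 42 MB (about 64 MB at 24 bit), and the 4 GiB limit is reached after a bit under 7 hours of audio.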

I can always compress the files with 7-zip's "ultra" option, but this isn't reasonable (or feasible) for files I'm currently working with.

What I'm looking for is an audio format that has the "crystal clear" quality that WAV has with the metadata ability and small file sizes of MP3. Essentially, an audio format that has lossless compression and metadata. I would prefer it to be widely supported. Maybe just an MP3 codec that is lossless? Does anything like this exist?


audio quality depends on export or rip quality and speakers

Cole Johnson

Posted 2013-07-10T22:06:05.853

Reputation: 608

Answers

23

Just four letters: FLAC.


Some explanation / thoughts on the subject

Warning: this includes personal opinions that aren't necessarily mainstream-accepted. See AJ Henderson's answer for a somewhat more moderate view.

I'd first like to say: being pedantic, there is no such thing as a lossless audio file. Audio is an analogue phenomenon; anything digital can only ever be an approximation. So the difference between "lossless" and lossy codecs is not quite as fundamental as is always claimed. If you compare an 11.025 kHz / 8 bit mono .wav file to a 110 kbit/s 48 kHz .mp3 or .ogg by ear, the "lossy" format will clearly win quality-wise, while having almost the same size. That's because these lossy codecs omit information "cleverly", i.e. in such a way as to minimize the impact on how the audio sounds, whereas .wav just quantises time and amplitude uniformly everywhere¹.
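To put numbers on the "almost the same size" claim, here is a small sketch of the PCM bit rate arithmetic. The helper name `pcm_bitrate_kbps` is made up for illustration, and the CD-quality line is only there for comparison.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000

# The low-quality .wav from the comparison above: 11.025 kHz, 8 bit, mono.
print(pcm_bitrate_kbps(11025, 8, 1))

# For comparison only: CD-quality stereo PCM.
print(pcm_bitrate_kbps(44100, 16, 2))
```

The 11.025 kHz / 8 bit mono file comes out at 88.2 kbit/s, which is indeed in the same ballpark as a 110 kbit/s lossy file.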

The human ear has some rather simply specifiable, and some more complicated limits.

  • Any Fourier components above ca. 20 kHz in a signal are virtually inaudible. So (by the Nyquist-Shannon sampling theorem), technically perfect PAM sampling with rates above 40 kHz yields a representation of the signal that our ears cannot distinguish from the original.

  • Since our body has a finite temperature, there is necessarily some noise floor in our ears. If any noise introduced by digital quantisation stays below this threshold, it won't be audible. Effectively, the dynamic range of the human ear is no more than 140 dB (oftentimes the useful range is much lower); 24-bit integers cover a range of 144 dB. So a properly dithered 24-bit PCM version of a 48 kHz PAM signal will still be indistinguishable (by bare ear) from the analogue original signal.
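Both limits boil down to one-line formulas. A minimal sketch, assuming ideal uniform quantisation and using the 20 kHz hearing limit quoted above (the function names are just for illustration):

```python
import math

def dynamic_range_db(bits):
    """Ratio between full scale and one quantisation step, in decibels."""
    return 20 * math.log10(2 ** bits)

def nyquist_rate_hz(highest_frequency_hz):
    """Lower bound from the sampling theorem; the real rate must exceed it."""
    return 2 * highest_frequency_hz

for bits in (8, 16, 24):
    print(bits, round(dynamic_range_db(bits), 1))

print(nyquist_rate_hz(20_000))
```

Each extra bit buys roughly 6 dB, which is where the 144 dB figure for 24 bit (and the familiar 96 dB for 16 bit) comes from.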

This is why we call such .wav files (or even CD-quality ones) lossless: the losses are inaudible, so for all relevant purposes they don't exist.

Wait a moment. Is listening really the only thing you do with an audio file? Why, no, you might first pull the signal through all kinds of audio effects. Many of those (e.g. reverb or basic EQing) won't really change anything about the sampling limit and dynamic range. But this can't be said of others. Most obviously, a compressor takes in a signal with some dynamic range and outputs a signal with a lower one. If you still want the final result to have "lossless" quality, you need to make sure the intermediate file has a higher dynamic range than what you want on your ears. This is why professional recordings are nowadays never done on less-than-24-bit files, even if you ultimately need only 16 bit CD-quality. High bit depths pretty much solve the problem for everything ever used in 99% of all audio productions, but it's quite obvious that there are effects where it's not enough: consider simply a strong pitch-down effect. You'd expect this to make ultrasonic sounds audible, but this can't possibly work if the recorded file has only 44.1 kHz sampling rate. Fortunately, such stuff is seldom needed.

Back to our ears' limits – I said there are more complicated ones. Which is why I'm not going to discuss them in detail here, but they're mostly about masking effects: during silence, even very quiet noise is audible, but while there is loud music much higher levels of noise would go unnoticed. So in these parts, you can quantise much more coarsely – that's easiest done with floating-point samples. This is especially efficient when not working purely in the time domain but confining the artifacts to frequency ranges where there is also a large amount of signal. To this end, most lossy algorithms (apart from mp3 I'd like to mention Ogg Vorbis, which BTW doesn't have those patent issues) use a type of Fourier transform, usually DCT, before their main quantisation step. This can work very effectively, as demonstrated by small yet good-sounding .ogg files. In fact, continuing the logic used above: if we manage to produce such a file that humans cannot distinguish from the original in a double-blind test, why should we not call this file lossless? It has "errors" (artifacts), but again a .wav file isn't exact either.

However, artifacts in .mp3 are much more complicated than simple quantisation noise or high-frequency band cutoffs. Even when they're still inaudible to the ear, they may have a much more significant effect when fed into further processing, like recompression to another format or audio effects. They're just not designed for that. OTOH, bandlimiting and quantisation noise are very well understood and can be modelled with mathematical exactness; you can transform between different sample rates without generation loss, you can re-dither to other bit depths etc. This makes it acceptable to call lossless formats thus, even though really they aren't lossless. And it's why they are the right choice during audio processing².

⟨/rant⟩

So how do formats like .flac fit into all this? Lossless means again to keep the benefits of plain PCM like .wav, i.e. not allowing bad-to-handle artifacts. In fact it means you are able to convert a .flac file back to a .wav that will be identical to the original source .wav.
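The round-trip property is easy to demonstrate. The sketch below does not use FLAC at all: it uses Python's general-purpose `zlib` compressor as a stand-in, and the 440 Hz test tone and 44.1 kHz rate are arbitrary assumptions. The point is only that "lossless" means the decoded bytes are identical to the input.

```python
import math
import struct
import zlib

SAMPLE_RATE = 44100  # assumed for illustration

# One second of a 440 Hz sine as 16-bit signed little-endian PCM.
samples = [round(20000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
           for n in range(SAMPLE_RATE)]
pcm = struct.pack("<%dh" % len(samples), *samples)

packed = zlib.compress(pcm, 9)      # generic lossless compressor, NOT FLAC
restored = zlib.decompress(packed)

print(len(pcm), len(packed))        # original vs compressed size in bytes
print(restored == pcm)              # True: bit-for-bit identical
```

A real FLAC encoder would normally do better on music than a generic compressor, because it is built around the structure of audio signals, but the guarantee is the same: what comes out is exactly what went in.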

The question is, do we have to live with the large size of PCM files?
If you think about it, there must be physical reasons why our ears are so insensitive to e.g. DCT-quantisation artifacts. From one point of view, it appears that they mechanically aren't able to properly follow those. From another, you might also say that real-world sound happens to be of a kind that de-emphasises such artifacts. These arguments are in fact interchangeable, as any sound transmitter is also a receiver and an instrument's body is in a way analogous to the eardrum (more literally, a microphone is equivalent to a small loudspeaker). So it should be possible to make up a compression scheme the other way around: instead of thinking "how do we leave out information so the difference won't be noticeable?" we go "where do we expect to find no information in the first place?" Because then we can use entropy coding, which basically just shuffles data around in a clever way, such that information that was already expected takes up little space while only (seldom-occurring) unexpected information is fully expanded. When we then feed such an algorithm with an audio file that conforms with our expectations (and thus with our ears', which have pretty good experience in those matters), the compressed version will be significantly smaller yet digitally equivalent to the .wav version. And this does in fact work quite well.
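Here is a toy version of the "store only the unexpected part" idea. It is a sketch under stated assumptions, not how any particular codec is implemented: the predictor simply extrapolates a straight line through the last two samples, and `encode` / `decode` are hypothetical helper names. Real lossless codecs use more elaborate predictors and follow up with an actual entropy coder for the residual.

```python
import math

SAMPLE_RATE = 44100  # assumed for illustration
signal = [round(20000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
          for n in range(1000)]

def encode(x):
    """Keep two warm-up samples, then store only the prediction error."""
    residual = [x[n] - (2 * x[n - 1] - x[n - 2]) for n in range(2, len(x))]
    return x[:2], residual

def decode(warmup, residual):
    """Re-run the same predictor and add the stored errors back."""
    x = list(warmup)
    for e in residual:
        x.append(2 * x[-1] - x[-2] + e)
    return x

warmup, residual = encode(signal)
assert decode(warmup, residual) == signal   # exact reconstruction

print(max(abs(s) for s in signal))      # samples span most of the 16-bit range
print(max(abs(e) for e in residual))    # prediction errors are far smaller
```

Because the signal is smooth, the prediction errors stay within a few dozen units instead of tens of thousands, so they can be written with far fewer bits per sample, and nothing is lost along the way.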


¹ Actually, there are several rather crude "better-quality" compression schemes that .wav can use. They are much more primitive than the DCT-based algorithms that MPEG or Vorbis employ.

² (this isn't so much a footnote as a conclusion) I should still add that there's another consideration for audio processing: unlike a simple music player, a DAW needs to access many audio files simultaneously, very quickly, and possibly non-sequentially. This is very easy with a format like .wav, which is encoded completely in the time domain: want to enter at 2:35? Why, just add the offset 155*samplerate*bytespersample to the file position and start playing from there. By contrast, both lossy and lossless compressed file formats have the data somewhat disordered by the various transformations. You usually still have some kind of time-chunks that you can each access directly, but you can't quite as easily just start playing at some arbitrary point – there's always some pre-load, latency, memory & processing overhead and whatnot. So even though .flac may seem the perfect alternative, I actually don't use it that much: disk space is cheap nowadays, so .wav is rather fine. If I need 20 GB for a song I'm working on for 20 hours, the storage cost is pretty much negligible: I can just get a new external HDD when I need it. So there's not much reason to switch to .flac, and if it caused latency issues during an important recording session, I'd have quite a problem. So that's not going to happen.
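The offset arithmetic can be tried out with Python's standard `wave` module. This is just a sketch: the tiny in-memory test file, the 8 kHz rate and the helper name `byte_offset` are assumptions for the demo, and "bytes per sample" is taken to mean bytes per sample per channel, so the channel count is multiplied in as well.

```python
import io
import struct
import wave

def byte_offset(seconds, sample_rate, bytes_per_sample, channels):
    """Offset into the PCM data (after the header) for a given time."""
    return int(seconds * sample_rate) * bytes_per_sample * channels

# Entering at 2:35 (155 s) in a 44.1 kHz, 16 bit stereo file:
print(byte_offset(155, 44100, 2, 2))

# Demo file: 3 seconds of mono 16-bit audio at an assumed 8 kHz, where every
# sample stores its own index so we can check where a seek lands.
RATE = 8000
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(RATE)
    w.writeframes(struct.pack("<%dh" % (3 * RATE), *range(3 * RATE)))

buf.seek(0)
with wave.open(buf, "rb") as r:
    target = int(2.5 * r.getframerate())   # frame index at 2.5 seconds
    r.setpos(target)                       # direct jump, nothing to decode
    (value,) = struct.unpack("<h", r.readframes(1))

print(target, value)
```

The sample read back after the seek equals the frame index it was written with, i.e. the jump needs no decoding of anything that came before it.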

With finished mastered song files it's rather different: those are just going to be fed into your ears, nothing else. So here the crucial thing is subjectively lossless again: lossless to the ear. Now, I positively dislike the sound of so-called "CD-quality" 128 kBit/s .mp3 files, but for >200 kBit/s, done with a good encoder like lame (with the more maturely designed Vorbis or e.g. .aac you can go even lower) you won't be able to tell the difference. You just won't. So it ******* is lossless, and these files will be quite a lot smaller than the tightest that "lossless codecs" can do.

leftaroundabout

Posted 2013-07-10T22:06:05.853

Reputation: 5 941

Nice one. How does Flac compare to .wav? – filzilla – 2013-07-10T23:23:45.833

I had to knock off my upvote because a format is lossless not because it captures everything our ear can hear, but because the compression does not degrade the sampled waveform. Increases in sampling frequency and bit depth both produce noticeable improvements in audio, so under your definition of lossless, no 48 kHz, 16 bit audio would be lossless. Sampling is always lossy; lossless formats simply don't lose any additional information. – AJ Henderson – 2013-07-11T04:43:49.613

Also, 320 kbit/s is enough for most hardware, but if you have really good gear and a trained ear, the difference is still obvious, though not as obvious as the loss from live sound from a mixer vs a 48 kHz 16 bit recording. Unless you've spent years analysing sound though, you are correct, very few people actually hear the difference even when listening for it. – AJ Henderson – 2013-07-11T04:47:45.677

@AJHenderson: "Sampling is always lossy..." that was pretty much the whole point of my rant, wasn't it? The reason I don't really agree with calling .wav or .flac any more lossless than Vorbis or .mp3 is that there's nothing intrinsically special about PCM sampling – it is not even necessarily what AD converters use internally. The only meaningful definition of lossless is artifacts cannot be proven to exist under the targeted hearing conditions. Which is fulfilled by high-quality .wav as well as by high-quality .ogg files. – leftaroundabout – 2013-07-11T11:30:22.150

Increases in sampling frequency or bit depth above 48 kHz / 24 bit do not produce noticeable improvements in audio – not as a final format. What's true is 1. A/D converters operating at higher specs will produce a better representation of the analog signal, because they can use more gentle cutoff filters and dithering; but if you later downsample the result to 48 kHz using the correct sinc algorithms it won't sound discernibly different. 2. as I said, if there's further processing involved, previously inaudible artifacts of either format may become obvious. – leftaroundabout – 2013-07-11T11:30:38.523

As for 320 kBit/s etc. – well: for .mp3 it depends heavily on the encoder used, because the standard only specifies the decoding. There definitely are bad old encoders around that won't sound good regardless how high you set the bit rate. Plus, mp3 itself is just technically inferior to newer standards. But with up-to-date lossy compression, you can get results nobody can tell from the exact analogue original. As I said, my ears have the threshold around 200 kBit/s. Surely there are people with better ears, but I would bet you cannot double-blind-spot a properly done 320 kBit/s file. – leftaroundabout – 2013-07-11T11:40:23.927

@leftaroundabout - I guess we'll just have to agree to disagree on what is discernible then. I've done blind tests of higher sample rates of PCM audio and have been able to pick out the higher quality sample. As the quality goes up, the amount of increase in quality needed to detect it goes up too, as the differences become increasingly small, but that doesn't mean that there aren't artifacts present that can be detected, particularly when played against a significantly higher quality signal. It also depends on how much is going on in the signal though. A band is easier than a speaker to tell. – AJ Henderson – 2013-07-11T13:57:11.157

That all said, I still love the post other than referring to the definition of lossless as anything other than an exact representation of the signal presented to it. There is a technically correct and established definition of lossless. It is a format that stores the input fed to it exactly. (The sampling occurs prior to storage.) Use of the term lossless for anything other than that is technically incorrect and prevents me from being able to upvote. If you can refine that, I will upvote again even if I disagree on detecting differences since practically, it's correct for 99% of the people. – AJ Henderson – 2013-07-11T14:02:24.367

I love this post, but AJ is right - for flac the meaning of lossless is precise. It is equivalent to e.g. WinZip being able to reproduce exactly the files compressed within it, despite the size of the zip file being much smaller than the original. The PCM file will be exactly reproduced, at bit level. – Rory Alsop – 2013-07-11T17:28:39.607

@AJHenderson, DrMayhem: first off, I didn't "refer to a definition of lossless other than..."; I referred to the existing audio formats, and what their benefits and problems are. I just made the point why I disagree with calling some of them lossless. Now you say, lossless has an established definition, but it doesn't. This is a definition for lossless compression of digital data. It can thus apply perfectly to e.g. text files, but not to audio because that's an analogue thing. Yes, you can reconstruct the PCM file from .flac, but that is itself just one particular lossy representation. – leftaroundabout – 2013-07-11T23:29:44.717

@leftaroundabout - you are confusing pressure waves with sampled audio. Compression never works with the actual pressure wave (well other than the fact they are compression waves :) ) An audio file is a means of storing sampled data, just like storing a text file and the same definitions apply in the Audio/Visual industry. Lossless and lossy compression in the A/V field have the exact same meaning as in the data field and that is very well established. It wouldn't take much alteration to correct for this. If you would like, I could make the alterations and you can revert if you don't like – AJ Henderson – 2013-07-11T23:44:29.860

@leftaroundabout - I decided to go ahead and make my suggested changes, feel free to revert if you don't like them. – AJ Henderson – 2013-07-12T00:08:19.473

@AJHenderson: I appreciate your effort to make this as good an answer as possible, but I'm afraid I just can't agree with your point of view and proposed changes. "Compression never works with the actual pressure wave" – yes it does. In fact there exist purely analogue data compression schemes; good old Dolby is an example (and I'm not confusing dynamic compression with data compression: the former is one tool to achieve the latter here).

– leftaroundabout – 2013-07-12T00:57:19.143

All widespread audio formats happen to be based on PCM sampled representations, but that's just by convention. Alternatives exist, such as DSD. — While you're mentioning Visual industry, there it's much more obvious: raster graphics predominate in many applications, but vector graphics are also very important. Not really useful for typical photographic data, but quite valuable for e.g. astronomical data where you can't possibly rasterise the whole sky with enough precision to catch double-star system details.

– leftaroundabout – 2013-07-12T01:02:05.820

Downvote this answer if you consider it so misleading, but as it stands it is my opinion. Why not add more detailed discussion of the less controversial alternative view to your answer? – leftaroundabout – 2013-07-12T01:05:36.727

@leftaroundabout - yeah, I did that too. I just really like most of your answer. Hope you didn't take offense to the edits, wish there was a way to do suggest edits before they are actually made live. Would have used that otherwise. – AJ Henderson – 2013-07-12T15:05:00.733

@leftaroundabout - I'm not sure I follow what you are referring to about vector vs raster; that has nothing to do with compression. Those are different image formats. I was more talking about the fact that the idea of converting it into a format different from what we normally process is more obvious with video and images. When we hear sound, it's very close to what we hear live, but when we see something, it is obvious it is different after being captured. This makes it easier to understand the difference between sampling (capture) and storage. Compression only applies to storage. – AJ Henderson – 2013-07-12T15:11:58.937

Also, your example of the noise reduction is not data compression, it is dynamic compression. In an analog system, the data available wouldn't be reduced as long as the range of possible values is sufficient. For example, if I have the integer values 1 through 10 and I then dynamic compress them to half their value, I get .5,1,1.5, etc. The data isn't actually compressed, it simply requires higher resolution to store. When I expand, I get the original values, but I'm able to keep away from the noise floor. If the resolution was too low it would be a lossy compression though I guess. – AJ Henderson – 2013-07-12T15:14:50.323

Though the same could be said of any quality loss then. It also begs the question of if the headroom is still data, in which case it wouldn't be compression (data). I don't think your answer deserves a downvote though, particularly with the disclaimer added. I'll just remove my upvote again and leave it at that. Thanks for the discussion. – AJ Henderson – 2013-07-12T15:15:17.620

@AJHenderson: can we settle this discussion at some point? Oh well... — What else has vector vs raster to do with, if not compression? Both are fundamentally different approaches of reducing the analogue, and thus infinite, space ℝ² -> RGB to the countable space of digital files. — Again, Dolby is data compression: due to noisy channel capacity, magnetic tape can only transmit a finite rate of information. Recording on tape is thus always lossy compression. You can do it cleverly, or stupidly (e.g. not properly setting up gain).

– leftaroundabout – 2013-07-13T00:45:17.730

@leftaroundabout - I'm not sure how I can make clear to you the difference between data compression and sampling error. They are fundamentally different. One is an incomplete but accurate recording of a given state, the other is an extrapolation which discards information that doesn't fit nicely into a simplified model. Raster is not a form of compression it is a storage format for pixel data. Taking a photo is not compression of a real scene, it is a sampling. Altering parts of that sample such that they no longer match with an accurate sampling model to save space is compression. – AJ Henderson – 2013-07-13T04:26:32.937

And vector has nothing to do with compression at all as it is simply a mathematical model of a shape in a way that defines the data. A vectorization of a raster image would be compression however as it necessitates discarding data that doesn't fit the desired simplification, but vector formats are not themselves a form of compression, the raster to vector conversion is. This is all well established in the field of data compression. – AJ Henderson – 2013-07-13T04:27:56.347

@AJHenderson sampling is insofar different from other techniques as the mathematics of its simplified model (a countable-basis subspace of the L²(ℝⁿ) Hilbert space) are very well understood. So you can be sure about a lot of well-behaved properties of its extrapolation back to , but an extrapolation it is nevertheless. — "Vector...is simply a mathematical model...that defines the data" Aha! Then what is sampling? – leftaroundabout – 2013-07-13T09:31:35.013

let us continue this discussion in chat

– leftaroundabout – 2013-07-13T09:52:51.987

@leftaroundabout - sampling is just what the word means. It is taking samples and storing the values. It is some mechanism of choosing portions of data to capture for converting between formats. This differs from data compression which looks at all of the data and attempts to model the data more simply (either through statistical redundancy, lossless, or through discarding data that is either not useful or that is least significant in fitting a simplified model, lossy) – AJ Henderson – 2013-07-13T18:31:49.697

@AJHenderson no, that is outright wrong! If sampling really just "took samples", i.e. recorded "snapshots" of a signal, most digital recordings would have very audible and nasty aliasing artifacts in them. What it actually does is exactly what you say of data compression: it models the data more simply, namely by a sequence of samples such that a superposition of sinc peaks around each sample with its recorded amplitude is as close as possible (WRT the norm) to the original signal.

– leftaroundabout – 2013-07-13T18:54:59.740

As I said, this is most naturally described as a subspace projection in the frequency domain (and then it's indeed just "choosing portions of data"), but as the necessary Fourier transform is infeasible to do with analogue means this is not how AD converters actually work. Instead, the sinc-filter being self-adjoint, you can just apply it to the signal and then take, in fact, snapshots of it. This is a fortunately easy process, but it's still not in any way "the one and only way" to digitalise an analogue signal. – leftaroundabout – 2013-07-13T18:58:06.597

@leftaroundabout - I don't know a whole lot about how audio waveforms are produced compared to what I know about video, so I'll take your word on that, but in that case, it is signal processing being done on the sample. By definition, sampling is the act of taking samples. If further processing is done to refine those samples to weed out artifacts from the samples, it is signal processing, if information from the sampling is being discarded to reduce the complexity of the signal or to make it take less space, it is compression.

– AJ Henderson – 2013-07-14T07:56:21.813

@RoryAlsop in the question they specifically mention availability of metadata as an advantage/disadvantage, my answer addressed that point so I think you were incorrect to delete my answer. – Paul Taylor – 2016-12-26T23:25:27.087

Paul - your answer missed off everything else. It was not an answer to the question "is there a losslessly compressed audio file" – Rory Alsop – 2016-12-26T23:29:35.323

11

FLAC (Free Lossless Audio Codec) is a non-patent-encumbered audio codec that utilizes lossless compression to store the audio. There are many other lossless options that support compression, but FLAC is more or less the de facto standard. Since it is lossless, the waveform from it will exactly match an uncompressed wav; however, it looks for patterns in the audio that can be described exactly, thus finding some space savings. This is basically how all compression works; lossy compression just doesn't mind if it can't make a perfect fit, whereas lossless means that only an exact match will do.

It is worth noting that lossless files only mean that they store the exact sampled values. All digital audio sampling is an approximation of the analog audio source (unless it is natively produced electronic sounds.) Lossless files ensure you don't get any artifacts from compression though they require a lot of extra space for fairly minimal quality gain for the average listener.

On average gear, the difference between sufficiently high quality lossy and lossless files is going to be almost impossible to tell even with a trained ear. On high end audio gear, with a trained ear, it can be possible to recognize the differences mostly by picking up on artifacts in the sound that alter it subtly, but the average person won't typically notice the difference as long as it is a good quality level. (320kbps for example is more than sufficient data rate for probably 99.9% of the population to not recognize the difference between lossy and lossless.)

AJ Henderson

Posted 2013-07-10T22:06:05.853

Reputation: 7 961

3

FLAC being the most popular one, there is a comprehensive list of lossless compression formats on Wikipedia: http://en.wikipedia.org/wiki/Lossless_compression#Audio

Compressed files have to be decoded by the CPU before being used. This is not preferable in professional editing, as CPU time is a more valuable and expensive resource than storage space. Since WAV files are uncompressed they are quite CPU friendly.

Conversely, end users don't push their CPUs to the limits for audio playback, so storage space is a more valuable resource for them.

As a result, uncompressed formats are preferred by audio professionals, lossless formats are preferred by end users.

Guney Ozsan

Posted 2013-07-10T22:06:05.853

Reputation: 724

-3

Yes, use Audacity and convert your .WAV files to 256 kbps .OGG (Vorbis) files. Set up Audacity to use the best quality settings throughout, and your .OGG files will have a little better quality than the .WAV files.

Unruly Godfrey

Posted 2013-07-10T22:06:05.853

Reputation: 1

No, no they won't. They could be a very exact copy of whatever problems the WAV file had, but they WILL discard information and they certainly won't improve things at all. It will just limit the amount of information loss. – AJ Henderson – 2018-02-19T06:23:06.340