Converting audio recording to numeric values representing how loud the audio is at a given time


Warning: Complete audio beginner

Background: I have audio recordings that were recorded through a mobile device using an app. I need to write a program that takes the audio and finds periods of increasing volume (don't know if that's the correct term) followed by a sudden drop in volume.

Want: Ideally, I'd like to take the audio file and convert it to text containing numeric values that represent the loudness (volume) at one-second intervals. So if the audio is one hour long, I'd have 3600 data points.

What I've Tried: I opened the audio file in Audacity. In the menu bar, I clicked Analyze --> Sample Data Export, and it gave me values between -1 and +1.

Problem: I don't know whether each data point refers to one second of the audio recording, and I don't want negative values. There are also options in Audacity that I'm not sure how to set, such as project rate (Hz), sample format (16-bit PCM, 24-bit PCM, or 32-bit float), and how many samples I want.

Thank you for your help!


Posted 2015-08-20T22:05:11.157

Reputation: 111

Are you looking for the A) instantaneous value at a sample point every second or B) the peak value during the previous second, or C) the average or integrated value over the previous second or D) something else? – Jim Mack – 2015-08-21T17:08:25.843

Hi @JimMack, I'd be looking for the instantaneous value at a sample point for every second. – itsSLO – 2015-08-21T20:32:24.663

Everyone here seems to be talking about level metering rather than loudness metering - that's a different thing, and much more difficult to calculate. We should change the title if that's what is actually intended. – Mark Durham – 2015-08-22T19:55:35.230

@MarkDurham What's the difference? – itsSLO – 2015-08-24T22:06:01.870

Because loudness is something we perceive, it is difficult to measure. Have a look at the Wikipedia page for loudness. Essentially, frequency content and duration heavily affect how loud something appears, so your numeric values will not represent how loud the sound is, just what level it is. That could be what you want; it depends on the application, which you don't explain.

– Mark Durham – 2015-08-25T15:57:16.980



You would be better off with a program like MATLAB or its open-source equivalent, Octave.

Open your mono-channel audio data as a vector (using wavread, for instance, or audioread in newer versions) and square each sample value. Then compute the mean over every N samples, with N chosen according to the smoothness or time precision you need.
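For readers without MATLAB or Octave, the square-then-average step could look roughly like this in plain Python. This is only a sketch: the function name `mean_square_blocks` and the toy sample values are made up for illustration, and it assumes the audio has already been loaded as a list of floats between -1 and +1.

```python
def mean_square_blocks(samples, block_size):
    """Return the mean of the squared samples for each full block of block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    levels = []
    # Step through the signal one whole block at a time; a trailing
    # partial block is ignored.
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        levels.append(sum(x * x for x in block) / block_size)
    return levels


# Toy signal (hypothetical values): a quiet block followed by a louder one.
samples = [0.1, -0.1, 0.1, -0.1, 0.5, -0.5, 0.5, -0.5]
levels = mean_square_blocks(samples, block_size=4)
print(levels)
```

For one value per second, `block_size` would be the file's sample rate. The results are never negative, because every sample is squared before averaging.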


Posted 2015-08-20T22:05:11.157

Reputation: 316

Thank you for your response, what's the advantage to squaring it vs. taking the absolute value? – itsSLO – 2015-08-24T21:28:36.127

It is because you wanted to represent "how loud the audio is" and, unless I am mistaken, perceived loudness is proportional to the square of the amplitude (you should be able to find more precise definitions and terminology in the acoustics and psychoacoustics literature, especially about intensity, loudness, logarithmic scale, sound pressure level, etc.). – maxime.bochon – 2015-08-25T06:59:13.070


Since there are typically 16k-48k samples per second, picking one at random wouldn't tell you anything about the trend in level. You'd have a 1 in 16000 chance of getting a representative value. For the value for "one second" to mean anything, you'd have to integrate over the duration.

You're looking for an envelope of the signal. Probably the most useful measurement would be the running average of one (or more) second's worth of samples. You might think of that as a trend line (as in stock prices etc). Sampling the trend line every second would give you what you need.

Since you have values of -1 to +1, you'd take the absolute value or magnitude, since -1 has the same 'volume' as +1 for your purposes.
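A rough Python sketch of that idea, under some assumptions: it averages the absolute sample values over non-overlapping one-second windows (a simplification of a true running average sampled every second), and the test tone, the 8000 samples/s rate, and the name `mean_abs_per_second` are all invented for the example.

```python
import math


def mean_abs_per_second(samples, sample_rate):
    """Average the absolute sample values over each full one-second window."""
    levels = []
    for start in range(0, len(samples) - sample_rate + 1, sample_rate):
        window = samples[start:start + sample_rate]
        levels.append(sum(abs(x) for x in window) / sample_rate)
    return levels


# Synthetic signal (hypothetical): a 100 Hz tone whose amplitude rises over
# three seconds and then drops sharply in the fourth.
sample_rate = 8000
amplitudes = [0.2, 0.5, 0.9, 0.1]
samples = [
    amp * math.sin(2 * math.pi * 100 * n / sample_rate)
    for amp in amplitudes
    for n in range(sample_rate)
]

envelope = mean_abs_per_second(samples, sample_rate)
print([round(v, 3) for v in envelope])
```

Each entry in `envelope` is one data point per second, so an hour of audio would give 3600 values. A rise followed by a sudden drop then shows up as several increasing values followed by a much smaller one.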

Jim Mack

Posted 2015-08-20T22:05:11.157

Reputation: 1 661

The running average of samples within a second is always more or less exactly zero (unless there's some DC bias problem). You probably mean the running average of the squared samples' square root.

– leftaroundabout – 2015-08-21T22:02:34.767
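A small illustration of that point, as a sketch only: one second of a sine tone is generated (the 440 Hz frequency, 0.5 amplitude, and 8000 samples/s rate are arbitrary choices for the example), and its plain mean is compared with the square root of the mean of the squared samples (RMS).

```python
import math

sample_rate = 8000  # assumed rate for this illustration
# One second of a 440 Hz sine wave with amplitude 0.5.
samples = [
    0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]

# Plain average: positive and negative half-cycles cancel out.
plain_mean = sum(samples) / len(samples)

# RMS: square first, then average, then take the square root.
rms = math.sqrt(sum(x * x for x in samples) / len(samples))

print(plain_mean, rms)
```

The plain mean comes out at essentially zero, while the RMS is close to 0.5 divided by the square root of 2, which is the usual RMS of a sine wave of that amplitude.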

How will I know how many samples are in one second? In Audacity, I currently have my project rate at 41 kHz and 41k samples. – itsSLO – 2015-08-21T23:05:09.573

@leftaroundabout - That's why I said he had to deal with magnitude (absolute value). Squaring each sample accomplishes that too but is more costly. Taking the sqrt of the average of the squared values (not exactly what you said) is a good technique but for his purposes is probably overkill. – Jim Mack – 2015-08-22T01:05:49.743

@itsSLO The file metadata will tell you what the sample rate is. If you load the file into Audacity it will read and display that, or you can use a tool like MediaInfo to discover it. – Jim Mack – 2015-08-22T01:10:56.917
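If a script is preferred over Audacity or MediaInfo, Python's standard `wave` module can read the same metadata, though only from uncompressed PCM WAV files. A recording from a phone app may need converting to WAV first. In the sketch below, the helper name `describe_wav` and the demo file are made up; the demo file is written to a temporary folder just so the example runs on its own.

```python
import os
import struct
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate, n_frames, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()   # samples per second, per channel
        frames = wav.getnframes()   # total number of sample frames
    return rate, frames, frames / rate


# Demo file (hypothetical): two seconds of 16-bit mono silence at 44100 Hz.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    wav.setframerate(44100)
    wav.writeframes(struct.pack("<h", 0) * (2 * 44100))

rate, frames, duration = describe_wav(demo_path)
print(rate, frames, duration)
```

The sample rate is the number of samples in one second, so it is also the window size to use when computing one level value per second.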

@JimMack: well, you may want to rephrase that a bit, it's not evident what you mean there. — Squaring each sample is not significantly more expensive than taking the absolute value: both are a single cycle on any modern processor. Indeed taking the square root is a bit costly (though bit-fiddling tricks can help), but since you don't need to do that for every sample it doesn't really matter.

– leftaroundabout – 2015-08-22T20:26:37.877

The reason that RMS level is preferred (yes, it is!) to abs level or peak level is that it matches the actual perceived loudness most closely. This is because RMS is the natural norm of the Hilbert space in which the ear performs a sort of Fourier transform; the other norms aren't preserved under phase rotations in this space (which our ears cannot sense, or at least barely).

– leftaroundabout – 2015-08-22T20:26:59.220