How to get the number of frames (or samples) per second or millisecond in an audio (.wav or .mp3) file?


Some of the terms are a bit technical, but please bear with me.

I've been examining an audio file with a Python module.

The audio has a frame rate (sample rate) of 44100 Hz, 9745238 total frames, and 2 channels. The file properties report a duration of 220 seconds, whereas it should be 220.9804535147392 seconds.

After reading the file, the module returned a 9745238 x 2 matrix of 16-bit signed integers, as expected,

where column 1 is the channel 1 data and column 2 is the channel 2 data for each of the 9745238 frames.

So my question is: is there a robust method to find these values (for channels 1 and 2 together, i.e. each row of the matrix) per second or millisecond? When I multiplied the frame rate (44100) by the duration (220 seconds) of my audio file, I expected to get the total number of frames/samples, but it returned 9702000, whereas there are 9745238 in total. So how can I get the exact values (rows) per second?

Any guesses?

Edit 1

I've referred to a good discussion here

and I guess all I need is the bit rate, which is bitrate = sampleRate * bitDepth. But how can I get the bit depth? Is it the sample size / sample width, or something else?


The bit depth in your case is 16. It refers to how large each sample is, so if you have a 16 bit value for each sample, that is your bit depth.
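For a WAV file, one way to read these values is Python's standard-library `wave` module, where the bit depth is the sample width (in bytes) times 8. This is only a minimal sketch, assuming uncompressed 16-bit PCM; the in-memory file `buf` is a made-up stand-in for the real file so that the example runs on its own.

```python
import io
import wave

# Build a tiny stand-in WAV in memory (hypothetical data: 100 silent frames).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)       # stereo
    w.setsampwidth(2)       # 2 bytes per sample = 16 bits
    w.setframerate(44100)
    w.writeframes(b"\x00" * 4 * 100)

# Read the header fields back, as you would for a real file.
buf.seek(0)
with wave.open(buf, "rb") as r:
    channels = r.getnchannels()
    bit_depth = r.getsampwidth() * 8   # getsampwidth() returns bytes
    sample_rate = r.getframerate()
    total_frames = r.getnframes()
    first_frame = r.readframes(1)      # one sample for each channel

print(channels, bit_depth, sample_rate, total_frames, len(first_frame))
```

With a real file you would pass its path to `wave.open` instead of `buf`. Note that one frame read this way is 4 bytes long: 2 channels times 2 bytes (16 bits) per sample.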

Bit rate is a measure of data per second (as I think you know), and hence for uncompressed PCM audio it is sample rate (44100) * bit depth (16) * channels (2) = 1,411,200 bits per second.
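The arithmetic for this file can be sketched in a few lines. The function name `pcm_bit_rate` is just an illustrative helper, not part of any library.

```python
def pcm_bit_rate(sample_rate, bit_depth, channels):
    """Bits per second for uncompressed PCM audio."""
    return sample_rate * bit_depth * channels

bit_rate = pcm_bit_rate(44100, 16, 2)
byte_rate = bit_rate // 8   # bytes of audio data per second

print(bit_rate)   # 1411200
print(byte_rate)  # 176400
```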

I hope this is all the information you're looking for. You already know the exact file duration in samples. (Every measure in terms of seconds/milliseconds will usually have a rounding error, and so when you're doing computations on audio it's definitely useful to know the duration in samples.) It seems you know every pertinent stat.
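To illustrate working in samples rather than seconds, here is a rough sketch using NumPy. It assumes the audio is already loaded as a frames x channels array like the one in the question; `data` is a silent stand-in of the same shape, and `rows_between` is a hypothetical helper name.

```python
import numpy as np

SAMPLE_RATE = 44100
TOTAL_FRAMES = 9745238

# Stand-in for the real audio: silent 16-bit stereo with the same shape.
data = np.zeros((TOTAL_FRAMES, 2), dtype=np.int16)

# Exact duration, plus the split into whole seconds and leftover frames.
duration_s = TOTAL_FRAMES / SAMPLE_RATE
full_seconds, leftover_frames = divmod(TOTAL_FRAMES, SAMPLE_RATE)

def rows_between(data, sample_rate, start_s, end_s):
    """Return the rows covering [start_s, end_s), with times in seconds."""
    start = round(start_s * sample_rate)
    end = round(end_s * sample_rate)
    return data[start:end]

first_second = rows_between(data, SAMPLE_RATE, 0, 1)      # 44100 rows
last_partial = rows_between(data, SAMPLE_RATE, 220, 221)  # only 43238 rows
ten_ms = rows_between(data, SAMPLE_RATE, 0.0, 0.010)      # 441 rows

print(duration_s, full_seconds, leftover_frames)
```

The 43238 leftover frames are the difference between 9745238 and 220 * 44100 = 9702000: the file properties simply drop the fractional part of the duration. At 44100 Hz one millisecond is 44.1 rows, which is not a whole number, so the sketch rounds each boundary to the nearest row.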

PS. For the record, frame and sample are not equivalent. In audio programming, you might come across the term frame to denote a number of contiguous samples that are processed in one go: it's also known as a block or vector. Processing a bunch of samples together reduces certain overheads. You'll see this in the settings of some audio software, where you can change the block size.

Thanks for the answer. When I printed the first frame's data using a different Python module to check the bits, it returned \x00\x00\x00\x00, as you said 16 bits (8 bits for each channel), which converted to 16-bit integer values gave me the vector [0 0], as expected. But then in what sense are a sample and a frame not equal? Is one sample equivalent to a row in the matrix (a vector or a block as you said, e.g. [0 0] for the first sample), and isn't one frame the same thing? And what are overheads? – P.hunter – 2017-12-19T06:00:45.940

Thanks, I figured it out, and I mentioned your answer on Stack Overflow as a reference for other people.

– P.hunter – 2017-12-19T09:14:10.827

I'm not sure what those hex values mean, but glad to hear you've got it figured out. Bear in mind that it's unlikely the samples are 8-bit, as that is extremely low quality (think vintage console games, 8-bit music, etc.). As for frames, perhaps terminology differs. On reflection, it could be that in this context you can use frame to refer to a row in the matrix, i.e. the samples for all channels at a point in time. But for instance when writing a plugin in C++... – Igid – 2017-12-19T15:05:04.407

...you will pull a frame of audio from the host (the environment in which the plugin is running), which will be an array of N samples for each channel, where N is the block size. You will process that chunk of audio, and push your modified frame of audio back to the host. I don't know in-depth about what overheads are involved, other than that other lower-priority threads will only be run in between two frames being processed. – Igid – 2017-12-19T15:07:52.610
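The block-based processing described in the comment above could look roughly like this in Python. This is only an illustration of the idea, not how a real plugin host works; the names `iter_blocks` and `halve_gain` and the block size of 256 are made up for the example.

```python
import numpy as np

def iter_blocks(data, block_size):
    """Yield consecutive chunks of up to block_size rows (all channels)."""
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]

def halve_gain(block):
    """Placeholder processing step: reduce the amplitude by half."""
    return block // 2

# Hypothetical input: 1000 rows of 16-bit stereo, every sample equal to 100.
data = np.full((1000, 2), 100, dtype=np.int16)

block_lengths = [len(b) for b in iter_blocks(data, 256)]
processed = np.concatenate([halve_gain(b) for b in iter_blocks(data, 256)])

print(block_lengths)  # [256, 256, 256, 232]
```

Each block here holds N rows, where a row is one sample per channel, so in this usage a block (or "frame" in the plugin sense) is a group of many rows rather than a single one.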

Could it be because the file was initially an .mp3 that I later converted into a .wav file? When I played the .wav file it sounded the same to me as the mp3 one, so maybe that is the cause. And by block, do you mean a number of frames?

So, I read this discussion, and an answer by a user named hatpaw2 made me think that we are discussing only the first frame (the first 4 bytes of the data), which falls within the first 44 bytes. The answer said that the OS additionally adds some RIFF headers when converting (about 44 bytes from the start). I did the conversion on Windows; do you think that could be the cause?

– P.hunter – 2017-12-19T16:12:07.483

Now I'm pretty well in over my head. These are probably questions more for Stack Overflow. Good luck! – Igid – 2017-12-20T00:15:02.023