MP3 wave graph

Geoff_Gilmour-Taylor · January 26, 2009, 7:47pm

From: wxpython-users-bounces@lists.wxwidgets.org [mailto:wxpython-users-bounces@lists.wxwidgets.org] On Behalf Of Lucas Boppre Niehues
Sent: January 26, 2009 13:33
To: wxpython-users@lists.wxwidgets.org
Subject: [wxpython-users] MP3 wave graph

Hello

I'm looking for a way to display an MP3 file's waveform. It's for
an audio lessons player, so the main purpose is to let the
user see where the sounds and pauses are, so he or she can
fast-forward to the next chapter or rewind to the previous one.

[snip]

Question: is there any package / library / function that can
extract a waveform-like graph from an MP3 file?
It doesn't have to create an image file like the given
example; just the graph points are enough.

I don't think there is any way of getting the waveform from an MP3 *without* decoding it. You could try decoding it a piece at a time, as needed. I do believe it's possible to extract the volume level from each frame without fully decoding (I have some closed-source freeware that displays MP3s like this) but I don't know of a library that does this. That might be enough, though.

If you are the one producing the audio, you might want to think about splitting the MP3 into separate files, one per chapter, or using a navigable digital talking book format such as DAISY (www.daisy.org). (Disclosure: I work for a DAISY producer. The bulk of my team's work is finding chapters in books to mark them up as navigable points in DAISY DTBs. DAISY is primarily aimed at blind, dyslexic, and otherwise print-disabled people.)

Geoff

Geoffrey Gilmour-Taylor
Business Analyst and Supervisor, Audio Conversion Unit
CNIB
1929 Bayview Ave.
Toronto, ON
M4G 3E8

T: (416) 486-2500, ext. 7555
F: (416) 480-7700

CNIB: Vision health. Vision hope. Visit www.cnib.ca

Lucas_Boppre_Niehues · January 26, 2009, 8:19pm

Thanks for the answers. Yes, my expectations are already low on this. I have only heard of software that can display compressed waves the way I need, and it’s probably not for Python.

Geoff, I will search for a way to extract the volume; that should work for me. And no, I’m not the one making the files.

Thanks

Tim_Roberts · January 27, 2009, 12:10am

Lucas Boppre Niehues wrote:

Thanks for the answers. Yes, my expectations are already low on
this. I have only heard of software that can display compressed
waves the way I need, and it's probably not for Python.

They are not displaying "compressed waves". They are decompressing the
data, then displaying the decompressed data. I guarantee it. There is
no way to extract a waveform from an MP3 file without decompressing it.
The waveform data is passed through a (modified) discrete cosine transform, so what
is stored in the MP3 file is frequency-based data, not time-based data.

Geoff, I will search for a way to extract the volume; that should
work for me.

No, it won't. Some MP3 files include a header that indicates the
average volume level of the entire file, but it's not standardized.

--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Lucas_Boppre_Niehues · January 27, 2009, 12:22am

So, here I am again. I’ve been trying to install PyMedia with no success, but that’s a problem for another time.

Following Geoff’s idea, is there any package that has a function like “sound.GetVolumeAt(time)”? Is there any other time-based information (frequency, volume, etc.) that can be extracted without decompressing it?

Nitro · January 27, 2009, 12:41am

Volume at time is exactly the definition of waveform. A waveform tells you the volume at a given time. You are trying to do the impossible. You cannot display the waveform without having it. To have it, you need to decompress compressed data.

In your original post you wrote that you wanted to do this for "audio lessons", so the user can go forward and backward. Wouldn't it be much easier to annotate the MP3s with chapter information and then let the user seek via "next/prev chapter" buttons?
If the MP3s are not going to change often, you can also preprocess them: decompress them to waves, calculate the average volume over 1 s (or other-length) intervals, and save the result. Then, when the user accesses a file, you don't have to decompress it anymore and can just display the preprocessed data.
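
A minimal sketch of that preprocessing step, assuming the MP3 has already been decoded to a 16-bit PCM WAV file by an external decoder (for example `mpg123 -w out.wav in.mp3` or `lame --decode in.mp3 out.wav`). It uses only the standard library's `wave` module; the function names `interval_levels` and `save_levels`, the use of RMS as the "average volume", and the JSON output are illustrative assumptions, not anything prescribed in this thread.

```python
import array
import json
import math
import sys
import wave


def interval_levels(wav_path, interval_s=1.0):
    """Return one RMS level (0.0 to 1.0) per interval of a 16-bit PCM WAV file."""
    levels = []
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames_per_interval = max(1, int(wav.getframerate() * interval_s))
        while True:
            raw = wav.readframes(frames_per_interval)
            if not raw:
                break
            samples = array.array("h")
            samples.frombytes(raw)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV samples are stored little-endian
            # Stereo frames are interleaved; this sketch simply pools both channels.
            mean_square = sum(s * s for s in samples) / len(samples)
            levels.append(math.sqrt(mean_square) / 32768.0)
    return levels


def save_levels(levels, out_path):
    """Write the precomputed levels to disk so the player can skip decoding."""
    with open(out_path, "w") as f:
        json.dump(levels, f)
```

A player would call something like `save_levels(interval_levels("lesson01.wav"), "lesson01.levels.json")` once per file (hypothetical file names) and later just `json.load` the small levels file.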

To find a good solution, we need to know more about what exactly your application should do. Right now it seems you are lost in the details of one particular solution, and I am not sure the problem you have needs to be solved at all (a different strategy right from the start might avoid it entirely).

-Matthias

Lucas_Boppre_Niehues · January 27, 2009, 1:01am

Thanks for trying to simplify things, Nitro. To explain what I meant about the volume graph: I thought common waveforms were plotted using frequency, not volume. But I got the point: no graph without decompressing it.
And about the reason behind the waveform: it’s not only for chapters; that was just an example. The idea is to let the user select what he wants to play, be it an entire chapter or a single phrase, and for that a visual aid (e.g. a waveform) is needed. The software currently being used is CoolEdit, which, although obviously capable of editing files, is used only for its waveform and player functions.
So I will stick to the preprocess and store processed data solution to minimize delays.

Thank you

Nitro · January 27, 2009, 1:29am

And about the reason behind the waveform: it's not only for chapters,
actually it was an example. The idea is letting the user select what he
wants to play, be it an entire chapter or a single phrase; and for that, a
visual aid (e.g. waveform) is needed. The current software being used is
CoolEdit that, although being obviously capable of editing files, is used
only for its waveform and player functions.

So you want to have CoolEdit/Audacity, just in a much simpler form which only displays and plays audio clips the user can select?

So I will stick to the *preprocess and store processed data* solution to
minimize delays.

Have you actually measured the delays? Modern computers can decompress audio files really really fast. Audacity and CoolEdit also decompress those files and are fast enough.
So if your target machines have enough RAM and the mp3s are not hours long (or you need lots of them to be active at once), you could just decompress the whole files and store them in memory. Then you model a kind of "view" class around this. The view class can just display parts of the data (the part that will actually appear on screen) and it doesn't have to display each point of the waveform. E.g. if you view 1 minute of audio in an audio editor this is (typically) 44100 samples/s * 60 seconds = 2.6 million samples. Now your monitor has only roughly 1000 pixels in the x-direction. This means you can skip *a lot* of samples and draw only 500 or 1000 of them. This can be made very fast. The "numpy" package is a great asset if you want to code this kind of thing in python.
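
The "draw only about one value per pixel" idea can be sketched as below. A common refinement is to keep the minimum and maximum of each pixel column's bucket of samples rather than every n-th sample, so a short peak falling between the kept samples is not lost. This is plain Python to stay dependency-free (numpy can vectorize the same reduction); `envelope` is a hypothetical helper name.

```python
def envelope(samples, start, end, width):
    """Reduce samples[start:end] to one (min, max) pair per pixel column.

    Keeping min and max per column, instead of every n-th sample, keeps
    short peaks visible when millions of samples map to ~1000 pixels.
    """
    span = end - start
    if span <= 0 or width <= 0:
        return []
    columns = []
    for x in range(width):
        # Integer bucket boundaries so every sample lands in exactly one column.
        lo = start + span * x // width
        hi = start + span * (x + 1) // width
        if hi <= lo:  # more pixels than samples: reuse a single sample
            hi = lo + 1
        bucket = samples[lo:hi]
        columns.append((min(bucket), max(bucket)))
    return columns
```

For one minute at 44100 samples/s, `envelope(samples, 0, 2_646_000, 1000)` yields 1000 pairs, each summarizing 2646 samples; each pair can be drawn as one vertical line.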
If you cannot decompress the whole MP3 files because they would be too big to fit in RAM, you can also consider "streaming" them. I am not sure how fast seeking in MP3 files is, but assuming it's reasonably fast, you can decompress only the parts of the MP3 that will actually be shown on screen. E.g. if the user is editing from 0:40 to 0:50, you only decompress 10 s of audio. With a good caching strategy, and assuming the user doesn't hop randomly around the file all the time, this will be fast, too. If the file is really long and the user wants to see all of it, you can stream it one part after another so you don't exceed the RAM limit.
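
A rough sketch of streaming with a cache, assuming fixed-length chunks and an in-memory LRU cache from the standard library's `functools`. The decoder is a hypothetical placeholder (`decode_chunk`) that returns a synthetic ramp so the sketch runs; a real version would seek in the MP3 and decode with an actual library. `SAMPLE_RATE`, `CHUNK_SECONDS`, and the cache size are assumed values.

```python
from functools import lru_cache

SAMPLE_RATE = 44100   # assumed sample rate of the decoded audio
CHUNK_SECONDS = 10    # assumed chunk length
CHUNK_LEN = SAMPLE_RATE * CHUNK_SECONDS


def decode_chunk(index):
    """Hypothetical placeholder for real MP3 decoding of one chunk.

    Returns a synthetic ramp (global sample position modulo 1000) so the
    caching logic can be exercised without an MP3 decoder.
    """
    base = index * CHUNK_LEN
    return [(base + i) % 1000 for i in range(CHUNK_LEN)]


@lru_cache(maxsize=32)  # at most 32 chunks (about 5 minutes of audio) stay decoded
def get_chunk(index):
    return tuple(decode_chunk(index))


def samples_between(start, end):
    """Return samples [start, end), decoding only the chunks that overlap it."""
    if end <= start:
        return []
    out = []
    for index in range(start // CHUNK_LEN, (end - 1) // CHUNK_LEN + 1):
        base = index * CHUNK_LEN
        chunk = get_chunk(index)
        out.extend(chunk[max(start, base) - base:min(end, base + CHUNK_LEN) - base])
    return out
```

Viewing 0:40 to 0:50 with `samples_between(40 * SAMPLE_RATE, 50 * SAMPLE_RATE)` decodes a single chunk; scrolling back to that region later is served from the cache instead of decoding again.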
Other audio applications I have used (more professional ones than Audacity) can get slower on really huge projects, but usually they're lightning fast, and I don't think they use black magic to do it.

Oh, and if you're really set on speed, you can use the GPU for both the number crunching and the drawing.

-Matthias

Che_M · January 27, 2009, 3:27am

How about this? You just allow the user to press buttons to set start and stop points along a line representing the duration of the MP3. This is how some video editors, like VirtualDubMod, work. I think it’s possible that an audio waveform could look too “techy” or “sciencey” for most language learners, and from my experience in music recording I know it is not always easy to keep track of which break in the waveform is which. You don’t want them worrying about that noisy visual information. A straight line with little sections to be looped is probably easiest for the learner, and for you.

You could even let the user add information for that section (they could type the spoken words into a text box), and you could then associate that text with the start and stop times in the MP3 (and store it somewhere). (In fact, this is getting me interested in trying your software to keep up with learning Spanish!)
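
One way to store the user-marked sections, sketched under the assumption of a JSON sidecar file per MP3. The `Segment` class, its field names, and the file layout are hypothetical; times are in seconds from the start of the file.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Segment:
    """A user-marked section of an MP3, with optional text typed by the learner."""
    start: float  # seconds from the beginning of the file
    stop: float
    text: str = ""

    def __post_init__(self):
        if not 0 <= self.start < self.stop:
            raise ValueError("need 0 <= start < stop")


def save_segments(segments, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(s) for s in segments], f, ensure_ascii=False, indent=2)


def load_segments(path):
    with open(path, encoding="utf-8") as f:
        return [Segment(**d) for d in json.load(f)]


def segment_at(segments, t):
    """Return the first segment containing time t (in seconds), or None."""
    return next((s for s in segments if s.start <= t < s.stop), None)
```

The player can then loop `segment_at(segments, position)` for the learner, and the typed text travels with the times.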

Geoff_Gilmour-Taylor · January 27, 2009, 3:30pm

I think you can still extract the volume of a single frame without having to decode the entire waveform. That's good enough for phrase detection, since MP3 frames are about 26 ms long, and normal pauses in narration seem to be about 500 to 1000 ms.
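
The numbers above can be checked, and the pause detection sketched, as follows. An MPEG-1 Layer III frame holds 1152 samples, so at 44.1 kHz one frame lasts 1152 / 44100 ≈ 26.1 ms and a 500 ms pause spans about 20 frames (MPEG-2 Layer III frames hold 576 samples, so low-sample-rate files differ). The sketch assumes one level value per frame is already available, whether from a per-frame volume estimate or from decoded audio; `find_pauses` and the threshold are hypothetical.

```python
import math

SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III
SAMPLE_RATE = 44100       # assumed; MPEG-1 also allows 32000 and 48000 Hz

FRAME_MS = 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE  # about 26.1 ms


def find_pauses(frame_levels, threshold, min_pause_ms=500):
    """Return (first_frame, last_frame) for each run of frames whose level
    stays below threshold for at least min_pause_ms."""
    min_frames = math.ceil(min_pause_ms / FRAME_MS)  # 20 frames at 44.1 kHz
    pauses = []
    run_start = None
    # A trailing sentinel above any threshold closes a run at the end of the file.
    for i, level in enumerate(list(frame_levels) + [math.inf]):
        if level < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                pauses.append((run_start, i - 1))
            run_start = None
    return pauses
```

A pause starting at frame `n` begins `n * FRAME_MS / 1000` seconds into the file, which is where a "next phrase" button could seek.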
I'm poking around in the libmpg123 source, and there's an mpg123_getvolume function in frame.c, but I'm not familiar enough with the library to say whether it decodes the whole frame. However, mp3DirectCut (http://mpesch3.de1.cc/mp3dcscr.html), which is based on mpg123, displays volume information at frame resolution almost instantaneously, so there has to be some way.
That said, I think it's simpler to annotate chapter breaks. It takes an hour or two per book on average, unless it has more than 200 chapters or there's an unusually hard-to-find chapter.

(My apologies; this is getting rather off-topic. Feel free to contact me off-list.)