Advice for Audio classifier based on Voice Activity Detection

Josh Warner silvertrumpet999 at gmail.com
Fri May 22 15:12:26 EDT 2015


Your problem needs time-frequency analysis 
<https://en.wikipedia.org/wiki/Time%E2%80%93frequency_analysis>. Generate 
some waterfall plots (spectrograms) of time vs. frequency using windowed 
Fourier transforms. Inspect those, and/or use them as input to your 
learning approaches. Depending on the approach you choose, the waterfall 
plots can be analyzed like images, with the caveat that the rows and 
columns represent entirely different physical quantities (time and 
frequency). That's the main place where scikit-image could potentially 
assist.
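
As a rough illustration of the windowed Fourier transform idea, here is a 
minimal NumPy-only sketch. The function name `waterfall`, the window and 
hop sizes, the 8 kHz sample rate, and the synthetic DTMF test tone are all 
assumptions made for the example; a real recording would be loaded from 
the wav file instead, and the parameters would need tuning.

```python
import numpy as np

def waterfall(signal, sample_rate, window_size=256, hop=128):
    """Windowed Fourier transform of a 1-D signal.

    Returns (times, freqs, power_db), where power_db has one row per
    time frame and one column per frequency bin.
    """
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop : i * hop + window_size] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)
    # Small constant avoids log(0) in silent frames.
    power_db = 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + window_size / 2) / sample_rate
    return times, freqs, power_db

# Synthetic stand-in for a recording (assumption): one second of the
# DTMF digit "1", i.e. 697 Hz + 1209 Hz, sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 697 * t) + np.sin(2 * np.pi * 1209 * t)

times, freqs, power_db = waterfall(tone, sample_rate)
```

The `power_db` array can be displayed as an image or passed on as a 
feature matrix. A steady tone such as DTMF should show up as narrow lines 
at constant frequency, whereas speech tends to vary along both axes.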

This is its own sub-field of digital signal processing; now that you know 
the keyword to search for, you can peruse a large body of literature to 
assist with your project.

Josh


On Friday, May 22, 2015 at 2:00:58 PM UTC-5, user783746 wrote:
>
> I am writing a program to classify recorded phone-call audio files (wav) 
> as either containing at least some human voice or containing no voice 
> (only DTMF, dial tones, ringtones, noise). I tried implementing a simple 
> VAD (voice activity detector) using ZCR (zero-crossing rate) and energy, 
> but these parameters confuse DTMF and dial-tone files with voice.
>
> I also tried implementing a machine-learning-based approach using an SVM 
> (Support Vector Machine) and MFCC coefficients. The results were worse 
> than with the previous approach.
>
> I need someone to advise me a little on this domain; I have no previous 
> experience in machine learning or AI. I am willing to put a good amount 
> of time into this domain.
>
> I am comfortable working in MATLAB, SciPy, NumPy, scikit-learn, and Python.
>
> Thank you
>