Advice for Audio classifier based on Voice Activity Detection
Josh Warner
silvertrumpet999 at gmail.com
Fri May 22 15:12:26 EDT 2015
Your problem needs time-frequency analysis
<https://en.wikipedia.org/wiki/Time%E2%80%93frequency_analysis>. Generate
some waterfall plots (spectrograms) of time vs. frequency using windowed
Fourier transforms. Inspect those, and/or use them as input to your learning
approaches. Depending on the approach you want to use, the waterfall plots
can be analyzed like images, with the caveat that the rows and columns
represent entirely different physical quantities (frequency and time).
That's the main place where scikit-image could potentially assist.
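As a rough illustration of the idea, here is a minimal sketch using
scipy.signal.spectrogram on a synthetic DTMF tone. The 8 kHz sample rate,
the 256-sample Hann window, the 10% peak threshold, and the choice of the
digit "1" (697 Hz + 1209 Hz) are all assumptions made for the example, not
values taken from your recordings. A DTMF digit shows up as two steady
horizontal lines in the time-frequency plane, whereas speech energy moves
around over time.

```python
import numpy as np
from scipy import signal

fs = 8000        # assumed telephone-band sample rate, in Hz
nperseg = 256    # assumed window length (about 31 Hz frequency resolution)

# Synthetic stand-in for a recording: DTMF digit "1" (697 Hz + 1209 Hz), 0.5 s
t = np.arange(int(0.5 * fs)) / fs
dtmf = np.sin(2 * np.pi * 697 * t) + np.sin(2 * np.pi * 1209 * t)

# Windowed Fourier transform: rows of sxx are frequencies, columns are times
freqs, times, sxx = signal.spectrogram(
    dtmf, fs=fs, window="hann", nperseg=nperseg, noverlap=nperseg // 2
)

# Log-scaled version, which is what you would normally plot or feed to a learner
log_sxx = 10 * np.log10(sxx + 1e-12)

# Average over time and look for strong, narrow spectral peaks
mean_spectrum = sxx.mean(axis=1)
peaks, _ = signal.find_peaks(mean_spectrum, height=0.1 * mean_spectrum.max())
peak_freqs = freqs[peaks]

print("spectrogram shape (freq bins, time frames):", sxx.shape)
print("dominant frequencies (Hz):", peak_freqs)
```

For a real call you would load the samples with scipy.io.wavfile.read and
pass them in place of the synthetic signal. The log_sxx array is the 2-D
"image" you can plot, inspect, or hand to a classifier.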
This is its own sub-field of digital signal processing; now that you know
the keyword to search for, you can peruse a large body of literature to
assist with your project.
Josh
On Friday, May 22, 2015 at 2:00:58 PM UTC-5, user783746 wrote:
>
> I am writing a program to classify recorded audio phone call files (wav)
> as containing at least some human voice or no voice (only DTMF, dial tones,
> ringtones, noise). I tried implementing a simple VAD (voice activity
> detector) using ZCR (zero-crossing rate) and energy, but these
> parameters confuse DTMF and dial-tone files with voice.
>
> I also tried implementing a machine-learning-based approach using an SVM
> (Support Vector Machine) and MFCC coefficients. The results were worse than
> with the previous approach.
>
> I need someone to advise me a little on this domain; I have no previous
> experience in machine learning or AI. I am willing to put a good amount of
> time into this domain.
>
> I am comfortable working in MATLAB, scipy, numpy, scikit-learn, python.
>
> Thank you
>