Advice for an audio classifier based on Voice Activity Detection
I am writing a program to classify recorded phone-call audio files (WAV) as either containing at least some human voice or containing no voice (only DTMF, dial tones, ringtones, or noise). I tried implementing a simple VAD (voice activity detector) using ZCR (zero-crossing rate) and energy, but these features confuse DTMF and dial-tone files with voice files. I also tried a machine-learning approach using an SVM (support vector machine) on MFCC coefficients. The results were worse than with the previous approach. I would appreciate some advice on this domain; I have no previous experience in machine learning or AI, but I am willing to put a good amount of time into it. I am comfortable working with MATLAB, Python, NumPy, SciPy, and scikit-learn. Thank you.
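For readers following along, the frame-level ZCR and energy features mentioned above might be computed roughly as in the sketch below. This is an illustration, not the poster's actual code: the helper names `frame_signal` and `zcr_and_energy`, the 8 kHz sample rate, and the 25 ms frame / 10 ms hop sizes are all assumptions chosen for the example, and the test signal is a synthetic two-tone DTMF-like burst (697 Hz + 1209 Hz) rather than a real recording.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zcr_and_energy(x, sr=8000, frame_ms=25, hop_ms=10):
    """Return per-frame zero-crossing rate and mean-square energy.

    sr, frame_ms and hop_ms are assumed values for telephone-band audio.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    # ZCR: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    # Short-time energy: mean of squared samples in each frame.
    energy = np.mean(frames ** 2, axis=1)
    return zcr, energy

# Synthetic one-second DTMF-like tone (assumed test input, not real data).
sr = 8000
t = np.arange(sr) / sr
dtmf = 0.5 * np.sin(2 * np.pi * 697 * t) + 0.5 * np.sin(2 * np.pi * 1209 * t)

zcr, energy = zcr_and_energy(dtmf, sr)
print(zcr.mean(), energy.mean())
```

A steady tone like this has clearly non-zero energy and a moderate ZCR in every frame, so fixed thresholds on these two features alone can plausibly pass it as speech, which is consistent with the confusion described above.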
participants (3)
- Josh Warner
- Stefan van der Walt
- user783746