How to check whether audio bytes contain empty noise or actual voice/signal?

Sat Oct 26 12:07:10 EDT 2024

On 10/25/2024 12:25 PM, marc nicole via Python-list wrote:
> Hello Python fellows,
> 
> I hope this question is not very far from the main topic of this list, but
> I have a hard time finding a way to check whether audio data samples
> contain empty noise or actual significant voice/noise.
> 
> I am using PyAudio to collect the sound through my PC mic as follows:
> 
> import pyaudio
> 
> FRAMES_PER_BUFFER = 1024
> FORMAT = pyaudio.paInt16
> CHANNELS = 1
> RATE = 48000
> RECORD_SECONDS = 2
> audio = pyaudio.PyAudio()
> stream = audio.open(format=FORMAT,
>                  channels=CHANNELS,
>                  rate=RATE,
>                  input=True,
>                  frames_per_buffer=FRAMES_PER_BUFFER,
>                  input_device_index=2)
> data = stream.read(FRAMES_PER_BUFFER)
> 
> 
> I want to know whether data contains voice signals or only empty sound.
> Note that the variable always contains bytes (whether silence or sound)
> when I print it.
> 
> Is there a straightforward, easy way to check whether data is filled with
> empty noise or whether somebody has made noise or spoken?

It's not always so easy.  The Fast Fourier Transform will be your 
friend. The most straightforward way would be to do an autocorrelation 
on the recorded interval, possibly with some pre-filtering to enhance 
the typical vocal frequency range.  If the data is only noise, the 
autocorrelation will show a large peak at lag 0 and only small, 
obviously noisy numbers everywhere else. There are practical aspects 
that make things less clear.  For example, voices tend to be spiky and 
erratic, so you need to use short intervals to have a better chance of 
catching one where the voice dominates, but a short interval gives the 
autocorrelation fewer samples to average over, so its result is noisier.
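
As a rough illustration of the autocorrelation idea, here is a minimal
sketch using NumPy.  It assumes the buffer is non-empty 16-bit mono PCM
in native byte order, as in the question.  The function names, the
60-400 Hz pitch search range and any threshold you compare against are
my own illustrative choices, not something tested on real recordings.

```python
import numpy as np

def normalized_autocorr(data: bytes) -> np.ndarray:
    """Autocorrelation of 16-bit mono PCM bytes, scaled so lag 0 equals 1."""
    x = np.frombuffer(data, dtype=np.int16).astype(np.float64)
    x -= x.mean()                       # remove any DC offset
    n = len(x)
    # Zero-pad to 2n so the FFT product gives linear, not circular, correlation.
    spectrum = np.fft.rfft(x, 2 * n)
    ac = np.fft.irfft(spectrum * np.conj(spectrum))[:n]
    if ac[0] <= 0:                      # all-zero (digitally silent) buffer
        return np.zeros(n)
    return ac / ac[0]

def peak_off_zero(data: bytes, rate: int = 48000,
                  fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Largest normalized autocorrelation value at lags matching fmin..fmax Hz.

    The frequency range is an assumed span for voice pitch.
    """
    ac = normalized_autocorr(data)
    lo = int(rate / fmax)                     # shortest lag of interest
    hi = min(int(rate / fmin), len(ac) - 1)   # longest lag that fits the buffer
    return float(ac[lo:hi + 1].max())
```

Calling peak_off_zero(data) on each buffer from stream.read() gives a
number near 0 for pure noise and noticeably larger for periodic
(voiced) sound.  Where to put the cutoff is something you would have to
find by experiment with your own microphone and room.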

Human speech has statistical regularities (voiced sounds, for example, 
are nearly periodic at the speaker's pitch), and these can sometimes be 
detected by various means, including the autocorrelation.
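
One crude regularity that can be checked with the FFT is where the
energy sits in the spectrum.  The sketch below measures the fraction of
a buffer's energy that falls in the traditional telephone voice band of
about 300 to 3400 Hz.  The band limits and the function name are
assumptions for illustration, and the buffer is again assumed to be
non-empty 16-bit mono PCM.  Broadband noise spreads its energy over the
whole 0 to RATE/2 range, so only a small fraction lands in the band.

```python
import numpy as np

def band_energy_fraction(data: bytes, rate: int = 48000,
                         f_lo: float = 300.0, f_hi: float = 3400.0) -> float:
    """Fraction of spectral energy between f_lo and f_hi (in Hz)."""
    x = np.frombuffer(data, dtype=np.int16).astype(np.float64)
    x -= x.mean()                       # remove any DC offset
    x *= np.hanning(len(x))             # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    total = power.sum()
    if total == 0:                      # digitally silent buffer
        return 0.0
    in_band = power[(freqs >= f_lo) & (freqs <= f_hi)].sum()
    return float(in_band / total)
```

This will not tell speech apart from other sounds in the same band
(music, a humming fan), so it is better treated as one clue to combine
with others than as a detector in its own right.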

You also will need to test-record your entire signal chain because it 
might be producing artifacts that could fool some tests.  And background 
sounds could fool some tests as well.

Here are some Python libraries that could be very helpful:

librosa (I have not worked with this but it sounds right on target);
scipy.signal (I have used scipy but not specifically scipy.signal);
python-speech-features (another I haven't used);
     https://python-speech-features.readthedocs.io/en/latest/

Other people will know of others.