scipy.io.wavfile to read byte array directly?
Hi all,

I was interested in the creation of a function for the scipy.io.wavfile utility. Rather than requiring that read() only be performed on a file, I'd like to add a read() function where a byte array of WAV data can be provided directly.

Here's some background behind this motivation. I am a student with the University of Washington and I have been working with a former student's machine learning algorithm. The aim of the algorithm is to detect human laughter and it utilizes SciPy and NumPy.

We're aiming to create a service-oriented architecture maintained in AWS and our audio data is stored within S3. I've been experimenting with the Boto3 library, which returns a byte array, and I'd like to provide that data directly to the machine learning script (instead of writing to the disk and reading from it).

I'd like to hear your thoughts and might experiment with this idea until approval is expressed by the community.

Thank you for your time,
Miles
Miles,

Are you aware of io.BytesIO? I don't know the performance implications of using a wrapper, but I'd expect loading the data to take marginal time compared to training your ML model.

-- Joseph
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
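For readers following along, Joseph's io.BytesIO suggestion can be sketched roughly as below. This is only an illustration, not code from the thread: the helper name wav_bytes_to_array and the generated test tone are made up for the example, and the WAV bytes are produced in memory so that the sketch does not need a file on disk.

```python
import io

import numpy as np
from scipy.io import wavfile


def wav_bytes_to_array(wav_bytes):
    """Decode in-memory WAV bytes (hypothetical helper, not part of SciPy).

    Returns the (sample_rate, samples) tuple produced by wavfile.read.
    """
    # BytesIO wraps the bytes in a seekable file-like object.
    return wavfile.read(io.BytesIO(wav_bytes))


# Build one second of a 440 Hz tone as 16-bit PCM, purely as test input.
rate = 8000
t = np.arange(rate) / rate
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

# Serialize the tone to WAV bytes in memory, standing in for downloaded data.
buffer = io.BytesIO()
wavfile.write(buffer, rate, tone)
wav_bytes = buffer.getvalue()

decoded_rate, decoded = wav_bytes_to_array(wav_bytes)
print(decoded_rate, decoded.dtype, decoded.shape)
```

The wrapper does not parse anything itself, so the WAV format handling stays entirely inside the existing wavfile.read.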
On Sun, Sep 11, 2016 at 8:53 AM, Joseph Booker <joe@neoturbine.net> wrote:
> Miles,
>
> Are you aware of io.BytesIO? I don't know the performance implications of using a wrapper, but I'd expect loading the data to take marginal time compared to training your ML model.

BytesIO would be useful if the data is already in an array. It's not clear from the question that that's the case. If not, it's the interpreting of the .wav file data format that Miles would like to reuse.

> -- Joseph
>
> On Sep 10, 2016 4:35 PM, "Miles Dowe" <milesdowe@gmail.com> wrote:
>> Hi all,
>>
>> I was interested in the creation of a function for the scipy.io.wavfile utility. Rather than requiring that read() only be performed on a file, I'd like to add a read() function where a byte array of WAV data can be provided directly.

wavfile.read already takes a file or a file-like object. The docs don't specify exactly what methods the file-like object needs to have. A quick browse says: read, seek, tell and close. Would be nice to get that documented and tested. Does that help?

>> Here's some background behind this motivation. I am a student with the University of Washington and I have been working with a former student's machine learning algorithm. The aim of the algorithm is to detect human laughter and it utilizes SciPy and NumPy.
>>
>> We're aiming to create a service-oriented architecture maintained in AWS and our audio data is stored within S3. I've been experimenting with the Boto3 library, which returns a byte array, and I'd like to provide that data directly to the machine learning script (instead of writing to the disk and reading from it).
>>
>> I'd like to hear your thoughts and might experiment with this idea until approval is expressed by the community.

If you can make this work with the existing read() function, that would be useful. A separate function shouldn't be needed.

Ralf

>> Thank you for your time,
>> Miles
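As a rough illustration of Ralf's point, the sketch below checks an object for the four methods he lists and buffers a read-only stream into io.BytesIO before handing it to the existing wavfile.read. Several things here are assumptions rather than anything stated in the thread: the method list comes from Ralf's "quick browse" and is not documented API, the names REQUIRED_METHODS, missing_file_methods, read_wav_from_stream and ReadOnlyBody are hypothetical, and ReadOnlyBody merely stands in for a download object that exposes read() only.

```python
import io

import numpy as np
from scipy.io import wavfile

# Assumption: taken from Ralf's quick browse of the source, not from the docs.
REQUIRED_METHODS = ("read", "seek", "tell", "close")


def missing_file_methods(obj):
    """Return the names from REQUIRED_METHODS that obj does not provide."""
    return [name for name in REQUIRED_METHODS
            if not callable(getattr(obj, name, None))]


def read_wav_from_stream(stream):
    """Read WAV data from any object that offers read() returning bytes.

    The whole payload is buffered in memory so that wavfile.read gets a
    seekable file-like object, with no temporary file on disk.
    """
    return wavfile.read(io.BytesIO(stream.read()))


class ReadOnlyBody:
    """Stand-in for a download body that supports read() but not seek()."""

    def __init__(self, payload):
        self._payload = payload

    def read(self):
        return self._payload


# Create a short silent clip as WAV bytes to play the role of downloaded data.
source = io.BytesIO()
wavfile.write(source, 16000, np.zeros(100, dtype=np.int16))
body = ReadOnlyBody(source.getvalue())

print(missing_file_methods(io.BytesIO()))  # BytesIO offers all four methods
print(missing_file_methods(body))          # the stand-in lacks three of them
rate, samples = read_wav_from_stream(body)

# Hypothetical Boto3 usage (bucket and key names are placeholders):
#   body = boto3.client("s3").get_object(Bucket="my-bucket", Key="clip.wav")["Body"]
#   rate, samples = read_wav_from_stream(body)
```

Buffering the full payload trades memory for simplicity, which seems reasonable for short audio clips but may not suit very large files.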
participants (3): Joseph Booker, Miles Dowe, Ralf Gommers