"Dynamic convolution" in Numpy
Hello everybody,

I'm fighting with dynamic binaural synthesis (I can give more hints on it if necessary). I would like to modify the sound being played according to the listener's head position. I have special (binaural) filters per head position that I convolve in real time with a monophonic sound. When the head moves, I want to be able to play the next position's version of the generated stereo sound, but starting from the playing position (sample index in the unpacked data) of the previous one. My problem is preventing audible artefacts at the transition.

At the moment I'm only using *short audio wavs* that I play, and repeat entirely if necessary, because my positioning resolution is 15°. The evolution of the head angle leaves enough time for the whole process to operate (get position -> choose corresponding filter -> convolve -> play sound).

*For long audio wavs* I could do a fade-in/fade-out from the transition point, but I have no idea how to implement it (I'm using audiolab, and numpy for the convolution).

Another solution could be dynamic filtering: when the position changes, I start convolving with the next position's filter from the point where playback of the previous one must stop (though it won't practically stop, to let the next convolution operate on enough frames), in accordance with the filter length (all the filters are impulse responses of the same length, 128 taps).

The "drawing" just below is my mental representation of what I'm trying to implement; I apologize in advance for its crapitude (and my brain's too):

t0_________t1__t2__t3_____________________________________t=len(stimulus)   monophonic sound (time and sample position in the unpacked data)
C1C1C1C1C1C1C1C1C1C1C1...    running convolution with filter 1, corresponding to position 1 (e.g. angle from reference = 15°)
P1_______                    sound playing 1
        ^                    position 2 detected (angle = 30°)
        C2C2C2C2C2C2C2C2...  running convolution with filter 2
P1_____x                     keep playing 1 so that convolution 2 can operate on enough frames (latency)
       FIFO                  fade-in fade-out
        P2_________          sound playing 2

I don't know if I made myself very clear. If anyone has suggestions or has already implemented such dynamic filtering, I would be very interested.

Cheers,
Arthur
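A minimal NumPy sketch of the fade-in/fade-out idea described above, assuming the old-position and new-position stereo renderings are both available as arrays (`y_prev`, `y_next` and the other names are hypothetical, not from the thread):

import numpy as np

def crossfade(y_prev, y_next, start, fade_len):
    """Splice y_next into y_prev from sample `start`, with a linear
    crossfade over `fade_len` samples to avoid a click at the junction.
    Both inputs are (n_samples, 2) stereo float arrays of equal length."""
    out = y_prev.copy()
    ramp = np.linspace(0.0, 1.0, fade_len)[:, None]   # per-sample weight, broadcast over channels
    stop = start + fade_len
    out[start:stop] = (1.0 - ramp) * y_prev[start:stop] + ramp * y_next[start:stop]
    out[stop:] = y_next[stop:]                        # after the fade, only the new version
    return out

Since both renderings come from the same source material, a plain linear amplitude ramp is usually adequate, and a few hundred samples (5-10 ms) of fade is typically enough to mask the discontinuity.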
Hi Arthur,

I've no experience whatsoever with what you are doing, but my first thought was: why not compute all possible versions beforehand, and then progressively switch from one version to another by interpolating between the different versions? If the resolution is 15 degrees, there aren't that many versions to compute in advance.

David
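A minimal sketch of this suggestion, assuming the pre-rendered stereo versions are stacked in an array `versions` of shape (n_angles, n_samples, 2), one per 15° step (the layout and names are hypothetical):

import numpy as np

def blend_versions(versions, angle, step=15.0):
    """Linearly blend the two pre-rendered stereo versions bracketing
    `angle` (degrees); versions[i] is the full rendering for angle
    i * step, and the index wraps around the full circle."""
    pos = (angle / step) % len(versions)
    i = int(pos)
    j = (i + 1) % len(versions)
    w = pos - i
    return (1.0 - w) * versions[i] + w * versions[j]

Note that Anne's caveat later in the thread applies here too: a linear blend of two phase-shifted signals can cancel rather than interpolate.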
Hi, thanks for your answer.

*why don't you compute all possible versions beforehand*

That's exactly what I'm doing at present, since I'm using a database of 187 filters (azimuth and elevation). I would love to reduce the angular step to under 5°, but that means around 1000 files to produce. For a 3 MB original sound file it becomes huge (on the order of 1000 x 3 MB, i.e. roughly 3 GB of pre-rendered audio).

Thanks,
Arthur
Indeed! I'll be curious to see which solution ends up working best. Keep us posted.
David
Have you ever considered using pygame? AFAIK it is based on SDL and therefore should support real-time mixing and writing to the sound buffer while it is being played back. I did not check the gory details, but it seems that pygame exposes SDL's Mixer interface. Good luck!

For your problem in principle: when your HRIRs are not too long, why not calculate each frame of audio by evaluating the convolution sum only at time t? Or you could use a cyclic buffer, and at each time instant add the current HRIR, multiplied by the current input sample, into it. You don't even have to do this in advance; just copy the cyclic generator buffer to the audio buffer as the time instants arrive.

Friedrich
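A sketch of the cyclic-buffer idea, assuming a 128-tap HRIR that may change at every input sample; `hrir_for_sample` is a hypothetical lookup, not something from the thread:

import numpy as np

def render_time_varying(x, hrir_for_sample, n_taps=128):
    """Direct-form time-varying convolution via a cyclic accumulator:
    each input sample scatters its scaled HRIR (which may differ at
    every sample) into a circular buffer, and the slot whose sum is
    complete is emitted as output. hrir_for_sample(n) returns the
    n_taps-long HRIR in effect at sample n."""
    buf = np.zeros(n_taps)
    out = np.empty(len(x))
    offsets = np.arange(n_taps)
    for n, sample in enumerate(x):
        h = hrir_for_sample(n)                 # HRIR in effect right now
        buf[(n + offsets) % n_taps] += sample * h
        pos = n % n_taps
        out[n] = buf[pos]                      # all contributions for sample n are in
        buf[pos] = 0.0                         # recycle the slot for sample n + n_taps
    return out

This is O(len(x) * n_taps) in pure Python, so it is a statement of the idea rather than a real-time implementation; for binaural output, run it once per ear.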
On Thu, Jun 3, 2010 at 7:49 PM, arthur de conihout <arthurdeconihout@gmail.com> wrote:
> I don't know if I made myself very clear. If anyone has suggestions or has already implemented such dynamic filtering, I would be very interested.
Does fade-in/fade-out actually work? I would have thought it would kill the properties of your filters.

There are two issues:
- how to do the convolution fast
- how to go from one filter to the other

The main issue with changing filters is that your system is not LTI anymore. If your filters have finite impulse responses, I guess it should not be too much of an issue. To do convolution quickly, you need to use the FFT, which is a bit tricky if you want to do things in real time, as you need to partition the impulse response. Using "partitioned impulse response" as keywords should give you plenty of references on how to do it.

David
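For illustration, a sketch of uniformly partitioned overlap-add convolution along the lines David describes; the block size, padding, and names are assumptions, not code from the thread:

import numpy as np

def partitioned_convolve(x, h, B=128):
    """Uniformly partitioned overlap-add FFT convolution: h is split
    into blocks of B taps whose spectra multiply a frequency-domain
    delay line of recent input blocks, so latency is one block (B
    samples) rather than the full length of h."""
    P = -(-len(h) // B)                               # number of IR partitions (ceil division)
    H = np.fft.rfft(np.pad(h, (0, P * B - len(h))).reshape(P, B), n=2 * B, axis=1)
    n_blocks = -(-len(x) // B)
    x = np.pad(x, (0, n_blocks * B - len(x)))
    X = np.zeros((P, B + 1), dtype=complex)           # spectra of the last P input blocks
    out = np.zeros(n_blocks * B)
    tail = np.zeros(B)                                # overlap carried between blocks
    for i in range(n_blocks):
        X = np.roll(X, 1, axis=0)                     # age the frequency-domain delay line
        X[0] = np.fft.rfft(x[i * B:(i + 1) * B], n=2 * B)
        y = np.fft.irfft((X * H).sum(axis=0))         # sum over all partitions
        out[i * B:(i + 1) * B] = y[:B] + tail         # overlap-add
        tail = y[B:]
    return out                                        # final tail dropped in this streaming sketch

Latency drops to one block of B samples; with the 128-tap HRIRs mentioned earlier and B = 128 there is only one partition, and this reduces to plain overlap-add.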
On 6 June 2010 04:44, David Cournapeau <cournape@gmail.com> wrote:
> There are two issues: how to do the convolution fast, and how to go from one filter to the other.
I think the kicker is here: what is the right way to interpolate between filters?

If you have, or can generate, a lot of filters, then at least you can evaluate the quality of the interpolation. The right way to understand what kind of interpolation to do is to have some idea of the physics you're dealing with. In this case, as I understand it, you're dealing with the auditory effects of the head, seen from different angles. I would say that the ear senses something like integrated power in logarithmically-spaced frequency bins over roughly twentieth-of-a-second time intervals. So the relevant effects you care about are amplitude absorption and time delays, if they are as long as a twentieth of a second.

Doing simple linear interpolation, unfortunately, will probably get you in trouble: imagine, for example, that two impulse responses have the same amplitude at 440 Hz but different phases. A linear interpolation will change the amplitude (for example, if they're 180 degrees apart it'll pass through zero). You might do better interpolating in polar coordinates, though you might have phase-wrapping issues.

A really thorough approach might be to take a granular-synthesis approach to the impulse responses, breaking them up into orthogonal time-domain channels within which the response is defined by an amplitude and a time delay, which you'd interpolate in the natural way. I'd try polar interpolation on the FFTs of the amplitudes first, though (since in fact it's the same thing with the minimal possible frequency channels).

I suspect that some interpolation, however unrealistic (even linear interpolation), is necessary, or listeners may perceive sounds "snapping" from place to place in the aural field.
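A sketch of the polar interpolation idea, blending magnitudes linearly and phases along the shortest arc between two same-length HRIRs; a simplified illustration, with the phase-wrapping caveat above still applying:

import numpy as np

def interp_hrir_polar(h1, h2, t):
    """Blend two same-length HRIRs in polar form: magnitudes linearly,
    phases along the shortest arc between them (the angle of the
    bin-wise product H2 * conj(H1) is already wrapped into (-pi, pi]).
    t=0 returns h1, t=1 returns h2."""
    H1, H2 = np.fft.rfft(h1), np.fft.rfft(h2)
    mag = (1.0 - t) * np.abs(H1) + t * np.abs(H2)
    dphi = np.angle(H2 * np.conj(H1))    # shortest-arc phase difference per bin
    phase = np.angle(H1) + t * dphi
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(h1))

Per-bin phase blending still cannot represent an interaural delay difference larger than half a period in a bin; pulling the gross time delay out first and interpolating it separately (essentially Anne's granular suggestion) is more robust.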
> The main issue with changing filters is that your system is not LTI anymore. If your filters have finite impulse responses, I guess it should not be too much of an issue. To do convolution quickly, you need to use the FFT, which is a bit tricky if you want to do things in real time, as you need to partition the impulse response.
As far as convolution goes, as David says, take a look at existing algorithms and maybe even music software: there's a trade-off between the n^2 cost of a brute-force FIR filter and the delay introduced by an FFT approach, but on-the-fly convolution is a well-studied problem.

Anne
On 06/07/2010 12:08 AM, Anne Archibald wrote:
> I think the kicker is here: what is the right way to interpolate between filters?
There is no right way that I know of; it really depends on what you are doing. The big issue here is that a filter which changes is obviously not time-invariant anymore. For IIR filters, this has the unfortunate consequence that even if your filter is stable at every interpolation point, transitioning from one to the other can still blow up.

For filters used in music processing, it is common to have filters changing really fast, for example in synthesizers (up to a few hundred times per second), and interesting effects are obtained with filters with very high and concentrated resonance. The solution is usually a mix of upsampling to avoid big transitions and using filter representations which are more stable (state-space representations instead of direct filter coefficients, for example).

I don't think it is directly applicable to your problem, but the series of papers by Dattorro from about ten years ago is a goldmine:

"Effect Design, Parts 1-3", Jon Dattorro, J. Audio Eng. Soc., Vol. 45, No. 9, September 1997.
> As far as convolution goes, as David says, take a look at existing algorithms and maybe even music software: there's a trade-off between the n^2 cost of a brute-force FIR filter and the delay introduced by an FFT approach, but on-the-fly convolution is a well-studied problem.
It might be pretty cool to have an implementation in scipy, now that I think of it :) One issue is that there are several patents on those techniques, but I was told there are ways around them in terms of implementation. Not sure how to proceed here for scipy.

David
Hello, thanks for all your answers.

I had a look at some references in this field, mostly R. Nicol's monograph on binaural technology, which is the best recent overview I could find (on AES). All the solutions you submitted are mentioned in it ;)

> You might do better with interpolating in polar coordinates, though you might have phase wrapping issues.

It appears that filter interpolation gives the best results when operating a reconstruction by basis functions, for example spherical thin-plate splines applied separately to the magnitude and phase spectra.

> There is no right way that I know of, it really depends on what you are doing. [...] For IIR, this has the unfortunate consequence that even if your filter is stable at every interpolation point, transitioning from one to the other can still blow up.

You're so right, and cross-fading is given there as a "convenient" solution, since dynamic updating of filters requires rapid commutation between sets of coefficients, leading to signal discontinuities. I'll have a look at J. Dattorro's tutorials on effect design.

> If you have, or can generate, a lot of filters, then at least you can evaluate the quality of the interpolation.

I'm actually using public HRTF databases (there are plenty: IRCAM, CIPIC...).

> It may be pretty cool to have an implementation of scipy

I think so too :) I would love to try my luck with Gardner's zero-delay algorithm (from the same prolific years as Dattorro...), which reduces convolution delay (cf. W. G. Gardner, "Efficient Convolution without Input-Output Delay", AES).

But my present problem is even simpler. I currently get the head position (head tracking), deduce the associated filters (from a public HRTF database), convolve them with a short stimulus (0.25 s), and then play the result with audiolab's play on NumPy arrays. And this process is so slow that I can't even get an "unproper" (quick-and-dirty) solution working, just playing short stimuli according to head position, without even considering interpolation or the filtering of long stimuli with filter commutation...

Should I code the convolution and playback in parallel? I think I have also multiplied the cost of the part of the code that maps the transmitted position information to the proper filter filenames (I'm using a Wiimote IR camera for my head-tracking solution). Before I go further, and I would love to, could anyone give hints on these basic points? I can provide the code I'm using, even if it's really messy... due to my recent dealing with coding.

Thank you.
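One common way to attack the latency problem described above is to render in small blocks, so that tracking, filter lookup, convolution, and playback overlap instead of running end to end per stimulus. A rough sketch, with the filter database, head-tracker read, and audio sink all stubbed out as hypothetical interfaces:

import numpy as np

BLOCK = 1024      # samples per processing block (hypothetical choice)
N_TAPS = 128      # HRIR length, as in the thread

def render_blocks(x, get_angle, hrir_db, play_block):
    """Stream mono input x in blocks; on each block, look up the HRIR
    pair for the current head angle and crossfade whenever the angle
    bin changes. hrir_db[angle] -> (left_ir, right_ir) and
    play_block(stereo_block) are hypothetical interfaces."""
    zi = np.zeros((2, N_TAPS - 1))        # per-ear convolution tails
    ramp = np.linspace(0.0, 1.0, BLOCK)[:, None]
    prev = None
    for start in range(0, len(x) - BLOCK + 1, BLOCK):  # final partial block dropped
        block = x[start:start + BLOCK]
        cur = hrir_db[get_angle()]        # filter pair in effect right now
        out = np.empty((BLOCK, 2))
        for ch, h in enumerate(cur):
            y = np.convolve(block, h)     # BLOCK + N_TAPS - 1 samples
            y[:N_TAPS - 1] += zi[ch]      # add the tail of the previous block
            zi[ch] = y[BLOCK:]            # save the new tail
            out[:, ch] = y[:BLOCK]
        if prev is not None and prev is not cur:
            # Filter changed (assumes hrir_db returns the same object for
            # the same angle): render the block with the old pair too and
            # crossfade. The old pair's tail is approximated here; a
            # faithful version would keep separate state per filter.
            old = np.empty((BLOCK, 2))
            for ch, h in enumerate(prev):
                old[:, ch] = np.convolve(block, h)[:BLOCK]
            out = (1.0 - ramp) * old + ramp * out
        prev = cur
        play_block(out)

A 0.25 s stimulus at 44.1 kHz is only about 11 such blocks of NumPy work, which should be fast; a likelier culprit for the sluggishness is re-reading filter files from disk on every position change, so loading the whole HRTF set into memory once at startup is worth trying first.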
participants (6): Anne Archibald, arthur de conihout, David, David Cournapeau, David Huard, Friedrich Romstedt