Hello, I am new to scikit-image and interested in using HOG. However, the implemented version doesn't seem to give results as good as expected. As a possible explanation I can think of two main reasons: 1) the way the gradients are computed (if I'm not mistaken, you use a [-1, 1] filter whereas they use a centered one, [-1, 0, 1]); 2) they use tri-linear interpolation whereas here you seem to use hard binning. Does this make sense, or am I missing something?

Also, I tried to write another version, sticking as closely as possible to the Dalal & Triggs version, although I don't really know how to assess the results it produces. Would that be of interest?

Cheers, Jean
On Mon, Aug 19, 2013 at 8:03 PM, Jean K <jean.kossaifi@gmail.com> wrote:
As a possible explanation I can think of two main reasons: 1) the way the gradients are computed (if I'm not mistaken, you use a [-1, 1] filter whereas they use a centered one, [-1, 0, 1]); 2) they use tri-linear interpolation whereas here you seem to use hard binning.
Would someone more intimately familiar with HoG answer Jean? Thanks, Stéfan
Hi Jean, First of all, I am not an expert regarding HoG… :-)
1) the way the gradients are computed (if I'm not mistaken, you use a [-1, 1] filter whereas they use a centered one, [-1, 0, 1]).
Not sure why the original author of the implementation used np.diff rather than central differences or even Sobel / Scharr and the like (apart from performance). Those should return much better approximations of the gradient.
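For illustration, here is a minimal sketch (not the scikit-image code) contrasting the two filters on an arbitrary 2-D float image; `image` is just a placeholder name, and note that `np.gradient` uses centred differences internally, essentially the [-1, 0, 1] kernel from Dalal & Triggs scaled by 1/2:

```python
import numpy as np

image = np.random.rand(64, 64)  # placeholder grayscale image

# Forward difference, roughly what np.diff gives (a [-1, 1] filter).
# The result is one sample shorter along the differentiated axis.
gx_forward = np.diff(image, axis=1)
gy_forward = np.diff(image, axis=0)

# Centred differences (a [-1, 0, 1] kernel scaled by 1/2), close to
# what Dalal & Triggs recommend; np.gradient keeps the input shape.
gy, gx = np.gradient(image)

magnitude = np.hypot(gx, gy)
# Unsigned gradient orientations in [0, 180) degrees.
orientation = np.rad2deg(np.arctan2(gy, gx)) % 180
```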
2) they use tri-linear interpolation whereas here you seem to use hard binning.
Tri-linear interpolation seems to be the original approach, but I do not know of a simple way to implement it in pure Python in a fast way… I guess scipy.ndimage.map_coordinates might be very useful here. I think both of these fixes would be much appreciated!
Also, I tried to write another version, sticking as closely as possible to the Dalal & Triggs version, although I don't really know how to assess the results it produces. Would that be of interest?
Yes, definitely. Johannes
Hi,

Thank you for your answers :)

@Johannes: For the tri-linear interpolation, you're absolutely right, and I spent a lot of time thinking about it. Eventually I thought of something. Let sx, sy be the size of the image and nbins the number of desired bins. First, we interpolate between the bins, going from the original (sx, sy) image to a (sx, sy, nbins) array. Then we can notice that, inside each cell, we have pixels_per_cell_x * pixels_per_cell_y histograms whose position in the cell doesn't matter (because we are going to sum them up to obtain only one histogram per cell). We can thus virtually divide each cell into 4, each part being interpolated into the 4 diagonally adjacent sub-cells. As a result, each of the 4 sub-cells will be interpolated once into its own cell and once into each of the 3 adjacent cells (which is exactly what interpolation is). The only thing to do is to multiply by the right coefficient. Here's an image to illustrate: we sum 4 times, in the 4 diagonal directions, and the coefficients for the sum can be represented by a single matrix which is rotated. <https://lh3.googleusercontent.com/-F_jIzkHrTXI/UhUAfAwdrNI/AAAAAAAAACc/X-xmT...>

Finally, you just sum the histograms in each cell to obtain the desired (n_cells_x, n_cells_y, nbins) orientation_histogram (which you can further normalise block-wise).

So I implemented a version using this trick, based on the original code, and the result seems to be fast for a 160*160 image. However, as I said, I'm not perfectly sure of the result. Also, I separated the gradient computation from the binning so that the function can also be used for HOF (histograms of optical flow).

Maybe I could do a pull request so you can have a look at the code?

Cheers, Jean
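As a rough sketch of the first step described above (the interpolation between orientation bins that produces the (sx, sy, nbins) array), and assuming unsigned gradients over 0-180 degrees with bin centres at (k + 0.5) * bin_width, something like the following could work; the function name and signature are made up for illustration and are not the code from the branch:

```python
import numpy as np

def soft_bin_orientations(magnitude, orientation, nbins=9):
    """Split each pixel's gradient magnitude linearly between the two
    nearest orientation bins (unsigned gradients, 0-180 degrees).

    Illustrative sketch only, not the code from the branch.
    """
    sx, sy = magnitude.shape
    bin_width = 180.0 / nbins
    # Fractional bin position of every pixel, with bin centres at
    # (k + 0.5) * bin_width.
    pos = orientation / bin_width - 0.5
    lower = np.floor(pos).astype(int)
    frac = pos - lower
    votes = np.zeros((sx, sy, nbins))
    ii, jj = np.mgrid[0:sx, 0:sy]
    # Vote (1 - frac) into the lower bin and frac into the upper bin,
    # wrapping around at 180 degrees.
    votes[ii, jj, lower % nbins] += magnitude * (1 - frac)
    votes[ii, jj, (lower + 1) % nbins] += magnitude * frac
    return votes
```

Jean's spatial trick (the four diagonal sums weighted by a rotated coefficient matrix) would then pool these per-pixel histograms into the per-cell histograms.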
Your ideas seem totally valid to me (if I understand correctly), but how about turning around the order of interpolation:

1. 2-D interpolation (x, y direction)
2. Interpolation in the 3rd dimension, which could then easily be implemented with array slicing: ``for i, j in pixel_per_cell: magnitude[i::pixels_per_cell, j::pixels_per_cell]`` and ``orientation[i::pixels_per_cell, j::pixels_per_cell]``.

This should be basically the same, but you save some memory as you do not need the (sx, sy, nbins) intermediate array.

It would be great if you could open a PR with your code, then we can discuss there :-)

Regards, Johannes
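To make the slicing idea concrete, here is a hedged sketch that uses the ``magnitude[i::pixels_per_cell, j::pixels_per_cell]`` pattern to accumulate per-cell histograms without the full (sx, sy, nbins) array; it performs only hard binning, so the bilinear/trilinear weighting would still have to be layered on top, and all names are illustrative:

```python
import numpy as np

def cell_histograms_by_slicing(magnitude, orientation,
                               pixels_per_cell=8, nbins=9):
    """Accumulate one orientation histogram per cell using strided
    slicing, one within-cell offset (i, j) at a time.

    Hard binning only; illustrative sketch, not the library code.
    """
    sx, sy = magnitude.shape
    n_cells_x = sx // pixels_per_cell
    n_cells_y = sy // pixels_per_cell
    bin_width = 180.0 / nbins
    hist = np.zeros((n_cells_x, n_cells_y, nbins))
    cx, cy = np.mgrid[0:n_cells_x, 0:n_cells_y]
    for i in range(pixels_per_cell):
        for j in range(pixels_per_cell):
            # One pixel per cell: arrays of shape (n_cells_x, n_cells_y).
            mag = magnitude[i::pixels_per_cell,
                            j::pixels_per_cell][:n_cells_x, :n_cells_y]
            ori = orientation[i::pixels_per_cell,
                              j::pixels_per_cell][:n_cells_x, :n_cells_y]
            bins = np.minimum((ori // bin_width).astype(int), nbins - 1)
            hist[cx, cy, bins] += mag
    return hist
```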
My branch is accessible here: https://github.com/JeanKossaifi/scikit-image/tree/improve_hog

I don't have my own computer, so I couldn't run the tests yet, and there must be some issues: should I still do the pull request so we can discuss there?

Also, I think the interpolation has to be done on the bins first; otherwise, when we sum the histograms in each cell, the orientations would get mixed...

Regards, Jean
It would be great if you could open a PR from your branch.
Done: https://github.com/scikit-image/scikit-image/pull/703

Regards, Jean
participants (3)

- Jean K
- Johannes Schönberger
- Stéfan van der Walt