Mailman 3 October 2011 - scikit-image

documentation marathon :-)
by Emmanuelle Gouillart 02 Nov '11

Hello,

As I was browsing through the scikit, I started to make a list of functions for which we don't have examples (neither in their docstring, nor an example script for the documentation/example gallery). I copy it here, since it seems a priority to me that we have more examples in the documentation. I was thinking of writing an example of edge detection with canny and sobel, and completing their docstrings while I'm at it.

Any other volunteer to make this list shrink :-)?

Cheers,
Emmanuelle

------------
feature: hog
filter: median_filter, canny, sobel, prewitt, wiener
graph: shortest_path, MCP
morphology:
* structuring elements in selem.py don't have examples in docstrings.
* In grey.py, Tony recently added examples in all the docstrings -- it'd be great to also have a few examples for the gallery
* label doesn't have examples
transform:
* probabilistic_hough, homography, integral_image, integrate
* radon and iradon do not have examples in docstrings (but they do have a nice example in the gallery)
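As a possible starting point for the edge-detection example Emmanuelle mentions, here is a minimal pure-Python sketch of a Sobel gradient magnitude on a small list-of-lists image. It is illustrative only: sobel_magnitude is a hypothetical helper, not the scikit's sobel, which works on NumPy arrays and may normalize and treat borders differently.

```python
import math

# Standard 3x3 Sobel kernels for the horizontal (x) and vertical (y) derivatives.
KX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
KY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel_magnitude(image):
    """Return the Sobel gradient magnitude of a 2-D list of numbers.

    Hypothetical helper: border pixels are simply left at 0.0.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            # Correlate the 3x3 neighbourhood with both kernels.
            for i in range(3):
                for j in range(3):
                    v = image[r + i - 1][c + j - 1]
                    gx += KX[i][j] * v
                    gy += KY[i][j] * v
            out[r][c] = math.hypot(gx, gy)
    return out


# A vertical step edge: dark left half, bright right half.
step = [[0, 0, 1, 1] for _ in range(4)]
edges = sobel_magnitude(step)
```

On a flat image every interior value is 0.0; for the step above, the two interior columns straddling the edge both get a magnitude of 4.0.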

Pull requests for today
by Stéfan van der Walt 27 Oct '11

Hi all,

As you can see, I've been very distracted from work this evening :)

Binary convex hull goodness:
https://github.com/scikits-image/scikits-image/pull/75

Super-fast re-building of the package for developers, using MD5 hashes of Python files:
https://github.com/scikits-image/scikits-image/pull/74

Coin dataset--almost as good as the real thing:
https://github.com/scikits-image/scikits-image/pull/76

Please have a look, and let me have your critical yet witty feedback on GitHub :)

Cheers
Stéfan
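The general idea behind an MD5-based rebuild can be sketched as: hash each Python file, compare the digests with those saved from the previous build, and treat only changed files as needing a rebuild. This is a hypothetical illustration using only the standard library (hashlib, json); the function names and the JSON cache file are assumptions, not the code in pull request 74.

```python
import hashlib
import json


def file_md5(path):
    """Return the hex MD5 digest of a file's contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(paths, cache_path):
    """Return the paths whose MD5 differs from the cached digest.

    The cache is a JSON dict {path: digest}; it is rewritten with the
    current digests, so the next call only reports newer changes.
    """
    try:
        with open(cache_path) as f:
            cache = json.load(f)
    except (OSError, ValueError):
        cache = {}  # first run or unreadable cache: every file counts as changed
    current = {p: file_md5(p) for p in paths}
    changed = [p for p in paths if cache.get(p) != current[p]]
    with open(cache_path, "w") as f:
        json.dump(current, f)
    return changed
```

Calling changed_files twice without editing anything returns an empty list the second time, so only the files touched since the last build would be recompiled.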

Re: Image data type ranges
by Andreas Mueller 26 Oct '11

On 10/26/2011 07:07 PM, Stéfan van der Walt wrote:
> Apart from floating point dtypes, all others constrain the values they
> contain naturally. So, for float, we can consider ensuring that no
> abs-values are greater than one.
>
> Stéfan

Why abs-values? Shouldn't the values be >0 anyway?

Re: Roadmap to 0.4
by Andreas Mueller 26 Oct '11

On 10/26/2011 08:22 PM, Stéfan van der Walt wrote:
> Hi Andy,
>
> Sorry for the late reply.

No problem. As you might have noticed, I haven't really had the time to do anything on scikits-image lately :(

> Usually, this will require you to write code that can only be run on
> an opencv system, but with the plugin system you can easily share that
> code with colleagues who do not have opencv installed:
>
>     use_backend('my_opencv_routines')
>     output = sobel(input)
>
> If the opencv backend is not found, the backend gracefully falls back
> to the NumPy / Cython implementation.

I do see your point. On the other hand, if someone found code that was "way better" than the one in scikits-image, I would rather hope people would integrate it. Making the backends easy to substitute might encourage bad scikits-image code... or rather, encourage not changing it.

The above obviously does not apply to OpenCL and CUDA implementations. Then again, it _might_ be a good idea to include OpenCL or CUDA functions directly in scikits-image. I, personally, would love to have a library with GPU-based vision algorithms (other than OpenCV). Whether scikits-image is a good place for this is certainly debatable.

Cheers,
Andy
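The behaviour described in the quoted message (select a backend, fall back to the default implementation when it is unavailable) could be sketched with a small registry. Everything here is hypothetical: use_backend, register, dispatch and sobel are illustrative names echoing the quoted snippet, not an existing scikits-image API, and the default sobel is a placeholder for the NumPy / Cython code.

```python
# Hypothetical registry: backend name -> {function name: implementation}.
_BACKENDS = {"default": {}}
_active = "default"


def register(backend, name):
    """Return a decorator that registers a function as `name` in `backend`."""
    def wrap(func):
        _BACKENDS.setdefault(backend, {})[name] = func
        return func
    return wrap


def use_backend(backend):
    """Select a backend; an unknown backend silently falls back to default."""
    global _active
    _active = backend if backend in _BACKENDS else "default"


def dispatch(name, *args, **kwargs):
    """Call `name` from the active backend, or from the default backend
    if the active one does not provide it."""
    impl = _BACKENDS.get(_active, {}).get(name) or _BACKENDS["default"][name]
    return impl(*args, **kwargs)


@register("default", "sobel")
def _sobel_default(image):
    return "default sobel"  # placeholder for the NumPy / Cython version


def sobel(image):
    return dispatch("sobel", image)
```

With this design, use_backend('my_opencv_routines') followed by sobel(img) still works on a machine where no such backend was registered; the fallback also applies per function, so a backend only needs to provide the routines it actually accelerates.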

Re: Image data type ranges
by Andreas Mueller 26 Oct '11

Apart from floating point dtypes, all others constrain the values they contain naturally. So, for float, we can consider ensuring that no abs-values are greater than one.

Stéfan
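A check along these lines might look like the sketch below. It assumes float images may lie in [-1, 1] (presumably the reason for speaking of abs-values: signed data can be negative), or in [0, 1] when negative values are not allowed; check_float_range and its error text are hypothetical.

```python
def check_float_range(values, allow_negative=True):
    """Raise ValueError if float pixel values fall outside the assumed range.

    Assumed (hypothetical) convention: [-1, 1] when negative values are
    allowed, [0, 1] otherwise. NaN values also fail the check.
    """
    low = -1.0 if allow_negative else 0.0
    for v in values:
        if not (low <= v <= 1.0):
            raise ValueError(
                f"float image value {v!r} outside [{low}, 1.0]; "
                "float images are assumed to be scaled to this range"
            )
```

Running the check inside the conversion functions would catch a float image holding values such as 0 ... 255 before it is silently misinterpreted.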

Re: Roadmap to 0.4
by Andreas Mueller 26 Oct '11

On 10/25/2011 02:25 AM, Stéfan van der Walt wrote:
> 2011/10/23 Stéfan van der Walt <stefan(a)sun.ac.za>:
>> - Support for multiple computation back-ends
>
> Would anyone like to comment on this issue?
>
> Stéfan

Could you give an example of how you picture this? I think I get the general idea, but I'm not sure how you would implement it. As far as I understand, you want the user to provide some other computational backend that is somehow plugged into scikits-image.

Also: do you think easy switching between the implementations is a good enough reason for the additional complexity? I often use CUDA functions in my programs, but they are usually just wrapped in some Python package. I never really saw the need to fall back on other implementations once I included them. Why is that important?

Don't get me wrong, I'm not as opposed as it might sound; I'm just trying to figure the idea out ;)

Andy

Re: Image data type ranges
by Stéfan van der Walt 26 Oct '11

On Wed, Oct 26, 2011 at 6:40 AM, Thouis Jones <thouis(a)gmail.com> wrote:
> On Wed, Oct 26, 2011 at 11:19, Andreas Mueller <amueller(a)ais.uni-bonn.de> wrote:
>> You're right. Maybe there is no way to avoid having the user create
>> arrays with unexpected types.
>> I think if we check the ranges and throw errors, the users
>> should get the idea.
>
> Where, besides file input and output, does the scikit have algorithmic
> assumptions about images having a particular data format or range?
> How many of these can be wrapped in such a way that there are no
> assumptions about input range (i.e., by prescaling min/max to [0,1]
> and postscaling back to the original range)?

The problem is that you can only scale if you know the lower and upper bounds of values; in our case, we choose to find that information through a convention: uint8 -> [0, 255]; float -> [0, 1]; etc.

Now, consider the following routine:

    def add_half(x):
        return x + 0.5

That looks innocent enough, but it has at least two subtle problems:

1) It has an entirely different effect on float and int images (both because of the relative magnitude, and because of data-types), and
2) It may take an integer image as input and return a float image. These are very differently interpreted by, for example, display routines!

After our previous rounds of discussions, we came up with the following policy:

1) Functions should take any input, as far as possible
2) Functions may return output in any format, as long as it's documented

This allows the user to build long pipelines without caring about data-types, e.g.:

    display_image = img_as_ubyte(func1(func2(input_image)))

To support this, functions have to convert the input arguments appropriately, so the correct way to write add_half would be:

    def add_half(x):
        return img_as_float(x) + 0.5

Now, the error messages in the dtype conversion functions may still be improved a lot. We may use them to guide users to the problem, e.g. telling them why there is precision loss, or why we do not convert from int32 to float.

Apart from floating point dtypes, all others constrain the values they contain naturally. So, for float, we can consider ensuring that no abs-values are greater than one.

Stéfan
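The convention-based scaling and the two versions of add_half can be illustrated without NumPy. In this sketch an "image" is a flat list plus a dtype name; DTYPE_RANGE is a hypothetical table encoding the uint8 -> [0, 255], float -> [0, 1] convention from the message, and to_float is a stand-in for img_as_float (the real function works on NumPy arrays and handles more types). Unsupported types such as int32 raise, as discussed in this thread.

```python
# Hypothetical convention table: dtype name -> (min, max) of the value range.
DTYPE_RANGE = {
    "uint8": (0, 255),
    "uint16": (0, 65535),
    "float": (0.0, 1.0),
}


def to_float(values, dtype):
    """Scale values to [0, 1] using the dtype's conventional range.

    Stand-in for img_as_float; raises for dtypes without a convention.
    """
    if dtype not in DTYPE_RANGE:
        raise ValueError(f"no conventional range for dtype {dtype!r}")
    lo, hi = DTYPE_RANGE[dtype]
    return [(v - lo) / (hi - lo) for v in values]


def add_half_naive(values):
    # Same arithmetic whatever the dtype: +0.5 barely changes uint8 data
    # (0.5 out of 255) and turns it into floats, but adds half the full
    # range to a float image.
    return [v + 0.5 for v in values]


def add_half(values, dtype):
    # Convert first, so the result always means "half the full range brighter".
    return [v + 0.5 for v in to_float(values, dtype)]
```

After conversion, a uint8 image and a float image are shifted by the same relative amount; the result can exceed 1.0, which is exactly the float range question raised at the end of the message.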

Re: Image data type ranges
by Andreas Mueller 26 Oct '11

> Where, besides file input and output, does the scikit have algorithmic
> assumptions about images having a particular data format or range?

I think most of the algorithms now start with a conversion to float, which makes assumptions.

> How many of these can be wrapped in such a way that there are no
> assumptions about input range (i.e., by prescaling min/max to [0,1]
> and postscaling back to the original range)?

Prescaling between zero and one might distort the image heavily if the image did not have the full range, for example if it is mostly white and has no black. What do you mean by postscaling? Algorithms usually don't return an image. Even if an algorithm returns an image, your method would only work correctly if the output depends linearly on the input.

Cheers,
Andy
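Both objections can be made concrete with a few numbers. In this plain-Python sketch (all helper names hypothetical), min/max prescaling stretches a mostly-white uint8 image (values 200 to 255) so that its darkest pixel becomes 0.0, while convention-based scaling keeps it bright; and for a non-linear operation such as squaring, prescale, operate, postscale gives a clearly different result from operating on convention-scaled data.

```python
def scale_by_dtype(values, hi=255):
    """Convention-based scaling: uint8 value v -> v / 255."""
    return [v / hi for v in values]


def scale_by_minmax(values):
    """Data-dependent scaling: stretch min..max to 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def gamma2(values):
    """A non-linear operation on [0, 1] data: square each value."""
    return [u * u for u in values]


# A mostly-white uint8 "image" with no black pixels.
bright = [200, 220, 255]

by_dtype = scale_by_dtype(bright)    # all values stay above 0.78: still bright
by_minmax = scale_by_minmax(bright)  # darkest pixel becomes 0.0: contrast stretched

# Operate on convention-scaled data and map back to uint8 units ...
direct = [u * 255 for u in gamma2(by_dtype)]
# ... versus prescale with min/max, operate, then postscale back.
lo, hi = min(bright), max(bright)
roundtrip = [u * (hi - lo) + lo for u in gamma2(by_minmax)]
```

Here direct starts at about 156.86 while roundtrip starts at exactly 200.0, so the wrapper changes what the algorithm computes rather than just the units it works in.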

Re: Image data type ranges
by Andreas Mueller 26 Oct '11

On 26.10.2011 12:51, Neil Yager wrote:
>> I do not think that using np.arange(n) is a reasonable/common
>> thing to do, by the way. What is the expected behavior?
>
> I've seen it used for demo/testing for quickly creating an array with
> a range of values (it is being used in a unit test). In the context of
> this discussion, it is just an example of a way a user may end up with
> an array of int32s without really thinking about it, thereby getting
> themselves into trouble.
>
> So what do you suggest on the int front?
>
> The core issue is to make sure that users know the assumed range for
> floats.

You're right. Maybe there is no way to avoid having the user create arrays with unexpected types. I think if we check the ranges and throw errors, the users should get the idea.

Cheers,
Andy

Re: Image data type ranges
by Andreas Mueller 26 Oct '11

On 26.10.2011 10:51, Neil Yager wrote:
> I was having a conversation about data types with Stéfan in the line
> comments of a PR, and I thought I should move it here so others can
> benefit from his explanations as well.
>
> Being new to the project, I didn't appreciate the intricacies of data
> typing.

After working with images for some time, this still annoys me. And I don't know of a library that has a good solution; OpenCV, for example, is a mess. I think this is a very important topic, since it influences usability a lot.

> For example, I was surprised to see that this raises a
> ValueError:
>
>>>> skimage.img_as_float(np.arange(9).reshape((3, 3)))
>
> The problem is that the default dtype of np.arange is int32, which
> isn't supported by skimage, so img_as_float doesn't know how to scale
> it to [0, 1]. Perhaps it is correct to fail, as it will force the user
> to consider the data type issue. However, it does seem like a
> reasonable/common thing to want to do.

We briefly discussed this issue on the list, and Stefan thought it would be good to make the user think about what they want to achieve. I don't find this completely satisfying, but I could not come up with a better solution.

I do not think that using np.arange(n) is a reasonable/common thing to do, by the way. What is the expected behavior? By definition, the output can be in any range. If you fix any range, either you'll go out of it for large n or you'll see nothing for small n. Maybe the most reasonable thing would be to expect that img_as_float(np.arange(n)) always returns something with minimum 0 and maximum 1. The only way to achieve that would be to determine the range of an int image by taking the max each time you use it. This, of course, would lead to unexpected behavior in other places, so I'm not sure it would actually make things better. We'd have to be careful.
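The np.arange(n) dilemma can be made concrete with a few numbers. The sketch assumes int32 data would be scaled by its full positive range (2**31 - 1), which is one hypothetical choice; the alternative divides by the image's own maximum. Plain Python lists stand in for arrays, and both helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1


def scale_fixed(values, hi=INT32_MAX):
    """Fixed-range scaling: the same integer always maps to the same float."""
    return [v / hi for v in values]


def scale_by_max(values):
    """Data-dependent scaling: the image's own maximum maps to 1.0."""
    m = max(values)
    return [v / m for v in values]


small = list(range(9))     # like np.arange(9)
larger = list(range(11))   # like np.arange(11)

# Fixed range: the brightest pixel of arange(9) is roughly 3.7e-09, i.e. black.
darkest_max = max(scale_fixed(small))
# By max: always spans [0, 1], but the same value 4 maps to different floats.
four_small = scale_by_max(small)[4]    # 0.5
four_larger = scale_by_max(larger)[4]  # 0.4
```

The fixed range makes small images invisible, while the data-dependent one makes a pixel's meaning depend on the rest of the image, which is the "unexpected behavior in other places".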
> A related, but different, issue is the following:
>
>>>> x = np.arange(9, dtype=np.uint8).reshape((3, 3))
>>>> x
> array([[0, 1, 2],
>        [3, 4, 5],
>        [6, 7, 8]], dtype=uint8)
>>>> y = skimage.img_as_ubyte(x.astype(np.float32))
> WARNING:dtype_converter:Possible precision loss, converting from
> float32 to uint8
>>>> y
> array([[  0, 255, 254],
>        [253, 252, 251],
>        [250, 249, 248]], dtype=uint8)

I think this is perfectly fine. You used "astype". That's evil!

> The problem here is that the input to img_as_ubyte violates skimage's
> assumption that floating point images have the range [0, 1], leading
> to an unexpected result (at least for a beginner). There is a warning,
> but that's for a different problem. Should img_as_ubyte, img_as_float,
> etc. check and enforce ranges? Or raise warnings? Any thoughts?

Maybe we can check whether the upper bound is satisfied. That probably wouldn't hurt much if we convert anyway. Also, we should stress in the (at the moment not really existing) user guide that users should NEVER EVER use "astype" on an image, since that violates all our assumptions.

Cheers,
Andy
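The numbers in the quoted session are consistent with each float value being multiplied by 255 and then wrapped modulo 256 when stored as uint8. This is an inference from the output, not a statement about how img_as_ubyte was implemented; the arithmetic, in plain Python:

```python
# x.astype(np.float32) holds 0.0 ... 8.0, not values in [0, 1].
values = [float(v) for v in range(9)]

# Assumed conversion: scale [0, 1] -> [0, 255], then keep the low 8 bits,
# the way an unchecked cast to uint8 wraps around.
wrapped = [int(v * 255) % 256 for v in values]

# The upper-bound check suggested in the reply would flag these inputs:
too_large = [v for v in values if v > 1.0]
```

wrapped comes out as [0, 255, 254, 253, 252, 251, 250, 249, 248], matching the array above (for example 2 * 255 = 510 and 510 - 256 = 254), and every value from 2.0 upwards would have been caught by the check.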
