On Wed, Oct 26, 2011 at 6:40 AM, Thouis Jones <thouis@gmail.com> wrote:
On Wed, Oct 26, 2011 at 11:19, Andreas Mueller <amueller@ais.uni-bonn.de> wrote:
You're right. Maybe there is no way to avoid having the user create arrays with unexpected types. I think if we check the ranges and throw errors, the users should get the idea.
Where, besides file input and output, does the scikit have algorithmic assumptions about images having a particular data format or range? How many of these can be wrapped in such a way that there are no assumptions about input range (i.e., by prescaling min/max to [0,1] and postscaling back to the original range)?
The problem is that you can only scale if you know the lower and upper bounds of values; in our case, we choose to find that information through a convention: uint8 -> [0, 255]; float -> [0, 1]; etc.

Now, consider the following routine:

    def add_half(x):
        return x + 0.5

That looks innocent enough, but it has at least two subtle problems:

1) It has an entirely different effect on float and int images (both because of the relative magnitude, and because of data-types) and
2) It may take an integer image as input and return a float image. These are very differently interpreted by, for example, display routines!

After our previous rounds of discussions, we came up with the following policy:

1) Functions should take any input, as far as possible
2) Functions may return output in any format, as long as it's documented

This allows the user to build long pipelines without caring about data-types, e.g.:

    display_image = img_as_ubyte(func1(func2(input_image)))

To support this, functions have to convert the input arguments appropriately, so the correct way to write add_half would be:

    def add_half(x):
        return img_as_float(x) + 0.5

Now, the error messages in the dtype conversion functions may still be improved a lot. We may use them to guide users to the problem, e.g. telling them why there is precision loss, or why we do not convert from int32 to float.

Apart from floating point dtypes, all others constrain the values they contain naturally. So, for float, we can consider ensuring that no abs-values are greater than one.

Stéfan
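For illustration only, here is a minimal sketch of the dtype convention and the two versions of add_half discussed above. It assumes NumPy. The helper `to_float` is a hypothetical stand-in for img_as_float, not the scikit's actual implementation. The float range check is the one floated above as something to consider; its error message is made up. Signed integers are left out to keep the sketch short.

```python
import numpy as np


def to_float(image):
    """Hypothetical stand-in for img_as_float, following the dtype convention."""
    image = np.asarray(image)
    if image.dtype.kind == "f":
        # Floats carry no natural bounds, so check them explicitly
        # (assumption: absolute values above one are rejected).
        if image.size and np.abs(image).max() > 1:
            raise ValueError("float image has values outside [-1, 1]")
        return image.astype(np.float64)
    if image.dtype.kind == "u":
        # Unsigned integers: the dtype defines the range, e.g. uint8 -> [0, 255].
        return image / np.iinfo(image.dtype).max
    # Other dtypes (signed ints, bool, ...) are outside this sketch.
    raise TypeError(f"no range convention in this sketch for dtype {image.dtype}")


def add_half_naive(x):
    # Adds 0.5 in whatever units the input happens to use.
    return x + 0.5


def add_half(x):
    # Converts to the float convention first, so 0.5 means "half the range".
    return to_float(x) + 0.5


pixels = np.array([0, 128, 255], dtype=np.uint8)
naive = add_half_naive(pixels)   # float result, still on the 0..255 scale
converted = add_half(pixels)     # float result on the 0..1 scale, shifted by 0.5
```

With the uint8 input above, the naive version returns values up to 255.5 in a float array, which a display routine following the float convention would misread; the converting version returns values between 0.5 and 1.5.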