
Hi all,

At the SciPy sprint (ages ago), Chris suggested that we clean up the image loading API. The following changes simplify `imread` by removing the `dtype` argument, which allows for simpler plugins. Because `as_grey` is such a commonly used parameter, it is left in place, but it now works differently: after the image is loaded, the (new) `rgb2grey` is simply applied to it.

Please have a look at

https://github.com/stefanv/scikits.image/compare/master...io_cleanup

Regards
Stéfan
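To make the proposed behaviour concrete, here is a rough usage sketch; the module paths and the exact equivalence are assumptions based on this description, not the final API:

```python
# Rough usage sketch of the proposed API (module paths are assumed).
from scikits.image import io, color

img = io.imread('photo.png')                 # no more dtype argument: the plugin
                                             # returns whatever it reads natively

grey = io.imread('photo.png', as_grey=True)  # as_grey is kept, but is now just
                                             # a post-load conversion, i.e. ...
grey_too = color.rgb2grey(img)               # ... equivalent to doing this yourself
```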

Did we decide on a standard for which dtypes we will support?

Some of the color conversions can be sped up in Cython with a LUT (à la scivi), but we would need to write one function for each dtype.

I am working on a webinar for image processing and am getting back into using the scikit, so there are some things I'm seeing during use that I would like to improve, but that would require knowing which dtypes to support.

Chris
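To make the LUT idea concrete, here is a rough NumPy-only sketch for the uint8 case; the Rec. 709 weights and the function name are chosen purely for illustration and are not the scikit's code:

```python
import numpy as np

# One 256-entry table per channel, precomputed once (Rec. 709 luma weights,
# used here only as an example).
_r_lut = np.arange(256) * np.float32(0.2126)
_g_lut = np.arange(256) * np.float32(0.7152)
_b_lut = np.arange(256) * np.float32(0.0722)

def rgb2grey_uint8(rgb):
    """Convert an (M, N, 3) uint8 image to grey by table lookup."""
    return (_r_lut[rgb[..., 0]] +
            _g_lut[rgb[..., 1]] +
            _b_lut[rgb[..., 2]])

# A uint16 input would need 65536-entry tables, which is why one specialised
# function per supported dtype ends up being necessary.
```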

Btw, all your changes look good.

Yes, and it would also be good if we specified clearly the correct way to treat image arrays. For example:

- How do you properly load an array in the right type? Should algorithms automatically convert an array to their required type, or should they fail if the data is not of the correct type? (Some algorithms have no reason to be run on integer- or real-valued data, for instance.)
- There should be a utility function that does the conversion with as little copying as possible, yet it should be possible to force a copy when an algorithm requires it.
- How do we pass the arrays to C code? Should we pass the array object directly, or only the memory pointer and the size, as is done now? (But then we cannot call Python code from the C code, to do an FFT for instance.)
- How do we ensure the array is contiguous with as few copies as necessary? How do we force a contiguous copy? Etc.
- And also: what is the proper way to raise an exception from C code, etc.?

If you have any ideas, they'd be welcome!
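One possible shape for such a conversion/preparation utility, sketched in plain NumPy; the name and keywords are hypothetical, not an existing function in the scikit:

```python
import numpy as np

def prepare_array(arr, dtype=None, force_copy=False):
    """Hypothetical helper: return a C-contiguous array of the requested
    dtype, copying only when needed, or always when the caller insists."""
    out = np.require(arr, dtype=dtype, requirements=['C'])
    if force_copy and out is arr:
        out = arr.copy()
    return out

# Example: guarantee contiguous float64 data before handing it to C/Cython code.
img = np.asfortranarray(np.random.rand(4, 5))      # deliberately not C-contiguous
safe = prepare_array(img, dtype=np.float64)
assert safe.flags['C_CONTIGUOUS']
```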

I think we should be aiming for as little C code as possible, preferring Cython instead, especially when you start talking about raising exceptions. By using Cython, all of those questions you have are solved.

In fact, Maël, I plan on converting most of your C code to Cython before pushing it.

Well, perhaps we should talk about this a little more, because there are many aspects:

- Of course, Cython has a cleaner feel.
- However, it is still not as efficient as C (for some of the code I published, I tried the Cython version first, then switched to C when I saw that it was too slow).
- Some older users of the library, less comfortable with Python, might still want to develop C code and bind it, if there is an easy way to do so.
- It is useful to have a simple C core, e.g. a function that takes arrays as double * plus (w, h) integers, because it makes it easy to reuse the core of an algorithm in other projects (without having to include the whole library) if someone only wants to extract one or two algorithms. This can be interesting when we develop research code and a private company wants to use the algorithm.

See inline comments below.

On Fri, Nov 5, 2010 at 2:02 PM, Maël Primet <mael.primet@gmail.com> wrote:

> However, it [Cython] is still not as efficient as C (for some of the code I published, I tried the Cython version first, then switched to C when I saw that it was too slow).

I suspect this has more to do with how you've written the Cython than with the speed of Cython vs. C. Cython is *very* fast when properly used.

> Some older users of the library, less comfortable with Python, might still want to develop C code and bind it, if there is an easy way to do so.

Since our data structure is a NumPy array, manipulating that pointer in C would take an awful lot of knowledge about NumPy internals. Raising an exception, even more so. It would be less effort for said person to just learn Cython.

> It is useful to have a simple C core, e.g. a function that takes arrays as double * plus (w, h) integers, because it makes it easy to reuse the core of an algorithm in other projects if someone only wants to extract one or two algorithms.

There is nothing stopping you from doing that in your own personal library. But seeing as we are creating a library for image processing in Python, I think we should use the best available tools for Python, thus making things most accessible and maintainable to our target audience.

I fully understand this, and I am willing to try developing Cython code, but keep in mind that the real goal is to have a widely used library rather than the most Pythonic one (i.e. the most important thing is the community of users), and having talked to several researchers, they do like C.

That's cause they haven't learned the power of the Python/Cython combo yet ;)

We can significantly reduce the number of lines of code and write much higher-level code using Cython. This reduces the maintenance overhead, an important factor with so few contributors. It was one of the first design choices we made for the scikit, and it's worked pretty well so far. We can always re-examine the situation in the future, but let's do so when there is a really compelling reason to.

Regards
Stéfan

On Fri, Nov 5, 2010 at 3:29 PM, Chris Colbert <sccolbert@gmail.com> wrote:

> Did we decide on a standard for which dtypes we will support?
IIRC, we said we'd write utility functions to convert from whatever input is received to either float64 or int16 types and go from there. The output type is whatever is most convenient for the algorithm, and should be well documented.

Can you recall the different approaches discussed at the sprint?

Cheers
Stéfan
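A rough sketch of what one of those utility functions might look like; the name and the scaling convention are assumptions for illustration, since the actual helpers had not been written at this point:

```python
import numpy as np

def to_float64(image):
    """Hypothetical converter: integer images are divided by their type's
    maximum value, floating-point images are passed through as float64."""
    image = np.asarray(image)
    if image.dtype.kind == 'f':
        return image.astype(np.float64)
    if image.dtype.kind in 'ui':    # scale integers by the dtype's maximum
        return image / np.float64(np.iinfo(image.dtype).max)
    raise TypeError("unsupported dtype: %s" % image.dtype)

# Example: a uint8 image ends up as float64 values in [0, 1].
print(to_float64(np.array([0, 128, 255], dtype=np.uint8)))
```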

I think we should definitely add uint8 support for algorithms. Float64, int16, and uint8 versions of each algorithm would be a good compromise.

Just an informational comment on this (I'm not involved enough to expect a "vote", but I can probably speak for a segment of the community): in astronomy we mainly use float32 (as well as int16). That's usually precise enough, and we work with quite large images (e.g. 100 Mb per file, times N), so the additional storage requirements of float64 would be significant.

From my viewpoint, it's good for any processing step to preserve the data type of the input unless there's a specification to the contrary. Maybe one can cast the output back to float32, though, as long as there wasn't a division by very small values (which does happen sometimes).

Cheers,
James.

I agree with the previous post. Usually, uint8 makes sense because it is the "usual image format"; uint32 allows handling label images (where there might be more than 255 labels); I don't really see a use for uint16 from what I've experienced (we might convert those to uint32, and I'm not sure the 2x memory cost would be too problematic here); and float32 is often used. I more rarely use float64, except sometimes inside an algorithm (rather than in input/output images).

For some algorithms it makes sense to have a uint8 version, for some it doesn't. I'd say we should let users make the conversions to/from the algorithm's intended format themselves, so that they know the algorithm isn't intended for their original data format and take special care to understand why (rather than using an inappropriate algorithm and not worrying about the possible effects).

But we should clearly have conversion routines, which might also ensure that arrays are contiguous (to speed up C/Cython), and possibly ensure that we copy the array so we can modify it.

The biggest reason for converging on a set of supported dtypes is for algorithms which are by and large dtype agnostic, e.g. color conversions, morphology, etc. When implementing such things in Cython, you have to write a separate function for each dtype you wish to support, then dispatch appropriately.

Of course there will be cases where an algorithm expects, or only operates on, a specific dtype, and in those cases we can copy/cast. But I still think there should be an "official" set of supported dtypes.

I think the reason float64 was chosen over float32 is that float64 is NumPy's default floating-point dtype. We could just as well use float32 in its stead if memory is a concern; I am agnostic on whether our official float type is 32- or 64-bit.
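As a rough illustration of that dispatch pattern (the function names and placeholder bodies are invented; in the scikit each entry would be a separately compiled Cython function):

```python
import numpy as np

def _dilate_uint8(image):
    return image                    # placeholder for a uint8-specialised kernel

def _dilate_int16(image):
    return image                    # placeholder for an int16-specialised kernel

def _dilate_float64(image):
    return image                    # placeholder for a float64-specialised kernel

_DISPATCH = {
    np.dtype(np.uint8):   _dilate_uint8,
    np.dtype(np.int16):   _dilate_int16,
    np.dtype(np.float64): _dilate_float64,
}

def dilate(image):
    """Pick the implementation matching the input dtype; copy/cast to
    float64 for anything outside the official set."""
    func = _DISPATCH.get(image.dtype)
    if func is None:
        image = image.astype(np.float64)
        func = _dilate_float64
    return func(image)
```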

Oh, and int16 is useful when doing derivative filtering on uint8 images, but it could just as well be int32.
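A quick NumPy check of why a wider signed type is needed when differencing uint8 data (toy values, just for illustration):

```python
import numpy as np

row = np.array([10, 200, 30], dtype=np.uint8)

# Differencing in uint8 wraps around instead of going negative:
print(np.diff(row))                      # [190  86]   (30 - 200 wraps to 86)

# Promoting to int16 first keeps the sign and the full value range:
print(np.diff(row.astype(np.int16)))     # [ 190 -170]
```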

What is the current status of templating in Cython?

It's awkward making the precision/memory trade-off on behalf of the user. But it's still better than not having algorithms :)

Cheers
Stéfan

You got me. I have no idea....

I saw the following link in one of Robert Kern's messages:

http://pythonpaste.org/tempita/

Would introducing a simple templating engine into the compile chain solve our problem?

Regards
Stéfan
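For a sense of what that could look like, here is a small sketch that uses Tempita to stamp out per-dtype function sources from one template; the template text and the generated function names are invented for illustration, and it assumes the tempita package is installed:

```python
import tempita

# One template, expanded once per (C type, name suffix) pair.
template = tempita.Template("""
{{for ctype, suffix in dtypes}}
def _threshold_{{suffix}}(np.ndarray[{{ctype}}, ndim=2] image, {{ctype}} t):
    # dtype-specialised loop would go here
    pass

{{endfor}}
""")

source = template.substitute(dtypes=[('np.uint8_t',   'uint8'),
                                     ('np.int16_t',   'int16'),
                                     ('np.float64_t', 'float64')])
print(source)   # the generated text would be written into a .pyx file
                # before running Cython on it
```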

Took a quick look at it; it looks really interesting. I will give it a try when I start rewriting the scivi code.
Participants (4):
- Chris Colbert
- James Turner
- Maël Primet
- Stéfan van der Walt