Often with these sorts of discussions I end up taking the opposite side, regardless of my actual position, to make sure we look at all of the important aspects critically. Historically I've been very positive on 3D+ support, even before Juan joined the party, but I'm going to examine this from a pragmatic angle.

As someone who works in medical imaging, I am generally very keen on having 3D implementations (and, in a perfect world, these would support anisotropic spacing along at least one dimension, because slice thickness very often differs from the in-plane spacing, and reslicing volumes to isotropic spacing is expensive and tricky in terms of the interpolation used). Time isn't a true 4th dimension, and the vast majority of data above 3D is multispectral rather than volumetric in higher dimensions. I think that for the majority of users, 3D anisotropic implementations would be sufficient; how many people come to us with actual 4D+ data? For N-D to make real sense from an algorithmic standpoint, the data needs to have N spatial dimensions, and I think that is actually quite rare above N=3, though I could be wrong (I'm interested in any examples you all are aware of that use 4D+ spatial data). The reason I bring this up is that if we hypothetically set the bar at some point, say 3D, we have far more options than for N-D in terms of clean, JIT-friendly, maintainer-friendly code.
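
To make the anisotropy point concrete: scipy.ndimage (which many of our functions wrap) already lets you thread per-axis spacing through without reslicing. A minimal sketch, with made-up shape and spacing values:

    import numpy as np
    from scipy import ndimage as ndi

    # Hypothetical CT-like volume: 5 mm slice thickness, 1 mm in-plane.
    volume = np.random.rand(40, 512, 512) > 0.5
    spacing = (5.0, 1.0, 1.0)  # (z, y, x) voxel size in mm

    # `sampling` makes the distance transform anisotropy-aware, so
    # distances come back in millimetres rather than in voxels:
    distances_mm = ndi.distance_transform_edt(volume, sampling=spacing)

    # An isotropic 2 mm Gaussian smooth, expressed as per-axis sigmas
    # in voxel units -- no reslicing of the volume required:
    sigma_vox = [2.0 / s for s in spacing]
    smoothed = ndi.gaussian_filter(volume.astype(float), sigma=sigma_vox)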

Where we set that bar matters because there is a massive difference between an algorithm written with nested loops for 2D and 3D separately and a truly N-D version of the same algorithm. This is not a quick fix; it usually requires an entirely different approach. That difference dramatically narrows the pool of potential contributors who could bring such an algorithm to us, and asking the N-D question of people will, in general, result in abandoned PRs and - like it or not - the perception that if a potential contributor can't (or doesn't think they can) manage things in N-D, they might as well not try. Do we find this acceptable?
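
To see the gap, here is a toy 3x3 box sum (purely illustrative, not from our codebase). The 2D nested-loop version reads like the textbook description; the N-D version has to give up the per-pixel loops entirely in favour of offset generation and tuple-of-slices arithmetic:

    import numpy as np
    from itertools import product

    def boxsum_2d(image):
        """3x3 neighbourhood sum, 2D only: reads exactly like the math."""
        out = np.zeros(image.shape, dtype=float)
        rows, cols = image.shape
        for r in range(rows):
            for c in range(cols):
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        if 0 <= r + dr < rows and 0 <= c + dc < cols:
                            out[r, c] += image[r + dr, c + dc]
        return out

    def boxsum_nd(image):
        """Same filter for any ndim: the pixel loops are gone, replaced
        by slice arithmetic that is much harder to review."""
        out = np.zeros(image.shape, dtype=float)
        for deltas in product((-1, 0, 1), repeat=image.ndim):
            src = tuple(slice(max(d, 0), n + min(d, 0))
                        for d, n in zip(deltas, image.shape))
            dst = tuple(slice(max(-d, 0), n + min(-d, 0))
                        for d, n in zip(deltas, image.shape))
            out[dst] += image[src]
        return out

    img = np.random.rand(6, 7)
    assert np.allclose(boxsum_2d(img), boxsum_nd(img))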

The performance and future-support implications are also nontrivial. The hypothetical nested-loop code I mention above is very easy for reviewers and future maintainers to understand, and is already in an ideal form for Numba to JIT. As Numba continues to improve, nested loops with separate branches for 2D and 3D might well beat a complex N-D implementation written in Cython (which may have no real pathway to JITting). At SciPy I heard that Pandas is considering Numba as a hard dependency; we should probably be paying close attention to that.
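
For concreteness, here is roughly what that ideal form looks like (a toy function, assuming Numba is available; `local_sum_2d` is my own illustration, not a skimage name):

    import numpy as np
    from numba import njit

    @njit(cache=True)
    def local_sum_2d(image):
        # 3x3 neighbourhood sum over interior pixels. Plain index
        # loops: exactly the shape of code Numba's nopython mode
        # compiles well, and that a reviewer can verify at a glance.
        out = np.zeros(image.shape, dtype=np.float64)
        for r in range(1, image.shape[0] - 1):
            for c in range(1, image.shape[1] - 1):
                acc = 0.0
                for dr in range(-1, 2):
                    for dc in range(-1, 2):
                        acc += image[r + dr, c + dc]
                out[r, c] = acc
        return out

    # A 3D path would be a second, equally transparent triple loop,
    # with a thin Python wrapper dispatching on image.ndim.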

An example from direct experience: I had flood fill written in Cython for 2D over three years ago (see https://gist.github.com/JDWarner/83808ce4c9374a242905). The 3D case would have been trivial to add as a separate path, but we wanted this to be N-D. I didn't see an ideal way to do that, so it went on the back burner until I saw Lars' work in #3022. Borrowing and building on that framework, years later we got N-D flood fill in #3245. Along the way there were unintuitive pitfalls to the "ravel approach" (as we're calling it in #4088) which were difficult for me to find, and likely a big challenge for reviewers to catch. This approach also requires that the image be padded by 1 and then cropped back down, as a way to know where the borders are, since we're working on the raveled array. Copying the input and then making it even larger is bad practice for large images and may cause memory issues (nested-loop implementations, in contrast, require neither copies nor padding). So yes, flood fill is now N-D and there is one code path, but it took over two additional years... and it would probably be more performant in 2D/3D as nested loops, more maintainable, and better future-proofed (Numba). As nice as it is, I do not think this algorithm as written could be extracted from Cython and handed to Numba. Would it have been better to have robust 2D and 3D flood fills years earlier?
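
For anyone who hasn't read those PRs, here is a rough sketch of the mechanics being described (my own illustration, not the actual #3245 code): the neighborhood becomes a set of constant offsets into the raveled, padded array:

    import numpy as np
    from scipy import ndimage as ndi

    def raveled_neighbor_offsets(shape, connectivity=1):
        # Flat-index steps to each neighbor in a C-ordered ravel.
        # These are only valid away from the borders, which is exactly
        # why the image must be padded by 1 (and cropped afterwards):
        # padding guarantees every original pixel has all of its
        # neighbors in-bounds in the raveled array.
        ndim = len(shape)
        footprint = ndi.generate_binary_structure(ndim, connectivity)
        footprint[(1,) * ndim] = False        # drop the center pixel
        deltas = np.argwhere(footprint) - 1   # offsets in {-1, 0, 1}^ndim
        strides = np.array([int(np.prod(shape[d + 1:]))
                            for d in range(ndim)])
        return deltas @ strides

    image = np.random.rand(4, 5, 6)
    padded = np.pad(image, 1, mode='constant')  # the copy-and-grow step
    offsets = raveled_neighbor_offsets(padded.shape)
    flat = padded.ravel()
    # flat[idx + offsets] are the face-connected neighbors of flat[idx]
    # for any idx that maps to a pixel of the original image.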

Food for thought.

Josh

On Wed, Aug 7, 2019 at 7:49 AM Mark Harfouche <mark.harfouche@gmail.com> wrote:
Thank you all for responding.

In general, I agree that the "easy ND contributions" should be addressed ASAP.

That said, it seems there is consensus that the ND conversation can be moved later in the PR cycle, once most of the other issues have been addressed: docstrings, testing for the 2D case, API choices, "fit", examples, gallery.

I think many of you rightfully pointed out that good code is often generalizable to ND quite readily. My hope is that by delaying the ND conversion, the PR will already be in a really good state, so that the conversion becomes an "easy fix".
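
A trivial sketch of what "generalizable quite readily" can look like (`centroid` here is hypothetical, not a proposed function): code written against image.ndim instead of hard-coded row/column axes is N-D from day one.

    import numpy as np

    def centroid(image):
        # One coordinate grid per axis, however many axes there are;
        # nothing here knows whether the image is 2D, 3D, or beyond.
        grids = np.indices(image.shape)
        total = image.sum()
        return tuple(float((g * image).sum() / total) for g in grids)

    centroid(np.ones((5, 5)))     # (2.0, 2.0)
    centroid(np.ones((5, 5, 5)))  # (2.0, 2.0, 2.0)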

Lars, I will have to have a detailed look at your PR.


On Wed, Aug 7, 2019 at 9:48 AM Alexandre de Siqueira <alex.desiqueira@igdore.org> wrote:
Hey everyone,

> Alexandre, something that I want to emphasise right now is that I
> would like to consider the next ~12-24 months a sprint to 1.0.
> Therefore, I want to avoid as much as possible taking on new
> technical debt and focus on fixing/cleaning everything we have so
> far, according to our (as yet not ratified) values. [1]

We're on the same page, Juan :) Thinking about that, it would be
interesting to have something in our model close to Debian's release
cycle. We'd work on and receive major contributions up to a certain
point (say, the next 6-12 months). Then we'd clean and debug everything
we can, not accepting new major code for that release; it'd be something
like Debian's freeze. But I'm digressing.

Kind regards,

Alex


On Wed, 2019-08-07 at 00:24 -0500, Juan Nunez-Iglesias wrote:
> Thanks Mark for raising this and Lars and Alex for weighing in.
>
> On Mon, 5 Aug 2019, at 5:40 AM, Alexandre de Siqueira wrote:
> > > > I think what is important is to keep the API forward-looking
> > > > for ND. We can do things like accept tuples for parameters
> > > > instead of "x_center", "y_center", or "r_center", "c_center".
> > > > We can also just throw errors when people pass in
> > > > higher-dimensional images, saying that it isn't implemented and
> > > > contributions are welcome.
> > >
> > > I think this is critical and really like this compromise. Don't
> > > do more work than is actually needed while not making future
> > > work harder. Equally critical to me is that whether a function
> > > supports ND or only 2D is clearly documented. Maybe after a
> > > parameter's type declaration in the docstring?
> >
> > I guess these are all good solutions for us right now. I'm sure
> > that we'll break, fix and clean lots of stuff/code/API until we
> > reach 1.0 (and that is okay™). These problems will be tackled
> > naturally along the way.
>
> I agree with Lars: keeping the *API* nD-compatible is already a very
> good step towards keeping the Guardian of the Dimensions happy. =P
> Great suggestion, Mark!
>
> Alexandre, something that I want to emphasise right now is that I
> would like to consider the next ~12-24 months a sprint to 1.0.
> Therefore, I want to avoid as much as possible taking on new
> technical debt and focus on fixing/cleaning everything we have so
> far, according to our (as yet not ratified) values. [1]
>
> > > > Optimizing for the different use cases is just tricky, and
> > > > won't be done correctly if contributors are asked to include
> > > > something they have no immediate need for.
> >
> > We can help in some of these cases. If we (core group) can't, we
> > can: 1. discuss whether the algorithm is ready as-is to be included
> > and upgraded from there; 2. ask for external community help.
>
> My main idea is that I am happy to jump in and make things nD at the
> PR stage. However, I don't have the bandwidth to do this for *all*
> contributions and help is always appreciated. This is why it's
> important to be on the same page.
>
> Additionally, as I raised in the SEEDS PR [2], a contribution might
> be worthwhile in nD but not worthwhile in 2D, because 2D
> implementations in Python already exist. (And in this case they have
> a good chance of being faster than ours!) We are having enough
> trouble keeping up with maintenance as it is, so the value of new
> contributions to the community should be weighed against their
> maintenance burden.
>
> It might be time to revisit the idea of skimage-contrib, a repo in
> which functions that don't quite meet this benchmark can live, until
> they are ready to graduate.
>
> > > In that vein, wasn't there a discussion about writing a guide on
> > > ND algorithms (e.g. the raveled approach used in watershed,
> > > local_maxima, flood_fill)? Was there any progress on this? I
> > > think this could be really useful to the community and a good
> > > resource for new contributors regardless of how skimage's policy
> > > on this turns out.
> >
> > +1. Or even, only for contributors aiming to enhance our 3D/nD
> > possibilities :)
>
> Yes, I mentioned that I should write a guide and here we are, no
> further along. Apologies for that. I'll prioritise this work for the
> summer/fall. If we have an easy guide, including pointers to PRs or
> commits that convert code to nD, that will make things a lot easier.
>
> Juan.
>
> .. [1] https://github.com/scikit-image/scikit-image/pull/3585
> .. [2] https://github.com/scikit-image/scikit-image/pull/1557
--
--------------------------------------------------
Dr. Alexandre de Siqueira
Berkeley Institute for Data Science - BIDS
190 Doe Library
University of California, Berkeley
Berkeley, CA 94720
United States

Lattes CV: 3936721630855880
ORCID: 0000-0003-1320-4347
Github: alexdesiqueira
--------------------------------------------------