[Numpy-discussion] dtype repr change?

Mark Wiebe mwwiebe at gmail.com
Wed Jul 27 17:59:17 EDT 2011


On Wed, Jul 27, 2011 at 4:32 PM, Matthew Brett <matthew.brett at gmail.com> wrote:

> Hi,
>
> On Wed, Jul 27, 2011 at 1:12 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> > On Wed, Jul 27, 2011 at 3:09 PM, Robert Kern <robert.kern at gmail.com> wrote:
> >>
> >> On Wed, Jul 27, 2011 at 14:47, Mark Wiebe <mwwiebe at gmail.com> wrote:
> >> > On Wed, Jul 27, 2011 at 2:44 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Wed, Jul 27, 2011 at 12:25 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> >> >> > On Wed, Jul 27, 2011 at 1:01 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> On Wed, Jul 27, 2011 at 6:54 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> >> >> >> > This was the most consistent way to deal with the parameterized
> >> >> >> > dtype in the repr, making it more future-proof at the same time.
> >> >> >> > It was producing reprs like
> >> >> >> > "array(['2011-01-01'], dtype=datetime64[D])", which is clearly
> >> >> >> > wrong, and putting quotes around it makes it work in general for
> >> >> >> > all possible dtypes, present and future.
> >> >> >>
> >> >> >> I don't know about you, but I find maintaining doctests across
> >> >> >> version changes rather tricky.  For our projects, doctests are
> >> >> >> important as part of the automated tests.  At the moment this means
> >> >> >> that many doctests will break between 1.5.1 and 2.0.  What do you
> >> >> >> think is the best way round this problem?
> >> >> >
> >> >> > I'm not sure what the best approach is. I think the primary use of
> >> >> > doctests should be to validate that the documentation matches the
> >> >> > implementation, and anything confirming aspects of a software system
> >> >> > should be regular tests. In NumPy, there are platform-dependent
> >> >> > differences in 32 vs 64 bit and big vs little endian, so the part of
> >> >> > the system that changed already couldn't be relied on consistently.
> >> >> > I prefer systems where the code output in the documentation is
> >> >> > generated as part of the documentation build process instead of
> >> >> > being included in the documentation source files.
> >> >>
> >> >> Would it be fair to summarize your reply as 'just deal with it'?
> >> >
> >> > I'm not sure what else I can do to help you, since I think this aspect
> >> > of the system should be subject to arbitrary improvement. My
> >> > recommendation is in general not to use doctests as if they were
> >> > regular tests. I'd rather not back out the improvements to repr, if
> >> > that's what you're suggesting should happen. Do you have any other
> >> > ideas?
> >>
> >> In general, I tend to agree that doctests are not always appropriate.
> >> They tend to "overtest" and express things that the tester did not
> >> intend. It's just the nature of doctests that you have to accept if
> >> you want to use them. In this case, the tester wanted to test that the
> >> contents of the array were particular values and that it was a boolean
> >> array. Instead, it tested the precise bytes of the repr of the array.
> >> The repr of ndarrays is not a stable API, and we don't make
> >> guarantees about the precise details of its behavior from version to
> >> version. doctests work better to test simpler types and methods that
> >> do not have such complicated reprs. Yes, even as part of an automated
> >> test suite for functionality, not just to ensure the compliance of
> >> documentation examples.
> >>
> >> That said, you could only quote the dtypes that require the extra
> >> [syntax] and leave the current, simpler dtypes alone. That's a
> >> pragmatic compromise to the reality of the situation, which is that
> >> people do have extensive doctest suites already around, without
> >> removing your ability to innovate with the representations of the new
> >> dtypes.
> >
> > That sounds reasonable to me, and I'm happy to review pull requests from
> > anyone who has time to do this change.
>
> Forgive me, but this seems almost ostentatiously unhelpful.
>

I was offering to help; I think you're reading between the lines too much.
The kind of response I was trying to invite is more along the lines of "I'd
like to help, but I'm not sure where to start. Can you give me some
pointers?"

> I understand you have little sympathy for the problem, but, just as a
> social courtesy, some pointers as to where to look would have been
> useful.
>

I do have sympathy for the problem; dealing with bad design decisions made
early on in software projects is pretty common. In this case what Robert
proposed is a good temporary solution, but ultimately NumPy needs the
ability to change its repr and other details like it in order to progress as
a software project.
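
As a rough illustration of the compromise Robert proposed (quote only the
dtypes that need the bracket syntax, leave the simple ones bare), here is a
minimal sketch. It is not NumPy's actual code: the names dtype_fragment and
sketch_array_repr and the "plain identifier" rule are assumptions made up
for this example.

```python
import re

# Hypothetical rule (an assumption, not NumPy's implementation): a dtype
# name that is a plain identifier, such as bool or float64, is printed
# bare; anything else, such as datetime64[D], is printed as a quoted string.
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def dtype_fragment(name):
    """Return the text to place after 'dtype=' in an array repr."""
    if _PLAIN_NAME.match(name):
        return name
    return repr(name)


def sketch_array_repr(body, dtype_name):
    """Assemble a repr-like string from an already formatted body."""
    return "array(%s, dtype=%s)" % (body, dtype_fragment(dtype_name))


print(sketch_array_repr("[True, False]", "bool"))
print(sketch_array_repr("['2011-01-01']", "datetime64[D]"))
```

Under this rule the existing doctests for simple dtypes would keep seeing
dtype=bool, while the parameterized dtype would come out as
dtype='datetime64[D]'.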

If I recall correctly, the relevant functions are written in Python and are
called array_repr and array2string, and they're in some of the files in
numpy/core. I don't remember the file names, but a grep or find in files
should track them down pretty quickly.
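
For anyone without grep to hand, the search could be sketched in Python
roughly as follows. The helper name find_definitions is made up for this
example, and the script assumes it is run from the top of a NumPy source
checkout, so that numpy/core is a relative path.

```python
import os
import re


def find_definitions(root, names):
    """Yield (path, line_number, line) for each 'def <name>(' under root."""
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"^\s*def\s+(%s)\s*\(" % alternatives)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern.match(line):
                        yield path, number, line.rstrip()


if __name__ == "__main__":
    # Assumption: the current directory is the root of a NumPy checkout.
    for path, number, line in find_definitions(
        "numpy/core", ["array_repr", "array2string"]
    ):
        print("%s:%d: %s" % (path, number, line))
```

If the directory is missing, os.walk simply yields nothing, so the script
prints no matches rather than failing.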

Cheers,
Mark



> See you,
>
> Matthew