Hello,
I am attempting to set up a numpy.distutils setup.py for a small python program that uses a Fortran module. Currently, the setup is able to compile and install the program seemingly successfully, but the f2py script erroneously maps the data types I am using to float, rather than double. I have the proper mapping set up in a .f2py_f2cmap in the source directory, but it does not seem to be copied to the build directory at compile time, and I cannot figure out how to make it get copied. Is there a simple way to do what I am trying to do? Alternatively, is there a way to specify the mapping in my setup.py scripts?
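For context, the mapping file itself is tiny; a sketch of the kind of .f2py_f2cmap content I mean (assuming a double-precision kind parameter named `dp` in the Fortran source):

```python
# A .f2py_f2cmap file contains a single Python dict literal, which
# f2py reads with eval(). It maps a Fortran type plus kind name to a
# C type. Here 'dp' is assumed to be declared in the Fortran source
# as: integer, parameter :: dp = kind(0.d0)
f2cmap_text = "{'real': {'dp': 'double'}}"

# What f2py effectively does with the file's contents:
f2cmap = eval(f2cmap_text)
```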
Here's a github repo with the project:
https://github.com/ehermes/ased3
Thanks,
Eric Hermes
Hi all,
The docstring of np.full indicates that the dtype of the result is
`np.array(fill_value).dtype`, as long as the keyword argument `dtype`
itself is not set. This is actually not the case: the current
implementation always returns a float array when `dtype` is not set, see
e.g.
In [1]: np.full(1, 1)
Out[1]: array([ 1.])
In [2]: np.full(1, None)
Out[2]: array([ nan])
In [3]: np.full(1, None).dtype
Out[3]: dtype('float64')
In [4]: np.array(None)
Out[4]: array(None, dtype=object)
The note about the dtype of the return value was explicitly discussed
in https://github.com/numpy/numpy/pull/2875, but the tests failed to cover
the case where the `dtype` argument is not passed.
We could either change the docstring to match the current behavior, or fix
the behavior to match what the docstring says (my preference). @njsmith
mentioned in https://github.com/numpy/numpy/issues/6366 that this may be
acceptable as a bug fix, as "it's a very new function so there probably
aren't many people relying on it" (it was introduced in 1.8).
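To be concrete, here is a hypothetical pure-Python stand-in for the behavior the docstring describes (the actual fix would of course go into the C implementation):

```python
import numpy as np

def full_fixed(shape, fill_value, dtype=None):
    # When dtype is not given, infer it from fill_value as the
    # docstring says, instead of always defaulting to float.
    if dtype is None:
        dtype = np.array(fill_value).dtype
    a = np.empty(shape, dtype=dtype)
    a.fill(fill_value)
    return a
```

With this, `full_fixed(1, 1)` would give an integer array and `full_fixed(1, None)` an object array, matching `np.array(fill_value).dtype`.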
I guess the options are:
- Fix the behavior outright and squeeze this in 1.10 as a bugfix (my
preference).
- Emit a warning in 1.10, fix in 1.11.
- Do nothing for 1.10, warn in 1.11, fix in 1.12 (at which point the
argument that `np.full` is a very new function starts becoming invalid...).
- Change the docstring.
Thoughts?
Antony
At last, goto for python <https://github.com/snoack/python-goto>!
Usage:
from goto import with_goto

@with_goto
def range(start, stop):
    i = start
    result = []

    label .begin
    if i == stop:
        goto .end

    result.append(i)
    i += 1
    goto .begin

    label .end
    return result
HT: LWN
Chuck
Hi all,
I'm pleased to announce the availability of Numpy 1.10.0rc1. Sources and 32
bit binary packages for Windows may be found at Sourceforge
<https://sourceforge.net/projects/numpy/files/NumPy/1.10.0rc1/?upload_just_c…>.
Please test this out, as I would like to move to the final release as
rapidly as possible and the lack of error reports from the beta has left me
nervous. It's been quiet, too quiet. In the movies, we would all die in the
next five minutes.
Cheers
Chuck
Hey Jaime, List,
Having just come back from a conference where our toolkit, Py-ART [1],
has picked up a nice following of people keen to contribute, I was
wondering if you will be opening this up via a Google Hangout or similar?
I would love to advertise this to our users. We all want more
contributors, and a big roadblock is understanding the fork and pull
request system of GitHub.
We ran a course that covered GitHub (among other things) here:
https://github.com/scollis/SusSoPrac. You are welcome to use anything
liberally!
Cheers,
Scott
On 9/23/15 4:39 PM, numpy-discussion-request(a)scipy.org wrote:
> Today's Topics:
>
> 1. "Become an Open Source Contributor" workshop
> (Jaime Fernández del Río)
> 2. Re: composition of the steering council (was Re: Governance
> model request) (Travis Oliphant)
> 3. Re: Governance model request (Stefan van der Walt)
> 4. Re: Governance model request (Matthew Brett)
> 5. Re: composition of the steering council (was Re: Governance
> model request) (Chris Barker)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 23 Sep 2015 14:06:08 -0700
> From: Jaime Fernández del Río <jaime.frio(a)gmail.com>
> To: SciPy Developers List <scipy-dev(a)scipy.org>, Discussion of
> Numerical Python <numpy-discussion(a)scipy.org>
> Subject: [Numpy-discussion] "Become an Open Source Contributor"
> workshop
> Message-ID:
> <CAPOWHWnk7mNm64_FuQkV5KCX=vxAyctBh7P1X7KbcaMgohSAOg(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Apologies for the cross-posting.
>
> The Data Science Student Society of the University of California San Diego,
> or DS3 @ UCSD as they like to call themselves, will be holding biweekly
> Python themed workshops starting this fall. On the week of October 19th,
> they will be having yours truly doing a "Become an Open Source Contributor"
> piece. It will be a shortish event, 60-90 minutes, so my idea was to cover
> the following:
>
> 1. (15 min) An introduction to the Python data science landscape.
> 2. (30 min) An overview of the GitHub workflow that most (all?) of the
> projects follow.
> 3. (30-45 min) A hands-on session, where we would make sure everyone
> gets set up in GitHub, and forks and clones their favorite project. Time
> and participant willingness permitting, I would like to take advantage of
> my commit bits, and have some of the participants submit a simple PR, e.g.
> fixing a documentation typo, to NumPy or SciPy, and hit the green button
> right there, so that they get to leave as knighted FOSS contributors.
>
> And this is what I am hoping to get from you, the community:
>
> 1. If anyone in the area would like to get involved, please contact me.
> I have recruited a couple of volunteers from PySanDiego, but could use more
> help.
> 2. I'm also hoping to get some help, especially with the introductory
> part. Given that the crowd will mostly be university students and some
> faculty, it would be great if someone who actually knew what they were
> talking about could deliver a short, 10 minute talk, on Python, data
> science, and academia. I'm sure we could arrange it to have someone join
> by video conference.
> 3. If you have organized anything similar in the past, and have material
> that I could use to, ahem, draw inspiration from, or recommendations to
> make, or whatever, I'd love to hear from you.
>
> Thanks for reading!
>
> Jaime
>
Hi Travis,
On Tue, Sep 22, 2015 at 3:08 AM, Travis Oliphant <travis(a)continuum.io> wrote:
>
>
> On Tue, Sep 22, 2015 at 4:33 AM, Nathaniel Smith <njs(a)pobox.com> wrote:
>>
>> On Tue, Sep 22, 2015 at 1:24 AM, Travis Oliphant <travis(a)continuum.io>
>> wrote:
>>>
>>> I actually do agree with your view of the steering council as being
>>> usually not really being needed. You are creating a straw-man by
>>> indicating otherwise. I don't believe a small council should do anything
>>> *except* resolve disputes that cannot be resolved without one. Like you, I
>>> would expect that would almost never happen --- but I would argue that
>>> extrapolating from Debian's experience is not actually relevant here.
>>
>>
>> To be clear, Debian was only one example -- what I'm extrapolating from is
>> every community-driven F/OSS project that I'm aware of.
>>
>> It's entirely possible my data set is incomplete -- if you have some other
>> examples that you think would be better to extrapolate from, then I'd be
>> genuinely glad to hear them. You may have noticed that I'm a bit of an
>> enthusiast on this topic :-).
>>
>
>
> Yes, you are much better at that than I am. I'm not even sure where I
> would look for this kind of data.
>
>>>
>>>
>>>
>>> So, if the steering council is not really needed then why have it at all?
>>> Let's just eliminate the concept entirely.
>>>
>>
>> In my view, the reasons for having such a council are:
>> 1) The framework is useful even if you never use it, because it means
>> people can run "what if" scenarios in their mind and make decisions on that
>> basis. In the US legal system, only a vanishingly small fraction of cases go
>> to the Supreme Court -- but the rules governing the Supreme Court have a
>> huge effect on all cases, because people can reason about what would happen
>> *if* they tried to appeal to the Supreme Court.
>
>
> O.K. That is a good point. I can see the value in that.
>
>
>>
>> 2) It provides a formal structure for interfacing with the outside world.
>> E.g., one can't do anything with money or corporate contributions without
>> having some kind of written-down and enforceable rules for making decisions
>> (even if in practice you always stick to the "everyone is equal and we
>> govern by consensus" part of the rules).
>
>
> O.K.
>
>>
>> 3) There are rare but important cases where discussions have to be had in
>> private. The main one is "personnel decisions" like inviting people to join
>> the council; another example Fernando has mentioned to me is that when they
>> need to coordinate a press release between the project and a funding body,
>> the steering council reviews the press release before it goes public.
>
>
> O.K.
>
>
>>
>> That's pretty much it, IMO.
>>
>> The framework we all worked out at the dev meeting in Austin seems to
>> handle these cases well AFAICT.
>
>
> How did we "all" work it out when not everyone was there? This is where I
> get lost. You talk about community decision making and yet any actual
> decision is always a subset of the community. I suppose you just rely on
> the "if nobody complains then it's o.k." rule? That really only works if
> the project is moving slowly.
By "all" I just meant "all of us who were there" (which was a majority
of the active maintainers + a number of other interested parties --
the list of attendees is in the meeting notes if you're curious).
In general I totally agree with your concern about only including a
subset of the community. That's why we followed up by posting to the
list a full set of notes on tentative-decisions-made, and the draft
governance document in particular, for further discussion. We've
already had multiple threads talking about it, even before this one.
And it's pretty explicit in the document itself that no non-trivial
decision can be considered final unless it's *at least* been posted on
the mailing list.
We didn't try to legislate the exact review requirements for every
decision, because it's impossible to have a set of rules that scales
from trivial typos (which just get merged, only github subscribers
even know it happened) to foundational discussions like this one. This
means that one of the things we trust contributors (esp. senior
contributors) to do is to use their knowledge of the project to make
judgement calls about how risky or controversial a given change will
be, or if there's some particular expertise that should be consulted.
(E.g. we might make sure to ping Robert Kern if there's some np.random
change being discussed; I'm hesitant to finalize the PyUFunc ABI
changes being discussed in the other thread until Ralf gets back,
because I know that among the core maintainers he's particularly
critical of the idea of breaking ABI.)
And if new information later comes to light then a decision can
always be revisited -- people may get grumpy if you try to re-open an
issue that's been considered settled for a year, but if you have a
good reason and nothing irrevocable has happened (e.g. a veto
obviously can't remove code from an existing release), then, well,
it's annoying but what can you do, let's hear your reason.
It could certainly happen that sometimes the steering council +
mailing list readers will all miss something important. But this is
unavoidable in any system -- we're obviously not going to, like,
institute a one month waiting period on every single decision or
something. Ultimately you have to trust the core maintainers to have
good judgement about which changes to accept, together with good
meta-judgement about how controversial or broad-reaching any given
change is likely to be, and then hope that the rest of the community
will also supplement as they can.
>>> But there are real questions that have to have an answer or an approach
>>> to making a decision. The answer to these questions cannot really be a
>>> vague notion of "lack of vigorous opposition by people who read the mailing
>>> list" which then gets parried about as "the community decided this." The
>>> NumPy user base is far, far larger than the number of people that read this
>>> list.
>>
>>
>> According to the dev meeting rules, no particularly "vigorous opposition"
>> is required -- anyone who notices that something bad is happening can write
>> a single email and stop an idea dead in its tracks, with only the steering
>> council able to overrule. We expect this will rarely if ever happen, because
>> the threat will be enough to keep everyone honest and listening, but about
>> the only way we could possibly be *more* democratic is if we started phoning
>> up random users at home to ask their opinion.
>
>
> O.K. so how long is the time allowed for this kind of opposition to be
> noted?
See above. For regular discussions, there are some rough guidelines
(uncontroversial bug fixes can just be merged; substantive API changes
need at least a few days review on the mailing list). This governance
discussion has been left open for a few weeks, and: "worst case, if a
change is more controversial than expected, or a crucial critique is
delayed because someone was on vacation, then it's no big deal: we
apologize for misjudging the situation, [back up, and sort things
out](http://producingoss.com/en/producingoss.html#version-control-relaxatio…."
(I think we all thought the governance discussion was done, actually.
But then you posted, and so now we're talking about it some more. No
worries; if there's an issue, we'd rather know, right?)
For formal council votes (the ones we expect will rarely if ever
happen), we do have a slightly more formal rule: that the vote
"should be held open for long enough to give all interested Council
Members a chance to respond -- at least one week." The words "at
least" are in there to emphasize that the goal is to get an honest
read of the council; e.g. it's not legitimate to play games by
scheduling a vote when you know someone is on vacation.
>>
>> This is actually explicitly designed to prevent the situation where
>> whoever talks the loudest and longest wins, and to put those with more and
>> less time available on an equal footing.
>>
>>>
>>> For better or for worse, we will always be subject to the "tyranny of who
>>> has time to contribute lately". Fundamentally, I would argue that this
>>> kind of "tyranny" should at least be tempered by additional considerations
>>> from long-time contributors who may also be acting more indirectly than is
>>> measured by a simple git log.
>>
>>
>> I guess I am missing something fundamental here. Who are these long-time
>> contributors who will sit on your council of 1/3/5 but who don't even read
>> the mailing list? How will they know when their special insight is
>> necessary?
>
>
> The long-time contributors wouldn't necessarily sit on that council. But,
> I would support the idea of an advisory council that could act if it saw
> things going the wrong direction. This is where those people would sit.
In the draft governance document, anyone who cares enough to pay
attention effectively has a seat on this advisory council. I assume
this is a superset of the people that you would nominate?
> In the case of a 1 / 3 / 5 member council -- I would not argue to be on it
> at all (but I would argue that some care should be taken to be sure it has
> people with some years of experience if they are available). I'm only
> asking to be on a steering council that is larger than 5 people, and I don't
> actually prefer that the steering council be larger than 5 people.
>
>>
>> No, absolutely not. The proposal is that these issues are decided by open
>> discussion on the mailing list. (With the possible exception of #4 -- I
>> suspect that given an extreme situation like this, then once all other
>> efforts to mitigate the situation had failed the steering council would
>> probably feel compelled to talk things over to double-check they had
>> consensus before acting, and likely some part of this would have to be done
>> in private.)
>
>
> O.K. Then, I am misunderstanding.
Oh good, I'm glad when things turn out to be misunderstandings,
because those are (relatively) easy to solve :-).
>>
>> This is pretty explicit in the document (and again, this is text and ideas
>> stolen directly from Jupyter/IPython):
>>
>> "During the everyday project activities, council members participate in
>> all discussions, code review and other project activities as peers with all
>> other Contributors and the Community. In these everyday activities, Council
>> Members do not have any special power or privilege through their membership
>> on the Council. [...] the Council may, if necessary [do pretty much
>> anything, but] the Council's primary responsibility is to facilitate the
>> ordinary community-based decision making procedure described above. If we
>> ever have to step in and formally override the community for the health of
>> the Project, then we will do so, but we will consider reaching this point to
>> indicate a failure in our leadership."
>
>
> Granting commit rights to the repo does not seem to me to be an "everyday
> project activity" --- so I suppose I was confused. I suppose that could
> happen with no serious objections on the list. It seems that the people
> who actually have the commit bit presently should be consulted more than the
> general public, though.
I see, right, slight miscommunication here -- I was thinking about the
decision of "what process do we [in general] use to decide who gets a
commit bit", not the decision "should [this person] get a commit bit".
Currently, yeah, the general process is that commit bits get given out
by a somewhat informal private discussion among active committers (or,
presumably, the "the steering council" if it does get formalized). But
if someone wants to suggest that we switch to some other process
instead, or formalize the current process, then anything like that
would be proposed and debated on the mailing list.
(One of my favorite wacky policies is that there are projects like
Rubinius which automatically grant commit bits to everyone who submits
a successful patch. I guess the theory is that worst case, they go and
merge something they shouldn't have, more senior folks see it going by
and they revert it again, no biggie! Usually the problem is just the
opposite -- there are never enough reviewers. But I have not quite had
the guts to seriously propose this for numpy ;-).)
-n
--
Nathaniel J. Smith -- http://vorpus.org
Hi all,
Here's a first draft NEP for comments.
--
Synopsis
========
Improving numpy's dtype system requires that ufunc loops start having
access to details of the specific dtype instance they are acting on:
e.g. an implementation of np.equal for strings needs access to the
dtype object in order to know what "n" to pass to strncmp. Similar
issues arise with variable length strings, missing values, categorical
data, unit support, datetime with timezone support, etc. -- this is a
major blocker for improving numpy.
Unfortunately, the current ufunc inner loop function signature makes
it very difficult to provide this information. We might be able to
wedge it in there, but it'd be ugly.
The other option would be to change the signature. What would happen
if we did this? For most common uses of the C API/ABI, we could do
this easily while maintaining backwards compatibility. But there are
also some rarely-used parts of the API/ABI that would be
prohibitively difficult to preserve.
In addition, there are other potential changes to ufuncs on the
horizon (e.g. extensions of gufuncs to allow them to be used more
generally), and the current API exposure is so massive that any such
changes will be difficult to make in a fully compatible way. This NEP
thus considers the possibility of closing down the ufunc API to a
minimal, maintainable subset of the current API.
To better understand the consequences of this potential change, I
performed an exhaustive analysis of all the code on Github, Bitbucket,
and Fedora, among others. The results make me highly confident that of
all the publicly available projects in the world, the only ones
which touch the problematic parts of the ufunc API are: Numba,
dynd-python, and `gulinalg <https://github.com/ContinuumIO/gulinalg>`_
(with the latter's exposure being trivial).
Given this, I propose that for 1.11 we:
1) go ahead and hide/disable the problematic parts of the ABI/API,
2) coordinate with the known affected projects to minimize disruption
to their users (which is made easier since they are all projects that
are almost exclusively distributed via conda, which enforces strict
NumPy ABI versioning),
3) publicize these changes widely so as to give any private code that
might be affected a chance to speak up or adapt, and
4) leave the "ABI version tag" as it is, so as not to force rebuilds
of the vast majority of projects that will be unaffected by these
changes.
This NEP defers the question of exactly what the improved API should
be, since there's no point in trying to nail down the details until
we've decided whether it's even possible to change.
Details
=======
The problem
-----------
Currently, a ufunc inner loop implementation is called via the
following function prototype::
    typedef void (*PyUFuncGenericFunction)
                 (char **args,
                  npy_intp *dimensions,
                  npy_intp *strides,
                  void *innerloopdata);
Here ``args`` is an array of pointers to 1-d buffers of input/output
data, ``dimensions`` is a pointer to the number of entries in these
buffers, ``strides`` is an array of integers giving the strides for
each input/output array, and ``innerloopdata`` is an arbitrary void*
supplied by whoever registered the ufunc loop. (For gufuncs, extra
shape and stride information about the core dimensions also gets
packed into the ends of these arrays in a somewhat complicated way.)
There are 4 key items that define a NumPy array: data, shape, strides,
dtype. Notice that this function only gets access to 3 of them. Our
goal is to fix that. For example, a better signature would be::
    typedef void (*PyUFuncGenericFunction_NEW)
                 (char **data,
                  npy_intp *shapes,
                  npy_intp *strides,
                  PyArray_Descr *dtypes, /* NEW */
                  void *innerloopdata);
(In practice I suspect we might want to make some more changes as
well, like upgrading gufunc core shape/strides to proper arguments
instead of tacking it onto the existing arrays, and adding an "escape
valve" void* reserved for future extensions. But working out such
details is outside the scope of this NEP; the above will do for
illustration.)
The goal of this NEP is to clear the ground so that we can start
supporting ufunc inner loops that take dtype arguments, and make other
enhancements to ufunc functionality going forward.
Proposal
--------
Currently, the public API/ABI for ufuncs consists of the functions::
    PyUFunc_GenericFunction
    PyUFunc_FromFuncAndData
    PyUFunc_FromFuncAndDataAndSignature
    PyUFunc_RegisterLoopForDescr
    PyUFunc_RegisterLoopForType
    PyUFunc_ReplaceLoopBySignature
    PyUFunc_SetUsesArraysAsData
together with direct access to PyUFuncObject's internal fields::
    typedef struct {
        PyObject_HEAD
        int nin, nout, nargs;
        int identity;
        PyUFuncGenericFunction *functions;
        void **data;
        int ntypes;
        int check_return;
        const char *name;
        char *types;
        const char *doc;
        void *ptr;
        PyObject *obj;
        PyObject *userloops;
        int core_enabled;
        int core_num_dim_ix;
        int *core_num_dims;
        int *core_dim_ixs;
        int *core_offsets;
        char *core_signature;
        PyUFunc_TypeResolutionFunc *type_resolver;
        PyUFunc_LegacyInnerLoopSelectionFunc *legacy_inner_loop_selector;
        PyUFunc_InnerLoopSelectionFunc *inner_loop_selector;
        PyUFunc_MaskedInnerLoopSelectionFunc *masked_inner_loop_selector;
        npy_uint32 *op_flags;
        npy_uint32 iter_flags;
    } PyUFuncObject;
Obviously almost any future changes to how ufuncs work internally will
involve touching some part of this public API/ABI.
Concretely, the proposal here is that we avoid this by disabling the
following functions (i.e., any attempt to call them should simply
raise a ``NotImplementedError``)::
    PyUFunc_ReplaceLoopBySignature
    PyUFunc_SetUsesArraysAsData
and that we reduce the publicly visible portion of PyUFuncObject down to::
    typedef struct {
        PyObject_HEAD
        int nin, nout, nargs;
    } PyUFuncObject;
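For reference, the three fields that would stay public are already exposed as attributes at the Python level, so e.g.:

```python
import numpy as np

# nin, nout, and nargs, the only fields this proposal keeps public,
# are already visible from Python as ufunc attributes.
assert np.add.nin == 2     # number of inputs
assert np.add.nout == 1    # number of outputs
assert np.add.nargs == 3   # nin + nout
```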
Data on current API/ABI usage
-----------------------------
In order to assess how much code would be affected by this proposal, I
used a combination of Github search and Searchcode.com to trawl
through the majority of all publicly available open source code.
Neither search tool provides a fine-grained enough query language to
directly tell us what we want to know, so I instead followed the
strategy of first, casting a wide net: picking a set of search terms
that are likely to catch all possibly-broken code (together with many
false positives), and second, using automated tools to sift out the
false positives and see what remained. Altogether, I reviewed 4464
search results.
The tool I wrote to do this is `available on github
<https://github.com/njsmith/codetrawl>`_, and so is `the analysis code
itself <https://github.com/njsmith/ufunc-abi-analysis>`_.
Uses of PyUFuncObject internals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are no functions in the public API which return
``PyUFuncObject*`` values directly, so any code that accesses
PyUFuncObject fields will have to mention that token in the course of
defining a variable, performing a cast, setting up a typedef, etc.
Therefore, I searched Github for all files written in C, C++,
Objective C, Python, or Cython, which mentioned either "PyUFuncObject
AND types" or "PyUFuncObject AND NOT types". (This is to work around
limitations on how many results Github search is willing to return to
a single query.) In addition, I searched for ``PyUFuncObject`` on
searchcode.com.
The full report on these searches is available here:
https://rawgit.com/njsmith/ufunc-abi-analysis/master/reports/pyufuncobject-…
The following were screened out as non-problems:
- Copies of NumPy itself (an astonishing number of people have checked
in copies of it to their own source tree)
- NumPy forks / precursors / etc. (e.g. Numeric also had a type called
PyUFuncObject, the "bohrium" project has a fork of numpy 1.6, etc.)
- Cython-generated boilerplate used to generate the "object has
changed size" warning (which we `unconditionally filter out anyway
<https://github.com/numpy/numpy/blob/master/numpy/__init__.py#L226>`_)
- Lots of calls to ``PyUFunc_RegisterLoopForType`` and friends, which
require casting the first argument to ``PyUFuncObject*``
- Misc. other unproblematic stuff (like Cython header declarations
that never get used)
There were also several cases that actually referenced PyUFuncObject
internal fields:
- The "rational" dtype from numpy-dtypes, which is used in a few
projects, accesses ``ufunc->nargs`` as a safety check, but does not
touch any other fields (`see here
<https://github.com/numpy/numpy-dtypes/blob/c0175a6b1c5aa89b4520b29487f06d0e…>`_).
- Numba: does some rather elaborate things to support the definition
of on-the-fly JITted ufuncs. These seem to be clear deficiencies in
the ufunc API (e.g., there's no good way to control the lifespan of
the array of function pointers passed to ``PyUFunc_FromFuncAndData``),
so we should work with them to provide the API they need to do this in
a maintainable way. Some of the relevant code:
https://github.com/numba/numba/tree/master/numba/npyufunc
https://github.com/numba/numba/blob/98752647a95ac6c9d480e81ca5c8afcfa3ddfd1…
- dynd-python: Contains some code that attempts to extract the inner
loops from a numpy ufunc object and wrap them into the dynd 'ufunc'
equivalent:
https://github.com/libdynd/dynd-python/blob/c06f8fc4e72257abac589faf76f10df…
- gulinalg: I'm not sure if anyone is still using this code since most
of it was merged into numpy itself, but it's not a big deal in any
case: all it contains is a `debugging function
<https://github.com/ContinuumIO/gulinalg/blob/2ef365c48427c026dab4f45dc6f8b1…>`_
that dumps some internal fields from the PyUFuncObject. If you look,
though, all calls to this function are already commented out :-).
The full report is available here:
https://rawgit.com/njsmith/ufunc-abi-analysis/master/reports/pyufuncobject-…
In the course of this analysis, it was also noted that the standard
Cython pxd files contain a wrapper for ufunc objects::
    cdef class ufunc [object PyUFuncObject]:
        ...
which means that Cython code can access internal struct fields via an
object of type ``ufunc``, and thus escape our string-based search
above. Therefore I also examined all Cython files on Github or
searchcode.com that matched the query ``ufunc``, and searched for any
lines matching any of the following regular expressions::
    cdef\W+ufunc
        catches: 'cdef ufunc fn'

    cdef\W+.*\.\W*ufunc
        catches: 'cdef np.ufunc fn'

    <.*ufunc\W*>
        catches: '(<ufunc> fn).nargs', '(< np.ufunc > fn).nargs'

    cdef.*\(.*ufunc
        catches: 'cdef doit(np.ufunc fn, ...):'
(I considered parsing the actual source and analysing it that way, but
decided I was too lazy. This could be done if anyone is worried that
the above regexes might miss things though.)
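(As a sanity check, the "catches" examples above can be verified mechanically; a small sketch using the stdlib ``re`` module:)

```python
import re

# The four screening regexes from above, each paired with the example
# lines they are meant to catch.
checks = {
    r'cdef\W+ufunc': ["cdef ufunc fn"],
    r'cdef\W+.*\.\W*ufunc': ["cdef np.ufunc fn"],
    r'<.*ufunc\W*>': ["(<ufunc> fn).nargs", "(< np.ufunc > fn).nargs"],
    r'cdef.*\(.*ufunc': ["cdef doit(np.ufunc fn, ...):"],
}
for pattern, lines in checks.items():
    for line in lines:
        assert re.search(pattern, line), (pattern, line)
```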
There were zero files that contained matches for any of the above regexes:
https://rawgit.com/njsmith/ufunc-abi-analysis/master/reports/ufunc-cython-r…
Uses of PyUFunc_ReplaceLoopBySignature
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Applying the same screening as above, the only code that was found
that used this function is also in Numba:
https://rawgit.com/njsmith/ufunc-abi-analysis/master/reports/PyUFunc_Replac…
Uses of PyUFunc_SetUsesArraysAsData
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aside from being semi-broken since 1.7 (it never got implemented for
"masked" ufunc loops, i.e. those that use where=), there appear to be
zero uses of this functionality either inside or outside NumPy:
https://rawgit.com/njsmith/ufunc-abi-analysis/master/reports/PyUFunc_SetUse…
Rationale
---------
**Rationale for preserving the remaining API functions**::
    PyUFunc_GenericFunction
    PyUFunc_FromFuncAndData
    PyUFunc_FromFuncAndDataAndSignature
    PyUFunc_RegisterLoopForDescr
    PyUFunc_RegisterLoopForType
In addition to being widely used, these functions can easily be
preserved even if we change how ufuncs work internally, because they
only ingest loop function pointers, they never return them. So they
can easily be modified to wrap whatever loop function(s) they receive
inside an adapter function that calls them at the appropriate time,
and then register that adapter function using whatever API we add in
the future.
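In Python terms, the wrapping trick amounts to the following (illustrative sketch only; the real adapters would be C functions, and the new-style signature shown is hypothetical):

```python
def make_adapter(legacy_loop):
    # Wrap a legacy-signature inner loop so it could be registered
    # under a hypothetical new-style API that also passes dtypes.
    def adapter(data, shapes, strides, dtypes, innerloopdata):
        # The legacy loop never looks at the dtypes, so the adapter
        # simply drops that argument and forwards the rest.
        legacy_loop(data, shapes, strides, innerloopdata)
    return adapter
```

``PyUFunc_FromFuncAndData`` and friends could build such an adapter for each legacy loop they receive and register only the adapter.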
**Rationale for preserving the particular fields that are preserved**:
Preserving ``nargs`` lets us avoid a little bit of breakage with the
rational dtype, and it doesn't seem like preserving the ``nin``, ``nout``,
``nargs`` fields will produce any undue burden on future changes to
ufunc internals; even if we were to introduce variadic ufuncs we could
always just stick a -1 in the appropriate fields or whatever.
**Rationale for removing PyUFunc_ReplaceLoopBySignature**: this
function *returns* the PyUFunc_GenericFunction that was replaced; if
we stop representing all loops using the legacy
PyUFunc_GenericFunction type, then this will not be possible to do in
the future.
**Rationale for removing PyUFunc_SetUsesArraysAsData**: If set as the
``innerloopdata`` on a ufunc loop, then this function acts as a
sentinel value, and causes the ``innerloopdata`` to instead be set to
a pointer to the passed-in PyArrayObjects. In principle we could
preserve this function, but it has a number of deficiencies:
- No-one appears to use it.
- It's been buggy for several releases and no-one noticed.
- AFAIK the only reason it was included in the first place is that it
provides a backdoor for ufunc loops to get access to the dtypes -- but
we are planning to fix this in a better way.
- It can't be shimmed as easily as the loop registration functions,
because we don't anticipate that the new-and-improved ufunc loop
functions will *get* access to the array objects, only to the dtypes;
so this would have to remain cluttering up the core dispatch path
indefinitely.
- We have good reason for *not* wanting to let ufunc loops get access
to the actual array objects, because one of the goals on our roadmap
is exactly to enable the use of ufuncs on non-ndarray objects. Giving
ufuncs access to dtypes alone creates a clean boundary here: it
guarantees that ufunc loops can work equally on all duck-array objects
(so long as they have a dtype), and enforces the invariant that
anything which affects the interpretation of data values should be
attached to the dtype, not to the array object.
Rejected alternatives
---------------------
**Do nothing**: there's no way we'll ever be able to touch ufuncs at
all if we don't hide those fields sooner or later. While any amount of
breakage is regrettable, the costs of cleaning things up now are less
than the costs of never improving numpy's APIs.
**Somehow sneak the dtype information in via ``void
*innerloopdata``**: This might let us preserve the signature of
PyUFunc_GenericFunction, and thus preserve
PyUFunc_ReplaceLoopBySignature. But we'd still have the problem of
leaving way too much internal state exposed, and it's not even clear
how this would work, given that we actually do want to preserve the
use of ``innerloopdata`` for actual per-loop data. (This is where the
PyUFunc_SetUsesArraysAsData hack fails.)
--
Nathaniel J. Smith -- http://vorpus.org