Proposal: Deprecate np.int, np.float, etc.?
Hi all,

So one of the things exposed in the numpy namespace are objects called np.int, np.float, np.bool, etc. These are commonly used -- in fact, just yesterday on another project I saw a senior person reviewing a pull request instruct a more junior person to use np.float instead of float or np.float64. But AFAICT everyone who is actually using them is doing so based on a very easy-to-fall-for misconception, i.e., that these objects have something to do with numpy. In fact they are just aliases for the regular builtin Python types: int, float, bool, etc. NumPy *does* have special numpy-specific types -- but these are called np.int_, np.float_, np.bool_, etc. Apparently they were set up this way back in numpy 0.something, as a backwards compatibility (!) hack: https://github.com/numpy/numpy/pull/6103#issuecomment-123801937

Now, 10+ years later, they continue to confuse people on a regular, ongoing basis, and new users are still being taught misleading "facts" about them. I suggest that we deprecate them, with no fixed schedule for actually removing them. (I have no idea if/when people will actually stop using them to the point that we can get away with removing them entirely, but in the meantime we should at least be publicizing that any code which is using them is almost certainly based on a misunderstanding.)

The technical challenge here is that historically it has simply been impossible to deprecate a global constant like this without using version-specific hacks or accepting unacceptable slowdowns on every attribute access. But Python 3.5 finally adds the necessary machinery to do this in a future-proof way, so now it can be done safely across all versions of Python that we care about, including future unreleased versions: https://github.com/njsmith/metamodule/

Hence: https://github.com/numpy/numpy/pull/6103

Thoughts?
-n P.S.: using metamodule.py also gives us the option of making np.testing lazily imported, which last time this came up was benchmarked to improve numpy's import speed by ~35% [1] -- not too bad given that most production code will never touch np.testing. But this is just a teaser postscript; I'm not proposing that we actually do this at this time :-). [1] http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063147.html -- Nathaniel J. Smith -- http://vorpus.org
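The alias problem Nathaniel describes can be demonstrated directly. A minimal sketch (it deliberately uses only the underscore-suffixed names, since np.int and np.float themselves are what is proposed for deprecation here):

```python
import numpy as np

# The genuine numpy scalar types (trailing underscore or explicit
# width) are subclasses of np.generic:
assert issubclass(np.int_, np.generic)
assert issubclass(np.float64, np.generic)
assert issubclass(np.bool_, np.generic)

# The plain Python builtins -- which np.int, np.float, np.bool merely
# aliased -- are not numpy types at all:
assert not issubclass(int, np.generic)
assert not issubclass(float, np.generic)
```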
On 07/23/2015 04:29 AM, Nathaniel Smith wrote:
I don't see the issue. They are just aliases, so how is np.float worse than just float? To me this does not seem worth the bother of deprecation. An argument could be made for deprecating the creation of dtypes from Python builtin types, as they are ambiguous (C float != Python float) and platform dependent. E.g. dtype=int is just an endless source of bugs. But this is also so invasive that the deprecation would never be completed and would just be a bother to everyone. So -1 from me.
I doubt these numbers from 2012 are still correct. When the import was last profiled, last year, there were two main offenders: add_docs and np.polynomial. Both have been fixed in 1.9. I don't recall np.testing even showing up.
Julian Taylor <jtaylor.debian@googlemail.com> wrote:
I don't see the issue. They are just aliases so how is np.float worse than just float?
I have burned my fingers on it. Since np.double is a C double I assumed np.float is a C float. It is not. np.int has the same problem by being a C long. Pure evil. Most users of NumPy probably expect the np.foobar dtype to map to the corresponding foobar C type. This is actually inconsistent and plain dangerous. It would be much better if dtype=float meant Python float, dtype=np.float meant C float, dtype=int meant Python int, and dtype=np.int meant C int. Sturla
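Sturla's point can be checked from the interpreter. A minimal sketch (sizes noted in the comments assume mainstream platforms):

```python
import numpy as np

# dtype=float always means the C double (np.float64), never the C float:
assert np.dtype(float) == np.dtype(np.float64)

# The actual C float must be requested explicitly, as np.float32 / np.single:
assert np.dtype(np.single) == np.dtype(np.float32)

# The C int is spelled np.intc -- 4 bytes on mainstream platforms:
print(np.dtype(np.intc).itemsize)
```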
On Fri, Jul 24, 2015 at 10:03 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
I must have too -- but I don't recall, because I am VERY careful about not using np.float, np.int, etc... but I do have to constantly evangelize and correct code others put in my code base. This really is very, very ugly. We get away with np.float because every OS/compiler that gets any regular use has np.float == a C double, which is always 64 bit. But, as Sturla points out, np.int being a C long is a disaster! So +inf on deprecating this, though I have no opinion about the mechanism. And sadly, it will be a long time before we can actually remove them, so the evangelizing and code reviews will need to continue for a long time... -Chris
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
Chris Barker <chris.barker@noaa.gov> wrote:
we get away with np.float, because every OS/compiler that gets any regular use has np.float == a c double, which is always 64 bit.
Not if we are passing an array of np.float to a C routine that expects float*, e.g. in OpenGL, BLAS or LAPACK. That will for sure give crazy results, just hang, or segfault. I got away with posting a PR with a "bugfix" which supposedly should fix a case of precision loss in a SciPy routine, because I thought np.float was np.float32 and not np.float64 (which it is). But it did make me feel rather stupid. Sturla
On Sun, Jul 26, 2015 at 11:19 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
well, yes, it is confusing, but at least consistent. So if you use it once correctly in your Python-C transition code, it should work the same way everywhere. As opposed to np.int, which is a Python int, which is (if I have this right):

32 bits on all (most) 32-bit platforms
64 bits on 64-bit Linux and OS X
32 bits on 64-bit Windows (also if compiled by Cygwin??)

And who knows on a Cray or ARM, or??? Ouch!!! Anyway -- we agree on this -- having the Python types in the numpy namespace is confusing and dangerous -- even if it will take forever to deprecate them! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
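The platform dependence Chris lists can be observed directly. A small sketch (historically np.dtype(int) followed the platform's C long, as described above, so its itemsize varies; np.intp is the pointer-sized integer and the safe choice for indexing):

```python
import struct
import numpy as np

# Platform-dependent: 4 or 8 bytes depending on OS and word size:
print("dtype(int) itemsize:", np.dtype(int).itemsize)

# np.intp always matches the pointer size, so it is safe for indexing:
assert np.dtype(np.intp).itemsize == struct.calcsize("P")
```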
Chris Barker <chris.barker@noaa.gov> wrote:
sizeof(long) is 8 on 64-bit Cygwin. This is to make sure it is inconsistent with MSVC and MinGW-w64, and to make sure there will always be ABI mismatches unless the header files are modified accordingly. OTOH, it is the only sane 64-bit compiler on Windows. You can actually take code written for 64-bit Linux or OS X and expect that it will work correctly. Sturla
Been using numpy in its various forms since like 2005. Burned on int vs int_ just today with boost.python / ndarray conversions, and a number of times before that. intc being C's int!? Didn't even know it existed till today. This isn't the first time, esp. with float. Bool is actually as expected for me, and I'd prefer it stay 1 byte for storage efficiency - I'll use a long if I want it machine-word wide. This really needs changing though. Scientific researchers don't catch this subtlety and expect it to be just like the C and MATLAB types they know a little about. I can't even keep it straight in all circumstances, how can I expect them to? This makes all the newcomers face the same pain and introduces more bugs into otherwise good code. +1 Change it now like ripping off a bandaid. Match C11/C++11 types and solve much pain past, present and future in exchange for a few lashings for the remainder of the year. Thankfully stdint-like types have existed for quite some time, so protocol descriptions have been correct most of the time. -Jason On Fri, Jul 24, 2015 at 8:51 AM, Julian Taylor < jtaylor.debian@googlemail.com> wrote:
On 31.07.2015 08:24, Jason Newton wrote:
A long is only machine-word wide on POSIX; on Windows it is not. This nonsense is unfortunately also in numpy. It also affects dtype=int. The correct word-size type is actually np.intp. Btw. if something needs deprecating it is np.float128 - this is the most confusing type name in all of numpy, as its precision is actually 80 bits in most cases (x86), sometimes 64 bits (ARM) and very rarely actually 128 bits (SPARC).
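Julian's point about np.float128 can be inspected with np.finfo. A quick sketch (np.longdouble is the portable spelling; the numbers printed vary by platform exactly as he describes):

```python
import numpy as np

# np.longdouble is whatever the C long double is. On x86 the "float128"
# alias reports 16 bytes of storage, but only 80 bits (18 decimal
# digits) of actual precision.
ld = np.finfo(np.longdouble)
print(np.dtype(np.longdouble).itemsize, "bytes,", ld.precision, "decimal digits")
```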
On 31/07/15 09:38, Julian Taylor wrote:
A long is only machine word wide on posix, in windows its not.
Actually it is the opposite. A pointer is 64 bit on AMD64, but the native integer and pointer offset is only 32 bit. But it does not matter, because it is int that should be machine-word sized, not long - and it is, on both platforms. Sturla
On Sun, Aug 2, 2015 at 5:13 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
All this illustrates that there is a lot of platform dependence and complexity to the "standard" C types. I suppose it's a good thing -- you can use something like "int" in C code, and presto! more precision in the future when you re-compile on a newer system. However, for any code that needs some kind of binary compatibility between systems (or is dynamic, like Python -- i.e. types are declared at run-time, not compile time), the fixed-width types are a lot safer (or at least easier to reason about). So we have two issues with numpy:

1) Confusing Python types with C types -- e.g. np.int is currently a Python integer, NOT a C int -- I think this is a little too confusing, and should be deprecated. (And np.long -- even more confusing!!!)

2) The vagaries of the standard C types: int, long, etc. (spelled np.intc, which is an int32 on my machine, anyway) [NOTE: is there a C long dtype? I can't find it at the moment...] It's probably a good idea to keep these, particularly for interfacing with C code (like my example of calling C code that uses int). Though it would be good to make sure the docstrings make it clear what they are.

However, I'd like to see a recommended practice of using sized types wherever you can: uint8, int32, float32, float64, etc. Not sure how to propagate that practice, but I'd love to see it become common. Should we add aliases for the stdint names? np.int_32_t, etc.??? Might be good to adhere to an established standard. -CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
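Chris's recommended practice -- explicit, sized dtypes -- looks like this in code (a minimal sketch):

```python
import numpy as np

# Fixed-width dtypes mean the same thing on every platform:
a = np.zeros(4, dtype=np.int32)
b = np.zeros(4, dtype=np.float64)
assert a.dtype.itemsize == 4
assert b.dtype.itemsize == 8

# The string aliases are equally unambiguous:
assert np.dtype("int32") == a.dtype
assert np.dtype("float64") == b.dtype
```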
On 08/03/2015 12:25 PM, Chris Barker wrote:
Numpy does define "the platform dependent C integer types short, long, longlong and their unsigned versions" according to the docs. size_t is the same size as intc. Even though float and double are virtually always IEEE single and double precision, maybe for consistency we should also define np.floatc, np.doublec and np.longdoublec? Allan
On Mon, Aug 3, 2015 at 11:05 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
On 03/08/15 18:25, Chris Barker wrote:
[NOTE: is there a C long dtype? I can't find it at the moment...]
There is, it is called np.int.
well, IIUC, np.int is the Python integer type, which is a C long in all the implementations of CPython that I know about -- but is that a guarantee? In the future as well? For instance, if it were up to me, I'd use an int64_t on all 64-bit platforms, rather than having that odd 32-bit-on-Windows, 64-bit-on-*nix silliness... This just illustrates the problem... So another minor proposal: add a numpy.longc type, which would be the platform C long. (And probably just an alias to something already there.) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On 03/08/15 20:51, Chris Barker wrote:
It is a Python int on Python 2. On Python 3, dtype=np.int means the dtype will be C long, because a Python int has no size limit. But np.int aliases Python int. And creating an array with dtype=int therefore does not create an array of Python int - it creates an array of C long. To actually get an array of Python int we have to write dtype=object, which is just crazy. Sturla
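Sturla's last point -- that dtype=object is the only way to get genuine Python ints into an array -- can be demonstrated (a minimal sketch):

```python
import numpy as np

# dtype=object really stores Python ints, with no size limit:
y = np.array([10**30], dtype=object)
assert y[0] == 10**30
assert isinstance(y[0], int)

# A fixed-width integer dtype cannot hold that value at all:
try:
    np.array([10**30], dtype=np.int64)
except OverflowError:
    print("too big for int64, as expected")
```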
On Mo, 2015-08-03 at 21:32 +0200, Sturla Molden wrote:
Since it seemes there may be a few half truths flying around in this thread. See http://docs.scipy.org/doc/numpy/user/basics.types.html and also note the sentence below the table (maybe the table should also note these): Additionally to intc the platform dependent C integer types short, long, longlong and their unsigned versions are defined. - Sebastian
On Tue, Aug 4, 2015 at 4:39 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Quote: "Note that, above, we use the *Python* float object as a dtype. NumPy knows that int refers to np.int_, bool meansnp.bool_, that float is np.float_ and complex is np.complex_. The other data-types do not have Python equivalents." Is there a conflict with the current thread? Josef (I'm not a C person, so most of this is outside my scope, except for watching bugfixes to make older code work for larger datasets. Use `intp`, Luke.)
On Thu, Jul 30, 2015 at 11:24 PM, Jason Newton <nevion@gmail.com> wrote:
well, C types are a %&$ nightmare as well! In fact, one of the biggest issues comes from CPython's use of a C "long" for an integer -- which is not clearly defined. If you are writing code that needs any kind of binary compatibility, cross-platform compatibility, and particularly if you want to be able to distribute pre-compiled binaries of extensions, etc., then you'd better use well-defined types. numpy has had well-defined types for ages, but it is a shame that it's so easy to use the poorly-defined ones. I can't even keep it straight in all circumstances, how can I expect them
to? This makes all the newcomers face the same pain and introduce more bugs into otherwise good code.
indeed.
Sorry -- I'm not sure what the C11 types are -- are "int", "long", etc. deprecated? If so, then yes. What about Fortran -- I've been out of that loop for ages -- does semi-modern Fortran use well-defined integer types? Is it possible to deprecate a bunch of the built-in numpy dtypes? Without annoying the heck out of everyone -- because there is a LOT of code out there that just uses np.float, np.int, etc... An argument could be made for deprecating creating dtypes from python
yeah, that is a big concern. :-( -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
Chris Barker <chris.barker@noaa.gov> wrote:
What about Fortran -- I've been out of that loop for ages -- does semi-modern Fortran use well-defined integer types?
Modern Fortran is completely sane. INTEGER without kind number (Fortran 77) is the fastest integer on the CPU. On AMD64 that is 32 bit, because it is designed to use a 64 bit pointer with a 32 bit offset. (That is also why Microsoft decided to use a 32 bit long, because it by definition is the fastest integer of at least 32 bits. One can actually claim that the C standard is violated with a 64 bit long on AMD64.) Because of this we use a 32 bit integer in the BLAS and LAPACK linked to NumPy and SciPy.

The function KIND (Fortran 90) allows us to query the kind number of a given variable, e.g. to find out the size of INTEGER and REAL. The function SELECTED_INT_KIND (Fortran 90) returns the kind number of the smallest integer with a specified range. The function SELECTED_REAL_KIND (Fortran 90) returns the kind number of the smallest float with a given range and precision. The returned kind number can be used for REAL and COMPLEX. KIND, SELECTED_INT_KIND and SELECTED_REAL_KIND all return compile-time constants, and can be used to declare other variables if the return value is stored in a variable with the PARAMETER attribute. This allows the programmer to get the REAL, COMPLEX or INTEGER the algorithm needs numerically, without thinking about how big they need to be in bits.

ISO_C_BINDING is a Fortran 2003 module which contains kind numbers corresponding to all C types, including size_t and void*, C structs, an attribute for using pass-by-value semantics, a way of controlling the C name to avoid name mangling, as well as functions for converting between C and Fortran pointers. It allows portable interop between C and Fortran (either calling C from Fortran or calling Fortran from C).

ISO_FORTRAN_ENV is a Fortran 2003 and 2008 module. In F2003 it contains kind numbers for integers with specified sizes: INT8, INT16, INT32, and INT64. In F2008 it also contains kind numbers for IEEE floating point types: REAL32, REAL64, and REAL128.
The kind numbers for floating point types can also be used to declare complex numbers. So with modern Fortran we have a completely portable and unambiguous type system. C11/C++11 is sane as well, but not quite as sane as that of modern Fortran. Sturla
So one more bit of anecdotal evidence: I just today revived some Cython code I wrote a couple years ago and haven't tested since. It wraps a C library that uses a lot of "int" typed values. Turns out I was passing in numpy arrays that I had typed as "np.int". It worked OK two years ago when I was testing only on 32 bit pythons, but today I got a bunch of failed tests on 64 bit OS-X -- a np.int is now a C long! I really thought I knew better, even a couple years ago, but I guess it's just too easy to slip up there. Yeah to Cython for keeping types straight (I got a run-time error). And Yeah to me for having at least some basic tests. But Boo to numpy for a very easy to confuse type API. -Chris Sent from my iPhone
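The fix for the trap Chris describes is to name the C type explicitly at the boundary. A minimal sketch (assuming the wrapped C API expects a plain int*):

```python
import numpy as np

# Declaring np.intc pins the array to the C "int", regardless of what
# the platform's long happens to be:
arr = np.ascontiguousarray([1, 2, 3], dtype=np.intc)
print(arr.dtype, arr.dtype.itemsize)  # 4 bytes on mainstream platforms
```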
Of course, the catch is that a C long was the same size as a C int on the platform I wrote the code on the first time. Which is part of the problem with C -- if two types happen to be the same, the compiler is perfectly happy. But it was an error in the first place; it never should have passed. But that's just me. ;-) Anyway, as far as concrete proposals go: I say we deprecate the Python types in the numpy namespace (i.e. int and float). Other than that, I'm not sure there's any problem. -Chris
-- Kind regards Nick Papior On 31 Jul 2015 17:53, "Chris Barker" <chris.barker@noaa.gov> wrote:
On Thu, Jul 30, 2015 at 11:24 PM, Jason Newton <nevion@gmail.com> wrote:
This really needs changing though. scientific researchers don't catch
this subtlety and expect it to be just like the c and matlab types they know a little about.
well, C types are a %&$ nightmare as well! In fact, one of the biggest
issues comes from CPython's use of a C "long" for an integer -- which is not clearly defined. If you are writing code that needs any kind of binary compatibility, cross-platform compatibility, and particularly if you want to be able to distribute pre-compiled binaries of extensions, etc., then you'd better use well-defined types.
numpy has had well-defined types for ages, but it is a shame that it's so
easy to use the poorly-defined ones.
I can't even keep it straight in all circumstances, how can I expect
them to? This makes all the newcomers face the same pain and introduce more bugs into otherwise good code.
indeed.
+1 Change it now like ripping off a bandaid. Match C11/C++11 types and
solve much pain past present and future in exchange for a few lashings for the remainder of the year.
Sorry -- I'm not sure what C11 types are -- is "int", "long", etc,
deprecated? If so, then yes.
What about Fortran -- I've been out of that loop for ages -- does
semi-modern Fortran use well-defined integer types? Yes, this is much like the C equivalent: integer is int, real is float; for long and double, constant castings are needed.
Is it possible to deprecate a bunch of the built-in numpy dtypes? Without
annoying the heck out of everyone -- because there is a LOT of code out there that just uses np.float, np.int, etc...
On Fri, Jul 31, 2015 at 5:19 PM, Nick Papior <nickpapior@gmail.com> wrote:
There was some truth to this, but if you, like the majority of scientific researchers, only produce code for x86 or x86_64 on Windows and Linux... as long as you aren't treating pointers as ints, everything behaves in accordance with general expectations. The standards did and still do allow for a bit of flux, but things like OpenCL [ https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/scalarDataTypes.h... ] made this really strict, so we stop writing ifdefs to deal with varying bit widths and just implement the algorithms - which is typically a researcher's top priority. I'd say I use the strongly defined types (e.g. int/float32) whenever doing protocol or communications work - it makes complete sense there. But often for computation, especially when interfacing with C extensions, it makes more sense for the developer to use types/typenames that ought to match 1:1 with C in every case. -Jason
On Jul 24, 2015 08:55, "Julian Taylor" <jtaylor.debian@googlemail.com> wrote:
Because np.float systematically confuses people in a way that plain float does not. Which is problematic given that we have a lot of users who aren't expert programmers and are easily confused.
Yeah, I don't see any way to ever make dtype=int an error, though I can see an argument for making it unconditionally int64 or intp. That's a separate discussion... but every step we can make to simplify these names makes it easier to untangle the overall knot, IMHO. (E.g. if people have different expectations about what int and np.int should mean -- as they obviously do -- then changing the meaning of both of them is harder than deprecating one and then changing the other, so this deprecation puts us in a better position even if it doesn't immediately help much.)
So -1 from me.
Do you really mean this as a true veto? While some of the thread has gotten a bit confused about how much of a change we're actually talking about, AFAICT everyone else is very much in favor of this deprecation, including testimony from multiple specific users who have gotten burned. -n
![](https://secure.gravatar.com/avatar/c0da24f75f763b6bac90b519064f30b3.jpg?s=120&d=mm&r=g)
On 07/23/2015 04:29 AM, Nathaniel Smith wrote:
I don't see the issue. They are just aliases so how is np.float worse than just float? Too me this does not seem worth the bother of deprecation. An argument could be made for deprecating creating dtypes from python builtin types as they are ambiguous (C float != python float) and platform dependent. E.g. dtype=int is just an endless source of bugs. But this is also so invasive that the deprecation would never be completed and just be a bother to everyone. So -1 from me.
I doubt these numbers from 2012 are still correct. When this was last profiled last year the import there were two main offenders, add_docs and np.polynomial. Both have been fixed in 1.9. I don't recall np.testing even showing up.
![](https://secure.gravatar.com/avatar/2a9d09b311f11f92cdc6a91b3c6519b1.jpg?s=120&d=mm&r=g)
Julian Taylor <jtaylor.debian@googlemail.com> wrote:
I don't see the issue. They are just aliases so how is np.float worse than just float?
I have burned my fingers on it. Since np.double is a C double I assumed np.float is a C float. It is not. np.int has the same problem by being a C long. Pure evil. Most users of NumPy probably expect the np.foobar dtype to map to the corresponding foobar C type. This is actually inconsistent and plain dangerous. It would be much better if dtype=float meant Python float, dtype=np.float meant C float, dtype=int meant Python int, and dtype=np.int meant C int. Sturla
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Fri, Jul 24, 2015 at 10:03 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
I must have too -- but I don't recall, because I am VERY careful about not using np.float, no.int, etc... but I do have to constantly evangelize and correct code others put in my code base. This really is very, very, ugly. we get away with np.float, because every OS/compiler that gets any regular use has np.float == a c double, which is always 64 bit. but, as Sturla points our, no.int being a C long is a disaster! So +inf on deprecating this, though I have no opinion about the mechanism. Ans sadly, it will be a long time before we can actually remove them, so the evangelizing and code reviews will need to co continue for a long time... -Chris
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/2a9d09b311f11f92cdc6a91b3c6519b1.jpg?s=120&d=mm&r=g)
Chris Barker <chris.barker@noaa.gov> wrote:
we get away with np.float, because every OS/compiler that gets any regular use has np.float == a c double, which is always 64 bit.
Not if we are passing an array of np.float to a ac routine that expects float*, e.g. in OpenGL, BLAS or LAPACK. That will for sure give crazy results, just hang, or segfault. I got away with pisting a PR with a "bugfix" which supposedly should fix a case of precision loss in a SciPy routine, because I thought np.float was np.float32 and not np.float64 (which it is). But it did make me feel rather stupid. Sturla
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Sun, Jul 26, 2015 at 11:19 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
well, yes, it is confusing, but at least consistent. So if you use it once correctly in your Python-C transition code, it should work the same way everywhere. As opposed to a np.int which is a python int, which is (if I have this right): 32 bits on all (most) 32 bit platforms 64 bits on 64 bit Linux and OS-X 32 bits on 64 bit Windows (also if compiled by cygwin??) And who knows on a Cray or ARM, or??? Ouch!!! Anyway -- we agree on this -- having the python types in the numpy namespace is confusing and dangerous -- even if it will take forever to deprecate them! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/2a9d09b311f11f92cdc6a91b3c6519b1.jpg?s=120&d=mm&r=g)
Chris Barker <chris.barker@noaa.gov> wrote:
sizeof(long) is 8 on 64-bit Cygwin. This is to make sure it is inconsistent with MSVC and MinGW-w64, and make sure there will always be ABI mismatches unless the headerfiles are modified accordingly. OTOH, it is one only sane 64-bit compiler on Windows. You can actually take code written for 64 bit Linux or OSX and expect that it will work correctly. Sturla
![](https://secure.gravatar.com/avatar/3f3ddaf8c73d11ed6ab7ea905305eaf6.jpg?s=120&d=mm&r=g)
Been using numpy in it's various forms since like 2005. burned on int, int_ just today with boost.python / ndarray conversions and a number of times before that. intc being C's int!? Didn't even know it existed till today. This isn't the first time, esp with float. Bool is actually expected for me and I'd prefer it stay 1 byte for storage efficiency - I'll use a long if I want it machine word wide. This really needs changing though. scientific researchers don't catch this subtlety and expect it to be just like the c and matlab types they know a little about. I can't even keep it straight in all circumstances, how can I expect them to? This makes all the newcomers face the same pain and introduce more bugs into otherwise good code. +1 Change it now like ripping off a bandaid. Match C11/C++11 types and solve much pain past present and future in exchange for a few lashings for the remainder of the year. Thankfully stdint like types have existed for quite some times so protocol descriptions have been correct most of the time. -Jason On Fri, Jul 24, 2015 at 8:51 AM, Julian Taylor < jtaylor.debian@googlemail.com> wrote:
![](https://secure.gravatar.com/avatar/c0da24f75f763b6bac90b519064f30b3.jpg?s=120&d=mm&r=g)
On 31.07.2015 08:24, Jason Newton wrote:
A long is only machine word wide on posix, in windows its not. This nonsense is unfortunately also in numpy. It also affects dtype=int. The correct word size type is actually np.intp. btw. if something needs deprecating it is np.float128, this is the most confusing type name in all of numpy as its precision is actually a 80 bit in most cases (x86), 64 bit sometimes (arm) and very rarely actually 128 bit (sparc).
![](https://secure.gravatar.com/avatar/2a9d09b311f11f92cdc6a91b3c6519b1.jpg?s=120&d=mm&r=g)
On 31/07/15 09:38, Julian Taylor wrote:
A long is only machine word wide on posix, in windows its not.
Actually it is the opposite. A pointer is 64 bit on AMD64, but the native integer and pointer offset is only 32 bit. But it does not matter because it is int that should be machine word sized, not long, which it is on both platforms. Sturla
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Sun, Aug 2, 2015 at 5:13 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
All this illustrates that there is a lot of platform independence and complexity to the "standard" C types. I suppose it's a good thing -- you can use something like "int" in C code, and presto! more precision in the future when you re-compile on a newer system. However, for any code that needs some kind of binary compatibility between systems (or is dynamic, like python -- i.e. types are declared at run-time, not compile time), the "fixed width types are a lot safer (or at least easier to reason about). So we have tow issue with numpy: 1) confusing python types with C types -- e.g. np.int is currently a python integer, NOT a C int -- I think this is a litte too confusing, and should be depricated. (and np.long -- even more confusing!!!) 2) The vagaries of the standard C types: int, long, etc (spelled np.intc, which is a int32 on my machine, anyway) [NOTE: is there a C long dtype? I can't find it at the moment...] It's probably a good idea to keep these, particularly for interfacing with C code (like my example of calling C code that use int). Though it would be good to make sure the docstring make it clear what they are. However, I"d like to see a recommended practice of using sized types wherevver you can: uint8 int32 float32 float54 etc.... not sure how to propagate that practice, but I'd love to see it become common. Should we add aliases for the stdint names? np.int_32_t, etc??? might be good to adhere to an established standard. -CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
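The distinction Chris is recommending is easy to demonstrate: the fixed-width dtype names mean the same thing everywhere, while the C-named aliases depend on the compiler's choices. A minimal sketch (the itemsize printed for np.intc is platform dependent, though it is 4 bytes almost everywhere):

```python
import numpy as np

# Fixed-width dtypes are unambiguous across platforms:
a = np.zeros(3, dtype=np.int32)
print(a.dtype.itemsize)             # 4 bytes, everywhere

# The C-named dtypes are platform dependent -- np.intc is whatever
# the compiler's "int" is on this machine:
print(np.dtype(np.intc).itemsize)   # usually 4, but not guaranteed
```

This is why the sized names (int32, float64, ...) are the safer habit for anything that touches binary formats or crosses platforms.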
![](https://secure.gravatar.com/avatar/71832763447894e7c7f3f64bfd19c13f.jpg?s=120&d=mm&r=g)
On 08/03/2015 12:25 PM, Chris Barker wrote:
Numpy does define "the platform dependent C integer types short, long, longlong and their unsigned versions" according to the docs. size_t is the same size as intp. Even though float and double are virtually always IEEE single and double precision, maybe for consistency we should also define np.floatc, np.doublec and np.longdoublec? Allan
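The C-named aliases the docs mention can be inspected directly, and np.single/np.double already play the role a hypothetical np.floatc/np.doublec would. A sketch (the printed integer widths are platform dependent):

```python
import numpy as np

# The platform-dependent C integer types, and their widths here:
for t in (np.short, np.intc, np.longlong):
    print(np.dtype(t).name, np.dtype(t).itemsize)

# The C float/double equivalents already exist under other names:
print(np.dtype(np.single).itemsize)   # C float, 4 bytes
print(np.dtype(np.double).itemsize)   # C double, 8 bytes
```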
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Mon, Aug 3, 2015 at 11:05 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
On 03/08/15 18:25, Chris Barker wrote:
[NOTE: is there a C long dtype? I can't find it at the moment...]
There is, it is called np.int.
Well, IIUC, np.int is the Python integer type, which is a C long in all the implementations of CPython that I know about -- but is that a guarantee? In the future as well? For instance, if it were up to me, I'd use an int64_t on all 64-bit platforms, rather than having that odd 32-bit-on-Windows, 64-bit-on-*nix silliness.... This just illustrates the problem... So another minor proposal: add a numpy.longc type, which would be the platform C long (and probably just an alias to something already there). -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/2a9d09b311f11f92cdc6a91b3c6519b1.jpg?s=120&d=mm&r=g)
On 03/08/15 20:51, Chris Barker wrote:
It is a Python int on Python 2. On Python 3, dtype=np.int means the dtype will be C long, because a Python int has no size limit. But np.int aliases Python int. And creating an array with dtype=int therefore does not create an array of Python int; it creates an array of C long. To actually get an array of Python int we have to write dtype=object, which is just crazy. Sturla
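Sturla's point can be seen directly. A sketch (the exact default integer width depends on the platform and NumPy version):

```python
import numpy as np

# dtype=int does NOT store arbitrary-precision Python ints; it maps to
# NumPy's default integer type (historically C long):
a = np.array([1, 2, 3], dtype=int)
print(a.dtype)          # a fixed-width integer, e.g. int64

# To actually hold Python int objects, you need dtype=object:
b = np.array([10**30], dtype=object)
print(type(b[0]))       # <class 'int'> -- a real Python int,
                        # far too big for any fixed-width dtype
```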
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
On Mo, 2015-08-03 at 21:32 +0200, Sturla Molden wrote:
Since it seems there may be a few half-truths flying around in this thread, see http://docs.scipy.org/doc/numpy/user/basics.types.html and also note the sentence below the table (maybe the table should also note these): "Additionally to intc the platform dependent C integer types short, long, longlong and their unsigned versions are defined." - Sebastian
![](https://secure.gravatar.com/avatar/ad13088a623822caf74e635a68a55eae.jpg?s=120&d=mm&r=g)
On Tue, Aug 4, 2015 at 4:39 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Quote: "Note that, above, we use the *Python* float object as a dtype. NumPy knows that int refers to np.int_, bool means np.bool_, that float is np.float_ and complex is np.complex_. The other data-types do not have Python equivalents." Is there a conflict with the current thread? Josef (I'm not a C person, so most of this is outside my scope, except for watching bugfixes to make older code work for larger datasets. Use `intp`, Luke.)
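The documented behavior Josef quotes is easy to check, and np.intp (the pointer-sized integer he alludes to) is indeed the safe choice for index arithmetic. A sketch:

```python
import numpy as np

# Python builtins used as dtypes are mapped to NumPy defaults:
print(np.dtype(float))              # float64
print(np.dtype(bool))               # bool
print(np.dtype(complex))            # complex128

# np.intp is pointer-sized -- the right type for indexing large arrays:
print(np.dtype(np.intp).itemsize)   # 8 on 64-bit builds, 4 on 32-bit
```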
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Thu, Jul 30, 2015 at 11:24 PM, Jason Newton <nevion@gmail.com> wrote:
well, C types are a %&$ nightmare as well! In fact, one of the biggest issues comes from CPython's use of a C "long" for an integer -- which is not clearly defined. If you are writing code that needs any kind of binary compatibility, cross-platform compatibility, and particularly if you want to be able to distribute pre-compiled binaries of extensions, etc., then you'd better use well-defined types. numpy has had well-defined types for ages, but it is a shame that it's so easy to use the poorly-defined ones. I can't even keep it straight in all circumstances, how can I expect them to? This makes all the newcomers face the same pain and introduce more bugs into otherwise good code.
indeed.
Sorry -- I'm not sure what C11 types are -- are "int", "long", etc., deprecated? If so, then yes. What about Fortran -- I've been out of that loop for ages -- does semi-modern Fortran use well-defined integer types? Is it possible to deprecate a bunch of the built-in numpy dtypes? Without annoying the heck out of everyone -- because there is a LOT of code out there that just uses np.float, np.int, etc..... An argument could be made for deprecating creating dtypes from python
yeah, that is a big concern. :-( -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/2a9d09b311f11f92cdc6a91b3c6519b1.jpg?s=120&d=mm&r=g)
Chris Barker <chris.barker@noaa.gov> wrote:
What about Fortan -- I've been out of that loop for ages -- does semi-modern Fortran use well defined integer types?
Modern Fortran is completely sane. INTEGER without a kind number (Fortran 77) is the fastest integer on the CPU. On AMD64 that is 32 bit, because it is designed to use a 64-bit pointer with a 32-bit offset. (That is also why Microsoft decided to use a 32-bit long, because long is by definition the fastest integer of at least 32 bits. One can actually claim that the C standard is violated with a 64-bit long on AMD64.) Because of this we use a 32-bit integer in the BLAS and LAPACK linked to NumPy and SciPy. The function KIND (Fortran 90) allows us to query the kind number of a given variable, e.g. to find out the size of INTEGER and REAL. The function SELECTED_INT_KIND (Fortran 90) returns the kind number of the smallest integer with a specified range. The function SELECTED_REAL_KIND (Fortran 90) returns the kind number of the smallest float with a given range and precision; the returned kind number can be used for REAL and COMPLEX. KIND, SELECTED_INT_KIND and SELECTED_REAL_KIND all return compile-time constants, and can be used to declare other variables if the return value is stored in a variable with the PARAMETER attribute. This allows the programmer to get the REAL, COMPLEX or INTEGER the algorithm needs numerically, without thinking about how big they need to be in bits. ISO_C_BINDING is a Fortran 2003 module which contains kind numbers corresponding to all C types, including size_t and void*, C structs, an attribute for using pass-by-value semantics, control of the C name to avoid name mangling, as well as functions for converting between C and Fortran pointers. It allows portable interop between C and Fortran (either calling C from Fortran or calling Fortran from C). ISO_FORTRAN_ENV is a Fortran 2003 and 2008 module. In F2003 it contains kind numbers for integers with specified size: INT8, INT16, INT32, and INT64. In F2008 it also contains kind numbers for IEEE floating point types: REAL32, REAL64, and REAL128. The kind numbers for floating point types can also be used to declare complex numbers. So with modern Fortran we have a completely portable and unambiguous type system. C11/C++11 is sane as well, but not quite as sane as modern Fortran. Sturla
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
So one more bit of anecdotal evidence: I just today revived some Cython code I wrote a couple of years ago and haven't tested since. It wraps a C library that uses a lot of "int"-typed values. Turns out I was passing in numpy arrays that I had typed as "np.int". It worked OK two years ago when I was testing only on 32-bit Pythons, but today I got a bunch of failed tests on 64-bit OS-X -- an np.int is now a C long! I really thought I knew better, even a couple of years ago, but I guess it's just too easy to slip up there. Yay to Cython for keeping types straight (I got a run-time error). And yay to me for having at least some basic tests. But boo to numpy for a very easy-to-confuse type API. -Chris Sent from my iPhone
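One way to guard against exactly this bug when handing arrays to C code that expects int is to normalize the dtype explicitly before the call. A sketch (the helper name as_c_int_array is hypothetical; np.intc is NumPy's name for the platform C int):

```python
import numpy as np

def as_c_int_array(arr):
    """Return `arr` with the platform C `int` dtype (np.intc),
    converting only if needed -- avoids the default-int/C-int mismatch."""
    arr = np.asarray(arr)
    if arr.dtype != np.intc:
        arr = arr.astype(np.intc)
    return arr

a = as_c_int_array(np.arange(5))      # default integer in, C int out
print(a.dtype == np.dtype(np.intc))   # True
```

The astype copy costs something, but it turns a silent platform-dependent size mismatch into an explicit, portable conversion.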
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Of course, the issue is that a C long was the same size as a C int on the platform I wrote the code on the first time. Which is part of the problem with C -- if two types happen to be the same, the compiler is perfectly happy. But that was an error in the first place; it never should have passed. But that's just me. ;-) Anyway, as far as concrete proposals go: I say we deprecate the Python types in the numpy namespace (i.e. int and float). Other than that, I'm not sure there's any problem. -Chris
![](https://secure.gravatar.com/avatar/abbe213fe63945165f7f3408cbc3bf48.jpg?s=120&d=mm&r=g)
-- Kind regards Nick Papior On 31 Jul 2015 17:53, "Chris Barker" <chris.barker@noaa.gov> wrote:
On Thu, Jul 30, 2015 at 11:24 PM, Jason Newton <nevion@gmail.com> wrote:
This really needs changing though. Scientific researchers don't catch
this subtlety and expect it to be just like the C and MATLAB types they know a little about.
well, C types are a %&$ nightmare as well! In fact, one of the biggest
issues comes from CPython's use of a C "long" for an integer -- which is not clearly defined. If you are writing code that needs any kind of binary compatibility, cross-platform compatibility, and particularly if you want to be able to distribute pre-compiled binaries of extensions, etc., then you'd better use well-defined types.
numpy has had well-defined types for ages, but it is a shame that it's so
easy to use the poorly-defined ones.
I can't even keep it straight in all circumstances, how can I expect
them to? This makes all the newcomers face the same pain and introduce more bugs into otherwise good code.
indeed.
+1 Change it now like ripping off a bandaid. Match C11/C++11 types and
solve much pain past present and future in exchange for a few lashings for the remainder of the year.
Sorry -- I'm not sure what C11 types are -- is "int", "long", etc,
deprecated? If so, then yes.
What about Fortran -- I've been out of that loop for ages -- does
semi-modern Fortran use well-defined integer types? Yes, this is much like the C equivalent: integer is int, real is float; for long and double, constant castings are needed.
Is it possible to deprecate a bunch of the built-in numpy dtypes? Without
annoying the heck out of everyone -- because there is a LOT of code out there that just uses np.float, np.int, etc.....
![](https://secure.gravatar.com/avatar/3f3ddaf8c73d11ed6ab7ea905305eaf6.jpg?s=120&d=mm&r=g)
On Fri, Jul 31, 2015 at 5:19 PM, Nick Papior <nickpapior@gmail.com> wrote:
There was some truth to this, but if you, like the majority of scientific researchers, only produce code for x86 or x86_64 on Windows and Linux... as long as you aren't treating pointers as ints, everything behaves in accordance with general expectations. The standards did and still do allow for a bit of flux, but things like OpenCL [ https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/scalarDataTypes.h... ] made this really strict, so we stop writing ifdefs to deal with varying bit widths and just implement the algorithms -- which is typically a researcher's top priority. I'd say I use the strongly defined types (e.g. int/float32) whenever doing protocol or communications work -- it makes complete sense there. But often for computation, especially when interfacing with C extensions, it makes more sense for the developer to use types/typenames that ought to match 1:1 with C in every case. -Jason
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Jul 24, 2015 08:55, "Julian Taylor" <jtaylor.debian@googlemail.com> wrote:
Because np.float systematically confuses people in a way that plain float does not. Which is problematic given that we have a lot of users who aren't expert programmers and are easily confused.
Yeah, I don't see any way to ever make dtype=int an error, though I can see an argument for making it unconditionally int64 or intp. That's a separate discussion... but every step we can make to simplify these names makes it easier to untangle the overall knot, IMHO. (E.g. if people have different expectations about what int and np.int should mean -- as they obviously do -- then changing the meaning of both of them is harder than deprecating one and then changing the other, so this deprecation puts us in a better position even if it doesn't immediately help much.)
So -1 from me.
Do you really mean this as a true veto? While some of the thread has gotten a bit confused about how much of a change we're actually talking about, AFAICT everyone else is very much in favor of this deprecation, including testimony from multiple specific users who have gotten burned. -n
participants (10)
- Allan Haldane
- Chris Barker
- Chris Barker - NOAA Federal
- Jason Newton
- josef.pktd@gmail.com
- Julian Taylor
- Nathaniel Smith
- Nick Papior
- Sebastian Berg
- Sturla Molden