
I have a simple array with floating-point values:

    print a
    array([ 0. , 0.2, 0.4, 0.6, 0.8])

What I need to do is change each index into another dimension with 3 indices. Hmm, kind of confusing; an example will explain better.

OK, this does the job, but once I start using large arrays, the back-and-forth conversion between arrays and Python lists costs me quite a bit. Is there a way I can reshape the array this way without the call to Python's 'zip'? (No, 'map' doesn't count either.) I've tried a lot of fooling around with NewAxis, resize, and friends, but either I haven't stumbled upon the correct combination or I'm not trying the correct tools. I'd like to keep this as quick as possible, so hopefully it can be done without anything too elaborate.

Thanks.

(Doh, sorry for the previous distutils-related post... hit the wrong mailing list.)

Your mail is in some Windows format :-)

    reshape(resize(a,(1,15)),(5,3))

should do it. Perhaps reshape makes a new copy; to circumvent this you can do it in two steps:

    b=resize(a,(1,15))
    b.shape = (5,3)

Another way is

    ones((5,3), a.typecode())*a[:,NewAxis]

HTH,
__Janko

There are probably many solutions; my preferred one is

    repeat(a[:, NewAxis], [3], 1)

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------
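For readers on modern NumPy (an assumption; this thread uses the older Numeric API), the suggestions above might translate roughly as follows. The sketch assumes the goal is a (5, 3) array in which each row holds one original value three times, since the poster's example output is not shown. The `resize`-based variant appears to tile the values cyclically rather than repeat each one, so it gives a different layout.

```python
import numpy as np

a = np.array([0.0, 0.2, 0.4, 0.6, 0.8])

# Konrad's approach: repeat each element 3 times along a new second axis.
repeated = np.repeat(a[:, np.newaxis], 3, axis=1)

# Janko's second approach: broadcast against an array of ones.
broadcast = np.ones((5, 3), dtype=a.dtype) * a[:, np.newaxis]

# Janko's first approach: resize tiles the whole array cyclically, then reshape.
tiled = np.resize(a, (1, 15)).reshape(5, 3)

print(repeated[1])  # row 1 holds 0.2 three times
print(tiled[1])     # row 1 holds 0.6, 0.8, 0.0
```

Neither of the first two forms goes through Python lists, which was the cost the original question wanted to avoid.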

I have been working on my first C extension, using NumPy arrays, and have run into a problem. I have a very mysterious set of bugs that result in a segmentation fault and core dump. Frankly it's still mysterious, but at the moment the problem seems to be that I have passed in a list of tuples, rather than a list of PyArrayObjects. I don't expect this to work, but when I put in:

    site = PyList_GetItem(spam_list, index);
    if (! PyArray_Check(spam)) result = 0;

I shouldn't get a crash!! (And it does crash at this line, as near as I can tell.) Isn't that exactly what PyArray_Check is for??

Note: with

    spam = PyList_GetItem(spam_list, index);

I get a "warning: assignment from incompatible pointer type", which goes away if I typecast it:

    spam = (PyArrayObject *) PyList_GetItem(spam_list, index);

Should I be doing this, and should PyArray_Check(spam) work either way?

I'm using Redhat Linux 6.1, Python 1.5.2, and python-numpy-1.11-2.i386.rpm.

Also: is there a compelling reason to upgrade either Python or NumPy, from the NumPy perspective? How are NumPy and Python 2.0 working together?

Thanks,
-Chris

-- 
Christopher Barker, Ph.D.
cbarker@jps.net
http://www.jps.net/cbarker
Water Resources Engineering
Coastal and Fluvial Hydrodynamics

Not "site"?
> I shouldn't get a crash!! (and it does crash at this line, as near as I can tell)

If spam is NULL, then the check will crash. If it is anything but a pointer to a Python object, there's a good chance that it might crash. PyArray_Check just tells you if a given Python object is of type "array". A look at the value of "spam" with a debugger should tell you what's going on.

Regarding the cast: it works either way, but I am not sure that you are supposed to rely on that. I prefer to use a cast to array objects only *after* verifying that I have an array object, if only for clarity.

Konrad.

Konrad Hinsen wrote:
Oops! I thought I had Pythonized everything by putting spam everywhere.

Fair enough. In this case, I was getting spam from a Python list. Is it possible for it to be anything but a pointer to a Python object?

> A look at the value of "spam" with a debugger should tell you what's going on.

I have not yet figured out how to use a debugger with a Python extension. Does anyone know of a nifty tutorial on how to do this (with gdb, preferably)? Note that I'm a newbie to C too, so I need a really basic run-through.

On casting only after the check: that sounds like sound advice. I do feel like I'm doing something wrong if I get a warning from the compiler, however.

Thanks for your help.
-Chris

I used gdb with gcc under Linux, and I just use it as I would on any other program:

    gdb /usr/bin/python
    run spam.py

and wait for the crash. Of course your module should have been compiled with -g if you want symbolic debugging.

The only difficulty with debugging extension modules built as dynamic libraries is setting breakpoints. You can't do it right away because the module isn't loaded yet, so its symbols are unknown to gdb. My solution is to start my script with the import, followed immediately by "time.sleep(5)". That gives me five seconds to press Control-C to reenter the debugger, and from then on everything works as advertised. If someone knows a more elegant solution, please let me know!

Konrad.
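The sleep trick described above could be wrapped in a small helper at the top of the test script. This is only a sketch: the helper name `pause_for_debugger` is made up for illustration, and `spam` stands for whatever extension module is being debugged.

```python
import os
import sys
import time


def pause_for_debugger(seconds=5.0, stream=sys.stderr):
    """Announce the process id, then sleep so the user can press Control-C in gdb."""
    pid = os.getpid()
    stream.write("pid %d: sleeping %.1f s, interrupt now to set breakpoints\n"
                 % (pid, seconds))
    stream.flush()
    time.sleep(seconds)
    return pid


# In the real script the extension would be imported first, so that its
# symbols are already loaded when the debugger regains control:
#
#     import spam              # hypothetical extension module
#     pause_for_debugger(5.0)
```

The import has to come before the pause; otherwise gdb still does not know the module's symbols when it is interrupted.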

Hi all,

I started a discussion on c.l.p recently (http://mail.python.org/pipermail/python-list/2001-September/063502.html), and it brought up an interesting idea.

In general, I'd like to see Python become more array-oriented, which PEP 209 and friends will help with. I want this because it provides a natural and efficient way of expressing your program, and because there are large opportunities for performance enhancements. While I make a lot of use of NumPy arrays at the moment, and get substantial performance benefits when I can use the overloaded operators, ufuncs, etc., Python itself still doesn't know that a NumPy array is a homogeneous sequence (or might be), because it has no concept of such a thing. If the Python interpreter knew that it was homogeneous, there could be a lot of opportunities for performance enhancements.

In the above-mentioned thread, I suggested the addition of a "homogeneous" flag to sequences. I haven't gotten an enthusiastic response, but Skip Montanaro suggested:

"""
One approach might be to propose that the array object be "brought into the fold" as a builtin object and modifications be made to the virtual machine so that homogeneity can be assumed when operating on arrays.
"""

PEP 209 does propose that an array object be "brought into the fold" (or does it? Would it be a builtin? If not, at least being part of the standard library would be a help), but it makes no mention of any modifications to the virtual machine that would allow optimisations outside of Numeric2-defined functions. Is this something worth adding?

I understand that it is one thing for the homogeneous sequence (or array) to be acknowledged in the VM, and quite another for actual optimizations to be written, but we need the former before we can get the latter.

What do you all think??

-Chris

-- 
Christopher Barker, Ph.D.
ChrisHBarker@home.net
http://members.home.net/barkerlohmann
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics

Such as? Given the overall dynamic nature of Python, I don't see any real opportunities outside array-specific code. What optimizations could be done knowing *only* that all elements of a sequence are of the same type, but without a particular data layout?

Konrad.

Konrad Hinsen wrote:
I remember Guido's answer to a FAQ a while ago (the current FAQ has a terser version) that essentially stated that compiled Python wouldn't be much faster because of Python's dynamic nature, so that the expression x = y*z needs to be evaluated to see what type y is, what type z is, and what it means to multiply them, before the actual multiplication can take place. All that checking takes a whole lot longer than the multiplication itself.

NumPy, of course, helps this along, because once it is determined that y and z are both NumPy arrays of the same shape and type, it can multiply all the elements in a C loop, without any further type checking.

Where this breaks down, of course, is when you can't express what you want to do as a set of array functions. Once you learn the tricks these times are fairly rare (which is why NumPy is useful), but there are times when you can't use array math, and need to loop through the elements of an array and operate on them. Python currently has to type check each of those elements every time something is done with them. In principle, if the Python VM knew that A was an array of Floats, it would know that A[i,j,k] was a Float, and it would not have to do a check.

I think it would be easiest to get optimization in sequence-oriented operations, such as map() and list comprehensions:

    map(fun, A)

When this is called, the function bound to the name "fun" is passed to map, as is the array bound to the name "A". The array is known to be homogeneous. map could conceivably compile a version of fun that worked on the type of the items in A, and then apply that function to all the elements in A, without type checking, and looping at the speed of C. This is the kind of optimization I am imagining. Kind of an instant ufunc. Something similar could be done with list comprehensions.

Of course, most things that can be done with list comprehensions and map() can be done with array operators anyway, so the real advantage would be a smarter compiler that could do that kind of thing inside a bunch of nested for loops. There is at least one project heading in that direction:

* Psyco (Armin Rigo) - a run-time specializing virtual machine that sees what sorts of types are input to a function and compiles a type- or value-specific version of that function on the fly. I believe Armin is looking at some JIT code generators in addition to or instead of another virtual machine.

Knowing that all the elements of an array (or any other sequence) are the same type could help here a lot. Once a particular function was compiled with a given set of types, it could be called directly on all the elements of that array (and other arrays) with no type checking.

What it comes down to is that while Python's dynamic nature is wonderful, and very powerful and flexible, there are many, many times that it is not really needed, particularly inside a given small function. The standard answer about Python optimization is that you just need to write those few small functions in C. This is only really practical if they are functions that operate on particular expected inputs: essentially statically typed input (or at least not totally general). Even then, it is a substantial effort, even for those with extensive C experience (unlike me). It would be so much better if a Py2C or a JIT compiler could optimize those functions for you.

I know this is a major programming effort, and will, at best, be years away, but I'd like to see Python move in a direction that makes it easier to do, and allows small steps to be taken one at a time. I think introducing the concept of a homogeneous sequence could help a lot of these efforts. Numeric arrays would be just a single example. Python already has strings, and tuples could be marked as homogeneous when created, if they were. So could lists, but since they are mutable, their homogeneity could change at any time, so it might not be useful.

I may be wrong about all this. I really don't know a whit about writing compilers, interpreters, or VMs, but I'm throwing the idea around to see what I can learn, and to see if it makes any sense.

-Chris
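To make the contrast in this message concrete, here is a small sketch in modern NumPy (an assumption; the post predates it). It compares an element-by-element Python loop with a whole-array expression, and shows `np.frompyfunc`, which wraps a Python function as a ufunc but still calls it once per element. The function `scale_and_shift` is a made-up example.

```python
import numpy as np


def scale_and_shift(x):
    """A small scalar function of the kind one might pass to map()."""
    return 2.0 * x + 1.0


A = np.arange(6, dtype=np.float64).reshape(2, 3)

# Element-by-element loop: the interpreter dispatches on type for every item.
looped = np.array([[scale_and_shift(x) for x in row] for row in A])

# Whole-array expression: the types are resolved once, then a compiled loop runs.
vectorized = 2.0 * A + 1.0

# np.frompyfunc gives ufunc-style broadcasting, but it still calls the Python
# function per element, so it is not the compiled "instant ufunc" imagined above.
as_ufunc = np.frompyfunc(scale_and_shift, 1, 1)
wrapped = as_ufunc(A).astype(np.float64)
```

All three produce the same values; only the second avoids per-element dispatch, which is the gap the proposal is trying to close for code that cannot be written as array expressions.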

Right, but that would only work for types that are specially treated by the interpreter. Just knowing that all elements are of the same type is not enough. In fact, the VM does not do any check, it just looks up type-specific pointers and calls the relevant functions. The other question is how much effort the Python developers would be willing to spend on this, it looks like a very big job to me, in fact a reimplementation of the interpreter.
Yes, that could be done, provided there is also a means for compiling type-specific versions of a function.
I'd say that adding this feature is much less work than doing even the slightest bit of optimization. I know I am sounding pessimistic here, but, well, I am...

Konrad.

I have a large array of type "1" (single bytes). I need to convert it to Int32, in the manner that fromstring() would. Right now, I am doing:

    Array = fromstring(Array.tostring(),Int32)

This works fine, but what concerns me is that I need to do this on potentially HUGE arrays, and if I understand this right, I am going to create a copy with tostring, and then another copy with fromstring, which then gets bound to Array, at which point the original gets de-referenced and should be deleted, and the temporary one gets deleted at some point in this process. I don't know when stuff created in the middle of a statement gets deleted, so I could potentially have three copies of the data around at the same time, and at least two. Since it is exactly the same C array, I'd like to be able to do this without making any copies at all. Is it possible? It seems like it should be a simple matter of changing the typecode and shape, but is this possible?

While I'm asking questions: can I byteswap in place as well?

The greater problem:

To give a little background, and to see if anyone has a better idea of how to do what I am doing, I thought I'd run through the task that I really need to do. I am reading a binary file full of a lot of data. I have some control over the form of the file, but it needs to be compact, so I can't just make everything the same large type. The file is essentially a whole bunch of records, each of which is a collection of a couple of different types, and which I would eventually like to get into a couple of NumPy arrays. My first cut at the problem was to read each record one at a time in a loop, and use the struct module to convert everything. This worked fine, but was pretty darn slow, so I am now doing it all with NumPy like this (one example, I have more complex ones):

    num_bytes = 9  # number of bytes in a record: two longs and a char

    # read all the data into a single byte array
    data = fromstring(file.read(num_bytes*num_timesteps*num_LEs),'1')

    # rearrange into 3-d array
    data.shape = (num_timesteps,num_LEs,num_bytes)

    # extract LE data:
    LEs = data[:,:,:8]

    # extract flag data
    flags = data[:,:,8]

    # convert LE data to longs
    LEs = fromstring(LEs.tostring(),Int32)

    if endian == 'L':  # byteswap if required
        LEs = LEs.byteswapped()

    # convert to 3-d array
    LEs.shape = (num_timesteps,num_LEs,2)

Anyone have any better ideas on how to do this?

Thanks,
-Chris
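In modern NumPy (an assumption; the code above uses Numeric), a record layout like this can be described once with a structured dtype, which avoids the tostring()/fromstring() round trip. The sketch below is illustrative only: the field names `le` and `flag` and the big-endian byte order are assumptions, and a small in-memory sample stands in for `file.read(...)`.

```python
import numpy as np

# Assumed layout: two big-endian 4-byte integers followed by one flag byte.
record = np.dtype([("le", ">i4", (2,)), ("flag", "u1")])

num_timesteps, num_LEs = 2, 3

# Build a small stand-in for file.read(...) so the sketch is self-contained.
sample = np.zeros((num_timesteps, num_LEs), dtype=record)
sample["le"] = np.arange(12).reshape(num_timesteps, num_LEs, 2)
sample["flag"] = 7
raw = sample.tobytes()

# One pass over the bytes, with no intermediate string copies.
data = np.frombuffer(raw, dtype=record).reshape(num_timesteps, num_LEs)
LEs = data["le"]      # shape (num_timesteps, num_LEs, 2), a view into data
flags = data["flag"]  # shape (num_timesteps, num_LEs), also a view
```

Because the byte order is part of the dtype, no separate byteswap step is needed when reading values; the fields are views onto the single buffer that was read.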

Use the numpyio module from Travis. With this it should be possible to read the data directly and do any typecode conversion you want. It has fread and fwrite functions, and it can be used with any NumPy type, like Int0 in your case. It's part of the signaltools package.

http://oliphant.netpedia.net/signaltools_0.5.2.tar.gz

HTH,
__Janko

Janko Hauser wrote:
I've downloaded it, and it looks pretty handy. It does include a byteswap-in-place, which I need. What is not clear to me from the minimal docs is whether I can read a file set up like:

    long long char long long char ....

and have it put the longs into one array, and the chars into another. Also, it wasn't clear whether I could use it to read a file that has already been opened, starting at the file's current position. I am working with a file that has a text header, so I can't just suck in the whole thing until I've parsed out the header.

I can figure out the answer to these questions with some reading of the source, but it wasn't obvious at first glance, so it would be great if someone knows the answer off the top of their head. Travis?

By the way, there seem to be a few methods that produce a copy, rather than doing things in place, where it seems more intuitive to do it in place. byteswapped() and astype() come to mind. With byteswapped(), I imagine it's rare that you would want to keep a copy around. With astype() it would also be rare to keep a copy around, but since it changes the size of the array, I imagine it would be a lot harder to code as an in-place operation. Is there a reason these operations are not available in place? Or is it just that no one has seen enough of a need to write the code?

-Chris
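The "text header, then binary data" case can be sketched with modern NumPy (an assumption; this is not the numpyio API discussed above). The header format `count=4` is hypothetical, and an in-memory `io.BytesIO` object stands in for the already opened file.

```python
import io

import numpy as np

# Stand-in for an already opened binary file: one text header line, then
# four little-endian 4-byte integers.
payload = np.array([10, 20, 30, 40], dtype="<i4").tobytes()
f = io.BytesIO(b"count=4\n" + payload)

# Parse the text header first; the file position now sits at the binary data.
header = f.readline().decode("ascii").strip()
count = int(header.split("=")[1])

# Read only the bytes that follow the header, starting at the current position.
values = np.frombuffer(f.read(count * 4), dtype="<i4")
```

The same pattern applies to a real file opened in binary mode, since `read()` continues from wherever the header parsing stopped.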

-----Original Message-----
From: numpy-discussion-admin@lists.sourceforge.net [mailto:numpy-discussion-admin@lists.sourceforge.net] On Behalf Of Chris Barker
Sent: Thursday, November 02, 2000 1:46 PM
Cc: numpy-discussion@sourceforge.net
Subject: [Numpy-discussion] "formstring()" in place?

"Paul F. Dubois" wrote:
Actually, this is exactly NOT what I want to do. In this case, each 1-byte integer was converted to a 4-byte integer of the same VALUE. What I want is to convert each SET of four bytes into a SINGLE 4-byte integer, as in:

The four one-byte items in a are turned into one four-byte item. What I want is to be able to do this in place, rather than have tostring() create a copy. I think fromstring() may create a copy as well, giving a possible total of three copies around at once. Does anyone know how many copies will be around at once with this line of code?

-Chris
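In modern NumPy (an assumption; the thread concerns Numeric), the reinterpretation asked for here can be expressed with `view()`, which shares the buffer instead of copying, and `byteswap(inplace=True)`, which swaps within that buffer. A minimal sketch:

```python
import numpy as np

# Eight single bytes, writable so that they can be swapped in place.
a = np.array([1, 0, 0, 0, 2, 0, 0, 0], dtype=np.uint8)

# Reinterpret each set of four bytes as one little-endian 4-byte integer.
# view() shares the buffer with a, so no copy of the data is made.
as_int = a.view("<i4")
before = as_int.tolist()

# Swap the bytes inside the shared buffer; a changes too.
as_int.byteswap(inplace=True)
```

Here `before` holds the two integers read from the original bytes, and after the swap both `a` and `as_int` reflect the reversed byte order, since they refer to the same memory.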

Chris Barker writes:
A brute-force way would be to do the transformation yourself :-)

So you need to reshape your array into (?,4) and multiply by bits.

And regarding your numpyio question, you can also read characters, which are then put into an array by themselves. It seems you have a very messy file format (but the data world is never easy).

HTH,
__Janko

-- 
Institut fuer Meereskunde             phone: 49-431-597 3989
Dept. Theoretical Oceanography        fax  : 49-431-565876
Duesternbrooker Weg 20                email: jhauser@ifm.uni-kiel.de
24105 Kiel, Germany
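The brute-force idea (reshape to rows of four bytes, weight each byte by its place value, and sum) might look like this in modern NumPy. This is a sketch under the assumption of little-endian, unsigned data; the names `raw` and `weights` are illustrative.

```python
import numpy as np

# Two 4-byte groups: bytes (1,0,0,0) and (0,1,0,0).
raw = np.array([1, 0, 0, 0, 0, 1, 0, 0], dtype=np.uint8)

# Place values of the four bytes, least significant byte first.
weights = np.array([1, 256, 65536, 16777216], dtype=np.int64)

# Widen before multiplying so the products do not overflow a single byte.
values = (raw.reshape(-1, 4).astype(np.int64) * weights).sum(axis=1)

# For comparison: reinterpreting the same buffer as little-endian unsigned ints.
viewed = raw.view("<u4")
```

Unlike the view, this route allocates a widened copy, which matches the remark elsewhere in the thread that one copy cannot be avoided when computing the result this way.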

Sorry, I forgot to mention that these two operations can be done in place, but the result cannot be stored in place, as the shape is changing. So you need to live with one copy, if your array a is of type 'i'.

    multiply(array([1,2,3,4]),bits,a)

HTH,
__Janko

Hi all,

The MATLAB Digest just put out a little article about array indexing in MATLAB. I thought some of you might find it interesting, and it might give some ideas for NumPy2. For most of what MATLAB has, NumPy has an equivalent, but I would love to see what MATLAB calls vector indexing, and a more natural way to do masks.

I know vector indexing would be a pretty tricky thing to have work, at least with slices being references and all, but it would be a very nice thing!! Perhaps some brilliant person can figure out an elegant and efficient way to do it.

Here is where you will find the article:

http://www.mathworks.com/company/digest/sept01/matrix.shtml

-Chris
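For context, modern NumPy (which postdates this message) does support both features wished for here: indexing with a list or array of positions, and indexing with a boolean mask. A short sketch; both forms return copies rather than references, which sidesteps the slices-as-references difficulty mentioned above.

```python
import numpy as np

a = np.array([0.0, 0.2, 0.4, 0.6, 0.8])

# Index with a list of positions ("vector indexing" in MATLAB terms).
picked = a[[0, 2, 4]]

# Index with a boolean mask.
mask = a > 0.5
large = a[mask]

# Unlike a slice, the result is a copy: changing it leaves a untouched.
picked[0] = 99.0
```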


Your mail is in some windows format :-) reshape(resize(a,(1,15)),(5,3)) ,should do it, perhaps reshape makes a new copy, to circumvent this you can do it in two steps. b=resize(a,(1,15)) b.shape = (5,3) Another way is ones((5,3), a.typecode())*a[:,NewAxis] HTH, __Janko

There are probably many solutions, my preferred one is repeat(a[:, NewAxis], [3], 1) Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------

I have been working on my first C extension, using NumPy arrays, and have run into a problem. I have a very mysterious set of bugs, that result in a segmentation fault and core dump. Frankly it's still mysterious, but at the moment the problem seems to be that I have passed in a list of tuples, rather than a list of PyArrayObjects. I don't expect this to work, but when I put a : site = PyList_GetItem(spam_list, index); if (! PyArray_Check(spam)) result = 0; I shouldn't get a crash!! (and it does crash at this line, as near as I can tell) Isn't that exactly what PyArray_Check is for?? Note: with: spam = PyList_GetItem(spam_list, index); I get a "warning: assignment from incompatible pointer type " which goes away if I typecast it: spam = (PyArrayObject *) PyList_GetItem(spam_list, index); Should I be doing this, and should PyArray_Check(spam) work either way? I'm using Redhat Linux 6.1, Python 1.5.2, and python-numpy-1.11-2.i386.rpm Also: is there a compelling reason to upgrade either Python of NumPy, from the NumPy perspective? How are NumPy and 2.0 working together? Thanks, -Chris -- Christopher Barker, Ph.D. cbarker@jps.net --- --- --- http://www.jps.net/cbarker -----@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Water Resources Engineering ------ @ ------ @ ------ @ Coastal and Fluvial Hydrodynamics ------- --------- -------- ------------------------------------------------------------------------ ------------------------------------------------------------------------

Not "site"?
I shouldn't get a crash!! (and it does crash at this line, as near as I can tell)
If spam is NULL, then the check will crash. If it is anything but a pointer to a Python object, there's a good chance that it might crash. PyArray_Check just tells you if a given Python object is of type "array". A look at the value of "spam" with a debugger should tell you what's going on.
It works either way, but I am not sure that you are supposed to rely on that. I prefer to use a cast to array objects only *after* verifying that I have an array object, if only for clarity. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------

Konrad Hinsen wrote:
oops! I thought I had Pythonized everything by putting spam everywhere.
Fair enough. In this case, I was getting spam from a Python list. Is it possible for it to be anything but a pointer to a Python object?
A look at the value of "spam" with a debugger should tell you what's going on.
I have not yet figured out how to use a de-bugger with a python extension. Does anyone know of any nifty tutorial for how to do this (with gdb, preferably)? note that I'm a newby to C too, so I need a really basic run through.
That sounds like sound advice. I do feel like I'm doing something wrong if I get a warning form the compiler however. Thanks for your help. -Chris -- Christopher Barker, Ph.D. cbarker@jps.net --- --- --- http://www.jps.net/cbarker -----@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Water Resources Engineering ------ @ ------ @ ------ @ Coastal and Fluvial Hydrodynamics ------- --------- -------- ------------------------------------------------------------------------ ------------------------------------------------------------------------

I used gdb with gcc under Linux, and I just use it as I would on any other program: gdb /usr/bin/python run spam.py and wait for the crash. Of course your module should have been compiled with -g if you want symbolic debugging. The only difficulty with debugging extension modules built as dynamic libraries is setting breakpoints. You can't do it right away because the module isn't loaded yet, so its symbols are unknown to gdb. My solution is to start my script with the import, followed immediately by "time.sleep(5)". That gives me five seconds to press Control-C to reenter the debugger, and from then on everything works as advertised. If someone knows a more elegant solution, please let me know! Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------

Hi all, I started a discussion on c.l.p recently (http://mail.python.org/pipermail/python-list/2001-September/063502.html), and it brought up an interesting idea. In general, I'd like to see Python become more array-oriented, which PEP 209 and friends will help with. I want this because it provides a natural and efficient way of expressing your program, and because there are large opportunities for performance enhancements.

While I make a lot of use of NumPy arrays at the moment, and get substantial performance benefits when I can use the overloaded operators and ufuncs, etc., Python itself still doesn't know that a NumPy array is a homogeneous sequence (or might be), because it has no concept of such a thing. If the Python interpreter knew that it was homogeneous, there could be a lot of opportunities for performance enhancements. In the above stated thread, I suggested the addition of a "homogeneous" flag to sequences. I haven't gotten an enthusiastic response, but Skip Montanaro suggested:

""" One approach might be to propose that the array object be "brought into the fold" as a builtin object and modifications be made to the virtual machine so that homogeneity can be assumed when operating on arrays. """

PEP 209 does propose that an array object be "brought into the fold" (or does it? Would it be a builtin? If not, at least being part of the standard library would be a help), but it makes no mention of any modifications to the virtual machine that would allow optimisations outside of Numeric2 defined functions. Is this something worth adding? I understand that it is one thing for the homogeneous sequence (or array) to be acknowledged in the VM, and quite another for actual optimizations to be written, but we need the former before we can get the latter. What do you all think?? -Chris -- Christopher Barker, Ph.D.
ChrisHBarker@home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------

Such as? Given the overall dynamic nature of Python, I don't see any real opportunities outside array-specific code. What optimizations could be done knowing *only* that all elements of a sequence are of the same type, but without a particular data layout? Konrad.

Konrad Hinsen wrote:
I remember Guido's answer to a FAQ a while ago (the current FAQ has a more terse version) that essentially stated that compiled Python wouldn't be much faster because of Python's dynamic nature, so that the expression x = y*z needs to be evaluated to see what type y is, what type z is, and what it means to multiply them, before the actual multiplication can take place. All that checking takes a whole lot longer than the multiplication itself. NumPy, of course, helps this along, because once it is determined that y and z are both NumPy arrays of the same shape and type, it can multiply all the elements in a C loop, without any further type checking.

Where this breaks down, of course, is when you can't express what you want to do as a set of array functions. Once you learn the tricks these times are fairly rare (which is why NumPy is useful), but there are times when you can't use array-math, and need to loop through the elements of an array and operate on them. Python currently has to type check each of those elements every time something is done on them. In principle, if the Python VM knew that A was an array of Floats, it would know that A[i,j,k] was a Float, and it would not have to do a check.

I think it would be easiest to get optimization in sequence-oriented operations, such as map() and list comprehensions: map(fun, A). When this is called, the function bound to the name "fun" is passed to map, as is the array bound to the name "A". The array is known to be homogeneous. map could conceivably compile a version of fun that worked on the type of the items in A, and then apply that function to all the elements in A, without type checking, and looping at the speed of C. This is the kind of optimization I am imagining. Kind of an instant ufunc. Something similar could be done with list comprehensions.

Of course, most things that can be done with list comprehensions and map() can be done with array operators anyway, so the real advantage would be a smarter compiler that could do that kind of thing inside a bunch of nested for loops. There is at least one project heading in that direction:

* Psyco (Armin Rigo) - a run-time specializing virtual machine that sees what sorts of types are input to a function and compiles a type- or value-specific version of that function on-the-fly. I believe Armin is looking at some JIT code generators in addition to or instead of another virtual machine.

Knowing that all the elements of an Array (or any other sequence) are the same type could help here a lot. Once a particular function was compiled with a given set of types, it could be called directly on all the elements of that array (and other arrays) with no type checking.

What it comes down to is that while Python's dynamic nature is wonderful, and very powerful and flexible, there are many, many times that it is not really needed, particularly inside a given small function. The standard answer about Python optimization is that you just need to write those few small functions in C. This is only really practical if they are functions that operate on particular expected inputs: essentially statically typed input (or at least not totally general). Even then, it is a substantial effort, even for those with extensive C experience (unlike me). It would be so much better if a Py2C or a JIT compiler could optimize those functions for you.

I know this is a major programming effort, and will, at best, be years away, but I'd like to see Python move in a direction that makes it easier to do, and allows small steps to be done at a time. I think introducing the concept of a homogeneous sequence could help a lot of these efforts. Numeric arrays would be just a single example. Python already has strings, and tuples could be marked as homogeneous when created, if they were. So could lists, but since they are mutable, their homogeneity could change at any time, so it might not be useful.

I may be wrong about all this, I really don't know a whit about writing compilers or interpreters, or VMs, but I'm throwing around the idea, to see what I can learn, and see if it makes any sense. -Chris
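The "check the type once, then loop" idea behind the map(fun, A) proposal can be illustrated in plain Python. This is a sketch only: specialized_map and the DOUBLERS table are hypothetical names, the standard-library array module stands in for a NumPy array, and pure Python gains no real speed here because the interpreter still dispatches on every call. It only shows where a single up-front type lookup would replace per-element checks.

```python
from array import array

# Hypothetical table of type-specific implementations of "double the value".
DOUBLERS = {
    float: lambda x: x + x,
    int: lambda x: x << 1,
}


def specialized_map(table, seq):
    """Choose an implementation from the first element's type, then apply it.

    This is only valid when seq is homogeneous, which array.array guarantees.
    """
    if len(seq) == 0:
        return []
    impl = table[type(seq[0])]  # one type lookup for the whole sequence
    return [impl(x) for x in seq]


doubled_floats = specialized_map(DOUBLERS, array("d", [0.5, 1.5]))
doubled_ints = specialized_map(DOUBLERS, array("i", [1, 2, 3]))
```

With a list instead of an array.array, nothing would guarantee that later elements match the first one, which is the mutability concern raised above.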

Right, but that would only work for types that are specially treated by the interpreter. Just knowing that all elements are of the same type is not enough. In fact, the VM does not do any check, it just looks up type-specific pointers and calls the relevant functions. The other question is how much effort the Python developers would be willing to spend on this; it looks like a very big job to me, in fact a reimplementation of the interpreter.
Yes, that could be done, provided there is also a means for compiling type-specific versions of a function.
I'd say that adding this feature is much less work than doing even the slightest bit of optimization. I know I am sounding pessimistic here, but, well, I am... Konrad.

I have a large array of type "1" (single bytes). I need to convert it to Int32, in the manner that fromstring() would. Right now, I am doing:

    Array = fromstring(Array.tostring(),'f')

This works fine, but what concerns me is that I need to do this on potentially HUGE arrays, and if I understand this right, I am going to create a copy with tostring, and then another copy with fromstring, that then gets referenced to Array, at which point the first original copy gets de-referenced, and should be deleted, and the temporary one gets deleted at some point in this process. I don't know when stuff created in the middle of a statement gets deleted, so I could potentially have three copies of the data around at the same time, and at least two. Since it is exactly the same C array, I'd like to be able to do this without making any copies at all. Is it possible? It seems like it should be a simple matter of changing the typecode and shape, but is this possible? While I'm asking questions: can I byteswap in place as well?

The greater problem: To give a little background, and to see if anyone has a better idea of how to do what I am doing, I thought I'd run through the task that I really need to do. I am reading a binary file full of a lot of data. I have some control over the form of the file, but it needs to be compact, so I can't just make everything the same large type. The file is essentially a whole bunch of records, each of which is a collection of a couple of different types, and which I would eventually like to get into a couple of NumPy arrays. My first cut at the problem was to read each record one at a time in a loop, and use the struct module to convert everything. This worked fine, but was pretty darn slow, so I am now doing it all with NumPy like this (one example, I have more complex ones):

    num_bytes = 9  # number of bytes in a record: two longs and a char
    # read all the data into a single byte array
    data = fromstring(file.read(num_bytes*num_timesteps*num_LEs),'1')
    # rearrange into 3-d array
    data.shape = (num_timesteps,num_LEs,num_bytes)
    # extract LE data:
    LEs = data[:,:,:8]
    # extract flag data
    flags = data[:,:,8]
    # convert LE data to longs
    LEs = fromstring(LEs.tostring(),Int32)
    if endian == 'L':  # byteswap if required
        LEs = LEs.byteswapped()
    # convert to 3-d array
    LEs.shape = (num_timesteps,num_LEs,2)

Anyone have any better ideas on how to do this? Thanks, -Chris
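For reference, the 9-byte record layout described above (two longs and a char) can be split with the standard library alone. This is a sketch under assumptions not stated in the message: the file is taken to be little-endian, the flag is treated as an unsigned byte, and split_records is a hypothetical name. It still visits every record from Python, so it is not the fast whole-array approach the message is asking for.

```python
import struct
from array import array

# Assumed layout: two little-endian 32-bit ints followed by one unsigned byte.
# The "<" prefix disables padding, so the record size is exactly 9 bytes.
RECORD = struct.Struct("<iiB")


def split_records(buf):
    """Split a buffer of 9-byte records into (longs, flags) arrays."""
    if len(buf) % RECORD.size:
        raise ValueError("buffer length is not a multiple of the record size")
    longs = array("i")
    flags = array("B")
    for first, second, flag in RECORD.iter_unpack(buf):
        longs.append(first)
        longs.append(second)
        flags.append(flag)
    return longs, flags
```

struct.iter_unpack avoids creating a slice per record, but the per-record loop still runs in the interpreter.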

Use the numpyio module from Travis. With this it should be possible to read the data directly and do any typecode conversion you want. It has fread and fwrite functions, and it can be used with any NumPy type like Int0 in your case. It's part of the signaltools package. http://oliphant.netpedia.net/signaltools_0.5.2.tar.gz HTH, __Janko

Janko Hauser wrote:
I've downloaded it, and it looks pretty handy. It does include a byteswap-in-place, which I need. What is not clear to me from the minimal docs is whether I can read a file set up like: long long char long long char .... and have it put the longs into one array, and the chars into another. Also, it wasn't clear whether I could use it to read a file that has already been opened, starting at the file's current position. I am working with a file that has a text header, so I can't just suck in the whole thing until I've parsed out the header. I can figure out the answer to these questions with some reading of the source, but it wasn't obvious at first glance, so it would be great if someone knows the answer off the top of their head. Travis?

By the way, there seem to be a few methods that produce a copy, rather than doing things in place, where it seems more intuitive to do it in place. byteswapped() and astype() come to mind. With byteswapped, I imagine it's rare that you would want to keep a copy around. With astype it would also be rare to keep a copy around, but since it changes the size of the array, I imagine it would be a lot harder to code as an in-place operation. Is there a reason these operations are not available in-place? Or is it just that no one has seen enough of a need to write the code? -Chris
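Both needs mentioned above, reading binary data from the current position of an already-open file after a text header, and swapping bytes in place, can be sketched with the standard-library array module. This does not use numpyio, whose behaviour is not confirmed here. The function name read_after_header, the one-line header, and the native-int item type are all assumptions made for illustration.

```python
import io
import sys
from array import array


def read_after_header(f, count, file_byteorder="little"):
    """Read one header line, then `count` native ints from where the file stands."""
    header = f.readline()  # assumption: the text header is a single line
    values = array("i")    # native int, 4 bytes on common platforms
    values.frombytes(f.read(count * values.itemsize))
    if file_byteorder != sys.byteorder:
        values.byteswap()  # swaps the existing buffer, no second array
    return header, values


# Small in-memory demonstration with data written in native byte order.
_demo = io.BytesIO(b"demo header\n" + array("i", [1, 2, 3]).tobytes())
demo_header, demo_values = read_after_header(_demo, 3, file_byteorder=sys.byteorder)
```

array.byteswap() modifies the array it is called on, which is the in-place behaviour asked about above.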

-----Original Message----- From: numpy-discussion-admin@lists.sourceforge.net [mailto:numpy-discussion-admin@lists.sourceforge.net]On Behalf Of Chris Barker Sent: Thursday, November 02, 2000 1:46 PM Cc: numpy-discussion@sourceforge.net Subject: [Numpy-discussion] "formstring()" in place?

"Paul F. Dubois" wrote:
Actually, this is exactly NOT what I want to do. In this case, each 1-byte integer was converted to a 4-byte integer of the same VALUE. What I want is to convert each SET of four bytes into a SINGLE 4-byte integer, as in:
The four one-byte items in a are turned into one four-byte item. What I want is to be able to do this in place, rather than have tostring() create a copy. I think fromstring may create a copy as well, having a possible total of three copies around at once. Does anyone know how many copies will be around at once with this line of code? -Chris
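The distinction drawn above, converting values versus reinterpreting the same bytes, can be shown with the standard library. This is a sketch, not Numeric code: array.array stands in for a Numeric byte array, memoryview.cast does the reinterpretation without copying, and the helper names are hypothetical. The integers produced by the view depend on the platform's byte order and on the size of a native int.

```python
import sys
from array import array


def widen_values(byte_array):
    """Value conversion: every byte becomes its own wider integer (a copy)."""
    return array("i", byte_array.tolist())


def reinterpret_as_ints(byte_array):
    """Reinterpretation: view the same memory as native ints, with no copy."""
    return memoryview(byte_array).cast("i")


raw = array("B", [1, 0, 0, 0, 2, 0, 0, 0])  # eight single-byte items
widened = widen_values(raw)        # eight items, same values as the bytes
packed = reinterpret_as_ints(raw)  # two items where a native int is 4 bytes

# Expected first value for this platform's byte order.
first_expected = int.from_bytes(bytes([1, 0, 0, 0]), sys.byteorder)
```

Because packed shares memory with raw, assigning to packed[1] changes the last four bytes of raw, which shows that no copy was made.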

Chris Barker writes:
A brute force way would be to do the transformation yourself :-)
So you need to reshape your array into (?,4) and multiply by bits. And regarding your numpyio question, you can also read characters, which are then put into an array by itself. It seems you have a very messy file format (but the data world is never easy) HTH, __Janko -- Institut fuer Meereskunde phone: 49-431-597 3989 Dept. Theoretical Oceanography fax : 49-431-565876 Duesternbrooker Weg 20 email: jhauser@ifm.uni-kiel.de 24105 Kiel, Germany

Sorry, I forgot to mention that these two operations can be done in place, but the result cannot be stored in place, as the shape is changing. So you need to live with one copy, if your array a is of type 'i'.
multiply(array([1,2,3,4]),bits,a)
HTH, __Janko
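One reading of the "reshape into (?,4) and multiply" suggestion is to weight each byte in a group of four by its place value and sum the group. The sketch below does that in plain Python. The name combine_bytes, the choice of powers of 256 as weights, and the unsigned result are assumptions, since the original example code did not survive in the archive.

```python
def combine_bytes(byte_values, little_endian=True):
    """Combine each run of four byte values into one unsigned 32-bit integer."""
    if len(byte_values) % 4:
        raise ValueError("length must be a multiple of 4")
    weights = (1, 256, 256 ** 2, 256 ** 3)  # place value of each byte
    if not little_endian:
        weights = weights[::-1]
    combined = []
    for start in range(0, len(byte_values), 4):
        group = byte_values[start:start + 4]
        combined.append(sum(b * w for b, w in zip(group, weights)))
    return combined
```

For example, combine_bytes([1, 0, 0, 0, 0, 1, 0, 0]) gives [1, 256] under the little-endian assumption. Values with the top bit set come out as large positive numbers rather than negative Int32 values.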

Hi all, The MATLAB Digest just put out a little article about array indexing in MATLAB. I thought some of you might find it interesting, and it might give some ideas for NumPy2. For most of what MATLAB has, NumPy has an equivalent, but I would love to see what MATLAB calls vector indexing, and a more natural way to do masks. I know vector indexing would be a pretty tricky thing to have work, at least with slices being references and all, but it would be a very nice thing!! Perhaps some brilliant person can figure out an elegant and efficient way to do it. Here is where you will find the article: http://www.mathworks.com/company/digest/sept01/matrix.shtml -Chris

OOPS! I replied to an arbitrary message to get the address, and I forgot to change the subject, so here is the same message I just posted, but with an appropriate subject. Hi all, The MATLAB Digest just put out a little article about array indexing in MATLAB. I thought some of you might find it interesting, and it might give some ideas for NumPy2. For most of what MATLAB has, NumPy has an equivalent, but I would love to see what MATLAB calls vector indexing, and a more natural way to do masks. I know vector indexing would be a pretty tricky thing to have work, at least with slices being references and all, but it would be a very nice thing!! Perhaps some brilliant person can figure out an elegant and efficient way to do it. Here is where you will find the article: http://www.mathworks.com/company/digest/sept01/matrix.shtml -Chris