
Here are my two cents' worth on the subject... Most of what has been said in this thread (at least what I have read) I find very valuable. Apparently, many people have been thinking about the subject.

I view this problem as inherent in a language without an lvalue (as in C) that allows a very explicit and clear definition, from the programmer's point of view, of the size of the container you are going to put things in. The language in many cases simply returns an object to you, having made some decision as to what you "needed" or "wanted". Of course, this is one of the things that makes languages such as Numerical Python, Matlab, IDL, etc. very nice for prototyping and investigating. In many cases this decision will be adequate or acceptable. In some, quite simply, it will not. At that point the programmer has to have a good means of managing this decision himself.

If memory is not a constraint, I can think of very few situations (none, actually) where I would choose to go with something other than the Numerical Python default of double. In general, that is what you get when creating Python arrays unless you make some effort to obtain some other type. However, in some important (read "cases that affect the author") situations memory is a very critical constraint. Typically, in Numerical Python I use 4-byte floats. In fact, one of the reasons I use Numerical Python is that I don't *need* doubles, and Matlab, for example, is really only set up to work gracefully with doubles. I do *need* to conserve memory, as I deal with very large data sets.

It seems the question we are discussing is not really "what *should* be done in terms of casting?" but "what provides good enough decisions much of the time *and* a graceful way to manage the decisions when 'good enough' no longer applies to you?" Currently, this is not a trivial thing to manage. Reading in a 100 MB data set and multiplying by the Python scalar 2 produces a 200 MB data set.
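For readers following along today, the doubling described above is easy to reproduce. A minimal sketch with modern NumPy (note that current NumPy no longer lets a bare Python scalar upcast a float32 array, so the mix is shown with a double-precision array instead, which produces the same effect the old Numeric scalar rule did):

```python
import numpy as np

# Stand-in for a large single-precision data set (4 bytes per element).
data = np.ones(10, dtype=np.float32)

# Mixing with anything double-precision upcasts the whole result,
# so a 100 MB data set becomes a 200 MB result (8 bytes per element).
swelled = data * np.ones(10, dtype=np.float64)
print(swelled.dtype, swelled.itemsize)   # float64 8

# The workaround described in the text: wrap the scalar in a matching
# type so the result stays single precision.
kept = data * np.float32(2)
print(kept.dtype, kept.itemsize)         # float32 4
```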
I manage this by wrapping the 2 in an array. This happens, of course, all the time. Having to do it once is not a big deal, but doing it everywhere in code that uses floats makes for cluttered code -- not something I expect to have to write in an otherwise very elegant and concise language. I also often find myself trudging through code looking for the subtlety that converted my floats to doubles, doubled my memory usage, and then caused subsequent float-only routines to error out. To those who are constrained to use floats this is awkward and time consuming. To those who are not I would say: use doubles.

The flag that causes an array to be a "space saving array" seems to be a temporary fix (that doesn't mean it was a bad idea -- just that it feels messy and effectively adds complexity that shouldn't be there). It also merely postpones the problem as I understand it: what happens when I multiply two space-saving arrays? We will simply never get away from situations where we have to manage the interaction ourselves, and so we should be careful not to make that management so awkward (and currently I think it is awkward) that the floats, bytes, shorts, etc. become marginalized in their utility.

My suggestion is to go with the rule that a simple hierarchy (in which downcasting is the rule)

    longs
    integers
    shorts
    bytes
    cardinals
    booleans

    doubles
    complex doubles   <--- default
    floats
    complex floats

for the most part makes good decisions, principally because people who are not constrained to conserve memory will use the larger, default types all the time and not wince. They don't *need* floats or bytes. If anyone gives them a float, a simple astype('d') or astype('D') to make sure it becomes a double lets them go on their way. Types like integers and bytes are effectively treated as being precise.
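As an illustration only (this is not Numeric's actual coercion table), the proposed downcast rule could be sketched as a helper that gives the result the type of whichever operand sits lower in the hierarchy; the RANK table below is a hypothetical stand-in covering just two of the types:

```python
import numpy as np

# Hypothetical rank table for a slice of the proposed hierarchy;
# smaller rank = lower in the hierarchy, and the lower type wins.
RANK = {np.dtype('float32'): 0, np.dtype('float64'): 1}

def downcast_mul(x, y):
    # Compute in whatever precision NumPy chooses, then force the
    # result back to the lower-ranked operand's type.
    out = min(x.dtype, y.dtype, key=RANK.get)
    return (x * y).astype(out)

f = np.ones(4, dtype=np.float32)
d = np.full(4, 2.0, dtype=np.float64)
print(downcast_mul(f, d).dtype)   # float32: the float kept its size

# The astype escape hatch mentioned above: anyone who wants doubles
# can promote explicitly ('d' is float64, 'D' is complex double).
print(f.astype('d').dtype)        # float64
print(f.astype('D').dtype)        # complex128
```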
If you are constrained to conserve memory by staying with floats or bytes instead of just reading things in from disk and making them doubles, it will not be so awkward to manage the types in large programs. If I use someone's code and it has a scalar anywhere in it, then even if I (or they) cast the output, memory usage swells at least for intermediate calculations. Effectively, Python *has* 4-byte floats, but programming with them is awkward.

This means, of course, that multiplying a float array by a double array produces a float; multiplying a double array by anything above it produces a double; etc. For my work, if I have a float anywhere in the calculation I don't believe precision beyond that in the output, so getting a float back is reasonable. I know that some operations produce "more precision", and so I would cast the array if I needed to take advantage of that.

Perhaps downcasting is *not* the way to go. However, I definitely think the current awkwardness should be eliminated. I hope my comments will not be perceived as critical of the original language designers. I find Python to be very useful or I wouldn't have bothered to make the comments at all.

-Todd Pitts