Currently in CPython, small integers from -5 to 256 inclusive are
preallocated at startup. This reduces memory consumption and the
creation time of integers in this range; in particular it affects the
speed of short enumerations. By increasing the range to the maximum
(from -32767 to 32767 inclusive), we can speed up longer enumerations
as well:
./python -m timeit "for i in range(10000): pass"
./python -m timeit -s "a=[0]*10000" "for i, x in enumerate(a): pass"
./python -m timeit -s "a=[0]*10000" "i=0" "for x in a: i+=1"
./python -m timeit -s "a=[0]*10000" "for i in range(len(a)): x=a[i]"

unpatched   patched     speedup
530 usec    337 usec    57%
1.06 msec   811 usec    31%
1.34 msec   1.13 msec   19%
1.42 msec   1.22 msec   16%
1) Memory consumption increases by a constant 1-1.5 MB (or half that if
the range is expanded only in the positive direction). This is not a
problem on most modern computers, but it would be better if the
parameters NSMALLPOSINTS and NSMALLNEGINTS were configurable at build
time.
2) Python startup time increases slightly. I was not able to measure
the difference; it is too small.
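The effect of the cache can be observed directly. A minimal sketch, relying on the CPython-specific identity of cached ints (the exact bounds are an implementation detail and should not be relied on in real code):

```python
# CPython preallocates small ints, so equal values in the cached range
# are the very same object; values outside it get fresh objects.
a = int("256")   # built at runtime to avoid compile-time constant folding
b = int("256")
print(a is b)    # True: 256 is within the default cached range

c = int("257")
d = int("257")
print(c is d)    # typically False in CPython: 257 is outside the cache
```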
attrgetter and itemgetter are both very useful functions, but both have
a significant pitfall when the arguments passed in are validated but
not controlled: if the arguments (a list of attributes, keys or
indexes) are received from an external source and *-applied, and that
source passes a sequence of one element, both functions will return a
bare element rather than a singleton (1-element tuple).
This means that such code, for instance code "slicing" a matrix of some
sort to get only some columns, where the slicing information comes from
the caller (in a situation where extracting a single column may be
perfectly sensible), has to dispatch manually between a plain getitem
(or getattr) and an itemgetter (resp. attrgetter) call, e.g.
slicer = (operator.itemgetter(*indices) if len(indices) > 1
          else lambda ar: [ar[indices[0]]])
This makes for more verbose and less straightforward code. I think it
would be useful in such situations if attrgetter and itemgetter could
be forced into always returning a tuple by way of an optional argument:
# works the same no matter what len(indices) is
slicer = operator.itemgetter(*indices, force_tuple=True)
which, in the documented equivalent code, would amount to an override
(to False) of the `len` check (`len(items) == 1` would become
`len(items) == 1 and not force_tuple`).
The argument is backward-compatible, as neither function currently
accepts any keyword argument.
Uncertainty note: whether force_tuple (or whatever its name ends up
being) should silence the error generated when len(indices) == 0 and
return an empty tuple rather than raising a TypeError.
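Pending such an argument, the dispatch can be wrapped once. A minimal sketch (the helper name is mine, and the proposed force_tuple keyword does not exist in the operator module):

```python
import operator

def itemgetter_tuple(*indices):
    # Behaves like the proposed itemgetter(*indices, force_tuple=True):
    # always returns a tuple, even for a single index.
    if len(indices) == 1:
        index = indices[0]
        return lambda obj: (obj[index],)
    return operator.itemgetter(*indices)

row = ['a', 'b', 'c']
print(itemgetter_tuple(1)(row))     # ('b',)
print(itemgetter_tuple(0, 2)(row))  # ('a', 'c')
```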
On Sep 14, 2012 10:02 PM, "Jim Jewett" <jimjjewett(a)gmail.com> wrote:
> On 9/14/12, Oscar Benjamin <oscar.j.benjamin(a)gmail.com> wrote:
> > I can see why you would expect different behaviour here, though. I tend
> > not to think of the functions in the operator module as convenience
> > functions, but as *efficient* nameable functions referring to operations
> > that are normally invoked with a non-function syntax. Which is more
> > convenient of the following:
> > 1) using operator
> > import operator
> > result = sorted(values, key=operator.attrgetter('name'))
> I would normally write that as
> from operator import attrgetter as attr
> ... # may use it several times
> result=sorted(values, key=attr('name'))
> which is about the best I could hope for, without being able to use
> the dot itself.
To be clear, I wasn't complaining about the inconvenience of importing and
referring to attrgetter. I was saying that if the obvious alternative
(lambda functions) is at least as convenient then it's odd to describe
itemgetter/attrgetter as convenience functions.
> > 2) using lambda
> > result = sorted(values, key=lambda v: v.name)
> And I honestly think that would be worse, even if lambda didn't have a
> code smell. It focuses attention on the fact that you're creating a
> callable, instead of on the fact that you're grabbing the name
I disagree here. I find the fact that a lambda function shows me the
expression I would normally use to get the quantity I'm interested in makes
it easier for me to read. When I look at it I don't see it as a callable
function but as an expression that I'm passing for use somewhere else.
> > In general it is bad to conflate scalar/sequence semantics so that a
> > caller should get a different type of object depending on the length
> > of a sequence.
> Yeah, but that can't really be solved well in python, except maybe by
> never extending an API to handle sequences. I would personally not
> consider that an improvement.
> Part of the problem is that the cleanest way to take a variable number
> of arguments is to turn them into a sequence under the covers (*args),
> even if they weren't passed that way.
You can extend an API to support sequences by adding a new entry point.
This is a common idiom in python: think list.append vs list.extend.
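A quick illustration of that idiom, where the sequence-accepting variant is a separate entry point rather than an overload of the scalar one:

```python
items = [1, 2]
items.append([3, 4])     # scalar entry point: adds one element
print(items)             # [1, 2, [3, 4]]

items = [1, 2]
items.extend([3, 4])     # sequence entry point: adds each element
print(items)             # [1, 2, 3, 4]
```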
Why is there no way to pass PYTHONPATH on the command line? Is this an
oversight, or intentional? Something like:
python -p path_item -c "import something; something.foo()"
I am aware that the __main__.py behavior lessens the need for this
feature.
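For comparison, the effect of the hypothetical -p option can be had today by setting PYTHONPATH in the environment for a single invocation. A sketch (the path name is illustrative):

```python
import os
import subprocess
import sys

# Run a child interpreter with an extra sys.path entry, equivalent to
# the proposed: python -p /tmp/extra_modules -c "..."
env = dict(os.environ, PYTHONPATH="/tmp/extra_modules")
out = subprocess.run(
    [sys.executable, "-c", "import sys; print('/tmp/extra_modules' in sys.path)"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())  # True
```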
Alexander Belopolsky wrote:
> Consider this:
> >>> memoryview(b'x').cast('B', ()).tolist()
> 120
> The return value of tolist() is an int, not a list.
That's because NumPy's tolist() does the same thing:
>>> x = numpy.array(120, dtype='B')
>>> x.tolist()
120
If you implement tolist() recursively like in _testbuffer.c and choose
the zeroth dimension as the base case, you arrive at single elements.
So at least it's not completely unnatural.
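The behaviour is easy to reproduce without NumPy (b'x' is the single byte 120):

```python
# A 0-dimensional cast: tolist() returns a scalar, not a list
m = memoryview(b'x').cast('B', ())
print(m.ndim)      # 0
print(m.tolist())  # 120
```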
We have been discussing the value of having namedtuple as the return
type for urlparse.urlparse and urlparse.urlsplit. See that thread
here: http://bugs.python.org/issue15824 . I jumped the gun and
submitted a patch without seeing if anyone else thought different
behavior was desirable. My argument is that it would be a major
usability improvement if the return type supported item assignment.
Currently, something like the following is necessary in order to
parse, make changes, and unparse:
url = list(urlparse.urlparse('http://www.example.com/foo/bar?hehe=haha'))
url[1] = 'www.python.com'
new_url = urlparse.urlunparse(url)
I think this is really clunky. I don't see any reason why we should be
using a type that doesn't support item assignment and needs to be
cast to another type in order to make changes. I think an
interface like this would be more useful:
url = urlparse.urlparse('http://www.example.com/foo/bar?hehe=haha')
url.netloc = 'www.python.com'
What do other people think?
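For the record, the immutable return type already allows a copy-with-changes via the namedtuple _replace method. A sketch using the Python 3 module names (urllib.parse) rather than the Python 2 names used above:

```python
from urllib.parse import urlparse, urlunparse

parts = urlparse('http://www.example.com/foo/bar?hehe=haha')
# _replace returns a new namedtuple with the given field changed
new_url = urlunparse(parts._replace(netloc='www.python.com'))
print(new_url)  # http://www.python.com/foo/bar?hehe=haha
```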
On Sat, Sep 8, 2012 at 9:17 AM, Terry Reedy <tjreedy(a)udel.edu> wrote:
> On 9/8/2012 2:27 AM, Guido van Rossum wrote:
>> Can someone explain what problem we are trying to solve? I fail to
>> understand what's wrong with the current behavior...
> Pairs of different things have the same representation, making the
> representation ambiguous to both people and the interpreter.
Well yeah, when designing a repr() we usually have to compromise. E.g.
if you render a class instance it often shows the class name but not
the module name (e.g. decimal.Decimal.)
> Moreover, the interpreter's guess is usually wrong.
The requirement that the interpreter can evaluate a repr() and return
a similar value is pretty weak, and I'm not sure that in this case the
fact that copying the output back into the interpreter returns an
object of a different shape matters much to anyone.
A subtler but similar bug appears with lists containing multiple
references to the same sublist, e.g.
>>> a = [1, 2]
>>> b = [a, a]
>>> b
[[1, 2], [1, 2]]
>>> a.append(3)
>>> b
[[1, 2, 3], [1, 2, 3]]
>>> x = [[1, 2], [1, 2]]
>>> x[0].append(3)
>>> x
[[1, 2, 3], [1, 2]]
I don't think we should attempt to fix this particular one -- first of
all, the analysis would be tricky (there could be a user-defined
object involved) and second of all, I can't think of a solution that
still produces a valid expression (except perhaps a very ugly one).
> In particular, the representations of recursive lists use what is now the
> Ellipsis literal '...', so they are also valid list displays for a
> non-recursive nested list containing Ellipsis. The interpreter always reads
> ... as the Ellipsis literal, which is nearly always not what is meant.
But when does it ever matter?
> It would be trivial to tweak the representations of recursive lists so they
> are not valid list displays.
To what purpose? I still don't understand what the actual use case is
where you think that will produce a better experience for the user.
--Guido van Rossum (python.org/~guido)
With the Python 3 loosening of where ... can occur, this somewhat
suboptimal behaviour occurs:
>>> x = 
Is this something that can be improved? Is it something worth improving?
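The kind of ambiguity in question can be shown directly:

```python
# The repr of a recursive list uses '...', which Python 3 reads back
# as the Ellipsis literal, yielding a different, non-recursive object.
a = [1, 2]
a.append(a)
print(repr(a))              # [1, 2, [...]]
b = eval(repr(a))
print(b)                    # [1, 2, [Ellipsis]]
print(b[2][0] is Ellipsis)  # True
```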
I think the annotations of parameters and return values of a function
are a useful practice for the user of the function. As a function can
modify or create global variables, and as this is important for the end
user, I would appreciate being able to add annotations in the global
statement.
An annotation syntax similar to that of parameters could be employed :
global var : expression
global var1 : expression1, var2 : expression2,...
Alex (geoscience modeler)
>>> memoryview(b'x').cast('B', ()).tolist()
The return value of tolist() is an int, not a list.
I suggest to deprecate memoryview.tolist() and .tobytes() methods
(soft deprecation - in documentation only) and recommend using list(m)
and bytes(m) instead.
For the multidimensional (and 0-dimensional) views, I suggest adding
an unpack([depth]) method that would unpack a view into a nested list
of tuples or subviews. For example, a single-byte scalar should unpack
like this:
>>> m = memoryview(b'x').cast('B', ())
>>> struct.unpack_from(m.format, m)
(120,)
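For one-dimensional views the recommended replacements already behave identically to the methods they would supersede; a quick check:

```python
m = memoryview(b'abc')
print(m.tolist())  # [97, 98, 99]
print(list(m))     # [97, 98, 99] -- same result for a 1-D 'B' view
print(bytes(m))    # b'abc'
```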