Mailman 3 September 2016 - Python-ideas

Re: [Python-ideas] Changing optimisation level from a script
by Damien George Sept. 13, 2016

Hi Petr,

> The API you proposed here is similar to something I see a lot in
> MicroPython libraries: functions/methods that combine a getter and setter.
> For example, to set the value on a pin, you do:
>     pin.value(1)
> and to read, you do:
>     result = pin.value()
>
> If an API like this was added to the stdlib, I'd expect it to use a
> property, e.g.
>     pin.value = 1
>     result = pin.value
>
> I was wondering, what's the story of this aspect of MicroPython API?
> Does it have hidden advantages? Were you inspired by another library? Or
> was it just the easiest way to get the functionality (I assume you
> implemented functions before properties), and then it stuck?

Yes, we do use this pattern a fair bit for things that are property-like, e.g. machine.freq() to get and machine.freq(42000000) to set the CPU frequency. The history reaches back to this issue: https://github.com/micropython/micropython/issues/378 .

The main thing is that (for example) setting a frequency is a real action (a function, if you will) that is doing lots of things behind the scenes, and which may fail with an exception. It therefore doesn't feel right to make this an attribute, but rather a proper function. To me, an attribute should only be used for things that are true constants, or that are conceptually just a "member variable of an object", to which you can assign any value, and reading it back gives you the same value.

So, that's the rationale behind using functions.

Cheers,
Damien.
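The distinction drawn in this message can be sketched in plain Python. This is only an illustration under stated assumptions: `Clock`, `Point`, and the list of supported frequencies are hypothetical names invented here, not MicroPython's actual `machine` API.

```python
class Clock:
    """Hypothetical device wrapper; not MicroPython's real machine API."""

    SUPPORTED_HZ = (8_000_000, 42_000_000, 84_000_000)  # assumed values

    def __init__(self):
        self._hz = 8_000_000

    def freq(self, hz=None):
        """Combined getter/setter: no argument reads, one argument sets."""
        if hz is None:
            return self._hz
        if hz not in self.SUPPORTED_HZ:
            # Setting is a real action that can fail, hence a function call.
            raise ValueError("unsupported frequency: %d" % hz)
        self._hz = hz


class Point:
    """Plain attribute: any value can be assigned and read back unchanged."""

    def __init__(self, x):
        self.x = x


clock = Clock()
clock.freq(42_000_000)      # set: may raise
current = clock.freq()      # get

point = Point(1)
point.x = "anything"        # never fails, reads back the same value
```

The function form makes it visible at the call site that work is being done and that an exception is possible, which is the rationale given above for not using an attribute.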

Re: [Python-ideas] [Python-Dev] Drastically improving list.sort() for lists of strings/ints
by Tim Peters Sept. 13, 2016

[Elliot Gorokhovsky <elliot.gorokhovsky(a)gmail.com>]
> Wow, Tim himself!

And Elliot himself! It's a party :-)

> Regarding performance on semi-ordered data: we'll have to
> benchmark to see, but intuitively I imagine radix would meet Timsort
> because verifying that a list of strings is sorted takes Omega(nw)
> (which gives a lower bound on Timsort), where w is the word length.

Don't take that stuff so seriously. The number of character comparisons needed when comparing two strings is _typically_ small. Consecutive corresponding characters are compared only until the first non-equal pair is found. For example, if the alphabet is 256 characters, for random strings it only requires one character comparison 255 of 256 times on average (because there's only 1 chance in 256 that the first characters _are_ equal).

> Radix sort is Theta(nw). So at least asymptotically it checks out.

Analogously, most-significant-byte-first radix sort stops on the pass after the longest common prefix is processed. But the real devils are in the details. More on that below.

> I think if one uses the two-array algorithm, other semi-sortings can also
> be exploited, since the items get placed into their respective buckets
> in the order in which they appear in the list. So, for the example you gave,
> one pass would sort it correctly

Can't know that unless you specify which algorithm you have in mind. If you're talking about the one in the paper, _any_ sequence with values all from range(256) will be sorted in the first pass, because each value would get its own bucket.

> (since the list has the property if x_1 and x_2 are in bucket b, x1 comes
> before x2 in the list, so x1 will also come before x2 in the bucket. Except
> possibly for one "border bucket" that includes n/2). And then
> it would just be Theta(nw/b) in each bucket to verify sorted.

I don't think you want to go there. If you're keen on radix sorting, stick to radix sorting until it's as good as you can make it. "To verify sorted" requires the (potentially) arbitrarily expensive kinds of comparisons you're trying to _eliminate_. That's not saying there's no possible advantage from doing some of each; it's warning that pursuing every off-target idea that comes up will leave you spinning in circles.

> I mean honestly the cool thing about radix is that the best case for
> Timsort on strings, Omega(nw), is the worst case for radix!

"Details": I wrote a prototype radix sort along the lines of the paper's 2-array version, in Python. When there's "a real" O() improvement to be had, one can usually demonstrate it by a Python program incorporating the improvement even though it's fighting against (unimproved) C code. For example, I helped someone here craft a Python radix sort for integers that runs faster than list.sort(), although it required a list over 10,000,000 elements and a radix of at least 4096 ;-)

http://stackoverflow.com/questions/20207791/pushing-radix-sort-and-python-t…

However, I'm not having that kind of luck with strings.

Here's a detail that bites: as above, string comparisons _usually_ get out fast, after comparing just a few characters. So to construct a bad case for list.sort() in this respect, I want lots of strings with a long common prefix. For example, suppose we have a list of a million strings each of which starts with "A"*20000 (20 thousand "A"). String comparisons are then very expensive (each one requires at least 20001 character comparisons to resolve).

But that also makes the radix sort excruciatingly slow! On the first pass, it copies the million elements, puts them all in the same bucket, then copies them all back to exactly where they started from. Pass 2, exactly the same, but looking at the 2nd occurrence of "A" in each string. Ditto for pass 3, pass 4, and ... pass 20,000. By then we've copied a million elements 40,000 times - but still have exactly the same array we started with. The data-copying costs are killing it (yes, it's only copying pointers, but 40 billion pointer copies aren't cheap - and on my 64-bit box, amounts to copying 320 billion bytes).

On the same data, list.sort() is doing a great many character comparisons, but doing _far_ less pointer copying, and that leaves it running circles around the radix sort - list.sort() can't copy pointers more than about N*log2(N) times, and log2(1000000) is much smaller than 40,000.

To be complete, I'll attach the code I used. Note that the paper is only concerned with "C strings": 0 bytes are special, acting as string terminators. 0 bytes aren't special in Python, so we need 257 buckets. I'm also using Python 3, where strings are always Unicode, and code points (`ord()` results) can be as large as 1,114,111. So this radix sort can't work directly on Python 3 strings; instead it expects a list of `bytes` (or `bytearray`) objects.

    def rsort(xs):
        count = [0] * 257
        bindex = [None] * 257
        stack = [(0, len(xs), 0)] # start index, length, byte index
        push = stack.append
        pop = stack.pop
        while stack:
            assert all(c == 0 for c in count)
            si, n, bi = pop()
            txs = xs[si: si + n]
            #if n < 30:
            #    xs[si: si + n] = sorted(txs)
            #    continue
            minbyte, maxbyte = 255, 0
            for x in txs:
                b = x[bi] if bi < len(x) else -1
                count[b] += 1
                if count[b] == 1 and b >= 0:
                    if b < minbyte:
                        minbyte = b
                    if b > maxbyte:
                        maxbyte = b
            bindex[-1] = si
            sofar = si + count[-1]
            count[-1] = 0
            for b in range(minbyte, maxbyte + 1):
                c = count[b]
                if c == 0:
                    continue
                if c > 1:
                    push((sofar, c, bi + 1))
                bindex[b] = sofar
                sofar += c
                count[b] = 0
            assert sofar == si + n
            for x in txs:
                b = x[bi] if bi < len(x) else -1
                xs[bindex[b]] = x
                bindex[b] += 1

A little utility to generate "random" input may also be helpful:

    def build(n, m):
        from random import randrange
        xs = []
        for _ in range(n):
            xs.append(bytes(randrange(256) for _ in range(randrange(m))))
        return xs

Then, e.g.,

    >>> xs = build(1000000, 20)
    >>> ys = xs[:]
    >>> rsort(xs)
    >>> assert xs == sorted(ys)
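The copy counts in the long-common-prefix example can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 8-byte pointer size is the 64-bit assumption from the message, and N*log2(N) is the rough ceiling quoted there, not a measured figure. The variable names are chosen here for illustration.

```python
import math

n = 1_000_000        # strings in the list
prefix = 20_000      # length of the shared "A" prefix
pointer_size = 8     # bytes per pointer, assuming a 64-bit build

# Two-array radix sort: every pass copies each element out and back again,
# and there is one pass per byte of the common prefix.
radix_copies = n * 2 * prefix
radix_bytes = radix_copies * pointer_size

# Rough ceiling on pointer moves for list.sort(), per the message.
merge_copies = n * math.log2(n)

print(f"radix sort pointer copies: {radix_copies:,}")
print(f"radix sort bytes moved:    {radix_bytes:,}")
print(f"log2(n):                   {math.log2(n):.2f}")
print(f"N*log2(N) ceiling:         {merge_copies:,.0f}")
```

The ratio between the two copy counts is simply 40,000 / log2(1,000,000), which is why the radix sort loses badly on this input even though it performs no string comparisons at all.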

Re: [Python-ideas] [Python-Dev] Drastically improving list.sort() for lists of strings/ints
by Tim Peters Sept. 11, 2016

[redirected from python-dev, to python-ideas; please send followups only to python-ideas]

[Elliot Gorokhovsky <elgo8537(a)colorado.edu>]
> ...
> TL;DR: Should I spend time making list.sort() detect if it is sorting
> lexicographically and, if so, use a much faster algorithm?

It will be fun to find out ;-)

As Mark, and especially Terry, pointed out, a major feature of the current sort is that it can exploit many kinds of pre-existing order. As the paper you referenced says, "Realistic sorting problems are usually far from random." But, although they did run some tests against data with significant order, they didn't test against any algorithms _aiming_ at exploiting uniformity. Just against their radix sort variants, and against a quicksort.

That's where it's likely next to impossible to guess in advance whether radix sort _will_ have a real advantage. All the kinds of order the current sort can exploit are far from obvious, because the mechanisms it employs are low-level & very general. For example, consider arrays created by this function, for even `n`:

    def bn(n):
        r = [None] * n
        r[0::2] = range(0, n//2)
        r[1::2] = range(n//2, n)
        return r

Then, e.g.,

    >>> bn(16)
    [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]

This effectively takes range(n), cuts it in half, and does a "perfect shuffle" on the two halves. You'll find nothing in the current code looking for this as a special case, but it nevertheless sorts such arrays in "close to" O(n) time, and despite that there's no natural run in the input longer than 2 elements.

That said, I'd encourage you to write your code as a new list method at first, to make it easiest to run benchmarks. If that proves promising, then you can worry about how to make a single method auto-decide which algorithm to use.

Also use the two-array version. It's easier to understand and to code, and stability is crucial now. The extra memory burden doesn't bother me - an array of C pointers consumes little memory compared to the memory consumed by the Python objects they point at.

Most of all, have fun! :-)
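One way to see the effect described for `bn(n)` is to count comparisons rather than time them. The sketch below is an assumption-laden illustration, not part of the original message: `Counted` and `count_comparisons` are hypothetical helpers introduced here, and the exact counts depend on the CPython version.

```python
import random


def bn(n):
    """Perfect shuffle of the two halves of range(n), for even n (from the post)."""
    r = [None] * n
    r[0::2] = range(0, n // 2)
    r[1::2] = range(n // 2, n)
    return r


class Counted:
    """Hypothetical helper: wraps a value and counts every `<` comparison."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value


def count_comparisons(values):
    """Sort a wrapped copy of `values` with list.sort(); return the number of `<` calls."""
    Counted.calls = 0
    wrapped = [Counted(v) for v in values]
    wrapped.sort()
    return Counted.calls


n = 1 << 14
randomly_ordered = list(range(n))
random.Random(0).shuffle(randomly_ordered)   # fixed seed for repeatability

bn_cmp = count_comparisons(bn(n))
random_cmp = count_comparisons(randomly_ordered)
print("comparisons for bn(n):      ", bn_cmp)
print("comparisons for random data:", random_cmp)
```

On CPython the first count should come out well below the second, even though `bn(n)` contains no natural run longer than two elements.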

Re: [Python-ideas] Shuffled
by Arek Bulski Sept. 11, 2016

> And if random.shuffle() returned a new list, other people would be bitten because they expected it to be in-place.

No one proposes that. shuffled() should be a new function.

> One moderately strong piece of evidence would be if this function is widely available in third-party libraries and other languages.

Wrong. Python is NOT known for doing everything the way it was done before. My father used statically typed variables and his father used statically typed variables so I will use...

> If so, then I would suggest that each list provides a .shuffle method as well (just as sorted/sort does).

Well, that could be problematic because the std lib would have to use a random generator. We should leave things that require a source of randomness somewhere in the random module. Would you agree on that?

Randomized unit testing is an obvious example. Besides, any argument that can be used for having shuffle() could also be made for adding shuffled(). Randomizing lists has been used for a long time. We only need a non-mutating analog. I am willing to make a patch.

Regards,
Arkadiusz Bulski

Fwd: Re: Null coalescing operator
by David Mertz Sept. 10, 2016

On Sat, Sep 10, 2016 at 4:10 PM, Guido van Rossum <guido(a)python.org> wrote:
>
> So you're offering `NoneCoalesce(x).bar` as less-ugly alternative to
> `x?.bar`... Color me unconvinced.

No, I'm offering a more realistic use pattern:

    for x in get_stuff():
        x = NoneCoalesce(x)
        # ... bunch of stuff with x ...
        # ... more stuff with nested keys or attributes ...
        x2 = x.foo
        x3 = x.bar.baz[x2]
        x4 = x(x.val)
        result = x3(x4)

As a less ugly alternative in the fairly uncommon case that you want None coalescing as the behavior of getting attributes, keys, call values, etc. that may or may not be available (AND where you don't want to wrap all of those access patterns in one try/except block).

In contrast, the ugly version of even this pretty simple toy code with the hypothetical syntax would be:

    for x in get_stuff():
        # ... bunch of stuff with x ...
        # ... more stuff with nested keys or attributes ...
        x2 = x?.foo
        x3 = x?.bar?.baz?[x2]
        x4 = x?(x?.val)
        result = x3?(x4)

This second case looks absolutely awful to me. And real world uses, if implemented, would quickly get much worse than that.

>
> On Sat, Sep 10, 2016 at 4:06 PM, David Mertz <mertz(a)gnosis.cx> wrote:
> > Actually, I guess the example I liked was from the year ago discussion. And
> > it didn't do *exactly* what I think a wrapper should. What I'd want would
> > be like this:
> >
> > class NoneCoalesce(object):
> >     "Standard operations on object for 'is not None'"
> >     def __init__(self, obj):
> >         self.obj = obj
> >
> >     def __getattr__(self, name):
> >         try:
> >             return getattr(self.obj, name)
> >         except AttributeError:
> >             return NoneCoalesce(None)
> >
> >     def __getitem__(self, item):
> >         try:
> >             return self.obj[item]
> >         except (TypeError, KeyError):
> >             return NoneCoalesce(None)
> >
> >     def __call__(self, *args, **kwds):
> >         try:
> >             return self.obj(*args, **kwds)
> >         except TypeError:
> >             return NoneCoalesce(None)
> >
> >     def __bool__(self):
> >         return self.obj is not None
> >
> >     def __repr__(self):
> >         return "NoneCoalesce[%r]" % self.obj
> >
> >     def __str__(self):
> >         return "NoneCoalesce[%r]" % self.obj
> >
> >     def __len__(self):
> >         try:
> >             return len(self.obj)
> >         except TypeError:
> >             return 0
> >
> >
> > Then we might use it similar to this:
> >
> > >>> from boltons.dictutils import OrderedMultiDict
> > >>> from NoneCoalesce import NoneCoalesce
> > >>> omd = OrderedMultiDict()
> > >>> omd['a'] = 1
> > >>> omd['b'] = 2
> > >>> omd.add('a', 3)
> > >>> nc = NoneCoalesce(omd)
> > >>> nc or "Spanish Inquisition"
> > Out[8]: NoneCoalesce[OrderedMultiDict([('a', 1), ('b', 2), ('a', 3)])]
> > >>> nc.spam or "Spam"
> > Out[9]: 'Spam'
> > >>> nc['nope'].bar.baz()
> > Out[10]: NoneCoalesce[None]
> > >>> nc['a']
> > Out[11]: 3
> > >>> nc.getlist('a')
> > Out[12]: [1, 3]
> >
> >
> > Nothing special about boltons' OrderedMultiDict here, just something I've
> > been playing with that has some distinctive methods.
> >
> > The idea is that we can easily have both "regular" behavior and None
> > coalescing just by wrapping any objects in a utility class... and WITHOUT
> > adding ugly syntax. I might have missed some corners where we would want
> > behavior wrapped, but those shouldn't be that hard to add in principle.
> >
> > On Sat, Sep 10, 2016 at 3:21 PM, David Mertz <mertz(a)gnosis.cx> wrote:
> >>
> >> I find the '?.' syntax very ugly, much more so in the examples of chained
> >> attributes.
> >>
> >> A much better way to handle the use case is to wrap objects in a class
> >> that gives this "propagating None" behavior with plain attribute access. A
> >> nice implementation was presented in this thread.
> >>
> >>
> >> On Sep 10, 2016 3:16 PM, "Random832" <random832(a)fastmail.com> wrote:
> >>>
> >>> On Sat, Sep 10, 2016, at 13:26, Guido van Rossum wrote:
> >>> > The way I recall it, we arrived at the perfect syntax (using ?) and
> >>> > semantics. The issue was purely strong hesitation about whether
> >>> > sprinkling ? all over your code is too ugly for Python
> >>>
> >>> I think that if there's "strong hesitation" about something being "too
> >>> ugly" it can't really be described as "the perfect syntax". IIRC there
> >>> were a couple alternatives being discussed that would have reduced the
> >>> number of question marks to one [or one per object which might be None].
>
> --
> --Guido van Rossum (python.org/~guido)

--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.
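The wrapper quoted in this message can be tried without boltons. The following is a minimal sketch that keeps only four of the quoted methods and wraps a plain dict; the `config` data and the `port` default are hypothetical values made up for the illustration.

```python
class NoneCoalesce:
    """Trimmed copy of the wrapper quoted in the message (four methods only)."""

    def __init__(self, obj):
        self.obj = obj

    def __getattr__(self, name):
        # Only called for names not found normally, so `self.obj` is safe here.
        try:
            return getattr(self.obj, name)
        except AttributeError:
            return NoneCoalesce(None)

    def __getitem__(self, item):
        try:
            return self.obj[item]
        except (TypeError, KeyError):
            return NoneCoalesce(None)

    def __call__(self, *args, **kwds):
        try:
            return self.obj(*args, **kwds)
        except TypeError:
            return NoneCoalesce(None)

    def __bool__(self):
        return self.obj is not None


config = NoneCoalesce({"host": "example.org"})   # hypothetical data

host = config["host"]                     # present key: the raw value comes back
port = config["port"].value() or 8080     # missing key: the chain stays falsy
```

Every failed lookup yields a falsy `NoneCoalesce(None)`, so a chain of attribute, item, and call accesses can be followed by a single `or default` at the end.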

Changing optimisation level from a script
by Damien George Sept. 10, 2016

Hi all,

When starting CPython from the command line you can pass the -O option to enable optimisations (e.g. `assert 0` won't raise an exception when -O is passed). But, AFAIK, there is no way to change the optimisation level after the interpreter has started up, i.e. there is no Python function call or variable that can change the optimisation.

In MicroPython we want to be able to change the optimisation level from within a script because (on bare metal at least) there is no analog of passing options like -O.

My idea would be to have a function like `sys.optimise(value)` that sets the optimisation level for all subsequent runs of the parser/compiler. For example:

    import sys
    import mymodule # no optimisations
    exec('assert 0') # will raise an exception
    sys.optimise(1) # enable optimisations
    import myothermodule # optimisations are enabled for this (unless it's already imported by mymodule)
    exec('assert 0') # will not raise an exception

What do you think? Sorry if this has been discussed before!

Cheers,
Damien.
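For comparison, CPython's built-in `compile()` accepts an `optimize` argument that selects the optimisation level for a single code object. The sketch below shows that per-call behaviour only; it is not the proposed `sys.optimise()`, which remains hypothetical, and it does not affect later `import` statements.

```python
source = "assert 0, 'assertions are active'"

# optimize=0: behaves like running without -O, so the assert fires.
unoptimised = compile(source, "<demo>", "exec", optimize=0)
try:
    exec(unoptimised)
    assert_fired = False
except AssertionError:
    assert_fired = True

# optimize=1: behaves like -O for this code object only; the assert is removed.
optimised = compile(source, "<demo>", "exec", optimize=1)
exec(optimised)  # completes silently

print("assert fired without optimisation:", assert_fired)
```

The difference from the proposal is scope: `compile(..., optimize=...)` applies to one piece of source at a time, whereas `sys.optimise(value)` as described would change the level for all subsequent compilation, including imports.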

Expose reasons for SSL/TLS cert verification failures
by Chi Hsuan Yen Sept. 9, 2016

Hi Python enthusiasts,

Currently _ssl.c always reports CERTIFICATE_VERIFY_FAILED for any certificate verification error. In OpenSSL, it's possible to distinguish the different reasons that lead to CERTIFICATE_VERIFY_FAILED. For example, https://expired.badssl.com/ reports X509_V_ERR_CERT_HAS_EXPIRED, and https://self-signed.badssl.com/ reports X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT. It seems CPython does not expose such information yet?

I hope it can be added to CPython, for example by creating a new exception class SSLCertificateError, a subclass of SSLError, that provides error codes like X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT. Any ideas?

The attachment is a naive attempt to printf some information about a verification failure. It's just a proof-of-concept and does not provide any practical advantage :)

Best,
Yen Chi Hsuan
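The proposed exception could look roughly like the sketch below. Everything here is an assumption made for illustration: `SSLCertificateError`, its `verify_code` and `verify_name` attributes, and the `classify` helper are hypothetical names, and nothing in this block talks to OpenSSL. Only the two numeric constants are taken from OpenSSL's x509_vfy.h.

```python
import ssl

# Numeric values as defined in OpenSSL's x509_vfy.h.
X509_V_ERR_CERT_HAS_EXPIRED = 10
X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT = 18


class SSLCertificateError(ssl.SSLError):
    """Hypothetical exception from the proposal; carries the OpenSSL verify code."""

    def __init__(self, verify_code, verify_name):
        super().__init__(verify_code, "certificate verify failed: " + verify_name)
        self.verify_code = verify_code
        self.verify_name = verify_name


def classify(exc):
    """Turn a verification failure into a short, user-facing reason."""
    if exc.verify_code == X509_V_ERR_CERT_HAS_EXPIRED:
        return "expired"
    if exc.verify_code == X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT:
        return "self-signed"
    return "other"


try:
    # Simulated failure; a real one would be raised from the handshake.
    raise SSLCertificateError(X509_V_ERR_CERT_HAS_EXPIRED,
                              "X509_V_ERR_CERT_HAS_EXPIRED")
except ssl.SSLError as exc:      # existing handlers for SSLError keep working
    reason = classify(exc)
```

Because the new class subclasses SSLError, code that already catches SSLError would be unaffected, while new code could branch on the specific verification failure.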

Re: [Python-ideas] Typecheckers: there can be only one
by Hugh Fisher Sept. 9, 2016

> From: Steven D'Aprano <steve(a)pearwood.info>
>
> I fear that you haven't grasped the fundamental difference between
> gradual static typing of a dynamically-typed language like Python, and
> non-gradual typing of statically-typed languages like C, Haskell, Java,
> etc. Your statement seems like a reasonable fear if you think of static
> typing as a mandatory pre-compilation step which prevents the code from
> compiling or running if it doesn't pass. But it makes no sense in the
> context of an optional code analysis step which warns of things which
> may lead to runtime exceptions.
>
> I think that a good analogy here is that type checkers in the Python
> ecosystem are equivalent to static analysis tools like Coverity and
> BLAST in the C ecosystem.

I guess time will show which of our analogies and expectations work out.

I've made my points, other people have made theirs, so I'll shut up now. Thanks to everyone who responded.

--
cheers,
Hugh Fisher

shuffled as a way to shuffle an iterable
by Xavier Combelle Sept. 8, 2016

While thinking about the shuffled thread, it occurred to me that it is quite natural to pass an iterable and expect a shuffled result. The two implementations mentioned so far come close but fail because they do not take this use case into account:

    def shuffled1(iterable):
        result = iterable[:]
        random.shuffle(result)
        return result

This one can fail because an arbitrary iterable does not support subscripting. range() does, but then it fails at the assignment step inside random.shuffle().

    def shuffled2(iterable):
        return random.sample(iterable, len(iterable))

This one works perfectly on range() but fails on a more general iterable without len(), such as shuffled2((i for i in range(5))).

Finally, this one works for every iterable:

    def shuffled3(iterable):
        result = list(iterable)
        random.shuffle(result)
        return result

I don't think the use case of shuffling an iterable is important enough to justify a function of its own, but if someone wants to implement shuffled() analogously to sorted(), this should be taken into account.
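The behaviour of the three variants can be checked with a short script. This is a sketch for illustration: the function bodies follow the message (with a `return` in `shuffled2`), while the `fails` helper is a hypothetical name added here to record whether a call raises TypeError.

```python
import random


def shuffled1(iterable):
    result = iterable[:]        # requires slicing support
    random.shuffle(result)      # requires item assignment
    return result


def shuffled2(iterable):
    # requires len() and a sequence argument
    return random.sample(iterable, len(iterable))


def shuffled3(iterable):
    result = list(iterable)     # works for any finite iterable
    random.shuffle(result)
    return result


def fails(func, arg):
    """Hypothetical helper: True if func(arg) raises TypeError."""
    try:
        func(arg)
    except TypeError:
        return True
    return False


print("shuffled1(range(5)) fails:  ", fails(shuffled1, range(5)))
print("shuffled1(generator) fails: ", fails(shuffled1, (i for i in range(5))))
print("shuffled2(generator) fails: ", fails(shuffled2, (i for i in range(5))))
print("shuffled3(generator) result:", shuffled3(i for i in range(5)))
```

Copying into a list first, as `sorted()` does internally, is what makes the third variant independent of the input type.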

kw to be ordered dict
by Arek Bulski Sept. 8, 2016

I have a piece of code that essentially reimplements OrderedDict. I have been wondering why the key order is not preserved correctly (tests fail).

    c = Container(a=1,b=2,c=3,d=4)
    self.assertEqual(c.keys(), ["a","b","c","d"])
    self.assertEqual(c.values(), [1,2,3,4])
    self.assertEqual(c.items(), [("a",1),("b",2),("c",3),("d",4)])

Then I looked at the collections.OrderedDict constructor docstring and it says:

    "Initialize an ordered dictionary. The signature is the same as regular dictionaries, but keyword arguments are not recommended because their insertion order is arbitrary."

I expected the **kw to reproduce the same order as the actual keyword arguments. The only right way to solve it, as it seems to me, would be to make **kw an OrderedDict.

Arkadiusz Bulski
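Whether `**kw` keeps the call-site order depends on the interpreter version. The sketch below assumes Python 3.6 or later, where PEP 468 guarantees that keyword arguments arrive in the order they were written; on earlier versions the same checks could fail, which matches the behaviour described in the message. `Container` here is a hypothetical stand-in built on OrderedDict, not the author's class.

```python
from collections import OrderedDict


class Container(OrderedDict):
    """Hypothetical stand-in for the Container class in the message."""


def keyword_order(**kw):
    """Return the keyword names in the order the function received them."""
    return list(kw)


names = keyword_order(a=1, b=2, c=3, d=4)

c = Container(a=1, b=2, c=3, d=4)
keys = list(c.keys())
values = list(c.values())
items = list(c.items())
```

With ordered `**kw`, a keyword-argument constructor can simply insert the items in the order received, and no change to the type of `**kw` itself is needed by the caller.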
