Thanks for looking at this!
> That's why I spent months of my life (overall) devising a sequence of
> sorting algorithms for Python that reduced the number of comparisons needed.
Yes, that's why I think this is so cool: for a couple dozen lines of code,
we can get (at least for some cases, according to my questionable
benchmarks) the kinds of massive improvements you had to use actual
computer science to achieve (as opposed to mere hackery).
> Note that when Python's current sort was adopted in Java, they still kept a
> quicksort variant for "unboxed" builtin types. The adaptive merge sort
> incurs many overheads that often cost more than they save unless
> comparisons are in fact very expensive compared to the cost of pointer
> copying (and in Java comparison of unboxed types is cheap). Indeed, for
> native numeric types, where comparison is dirt cheap, quicksort generally
> runs faster than mergesort despite that the former does _more_ comparisons
> (because mergesort does so much more pointer-copying).
Ya, I think this may be a good approach for floats: if the list is all
floats, just copy all the floats into a separate array, use the standard
library quicksort, and then construct a sorted PyObject* array. Like maybe
set up a struct { PyObject* payload, float key } type of deal. This
wouldn't work for strings (unicode is scary), and probably not for ints
(one would have to check that all the ints are within C long bounds).
Though on the other hand perhaps this would be too expensive?
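A pure-Python sketch of that decorate/sort/undecorate idea, just to show its
shape (the real win would come from sorting a C array of { payload pointer,
double key } pairs with a native sort, which this can't capture):

    def sort_by_unboxed_key(objs):
        pairs = [(float(o), o) for o in objs]   # "decorate" each object with its raw key
        pairs.sort(key=lambda p: p[0])          # C code would sort on the double directly
        objs[:] = [o for _, o in pairs]         # "undecorate"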
> I had considered something "like this" for Python 2, but didn't pursue it
> because comparison was defined between virtually any two types (34 < [1],
> etc), and people were careless about that (both by design and by
> accident). In Python 3, comparison "blows up" for absurdly mixed types, so
> specializing for homogeneously-typed lists is a more promising idea on the
> face of it.
> The comparisons needed to determine _whether_ a list's objects have a
> common type are just len(list)-1 C-level pointer comparisons, and so go
> fast. So I expect that, when it applies, this would speed even sorting an
> already-ordered list with at least 2 elements.
That's what my crude benchmarks indicate... when I applied my sort to a
list of 1e7 ints with a float tacked on the end, it actually ended up
being a bit faster over several trials (which I attribute to
PyObject_RichCompare == Py_True being faster than PyObject_RichCompareBool
== 1, apologies for any typos in that code).
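A rough pure-Python analogue of that pre-scan, for reference (the real check
compares Py_TYPE pointers at the C level, so it is far cheaper than this):

    def have_common_type(xs):
        if not xs:
            return True
        # len(xs) - 1 type checks; in C these are plain pointer comparisons
        t = type(xs[0])
        return all(type(x) is t for x in xs[1:])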
> For a mixed-type list with at least 2 elements, it will always be pure
> loss. But (a) I expect such lists are uncommon (and especially uncommon in
> Python 3); and (b) a one-time scan doing C-level pointer comparisons until
> finding a mismatched type is bound to be a relatively tiny cost compared to
> the expense of all the "rich comparisons" that follow.
> So +1 from me on pursuing this.
> Elliot, please:
> - Keep this on python-ideas. python-dev is for current issues in Python
> development, not for speculating about changes.
> - Open an issue on the tracker: https://bugs.python.org/
OK
> - At least browse the info for developers:
> https://docs.python.org/devguide/
Ya, I'm working on setting this up as a patch in the hg repo as opposed to
an extension module to make benchmarking cleaner/more sane.
> - Don't overlook Lib/test/sortperf.py. As is, it should be a good test of
> what your approach so far _doesn't_ help, since it sorts only lists of
> floats (& I don't think you're special-casing them). If the timing results
> it reports aren't significantly hurt (and I expect they won't be), then add
> specialization for floats too and gloat about the speedup :-)
Ya, I mean they aren't special-cased, but homogeneous lists of floats still
fit in the tp_richcompare case, which still bypasses the expensive
PyObject_RichCompare. I guess I'll see when I implement this as a patch
and can run it on sortperf.py.
> - I expect tuples will also be worth specializing (complex sort keys are
> often implemented as tuples).
I'm not sure what you mean here... I'm looking at the types of lo.keys, not
of saved_ob_item (I think I said that earlier in this thread by mistake
actually). So if someone is passing tuples and using itemgetter to extract
ints or strings or whatever, the current code will work fine; lo.keys will
be scalar types. Unless I misunderstand you here. I mean, when would
lo.keys actually be tuples?
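For what it's worth, a quick illustration (with made-up data) of when the keys
really are tuples:

    from operator import itemgetter

    rows = [("b", 2), ("a", 3), ("a", 1)]
    rows.sort()                             # no key function: the keys *are* the tuples
    rows.sort(key=itemgetter(0, 1))         # itemgetter with several indices -> tuple keys
    rows.sort(key=lambda r: (r[0], -r[1]))  # composite keys are usually built as tuples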
> Nice start! :-)
Thanks!
Sorry if I missed the boat, but only just now saw this PEP.
Glancing through the PEP, I don't see mentioned anywhere the SQL
alternative of having a coalesce() function:
https://www.postgresql.org/docs/9.6/static/functions-conditional.html#FUNCT…
In Python, something like this:
    def coalesce(*args):
        for arg in args:
            if arg is not None:
                return arg
        return None
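A quick usage illustration:

    print(repr(coalesce(None, '', 42)))   # '' -- first non-None value, even if falsy
    print(coalesce(None, None, 42))       # 42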
Just drop it into builtins, and voila. No need for lengthy discussions
about which operator to use because IMHO it needs no operator.
Sure, it's not as sexy as a fancy new operator, nor as headline grabbing,
but it is pretty useful.
Just my 2p.
--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
On Oct 14, 2016 9:14 AM, "Gustavo Carneiro" <gjcarneiro(a)gmail.com> wrote:
>
> Sorry if I missed the boat, but only just now saw this PEP.
>
> Glancing through the PEP, I don't see mentioned anywhere the SQL
alternative of having a coalesce() function:
https://www.postgresql.org/docs/9.6/static/functions-conditional.html#FUNCT…
>
> In Python, something like this:
>
> def coalesce(*args):
>     for arg in args:
>         if arg is not None:
>             return arg
>     return None
>
> Just drop it into builtins, and voila. No need for lengthy discussions
about which operator to use because IMHO it needs no operator.
>
> Sure, it's not as sexy as a fancy new operator, nor as headline grabbing,
but it is pretty useful.
That function is for a different purpose. It selects the first non-null
value from a flat collection. The coalesce operators are for traveling down
a reference tree, and shortcutting out without an exception if the path
ends.
For example:

    return x?.a?.b?.c

instead of:

    if x is None: return None
    if x.a is None: return None
    if x.a.b is None: return None
    return x.a.b.c
You can use try-catch, but you might catch an irrelevant exception.
    try:
        return x.a.b.c
    except AttributeError:
        return None
If `x` is an int, `x.a` will throw an AttributeError even though `x` is not
None.
A function for the above case is:
    def coalesce(obj, *names):
        for name in names:
            if obj is None:
                return None
            obj = getattr(obj, name)
        return obj

    return coalesce(x, 'a', 'b', 'c')
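(An equivalent one-liner, just as a sketch of another shape for the same helper:)

    from functools import reduce

    def coalesce(obj, *names):
        return reduce(lambda o, name: None if o is None else getattr(o, name),
                      names, obj)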
See this section for some examples:
https://www.python.org/dev/peps/pep-0505/#behavior-in-other-languages
(The PEP might need more simple examples. The Motivating Examples are full
chunks of code from real libraries, so they're full of distractions.)
On Thu, Oct 13, 2016 at 1:36 PM, Марк Коренберг <socketpair(a)gmail.com> wrote:
>
> I mean mutable containers that are always sorted when iterating over them.
>
> See http://bugs.python.org/issue28433
>
> for example:
>
> * SortedSet (sorted unique elements, implemented using (rb?)tree instead of hash)
> * SortedList (sorted elements, the same as SortedSet, but without a uniqueness constraint) - actually a (rb?)tree, not a list (i.e. not an array)
> * SortedDict (sorted by key when iterating) - like C++'s ordered_map
Can you share more about your use cases for these containers? What are
you making?
Nick Coghlan gave an answer to this question on StackOverflow at
http://stackoverflow.com/a/5958960/232571. The answer kind of boils
down to "there should be one obvious way to do it" and the fact that
existing Python features like lists, sorted, bisect, and heapq cover
many use cases.
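For instance, a small sketch of the bisect-based pattern those features support:

    import bisect

    scores = []
    for s in (31, 7, 19, 3):
        bisect.insort(scores, s)            # keeps the plain list sorted on insert
    print(scores)                           # [3, 7, 19, 31]
    print(bisect.bisect_left(scores, 19))   # O(log n) position lookup -> 2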
I wrote the answer that is now the second highest rated for that
question. I've noticed that the upvotes have been accumulating at a
slightly higher rate than on Nick's answer. I think that reflects an
increase in interest and maybe a gradual change in the tide of opinion.
> There are many implementations in the net, like:
>
> http://www.grantjenks.com/docs/sortedcontainers
That's my project; I'm the primary developer of SortedContainers.
You may also be interested in the
[SortedCollections](http://www.grantjenks.com/docs/sortedcollections/)
module which builds atop SortedContainers with data types like
ValueSortedDict and ItemSortedDict. Because it's pure-Python,
SortedContainers offers a lot of opportunity for
extension/customization. That's also made it easier for the API to
adapt/expand over time.
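For anyone who hasn't used it, a minimal usage sketch (assuming
`pip install sortedcontainers`):

    from sortedcontainers import SortedList, SortedDict

    sl = SortedList([5, 1, 3])
    sl.add(2)                    # stays sorted: SortedList([1, 2, 3, 5])
    print(sl[0], sl[-1])         # 1 5

    sd = SortedDict({'b': 2, 'a': 1})
    print(list(sd))              # ['a', 'b'] -- iteration is in key order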
> I think there should be one standardized implementation of such containers in CPython.
>
> Instead of trees, the implementation may use a skip list structure, but that is just an implementation detail.
>
> Such structures imply fast insertion and deletion, the ability to iterate, and also memory efficiency.
I gave a talk at PyCon 2016 about Python Sorted Collections[1] that's
worth watching. The first third discusses about six different
implementations with different strengths and weaknesses. The choice of
data type is more than an implementation detail. One of the biggest
issues is the various tradeoffs of data types like blists, rbtrees,
etc.
I have been meaning to discuss sorted collections further with Raymond
Hettinger (author of the Python collections module). We spoke after
the PyCon talk and wanted to continue the conversation. But I had a
busy summer and just a few weeks ago welcomed my first son into the
world. So realistically that conversation won't happen until 2017.
[1]: http://www.grantjenks.com/docs/sortedcontainers/pycon-2016-talk.html
> I recommend to read thorough review articles written by Andrew Barnert:
>
> http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.…
>
> http://stupidpythonideas.blogspot.com/2014/04/sortedcontainers.html
One of Andrew Barnert's conclusions is that SortedContainers could not
scale. I did a pretty rigorous performance analysis and benchmarking
at http://www.grantjenks.com/docs/sortedcontainers/performance-scale.html
Short answer: I scaled SortedContainers up through ten billion
elements, well past the memory limits of most machines.
Background: I asked a stackoverflow question here
<http://stackoverflow.com/questions/40029807/why-does-the-class-definitions-…>
.
The Python documentation
<https://docs.python.org/3.6/reference/datamodel.html#determining-the-approp…>
is very confusing to me. It says that:
if an explicit metaclass is given and it is not an instance of type, then
it is used directly as the metaclass
This seems to suggest that in this case, the "explicit metaclass" does not
need to be a "subtype of all of these candidate metaclasses" as it would in
the third case. (This is not true.)
Also, providing a callable as a metaclass doesn't seem to be any more
flexible, readable, or powerful than providing an instance of type.
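A minimal sketch contrasting the two kinds of explicit metaclass (the names
here are made up for illustration):

    class Meta(type):                       # an instance of type: the usual case
        pass

    def func_meta(name, bases, namespace):  # not an instance of type: used directly
        cls = type(name, bases, namespace)
        cls.built_by = 'func_meta'
        return cls

    class A(metaclass=Meta): pass
    class B(metaclass=func_meta): pass

    print(type(A))               # <class '__main__.Meta'>
    print(type(B), B.built_by)   # <class 'type'> func_meta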
Therefore, I suggest that we deprecate the second case and replace the
entire section (3.3.3.2) of the documentation to say:
"The metaclass of a class definition is selected from the explicitly
specified metaclass (if any) and the metaclasses (i.e. type(cls)) of all
specified base classes. The most derived metaclass is one which is a
subtype of all of these candidate metaclasses. If none of the candidate
metaclasses meets that criterion, then the class definition will fail with
TypeError. If provided, the explicit metaclass must be an instance of
type()."
On Thu, Oct 13, 2016 at 5:17 PM, Stephen J. Turnbull
<turnbull.stephen.fw(a)u.tsukuba.ac.jp> wrote:
> Chris Angelico writes:
>
> > I'm not sure what you mean by "strcmp-able"; do you mean that the
> > lexical ordering of two Unicode strings is guaranteed to be the same
> > as the byte-wise ordering of their UTF-8 encodings?
>
> This is definitely not true for the Han characters. In Japanese, the
> most commonly used lexical ordering is based on the pronunciation,
> meaning that there are few characters (perhaps none) in common use
> that have a unique place in lexical ordering (most individual
> characters have multiple pronunciations, and even many whole personal
> names do).
Yeah, and even just with Latin-1 characters, you have (a) non-ASCII
characters that sort between ASCII characters, and (b) characters that
have different meanings in different languages, and should be sorted
differently. So lexicographical ordering is impossible in a generic
string sort.
ChrisA
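A couple of quick illustrations that code-point order is not linguistic order
(the locale call assumes a suitable locale is configured on the system):

    import locale

    print(sorted(["apple", "Zebra"]))   # ['Zebra', 'apple'] -- 'Z' (U+005A) < 'a' (U+0061)
    print("\u00e9" > "z")               # True -- 'é' sorts after every ASCII letter

    locale.setlocale(locale.LC_COLLATE, "")
    print(sorted(["cote", "côté", "côte", "coté"], key=locale.strxfrm))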
To answer your question: I special-case unicode (strings), ints, and
floats. I am working on special-casing tuples (can even be different types,
just need homogeneity column-wise). The best speedups will be tuples of
floats: it'll bypass three layers of useless checks.
If I run it without special-casing floats (just using tp_richcompare) I
only get like a 5-10% speedup. I'm working on rigorous benchmarks for all
this stuff, will post a pdf along with the patch once it's all done. But
it's certainly <10%. However, part of this is because my special case is
really low-level; for strings I've actually found the opposite, using
tp_richcompare gives me almost the same results as my special case
compare, since it still has to PyUnicode_READY the strings (or whatever
it's called).
Regarding generalization: the general technique for special-casing is that you
just replace each type check with 1 or 0 by applying the type assumption
you're making. That's the only way to guarantee it's safe and compliant.
Elliot
On Tue, Oct 11, 2016, 5:19 PM Jim J. Jewett <jimjjewett(a)gmail.com> wrote:
> Excellent.
> I'm surprised cache didn't save more, but less surprised than I was ... I
> hadn't realized that you were skipping the verifications in
> PyFloat_RichCompare as well. Does that generalize to other element types
> without exposing too much of the per-type internals to list.sort?
>
> Oh ... and I appreciate your not quoting private email as a general
> courtesy, but I hereby give you permission if it was mine that was private.
> [Though I think your summary was better than a quote anyhow.]
>
> -jJ
>
> On Oct 11, 2016 4:58 PM, "Elliot Gorokhovsky" <
> elliot.gorokhovsky(a)gmail.com> wrote:
>
> So I got excited here. And the reason why is that I got those numbers *on
> Tim's benchmark*. When I got these kinds of numbers on my benchmarks, I
> figured there was probably a problem with they way I was timing, and
> certainly the gains couldn't be as extreme as they suggested. But this is
> on a benchmark that's already in the codebase!
>
>
> Here is a detailed explanation of how to reproduce my results, and the
> circumstances that would lead them to be invalid:
>
> ******************************************
>
> To reproduce, just activate a virtualenv, and then clone
> https://github.com/embg/python-fast-listsort.git. Then python setup.py
> install and python sortperf.py.
>
>
> Now let's look at what sortperf.py does and how it relates to Tim's
> benchmark at Lib/test/sortperf.py. If you diff the two, you'll find I made
> three changes:
>
>
> 1. I added an import, "import fastlist". This obviously would not make
> sorting twice faster.
>
>
> 2. I changed the way it formats the output: I changed "fmt = ("%2s %7s" +
> " %7s"*len(cases))" to "fmt = ("%2s %7s" + " %6s"*len(cases))". Again
> irrelevant.
>
>
> 3. I changed the timing function
>
> #from this
>
>
> def doit_fast(L):
>     t0 = time.perf_counter()
>     L.fastsort()
>     t1 = time.perf_counter()
>     print("%6.2f" % (t1-t0), end=' ')
>     flush()
>
>
>
> #to this
>
>
> def doit(L):
>     F = FastList(L)
>     f0 = time.perf_counter()
>     F.fastsort()
>     f1 = time.perf_counter()
>     F = FastList(L)
>     t0 = time.perf_counter()
>     F.sort()
>     t1 = time.perf_counter()
>     print("%6.2f%%" % (100*(1-(f1-f0)/(t1-t0))), end=' ')
>     flush()
>
>
> *******************************************
>
> So what we've shown is that (1) if you trust the existing sorting
> benchmark and (2) if my modification to doit() doesn't mess anything up (I
> leave this up to you to judge), then the measurements are valid. Which
> is a pretty big deal (50% !!!!!!!), hence my overexcitement.
>
> ****************************************
>
>
> Now I'd like to respond to the responses I've gotten (the one I'm thinking
> of was off-list so I don't want to quote it) questioning how it could be
> possible for such a small optimization, bypassing the typechecks, to have
> such a large effect, even in theory. Here's my answer:
>
> Let's ignore branch prediction and cache for now and just look at a high
> level. The cost of sorting is related to the cost of a single comparison,
> because the vast majority of our time (let's say certainly at least 90%,
> depending on the list) is spent in comparisons. So let's look at the cost
> of a comparison.
>
> Without my optimization, comparisons for floats (that's what this
> benchmark looks at) go roughly like this:
>
> 1. Test type of left and right for PyObject_RichCompare (which costs two
> pointer dereferences) and compare them. "3 ops" (quotes because counting
> ops like this is pretty hand-wavy). "2 memory accesses".
>
> 2. Get the address of the float compare method from
> PyFloat_Type->tp_richcompare. "1 op". "1 memory access".
>
> 3. Call the function whose address we just got. "1 op". "Basically 0
> memory accesses because we count the stack stuff in that 1 op".
>
> 4. Test type of left and right again in PyFloat_RichCompare and compare
> both of them to PyFloat_Type. "4 ops". "2 memory accesses".
>
> 5. Get floats from the PyObject* by calling PyFloat_AS_DOUBLE or whatever.
> "2 ops". "2 memory accesses".
>
> 6. Compare the floats and return. "2 ops".
>
> Now let's tally the "cost" (sorry for use of quotes here, just trying to
> emphasize this is an intuitive, theoretical explanation for the numbers
> which doesn't take into account the hardware):
> "13 ops, 7 memory accesses".
>
> Here's what it looks like in my code:
>
> 1. Call PyFloat_AS_DOUBLE on left and right. "2 ops". "2 memory accesses".
>
> 2. Compare the floats and return. "2 ops".
>
> Tally: "4 ops, 2 memory accesses".
>
> Now you can argue branch prediction alleviates a lot of this cost, since
> we're taking the same branches every time. But note that, branch prediction
> or not, we still have to do all of those memory accesses, and since they're
> pointers to places all over memory, they miss the cache basically every
> time (correct me if I'm wrong). So memory-wise, we really are doing
> something like a 7:2 ratio, and op-wise, perhaps not as bad because of
> branch prediction, but still, 13:4 is probably bad no matter what's going
> on in the hardware.
>
> Now consider that something like 90% of our time is spent in those steps.
> Are my numbers really that unbelievable?
>
> Thanks for everything, looking forward to writing this up as a nice latex
> doc with graphs and perf benchmarks and all the other rigorous goodies, as
> well as a special case cmp func for homogeneous tuples and a simple patch
> file,
>
> Elliot
>
>
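As a rough sanity check on those numbers, an Amdahl's-law style calculation
using only the estimates quoted above (the 90% figure and the 13-vs-4 op
counts):

    compare_fraction = 0.90        # share of sort time spent in comparisons (estimate above)
    per_compare_speedup = 13 / 4   # "13 ops" down to "4 ops"
    overall = 1 / ((1 - compare_fraction) + compare_fraction / per_compare_speedup)
    print(overall)                 # ~2.65x, i.e. comfortably more than a 50% time reduction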
[please restrict follow-ups to python-ideas]
Let's not get hung up on meta-discussion here - I always thought "massive
clusterf**k" was a precise technical term anyway ;-)
While timing certainly needs to be done more carefully, it's obvious to me
that this approach _should_ pay off significantly when it applies.
Comparisons are extraordinarily expensive in Python, precisely because of
the maze of test-and-branch code it requires just to figure out which
bottom-level comparison function to invoke each time. That's why I spent
months of my life (overall) devising a sequence of sorting algorithms for
Python that reduced the number of comparisons needed.
Note that when Python's current sort was adopted in Java, they still kept a
quicksort variant for "unboxed" builtin types. The adaptive merge sort
incurs many overheads that often cost more than they save unless
comparisons are in fact very expensive compared to the cost of pointer
copying (and in Java comparison of unboxed types is cheap). Indeed, for
native numeric types, where comparison is dirt cheap, quicksort generally
runs faster than mergesort despite that the former does _more_ comparisons
(because mergesort does so much more pointer-copying).
I had considered something "like this" for Python 2, but didn't pursue it
because comparison was defined between virtually any two types (34 < [1],
etc), and people were careless about that (both by design and by
accident). In Python 3, comparison "blows up" for absurdly mixed types, so
specializing for homogeneously-typed lists is a more promising idea on the
face of it.
The comparisons needed to determine _whether_ a list's objects have a
common type are just len(list)-1 C-level pointer comparisons, and so go
fast. So I expect that, when it applies, this would speed even sorting an
already-ordered list with at least 2 elements.
For a mixed-type list with at least 2 elements, it will always be pure
loss. But (a) I expect such lists are uncommon (and especially uncommon in
Python 3); and (b) a one-time scan doing C-level pointer comparisons until
finding a mismatched type is bound to be a relatively tiny cost compared to
the expense of all the "rich comparisons" that follow.
So +1 from me on pursuing this.
Elliot, please:
- Keep this on python-ideas. python-dev is for current issues in Python
development, not for speculating about changes.
- Open an issue on the tracker: https://bugs.python.org/
- At least browse the info for developers:
https://docs.python.org/devguide/
- Don't overlook Lib/test/sortperf.py. As is, it should be a good test of
what your approach so far _doesn't_ help, since it sorts only lists of
floats (& I don't think you're special-casing them). If the timing results
it reports aren't significantly hurt (and I expect they won't be), then add
specialization for floats too and gloat about the speedup :-)
- I expect tuples will also be worth specializing (complex sort keys are
often implemented as tuples).
Nice start! :-)
Hi guys,
I'm writing some code that uses `threading.Condition` and I found that I
want to access condition._waiters. I want to do it in two different parts
of my code for two different reasons:
1. When shutting down the thread that manages the condition, I want to be
sure that there are no waiters on the condition, so I check whether
`condition._waiters` is empty before exiting; otherwise I'll let them
finish and only then exit. (One workaround is sketched after this list.)
2. When I do notify_all, I actually want to do as many notify actions as
needed until there's a full round of notify_all in which none of the
conditions for any of the waiters have been met. Only then do I want my
code to continue. (It's because these waiters are waiting for resources
that I'm giving them, each wanting a different number of resources, and I
want to be sure that all of them are starved before I get more resources
for them.)
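For the first case, a sketch of one way to avoid touching the private
`_waiters` list: keep an explicit count next to the condition (the class name
here is made up, and this is only a minimal illustration):

    import threading

    class CountedCondition:
        def __init__(self):
            self._cond = threading.Condition()
            self._count = 0

        def wait_for(self, predicate):
            with self._cond:
                self._count += 1
                try:
                    return self._cond.wait_for(predicate)
                finally:
                    self._count -= 1

        def notify_all(self):
            with self._cond:
                self._cond.notify_all()

        def waiters(self):
            with self._cond:
                return self._count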
Do you think it'll be a good idea to add non-private functionality like
that to threading.Condition?
Thanks,
Ram.