PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
I've never liked the "".join([]) idiom for string concatenation; in my opinion it violates the principles "Beautiful is better than ugly." and "There should be one-- and preferably only one --obvious way to do it." (And perhaps several others.) To that end I've submitted patch #1569040 to SourceForge:

http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470

This patch speeds up using + for string concatenation. It's been in discussion on c.l.p for about a week, here:

http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3...

I'm not a Python guru, and my initial benchmark had many mistakes. With help from the community, correct benchmarks emerged: + for string concatenation is now roughly as fast as the usual "".join() idiom when appending. (It appears to be *much* faster for prepending.) The patched Python passes all the tests in regrtest.py for which I have source; I didn't install external packages such as bsddb and sqlite3.

My approach was to add a "string concatenation" object; I have since learned this is also called a "rope". Internally, a PyStringConcatationObject is exactly like a PyStringObject but with a few extra members taking an additional thirty-six bytes of storage. When you add two PyStringObjects together, string_concat() returns a PyStringConcatationObject which contains references to the two strings. Concatenating any mixture of PyStringObjects and PyStringConcatationObjects works similarly, though there are some internal optimizations.

These changes are almost entirely contained within Objects/stringobject.c and Include/stringobject.h. There is one major externally-visible change in this patch: PyStringObject.ob_sval is no longer a char[1] array, but a char *. Happily, this only requires a recompile, because the CPython source is *marvelously* consistent about using the macro PyString_AS_STRING(). (One hopes extension authors are as consistent.) I only had to touch two other files (Python/ceval.c and Objects/codeobject.c) and those were one-line changes. There is one remaining place that still needs fixing: the self-described "hack" in Mac/Modules/MacOS.c. Fixing that is beyond my pay grade.

I changed the representation of ob_sval for two reasons: first, it is initially NULL for a string concatenation object, and second, because it may point to separately-allocated memory. That's where the speedup came from--it doesn't render the string until someone asks for the string's value.

It is telling to see my new implementation of PyString_AS_STRING, as follows (casts and extra parentheses removed for legibility):

    #define PyString_AS_STRING(x) ( x->ob_sval ? x->ob_sval : PyString_AsString(x) )

This adds a layer of indirection for the string and a branch, adding a tiny (but measurable) slowdown to the general case. Again, because the changes to PyStringObject are hidden by this macro, external users of these objects don't notice the difference.

The patch is posted, and I have donned the thickest skin I have handy. I look forward to your feedback.

Cheers, /larry/
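For concreteness, here is a rough sketch of the kind of object and macro described above. The field names and layout are illustrative guesses, not the patch's actual code (which lives in Objects/stringobject.c and Include/stringobject.h):

    /* Hypothetical layout of a lazy concatenation ("rope") object.
     * The real object reportedly carries more bookkeeping (an extra
     * thirty-six bytes); this shows only the essential idea. */
    typedef struct {
        PyObject_VAR_HEAD
        long ob_shash;          /* cached hash, as in PyStringObject */
        int ob_sstate;          /* interning state, as in PyStringObject */
        char *ob_sval;          /* NULL until the string is rendered */
        PyObject *ob_left;      /* owned reference to the left operand */
        PyObject *ob_right;     /* owned reference to the right operand */
    } PyStringConcatationObject;

    /* The macro only pays for the laziness when it must: if ob_sval is
     * already set it behaves exactly as before; otherwise it calls
     * PyString_AsString(), which renders the tree and fills in ob_sval. */
    #define PyString_AS_STRING(op) \
        (((PyStringObject *)(op))->ob_sval != NULL \
            ? ((PyStringObject *)(op))->ob_sval \
            : PyString_AsString((PyObject *)(op)))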
I've never liked the "".join([]) idiom for string concatenation; in my opinion it violates the principles "Beautiful is better than ugly." and "There should be one-- and preferably only one --obvious way to do it.". (And perhaps several others.) To that end I've submitted patch #1569040 to SourceForge:
http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470 This patch speeds up using + for string concatenation.
yay! i'm glad to see this. i hate the "".join syntax. i still write that as string.join() because that's at least readable. it also fixes the python idiom for fast string concatenation as intended; anyone who's ever written code that builds a large string value by pushing substrings into a list only to call join later should agree.

    mystr = "prefix"
    while bla:
        #...
        mystr += moredata

is much nicer to read than

    mystr = "prefix"
    strParts = [mystr]
    while bla:
        #...
        strParts.append(moredata)
    mystr = "".join(strParts)

have you run any generic benchmarks such as pystone to get a better idea of what the net effect on "typical" python code is?
"Gregory P. Smith" <greg@electricrain.com> wrote:
I've never liked the "".join([]) idiom for string concatenation; in my opinion it violates the principles "Beautiful is better than ugly." and "There should be one-- and preferably only one --obvious way to do it.". (And perhaps several others.) To that end I've submitted patch #1569040 to SourceForge:
http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470 This patch speeds up using + for string concatenation.
yay! i'm glad to see this. i hate the "".join syntax. i still write that as string.join() because thats at least readable). it also fixes the python idiom for fast string concatenation as intended; anyone whos ever written code that builds a large string value by pushing substrings into a list only to call join later should agree.
    mystr = "prefix"
    while bla:
        #...
        mystr += moredata
Regardless of "nicer to read", I would just point out that Guido has stated that Python will not have strings implemented as trees. Also, Python 3.x will have a data type called 'bytes', which will be the default return of file.read() (when files are opened as binary), which uses an over-allocation strategy like lists to get relatively fast concatenation (on the order of lst1 += lst2). - Josiah
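(The over-allocation strategy mentioned here amounts to growing a buffer geometrically, so a long run of appends triggers only O(log n) reallocations and costs amortized O(1) per byte. A minimal sketch of the idea, not the actual CPython list or bytes code:)

    #include <stdlib.h>
    #include <string.h>

    /* Append n bytes, growing the allocation by a proportional amount
     * whenever it fills up. */
    static int
    buf_append(char **buf, size_t *len, size_t *alloc,
               const char *data, size_t n)
    {
        if (*len + n > *alloc) {
            size_t newalloc = (*len + n) + ((*len + n) >> 3) + 64;
            char *p = realloc(*buf, newalloc);
            if (p == NULL)
                return -1;      /* caller turns this into MemoryError */
            *buf = p;
            *alloc = newalloc;
        }
        memcpy(*buf + *len, data, n);
        *len += n;
        return 0;
    }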
On 5 Oct 2006, at 20:28, Gregory P. Smith wrote:
I've never liked the "".join([]) idiom for string concatenation; in my opinion it violates the principles "Beautiful is better than ugly." and "There should be one-- and preferably only one --obvious way to do it.". (And perhaps several others.) To that end I've submitted patch #1569040 to SourceForge:
http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470 This patch speeds up using + for string concatenation.
yay! i'm glad to see this. i hate the "".join syntax.
Hear, hear. Being able to write what you mean and have the language get decent performance nonetheless seems to me to be a "good thing".
have you run any generic benchmarks such as pystone to get a better idea of what the net effect on "typical" python code is?
Yeah, "real world" performance testing is always important with anything that uses lazy evaluation. If you get to control if and when the computation actually happens you have even more scope than usual for getting the benchmark answer you want to see! Cheers, Nicko
Gregory P. Smith wrote:
have you run any generic benchmarks such as pystone to get a better idea of what the net effect on "typical" python code is?

I hadn't, but I'm happy to. On my machine (a fire-breathing Athlon 64 x2 4400+), best of three runs:
Python 2.5 release:
    Pystone(1.1) time for 50000 passes = 1.01757
    This machine benchmarks at 49136.8 pystones/second

Python 2.5 concat:
    Pystone(1.1) time for 50000 passes = 0.963191
    This machine benchmarks at 51910.8 pystones/second

I'm surprised by this; I had expected it to be slightly *slower*, not the other way 'round. I'm not sure why this is. A cursory glance at pystone.py doesn't reveal any string concatenation using +, so I doubt it's benefiting from my speedup. And I didn't change the optimization flags when I compiled Python, so that should be the same.

Josiah Carlson wrote:
Regardless of "nicer to read", I would just point out that Guido has stated that Python will not have strings implemented as trees.
I suspect it was more a caution that Python wouldn't *permanently* store strings as "ropes". In my patch, the rope only exists until someone asks for the string's value, at which point the tree is rendered and dereferenced. From that point on the object is exactly like a normal PyStringObject to the external viewer.

But you and I are, as I believe the saying goes, "channeling Guido (badly)". Perhaps some adult supervision will intervene soon and make its opinions known.

For what it's worth, I've realized two things I want to change about my patch:

* I left in a couple of /* lch */ comments I used during development as markers to find my own code. Whoops; I'll strip those out.

* I realized that, because of struct packing, all PyStringObjects are currently wasting an average of two bytes apiece. (As in, that's something Python 2.5 does, not something added by my code.) I'll change my patch so strings are allocated more precisely. If my string concatenation patch is declined, I'll be sure to submit this patch separately.

I'll try to submit an updated patch today.

Cheers, /larry/
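(For anyone curious about the struct-packing point, it can be checked directly against the stock 2.5 headers; this little program is an illustration of the claim, not part of the patch:)

    #include <stddef.h>
    #include <stdio.h>
    #include "Python.h"

    /* A string is allocated roughly as
     *     PyObject_MALLOC(sizeof(PyStringObject) + size)
     * but its data only needs offsetof(PyStringObject, ob_sval) + size + 1
     * bytes; the difference is alignment padding paid by every string. */
    int main(void)
    {
        printf("sizeof(PyStringObject)            = %lu\n",
               (unsigned long)sizeof(PyStringObject));
        printf("offsetof(PyStringObject, ob_sval) = %lu\n",
               (unsigned long)offsetof(PyStringObject, ob_sval));
        return 0;
    }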
Gregory P. Smith wrote:
I've never liked the "".join([]) idiom for string concatenation; in my opinion it violates the principles "Beautiful is better than ugly." and "There should be one-- and preferably only one --obvious way to do it.". (And perhaps several others.) To that end I've submitted patch #1569040 to SourceForge:
http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470 This patch speeds up using + for string concatenation.
yay! i'm glad to see this. i hate the "".join syntax. i still write that as string.join() [...]
instance.method(*args) <==> type.method(instance, *args)

You can nowadays spell this as str.join("", lst) - no need to import a whole module!

regards
Steve
Steve Holden wrote:
instance.method(*args) <==> type.method(instance, *args)
You can nowadays spell this as str.join("", lst) - no need to import a whole module!
except that str.join isn't polymorphic:
>>> str.join(u",", ["1", "2", "3"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'join' requires a 'str' object but received a 'unicode'
>>> string.join(["1", "2", "3"], u",")
u'1,2,3'
</F>
Greg> have you run any generic benchmarks such as pystone to get a
Greg> better idea of what the net effect on "typical" python code is?

MAL's pybench would probably be better for this, presuming it does some addition with string operands.

Skip
skip@pobox.com wrote:
Greg> have you run any generic benchmarks such as pystone to get a
Greg> better idea of what the net effect on "typical" python code is?
MAL's pybench would probably be better for this presuming it does some addition with string operands.
or stringbench. </F>
Fredrik Lundh wrote:
skip@pobox.com wrote:
MAL's pybench would probably be better for this presuming it does some addition with string operands.
or stringbench.
I ran 'em, and they are strangely consistent with pystone.

With concat, stringbench is ever-so-slightly faster overall: "172.82" vs "174.85" for the "ascii" column, I guess that's in seconds. I'm just happy it's not slower. (I only ran stringbench once; it seems to take *forever*.)

I ran pybench three times for each build. The slowest concat overall time was still 2.9% faster than the fastest release time. "ConcatStrings" is a big winner, at around 150% faster; since the test doesn't *do* anything with the concatenated values, it never renders the concatenation objects, so it does a lot less work. "CreateStringsWithConcat" is generally 18-19% faster, as expected. After that, the timings are all over the place, but some tests were consistently faster: "CompareInternedStrings" was 8-12% faster, "DictWithFloatKeys" was 9-11% faster, "SmallLists" was 8-15% faster, "CompareLongs" was 6-10% faster, and "PyMethodCalls" was 4-6% faster. (These are all comparing the "average run-time" results, though the "minimum run-time" results were similar.)

I still couldn't tell you why my results are faster. I swear on my mother's eyes I didn't touch anything major involved in "DictWithFloatKeys", "SmallLists", or "CompareLongs". I didn't touch the compiler settings, so that shouldn't be it. I acknowledge both that it could all be a mistake and that I don't know enough about it to speculate.

The speedup mystery continues,

*larry*
Larry Hastings wrote:
Fredrik Lundh wrote:
skip@pobox.com wrote:
MAL's pybench would probably be better for this presuming it does some addition with string operands.
or stringbench.
I ran 'em, and they are strangely consistent with pystone.
With concat, stringbench is ever-so-slightly faster overall. "172.82" vs "174.85" for the "ascii" column, I guess that's in seconds. I'm just happy it's not slower. (I only ran stringbench once; it seems to take *forever*).
I ran pybench three times for each build. The slowest concat overall time was still 2.9% faster than the fastest release time. "ConcatStrings" is a big winner, at around 150% faster; since the test doesn't *do* anything with the concatenated values, it never renders the concatenation objects, so it does a lot less work. "CreateStringsWithConcat" is generally 18-19% faster, as expected. After that, the timings are all over the place, but some tests were consistently faster: "CompareInternedStrings" was 8-12% faster, "DictWithFloatKeys" was 9-11% faster, "SmallLists" was 8-15% faster, "CompareLongs" was 6-10% faster, and "PyMethodCalls" was 4-6% faster. (These are all comparing the "average run-time" results, though the "minimum run-time" results were similar.)
When comparing results, please look at the minimum runtime. The average times are just given to indicate how much the mintime differs from the average of all runs.
I still couldn't tell you why my results are faster. I swear on my mother's eyes I didn't touch anything major involved in "DictWithFloatKeys", "SmallLists", or "CompareLongs". I didn't touch the compiler settings, so that shouldn't be it. I acknowledge both that it could all be a mistake and that I don't know enough about it to speculate.
Depending on what you changed, it is possible that the layout of the code in memory better fits your CPU architecture.

If, however, the speedups are not consistent across several runs of pybench, then it's likely that you have some background activity going on on the machine which causes a slowdown in the unmodified run you chose as basis for the comparison.

Just to make sure: you are using pybench 2.0, right?

-- Marc-Andre Lemburg, eGenix.com
I've uploaded a new patch to Sourceforge in response to feedback:

* I purged all // comments and fixed all lines over 80 characters added by my patch, as per Neal Norwitz.

* I added a definition of max() for those who don't already have one, as per skip@pobox.com. It now compiles cleanly on Linux again without modification; sorry for not checking that since the original patch.

I've also uploaded my hacked-together benchmark script, for all that's worth. That patch tracker page again:

http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470

M.-A. Lemburg wrote:
When comparing results, please look at the minimum runtime. The average times are just given to indicate how much the mintime differs from the average of all runs.
I'll do that next time. In the meantime, I've also uploaded a zip file containing the results of my benchmarking, including the stdout from the run and the "-f" file which contains the pickled output. So you can examine my results yourself, including doing analysis on the pickled data if you like.
If however the speedups are not consistent across several runs of pybench, then it's likely that you have some background activity going on on the machine which causes a slowdown in the unmodified run you chose as basis for the comparison.
The machine is dual-core, and was quiescent at the time. XP's scheduler is hopefully good enough to just leave the process running on one core.

I ran the benchmarks just once on my Linux 2.6 machine; it's a dual-CPU P3 933EB (or maybe just 866EB, I forget). It's faster overall there too, by 1.9% (minimum run-time). The two tests I expected to be faster ("ConcatStrings" and "CreateStringsWithConcat") were consistently much faster; beyond that the results don't particularly resemble the results from my XP machine. (I uploaded those .txt and .pickle files too.)

The mystery overall speedup continues, not that I find it unwelcome. :)
Just to make sure: you are using pybench 2.0, right ?
I sure was. And I used stringbench.py downloaded from here: http://svn.python.org/projects/sandbox/branches/jim-fix-setuptools-cli/strin... Cheers, /larry/
Larry Hastings <larry@hastings.org> wrote: [snip]
The machine is dual-core, and was quiescent at the time. XP's scheduler is hopefully good enough to just leave the process running on one core.
It's not. Go into the task manager (accessible via Ctrl+Alt+Del by default) and change the process' affinity to the second core. In my experience, running on the second core (in both 2k and XP) tends to produce slightly faster results. Linux tends to keep processes on a single core for a few seconds at a time.

- Josiah
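(For reference, the same pinning can be done programmatically with the Win32 API rather than through Task Manager; a minimal sketch, where the choice of the second logical CPU is just an assumption:)

    #include <windows.h>

    int main(void)
    {
        DWORD_PTR mask = (DWORD_PTR)1 << 1;  /* bit 1 = second logical CPU */
        SetProcessAffinityMask(GetCurrentProcess(), mask);
        /* ... run or spawn the benchmark from here ... */
        return 0;
    }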
Gregory P. Smith wrote:
I've never liked the "".join([]) idiom for string concatenation; in my opinion it violates the principles "Beautiful is better than ugly." and "There should be one-- and preferably only one --obvious way to do it.". (And perhaps several others.) To that end I've submitted patch #1569040 to SourceForge:
http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470 This patch speeds up using + for string concatenation.
yay! i'm glad to see this. i hate the "".join syntax. i still write that as string.join() because thats at least readable). it also fixes the python idiom for fast string concatenation as intended; anyone whos ever written code that builds a large string value by pushing substrings into a list only to call join later should agree.
Well, I always like things to run faster, but I disagree that this idiom is broken.

I like using lists to store substrings, and I think it's just a matter of changing your frame of reference in how you think about them. For example, it doesn't bother me to have a numeric type with many digits, and to have lists of many, many digit numbers, and work with those. Working with lists of many character strings is not that different.

I've even come to the conclusion (just my opinion) that mutable lists of strings probably would work better than a long mutable string of characters in most situations.

What I've found is there seems to be an optimum string length depending on what you are doing. Too long (hundreds or thousands of characters) and repeating some string operations (not just concatenations) can be slow (relative to short strings), and using many short (single character) strings would use more memory than is needed. So a list of medium-length strings is actually a very nice compromise. I'm not sure what the optimal string length is, but lines of about 80 columns seem to work very well for most things.

I think what may be missing is a larger set of higher-level string functions that will work with lists of strings directly. Then lists of strings can be thought of as a mutable string type by its use, and working with substrings in lists and using ''.join() will not seem as out of place.

So maybe instead of splitting, modifying, then joining (and again, etc ...), just pass the whole list around and have operations that work directly on the list of strings and return a list of strings as the result. Pretty much what the patch does under the covers, but it only works with concatenation. Having more functions that work with lists of strings directly will reduce the need for concatenation as well.

Some operations that could work well with whole lists of strings of lines may be indent_lines, dedent_lines, prepend_lines, wrap_lines, and of course join_lines as in '\n'.join(L), the inverse of s.splitlines(); there are also readlines() and writelines(). Also possibly find_line or find_in_lines(). These really shouldn't seem any more out of place than numeric operations that work with lists, such as sum, max, and min.

So to me... "".join(L) as a string operation that works on a list of strings seems perfectly natural. :-)

Cheers, Ron
Ron Adam wrote:
I think what may be missing is a larger set of higher level string functions that will work with lists of strings directly. Then lists of strings can be thought of as a mutable string type by its use, and then working with substrings in lists and using ''.join() will not seem as out of place.
as important is the observation that you don't necessarily have to join string lists; if the data ends up being sent over a wire or written to disk, you might as well skip the join step, and work directly from the list. (it's no accident that ET has grown "tostringlist" and "fromstringlist" functions, for example ;-) </F>
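(At the C level, one concrete way to "skip the join step" for output is scatter-gather I/O: hand the pieces to the kernel as an array of buffers instead of copying them into one string first. A POSIX sketch, purely illustrative and unrelated to the patch:)

    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void)
    {
        const char *parts[] = { "Hello", ", ", "world\n" };
        struct iovec iov[3];
        int i;

        for (i = 0; i < 3; i++) {
            iov[i].iov_base = (void *)parts[i];
            iov[i].iov_len = strlen(parts[i]);
        }
        /* one system call, no concatenation */
        writev(STDOUT_FILENO, iov, 3);
        return 0;
    }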
Fredrik Lundh <fredrik@pythonware.com> wrote:
Ron Adam wrote:
I think what may be missing is a larger set of higher level string functions that will work with lists of strings directly. Then lists of strings can be thought of as a mutable string type by its use, and then working with substrings in lists and using ''.join() will not seem as out of place.
as important is the observation that you don't necessarily have to join string lists; if the data ends up being sent over a wire or written to disk, you might as well skip the join step, and work directly from the list.
(it's no accident that ET has grown "tostringlist" and "fromstringlist" functions, for example ;-)
I've personally added a line-based abstraction with indent/dedent handling, etc., for the editor I use, which helps make macros and underlying editor functionality easier to write. - Josiah
Josiah Carlson wrote:
Fredrik Lundh <fredrik@pythonware.com> wrote:
Ron Adam wrote:
I think what may be missing is a larger set of higher level string functions that will work with lists of strings directly. Then lists of strings can be thought of as a mutable string type by its use, and then working with substrings in lists and using ''.join() will not seem as out of place. as important is the observation that you don't necessarily have to join string lists; if the data ends up being sent over a wire or written to disk, you might as well skip the join step, and work directly from the list.
(it's no accident that ET has grown "tostringlist" and "fromstringlist" functions, for example ;-)
I've personally added a line-based abstraction with indent/dedent handling, etc., for the editor I use, which helps make macros and underlying editor functionality easier to write.
- Josiah
I did the same thing just last week. I've started to collect them into a module called stringtools, but I see no reason why they can't reside in the string module. I think this may be just a case of collecting these types of routines together in one place so they can be reused easily, because they are already scattered around Python's library in some form or another.

Another tool I found tucked away in pydoc is the console pager it uses. I think it could easily be a separate module itself, and it benefits from the line-based abstraction as well.

Cheers, Ron
On 10/6/06, Fredrik Lundh <fredrik@pythonware.com> wrote:
Ron Adam wrote:
I think what may be missing is a larger set of higher level string functions that will work with lists of strings directly. Then lists of strings can be thought of as a mutable string type by its use, and then working with substrings in lists and using ''.join() will not seem as out of place.
as important is the observation that you don't necessarily have to join string lists; if the data ends up being sent over a wire or written to disk, you might as well skip the join step, and work directly from the list.
(it's no accident that ET has grown "tostringlist" and "fromstringlist" functions, for example ;-)
The "just make lists" paradigm is used by Erlang too; it's called an "iolist" there (it's not a type, just a convention). The lists can be nested, though, so concatenating chunks of data for I/O is always a constant-time operation even if the chunks are already iolists. -bob
On 6 Oct 2006, at 12:37, Ron Adam wrote:
I've never liked the "".join([]) idiom for string concatenation; in my opinion it violates the principles "Beautiful is better than ugly." and "There should be one-- and preferably only one --obvious way to do it.". ... Well I always like things to run faster, but I disagree that this idiom is broken.
I like using lists to store sub strings and I think it's just a matter of changing your frame of reference in how you think about them.
I think that you've hit on exactly the reason why this patch is a good idea. You happen to like to store strings in lists, and in many situations this is a fine thing to do, but if one is forced to change one's frame of reference in order to get decent performance then, as well as violating the maxims Larry originally cited, you're also hitting both "readability counts" and "Correctness and clarity before speed."

The "".join(L) idiom is not "broken" in the sense that, to the fluent Python programmer, it does convey the intent as well as the action. That said, there are plenty of places you'll see it not being used because it fails to convey the intent. It's pretty rare to see someone write:

    for k, v in d.items():
        print " has value: ".join([k, v])

but, despite the utility of the % operator on strings, it's pretty common to see:

    print k + " has value: " + v

This patch _seems_ to be able to provide better performance for this sort of usage and provide a major speed-up for some other common usage forms without causing the programmer to resort to making their code more complicated. The cost seems to be a small memory hit on the size of a string object, a tiny increase in code size and some well isolated, under-the-hood complexity.

It's not like having this patch is going to force anyone to change the way they write their code. As far as I can tell it simply offers better performance if you choose to express your code in some common ways. If it speeds up pystone by 5.5% with such minimal down side I'm hard pressed to see a reason not to use it.

Cheers,
Nicko
Nicko van Someren wrote:
On 6 Oct 2006, at 12:37, Ron Adam wrote:
I've never liked the "".join([]) idiom for string concatenation; in my opinion it violates the principles "Beautiful is better than ugly." and "There should be one-- and preferably only one --obvious way to do it.". ... Well I always like things to run faster, but I disagree that this idiom is broken.
I like using lists to store sub strings and I think it's just a matter of changing your frame of reference in how you think about them.
I think that you've hit on exactly the reason why this patch is a good idea. You happen to like to store strings in lists, and in many situations this is a fine thing to do, but if one is forced to change ones frame of reference in order to get decent performance then as well as violating the maxims Larry originally cited you're also hitting both "readability counts" and "Correctness and clarity before speed."
The statement ".. if one is forced to change .." is a bit overstated I think. The situation is more a matter of increasing awareness so the frame of reference comes to mind more naturally and doesn't seem forced. And the suggestion of how to do that is by adding additional functions and methods that can use lists-of-strings instead of having to join or concatenate them first. Added examples and documentation can also do that as well. The two ideas are non-competing. They are related because they realize their benefits by reducing redundant underlying operations in a similar way. Cheers, Ron
Nicko van Someren <nicko@nicko.org> wrote:
It's not like having this patch is going to force anyone to change the way they write their code. As far as I can tell it simply offers better performance if you choose to express your code in some common ways. If it speeds up pystone by 5.5% with such minimal down side I'm hard pressed to see a reason not to use it.
This has to wait until Python 2.6 (which is anywhere from 14-24 months away, according to history); including it would destroy binary compatibility with modules compiled for 2.5, never mind that it is a nontrivial feature addition.

I also think that the original author (or one of this patch's supporters) should write a PEP outlining the Python 2.5 and earlier drawbacks, what changes this implementation brings, its improvements, and any potential drawbacks.

- Josiah
Fredrik> Nicko van Someren wrote:
>> If it speeds up pystone by 5.5% with such minimal down side I'm hard
>> pressed to see a reason not to use it.
Fredrik> can you tell me where exactly "pystone" does string
Fredrik> concatenations?

I wondered about that as well. While I'm not prepared to assert without a doubt that pystone does no simpleminded string concatenation, a couple minutes scanning the pystone source didn't turn up any.

If the pystone speedup isn't an artifact, the absence of string concatenation in pystone suggests it's happening somewhere in the interpreter. I applied the patch, ran the interpreter under gdb with a breakpoint set in string_concat where the PyStringConcatenationObject is created, then ran pystone. The first hit was in site.py -> distutils/util.py -> string.py. All told, there were only 22 hits, none for very long strings, so that doesn't explain the performance improvement.

BTW, on my Mac (OSX 10.4.8) max() is not defined. I had to add a macro definition to string_concat.

Skip
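(For anyone hitting the same build problem, a conventional definition would be something along these lines; the exact form used in the updated patch may differ:)

    #ifndef max
    #define max(a, b)  ((a) > (b) ? (a) : (b))
    #endif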
On 7 Oct 2006, at 09:17, Fredrik Lundh wrote:
Nicko van Someren wrote:
If it speeds up pystone by 5.5% with such minimal down side I'm hard pressed to see a reason not to use it.
can you tell me where exactly "pystone" does string concatenations?
No, not without more in-depth examination, but it is a pretty common operation in all sorts of cases, including inside the interpreter. Larry's message in reply to Gregory Smith's request for a pystone score showed a 5.5% improvement, and as yet I have no reason to doubt it. If the patch provides a measurable performance improvement for code that merely happens to use strings, as opposed to being explicitly heavy on string addition, then all the better.

It's clear that this needs to be more carefully measured before it goes in (which is why that quote above starts "If"). As I've mentioned before in this thread, getting good performance measures on code that does lazy evaluation is often tricky. pystone is a good place to start but I'm sure that there are use cases that it does not cover.

As for counting up the downsides, Josiah Carlson rightly points out that it breaks binary compatibility for modules, so the change cannot be taken lightly and clearly it will have to wait for a major release. Still, if the benefits outweigh the costs it seems worth doing.

Cheers,
Nicko
I've significantly enhanced my string-concatenation patch, to the point where that name is no longer accurate. So I've redubbed it the "lazy strings" patch.

The major new feature is that string *slices* are also represented with a lazy-evaluation placeholder for the actual string, just as concatenated strings were in my original patch. The lazy slice object stores a reference to the original PyStringObject * it is sliced from, and the desired start and stop slice markers. (It only supports step = 1.) Its ob_sval is NULL until the string is rendered--but that rarely happens! Not only does this mean string slices are faster, but I bet this generally reduces overall memory usage for slices too.

Now, one rule of the Python programming API is that "all strings are zero-terminated". That part of the API makes the life of a Python extension author sane--they don't have to deal with some exotic Python string class, they can just assume C-style strings everywhere. Ordinarily, this means a string slice couldn't simply point into the original string; if it did, and you executed

    x = "abcde"
    y = x[1:4]

internally y->ob_sval[3] would not be 0, it would be 'e', breaking the API's rule about strings.

However! When a PyStringObject lives out its life purely within the Python VM, the only code that strenuously examines its internals is stringobject.c. And that code almost never needs the trailing zero*. So I've added a new static method in stringobject.c:

    char * PyString_AsUnterminatedString(PyStringObject *)

If you call it on a lazy-evaluation slice object, it gives you back a pointer into the original string's ob_sval. The s->ob_size'th element of this *might not* be zero, but if you call this function you're saying that's a-okay, you promise not to look at it. (If the PyStringObject * is any other variety, it calls into PyString_AsString, which renders string concatenation objects then returns ob_sval.)

Again: this behavior is *never* observed by anyone outside of stringobject.c. External users of PyStringObjects call PyString_AS_STRING(), which renders all lazy concatenation and lazy slices so they look just like normal zero-terminated PyStringObjects. With my patch applied, trunk still passes all expected tests.

Of course, lazy slice objects aren't just for literal slices created with [x:y]. There are lots of string methods that return what are effectively string slices, like lstrip() and split().

With this code in place, string slices that aren't examined by modules are very rarely rendered. I ran "pybench -n 2" (two rounds, warp 10 (whatever that means)) while collecting some statistics. When it finished, the interpreter had created a total of 640,041 lazy slices, of which only *19* were ever rendered.

Apart from lazy slices, there's only one more enhancement when compared with v1: string prepending now reuses lazy concatenation objects much more often. There was an optimization in string_concatenate (Python/ceval.c) that said: "if the left-side string has two references, and we're about to overwrite the second reference by storing this concatenation to an object, tell that object to drop its reference". That often meant the reference on the string dropped to 1, which meant PyString_Resize could just resize the left-side string in place and append the right-side. I modified it so it drops the reference to the right-hand operand too. With this change, even with a reduction in the allowable stack depth for right-hand recursion (so it's less likely to blow the stack), I was able to prepend over 86k strings before it forced a render. (Oh, for the record: I ensure depth limits are enforced when combining lazy slices and lazy concatenations, so you still won't blow your stack when you mix them together.)

Here are the highlights of a single apples-to-apples pybench run, 2.6 trunk revision 52413 ("this") versus that same revision with my patch applied ("other"):

    Test                        minimum run-time           average run-time
                               this   other     diff      this   other     diff
    -----------------------------------------------------------------------------
    ConcatStrings:            204ms    76ms   +168.4%    213ms    77ms   +177.7%
    CreateStringsWithConcat:  159ms   138ms    +15.7%    163ms   142ms    +15.1%
    StringSlicing:            142ms    86ms    +65.5%    145ms    88ms    +64.6%
    -----------------------------------------------------------------------------
    Totals:                  7976ms  7713ms     +3.4%   8257ms  7975ms     +3.5%

I also ran this totally unfair benchmark:

    x = "abcde" * (20000)  # 100k characters
    for i in xrange(10000000):
        y = x[1:-1]

and found my patched version to be 9759% faster. (You heard that right, 98x faster.)

I'm ready to post the patch. However, as a result of this work, the description on the original patch page is really no longer accurate:

http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470

Shall I close/delete that patch and submit a new patch with a more modern description? After all, there's not a lot of activity on the old patch page...

Cheers,

/larry/

* As I recall, stringobject.c needs the trailing zero in exactly *one* place: when comparing two zero-length strings. My patch ensures that zero-length slices and concatenations still return nullstring, so this still works as expected.
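For concreteness, a rough sketch of what a lazy slice and the unterminated accessor might look like; the names, fields, and missing type checks are illustrative guesses rather than the patch's actual code:

    /* Hypothetical lazy slice: it keeps the source string alive and
     * records where the slice starts; ob_size holds the slice length,
     * and ob_sval stays NULL unless the slice is ever rendered. */
    typedef struct {
        PyObject_VAR_HEAD
        long ob_shash;
        int ob_sstate;
        char *ob_sval;          /* NULL until rendered */
        PyObject *ob_source;    /* owned reference to the original string */
        Py_ssize_t ob_start;    /* offset of the slice within the source */
    } PyStringSliceObject;

    /* Only ever called from inside stringobject.c: the returned pointer
     * may NOT be zero-terminated, because it can point into the middle
     * of the source string's buffer. */
    static char *
    PyString_AsUnterminatedString(PyStringObject *s)
    {
        PyStringSliceObject *slice = (PyStringSliceObject *)s;
        if (slice->ob_sval == NULL && slice->ob_source != NULL)
            return PyString_AS_STRING(slice->ob_source) + slice->ob_start;
        /* any other kind of string: render if necessary, return ob_sval */
        return PyString_AsString((PyObject *)s);
    }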
Interesting - is it possible that the same technique could be used to hide differences in character width? Specifically, if I concatenate an ascii string with a UTF-32 string, can the up-conversion to UTF-32 also be done lazily? If that could be done efficiently, it would resolve some outstanding issues that have come up on the Python-3000 list with regards to str/unicode convergence. Larry Hastings wrote:
I've significantly enhanced my string-concatenation patch, to the point where that name is no longer accurate. So I've redubbed it the "lazy strings" patch.
The major new feature is that string *slices* are also represented with a lazy-evaluation placeholder for the actual string, just as concatenated strings were in my original patch. The lazy slice object stores a reference to the original PyStringObject * it is sliced from, and the desired start and stop slice markers. (It only supports step = 1.) Its ob_sval is NULL until the string is rendered--but that rarely happens! Not only does this mean string slices are faster, but I bet this generally reduces overall memory usage for slices too.
Now, one rule of the Python programming API is that "all strings are zero-terminated". That part of makes the life of a Python extension author sane--they don't have to deal with some exotic Python string class, they can just assume C-style strings everywhere. Ordinarily, this means a string slice couldn't simply point into the original string; if it did, and you executed x = "abcde" y = x[1:4] internally y->ob_sval[3] would not be 0, it would be 'e', breaking the API's rule about strings.
However! When a PyStringObject lives out its life purely within the Python VM, the only code that strenuously examines its internals is stringobject.c. And that code almost never needs the trailing zero*. So I've added a new static method in stringobject.c: char * PyString_AsUnterminatedString(PyStringObject *) If you call it on a lazy-evaluation slice object, it gives you back a pointer into the original string's ob_sval. The s->ob_size'th element of this *might not* be zero, but if you call this function you're saying that's a-okay, you promise not to look at it. (If the PyStringObject * is any other variety, it calls into PyString_AsString, which renders string concatenation objects then returns ob_sval.)
Again: this behavior is *never* observed by anyone outside of stringobject.c. External users of PyStringObjects call PyString_AS_STRING(), which renders all lazy concatenation and lazy slices so they look just like normal zero-terminated PyStringObjects. With my patch applied, trunk still passes all expected tests.
Of course, lazy slice objects aren't just for literal slices created with [x:y]. There are lots of string methods that return what are effectively string slices, like lstrip() and split().
With this code in place, string slices that aren't examined by modules are very rarely rendered. I ran "pybench -n 2" (two rounds, warp 10 (whatever that means)) while collecting some statistics. When it finished, the interpreter had created a total of 640,041 lazy slices, of which only *19* were ever rendered.
Apart from lazy slices, there's only one more enhancement when compared with v1: string prepending now reuses lazy concatenation objects much more often. There was an optimization in string_concatenate (Python/ceval.c) that said: "if the left-side string has two references, and we're about to overwrite the second reference by storing this concatenation to an object, tell that object to drop its reference". That often meant the reference on the string dropped to 1, which meant PyString_Resize could just resize the left-side string in place and append the right-side. I modified it so it drops the reference to the right-hand operand too. With this change, even with a reduction in the allowable stack depth for right-hand recursion (so it's less likely to blow the stack), I was able to prepend over 86k strings before it forced a render. (Oh, for the record: I ensure depth limits are enforced when combining lazy slices and lazy concatenations, so you still won't blow your stack when you mix them together.)
Here are the highlights of a single apples-to-apples pybench run, 2.6 trunk revision 52413 ("this") versus that same revision with my patch applied ("other"):
    Test                        minimum run-time           average run-time
                               this   other     diff      this   other     diff
    -----------------------------------------------------------------------------
    ConcatStrings:            204ms    76ms   +168.4%    213ms    77ms   +177.7%
    CreateStringsWithConcat:  159ms   138ms    +15.7%    163ms   142ms    +15.1%
    StringSlicing:            142ms    86ms    +65.5%    145ms    88ms    +64.6%
    -----------------------------------------------------------------------------
    Totals:                  7976ms  7713ms     +3.4%   8257ms  7975ms     +3.5%
I also ran this totally unfair benchmark:

    x = "abcde" * (20000)  # 100k characters
    for i in xrange(10000000):
        y = x[1:-1]

and found my patched version to be 9759% faster. (You heard that right, 98x faster.)
I'm ready to post the patch. However, as a result of this work, the description on the original patch page is really no longer accurate:
http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
Shall I close/delete that patch and submit a new patch with a more modern description? After all, there's not a lot of activity on the old patch page...
Cheers,
/larry/
* As I recall, stringobject.c needs the trailing zero in exactly *one* place: when comparing two zero-length strings. My patch ensures that zero-length slices and concatenations still return nullstring, so this still works as expected.
Talin wrote:
Interesting - is it possible that the same technique could be used to hide differences in character width? Specifically, if I concatenate an ascii string with a UTF-32 string, can the up-conversion to UTF-32 also be done lazily?
of course. and if all you do with the result is write it to an UTF-8 stream, it doesn't need to be done at all. this requires a slightly more elaborate C-level API interface than today's PyString_AS_STRING API, though... (which is why this whole exercise belongs on the Python 3000 lists, not on python-dev for 2.X) </F>
See also the Cedar Ropes work: http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/sp... Bill
Larry Hastings schrieb:
I've significantly enhanced my string-concatenation patch, to the point where that name is no longer accurate. So I've redubbed it the "lazy strings" patch.
It's not clear to me what you want to achieve with these patches, in particular, whether you want to see them integrated into Python or not.
The major new feature is that string *slices* are also represented with a lazy-evaluation placeholder for the actual string, just as concatenated strings were in my original patch. The lazy slice object stores a reference to the original PyStringObject * it is sliced from, and the desired start and stop slice markers. (It only supports step = 1.)
I think this specific approach will find strong resistance. It has been implemented many times, e.g. (apparently) in NextStep's NSString, and in Java's string type (where a string holds a reference to a character array, a start index, and an end index). Most recently, it was discussed under the name "string view" on the Py3k list, see http://mail.python.org/pipermail/python-3000/2006-August/003282.html Traditionally, the biggest objection is that even small strings may consume insane amounts of memory.
Its ob_sval is NULL until the string is rendered--but that rarely happens! Not only does this mean string slices are faster, but I bet this generally reduces overall memory usage for slices too.
Channeling Guido: what real-world applications did you study with this patch to make such a claim?
I'm ready to post the patch. However, as a result of this work, the description on the original patch page is really no longer accurate:
http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470 Shall I close/delete that patch and submit a new patch with a more modern description? After all, there's not a lot of activity on the old patch page...
Closing the issue and opening a new is fine. Regards, Martin
Martin v. Löwis wrote:
It's not clear to me what you want to achieve with these patches, in particular, whether you want to see them integrated into Python or not.
I would be thrilled if they were, but it seems less likely with every passing day. If you have some advice on how I might increase the patch's chances I would be all ears.

It was/is my understanding that the early days of a new major revision were the most judicious time to introduce big changes. If I had offered these patches six months ago for 2.5, they would have had zero chance of acceptance. But 2.6 is in its infancy, and so I assumed now was the time to discuss sea-change patches like this.

Anyway, it was my intent to post the patch and see what happened. Being a first-timer at this, and not having even read the core development mailing lists for very long, I had no idea what to expect. Though I genuinely didn't expect it to be this brusque.
I think this specific approach will find strong resistance.

I'd say the "lazy strings" patch is really two approaches, "lazy concatenation" and "lazy slices". You are right, though, *both* have "found strong resistance".
Most recently, it was discussed under the name "string view" on the Py3k list, see http://mail.python.org/pipermail/python-3000/2006-August/003282.html Traditionally, the biggest objection is that even small strings may consume insane amounts of memory.
Let's be specific: when there is at least one long-lived small lazy slice of a large string, and the large string itself would otherwise have been dereferenced and freed, and this small slice is never examined by code outside of stringobject.c, this approach means the large string becomes long-lived too and thus Python consumes more memory overall. In pathological scenarios this memory usage could be characterized as "insane". True dat. Then again, I could suggest some scenarios where this would save memory (multiple long-lived large slices of a large string), and others where memory use would be a wash (long-lived slices containing the all or almost all of a large string, or any scenario where slices are short-lived). While I think it's clear lazy slices are *faster* on average, its overall effect on memory use in real-world Python is not yet known. Read on.
I bet this generally reduces overall memory usage for slices too.
Channeling Guido: what real-world applications did you study with this patch to make such a claim?
I didn't; I don't have any. I must admit to being only a small-scale Python user. Memory use remains about the same in pybench, the biggest Python app I have handy. But, then, it was pretty clearly speculation, not a claim. Yes, I *think* it'd use less memory overall. But I wouldn't *claim* anything yet.

The "stringview" discussion you cite was largely speculation, and as I recall there were users in both camps ("it'll use more memory overall" vs "no it won't"). And, while I saw a test case with microbenchmarks, and a "proof-of-concept" where a stringview was a separate object from a string, I didn't see any real-world applications tested with this approach.

Rather than start in on speculation about it, I have followed that old maxim of "show me the code". I've produced actual code that works with real strings in Python. I see this as an opportunity for Pythonistas to determine the facts for themselves. Now folks can try the patch with these real-world applications you cite and find out how it really behaves. (Although I realize the Python community is under no obligation to do so.)

If experimentation is the best thing here, I'd be happy to revise the patch to facilitate it. For instance, I could add command-line arguments letting you tweak the run-time behavior of the patch, like changing the minimum size of a lazy slice. Perhaps add code so there's a tweakable minimum size of a lazy concatenation too. Or a tweakable minimum *ratio* necessary for a lazy slice. I'm open to suggestions.

Cheers, /larry/
Larry Hastings wrote:
Martin v. Löwis wrote:
Let's be specific: when there is at least one long-lived small lazy slice of a large string, and the large string itself would otherwise have been dereferenced and freed, and this small slice is never examined by code outside of stringobject.c, this approach means the large string becomes long-lived too and thus Python consumes more memory overall. In pathological scenarios this memory usage could be characterized as "insane".
True dat. Then again, I could suggest some scenarios where this would save memory (multiple long-lived large slices of a large string), and others where memory use would be a wash (long-lived slices containing the all or almost all of a large string, or any scenario where slices are short-lived). While I think it's clear lazy slices are *faster* on average, its overall effect on memory use in real-world Python is not yet known. Read on.
I wonder - how expensive would it be for the string slice to have a weak reference, and 'normalize' the slice when the big string is collected? Would the overhead of the weak reference swamp the savings? -- Talin
Larry Hastings schrieb:
Anyway, it was my intent to post the patch and see what happened. Being a first-timer at this, and not having even read the core development mailing lists for very long, I had no idea what to expect. Though I genuinely didn't expect it to be this brusque.
I could have told you :-)

The "problem" really is that you are suggesting a major, significant change to the implementation of Python, and one that doesn't fix an obvious bug. The new code is an order of magnitude more complex than the old one, and the impact that it will have is unknown - but in the worst case, it could have serious negative impact, e.g. when the code is full of errors, and causes Python applications to crash in masses. This is, of course, FUD: it is the fear that this might happen, the uncertainty about the quality of the code, and the doubt about the viability of the approach.

There are many aspects to such a change, but my experience is that it primarily takes time. Fredrik Lundh suggested you give up on Python 2.6 and target Python 3.0 right away; it may indeed be the case that Python 2.6 is too close for that kind of change to find enough supporters.

If your primary goal was to contribute to open source, you might want to look for other areas of Python: there are plenty of open bugs ("real bugs" :-), unreviewed patches, etc. For some time, it is more satisfying to work on these, since the likelihood of success is higher.

Regards,
Martin
>> Anyway, it was my intent to post the patch and see what happened.
>> Being a first-timer at this, and not having even read the core
>> development mailing lists for very long, I had no idea what to
>> expect. Though I genuinely didn't expect it to be this brusque.

Martin> I could have told you :-) The "problem" really is that you are
Martin> suggesting a major, significant change to the implementation of
Martin> Python, and one that doesn't fix an obvious bug.

Come on Martin. Give Larry a break. Lots of changes have been accepted to the Python core which weren't obvious "bug fixes". In fact, I seem to recall a sprint held recently in Reykjavik where the whole point was just to make Python faster.

I believe that was exactly Larry's point in posting the patch. The "one obvious way to do" concatenation and slicing for one of the most heavily used types in Python appears to be faster. That seems like a win to me.

Skip
skip@pobox.com wrote:
>> Anyway, it was my intent to post the patch and see what happened.
>> Being a first-timer at this, and not having even read the core
>> development mailing lists for very long, I had no idea what to
>> expect. Though I genuinely didn't expect it to be this brusque.
Martin> I could have told you :-) The "problem" really is that you are
Martin> suggesting a major, significant change to the implementation of
Martin> Python, and one that doesn't fix an obvious bug.
The "obvious bug" that it fixes is slowness <0.75 wink>.
Come on Martin. Give Larry a break. Lots of changes have been accepted to to the Python core which weren't obvious "bug fixes". In fact, I seem to recall a sprint held recently in Reykjavik where the whole point was just to make Python faster. I believe that was exactly Larry's point in posting the patch. The "one obvious way to do" concatenation and slicing for one of the most heavily used types in python appears to be faster. That seems like a win to me.
I did point out to Larry when he went to c.l.py with the original patch that he would face resistance, so this hasn't blind-sided him.

But it seems to me that the only major issue is the inability to provide zero-byte terminators with this new representation. Because Larry's proposal for handling this involves the introduction of a new API that can't already be in use in extensions, it's obviously the extension writers who would be given most problems by this patch. I can understand resistance on that score, and I could understand resistance if there were other clear disadvantages to its implementation, but in their absence it seems like the extension modules are the killers.

If there were any reliable way to make sure these objects never got passed to extension modules then I'd say "go for it". Without that it does seem like a potentially widespread change to the C API that could affect much code outside the interpreter.

This is a great shame. I think Larry showed inventiveness and tenacity to get this far, and deserves credit for his achievements no matter whether or not they get into the core.

regards
Steve
Steve Holden wrote:
But it seems to me that the only major issue is the inability to provide zero-byte terminators with this new representation.
I guess I wasn't clear in my description of the patch; sorry about that. Like "lazy concatenation objects", "lazy slices" render when you call PyString_AsString() on them. Before rendering, the lazy slice's ob_sval will be NULL. Afterwards it will point to a proper zero-terminated string, at which point the object behaves exactly like any other PyStringObject. The only function that *might* return a non-terminated char * is PyString_AsUnterminatedString(). This function is static to stringobject.c--and I would be shocked if it were ever otherwise.
If there were any reliable way to make sure these objects never got passed to extension modules then I'd say "go for it".

If external Python extension modules are as well-behaved as the shipping Python source tree, there simply wouldn't be a problem. Python source is delightfully consistent about using the macro PyString_AS_STRING() to get at the creamy char *center of a PyStringObject *. When code religiously uses that macro (or calls PyString_AsString() directly), all it needs is a recompile with the current stringobject.h and it will Just Work.
I genuinely don't know how many external Python extension modules are well-behaved in this regard. But in case it helps: I just checked PIL, NumPy, PyWin32, and SWIG, and all of them were well-behaved.

Apart from stringobject.c, there was exactly one spot in the Python source tree which made assumptions about the structure of PyStringObjects (Mac/Modules/macos.c). It's in the block starting with the comment "This is a hack:". Note that this is unfixed in my patch, so just now all code using that self-avowed "hack" will break.

Am I correct in understanding that changing the Python minor revision number (2.5 -> 2.6) requires external modules to recompile? (It certainly does on Windows.) If so, I could mitigate the problem by renaming ob_sval. That way, code making explicit reference to it would fail to compile, which I feel is better than silently recompiling unsafe code.
On Mon, 23 Oct 2006 07:58:25 -0700, Larry Hastings <larry@hastings.org> wrote:
[snip] If external Python extension modules are as well-behaved as the shipping Python source tree, there simply wouldn't be a problem. Python source is delightfully consistent about using the macro PyString_AS_STRING() to get at the creamy char *center of a PyStringObject *. When code religiously uses that macro (or calls PyString_AsString() directly), all it needs is a recompile with the current stringobject.h and it will Just Work.
I genuinely don't know how many external Python extension modules are well-behaved in this regard. But in case it helps: I just checked PIL, NumPy, PyWin32, and SWIG, and all of them were well-behaved.
FWIW, http://www.google.com/codesearch?q=+ob_sval Jean-Paul
Jean-Paul Calderone wrote:
On Mon, 23 Oct 2006 07:58:25 -0700, Larry Hastings <larry@hastings.org> wrote:
[snip] If external Python extension modules are as well-behaved as the shipping Python source tree, there simply wouldn't be a problem. Python source is delightfully consistent about using the macro PyString_AS_STRING() to get at the creamy char *center of a PyStringObject *. When code religiously uses that macro (or calls PyString_AsString() directly), all it needs is a recompile with the current stringobject.h and it will Just Work.
I genuinely don't know how many external Python extension modules are well-behaved in this regard. But in case it helps: I just checked PIL, NumPy, PyWin32, and SWIG, and all of them were well-behaved.
Possibly more enlightening (we *know* string objects play with this field!): http://www.google.com/codesearch?hl=en&lr=&q=ob_sval+-stringobject.%5Bhc%5D&btnG=Search Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
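For anyone skimming those search results, the pattern being hunted is code that reaches into the struct instead of going through the official accessors, roughly the difference between these lines (obj is just a placeholder name):

    /* Well-behaved: goes through the official accessors, so a recompile
     * against the new stringobject.h is all that's needed. */
    char *p = PyString_AS_STRING(obj);
    char *q = PyString_AsString(obj);

    /* Ill-behaved: reaches into the struct and assumes ob_sval is an in-line
     * char array, which is exactly the assumption the patch changes. */
    char *r = ((PyStringObject *)obj)->ob_sval;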
On 10/23/06, Larry Hastings <larry@hastings.org> wrote:
Steve Holden wrote:
But it seems to me that the only major issue is the inability to provide zero-byte terminators with this new representation.
I guess I wasn't clear in my description of the patch; sorry about that.
Like "lazy concatenation objects", "lazy slices" render when you call PyString_AsString() on them. Before rendering, the lazy slice's ob_sval will be NULL. Afterwards it will point to a proper zero-terminated string, at which point the object behaves exactly like any other PyStringObject.
I had picked up on this comment, and I have to say that I had been a little surprised by the resistance to the change based on the "code would break" argument, when you had made such a thorough attempt to address this. Perhaps others had missed this point, though.
I genuinely don't know how many external Python extension modules are well-behaved in this regard. But in case it helps: I just checked PIL, NumPy, PyWin32, and SWIG, and all of them were well-behaved.
There's code out there which was written to the Python 1.4 API, and has not been updated since (I know, I wrote some of it!). I wouldn't call it "well-behaved" (it writes directly into the string's character buffer) but I don't believe it would fail (it only uses PyString_AsString to get the buffer address).

    /* Allocate a Python string object, with uninitialised contents.  We
     * must do it this way, so that we can modify the string in place
     * later.  See the Python source, Objects/stringobject.c for details. */
    result = PyString_FromStringAndSize(NULL, len);
    if (result == NULL)
        return NULL;
    p = PyString_AsString(result);
    while (*str) {
        if (*str == '\n')
            *p = '\0';
        else
            *p = *str;
        ++p;
        ++str;
    }
Am I correct in understanding that changing the Python minor revision number (2.5 -> 2.6) requires external modules to recompile? (It certainly does on Windows.) If so, I could mitigate the problem by renaming ob_sval. That way, code making explicit reference to it would fail to compile, which I feel is better than silently recompiling unsafe code.
I think you've covered pretty much all the possible backward compatibility bases. A sufficiently evil extension could blow up, I guess, but that's always going to be true. OTOH, I don't have a comment on the desirability of the patch per se, as (a) I've never been hit by the speed issue, and (b) I'm thoroughly indoctrinated, so I always use ''.join() :-) Paul.
"Paul Moore" <p.f.moore@gmail.com> wrote:
I had picked up on this comment, and I have to say that I had been a little surprised by the resistance to the change based on the "code would break" argument, when you had made such a thorough attempt to address this. Perhaps others had missed this point, though.
I'm also concerned about future usability. Word in the Py3k list is that Python 2.6 will be just about the last Python in the 2.x series, and by directing his implementation at only Python 2.x strings, he's just about guaranteeing obsolescence. By building with unicode and/or objects with a buffer interface in mind, Larry could build with both 2.x and 3.x in mind, and his code wouldn't be obsolete the moment it was released. - Josiah
On Mon, 23 Oct 2006 09:07:51 -0700, Josiah Carlson <jcarlson@uci.edu> wrote:
"Paul Moore" <p.f.moore@gmail.com> wrote:
I had picked up on this comment, and I have to say that I had been a little surprised by the resistance to the change based on the "code would break" argument, when you had made such a thorough attempt to address this. Perhaps others had missed this point, though.
I'm also concerned about future usability.
Me too (perhaps in a different way though).
Word in the Py3k list is that Python 2.6 will be just about the last Python in the 2.x series, and by directing his implementation at only Python 2.x strings, he's just about guaranteeing obsolescence.
People will be using 2.x for a long time to come. And in the long run, isn't all software obsolete? :)
By building with unicode and/or objects with a buffer interface in mind, Larry could build with both 2.x and 3.x in mind, and his code wouldn't be obsolete the moment it was released.
(I'm not sure what the antecedent of "it" is in the above; I'm going to assume it's Python 3.x.) Supporting unicode strings and objects providing the buffer interface seems like a good idea in general, even disregarding Py3k. Starting with str is reasonable though, since there's still plenty of code that will benefit from this change, if it is indeed a beneficial change. Larry, I'm going to try to do some benchmarks against Twisted using this patch, but given my current time constraints, you may be able to beat me to this :) If you're interested, Twisted trunk@HEAD plus this trial plugin: http://twistedmatrix.com/trac/browser/sandbox/exarkun/merit/trunk will let you do some gross measurements using the Twisted test suite. I can give some more specific pointers if this sounds like something you'd want to mess with. Jean-Paul
Larry> The only function that *might* return a non-terminated char * is Larry> PyString_AsUnterminatedString(). This function is static to Larry> stringobject.c--and I would be shocked if it were ever otherwise. If it's static to stringobject.c it doesn't need a PyString_ prefix. In fact, I'd argue that it shouldn't have one so that people reading the code won't miss the "static" and think it is part of the published API. Larry> Am I correct in understanding that changing the Python minor Larry> revision number (2.5 -> 2.6) requires external modules to Larry> recompile? Yes, in general, though you can often get away without it if you don't mind Python screaming at you about version mismatches. Skip
Larry Hastings wrote:
Am I correct in understanding that changing the Python minor revision number (2.5 -> 2.6) requires external modules to recompile?
not, in general, on Unix. it's recommended, but things usually work quite well anyway. </F>
Larry Hastings schrieb:
Am I correct in understanding that changing the Python minor revision number (2.5 -> 2.6) requires external modules to recompile? (It certainly does on Windows.)
There is an ongoing debate on that. The original intent was that you normally *shouldn't* have to recompile modules just because the Python version changes. Instead, you should do so when PYTHON_API_VERSION changes. Of course, such a change would also cause a change to PYTHON_API_VERSION. Then, even if PYTHON_API_VERSION changes, you aren't *required* to recompile your extension modules. Instead, you get a warning that the API version is different and *might* require recompilation: it does require recompilation if the extension module relies on some of the changed API. With this change, people not recompiling their extension modules would likely see Python crash rather quickly after seeing the warning about incompatible APIs. Regards, Martin
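For readers who haven't seen the mechanism, the check happens at module initialisation: the extension bakes in the PYTHON_API_VERSION of the headers it was built against, and the interpreter compares it with its own at import time. Roughly, from memory of the 2.x headers, so treat the details and wording as approximate:

    /* In the extension, Py_InitModule is (approximately) a macro that bakes in
     * the API version of the headers the module was compiled against: */
    #define Py_InitModule(name, methods) \
            Py_InitModule4(name, methods, (char *)NULL, (PyObject *)NULL, \
                           PYTHON_API_VERSION)

    /* At import time Py_InitModule4() compares that number against the running
     * interpreter's PYTHON_API_VERSION and, on mismatch, issues a warning
     * along the lines of:
     *
     *   RuntimeWarning: Python C API version mismatch for module foo:
     *   This Python has API version X, module foo has version Y.
     *
     * The module still loads, which is why a module that relies on the changed
     * parts of the API can crash shortly after the warning appears. */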
On 23-Oct-2006, at 16:58 , Larry Hastings wrote:
I genuinely don't know how many external Python extension modules are well-behaved in this regard. But in case it helps: I just checked PIL, NumPy, PyWin32, and SWIG, and all of them were well- behaved.
Apart from stringobject.c, there was exactly one spot in the Python source tree which made assumptions about the structure of PyStringObjects (Mac/Modules/macos.c). It's in the block starting with the comment "This is a hack:". Note that this is unfixed in my patch, so just now all code using that self-avowed "hack" will break.
As the author of that hack, I can offer an idea of where you should look for code that will break: code that tries to expose low-level C interfaces to Python. (That hack replaced an even earlier, worse hack, which took the id() of a string in Python and added a fixed number to it to get at the address of the string, to fill it into a structure, blush). Look at packages such as win32, PyObjC, ctypes, bridges between Python and other languages, etc. That's where implementors are tempted to bend the rules of Official APIs for the benefit of serious optimizations. -- Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman
On Oct 24, 2006, at 11:09 AM, Jack Jansen wrote:
Look at packages such as win32, PyObjC, ctypes, bridges between Python and other languages, etc. That's where implementors are tempted to bend the rules of Official APIs for the benefit of serious optimizations.
PyObjC should be safe in this regard; I try to conform to the official rules :-) I do use PyString_AS_STRING outside of the GIL in other extensions, though; the lazy strings patch would break that. My code is of course bending the rules here and can easily be fixed by introducing a temporary variable. Ronald
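The temporary-variable fix Ronald mentions amounts to fetching the pointer while the GIL is still held, since under the patch PyString_AS_STRING may have to allocate and render the buffer. A minimal sketch, with made-up names (do_something_slow, arg), not Ronald's actual code:

    /* Before: this pattern assumed PyString_AS_STRING was a trivial struct
     * access, so calling it with the GIL released seemed harmless. */
    Py_BEGIN_ALLOW_THREADS
    do_something_slow(PyString_AS_STRING(arg), PyString_GET_SIZE(arg));
    Py_END_ALLOW_THREADS

    /* After: fetch the pointer (which, under the patch, may render the string
     * and allocate memory) while the GIL is still held, then release it. */
    {
        const char *data = PyString_AS_STRING(arg);
        Py_ssize_t len = PyString_GET_SIZE(arg);

        Py_BEGIN_ALLOW_THREADS
        do_something_slow(data, len);
        Py_END_ALLOW_THREADS
    }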
skip@pobox.com schrieb:
>> Anyway, it was my intent to post the patch and see what happened. >> Being a first-timer at this, and not having even read the core >> development mailing lists for very long, I had no idea what to >> expect. Though I genuinely didn't expect it to be this brusque.
Martin> I could have told you :-) The "problem" really is that you are Martin> suggesting a major, significant change to the implementation of Martin> Python, and one that doesn't fix an obvious bug.
Come on Martin. Give Larry a break.
I'm seriously not complaining, I'm explaining.
Lots of changes have been accepted to the Python core which weren't obvious "bug fixes".
Surely many new features have been implemented over time, but in many cases, they weren't really "big changes", in the sense that you could ignore them if you don't like them. This wouldn't be so in this case: as the string type is very fundamental, people feel a higher interest in its implementation.
In fact, I seem to recall a sprint held recently in Reykjavik where the whole point was just to make Python faster.
That's true. I also recall there were serious complaints about the outcome of this sprint, and the changes to the struct module in particular. Still, the struct module is of lesser importance than the string type, so the concerns were smaller.
I believe that was exactly Larry's point in posting the patch. The "one obvious way to do" concatenation and slicing for one of the most heavily used types in Python appears to be faster. That seems like a win to me.
Have you reviewed the patch and can vouch for its correctness, even in boundary cases? Have you tested it in a real application and found a real performance improvement? I have done neither, so I can't speak on the advantages of the patch. I didn't actually object to the inclusion of the patch, either. I was merely stating what I think the problems with "that kind of" patch are. Regards, Martin
Larry Hastings <larry@hastings.org> wrote:
It was/is my understanding that the early days of a new major revision were the most judicious time to introduce big changes. If I had offered these patches six months ago for 2.5, they would have had zero chance of acceptance. But 2.6 is in its infancy, and so I assumed now was the time to discuss sea-change patches like this.
It would be a radical change for Python 2.6, and really the 2.x series, likely requiring nontrivial changes to extension modules that deal with strings, and the assumptions about strings that have held for over a decade. I think 2.6 as an option is a non-starter. Think Py3k, and really, think bytes and unicode.
The "stringview" discussion you cite was largely speculation, and as I recall there were users in both camps ("it'll use more memory overall" vs "no it won't"). And, while I saw a test case with microbenchmarks, and a "proof-of-concept" where a stringview was a separate object from a string, I didn't see any real-word applications tested with this approach.
Rather than start in on speculation about it, I have followed that old maxim of "show me the code". I've produced actual code that works with real strings in Python. I see this as an opportunity for Pythonistas to determine the facts for themselves. Now folks can try the patch with these real-world applications you cite and find out how it really behaves. (Although I realize the Python community is under no obligation to do so.)
One of the big concerns brought up in the stringview discussion was that of users expecting one thing and getting another. Slicing a larger string producing a 'view', which then keeps the larger string alive, would be a surprise. By making it a separate object that just *knows* about strings (or really, anything that offers a buffer interface), I was able to make an object that was 1) flexible, 2) usable in any Python, 3) doesn't change the core assumptions about Python, 4) is expandable to beyond just *strings*. Reason #4 was my primary reason for writing it, because str disappears in Py3k, which is closer to happening than most of us realize.
If experimentation is the best thing here, I'd be happy to revise the patch to facilitate it. For instance, I could add command-line arguments letting you tweak the run-time behavior of the patch, like changing the minimum size of a lazy slice. Perhaps add code so there's a tweakable minimum size of a lazy concatenation too. Or a tweakable minimum *ratio* necessary for a lazy slice. I'm open to suggestions.
I believe that would be a waste of time. The odds of it making it into Python 2.x without significant core developer support are pretty close to None, which in Python 2.x is less than 0. I've been down that road, nothing good lies that way. Want my advice? Aim for Py3k text as your primary target, but as a wrapper, not as the core type (I put the odds at somewhere around 0 for such a core type change). If you are good, and want to make guys like me happy, you could even make it support the buffer interface for non-text (bytes, array, mmap, etc.), unifying (via wrapper) the behavior of bytes and text. - Josiah
Josiah Carlson wrote:
It would be a radical change for Python 2.6, and really the 2.x series, likely requiring nontrivial changes to extension modules that deal with strings, and the assumptions about strings that have held for over a decade.
the assumptions hidden in everyone's use of the C-level string API are the main concern here, at least for me; radically changing the internal format is not a new idea, but it's always been held off because we have no idea how people are using the C API. </F>
Josiah Carlson wrote:
Want my advice? Aim for Py3k text as your primary target, but as a wrapper, not as the core type (I put the odds at somewhere around 0 for such a core type change). If you are good, and want to make guys like me happy, you could even make it support the buffer interface for non-text (bytes, array, mmap, etc.), unifying (via wrapper) the behavior of bytes and text.
This is still my preferred approach, too - for local optimisation of an algorithm, a string view type strikes me as an excellent idea. For the core data type, though, keeping the behaviour comparatively simple and predictable counterbalances the desire for more speed. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
Larry Hastings <larry@hastings.org> wrote:
I've significantly enhanced my string-concatenation patch, to the point where that name is no longer accurate. So I've redubbed it the "lazy strings" patch.
[snip] Honestly, I don't believe that pure strings should be this complicated. The implementation of the standard string and unicode type should be as simple as possible. The current string and unicode implementations are, in my opinion, as simple as possible given Python's needs. As such, I don't see a need to go mucking about with the standard string implementation to make it "lazy" so as to increase performance, reduce memory consumption, etc.

However, having written a somewhat "lazy" string slicing/etc. operation class I called a "string view", whose discussion and implementation can be found in the py3k list, I do believe that having a related type, perhaps with the tree-based implementation you have written, or a simple pointer + length variant like I have written, would be useful to have available to Python. I also believe that it smells like a Py3k feature, which suggests that you should toss the whole string reliance and switch to unicode, as str and unicode become bytes and text in Py3k, with bytes being mutable. - Josiah
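For readers who missed the py3k-list discussion, the pointer-plus-length idea can be illustrated with a small toy class; this is only a sketch of the concept, not Josiah's actual string view implementation:

    # Toy illustration of a "string view": record (source, start, stop)
    # instead of copying.  Not Josiah's implementation; names are made up.
    class StringView(object):
        def __init__(self, source, start=0, stop=None):
            if stop is None:
                stop = len(source)
            self.source = source      # keeps the underlying string alive
            self.start = start
            self.stop = stop

        def __len__(self):
            return self.stop - self.start

        def __getitem__(self, index):
            if isinstance(index, slice):
                start, stop, step = index.indices(len(self))
                if step != 1:
                    return str(self)[index]   # degenerate case: make a real copy
                # slicing a view just produces another view over the same source
                return StringView(self.source, self.start + start, self.start + stop)
            if index < 0:
                index += len(self)
            if not 0 <= index < len(self):
                raise IndexError("string view index out of range")
            return self.source[self.start + index]

        def __str__(self):
            # "rendering": this is the point where a real copy gets made
            return self.source[self.start:self.stop]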
On 2006/10/20, Larry Hastings wrote:
I'm ready to post the patch. Sheesh! Where does the time go.
I've finally found the time to re-validate and post the patch. It's SF.net patch #1590352: http://sourceforge.net/tracker/index.php?func=detail&aid=1590352&group_id=5470&atid=305470 I've attached both the patch itself (against the current 2.6 revision, 52618) and a lengthy treatise on the patch and its ramifications as I understand them.

I've also added one more experimental change: a new string method, str.simplify(). All it does is force a lazy concatenation / lazy slice to render. (If the string isn't a lazy string, or it's already been rendered, str.simplify() is a no-op.) The idea is, if you know these consarned "lazy slices" are giving you the oft-cited horrible memory usage scenario, you can tune your app by forcing the slices to render and drop their references. 99% of the time you don't care, and you enjoy the minor speedup. The other 1% of the time, you call .simplify() and your code behaves as it did under 2.5.

Is this the right approach? I dunno. So far I like it better than the alternatives. But I'm open to suggestions, on this or any other aspect of the patch.

Cheers, /larry/
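As a concrete illustration of the tuning scenario Larry describes, the case where .simplify() earns its keep looks roughly like this. It only applies to an interpreter built with the patch (stock Python has no str.simplify()), it assumes rendering happens in place as described above, and the file name and sizes are made up:

    header = open("huge.log").read()   # say, a few hundred megabytes
    summary = header[:200]             # under the patch: a lazy slice that still
                                       # holds a reference to the whole buffer
    summary.simplify()                 # force the slice to render into its own
                                       # small buffer (a no-op on ordinary strings)
    del header                         # now the large buffer can actually be freed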
Larry Hastings <larry@hastings.org> wrote:
But I'm open to suggestions, on this or any other aspect of the patch.
As Martin, I, and others have suggested, direct the patch towards Python 3.x unicode text. Also, don't be surprised if Guido says no... http://mail.python.org/pipermail/python-3000/2006-August/003334.html In that message he talks about why view+string or string+view or view+view should return strings. Some are not quite applicable in this case because with your implementation all additions can return a 'view'. However, he also states the following with regards to strings vs. views (an earlier variant of the "lazy strings" you propose), "Because they can have such different performance and memory usage characteristics, it's not right to treat them as the same type." - GvR This suggests (at least to me) that unifying the 'lazy string' with the 2.x string is basically out of the question, which brings me back to my earlier suggestion; make it into a wrapper that could be used with 3.x bytes, 3.x text, and perhaps others. - Josiah
participants (18)
- "Martin v. Löwis"
- Bill Janssen
- Bob Ippolito
- Fredrik Lundh
- Gregory P. Smith
- Jack Jansen
- Jean-Paul Calderone
- Josiah Carlson
- Larry Hastings
- M.-A. Lemburg
- Nick Coghlan
- Nicko van Someren
- Paul Moore
- Ron Adam
- Ronald Oussoren
- skip@pobox.com
- Steve Holden
- Talin