I got up in the middle of the night and wrote the email - and it shows.<br>Apologies for creating confusion. My comments below.<br><br>-Chetan<br><br><div><span class="gmail_quote">On 10/18/06, <b class="gmail_sendername">
<a href="mailto:python-dev-request@python.org">python-dev-request@python.org</a></b> <br></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>Date: Wed, 18 Oct 2006 13:04:14 -0700<br>From: Larry Hastings <<a href="mailto:larry@hastings.org">larry@hastings.org</a>><br>Subject: Re: [Python-Dev] PATCH submitted: Speed up + for string Re:<br> PATCHsubmitted: Speed up + for string concatenation, now as fast as
<br> "".join(x) idiom<br>To: <a href="mailto:python-dev@python.org">python-dev@python.org</a><br>Message-ID: <<a href="mailto:453688BE.2060509@hastings.org">453688BE.2060509@hastings.org</a>><br>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
<br><br>Chetan Pandya wrote:<br>> The deallocation code needs to be robust for a complex tree - it is<br>> currently not recursive, but needs to be, like the concatenation code.<br>It is already both those things.<br>
<br>Deallocation is definitely recursive. See Objects/stringobject.c,<br>function (*ahem*) recursive_dealloc. That Py_DECREF() line is where it<br>recurses into child string concatenation objects.<br><br>You might have been confused because it is *optimized* for the general
<br>case, where the tree only recurses down the left-hand side. For the<br>left-hand side it iterates, instead of recursing, which is both slightly<br>faster and much more robust (unlikely to blow the stack).</blockquote>
<div><br>Actually I looked at the setting of ob_sstrings to NULL in recursive_dealloc and thought none of the strings will get destroyed as the list is destroyed. However it is only setting the first element to NULL, which is fine.
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> Rendering occurs if the string being concatenated is already a<br>> concatenation object created by an earlier assignment.
<br>Nope. Rendering only occurs when somebody asks for the string's value,<br>not when merely concatenating. If you add nine strings together, the<br>ninth one fails the "left side has room" test and creates a second object.
</blockquote><div><br>I don't know what I was thinking. In the whole of string_concat() there is no call to render the string, except for the right recursion case. <br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Try stepping through it. Run Python interactively under the debugger.<br>Let it get to the prompt. Execute some expression like "print 3", just<br>so the interpreter creates its concatenated encoding object (I get
<br>"encodings.cp437"). Now, in the debugger, put a breakpoint in the<br>rendering code in recursiveConcatenate(), and another on the "op =<br>(PyStringConcatenationObject *)PyObject_MALLOC()" line in
<br>string_concat. Finally, go back to the Python console and concatenate<br>nine strings with this code:<br> x = ""<br> for i in xrange(9):<br> x += "a"<br>You won't hit any breakpoints for rendering, and you'll hit the string
<br>concatenation object malloc line twice. (Note that for demonstration<br>purposes, this code is more illustrative than running x = "a" + "b" ...<br>+ "i" because the peephole optimizer makes a constant folding pass.
<br>It's mostly harmless, but for my code it does mean I create<br>concatenation objects more often.)</blockquote><div><br>I don't have a patch build, since I didn't download the revision used by the patch. </div>However, I did look at values in the debugger and it looked like x in your example above had a reference count of 2 or more within string_concat even when there were no other assignments that would account for it. My idea was to investibate this, but this was the whole reason for saying that the concatenation will create new objects. However, I ran on another machine under debugger and I get the reference count as 1, which is what I would expect. I need to find out what has happened to my work machine.
<br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">In the interests of full disclosure, there is *one* scenario where pure<br>string concatenation will cause it to render. Rendering or deallocating
<br>a recursive object that's too deep would blow the program stack, so I<br>limit recursion depth on the right seven slots of the recursion object.<br>That's what the "right recursion depth" field is used for. If you
<br>attempt to concatenate a string concatenation object that's already at<br>the depth limit, it renders the deep object first. The depth limit is<br>2**14 right now.</blockquote><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
You can force this to happen by prepending like crazy:<br> x = ""<br> for i in xrange(2**15):<br> x = "a" + x<br><br>Since my code is careful to be only iterative when rendering and<br>deallocating down the left-hand side of the tree, there is no depth
<br>limit for the left-hand side.</blockquote><div><br>The recursion limit seems to be optimistic, given the default stack limit, but of course, I haven't tried it. There is probably a depth limit on the left hand side as well, since recursiveConcatenate is recursive even on the left side.
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Step before you leap,<br><br><br>/larry/<br></blockquote></div><br>