[Patches] [ python-Patches-1590352 ] The "lazy strings" patch

SourceForge.net noreply at sourceforge.net
Sun Mar 11 21:16:37 CET 2007


Patches item #1590352, was opened at 2006-11-04 06:30
Message generated for change (Comment added) made by paulhankin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1590352&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Larry Hastings (lhastings)
Assigned to: Nobody/Anonymous (nobody)
Summary: The "lazy strings" patch

Initial Comment:
This patch consists of three changes to CPython:
 * changing PyStringObject.ob_sval,
 * "lazy concatenations", and
 * "lazy slices".
None of these changes adds new functionality to CPython;
they are all speed or memory optimizations.


In detail:

PyStringObject.ob_sval was changed from a char[] array
to a char *.  This is not in and of itself particularly
desirable.  It was necessary in order to implement the
other two changes.

"lazy concatenations" changes string concatenation ("a" + "b") so that,
instead of directly calculating the resulting string, it returns a
placeholder object representing the result.  As a result, string
concatenation in CPython is now more than 150% faster on average (as
reported by pybench 2.0), and is approximately as fast as the standard
string concatenation idiom ("".join([a, b, c])).
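
A rough sketch of the placeholder idea, written in Python rather than C.
This is not the patch's code: the names LazyConcat and render are
hypothetical, and the real change lives inside PyStringObject.  The point
is only that "+" records its two operands in constant time, and the
characters are copied once, when the value is first needed.

```python
class LazyConcat:
    """Hypothetical placeholder for left + right (illustration only)."""

    def __init__(self, left, right):
        self._parts = (left, right)  # each part is a str or a LazyConcat
        self._value = None           # rendered string, filled in on first use

    def __add__(self, other):
        # Constant time: no characters are copied here.
        return LazyConcat(self, other)

    def render(self):
        if self._value is None:
            # Walk the tree iteratively, collecting leaves left to right.
            leaves, stack = [], [self]
            while stack:
                node = stack.pop()
                if isinstance(node, LazyConcat):
                    if node._value is not None:
                        leaves.append(node._value)
                    else:
                        left, right = node._parts
                        stack.append(right)  # pushed first, popped last
                        stack.append(left)
                else:
                    leaves.append(node)
            self._value = "".join(leaves)  # one copy for the whole chain
            self._parts = None             # the operands are no longer needed
        return self._value

    def __str__(self):
        return self.render()


s = LazyConcat("spam", "eggs") + "ham"  # builds a small tree, copies nothing
text = str(s)                           # the join happens here
```

Rendering with a single "".join over the leaves is what makes a long
chain of "+" cost about the same as the join idiom in this sketch.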

"lazy slices" changes string slicing ("abc"[1], "a".strip()) so
that, instead of directly calculating the resulting string, it
returns a placeholder object representing the result.  As a result,
string slicing in CPython is now more than 60% faster on average
(as reported by pybench 2.0).
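
The same idea for slices, again as a Python sketch with hypothetical
names (LazySlice, render, holds_source) rather than the patch's C code.
The placeholder stores a reference to the source string plus two offsets
and copies nothing until the value is needed.  The sketch also shows one
cost of the approach: until it is rendered, even a tiny slice keeps the
whole source string alive.  Whether this is the "pathological behavior"
raised later in the thread is an assumption here.

```python
class LazySlice:
    """Hypothetical placeholder for source[start:stop] (illustration only)."""

    def __init__(self, source, start, stop):
        self._source = source  # reference only; nothing is copied yet
        self._start = start
        self._stop = stop
        self._value = None     # rendered string, filled in on first use

    def render(self):
        if self._value is None:
            self._value = self._source[self._start:self._stop]
            self._source = None  # release the (possibly huge) source string
        return self._value

    def holds_source(self):
        # True while the placeholder still pins the source string in memory.
        return self._source is not None

    def __str__(self):
        return self.render()


big = "x" * 1000 + "tail"
piece = LazySlice(big, 1000, 1004)  # four characters pin all of big
pinned_before = piece.holds_source()
tail = str(piece)                   # the copy happens here
pinned_after = piece.holds_source()
```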

When considering this patch, please keep in mind that the "lazy" changes
are distinct, and could be incorporated independently.  In particular
I'm guessing that "lazy concatenations" have a lot higher chance of
being accepted than "lazy slices".


These changes were implemented almost entirely in
Include/stringobject.h and Objects/stringobject.c.

With this patch applied, trunk builds and passes all expected tests
on Win32 and Linux.


For a more thorough discussion of this patch, please see the attached
text file(s).

----------------------------------------------------------------------

Comment By: Paul Hankin (paulhankin)
Date: 2007-03-11 20:16

Message:
Logged In: YES 
user_id=1740099
Originator: NO

Hi Larry,
It doesn't sound too promising - I'm new and have no powers of
resurrection :(

By strict aliasing, I just meant it's illegal to access members of one
type if the object is of a different (incompatible) type (actually I was
wrong, this isn't the strict aliasing rule - it's a more fundamental one).
In your case, it means it's illegal to pass a concat object where a string
object is expected, even if the function accesses members that are common
to them both. If this is happening, the answer is to make a union with the
string object and cat object as members, and to use this union type instead,
but it's not pretty.

I suggest this patch is closed anyway. If you still believe in your code
and think that lazy string cats have support, I suggest making a new patch
with just those in (fixed up to be correct C, and PEP 7 compliant).


----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-03-11 18:19

Message:
Logged In: YES 
user_id=364875
Originator: YES

Howdy!  Much has transpired since I posted this patch.
* Guido expressed interest in having it in Py3k.
* I ported it to Py3k; it's Python patch #1629305 on SourceForge.
* Guido didn't like it, specifically discussing the pathological behavior
of "lazy slices".
* I created a "v2 lazy slices" that eliminated the pathological behavior
but added a lot of complexity.
* I ran a poll on the Py3k mailing list to see how interested people were
in "lazy concatenation" and "v2 lazy slices".  Most people were +1 on lazy
concatenation, and -1 on lazy slices (v1 or v2), a position I can
completely endorse.  However, no Python luminaries replied, which--given
the patch's checkered past--seemed like a vote of no-confidence.
* Guido closed patch #1629305.

Is there life after Guido patch-closing?  I'd be happy to spend the time
answering your questions if my patch had some sort of future.  (Though
you'll have to tell me what you mean by "break strict aliasing".)

----------------------------------------------------------------------

Comment By: Paul Hankin (paulhankin)
Date: 2007-03-11 17:27

Message:
Logged In: YES 
user_id=1740099
Originator: NO

I really like the idea of the lazy cats, and can believe that it's a
really good optimisation, but before I review this code properly I'd like
to see:
a. convincing evidence that it doesn't break strict aliasing (a casual
reading suggests it does)
b. lazy slices removed into their own patch (or just removed) - I don't
want to recommend a patch containing them
c. adherence to coding standard
d. a little more explanation of how the cat objects work: it's important
because they're a future minefield of bugs.



----------------------------------------------------------------------
