[Tutor] working with strings in python3

Chris Angelico rosuav at gmail.com
Tue Apr 19 05:22:32 CEST 2011


On Tue, Apr 19, 2011 at 12:16 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> See Joel on Software for more:
>
> http://www.joelonsoftware.com/articles/fog0000000319.html

The bulk of that article is reasonable; he's right that a good
programmer MUST have at least some understanding of what's happening
at the lowest level. He seems to consider C strings to be
fundamentally bad, though, which isn't quite fair. A C-style ASCIIZ
string can grow as large as memory allows, whereas the Pascal strings
he mentions are limited to 255 bytes. A forward-thinker might have
gone as far as a 32-bit length (not guaranteed, and quite wasteful if
you have billions of short strings, or if 32-bit arithmetic costs
your CPU double the effort), but in today's world it's not that
uncommon to work with 4GB or more of data, which a 32-bit length
can't describe either. ASCIIZ may not be the most efficient for
strcatting onto, but you shouldn't strcat in a loop like that anyway.
Rather than the mystrcat that he offered, it's better to have a
mystrcpy (called strmov in several libraries) that's identical to
strcpy but returns the end of the string. It's the same as his
version but without the dest++ scan first, and it's used in the same
way but without bothering to put the starting \0 in the buffer.
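
For anyone who wants to see it spelled out, here's a rough sketch of
that kind of copy-and-return-the-end function. The names my_strmov
and build_names, and the sample strings, are made up for
illustration; POSIX stpcpy behaves the same way where it's available.
There's no bounds checking, so the caller has to size the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src (including its terminating NUL) to dest
 * and return a pointer to the NUL just written, so the next call can
 * append there without rescanning. No bounds checking is done. */
char *my_strmov(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);
    while ((*dest = *src++) != '\0')
        dest++;
    return dest;
}

/* Illustrative use: build "John, Paul, George" in buf, which must hold
 * at least 19 bytes. Returns the length of the result. */
size_t build_names(char *buf)
{
    char *end = buf;              /* no need to store a leading '\0' */
    end = my_strmov(end, "John");
    end = my_strmov(end, ", ");
    end = my_strmov(end, "Paul");
    end = my_strmov(end, ", ");
    end = my_strmov(end, "George");
    return (size_t)(end - buf);   /* the length falls out for free */
}
```

Each call picks up where the previous one stopped, so the total work
is proportional to the length of the result rather than growing with
every append.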

Ultimately, though, every method of joining strings is going to have
to deal with the "first strlen, then strcpy" issue. I haven't looked
at Python's guts, but I would expect that list joining does this; and
one of the simplest ways to code a StringBuffer object (which I've
used on occasion in raw C) is to simply save all the pointers and then
build the string at the end, which is really the same as
"".join(list). (And yes, I know this depends on the memory still being
allocated. I knew what I was doing when I took that shortcut.)
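
To make the "first strlen, then strcpy" shape concrete, here's a
minimal sketch in C of joining a list of saved pointers. This is an
assumption-laden illustration, not what Python actually does
internally, and join_parts is a made-up name. It relies on every
saved pointer still being valid when the join happens.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join count strings from parts into one newly
 * allocated string. Pass 1 measures (strlen), pass 2 copies (memcpy).
 * Returns NULL if allocation fails; the caller frees the result. */
char *join_parts(const char *const *parts, size_t count)
{
    assert(parts != NULL || count == 0);

    /* Pass 1: add up the lengths so the buffer is allocated once. */
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(parts[i]);

    char *result = malloc(total + 1);
    if (result == NULL)
        return NULL;

    /* Pass 2: copy each piece to the running end of the buffer. */
    char *end = result;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(parts[i]);
        memcpy(end, parts[i], len);
        end += len;
    }
    *end = '\0';
    return result;
}
```

A fancier version could cache the lengths from the first pass instead
of calling strlen twice, at the cost of a second array.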

"Anyway. Life just gets messier and messier down here in byte-land.
Aren't you glad you don't have to write in C anymore?"

Nope. I'm glad that I *can* write in C still. Well, actually I use C++
because I prefer the syntax, but there are plenty of times when I want
that down-on-the-metal coding.

Oh, and by the way: XML sucks if you want performance, and it's so
easy to abuse that I don't really see that it has much value for "data
structures" outside of file transfers. You package your data up in
XML, send it to the other end, they unpack it and turn it into what
they want. End of XMLness. And if you want anything binary ("hey guys,
here's the icon that I want you to display with this thing", for
instance), it gets messier. Much neater to avoid it altogether.

Chris Angelico


