f python?

Chris Angelico rosuav at gmail.com
Sun Apr 8 15:27:29 EDT 2012


On Mon, Apr 9, 2012 at 5:14 AM, Kaz Kylheku <kaz at kylheku.com> wrote:
> Not only can compilers compress storage by recognizing that string literals are
> the suffixes of other string literals, but a lot of string manipulation code is
> simplified, because you can treat a pointer to interior of any string as a
> string.
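A minimal sketch of the "pointer to the interior of a string is itself a string" idea. The helper name `skip_prefix` is hypothetical. Whether a compiler actually merges one literal into the tail of another is implementation-dependent, so the sketch relies only on pointer arithmetic, not on literal merging:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if s starts with prefix, return a pointer into s
   just past the prefix; otherwise return NULL. No bytes are copied: the
   result is an ordinary NUL-terminated string sharing s's storage. */
const char *skip_prefix(const char *s, const char *prefix)
{
    size_t n = strlen(prefix);
    if (strncmp(s, prefix, n) != 0)
        return NULL;
    return s + n;   /* interior pointer; ends at s's own terminating NUL */
}
```

For instance, `skip_prefix("hello world", "hello ")` yields a pointer to `"world"` that lives inside the original literal, so nothing is allocated or freed.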

I'm not sure about the value of tail recursion in C, but this is
definitely a majorly useful feature, as is the related technique of
parsing by dropping null bytes into the string (see for instance the
strtok function, which need not do any memory movement; I wrote a CSV
parser that works the same way). Often I use both techniques
simultaneously, for instance in parsing this sort of string:

"A:100 B:200 C:300"

First, tokenize on the spaces by looking for a space, retaining a
pointer, and putting in a NUL:
char *next=strchr(str,' '); if (!next) break; *next++=0;
Then read a character, and increment the pointer through that string
as you parse.
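The two steps above can be sketched as a complete in-place parser for strings like "A:100 B:200 C:300". This is an illustrative sketch, not ChrisA's actual code: the names `parse_pairs` and `pair_fn` are hypothetical, keys are assumed to be single characters, and values are assumed to be decimal integers. Unlike the one-line fragment, which `break`s when no space is found, it still parses the final token:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: receives one key character and its value. */
typedef void (*pair_fn)(char key, long value, void *ctx);

/* Parse "K:NNN K:NNN ..." in place. Each space is overwritten with a NUL,
   so every token becomes its own C string inside the original buffer and
   nothing is copied. Returns the number of pairs, or -1 on a bad token. */
int parse_pairs(char *str, pair_fn fn, void *ctx)
{
    int count = 0;
    while (*str) {
        char *next = strchr(str, ' ');
        if (next)
            *next++ = 0;            /* terminate this token, step past it */
        /* str is now one token; expect "K:digits". */
        if (str[0] == 0 || str[1] != ':')
            return -1;
        char key = str[0];
        char *p = str + 2;          /* first character of the value */
        char *end;
        long value = strtol(p, &end, 10);   /* end = where parsing stopped */
        if (end == p || *end != 0)
            return -1;
        fn(key, value, ctx);
        count++;
        if (!next)
            break;                  /* last token had no trailing space */
        str = next;
    }
    return count;
}
```

The buffer must be writable, so pass an array such as `char buf[] = "A:100 B:200 C:300";` rather than a string literal; after the call, `buf` itself reads as `"A:100"` because the first space has become a NUL.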

Try doing THAT in a high-level language without any memory copying.
And "without any memory copying" may not be important with this
trivial example, but suppose you've just read in a huge CSV file to
parse - maybe 16MB in the normal case, with no actual limit other than
virtual memory. (And yes, I read the whole thing in at once, because
it comes from a Postgres database and reading it in pieces would put
more load on the central database server.)
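The same NUL-dropping technique scales to a whole CSV buffer read in at once. Below is a deliberately simplified sketch, not the CSV parser mentioned above: `split_csv`, `row_fn` and `MAX_FIELDS` are hypothetical names, and it assumes a NUL-terminated buffer, comma delimiters, `\n` line endings, and no quoted fields or embedded commas:

```c
#include <stddef.h>

#define MAX_FIELDS 64   /* hypothetical per-row limit for this sketch */

/* Hypothetical callback: one row, as field pointers into the caller's buffer. */
typedef void (*row_fn)(char **fields, size_t nfields, void *ctx);

/* Split a NUL-terminated CSV buffer in place. Commas and newlines are
   overwritten with NULs, so each field becomes a C string inside buf and
   no field data is copied. Returns the row count, or -1 if a row has more
   than MAX_FIELDS fields. */
long split_csv(char *buf, row_fn fn, void *ctx)
{
    char *fields[MAX_FIELDS];
    size_t nfields = 0;
    long rows = 0;
    char *field = buf;                  /* start of the current field */

    for (char *p = buf; ; p++) {        /* walk up to the terminating NUL */
        char c = *p;
        if (c != ',' && c != '\n' && c != 0)
            continue;
        if (c == 0 && nfields == 0 && p == field)
            break;                      /* empty input or trailing newline */
        if (nfields == MAX_FIELDS)
            return -1;
        *p = 0;                         /* terminate the field in place */
        fields[nfields++] = field;
        field = p + 1;
        if (c != ',') {                 /* newline or end of input ends a row */
            fn(fields, nfields, ctx);
            rows++;
            nfields = 0;
        }
        if (c == 0)
            break;
    }
    return rows;
}
```

Each row's field pointers are only valid while `buf` is alive, which is exactly what makes this cheap for a 16MB buffer: the only extra memory is the small `fields` array.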

Don't get me wrong, I wouldn't want to do _everything_ in C; but I
also wouldn't want to do everything in length-preceded strings. The
nearest equivalent that would be able to use the shared buffer is a
length-external string like BASIC uses (or used, back when I used to
write BASIC code and 8086 assembly to interface with it) - a string
"object" consists of a length and a pointer. But that has issues with
freeing up memory, if you're using parts of a string.
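A rough sketch of the length-plus-pointer layout described above, written as a C "string view". The names `strview`, `sv_from_cstr`, `sv_slice` and `sv_eq_cstr` are hypothetical, and this is not how any particular BASIC implemented it:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-external string: a length plus a pointer into some
   other buffer. It does not own its bytes and need not be NUL-terminated. */
typedef struct {
    size_t len;
    const char *ptr;
} strview;

strview sv_from_cstr(const char *s)
{
    strview v = { strlen(s), s };
    return v;
}

/* Bytes [start, start + n) of v, without copying; clamped to v's bounds. */
strview sv_slice(strview v, size_t start, size_t n)
{
    if (start > v.len)
        start = v.len;
    if (n > v.len - start)
        n = v.len - start;
    strview r = { n, v.ptr + start };
    return r;
}

/* Compare a view with a NUL-terminated string. */
int sv_eq_cstr(strview v, const char *s)
{
    return strlen(s) == v.len && memcmp(v.ptr, s, v.len) == 0;
}
```

Slicing is free, and unlike the NUL-dropping approach it leaves the source buffer untouched. The catch is the one mentioned above: every slice still points into the parent buffer, so that buffer can only be freed once no slice refers to it. A runtime that handles this automatically needs something like reference counting on the underlying buffer, and even then a five-byte slice can keep a 16MB buffer alive.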

ChrisA



More information about the Python-list mailing list