An efficient split function

Tim Peters tim_one at email.msn.com
Mon May 10 23:19:28 EDT 1999


[Andrew M. Kuchling]
> 	Note that your use of split(/\|/) in Perl requires using the
> regular expression engine, instead of a simple C splitting loop.  Try
> using a literal string instead of a regex, as in split('|', ...); that
> will probably even out the speeds.

[William S. Lear]
> Thanks for the suggestion, which I had tried originally, but got
> marginally worse performance than with the regexp.  For some reason, I
> did have to do split('\|') instead of split('|'), which I found curious.

Unless Perl has changed a lot since the last time I cared <wink>, the notion
that split will accept a literal string *as* a literal string is an
illusion:  string expressions are treated as regexps too, *typically* used
when the split pattern varies at runtime.  '|' as a regexp means "match the
empty string, or match the empty string", and so will split the line into
characters.  This is consistent with your need to spell it '\|' to get what
you wanted.  The "marginally worse" performance was also likely an
illusion -- it should have been the same.
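
The same distinction is easy to poke at from Python's re module.  A small
sketch (the sample line is made up, and this shows Python's regexp behavior
as an analogy, not Perl's split itself):

```python
import re

line = "alpha|beta|gamma"  # made-up sample data

# Unescaped, "|" is an alternation between two empty patterns, so it
# matches the empty string right at the start of the line.
empty_match = re.match("|", line).group()
print(repr(empty_match))

# Escaped, it matches a literal vertical bar and splits into fields.
fields_re = re.split(r"\|", line)

# str.split takes a plain string and never touches the regexp engine.
fields_str = line.split("|")

print(fields_re)
print(fields_str)
```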

The easiest ways to speed the Python version:

1. Stick the whole thing in a function (local variable access is much
   cheaper than global).
2. Read more than one line at a time (e.g. try readlines with a largish
   "hint" argument).

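Both suggestions together might look something like the sketch below.  The
original script isn't shown here, so the function name split_fields, the
"|" separator and the roughly 1MB hint are all assumptions made for
illustration, not the poster's code:

```python
import io

def split_fields(stream, sep="|", hint=1 << 20):
    """Split every line of stream on sep, reading in large batches."""
    rows = []
    append = rows.append          # bound method cached in a local name
    while True:
        # readlines(hint) returns whole lines totalling roughly hint
        # characters, or an empty list at end of file.
        lines = stream.readlines(hint)
        if not lines:
            break
        for line in lines:
            append(line.rstrip("\n").split(sep))
    return rows

# Usage with an in-memory file standing in for real input.
sample = io.StringIO("a|b|c\nd|e\n")
print(split_fields(sample))
```

Everything the loop touches (rows, append, lines, line, sep) is a local, and
the file is read a batch at a time instead of a line at a time.
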
anything-faster-than-doing-it-by-hand-is-excessive<wink>-ly y'rs  - tim





