how to avoid leading white spaces
Nobody
nobody at nowhere.com
Sat Jun 4 15:44:56 EDT 2011
On Sat, 04 Jun 2011 13:41:33 +1200, Gregory Ewing wrote:
>> Python might be penalized by its use of Unicode here, since a
>> Boyer-Moore table for a full 16-bit Unicode string would need
>> 65536 entries
>
> But is there any need for the Boyer-Moore algorithm to
> operate on characters?
>
> Seems to me you could just as well chop the UTF-16 up
> into bytes and apply Boyer-Moore to them, and it would
> work about as well.
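No, because a byte-wise search won't care about alignment. E.g. on a
big-endian architecture, if you search for '\u2345' in the string
'\u0123\u4567', it will find a match (at an offset of 1 byte), even though
the string doesn't contain that character: the bytes are 01 23 45 67, and
23 45 straddles the boundary between the two characters.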
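
A rough sketch of the problem in Python, using the utf-16-be codec to stand
in for a big-endian architecture. The helper find_aligned is a hypothetical
name made up for this illustration, not anything from the standard library,
and it is only one possible workaround: it rejects any hit whose byte offset
isn't a multiple of the code unit size.

```python
# Encode both strings as big-endian UTF-16, two bytes per character here.
haystack = '\u0123\u4567'.encode('utf-16-be')   # bytes 01 23 45 67
needle = '\u2345'.encode('utf-16-be')           # bytes 23 45

# A plain byte-wise search reports a hit that straddles two characters.
false_hit = haystack.find(needle)               # byte offset 1


def find_aligned(haystack, needle, unit=2):
    """Return the first offset of needle that is a multiple of unit, or -1.

    Hypothetical helper: skips matches that start in the middle of a
    code unit and keeps searching from the next byte.
    """
    start = 0
    while True:
        pos = haystack.find(needle, start)
        if pos == -1 or pos % unit == 0:
            return pos
        start = pos + 1


aligned_hit = find_aligned(haystack, needle)    # -1, no real match
```

Filtering the hits this way gives the right answer, but every misaligned
hit still costs a restart of the search.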