PEP 393 vs UTF-8 Everywhere
no.email at nospam.invalid
Sat Jan 21 04:14:20 EST 2017
Chris Angelico <rosuav at gmail.com> writes:
> You can't do a look-ahead with a vanilla string iterator. That's
> necessary for a lot of parsers.
For JSON? For other parsers you usually have a tokenizer that reads
characters with maybe 1 char of lookahead.
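
To make the one-character-lookahead point concrete, here is a minimal sketch of a tokenizer that needs nothing from its input except a plain iterator. The names Peekable and tokenize are made up for illustration and the token grammar (integers and single-character punctuation) is an assumption, not anything from a real parser.

```python
class Peekable:
    """Wrap any iterator of characters and allow one character of lookahead.

    Hypothetical helper for illustration; not part of the standard library.
    """
    _END = object()  # sentinel marking exhausted input

    def __init__(self, chars):
        self._it = iter(chars)
        self._next = next(self._it, self._END)

    def peek(self):
        """Return the next character without consuming it, or None at end."""
        return None if self._next is self._END else self._next

    def advance(self):
        """Consume and return the next character."""
        ch = self._next
        if ch is self._END:
            raise StopIteration
        self._next = next(self._it, self._END)
        return ch


def tokenize(text):
    """Split text into ("NUM", digits) and ("PUNCT", char) tokens."""
    src = Peekable(text)
    tokens = []
    while src.peek() is not None:
        ch = src.advance()
        if ch.isspace():
            continue
        if ch.isdigit():
            digits = ch
            # One character of lookahead decides whether the number continues.
            while src.peek() is not None and src.peek().isdigit():
                digits += src.advance()
            tokens.append(("NUM", digits))
        else:
            tokens.append(("PUNCT", ch))
    return tokens
```

The tokenizer never indexes into the string, so it works the same whatever the internal representation is.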
> Yes, which gives a two-level indexing (first find the strand, then the
> character), and that's going to play pretty badly with CPU caches.
If you're jumping around at random all over the string, you probably
really want a bytearray rather than a unicode string. If you're
scanning sequentially you won't have to look at the outer table very often.
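
A rough sketch of the sequential-scan case, assuming a "strand" is just a fixed-size chunk of characters reached through an outer table. StrandString, strand_len and outer_lookups are hypothetical names for illustration; a real implementation would keep UTF-8 bytes in each strand rather than Python str chunks.

```python
class StrandString:
    """Toy two-level string: an outer table of fixed-size strands.

    Illustrative only. The last strand touched is cached, so consecutive
    indexes inside the same strand skip the outer table.
    """

    def __init__(self, text, strand_len=64):
        self._strand_len = strand_len
        # Outer table: one entry per strand of strand_len characters.
        self._strands = [
            text[i:i + strand_len] for i in range(0, len(text), strand_len)
        ]
        self._length = len(text)
        self._cached_no = -1   # index of the strand currently cached
        self._cached = ""      # the cached strand itself
        self.outer_lookups = 0  # how many times the outer table was consulted

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        if not 0 <= i < self._length:
            raise IndexError(i)
        strand_no, offset = divmod(i, self._strand_len)
        if strand_no != self._cached_no:
            # First level: only hit when we cross into a different strand.
            self.outer_lookups += 1
            self._cached = self._strands[strand_no]
            self._cached_no = strand_no
        # Second level: plain indexing inside the cached strand.
        return self._cached[offset]
```

With 200 characters and 64-character strands, a front-to-back scan consults the outer table once per strand, 4 times in all, instead of once per character.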