[Python-Dev] Internal representation of strings and Micropython
Serhiy Storchaka
storchaka at gmail.com
Wed Jun 4 18:49:18 CEST 2014
04.06.14 18:38, Paul Sokolovsky написав(ла):
>> Any non-trivial text parsing uses indices or regular expressions (and
>> regular expressions themself use indices internally).
>
> I keep hearing this stuff, and unfortunately so far don't have enough
> time to collect all that stuff and provide detailed response. So,
> here's spur of the moment response - hopefully we're in the same
> context so it is easy to understand.
>
> So, gentlemen, you keep mixing up character-by-character random access
> to string and taking substrings of a string.
>
> Character-by-character random access imply that you would need to scan
> thru (possibly almost) all chars in a string. That's O(N) (N-length of
> string). With varlength encoding (taking O(N) to index arbitrary char),
> there's thus concern that this would be O(N^2) op.
>
> But show me real-world case for that. Common usecase is scanning string
> left-to-right, that should be done using iterator and thus O(N).
> Right-to-left scanning would be order(s) of magnitude less frequent, as
> and also handled by iterator.
html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't
use iterators. They use indices, str.find and/or regular expressions.
Common use case is quickly find substring starting from current position
using str.find or re.search, process found token, advance position and
repeat.
More information about the Python-Dev
mailing list