Unicode and Python - how often do you index strings?
Roy Smith
roy at panix.com
Tue Jun 3 21:18:12 EDT 2014
In article <mailman.10656.1401842403.18130.python-list at python.org>,
Chris Angelico <rosuav at gmail.com> wrote:
> A current discussion regarding Python's Unicode support centres (or
> centers, depending on how close you are to the cent[er]{2} of the
> universe)
<sarcasm style="regex-pedant">Um, you mean cent(er|re), don't you? The
pattern you wrote also matches centee and centrr.</sarcasm>
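For anyone who wants to check the pedantry, here is a minimal sketch using Python's standard `re` module. The names `loose` and `strict` are just labels for this example, and `fullmatch` is used so that only whole words count.

```python
import re

loose = re.compile(r"cent[er]{2}")    # the pattern from the quoted text
strict = re.compile(r"cent(er|re)")   # the suggested replacement

for word in ("center", "centre", "centee", "centrr"):
    # Print whether each pattern matches the whole word.
    print(word, bool(loose.fullmatch(word)), bool(strict.fullmatch(word)))
```

The character class accepts any two characters drawn from "e" and "r", which is why the last two words slip through the first pattern but not the second.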
> around one critical question: Is string indexing common?
Not in our code. I've got 80008 non-blank lines of Python (2.7) source
handy. I tried a few heuristics to find patterns which might be string
indexing.
    $ find . -name '*.py' | xargs egrep '\[[^]][0-9]+\]'
and then looked them over manually. I see this pattern a bunch of times
(in a single-use script):
    data['shard_key'] = hashlib.md5(str(id)).hexdigest()[:4]
We do this once:
    if tz_offset[0] == '-':
We do this somewhere in some command-line parsing:
    process_match = args.process[:15]
There's this little gem:
    return [dedup(x[1:-1].lower()) for x in
            re.findall('(\[[^\]\[]+\]|\([^\)\(]+\))',title)]
It appears I wrote this one, but I don't remember exactly what I had in
mind at the time...
    withhyphen = number if '-' in number else (number[:-2] + '-' +
                  number[-2:])  # big assumption here
Anyway, there's a bunch more, but the bottom line is that in our code,
indexing into a string (at least explicitly in application source code)
is a pretty rare thing.
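
For anyone who wants to repeat the survey on their own code, here is a rough, syntax-aware variant of the grep heuristic. It is a sketch only, not what was run above: it assumes Python 3.8 or later (the code surveyed here is 2.7), and the helper names `_is_int_literal` and `numeric_subscripts` are invented for this example. It flags subscripts whose index or slice bounds are integer literals. Like grep, it cannot tell a string from a list or tuple, so the hits still need to be looked over manually.

```python
import ast

def _is_int_literal(node):
    """True for an integer literal such as 4 or -2."""
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        node = node.operand
    return (isinstance(node, ast.Constant)
            and isinstance(node.value, int)
            and not isinstance(node.value, bool))

def numeric_subscripts(source):
    """Yield (line_number, kind) for subscripts with literal integer bounds."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Subscript):
            continue
        index = node.slice
        # Python 3.8 wraps a plain index in ast.Index; 3.9+ does not.
        if index.__class__.__name__ == "Index":
            index = index.value
        if isinstance(index, ast.Slice):
            bounds = [b for b in (index.lower, index.upper) if b is not None]
            if bounds and all(_is_int_literal(b) for b in bounds):
                yield node.lineno, "slice"
        elif _is_int_literal(index):
            yield node.lineno, "index"

sample = (
    "key = digest[:4]\n"
    "if tz_offset[0] == '-':\n"
    "    pass\n"
    "name = data['shard_key']\n"
    "tail = number[-2:]\n"
)
# ast.walk does not visit nodes in line order, so sort before printing.
print(sorted(numeric_subscripts(sample)))
# → [(1, 'slice'), (2, 'index'), (5, 'slice')]
```

The dictionary lookup on line 4 of the sample is skipped because its key is a string literal, which is the kind of false positive a text search tends to pick up.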