Unicode and Python - how often do you index strings?
Tim Chase
python.list at tim.thechases.com
Tue Jun 3 21:11:54 EDT 2014
On 2014-06-04 10:39, Chris Angelico wrote:
> A current discussion regarding Python's Unicode support centres (or
> centers, depending on how close you are to the cent[er]{2} of the
> universe) around one critical question: Is string indexing common?
>
> Python strings can be indexed with integers to produce characters
> (strings of length 1). They can also be iterated over from beginning
> to end. Lots of operations can be built on either one of those two
> primitives; the question is, how much can NOT be implemented
> efficiently over iteration, and MUST use indexing? Theories are
> great, but solid use-cases are better - ideally, examples from
> actual production code (actual code optional).
Many of my string-indexing uses revolve around a sliding window which
can be done with itertools[1], though I often just roll it as
something like
  n = 3
  for i in range(1 + len(s) - n):
    do_something(s[i:i+n])
So that could be supplanted by the SO iterator linked below.
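For comparison, here is a minimal sketch of an iteration-only sliding window that never indexes the string. It is not necessarily the recipe from the linked answer; the name sliding_window and the choice to join each window back into a str are assumptions made for illustration.

```python
from collections import deque
from itertools import islice

def sliding_window(iterable, n):
    """Yield each run of n consecutive characters as a string.

    Only iteration is used; the input is never indexed or sliced.
    (Hypothetical helper, not taken from the original post.)
    """
    it = iter(iterable)
    # Prime the window with the first n items (fewer if the input is short).
    window = deque(islice(it, n), maxlen=n)
    if len(window) == n:
        yield "".join(window)
    # Each new item pushes the oldest one out of the bounded deque.
    for ch in it:
        window.append(ch)
        yield "".join(window)

windows = list(sliding_window("abcde", 3))
```

Like the range()-based loop, this yields nothing when the input is shorter than n.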
The other big use case I have from production code involves a
column-offset delimited file where the headers have a row of
underscores under them delimiting the field widths, so it looks
something like
EmpID     Name                Cost Center
--------- ------------------- -----------------------------
314159    Longstocking, Pippi RJ45
265358    Davis, Miles        JA22
979328    Bell, Alexander     RJ15
I then take row 2 and use it to make a mapping of header-name to a
slice-object for slicing the subsequent strings:
  import re
  r = re.compile('-+')  # a sequence of 1+ dashes
  f = open("data.txt")  # open() rather than file(), which is Python 2 only
  headers = next(f)
  lines = next(f)  # the row of dashes under the headers
  header_map = dict((
      headers[i.start():i.end()].strip().upper(),
      slice(i.start(), i.end())
      )
    for i in r.finditer(lines)
    )
  for row in f:
    print("EmpID = %s" % row[header_map["EMPID"]].strip())
    print("Name = %s" % row[header_map["NAME"]].strip())
    # ...
which I presume uses string indexing under the hood.
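A self-contained sketch of the same slice-mapping idea, run against an in-memory copy of the sample rows so it can be tried without data.txt. The function name parse_fixed_width, the SAMPLE text, and the choice to return one dict per row are assumptions for illustration, not part of the production code described above.

```python
import io
import re

# Rebuild the sample with explicit column widths (9, 19, 29) so the
# alignment does not depend on hand-typed spaces.
ROWS = [
    ("EmpID", "Name", "Cost Center"),
    ("-" * 9, "-" * 19, "-" * 29),
    ("314159", "Longstocking, Pippi", "RJ45"),
    ("265358", "Davis, Miles", "JA22"),
    ("979328", "Bell, Alexander", "RJ15"),
]
SAMPLE = "".join("%-9s %-19s %s\n" % row for row in ROWS)

def parse_fixed_width(f):
    """Yield one dict per data row, keyed by upper-cased header name.

    (Hypothetical helper.) The second line of f must be the row of
    dashes that marks each field's width.
    """
    headers = next(f)
    rules = next(f)
    # Map each header name to the slice covered by its run of dashes.
    fields = {
        headers[m.start():m.end()].strip().upper(): slice(m.start(), m.end())
        for m in re.finditer(r"-+", rules)
    }
    for row in f:
        # Slicing past the end of a short row is safe; it just truncates.
        yield {name: row[sl].strip() for name, sl in fields.items()}

records = list(parse_fixed_width(io.StringIO(SAMPLE)))
```

Each field lookup is a slice of the row string, so this approach does depend on positional access by character offset.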
Perhaps there's a better way of doing that, but it's what I currently
use to process these large-ish files (the largest max out at 10-20MB each).
There might be other use-cases I've done, but those two leap to mind.
-tkc
[1] http://stackoverflow.com/questions/6822725/rolling-or-sliding-window-iterator-in-python