[Pythonmac-SIG] Unicode and split
Jeremy Reichman
jaharmi at jaharmi.com
Fri May 23 20:52:41 CEST 2008
Thanks to everyone who replied!
I'll take a further look into the encoding of the file because I'm
interested in that for other reasons. In the output I saw, u"\xe1" (and a
few others I found after sending my note) were prevalent around the splits.
For the moment, though, I've solved my immediate difficulty by splitting
twice. I really only need the space-delimited fields that appear after a tab
in each line, and the characters causing problems always come before that
tab. So I split on the tab first, and then a normal split of the part after
it gets me the fields I need.
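
A rough sketch of that two-step split, for illustration only. The sample
line and the name fields_after_tab are made up, and it assumes the fields of
interest follow the first tab on the line:

```python
def fields_after_tab(line):
    """Return the whitespace-delimited fields that follow the first tab."""
    # Step 1: split once on the tab; the part before it is discarded.
    parts = line.split(u"\t", 1)
    if len(parts) < 2:
        # No tab on this line, so there are no trailing fields to return.
        return []
    # Step 2: a plain split() on the remainder breaks it on whitespace.
    return parts[1].split()


# Hypothetical input: a troublesome character before the tab, fields after it.
sample = u"caf\xe1 label\tfield1 field2 field3"
print(fields_after_tab(sample))
```

Because the first split is limited to the tab, whatever characters sit
before it never reach the whitespace split.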
--
Jeremy