[Pythonmac-SIG] Unicode and split

Jeremy Reichman jaharmi at jaharmi.com
Fri May 23 20:52:41 CEST 2008


Thanks to everyone who replied!

I'll take a further look into the encoding of the file because I'm
interested in that for other reasons. In the output I saw, u"\xe1" (and a
few others I found after sending my note) were prevalent around the splits.

For the moment, though, I've solved my immediate difficulty by splitting
twice. I really only need the space-delimited fields that appear after a tab
in each line, and the characters causing problems are always before that tab.
I split on the tab first, and then a normal split of the part after it gets
me to the fields I need.
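
For anyone finding this in the archives, here is a minimal sketch of the
two-step split described above. It is not my actual script: the function name
fields_after_tab and the sample line are made up for illustration, and it
assumes each line has a single tab with the wanted fields after it.

```python
def fields_after_tab(line):
    """Return the whitespace-delimited fields that follow the first tab."""
    # First split: cut the line once, at the first tab. The troublesome
    # characters sit before the tab, so that part is simply discarded.
    parts = line.split(u"\t", 1)
    if len(parts) < 2:
        return []  # no tab on this line, so there are no fields to return
    # Second split: a plain split() on runs of whitespace.
    return parts[1].split()


# Made-up sample line (an assumption, not data from the real file).
sample = u"caf\xe1 label\talpha beta gamma\n"
print(fields_after_tab(sample))
```

Because the second split() only ever sees the text after the tab, whatever
odd characters appear earlier in the line cannot affect where it breaks.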


-- 
Jeremy



