[python-uk] Tell us what you did with Python this year....
Tim Golden
mail at timgolden.me.uk
Mon Dec 20 17:46:33 CET 2010
On 20/12/2010 16:08, Alec Battles wrote:
> Unicode
> interoperability is a pain, though, and I find it depressing to work
> with in Python2.x, because it never seems to behave predictably. I
> still have no idea why tokenizing Hungarian text and tokenizing German
> text are not fundamentally the same operation
I have no idea why they're not:
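<code - untested>
import codecs

# codecs.open reads each file's raw bytes and decodes them from UTF-8,
# so f.read() returns unicode text rather than a byte string.
with codecs.open("german.txt", "rb", encoding="utf8") as f:
    german_text = f.read()
with codecs.open("hungarian.txt", "rb", encoding="utf8") as f:
    hungarian_text = f.read()

# Both variables now hold unicode text, so the same processing applies:
# do_stuff_with(german_text)
# do_stuff_with(hungarian_text)
</code>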
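To make the point concrete: once both files have been decoded, a single tokenizer can serve both languages. This is a Python 3 sketch, not part of the original reply. The `tokenize` helper, the `WORD_RE` pattern and the sample strings are hypothetical, and a naive `\w+` split is only an assumption about what "tokenizing" means here.

```python
import re

# Hypothetical helper: a naive word tokenizer. In Python 3, str patterns
# are Unicode-aware by default, so \w matches accented letters such as
# ö, ü, ß, ő and é as well as plain ASCII letters.
WORD_RE = re.compile(r"\w+")


def tokenize(text):
    """Split already-decoded text into word tokens (illustrative only)."""
    return WORD_RE.findall(text)


# Hypothetical sample strings standing in for the decoded file contents.
german_text = "Größe und Übermut"
hungarian_text = "Győr és Debrecen"

print(tokenize(german_text))     # ['Größe', 'und', 'Übermut']
print(tokenize(hungarian_text))  # ['Győr', 'és', 'Debrecen']
```

Under Python 2.x the same idea needs unicode literals (`u"..."`) and the `re.UNICODE` flag, because `\w` there matches only ASCII word characters by default.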
Of course, I'm assuming that you know what encoding has been
used to serialise the text, but if you don't then it's not
Python's fault ;)
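Here is what goes wrong when the encoding is guessed rather than known. This is a small Python 3 sketch. The sample word and the assumption that the file was saved as ISO-8859-2 (Latin-2, which covers Hungarian's ő and ű) are illustrative choices, not details from the thread.

```python
# Hungarian place name containing ő (U+0151). That letter exists in
# ISO-8859-2 (Latin-2) but not in ISO-8859-1 (Latin-1).
original = "Győr"

# Assumption for illustration: the text was serialised as Latin-2.
raw_bytes = original.encode("iso-8859-2")

# Decoding with the matching codec round-trips cleanly.
assert raw_bytes.decode("iso-8859-2") == original

# Guessing Latin-1 instead silently produces a different letter, because
# byte 0xF5 means "ő" in Latin-2 but "õ" in Latin-1.
wrong = raw_bytes.decode("latin-1")
print(wrong)  # Gyõr

# Guessing UTF-8 fails loudly, since 0xF5 is never a valid UTF-8 byte.
try:
    raw_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

The silent Latin-1 case is the dangerous one: no exception is raised, so the damage only shows up later in the tokens.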
TJG