[Tutor] unicode nightmare

Thu Nov 11 01:46:26 CET 2010

"danielle davout" <danielle.davout at gmail.com> wrote

> I simplify it to
> v = u'\u0eb4'
> X = (1,)
> gen = ((v ,v) for x in X for y in X)
>
> What can be so wrong in this line, or around it, to give the one-line file
> ໄ:ໄ
> where ໄ "is" not u'\u0eb4' but u'\u0ec4', though a direct printing
> looks OK

The code will produce a one-line file with v written twice, separated by a colon.
Now why do you think the character is different?
What have you done to check it?

What do you mean by a direct printing?

print v

maybe?
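
Printing only shows you a glyph, and two Lao vowel signs can be easy to confuse on screen. A more reliable check is to look at the code points themselves, both in the string and in what actually lands in the file. A minimal sketch, assuming a throwaway temporary file; code_points and the file name G_check are hypothetical names made up for illustration, not part of your code:

```python
import codecs
import os
import tempfile


def code_points(text):
    """Return the code point of every character as a 'U+XXXX' string."""
    return ["U+%04X" % ord(ch) for ch in text]


v = u'\u0eb4'

# Write one "v:v" line the same way ecrire() does, into a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "G_check")
f = codecs.open(path, "w", "utf8")
f.write(u"%s:%s\n" % (v, v))
f.close()

# Read the line back and inspect what was really stored.
f = codecs.open(path, "r", "utf8")
line = f.readline()
f.close()

print(code_points(v))     # the string itself
print(code_points(line))  # what the file contains
```

If the two lists disagree, the character changed somewhere between the assignment and the write. If they agree, the file is fine and the viewer is the suspect.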

> To write the file corresponding to the nth generator of my list h I use
>    def ecrire(n):
>        f= codecs.open("G"+str(n),"w","utf8")
>        for x, tx in h[n]:
>            f.write((x + U":"+ tx))
>            f.write('\n')

Personally I'd use

f.write(U"%s:%s\n" % (x,tx))

but that's largely a matter of style preference, I guess.
But why do you have double parentheses in the first write call?
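
For what it's worth, here is roughly how the whole function might look with the formatted write. This is only a sketch under some assumptions: h is passed in as a parameter rather than read as a global, a hypothetical directory parameter is added so the demo can write to a temporary directory, and the file is closed explicitly, which the posted version does not do.

```python
import codecs
import os
import tempfile


def ecrire(n, h, directory="."):
    """Write the pairs from the nth generator of h to the file G<n>."""
    path = os.path.join(directory, "G" + str(n))
    f = codecs.open(path, "w", "utf8")
    try:
        for x, tx in h[n]:
            # One unicode format string instead of concatenation.
            f.write(u"%s:%s\n" % (x, tx))
    finally:
        # Close even if the generator raises, so buffered lines reach the disk.
        f.close()
    return path


# Hypothetical demo data: a one-element list holding a one-pair generator.
demo_h = [((u'\u0eb4', u'\u0eb4') for _ in (1,))]
demo_path = ecrire(0, demo_h, tempfile.mkdtemp())
```

Note that any exception raised while the generator is being consumed, such as a KeyError from a dictionary lookup inside it, will surface in the for loop of ecrire and not at the line where the generator was created.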

> But in its non-simplified form
>    h.append( ((x + v + y, tr[x] + tr[v] + tr[y]) for x in CC for y in OFC) )
> before I have a chance to write anything in the file G5
> I have got the KeyError: u'\u0ec4'
> yes tr is a dictionary that doesn't have u'\u0ec4' as a key
> but tr[v] is well defined ...

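OK, but in that case the KeyError is legitimate: something is being looked up
in tr with the key u'\u0ec4'. If tr[v] really is fine, that implies you have
bad data in CC (or in OFC, since tr[y] is looked up too).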
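
One other possibility worth ruling out, although nothing in the posted code confirms it: a generator expression does not capture the value of v when it is created. Only the outermost iterable is evaluated immediately; v is looked up each time the generator yields an item. So if v is reassigned (for example in a loop that builds h) before the generator is consumed, every generator sees the last value. The sketch below assumes such a loop; late_bound, bound_early and make_gen are hypothetical names used only for illustration.

```python
def late_bound(values, X):
    """Build one generator per value, the way a plain loop would."""
    h = []
    for v in values:
        # v is NOT frozen here; it is looked up when the generator runs.
        h.append(((v, v) for x in X for y in X))
    # By the time the generators are consumed, v holds the last value.
    return [list(g) for g in h]


def bound_early(values, X):
    """Same generators, but each one gets its own v via a function call."""
    def make_gen(v):
        # v is now a local of this call, so later loop iterations
        # cannot change it.
        return ((v, v) for x in X for y in X)

    h = [make_gen(v) for v in values]
    return [list(g) for g in h]


values = (u'\u0eb4', u'\u0ec4')
late = late_bound(values, (1,))    # both generators yield U+0EC4 pairs
early = bound_early(values, (1,))  # first yields U+0EB4, second U+0EC4
```

That would match the symptoms: print v right after the assignment looks fine, but the file (and the KeyError) show whatever v had become by the time the generator was actually consumed.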

What exactly are you asking?

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/