[Tutor] Fw: unicode nightmare

ALAN GAULD alan.gauld at btinternet.com
Thu Nov 11 09:41:49 CET 2010


Forwarding to the list.

 
Alan Gauld
Author of the Learn To Program website
http://www.alan-g.me.uk/




----- Forwarded Message ----
> From: danielle davout <danielle.davout at gmail.com>
> To: Alan Gauld <alan.gauld at btinternet.com>
> Sent: Thursday, 11 November, 2010 4:19:22
> Subject: Re: [Tutor] unicode nightmare
> 
> First, thanks for answering, even though I couldn't manage to be clear
> in formulating my question.
> It already helps ;) I was confident enough that somebody in this
> helpful group would answer my S.O.S.
> that I managed to have a night of sleep without nightmares, and a night
> of sleep, even a short one, helps too :)
> 
> I am now better able to summarize my problem, and it no longer looks
> like a unicode problem at all.
> (Below, if necessary, are some of the lines of code that led me to
> this conclusion.)
> In my script I construct a generator that I want to use later on; for
> this I use what I thought was an auxiliary (convenience) variable, and
> I go about my business ...
> When I'm ready to use the generator, it has changed "because" in
> the meantime I have reused the same variable name.
> If I go as far as removing the variable altogether later on with
> the statement del(v),
> I get a traceback complaining: NameError: global name 'v' is not defined
> If I don't reuse the same name (say, as a quick test, I put an exit()
> before using it),
> I get no complaint.
> If I take care to change the name, no longer using a loop to
> construct all the generators I need, my script gives the results I
> expected.
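
The symptoms described above match how Python evaluates generator
expressions: the body runs only when the generator is consumed, so a name
such as v is looked up at that moment, not when the generator is created.
A minimal sketch in Python 3 syntax (the thread itself uses Python 2); the
names v, X and gen are borrowed from the simplified example quoted further
down, and the reassignment is only a stand-in for the reuse of the name
later in the script:

```python
v = '\u0eb4'
X = (1,)

# Creating the generator does not evaluate (v, v) yet;
# only the outermost iterable X is evaluated at this point.
gen = ((v, v) for x in X for y in X)

# Reusing the name before the generator is consumed ...
v = '\u0ec4'

# ... means the generator sees the new value when it finally runs.
result = list(gen)
print(result == [('\u0ec4', '\u0ec4')])
```
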
> 
> It looks very puzzling to me, and I would like some pointers so that
> one day I can understand what happened to me :)
> Thanks,
> 
> 
> 
> On Thu, Nov 11, 2010 at 7:46 AM, Alan Gauld <alan.gauld at btinternet.com> wrote:
> >
> > "danielle davout" <danielle.davout at gmail.com>  wrote
> >
> >> I simplified it to
> >> v = u'\u0eb4'
> >> X = (1,)
> >> gen = ((v, v) for x in X for y in X)
> >>
> >> What can be so wrong in this line, or around it, that it gives the one-line file
> >> ໄ:ໄ
> >> where ໄ "is" not u'\u0eb4' but u'\u0ec4', even though a direct printing looks
> >> OK?
> >
> > The code will produce a one-line file with v repeated twice.
> > Now why do you think the character is different?
> > What have you done to check it?
> >
> > What do you mean by a direct printing?
> >
> > print v
> >
> > maybe?
> yes
> >> To write the file corresponding to the nth generator in my list h, I use
> >>   def ecrire(n):
> >>       f = codecs.open("G" + str(n), "w", "utf8")
> >>       for x, tx in h[n]:
> >>           f.write((x + U":" + tx))
> >>           f.write('\n')
> >
> > Personally I'd use
> >
> >  f.write(U"%s:%s\n" % (x, tx))
> ok
> > but that's largely a matter of style preference I guess.
> > But why do you have double parens in the first write?
> left over from several attempts...
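
For reference, the writing function can also close the file reliably by
using a with block. A sketch in Python 3 syntax, where the built-in open
accepts an encoding argument; the pairs and directory parameters and the
sample data are assumptions added so that the example is self-contained,
they are not part of the original script:

```python
import os
import tempfile

def ecrire(pairs, n, directory):
    # Write one "x:tx" line per pair to the file G<n>, encoded as UTF-8.
    path = os.path.join(directory, "G" + str(n))
    with open(path, "w", encoding="utf8") as f:
        for x, tx in pairs:
            f.write("%s:%s\n" % (x, tx))
    return path

# Hypothetical usage with a single sample pair, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = ecrire([('\u0e81\u0eb4\u0e81', 'kik')], 5, tmp)
    with open(path, encoding="utf8") as f:
        content = f.read()

print(content == '\u0e81\u0eb4\u0e81:kik\n')
```
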
> >> But in its non-simplified form
> >>   h.append(((x + v + y, tr[x] + tr[v] + tr[y]) for x in CC for y in OFC))
> >> before I have a chance to write anything to the file G5,
> >> I get the KeyError: u'\u0ec4'
> >> Yes, tr is a dictionary that doesn't have u'\u0ec4' as a key,
> >> but tr[v] is well defined ...
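
One possible reading of this KeyError: tr[v] sits in the body of the
generator expression, so it is evaluated when the generator is iterated,
with whatever v holds at that moment. A small sketch in Python 3 syntax;
the contents of tr, CC and OFC are made-up stand-ins for the real data:

```python
# Hypothetical stand-ins for the real transliteration table and letter lists.
tr = {'\u0e81': 'k', '\u0eb4': 'i', '\u0ea1': 'm'}
CC = ['\u0e81']
OFC = ['\u0ea1']

h = []
v = '\u0eb4'
# No dictionary lookup happens here; the body is deferred.
h.append(((x + v + y, tr[x] + tr[v] + tr[y]) for x in CC for y in OFC))

v = '\u0ec4'  # the name is reused before h[0] is consumed

try:
    rows = list(h[0])
    missing_key = None
except KeyError as err:
    # The lookup tr[v] runs only now, with the new value of v.
    rows = None
    missing_key = err.args[0]

print(rows, missing_key == '\u0ec4')
```
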
> >
> > OK, but the error is valid in that case.
> > Which implies that you have bad data in CC.
> 
> I'm sorry, I was not very clear on that point... neither CC nor OFC
> contains u'\u0ec4',
> and I have no problem with, for example,
> h.append(((u"ເ" + x + y, tr[x] + e2 + tr[y]) for x in CC for y in OFC))
> If I define tr[u'\u0ec4'] I obtain a result, but it is wrong in the
> first part, x + v + y.
> Everything looks as if *the generator was not made with the value of v
> that I believe I gave it*
> > What exactly are you asking?
> If I write:
> 114    v = u'\u0eb4'
> 115    X = (1,)
> 116    h.append( ((v ,v) for x in X for y in X))
> 117    print v
> 118    #exit()
> v prints nicely.
> If I remove the last comment, v still prints nicely but I get the traceback
> Traceback (most recent call last):
>   File "/home/dan/workspace/lao/src/renaud/tt.py", line 261, in <module>
>     ecrire(n)
>   File "/home/dan/workspace/lao/src/renaud/tt.py", line 254, in ecrire
>     for x, tx in h[n]:
>   File "/home/dan/workspace/lao/src/renaud/tt.py", line 116, in <genexpr>
>     h.append( ((v ,v) for x in X for y in X))
> NameError: global name 'v' is not defined
> 
> Where has my variable v gone? Why is it considered global as soon as
> I use it in a generator?
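
On why v counts as global: a generator expression is compiled as its own
small function, and at module level v is not local to that function, so it
is looked up in the module's global namespace each time the body runs.
Deleting the name before the generator is consumed therefore fails at
iteration time, not at the del statement. A sketch in Python 3 syntax (the
exact wording of the error message differs between Python versions):

```python
v = '\u0eb4'
X = (1,)
gen = ((v, v) for x in X for y in X)

del v  # remove the name before the generator has been consumed

try:
    list(gen)
    outcome = 'no error'
except NameError:
    # The lookup of v happens here, at iteration time, and fails.
    outcome = 'NameError'

print(outcome)
```
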
> 
> If in line 114 I write
> v = w = u'\u0eb4'
> and I delete w with del(w) and still want to print it, I'm only told
> that "name 'w' is not defined".
> Why can changing the value of v 200 lines further down change what
> the generator produces?
> It is definitely not a unicode problem but a global variable problem.
> If I write
> w = u'\u0eb4'
> h.append(((x + w + y, tr[x] + tr[w] + tr[y]) for x in CC for y in OFC))
> my file G5 is generated:
> ກິກ:kik
> ກິມ:kim
> ກິວ:kiv
> .....
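
If the generators are to be built in a loop while reusing one name, the
value has to be captured at creation time. One common way, sketched here in
Python 3 syntax, is a small helper function whose parameter holds the
value. The helper make_gen, the list of vowels, the 'ii' transliteration
and the contents of tr, CC and OFC are hypothetical stand-ins, not taken
from the original script:

```python
# Hypothetical stand-ins for the real transliteration table and letter lists.
tr = {'\u0e81': 'k', '\u0ea1': 'm', '\u0eb4': 'i', '\u0eb5': 'ii'}
CC = ['\u0e81']
OFC = ['\u0ea1']

def make_gen(v):
    # v is a parameter here, so each call gets its own binding
    # that later reassignments in the caller cannot affect.
    return ((x + v + y, tr[x] + tr[v] + tr[y]) for x in CC for y in OFC)

h = []
for v in ['\u0eb4', '\u0eb5']:
    h.append(make_gen(v))

del v  # the loop variable can now go away safely

results = [list(g) for g in h]
print(results)
```

Building a list with a list comprehension instead of a generator would also
freeze the values, at the cost of computing everything up front.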
> 

