[Tutor] pickle in unicode format

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Tue Apr 5 20:33:23 CEST 2005


> > I have this dictionary:
> >
> > a={'partition': u'/export/diskH1/home_evol/ricquebo',
> >  'rsmFirstname': u'Fran\xe7ois',
> >  'rsmLastname': u'Ricquebourg',
> >  'size': u'8161222.0',
> >  'size_max': '1'}
> >
> > and I'd like to *serialize* it with pickle and that the output format
> > will be of type unicode.
>
> I'm not sure what you mean by this. Do you mean that you want the actual
> pickled data to be a unicode string? Or that you want to be able to
> pickle something that contains unicode strings?

The first interpretation doesn't make sense to me either: pickled data is
a 'byte' representation of your data structures.  Why do you need it to be
treated as a unicode string?



> > unicode(pickle.dumps(a)) doesn't work !

Can you show us why this doesn't work for you?  I can guess why, but it is
much better if we don't have to guess what the error message looks like.
I suspect you're seeing something like this:

######
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
ordinal not in range(128)
######

but again, I hate guessing.  *grin*

Show us exactly what you're seeing as an error message: don't just say it
doesn't work, because that doesn't give us enough to know if the error or
bug that we're seeing is the same one that you're seeing.  Problems often
can have multiple causes, which is why we need to see the one you're
getting stuck on.
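
For reference, here is a minimal sketch that produces a message of that
shape.  It assumes the pickle was made with binary protocol 2 (my
assumption: the original message does not say which protocol was used),
and it uses .decode('ascii') to stand in for what unicode() does by
default:

```python
import pickle

# The dictionary from the original question.
a = {'partition': u'/export/diskH1/home_evol/ricquebo',
     'rsmFirstname': u'Fran\xe7ois',
     'rsmLastname': u'Ricquebourg',
     'size': u'8161222.0',
     'size_max': '1'}

# Protocol 2 is a binary format; its output begins with the byte 0x80.
pickled = pickle.dumps(a, 2)

try:
    # Treating the pickled bytes as ASCII text fails on that first byte.
    pickled.decode('ascii')
    message = None
except UnicodeDecodeError as err:   # "as" needs Python 2.6 or later
    message = str(err)

print(message)
```

If your own error message differs from this one, that is exactly the
kind of detail we'd like to see.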



I think it'll help if we see what you intend to do with the result of
pickle.dumps().


Are you trying to send it off to someone else as a part of an XML
document?  If you are including some byte string into an XML document, you
can encode those bytes as base64:

######
>>> bytes = 'Fran\xe7ois'
>>> encodedBytes = bytes.encode('base64')
>>> encodedBytes
'RnJhbudvaXM=\n'
######

This produces a transformed byte string that passes through the unicode()
function with no damage:

######
>>> unicodeEncodedBytes = unicode(encodedBytes)
>>> unicodeEncodedBytes
u'RnJhbudvaXM=\n'
######

and this can be written right into an XML document.  It can also be
decoded later to get the original byte string back:

######
>>> str(unicodeEncodedBytes).decode("base64")
'Fran\xe7ois'
######
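
Putting the pieces together for the dictionary in the question, the whole
round trip might look like the sketch below.  It uses the base64 module
rather than the 'base64' string codec shown above, and the variable names
and the choice of pickle protocol 2 are mine, not anything from the
original message:

```python
import base64
import pickle

# The dictionary from the original question.
a = {'partition': u'/export/diskH1/home_evol/ricquebo',
     'rsmFirstname': u'Fran\xe7ois',
     'rsmLastname': u'Ricquebourg',
     'size': u'8161222.0',
     'size_max': '1'}

# Step 1: pickle to a byte string (protocol 2 is an assumption).
pickled = pickle.dumps(a, 2)

# Step 2: base64-encode, so every byte is a plain ASCII character.
encoded = base64.b64encode(pickled)

# Step 3: now the bytes can safely become a unicode string.
text = encoded.decode('ascii')

# Going back: unicode -> ASCII bytes -> raw pickle bytes -> dictionary.
restored = pickle.loads(base64.b64decode(text.encode('ascii')))

print(restored == a)
```

The text value here contains only ASCII letters, digits, '+', '/' and
'=', so it can be dropped into an XML document as ordinary character
data.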


If you have more questions, please feel free to ask!


