Translating unicode data

John Machin sjmachin at lexicon.net
Tue Mar 24 22:14:44 EDT 2009


On Mar 24, 10:30 am, Scott David Daniels <Scott.Dani... at Acm.Org>
wrote:
> CaptainMcCrank wrote:
> > Hi list,
>
> > I'm struggling with a problem analyzing large amounts of unicode data
> > in an http wireshark capture.
> > I've solved the problem with the interpreter, but I'm not sure how to
> > do this in an automated fashion.
>
> > I'd like to grab a line from a text file & translate the unicode
> > sections of it to ascii.  So, for example
> > I'd like to take
> > "\u003cb\u003eMar 17\u003c/b\u003e"
>
> > and turn it into
>
> > "<b>Mar 17</b>"
>
> > I can handle this from the interpreter as follows:
>
> >>>> import unicodedata
> >>>> mystring = u"\u003cb\u003eMar 17\u003c/b\u003e"
> >>>> print mystring
> > <b>Mar 17</b>
>
> > But I don't know what I need to do to automate this!  The data that is
> > in the quotes from line 2 will have to come from a variable.  I am
> > unable to figure out how to do this using a variable rather than a
> > literal string.
>
> > Please help!
>
> You really need to say what version of Python you are working with,
> show the code you tried, and show the results you got.

Always very good advice, not often taken :-)

> Using Python 3.1, I get:
>      >>> "\u003cb\u003eMar 17\u003c/b\u003e" == '<b>Mar 17</b>'
>      True

Using Python 2.1.3 I get:
 >>> "\u003cb\u003eMar 17\u003c/b\u003e" == '<b>Mar 17</b>'
 0
 >>> u"\u003cb\u003eMar 17\u003c/b\u003e" == u'<b>Mar 17</b>'
 1

But so what? AFAICT from the OP's description and his joyous response
to Peter's suggestion, what he has (in 3.0 syntax) is not
   "\u003cb\u003e etc"
it's
  b"\u003cb\u003e etc"
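
If that reading is right, the escapes have to be interpreted in the data
at run time, because no literal is involved. A minimal sketch in 3.x
syntax, assuming the line arrives as bytes containing literal backslash-u
sequences; the helper name decode_escapes is made up for illustration and
is not something from the thread:

```python
def decode_escapes(raw: bytes) -> str:
    """Interpret literal backslash-u sequences found in raw."""
    # The unicode_escape codec processes \uXXXX (and other Python-style
    # escapes) that appear in the data itself.
    # Caveat: bytes outside an escape are read as Latin-1, so this sketch
    # assumes the rest of the line is plain ASCII.
    return raw.decode("unicode_escape")


# rb"..." keeps the backslashes literal, as they would be in a line
# read from a file opened in binary mode.
line = rb"\u003cb\u003eMar 17\u003c/b\u003e"
print(decode_escapes(line))  # prints <b>Mar 17</b>
```

On 2.x the equivalent would presumably be calling
.decode("unicode_escape") on the plain str read from the file, which
yields a unicode object.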

HTH,
John


