Translating unicode data

Tue Mar 24 22:14:44 EDT 2009

On Mar 24, 10:30 am, Scott David Daniels <Scott.Dani... at Acm.Org>
wrote:
> CaptainMcCrank wrote:
> > Hi list,
>
> > I'm struggling with a problem analyzing large amounts of unicode data
> > in an http wireshark capture.
> > I've solved the problem with the interpreter, but I'm not sure how to
> > do this in an automated fashion.
>
> > I'd like to grab a line from a text file & translate the unicode
> > sections of it to ascii.  So, for example
> > I'd like to take
> > "\u003cb\u003eMar 17\u003c/b\u003e"
>
> > and turn it into
>
> > "<b>Mar 17</b>"
>
> > I can handle this from the interpreter as follows:
>
> >>>> import unicodedata
> >>>> mystring = u"\u003cb\u003eMar 17\u003c/b\u003e"
> >>>> print mystring
> > <b>Mar 17</b>
>
> > But I don't know what I need to do to automate this!  The data that is
> > in the quotes from line 2 will have to come from a variable.  I am
> > unable to figure out how to do this using a variable rather than a
> > literal string.
>
> > Please help!
>
> You really need to say what version of Python you are working with,
> show the code you tried, and the results you got.

Always very good advice, not often taken :-)

> Using Python 3.1, I get:
>      >>> "\u003cb\u003eMar 17\u003c/b\u003e" == '<b>Mar 17</b>'
>      True

Using Python 2.1.3 I get:
 >>> "\u003cb\u003eMar 17\u003c/b\u003e" == '<b>Mar 17</b>'
 0
 >>> u"\u003cb\u003eMar 17\u003c/b\u003e" == u'<b>Mar 17</b>'
 1

But so what? AFAICT from the OP's description and his joyous response
to Peter's suggestion, what he has (in 3.0 syntax) is not
   "\u003cb\u003e etc"
it's
  b"\u003cb\u003e etc"
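
For what it's worth, here is a minimal sketch of that case in 3.x syntax. It assumes each line read from the file holds literal backslash-u sequences and is otherwise plain ASCII. The name unescape_line is made up for illustration, and this may or may not match what Peter suggested:

```python
def unescape_line(raw):
    # Decode literal backslash-u escape sequences held in a bytes object.
    # The unicode_escape codec treats non-escape bytes as Latin-1, so this
    # sketch assumes the rest of the line is plain ASCII.
    return raw.decode("unicode_escape")

# The doubled backslashes put a real backslash into the bytes, which is
# what a line read from the capture file in binary mode would contain.
line = b"\\u003cb\\u003eMar 17\\u003c/b\\u003e"
print(unescape_line(line))  # prints <b>Mar 17</b>
```

The point is that the escapes are only interpreted by the compiler when they appear in a string literal; data arriving in a variable has to be decoded explicitly.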

HTH,
John