[Tutor] unicode decode/encode issue

bruce badouglas at gmail.com
Mon Sep 26 12:59:04 EDT 2016


Hi.

I've got a "basic" situation that should be simple. So it must be a user (me)
issue!


I've got a page from a web fetch. I'm simply trying to go from UTF-8 to
ASCII. I'm not worried about any cruft that might get stripped out, as the
data is generated from a US site. (It's a college/class dataset.)

I know this is a Unicode issue. I know I need to have a much more
robust/pythonic/correct approach. I will later, but for now I just want to
resolve this issue and get it off my plate, so to speak.

I've looked at stackoverflow, as well as numerous other sites, so I turn to
the group for a pointer or two...

The Unicode character that I'm dealing with is u'\u2013' (an en dash).

The basic things I've done up to now are:

  s=content
  s=ascii_strip(s)
  s=s.replace('\u2013', '-')
  s=s.replace(u'\u2013', '-')
  s=s.replace(u"\u2013", "-")
  s=re.sub(u"\u2013", "-", s)
  print repr(s)

When I look at the repr of the input content, I have:

 u'English 120 Course Syllabus \u2013 Fall \u2013 2006'

So, any pointers on replacing the \u2013 with a simple '-' (dash)? (I
could even handle just a ' ' (space).)
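
A minimal sketch of the conversion being asked about, assuming the content
is already a unicode object (as the repr above suggests). The helper name
to_ascii and its dash parameter are hypothetical, made up for illustration.
If the fetched page were still a UTF-8 byte string, it would need a
.decode('utf-8') first.

```python
# Sketch only: assumes `content` is a unicode string, not raw bytes.
# `to_ascii` is a hypothetical helper, not part of the original code.

def to_ascii(text, dash=u'-'):
    """Replace en dashes, then drop any remaining non-ASCII characters."""
    # Swap the en dash (U+2013) for a plain ASCII character first,
    # so it is not simply discarded by the encode step below.
    text = text.replace(u'\u2013', dash)
    # errors='ignore' silently drops anything that has no ASCII equivalent.
    return text.encode('ascii', 'ignore').decode('ascii')

content = u'English 120 Course Syllabus \u2013 Fall \u2013 2006'
result = to_ascii(content)
print(repr(result))
```

Passing dash=u' ' would give spaces instead of dashes. The order matters
here: doing the replace before any ASCII stripping keeps the dash from being
thrown away along with the other non-ASCII characters.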

thanks

