raw_input() and utf-8 formatted chars
7stud
bbxx789_05ss at yahoo.com
Thu Nov 1 22:21:03 EDT 2007
On Oct 13, 12:42 pm, MRAB <goo... at mrabarnett.plus.com> wrote:
> You can decode that into the actual UTF-8 string with decode("string_escape"):
>
> s = raw_input('Enter: ') #A\xcc\x88
> s = s.decode("string_escape")
>
Ahh. Thanks for that.
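Python 3 renamed `raw_input()` to `input()` and dropped the `string_escape` codec. Below is a minimal Python 3 sketch of the same two steps. It assumes the user literally types the escapes `A\xcc\x88`. The helper name `decode_escaped_utf8` is hypothetical, and the `unicode_escape` codec applied to latin-1 bytes stands in for `string_escape`.

```python
import unicodedata

def decode_escaped_utf8(typed: str) -> str:
    """Hypothetical helper: turn typed escapes like A\\xcc\\x88 into text.

    Python 3 sketch only; 'unicode_escape' replaces the old 'string_escape'.
    """
    # Step 1: interpret the backslash escapes; each \xNN becomes U+00NN,
    # which latin-1 maps straight back to the byte 0xNN.
    raw_bytes = typed.encode("latin-1").decode("unicode_escape").encode("latin-1")
    # Step 2: those bytes are UTF-8, so decode them as UTF-8.
    return raw_bytes.decode("utf-8")

typed = r"A\xcc\x88"   # what the user literally typed at the prompt
text = decode_escaped_utf8(typed)
print(ascii(text))                                 # 'A\u0308'
print(ascii(unicodedata.normalize("NFC", text)))   # '\xc4'
```

The decoded result is `A` followed by U+0308 COMBINING DIAERESIS. `unicodedata.normalize("NFC", ...)` folds that pair into the single precomposed code point U+00C4.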
> On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
>
> > And what is it that your keyboard enters to produce an 'a' with an umlaut?
>
> *I* just hit the ä key. The one right next to the ö key. ;-)
>
BeautifulSoup can convert an HTML entity representing an 'A' with
umlaut, e.g.:
&Auml;
into an Ä without ever touching my keyboard. How does BeautifulSoup
do it?
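BeautifulSoup's internals aren't shown in this thread, so the following is only a sketch of the general mechanism. It uses Python 3's standard library rather than BeautifulSoup. A named entity is looked up in a table that maps entity names to code points, and the code point becomes a character. That BeautifulSoup 3 consults a similar table is an assumption. `entity_to_char` is a hypothetical helper, while `html.entities.name2codepoint` and `html.unescape` are real standard-library names.

```python
import html
from html.entities import name2codepoint

def entity_to_char(entity: str) -> str:
    """Hypothetical helper: map a named entity such as '&Auml;' to its character."""
    name = entity.strip("&;")          # '&Auml;' -> 'Auml'
    return chr(name2codepoint[name])   # 'Auml' -> 196 (0xC4) -> U+00C4

char = entity_to_char("&Auml;")
print(ord(char), ascii(char))                    # 196 '\xc4'
# The standard library does the same job for a whole string in one call.
print(ascii(html.unescape("<h1>&Auml;</h1>")))   # '<h1>\xc4</h1>'
```

No keyboard input is involved. The entity name alone identifies the code point.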
from BeautifulSoup import BeautifulStoneSoup as bss

s1 = "<h1>&Auml;</h1>"  # the html entity &Auml; for 'A' with umlaut
soup = bss(s1)
text = soup.contents[0].string  # gets the text inside the <h1>, i.e. the entity
new_s = bss(text, convertEntities=bss.HTML_ENTITIES)
print repr(new_s)
print new_s
I see the same output for both print statements, and what I see is an
'A' with umlaut. I expected that the first print statement would show
the utf-8 encoding for the character.
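The expectation above depends on the difference between a character and its encoded bytes. The following Python 3 sketch is not the Python 2/BeautifulSoup 3 setup in the post. It shows what each representation of 'A' with umlaut displays.

```python
char = "\u00c4"                     # LATIN CAPITAL LETTER A WITH DIAERESIS
utf8_bytes = char.encode("utf-8")   # the two bytes of its UTF-8 encoding

# repr() of a str keeps printable non-ASCII characters as they are.
assert repr(char) == "'\u00c4'"
# ascii() escapes the character, but shows the code point (U+00C4), not UTF-8.
assert ascii(char) == r"'\xc4'"
# Only the encoded bytes show the UTF-8 sequence.
assert repr(utf8_bytes) == r"b'\xc3\x84'"
print(ascii(char), repr(utf8_bytes))   # '\xc4' b'\xc3\x84'
```

In Python 3, `repr()` of the text alone never shows the UTF-8 form. You have to encode the text first to see it.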