[Tutor] Error with incorrect encoding

Alan Gauld alan.gauld at btinternet.com
Thu Apr 17 18:44:31 CEST 2008


I don't know the cause of the error here, but I will say that
parsing HTML with regular expressions is fraught with difficulty
unless you know in advance that the HTML will be suitably
formatted.

You may be better off using one of the HTML parsing
modules such as HTMLParser or even the more powerful
BeautifulSoup.
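
As a rough illustration of the parser approach, here is a minimal
sketch that collects the text of each <a class="al4"> element. The
names Al4LinkCollector and collect_al4_links are made up for this
example, and the import fallback assumes the module is called
HTMLParser on Python 2 and html.parser on Python 3. It is not a
drop-in replacement for the original test.

```python
try:
    from html.parser import HTMLParser  # Python 3 location
except ImportError:
    from HTMLParser import HTMLParser   # Python 2 module named above


class Al4LinkCollector(HTMLParser):
    """Hypothetical helper: gather the text inside <a class="al4"> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []      # text of each completed matching link
        self._chunks = None  # pieces of the link being read, or None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; this only matches
        # a class attribute that is exactly "al4".
        if tag == "a" and dict(attrs).get("class") == "al4":
            self._chunks = []

    def handle_data(self, data):
        # Keep text only while inside a matching link.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._chunks is not None:
            self.links.append("".join(self._chunks))
            self._chunks = None


def collect_al4_links(src):
    """Return the text of every <a class="al4"> element in src."""
    parser = Al4LinkCollector()
    parser.feed(src)
    parser.close()
    return parser.links
```

Unlike a pattern such as [^<]*</a>, this still finds the link text
when the anchor contains nested tags like <b>.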

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld



"Oleg Oltar" <oltarasenko at gmail.com> wrote in message 
news:b4fc2ad80804150820y7ae54b6dw8c7fea4981821fd2 at mail.gmail.com...
>I am trying to parse an HTML page and get the following error while
>doing so:
>
>
> src = sel.get_html_source()
> links = re.findall(r'<a class="al4"[^<]*</a>', src)
> for link in links:
>     print link
>
>
>
> ======================================================================
> ERROR: test_new (__main__.NewTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "<stdin>", line 19, in test_new
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in
> position 90: ordinal not in range(128)
>
> ----------------------------------------------------------------------
> Ran 1 test in 6.345s
>
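
One plausible reading of the traceback above, not confirmed anywhere
in this thread, is that a unicode string containing u'\xae' (the
registered-trademark sign) is being written to an output stream that
only accepts ASCII. The sketch below reproduces that failure with an
explicit encode() call and shows one possible workaround. The link
text and the choice of UTF-8 are assumptions for illustration only.

```python
# Hypothetical link text containing U+00AE; not taken from the real page.
link = u'<a class="al4" href="/x">Acme\xae</a>'

error = None
try:
    # Roughly what an ASCII-only output stream attempts when printing.
    link.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc

# Encoding explicitly to UTF-8 (an assumed target encoding) succeeds
# and yields a byte string that can be written out safely.
safe = link.encode("utf-8")
```

In the Python 2 code from the original post, the equivalent change
would be something like print link.encode('utf-8'), provided the
terminal or log really does expect UTF-8.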

