[Tutor] [Fwd: Re: Spanish text in BS problem]

Ismael Garrido ismaelgf at adinet.com.uy
Thu Nov 10 02:35:54 CET 2005


Found the problem myself.
(see below)

Ismael Garrido wrote:

> This is the script:
>
> import BeautifulSoup
> import os
>
> a = open("zona.htm")
> text = a.readlines()
> a.close()
>
> BS = BeautifulSoup.BeautifulSoup(str(text))

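Apparently, str(text) is the cause of the problem. If I use
"".join(text) instead, it all works all right. I guess this is because
text is a list of lines, so str() returns the printed form of the whole
list (brackets, quotes, and escapes such as '\xf3' for 'ó'), while
"".join() just concatenates the strings without changing them in any
way. Now the output from BS makes sense.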
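
To illustrate the difference, here is a minimal sketch. It is written for
Python 3 rather than the Python 2 of the original script, so byte strings
stand in for the lines that readlines() returned back then; the two sample
lines are made up and are not taken from zona.htm.

```python
# Hypothetical sample data: Latin-1 encoded lines, as readlines() might return them.
lines = [b"canci\xf3n\n", b"fin\n"]

# str() on a list gives the repr of the whole list: brackets, quotes,
# and escape sequences for the newline and the non-ASCII byte.
as_str = str(lines)

# join() simply concatenates the lines and leaves every byte untouched.
joined = b"".join(lines)

print(as_str)                     # the text a parser would receive from str(text)
print(joined.decode("latin-1"))   # the actual file content
```

Reading the file in one piece with a.read() instead of a.readlines() would
avoid the list altogether.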

Bye,
Ismael

> for ed in BS('span', {'class':'ed_ant_fecha'}):
>    fecha = ed.next.split(" ")[1].replace(".","-")
>    urlynombre = ed.findNextSibling().findNextSibling().findNextSibling()
>    url = 'http://espectador.com/' + urlynombre.get('href')
>    nombre = urlynombre.next.next
>
>    print url
>    print "D:/dolina/"+fecha, nombre
>    print
> ###end



