[Tutor] printing the links of a page (regular expressions)
Alfonso
traviesomono at yahoo.es
Sat May 6 17:03:01 CEST 2006
Kent Johnson wrote:
> Alfonso wrote:
>
>> I'm writing a script to retrieve and print some links of a page. These
>> links begin with "/dog/", so I use a regular expression to try to find
>> them. The problem is that the script only retrieves one link per line in
>> the page. I mean, if a line has several links, the script only reports
>> the first. I can't find where the mistake is. Does anyone have an idea
>> what I have done wrong?
>>
>
> You are reading the data by line using readlines(). You only search each
> line once. regex.findall() or regex.finditer() would be a better choice
> than regex.search().
>
> You might also be interested in sgmllib-based solutions to this problem,
> which will generally be more robust than regex-based searching. For
> example, see
> http://diveintopython.org/html_processing/extracting_data.html
> http://www.w3journal.com/6/s3.vanrossum.html#MARKER-9-26
>
> Kent
>
>
>> Thank you very much for your help.
>>
>>
>> import re
>> from urllib import urlopen
>>
>> fileObj = urlopen("http://name_of_the_page")
>> links = []
>> regex = re.compile("((/dog/)[^ \"\'<>;:,]+)", re.I)
>>
>> for a in fileObj.readlines():
>>     result = regex.search(a)
>>     if result:
>>         print result.group()
>>
>>
>>
>>
>> ______________________________________________
>> Call any PC in the world for free.
>> Calls to landlines and mobiles from 1 cent per minute.
>> http://es.voice.yahoo.com
>> _______________________________________________
>> Tutor maillist - Tutor at python.org
>> http://mail.python.org/mailman/listinfo/tutor
>>
>>
>>
>
>
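Kent's suggestion can be sketched as follows. This is a minimal example written for Python 3 (the script in the thread is Python 2). The pattern is the one from the original script; the sample HTML line and the helper name find_dog_links are hypothetical. Because the pattern contains capturing groups, findall() returns one tuple of groups per match instead of the matched string, so the helper uses finditer() and group(), which returns the whole match.

```python
import re

# Same pattern as the original script: "/dog/" followed by one or more
# characters that are not a space, quote, angle bracket, ';', ':' or ','.
regex = re.compile("((/dog/)[^ \"'<>;:,]+)", re.I)


def find_dog_links(text):
    """Return every /dog/ link in text, not only the first (hypothetical helper)."""
    # finditer() yields a match object for each non-overlapping match;
    # group() with no argument returns the whole matched text.
    return [m.group() for m in regex.finditer(text)]


# Hypothetical line of HTML containing two links.
line = '<a href="/dog/rex.html">Rex</a> <a href="/dog/fido.html">Fido</a>'

print(regex.search(line).group())  # -> /dog/rex.html  (search() stops at the first match)
print(find_dog_links(line))        # -> ['/dog/rex.html', '/dog/fido.html']
print(regex.findall(line))         # -> [('/dog/rex.html', '/dog/'), ('/dog/fido.html', '/dog/')]
```

Since find_dog_links scans the whole string it is given, it can also be applied to the entire page text at once instead of looping over readlines().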
Thank you very much, Kent, it works with findall(). I will also have a
look at the links about sgmllib.
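The sgmllib module Kent pointed to no longer exists in Python 3; the closest standard-library replacement is html.parser.HTMLParser. The following is a minimal sketch of the parser-based approach using that module instead of sgmllib, assuming Python 3 and assuming the wanted links are href attributes of <a> tags that start with "/dog/" (matched case-insensitively, like re.I in the original). The class name DogLinkParser, the helper dog_links and the sample page are hypothetical. Unlike the line-by-line regex, it does not depend on how the links are spread across lines.

```python
from html.parser import HTMLParser


class DogLinkParser(HTMLParser):
    """Collect href values of <a> tags that start with /dog/ (hypothetical example class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names;
        # attrs is a list of (name, value) pairs, and value may be None.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("/dog/"):
                self.links.append(value)


def dog_links(html):
    """Return all /dog/ hrefs found in an HTML string (hypothetical helper)."""
    parser = DogLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


page = '''<p><a href="/dog/rex.html">Rex</a> <a href="/cat/tom.html">Tom</a>
<A HREF="/dog/fido.html">Fido</A></p>'''
print(dog_links(page))  # -> ['/dog/rex.html', '/dog/fido.html']
```

For a live page, the HTML could be fetched with urllib.request.urlopen(url).read() and decoded to a string before calling dog_links; which encoding to use is an assumption that depends on the page (its Content-Type header usually says).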