[Tutor] parse text file
Norman Khine
norman at khine.net
Tue Feb 2 15:33:17 CET 2010
hello,
thank you all for the advice. here is the updated version with the changes:
import re

# Read the whole file; repr() escapes tabs and newlines as literal \t and \n,
# which is why the patterns below match a backslash followed by a letter.
file = open('producers_google_map_code.txt', 'r')
data = repr( file.read().decode('utf-8') )

get_records = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""").findall
get_titles = re.compile(r"""<strong>(.*)<\/strong>""").findall
get_urls = re.compile(r"""a href=\"\/(.*)\">En savoir plus""").findall
get_latlngs = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""").findall

records = get_records(data)
block_record = []
for record in records:
    # keep the last match of each kind, or None if there is none
    titles = get_titles(record)
    title = titles[-1] if titles else None
    urls = get_urls(record)
    url = urls[-1] if urls else None
    latlngs = get_latlngs(record)
    latlng = latlngs[-1] if latlngs else None
    block_record.append( {'title':title, 'url':url, 'latlng':latlng} )

print block_record
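
For anyone who wants to try the approach without the original file, here is a minimal sketch that runs on a small in-memory sample. The SAMPLE text is hypothetical, invented only to resemble the structure the patterns look for. The sketch also assumes the data is matched directly rather than through repr(), so the patterns use a real tab and newline plus re.DOTALL, and the captures are non-greedy; none of that comes from the script above.

```python
import re

# Hypothetical input, invented to resemble the structure the patterns expect.
SAMPLE = '''
marker.openInfoWindowHtml('<strong>Ferme A</strong><a href="/ferme-a">En savoir plus</a>');
point = new GLatLng(44.5,
    3.25);
\ticon: myIcon
marker.openInfoWindowHtml('<strong>Ferme B</strong><a href="/ferme-b">En savoir plus</a>');
point = new GLatLng(-12.0,
    7.75);
\ticon: myIcon
'''

# Assumption: no repr(), so \t and \n here match a real tab and newline,
# and DOTALL lets '.' run across line breaks inside one record.
get_records = re.compile(r"openInfoWindowHtml\(.*?\ticon: myIcon\n", re.DOTALL).findall
get_titles = re.compile(r"<strong>(.*?)</strong>").findall
get_urls = re.compile(r'a href="/(.*?)">En savoir plus').findall
get_latlngs = re.compile(r"GLatLng\((-?\d+\.\d*),\n\s*(-?\d+\.\d*)\)").findall


def parse(data):
    """Return one dict per record found in data."""
    block_record = []
    for record in get_records(data):
        titles = get_titles(record)
        urls = get_urls(record)
        latlngs = get_latlngs(record)
        block_record.append({
            'title': titles[-1] if titles else None,
            'url': urls[-1] if urls else None,
            'latlng': latlngs[-1] if latlngs else None,
        })
    return block_record


for entry in parse(SAMPLE):
    print(entry)
```

Each latlng comes back as a tuple of two strings, so a caller would still need float() to get numbers.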
On Tue, Feb 2, 2010 at 1:27 PM, Kent Johnson <kent37 at tds.net> wrote:
> On Tue, Feb 2, 2010 at 4:16 AM, Norman Khine <norman at khine.net> wrote:
>
>> here are the changes:
>>
>> import re
>> file=open('producers_google_map_code.txt', 'r')
>> data = repr( file.read().decode('utf-8') )
>
> Why do you use repr() here?
i have latin-1 chars in the 'producers_google_map_code.txt' file and
this is the only way i have found to get it to read the data.
is this incorrect?
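
On the repr() question: repr() does not change how the bytes are decoded, it only escapes the decoded result. If the file really is latin-1, one option might be to name that encoding when opening it. The following is only a sketch under that assumption; read_text and the sample bytes are hypothetical, and it writes a temporary file so that it is self-contained.

```python
import io
import os
import tempfile


def read_text(path, encoding):
    """Return the file's contents decoded to unicode text."""
    with io.open(path, 'r', encoding=encoding) as handle:
        return handle.read()


# Hypothetical sample: one line containing latin-1 encoded e-acute bytes.
sample_bytes = b'<strong>Caf\xe9 du March\xe9</strong>\n'

# Write the bytes to a temporary file so the example needs no external data.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as handle:
    handle.write(sample_bytes)

text = read_text(path, 'latin-1')
os.remove(path)

# repr() is used here only to show the decoded text unambiguously.
print(repr(text))
```

Without repr(), tabs and newlines stay as real characters, so patterns written as \\t and \\n would need to become \t and \n.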
>
>> get_record = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""")
>> get_title = re.compile(r"""<strong>(.*)<\/strong>""")
>> get_url = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
>> get_latlng = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""")
>>
>> records = get_record.findall(data)
>> block_record = []
>> for record in records:
>>     namespace = {}
>>     titles = get_title.findall(record)
>>     for title in titles:
>>         namespace['title'] = title
>
>
> This is odd, you don't need a loop to get the last title, just use
>     namespace['title'] = get_title.findall(record)[-1]
>
> and similarly for url and latlngs.
>
> Kent
>
>
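
One thing to watch with a bare [-1]: it raises IndexError when findall() returns an empty list. A small helper can cover that case. This is only a sketch; last_match and the record text are hypothetical names made up for illustration, and the pattern is a non-greedy variant of the title pattern rather than the one quoted above.

```python
import re

# Assumption: a non-greedy variant of the title pattern, for illustration.
get_title = re.compile(r"<strong>(.*?)</strong>")


def last_match(pattern, text):
    """Return the last match of pattern in text, or None if there is none."""
    matches = pattern.findall(text)
    return matches[-1] if matches else None


# Hypothetical record text, invented for illustration.
record = "<strong>First</strong> and <strong>Second</strong>"

print(last_match(get_title, record))            # Second
print(last_match(get_title, "no markup here"))  # None
```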
>>     urls = get_url.findall(record)
>>     for url in urls:
>>         namespace['url'] = url
>>     latlngs = get_latlng.findall(record)
>>     for latlng in latlngs:
>>         namespace['latlng'] = latlng
>>     block_record.append(namespace)
>>
>> print block_record
>>>
>>> The def of "namespace" would be clearer imo in a single line:
>>> namespace = {title:t, url:url, lat:g}
>>
>> i am not sure how this will fit into the code!
>>
>>> This also reveals a kind of name confusion, doesn't it?
>>>
>>>
>>> Denis
>>>
>>>
>>>
>>>
>>> ________________________________
>>>
>>> la vita e estrany
>>>
>>> http://spir.wikidot.com/
>>> _______________________________________________
>>> Tutor maillist - Tutor at python.org
>>> To unsubscribe or change subscription options:
>>> http://mail.python.org/mailman/listinfo/tutor
>>>