[Tutor] parse text file

Norman Khine norman at khine.net
Tue Feb 2 15:33:17 CET 2010


Hello,
Thank you all for the advice; here is the updated version with the changes.

import re

# Decode the file as UTF-8, then repr() it: this escapes tabs and newlines,
# which is why the patterns below match \\t and \\n literally.
with open('producers_google_map_code.txt', 'r') as infile:
	data = repr(infile.read().decode('utf-8'))

get_records = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""").findall
get_titles = re.compile(r"""<strong>(.*)<\/strong>""").findall
get_urls = re.compile(r"""a href=\"\/(.*)\">En savoir plus""").findall
get_latlngs = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""").findall

records = get_records(data)
block_record = []
for record in records:
	# Keep the last match of each pattern, or None if there was none.
	titles = get_titles(record)
	title = titles[-1] if titles else None
	urls = get_urls(record)
	url = urls[-1] if urls else None
	latlngs = get_latlngs(record)
	latlng = latlngs[-1] if latlngs else None
	block_record.append({'title': title, 'url': url, 'latlng': latlng})

print block_record
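For comparison, here is a rough Python 3 sketch of the same extraction without the repr() step. It has not been run against the real producers_google_map_code.txt: it assumes each marker block looks the way the regexes above imply, and the names parse_records and last_or_none, as well as the sample string, are made up for illustration. Working on the decoded text means the patterns use real \t and \n characters, the record pattern needs re.DOTALL so that .*? can cross line breaks, and the capture groups are made non-greedy so that two <strong> tags on one line are not merged into a single title.

```python
import re

# Patterns adapted (an assumption, untested on the real file) to match the
# decoded text directly instead of its repr(): \t and \n are real characters.
# re.DOTALL lets .*? in the record pattern run across line breaks.
get_records = re.compile(r"openInfoWindowHtml\(.*?\ticon: myIcon\n", re.DOTALL).findall
# Non-greedy groups stop at the first closing tag or quote.
get_titles = re.compile(r"<strong>(.*?)</strong>").findall
get_urls = re.compile(r'a href="/(.*?)">En savoir plus').findall
# \s* absorbs the newline and indentation between latitude and longitude.
get_latlngs = re.compile(r"GLatLng\((-?\d+\.\d*),\s*(-?\d+\.\d*)\)").findall


def last_or_none(matches):
    """Return the last findall() result, or None when there was no match."""
    return matches[-1] if matches else None


def parse_records(text):
    """Build one {'title', 'url', 'latlng'} dict per openInfoWindowHtml block."""
    return [
        {
            "title": last_or_none(get_titles(record)),
            "url": last_or_none(get_urls(record)),
            "latlng": last_or_none(get_latlngs(record)),
        }
        for record in get_records(text)
    ]


# Hypothetical sample shaped like one marker block from the map code.
sample = (
    "openInfoWindowHtml(html);\n"
    '<strong>Ferme A</strong> <a href="/ferme-a">En savoir plus</a>\n'
    "new GLatLng(45.5,\n    3.25)\n"
    "\ticon: myIcon\n"
)
print(parse_records(sample))
# [{'title': 'Ferme A', 'url': 'ferme-a', 'latlng': ('45.5', '3.25')}]
```

On the real data you would read the file with an explicit encoding, e.g. `open('producers_google_map_code.txt', encoding='utf-8')`, switching to `encoding='latin-1'` if that is what the file actually contains, and pass its contents to `parse_records`.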


On Tue, Feb 2, 2010 at 1:27 PM, Kent Johnson <kent37 at tds.net> wrote:
> On Tue, Feb 2, 2010 at 4:16 AM, Norman Khine <norman at khine.net> wrote:
>
>> here are the changes:
>>
>> import re
>> file=open('producers_google_map_code.txt', 'r')
>> data =  repr( file.read().decode('utf-8') )
>
> Why do you use repr() here?

I have Latin-1 characters in the 'producers_google_map_code.txt' file, and
this is the only way I could get it to read the data.

Is this incorrect?
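To see why the patterns end up needing doubled backslashes, here is a small Python 3 illustration on a made-up one-line sample (the text and the pattern names are hypothetical). repr() wraps the text in quotes and turns each real tab and newline into a two-character escape, a backslash followed by t or n, so the regex has to match a literal backslash. The undecorated text matches the ordinary \t and \n escapes instead. Under Python 2, repr() of a unicode string also escapes every non-ASCII character (é becomes \xe9), so titles extracted from the repr'd data will contain those escapes rather than the accented letters.

```python
import re

# Hypothetical one-line sample with a real tab and a real newline.
text = "openInfoWindowHtml(html);\ticon: myIcon\n"

# repr() replaces the control characters with backslash escapes and adds quotes.
escaped = repr(text)

# Against the escaped string, the pattern must match a literal backslash + t/n...
pattern_for_repr = re.compile(r"openInfoWindowHtml\(.*?\\ticon: myIcon\\n")
# ...whereas the original text only needs the ordinary \t and \n escapes.
pattern_for_text = re.compile(r"openInfoWindowHtml\(.*?\ticon: myIcon\n")

print(escaped)                                 # 'openInfoWindowHtml(html);\ticon: myIcon\n'
print(bool(pattern_for_repr.search(escaped)))  # True
print(bool(pattern_for_text.search(text)))     # True
print(bool(pattern_for_text.search(escaped)))  # False: no real tab left
```

In Python 2, reading through `codecs.open('producers_google_map_code.txt', encoding='latin-1')` (or `'utf-8'`, whichever the file really is) gives decoded unicode directly, without needing repr().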

>
>> get_record = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""")
>> get_title = re.compile(r"""<strong>(.*)<\/strong>""")
>> get_url = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
>> get_latlng = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""")
>>
>> records = get_record.findall(data)
>> block_record = []
>> for record in records:
>>        namespace = {}
>>        titles = get_title.findall(record)
>>        for title in titles:
>>                namespace['title'] = title
>
>
> This is odd; you don't need a loop to get the last title. Just use
>  namespace['title'] = get_title.findall(record)[-1]
>
> and similarly for url and latlngs.
>
> Kent
>
>
>>        urls = get_url.findall(record)
>>        for url in urls:
>>                namespace['url'] = url
>>        latlngs = get_latlng.findall(record)
>>        for latlng in latlngs:
>>                namespace['latlng'] = latlng
>>        block_record.append(namespace)
>>
>> print block_record
>>>
>>> The definition of "namespace" would be clearer, IMO, as a single line:
>>>    namespace = {'title': t, 'url': url, 'lat': g}
>>
>> I am not sure how this would fit into the code!
>>
>>> This also reveals a kind of name confusion, doesn't it?
>>>
>>>
>>> Denis
>>>
>>>
>>>
>>>
>>> ________________________________
>>>
>>> la vita e estrany
>>>
>>> http://spir.wikidot.com/
>>> _______________________________________________
>>> Tutor maillist  -  Tutor at python.org
>>> To unsubscribe or change subscription options:
>>> http://mail.python.org/mailman/listinfo/tutor
>>>
>>
>

