x=something, y=somethingelse and z=crud all likely to fail - how do I wrap them up
Peter Otten
__peter__ at web.de
Sun Jan 31 05:40:12 EST 2016
Veek. M wrote:
> Chris Angelico wrote:
>
>> On Sun, Jan 31, 2016 at 3:58 PM, Veek. M <vek.m1234 at gmail.com> wrote:
>>> I'm parsing HTML and I'm doing:
>>>
>>> x = root.find_class(...
>>> y = root.find_class(..
>>> z = root.find_class(..
>>>
>>> all 3 are likely to fail so typically I'd have to stick it in a try.
>>> This is a huge pain for obvious reasons.
>>>
>>> try:
>>>     ....
>>> except something:
>>>     x = 'default_1'
>>> (repeat 3 times)
>>>
>>> Is there some other nice way to wrap this stuff up?
>>
>> I'm not sure what you're using to parse HTML here (there are several
>> libraries for doing that), but the first thing I'd look for is an
>> option to have it return a default if it doesn't find something - even
>> if that default has to be (say) None.
>>
>> But failing that, you can always write your own wrapper:
>>
>> def find_class(root, ...):
>>     try:
>>         return root.find_class(...)
>>     except something:
>>         return 'default_1'
>>
>> Or have the default as a parameter, if it's different for the different
>> ones.
>>
>> ChrisA
>
> I'm using lxml.html
>
> def parse_page(self, root):
>     for li_item in root.xpath('//li[re:test(@id, "^item[a-z0-9]+$")]',
>                               namespaces={'re': "http://exslt.org/regular-expressions"}):
>         description = li_item.find_class('vip')[0].text_content()
>         link = li_item.find_class('vip')[0].get('href')
>         price_dollar = li_item.find_class('lvprice prc')[0].xpath('span')[0].text
>         bids = li_item.find_class('lvformat')[0].xpath('span')[0].text
>
>         tme_time = li_item.find_class('tme')[0].xpath('span')[0].get('timems')
>         if tme_time:
>             time_hrs = int(tme_time)/1000 - time.time()
>         else:
>             time_hrs = 'No time found'
>
>         shipping = li_item.find_class('lvshipping')[0].xpath('span/span/span')[0].text_content()
>
>         print('{} {} {} {} {}'.format(link, price_dollar, time_hrs,
>                                       shipping, bids))
>         print('-----------------------------------------------------------------')
>
> Pass the statement as a string to a try function?

When you use XPath instead of the chained function calls, your initial
idea works out naturally:
import time

def parse_page(self, root):
    # get_xpath() looks up li_item in the enclosing scope when it is called,
    # i.e. it sees the current loop variable of the for loop below.
    def get_xpath(path, default="<not available>"):
        result = li_item.xpath(path)
        if result:
            return " ".join(part.strip() for part in result)
        return default

    for li_item in root.xpath(
            '//li[re:test(@id, "^item[a-z0-9]+$")]',
            namespaces={'re': "http://exslt.org/regular-expressions"}):
        description = get_xpath("*[@class='vip']//text()")
        link = get_xpath("*[@class='vip']/@href")
        price = get_xpath("*[@class='lvprice prc']/span/text()")
        bids = get_xpath("*[@class='lvformat']/span/text()")
        tme_time = get_xpath("*[@class='tme']/span/@timems", None)
        if tme_time is not None:
            time_hrs = int(tme_time)/1000 - time.time()
        else:
            time_hrs = "No time found"
        shipping = get_xpath(
            "*[@class='lvshipping']/span/span/span//text()")