x=something, y=somethingelse and z=crud all likely to fail - how do I wrap them up?

Sun Jan 31 01:31:45 EST 2016

Veek. M wrote:

> Chris Angelico wrote:
> 
>> On Sun, Jan 31, 2016 at 3:58 PM, Veek. M <vek.m1234 at gmail.com> wrote:
>>> I'm parsing HTML and I'm doing:
>>>
>>> x = root.find_class(...)
>>> y = root.find_class(...)
>>> z = root.find_class(...)
>>>
>>> All three are likely to fail, so typically I'd have to stick each one in a try.
>>> This is a huge pain for obvious reasons.
>>>
>>> try:
>>>  ....
>>> except something:
>>>  x = 'default_1'
>>> (repeat 3 times)
>>>
>>> Is there some other nice way to wrap this stuff up?
>> 
>> I'm not sure what you're using to parse HTML here (there are several
>> libraries for doing that), but the first thing I'd look for is an
>> option to have it return a default if it doesn't find something - even
>> if that default has to be (say) None.
>> 
>> But failing that, you can always write your own wrapper:
>> 
>> def find_class(root, ...):
>>     try:
>>         return root.find_class(...)
>>     except something:
>>         return 'default_1'
>> 
>> Or have the default as a parameter, if it's different for the different
>> ones.
>> 
>> ChrisA
> 
> I'm using lxml.html
> 
>     def parse_page(self, root):
>         # Assumes `import time` at module level.
>         for li_item in root.xpath('//li[re:test(@id, "^item[a-z0-9]+$")]',
>                                   namespaces={'re': "http://exslt.org/regular-expressions"}):
>             description = li_item.find_class('vip')[0].text_content()
>             link = li_item.find_class('vip')[0].get('href')
>             price_dollar = li_item.find_class('lvprice prc')[0].xpath('span')[0].text
>             bids = li_item.find_class('lvformat')[0].xpath('span')[0].text
> 
>             tme_time = li_item.find_class('tme')[0].xpath('span')[0].get('timems')
>             if tme_time:
>                 # timems appears to be an end timestamp in milliseconds;
>                 # convert it to hours remaining from now.
>                 time_hrs = (int(tme_time) / 1000 - time.time()) / 3600
>             else:
>                 time_hrs = 'No time found'
> 
>             shipping = li_item.find_class('lvshipping')[0].xpath('span/span/span')[0].text_content()
> 
>             print('{} {} {} {} {}'.format(link, price_dollar, time_hrs,
>                                           shipping, bids))
>             print('-----------------------------------------------------------------')

Someone suggested I refactor the find_class/xpath calls into wrapper functions, but I tried it and it didn't look all that great.
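A wrapper tends to look better when it wraps the whole lookup chain rather than each find_class/xpath call separately. A minimal sketch, assuming the failures you care about are the IndexError from indexing an empty result list and the AttributeError from calling a method on None: the helper `attempt` below is hypothetical (not part of lxml) and takes a zero-argument callable, returning a default if that callable raises one of the expected exceptions.

```python
def attempt(getter, default=None, exceptions=(IndexError, AttributeError)):
    """Call getter() and return its result, or `default` if it raises one of `exceptions`.

    Hypothetical helper, not an lxml API. Only the listed exception types are
    swallowed, so unrelated bugs (NameError, TypeError, ...) still surface.
    """
    try:
        return getter()
    except exceptions:
        return default


if __name__ == "__main__":
    # Stand-in for an empty find_class() result. With lxml you would write e.g.
    # attempt(lambda: li_item.find_class('vip')[0].get('href'), 'No link found')
    empty_result = []
    print(attempt(lambda: empty_result[0], 'default_1'))  # default_1
    print(attempt(lambda: ['found'][0], 'default_1'))     # found
```

Each field in parse_page then becomes a single line, e.g. `link = attempt(lambda: li_item.find_class('vip')[0].get('href'), 'No link found')`. The lambda is what matters: it delays the lookup until it runs inside the try, whereas passing `li_item.find_class('vip')[0]` directly would raise before attempt is even called.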

Just give me a general idea of how to deal with messy crud like this.
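One general pattern for many fragile fields is to describe each field once, as a name, a lookup function and a default, and then run every lookup through the same try/except in a loop. This is a sketch under assumptions: the class names and XPaths are copied from the parse_page code above, `LISTING_FIELDS` and `extract_fields` are hypothetical names, and only IndexError and AttributeError are treated as "field missing".

```python
# Hypothetical field table: name -> (lookup function, default).
# Class names and XPaths mirror the parse_page() code quoted above;
# each lookup receives one lxml <li> element and is only called later.
LISTING_FIELDS = {
    'description': (lambda li: li.find_class('vip')[0].text_content(), ''),
    'link': (lambda li: li.find_class('vip')[0].get('href'), 'No link found'),
    'price_dollar': (lambda li: li.find_class('lvprice prc')[0]
                     .xpath('span')[0].text, 'No price found'),
    'bids': (lambda li: li.find_class('lvformat')[0]
             .xpath('span')[0].text, 'No bids found'),
    'tme_time': (lambda li: li.find_class('tme')[0]
                 .xpath('span')[0].get('timems'), None),
    'shipping': (lambda li: li.find_class('lvshipping')[0]
                 .xpath('span/span/span')[0].text_content(), 'No shipping found'),
}


def extract_fields(element, fields, exceptions=(IndexError, AttributeError)):
    """Run every lookup in `fields` against `element`.

    Returns a dict of field name -> value, using the field's default
    whenever its lookup raises one of `exceptions`.
    """
    result = {}
    for name, (lookup, default) in fields.items():
        try:
            result[name] = lookup(element)
        except exceptions:
            result[name] = default
    return result
```

Inside the loop in parse_page this could become `fields = extract_fields(li_item, LISTING_FIELDS)` followed by `print('{link} {price_dollar} {bids} {shipping}'.format(**fields))` (str.format ignores the unused keys). Adding or changing a field is then a one-line edit to the table rather than another try/except block.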