[omaha] Tidy Help

Jeff Hinrichs - DM&T jeffh at dundeemt.com
Sun May 18 05:28:14 CEST 2008


Hmmm.  So, I'm a bit confused.  Is it the installer for utidylib that
is causing the grief?  Or are there problems using the utidylib (2004)
with a modern version of Python?

Since the tidylib bindings are apparently out of date, have you
considered using the command-line executable and driving it through a
subprocess?  Not quite as nice as having a proper binding to a DLL, but
workable.
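
Something along these lines is what I have in mind.  Treat it as a rough
sketch, not tested against your files: it assumes a `tidy` executable is on
the PATH, the function name `tidy_html` is just made up for illustration,
and `subprocess.run` needs Python 3.5 or newer.  The `-quiet`, `-utf8` and
`-asxhtml` switches are standard HTML Tidy options, but adjust them to
whatever your build and your documents need.

```python
import subprocess


def tidy_html(html, tidy_cmd=("tidy", "-quiet", "-utf8", "-asxhtml")):
    """Pipe an HTML string through an external tidy process.

    tidy_cmd is the command line to run; the default is an assumption
    (a `tidy` executable on the PATH) and can be overridden.
    """
    proc = subprocess.run(
        list(tidy_cmd),
        input=html.encode("utf-8"),   # tidy reads the document from stdin
        stdout=subprocess.PIPE,       # cleaned markup comes back on stdout
        stderr=subprocess.PIPE,       # warnings and errors go to stderr
    )
    # HTML Tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
    # Warnings are expected for bad HTML, so only 2 and above is a failure.
    if proc.returncode > 1:
        raise RuntimeError(
            "tidy failed with exit status %d: %s"
            % (proc.returncode, proc.stderr.decode("utf-8", "replace"))
        )
    return proc.stdout.decode("utf-8", "replace")
```

The returned string can then be handed to BeautifulSoup as before.  Note
that with exit status 2 tidy normally writes no output at all, so for the
really bad files you may have to experiment with its `--force-output yes`
setting.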

On Sat, May 17, 2008 at 5:49 PM, Mike Hostetler <mike at hostetlerhome.com> wrote:
> The problem is that there are a lot of HTML files, all of them bad, but
> bad to different degrees.  So we are using both Tidy and BeautifulSoup:
> Tidy to normalize them to something sane, and then BeautifulSoup to parse
> what we want out of them.  Using the Soup by itself is slow and gives
> varying results.  Using Tidy first and then the Soup is faster and gives
> us more consistent results.
>
>
> On May 17, 2008, at 12:15 PM, Jeff Hinrichs - DM&T wrote:
>
>> Yes,
>>
>>
>> http://www.crummy.com/software/BeautifulSoup/documentation.html#Printing%20a%20Document
>>
>> On Sat, May 17, 2008 at 9:10 AM, Burch Kealey <bkealey at mail.unomaha.edu>
>> wrote:
>>>
>>>  I don't know.  Does BS clean up bad HTML for parsing?
>>>
>>>  Burch T. Kealey, PhD.
>>>  RH-CBA 408-N
>>>  University of Nebraska at Omaha
>>>  6000 Dodge Street
>>>  Omaha Nebraska  68104
>>>  402-554-3571
>>> _______________________________________________
>>> Omaha Python Users Group mailing list
>>> Omaha at python.org
>>> http://mail.python.org/mailman/listinfo/omaha
>>> http://www.OmahaPython.org
>>>
>>
>>
>>
>> --
>> Jeff Hinrichs
>> Dundee Media & Technology, Inc
>> jeffh at dundeemt.com
>> 402.218.1473
>> web: www.dundeemt.com
>> blog: inre.dundeemt.com
>
> Mike Hostetler
> mike at hostetlerhome.com
> http://mike.hostetlerhome.com
>
>
>
>



-- 
Jeff Hinrichs
Dundee Media & Technology, Inc
jeffh at dundeemt.com
402.218.1473
web: www.dundeemt.com
blog: inre.dundeemt.com

