
Jan. 28, 2023
6:48 p.m.
Hello,

Some columns in a DB contain badly formed HTML, to the point that BeautifulSoup (lxml?) fails:

=============
from bs4 import BeautifulSoup

# Some records start with 0A (newline) followed by a stray closing tag, e.g. </crap>
soup = BeautifulSoup("\n</strong>", 'lxml')
# AttributeError: 'NoneType' object has no attribute 'text'
print(soup.body.text)
=============

What would be a nice way to solve the problem? Is there a command to remove wrong tags altogether (e.g. strings that start with </strong>), or should I just catch the error?

Thank you.
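
To make the two options concrete, here is a minimal sketch, not a definitive fix. It assumes the bad records only differ by stray closing tags at the very start of the string. The helper names strip_leading_closing_tags and body_text are hypothetical, made up for illustration, and are not part of BeautifulSoup. The helpers themselves use only the standard library, so the BeautifulSoup call appears in a comment.

```python
import re

# Matches one or more closing tags (each optionally preceded by whitespace)
# at the very start of the string, e.g. "\n</strong>" or "\n</crap>".
_LEADING_CLOSING_TAGS = re.compile(r"^(?:\s*</[^>]*>)+")


def strip_leading_closing_tags(html):
    """Option 1: remove stray closing tags from the start of the markup."""
    return _LEADING_CLOSING_TAGS.sub("", html)


def body_text(soup):
    """Option 2: tolerate a missing <body> instead of raising AttributeError.

    `soup` is expected to be a BeautifulSoup object; soup.body is None
    when the parser produced no <body> element.
    """
    body = soup.body
    if body is None:
        return soup.get_text()
    return body.get_text()


# Intended use (not executed here; requires bs4 and lxml to be installed):
#   from bs4 import BeautifulSoup
#   soup = BeautifulSoup(strip_leading_closing_tags(raw), "lxml")
#   print(body_text(soup))
```

The regular expression only touches the start of the string, so closing tags that appear later in a record are left for the parser to handle. Checking for a missing body covers records that end up empty after cleaning.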