[newbie] Preserving carriage returns when calling soup.body.text?
Hello,

I can't find how to tell lxml/BS to preserve carriage returns in an HTML snippet when calling soup.body.text: after the </br> tags are removed, the CRLF that follows each one is removed as well.

==========
builder = LXMLTreeBuilderForXML(preserve_whitespace_tags=["body"])

rows = cur.fetchall()
for row in rows:
    #BAD soup = BeautifulSoup(row["introtext"], builder=builder,features='lxml')
    soup = BeautifulSoup(row["intro"],features='lxml')
    print(soup.body.text)
    break
==========

Is there an option?

Thank you.
My mistake, I'm sorry. All the carriage returns were stripped in the input file. BS/lxml weren't to blame. Problem solved.
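
For anyone landing here with the same symptom, a minimal sketch of how one might check the input first and, if needed, turn <br> tags into explicit newlines. The helper names has_line_breaks and text_with_line_breaks are hypothetical (not from the thread), the sample strings are made up, and the stdlib "html.parser" backend is used instead of lxml only to keep the example dependency-light.

```python
from bs4 import BeautifulSoup


def has_line_breaks(raw: str) -> bool:
    """Hypothetical helper: report whether the raw input contains any CR or LF."""
    return "\r" in raw or "\n" in raw


def text_with_line_breaks(html: str) -> str:
    """Hypothetical helper: return the text with each <br> replaced by a newline."""
    soup = BeautifulSoup(html, features="html.parser")
    for br in soup.find_all("br"):
        # Swap the tag for a plain newline string so it survives get_text().
        br.replace_with("\n")
    return soup.get_text()


# Made-up sample input; in the thread the data came from a database column.
stripped_input = "<p>first line<br/>second line</p>"

# If this prints False, the line breaks were already gone before parsing.
print(has_line_breaks(stripped_input))

# <br> tags can still be used to recover the line structure.
print(repr(text_with_line_breaks(stripped_input)))
```

Inspecting the input with repr() or a check like has_line_breaks before parsing is the quickest way to tell whether the parser or the data is responsible.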
participants (1)
-
codecomplete@free.fr