[Tutor] beautiful soup raw text workarounds?
nathan Smith
nathan-tech at hotmail.com
Tue Aug 24 16:15:22 EDT 2021
I actually fixed this myself.
It was a perspective issue.
I was looking at:
tags=soup.find_all()
which returns tags only.
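Raw text is not a tag; Beautiful Soup stores it as NavigableString
objects, which find_all() with no arguments does not return.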
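
A minimal sketch of that distinction, assuming the bs4 package with the
built-in "html.parser" backend and the page quoted below. The names HTML
and loose_text are illustrative only and are not from the original code.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# The page from the original question (assumed to be the full document).
HTML = """<html>
<head>
<title>This is my website</title>
</head>
<body>
<h1>Headings</h1>
<p>Paragraphs and such.</p>
<h2>Another heading.</h2>
This text here doesn't <br/>
want to show in bs.
</body>
</html>"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all() with no arguments yields Tag objects only.
tags = soup.find_all()
assert all(isinstance(t, Tag) for t in tags)

# Text that sits directly inside <body>, outside any child tag, is kept
# as NavigableString children of <body>. Skip whitespace-only nodes.
loose_text = [
    str(node).strip()
    for node in soup.body.children
    if isinstance(node, NavigableString) and node.strip()
]
print(loose_text)
```

Walking .children (or .descendants for nested content) visits both tags
and strings, so the text after the h2 shows up there even though it is
absent from the find_all() result.
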
Sorry for the time waster.
On 24/08/2021 21:06, nathan Smith wrote:
> Hi List,
>
>
> I'm using beautiful soup to parse a website, which is all going well.
>
> I'm having problems though with getting it to include the raw text,
> that is to say, text not in any tag.
>
> I've done some Googling on this and it seems beautiful soup does not
> support the text outside of tags? Fair enough!
>
> I was wondering how I could work around this issue?
>
> For instance, is there something like tag.endpos and next_tag.startpos,
> so I could do raw_text = text[endpos:startpos]?
>
>
> I've included the web page below for reference so you can see what I
> mean. The thing I am stuck on is the text below the h2.
>
>
> Nathan
>
>
> Website:
>
> <html>
>
> <head>
>
> <title>This is my website</title>
>
> </head>
>
> <body>
>
> <h1>Headings</h1>
>
> <p>Paragraphs and such.</p>
>
> <h2>Another heading.</h2>
>
> This text here doesn't <br/>
>
> want to show in bs.
>
> </body>
>
> </html>