[Tutor] Extracting comments from a file
Kent Johnson
kent37 at tds.net
Mon Feb 22 13:39:48 CET 2010
On Mon, Feb 22, 2010 at 1:06 AM, Lao Mao <laomao1975 at googlemail.com> wrote:
> Hi,
>
> I have an html file, with xml style comments in:
>
> <!--
> Some comments here
> Blah
> ...
> -->
>
> I'd like to extract only the comments. My sense of smell suggests that
> there's probably a library (maybe an xml library) that does this already.
Take a look at BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/documentation.html
Your code will look something like this (untested):
from BeautifulSoup import BeautifulSoup, Comment

data = open('myfile.html').read()
soup = BeautifulSoup(data)

# Walk every node in document order and print the comments
current = soup
while current:
    if isinstance(current, Comment):
        print current.string
    current = current.next
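
If you would rather not install anything, the standard library's
html.parser module can probably do the same job. A minimal Python 3
sketch, assuming the comments are well-formed; the names
CommentCollector, extract_comments and the sample string are made up
for illustration:

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every <!-- ... --> comment it is fed."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between '<!--' and '-->', delimiters excluded
        self.comments.append(data)


def extract_comments(html_text):
    """Return a list with the raw text of each comment in html_text."""
    parser = CommentCollector()
    parser.feed(html_text)
    parser.close()
    return parser.comments


# Hypothetical sample input, not taken from the original file
sample = (
    "<p>Hello</p>\n"
    "<!--\nSome comments here\nBlah\n-->\n"
    "<p>Bye <!-- inline --> now</p>"
)

for comment in extract_comments(sample):
    print(comment.strip())
```

The parser hands over only the comment text, so a comment that sits in
the middle of a line comes out without the surrounding markup.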
> Otherwise, my current algorithm looks a bit like this:
>
> * Iterate over file
> * If current line contains <!---
> - Toggle 'is_comment' to yes
> * If is_comment is yes, print the line
> * If current line contains -->
> - Toggle 'is_comment' to no
>
> This feels crude, but is it effective, or ok?
It will break on comments like
<!-- This is a comment <!-- still the same comment -->
It will also print too much whenever a comment does not begin at the
start of a line and finish at the end of one, because the whole line
is printed, including any markup outside the comment.
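
To make the "prints too much" case concrete, here is a rough Python 3
sketch of the line-based approach as I read it from your steps. The
function name line_based_comments and the sample lines are made up,
and I am assuming the opening marker is '<!--':

```python
def line_based_comments(lines):
    """Line-by-line scan: return every line judged to be inside a comment."""
    kept = []
    is_comment = False
    for line in lines:
        if "<!--" in line:
            is_comment = True
        if is_comment:
            # The whole line is kept, not just the comment text
            kept.append(line)
        if "-->" in line:
            is_comment = False
    return kept


# Hypothetical input with a comment in the middle of a line
sample = [
    "<p>intro</p>",
    "<p>before <!-- note --> after</p>",
    "<p>outro</p>",
]

for line in line_based_comments(sample):
    print(line)
```

The middle line comes back whole, <p> tags and surrounding text
included, even though only "note" is inside the comment.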
Kent
>
> Thanks,
>
> Laomao
>