[Tutor] Extracting comments from a file

Mon Feb 22 13:39:48 CET 2010

On Mon, Feb 22, 2010 at 1:06 AM, Lao Mao <laomao1975 at googlemail.com> wrote:
> Hi,
>
> I have an html file, with xml style comments in:
>
> <!--
> Some comments here
> Blah
> ...
> -->
>
> I'd like to extract only the comments.  My sense of smell suggests that
> there's probably a library (maybe an xml library) that does this already.

Take a look at BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/documentation.html

Your code will look something like this (untested):

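# BeautifulSoup 3; Comment is the node type used for <!-- ... -->
from BeautifulSoup import BeautifulSoup, Comment

with open('myfile.html') as f:
    data = f.read()
soup = BeautifulSoup(data)

# Walk every node in document order and print the comments.
# Compare against None: an empty string node is falsy and would
# otherwise end the loop early.
current = soup
while current is not None:
    if isinstance(current, Comment):
        print current.string
    current = current.next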
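
If you would rather not install anything, the standard library can do
this too. Here is a minimal sketch for a current Python 3, using
html.parser.HTMLParser and its handle_comment hook. The names
CommentCollector and extract_comments are my own, made up for
illustration, and the sample string stands in for the contents of
your file.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every <!-- ... --> comment it is fed."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between '<!--' and '-->', delimiters excluded
        self.comments.append(data)


def extract_comments(html_text):
    """Return a list of comment bodies found in html_text (hypothetical helper)."""
    parser = CommentCollector()
    parser.feed(html_text)
    parser.close()
    return parser.comments


# Stand-in for open('myfile.html').read()
sample = (
    "<html><!-- first --><body><p>text <!--\n"
    "Some comments here\n"
    "Blah\n"
    "--> more</p></body></html>"
)

comments = extract_comments(sample)
for comment in comments:
    print(comment.strip())
```

Because the parser hands over only the comment body, a comment that
sits in the middle of a line comes out without the markup around it.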

> Otherwise, my current algorithm looks a bit like this:
>
> * Iterate over file
> * If current line contains <!--
>   - Toggle 'is_comment' to yes
> * If is_comment is yes, print the line
> * If current line contains -->
>   - Toggle 'is_comment' to no
>
> This feels crude, but is it effective, or ok?

It will break on comments like
<!-- This is a comment <!-- still the same comment -->

It will also print too much whenever a comment does not begin at the
start of a line and finish at the end of one, because you are printing
whole lines rather than just the comment text.
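
To see that second problem concretely, here is a rough sketch of the
line-based approach as I read it from your description. The function
name comment_lines and the sample lines are invented for illustration,
and it collects lines instead of printing them so the result is easy
to inspect.

```python
def comment_lines(lines):
    """Return every line seen while 'inside' a comment (line-based sketch)."""
    collected = []
    is_comment = False
    for line in lines:
        if '<!--' in line:
            is_comment = True
        if is_comment:
            # The whole line is kept, not just the comment text
            collected.append(line)
        if '-->' in line:
            is_comment = False
    return collected


sample = [
    '<p>before <!-- inline note --> after</p>',
    '<p>no comment here</p>',
]

result = comment_lines(sample)
for line in result:
    print(line)
```

Here result holds the entire first line, including the <p>before and
after</p> markup on either side of the comment.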

Kent
>
> Thanks,
>
> Laomao