[Tutor] grave confusion

Danny Yoo dyoo at hashcollision.org
Mon Oct 6 19:46:53 CEST 2014


Alan has pointed out that your loop here:

    for line_in in file.readline():
        ...

has a very different meaning than the one you intend.  It means: "for
every character in the first line of the file: ..."

This is because "file.readline()" returns a single line of your file
as a string.  A string is a sequence of characters, and a for loop
walks over whatever sequence it is given, so the loop above visits the
characters of that one line, one at a time.
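Here's a small sketch of the difference.  To keep it self-contained, it
uses io.StringIO as a stand-in for a real opened file (that's an
assumption for the demo; with a real file you'd normally write
"with open(filename) as f:" and then "for line in f:").

```python
import io

# Stand-in for an opened text file; io.StringIO supports readline()
# and line-by-line iteration just like a file object does.
text = "first line\nsecond line\n"

# Buggy version: readline() returns ONE string, so this loop visits
# each character of the first line, not each line of the file.
file = io.StringIO(text)
chars = []
for line_in in file.readline():
    chars.append(line_in)
print(chars[:5])  # ['f', 'i', 'r', 's', 't']

# Fixed version: iterate over the file object itself, which yields
# one line per pass (each line keeps its trailing newline).
file = io.StringIO(text)
lines = []
for line_in in file:
    lines.append(line_in)
print(lines)  # ['first line\n', 'second line\n']
```

Note that the fix is just dropping the ".readline()" call: the file
object already knows how to hand you its lines one at a time.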


That being said, if you're trying to parse HTML with regular
expressions, you probably want to reconsider.  Instead, you might want
to look into a dedicated parser for that task, such as Beautiful Soup
(http://www.crummy.com/software/BeautifulSoup/).  The problem with a
regular-expression approach is that it's easy to write a fragile,
broken solution.

See: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
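To give a feel for what a dedicated parser buys you, here's a rough
sketch.  Beautiful Soup is a third-party package, so this uses the
standard library's html.parser.HTMLParser as a stand-in instead; it's
not Beautiful Soup's API.  The class name TagCollector and the sample
HTML are made up for illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags and the href of each <a>."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and strips the
        # quotes from attribute values before calling this method.
        self.tags.append(tag)
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


# Mixed case, single vs. double quotes, and a self-closing <br/>: the
# kind of variation that tends to trip up a hand-written regex.
html = '<p class="x">Hi <a href="/one">one</a><br/><A HREF=\'/two\'>two</A></p>'

collector = TagCollector()
collector.feed(html)
collector.close()
print(collector.tags)   # ['p', 'a', 'br', 'a']
print(collector.links)  # ['/one', '/two']
```

The parser deals with quoting and case for you, so the code only has
to say what it cares about (start tags and their attributes) rather
than how the markup happens to be spelled.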

