[Tutor] html checker

eryksun eryksun at gmail.com
Mon Oct 1 22:23:22 CEST 2012


On Mon, Oct 1, 2012 at 11:45 AM, Matthew Dalrymple
<computer_dude15 at hotmail.com> wrote:
>
> I'm trying to write an html syntax checker...pretty much read an imported
> file, if it has all the opening and closing "<" and ">" it will return True
> and if it doesn't it will return False.

It's just this htmlChecker function that you need help with, right?
Below I've indented your code by 4 spaces and split it into sections.
I've placed simple code suggestions in-line with my comments. I hope
it's easy enough to follow.


    def htmlChecker():
        fname = input('What is the name of the file you would like to open: ')
        infile = open(fname, 'r+')


You should generalize this by having the function take "fname" as an
argument and moving the "input" line to another function that calls
htmlChecker. Keep the back-end processing separate from the user
interface.

Also, why are you opening in read-write mode? Use "open(fname)" to
open the file in read-only text mode.
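
Here is a rough sketch of that split, assuming Python 3. The name
"main" is made up for illustration, and the bracket count in
html_checker is only a placeholder for the real checking logic:

```python
def html_checker(fname):
    # Back end: no input() or print() here, just a file name in
    # and a True/False result out.
    with open(fname) as infile:      # default mode is read-only text
        data = infile.read()
    # Placeholder check only; the real bracket matching goes here.
    return data.count('<') == data.count('>')


def main():
    # User interface: ask for the name, call the back end, report.
    fname = input('What is the name of the file you would like to open: ')
    print(html_checker(fname))
```

With this layout you can call html_checker('some_file.html') directly
from a test or from the interactive prompt, without typing the file
name each time.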


        for line in infile:
            data = infile.readline()


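Python 3 doesn't raise an error for this, but Python 2 would (a
ValueError, because it refuses to mix file iteration with readline).
In Python 3 the loop runs, but "line" and "data" take alternate lines
of the file.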

(1) You're already iterating through the file by line, i.e. "for line
in infile", so why do you call readline() in the body of the loop?

(2) This repeatedly assigns a line from the file to the name "data".
It does not extend "data", nor would that be generally advisable since
it's inefficient to repeatedly allocate, copy, and deallocate memory
to extend a string.

(3) If you want the entire contents of the file in "data", use either
"data = infile.read()" to get it all as one string, or "data =
infile.readlines()"  (note the "s" in "readlines") to load the lines
of the file into a list.
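
To make the difference in (3) concrete, here is a small sketch. It
uses io.StringIO as a stand-in for a file opened with open(fname), so
the sample text is my own invention:

```python
import io

# io.StringIO behaves like a text file opened for reading.
infile = io.StringIO('<html>\n<body>\n')

data = infile.read()          # the whole file as one string
infile.seek(0)                # rewind before reading a second time
lines = infile.readlines()    # the same content as a list of lines

print(repr(data))             # '<html>\n<body>\n'
print(lines)                  # ['<html>\n', '<body>\n']
```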


        s = stack()
        s1 = stack()


Stack s1 seems to be for the contents of the tag. You can just use a
list for this, i.e. "s1 = []". Actually, you can use a list as a
stack, too, but I assume you were instructed to use this particular
stack data structure.

A stack is just like a stack of plates. You can push a plate on top,
and pop one off. The "list.append" method functions to "push" a value
on the top (end of the list), and "list.pop()" defaults to removing
the last item if you don't specify an index (like popping off the top
of a stack).
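
A quick sketch of a list used as a stack (the names here are just for
illustration):

```python
plates = []             # an empty list is an empty stack
plates.append('a')      # push 'a'
plates.append('b')      # push 'b'; it is now on top
top = plates.pop()      # pop with no index removes the last item
is_empty = not plates   # an empty list is false in a boolean test

print(top)              # b
print(plates)           # ['a']
print(is_empty)         # False
```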


        for ch in infile:


(1) File objects iterate by line, not by character.
(2) You've already exhausted the iterator from the last loop, so this
loop immediately terminates.

Let's assume you've read the entire file into memory as a string with
"data = infile.read()". Then you can iterate over the characters with
"for ch in data".
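
Both points are easy to see in a small experiment. Again io.StringIO
stands in for a real file, and the sample text is made up:

```python
import io

infile = io.StringIO('<b>\nhi\n')

by_line = [item for item in infile]   # iterating a file yields lines
again = [item for item in infile]     # nothing left the second time

infile.seek(0)                        # rewind to the start
data = infile.read()
by_char = [ch for ch in data]         # iterating a string yields characters

print(by_line)   # ['<b>\n', 'hi\n']
print(again)     # []
print(by_char)   # ['<', 'b', '>', '\n', 'h', 'i', '\n']
```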


            if ch =='<':
                if s.isEmpty():
                    s.push(ch)
                else:
                    return s.isEmpty()


Why are you returning s.isEmpty()? In this branch the stack is known
to be non-empty, so unless the method returns something other than
True/False the result is always False. I think you should explicitly
"return False".


            if ch =='>':
                s.pop()


This should use "elif", not "if". ch can't be both '<' and '>' (it's
made of classical bits, not quantum qubits), so there's no reason to
test for '>' if it already matched '<'.


            if not s.isEmpty()and ch!='<':
                s1.push(ch)


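As above, this should be an "elif": "elif not s.isEmpty()". The
"ch != '<'" test is then no longer needed, because the earlier
branches have already handled '<' and '>'. If you make s1 a list, you
can append the character with "s1.append(ch)".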
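
Putting the three branches together, the loop could look something
like the sketch below. It uses plain lists in place of your stack
class (so "not s" plays the role of "s.isEmpty()"). The function name,
and the choice to return False for a stray '>' or an unclosed '<', are
my assumptions and not part of your original code:

```python
def brackets_balanced(data):
    s = []     # stack holding the currently open '<', if any
    s1 = []    # characters collected from inside tags
    for ch in data:
        if ch == '<':
            if s:               # a '<' while a tag is already open
                return False
            s.append(ch)
        elif ch == '>':
            if not s:           # a '>' with no matching '<'
                return False
            s.pop()
        elif s:                 # any other character inside a tag
            s1.append(ch)
    return not s                # False if the last '<' was never closed


print(brackets_balanced('<html><body></body></html>'))   # True
print(brackets_balanced('<html<body>'))                  # False
```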


Finally:


        print(s1)


If s1 is an iterable sequence, you can print the string using either
"print(''.join(s1))" or "print(*s1, sep='')". Also, at this point
should s1 be emptied?
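
For example, assuming s1 is a list of single characters (the sample
values are made up):

```python
s1 = ['h', 't', 'm', 'l']

joined = ''.join(s1)     # build one string from the characters
print(joined)            # html
print(*s1, sep='')       # same output, without building the string first
```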

