[Chicago] Python Development in Chicago

Fri Oct 26 17:31:36 CEST 2007

On 10/26/07, Kevin L. Stern <kevin.l.stern at gmail.com> wrote:
> I'm doing some research in graph theory and I have a set of graphs (known as
> the AT&T set) that is in a seemingly proprietary format - I want to swap it
> into the graphml XML format.  I hacked together a little Python script for
> this.  Would you folks say that this is 'pythonic', or does this look
> newbie'ish?

There is something in Python that is somewhat wart-like and easy to
miss in the docs (yet often criticized).  Whenever a module contains
code that should run only when the module is used as a command-line
script (as yours is), you should really always put that code inside
this if statement:

if __name__ == '__main__':
    # code goes here that you want to run from the command line

Why?  There is no good reason for this, so my first inclination is to
answer like this: just do it!

For the non-conformists out there, here is the real reason.  Python
assigns the magic value '__main__' to a module's __name__ when that
module is run as the top-level script; when the module is imported,
__name__ is the module's own name instead.  Or, as the lib ref at
http://docs.python.org/lib/module-main.html says:

"This module represents the (otherwise anonymous) scope in which the
interpreter's main program executes -- commands read either from
standard input, from a script file, or from an interactive prompt. It
is this environment in which the idiomatic ``conditional script''
stanza causes a script to run: " [see above]
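
If you want to see the two values for yourself, here is a minimal
sketch.  The module name name_demo and the helper names_seen() are made
up for illustration, and runpy.run_path(..., run_name='__main__') stands
in for typing "python name_demo.py" at the prompt:

```python
import importlib
import os
import runpy
import sys
import tempfile

# Hypothetical one-line module that records the __name__ it was given.
SOURCE = "NAME_SEEN = __name__\n"

def names_seen():
    """Return (__name__ when imported, __name__ when run as a script)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "name_demo.py")
        with open(path, "w") as f:
            f.write(SOURCE)

        # Case 1: import the module; __name__ is the module's own name.
        sys.path.insert(0, tmpdir)
        importlib.invalidate_caches()
        try:
            imported = importlib.import_module("name_demo").NAME_SEEN
        finally:
            sys.path.remove(tmpdir)
            sys.modules.pop("name_demo", None)

        # Case 2: execute the same file as the top-level script.
        ran = runpy.run_path(path, run_name="__main__")["NAME_SEEN"]
    return imported, ran

if __name__ == '__main__':
    print(names_seen())
```

The same file sees a different __name__ depending only on how it was
started, which is all the if statement is testing.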

So, without that if statement your code will still run as a script,
but it will also run *whenever* the module is imported.  That might not
be a big deal today, but tomorrow, when you refactor your code or run
any tool that imports modules at will (pydoc, pudge, nosetests, the
list goes on), you risk executing your main code at unexpected times.
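
As a rough sketch of the shape I mean (this is not a rewrite of your
script: the names convert and main are mine, the sample input is
invented, and the parsing is cut down to edge lines only), keep the
work in functions and call them only from the guarded block:

```python
import re
import sys

# Matches edge lines such as "  n1 -- n2;" and captures both node numbers.
EDGE = re.compile(r"^\s*n(\d+)\s--\sn(\d+);$")

def convert(lines):
    """Return one GraphML <edge> element per 'nA -- nB;' input line."""
    edges = []
    for line in lines:
        match = EDGE.search(line)
        if match:
            edges.append('<edge source="%s" target="%s"/>\n' % match.groups())
    return edges

def main():
    # Hypothetical sample input; a real script would read a file here.
    sample = ["graph g {\n", "  n1 -- n2;\n", "  n2 -- n3;\n", "}\n"]
    for element in convert(sample):
        sys.stdout.write(element)

if __name__ == '__main__':
    # Runs only when executed as a script, never on import.
    main()
```

With this layout, pydoc or a test runner can import the module and call
convert() directly without any file being opened or written.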

I will refrain from linking to the complaints on the Python list about
how unintuitive and obscure this is, but rest assured it has been
discussed many times, with many alternatives offered and no resolution.

K

PS. are you new to the Chicago area or just new to Python or both?

>
> ____________________________________________________________________
>
> import re, mmap, os
>
> class token:
>         def __init__(self):
>                 self.type = None
>                 self.data = None
>
> class tokenizer:
>         def __init__(self, inmap, out):
>                 self.inmap = inmap
>                 self.out = out
>
>         def nextToken(self):
>                 line = self.inmap.readline()
>                 if re.search("^graph\s.*\s{$", line):
>                         ident = line[6:len(line)-3]
>                         result = token()
>                         result.type = 'graph'
>                         result.data = ident
>                         return result
>                 elif re.search("^\s*subgraph.*{$", line):
>                         parse = re.search("subgraph\s.*\s{$", line).group()
>                         ident = line[10:len(line)-3]
>                         result = token()
>                         result.type = 'subgraph'
>                         result.data = ident
>                         return result
>                 elif re.search("^\s*}$", line):
>                         result = token()
>                         result.type = 'endgroup'
>                         return result
>                 elif re.search("^\s*n\d+\s--\sn\d+;$", line):
>                         parse = re.search("n\d+\s--\sn\d+", line).group()
>                         split = parse.partition('--')
>                         first = re.search("\d+", split[0]).group()
>                         last = re.search("\d+", split[2]).group()
>                         result = token()
>                         result.type = 'edge'
>                         result.data = [first,last]
>                         return result
>                 return None
>
>         def processToken(self, t):
>                 if not t:
>                         return
>                 if t.type == 'graph':
>                         self.out.write('<graph id="%s">\n' % t.data)
>                 elif t.type == 'subgraph':
>                         self.out.write('<graph id="%s">\n' % t.data)
>                         self.sg += 1
>                 elif t.type == 'endgroup':
>                         self.out.write('</graph>\n')
>                         if self.sg > 0:
>                                 self.sg -= 1
>                 elif t.type == 'edge':
>                         self.out.write('<edge source="%s" target="%s"/>\n' %
>                                        (t.data[0], t.data[1]))
>
>         def go(self):
>                 self.sg = 0
>                 self.out.write("""<?xml version="1.0" encoding="UTF-8"?>
> <graphml>
> """)
>                 while self.inmap.tell() < self.inmap.size():
>                         self.processToken(self.nextToken())
>
>                 self.out.write("</graphml>")
>
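> inmap = None
> out = None
> try:
>         infile = "ug.txt"
>         insize = os.path.getsize(infile)
>         fd = open(infile, "r")
>         # pass access by keyword; the third positional argument differs
>         # between Unix (flags) and Windows (tagname)
>         inmap = mmap.mmap(fd.fileno(), insize, access=mmap.ACCESS_READ)
>         outfile = "out.txt"
>         # "w" creates or truncates the output file; "r+" needs it to exist
>         out = open(outfile, "w")
>         lex = tokenizer(inmap, out)
>         lex.go()
> except IOError:
>         print "IO Error Occurred"
> finally:
>         # close only what was actually opened
>         if inmap is not None:
>                 inmap.close()
>         if out is not None:
>                 out.close()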