Splitting with Regular Expressions

qwweeeit qwweeeit at yahoo.it
Sat Mar 19 09:47:46 EST 2005


It's my fault that I have not read the Library
Reference more carefully...
In any case, the time "wasted" developing small applications to
number lines and remove comments, triple-quoted strings, multiline
instructions, etc. has been "useful" for learning the language...
Now I already have the individual tokens of a source file saved in a
file, each with a reference to its line number.
An example is:
    
052 PROGNAME
052 sys.argv
052 0
053 AUTHOR
053 .encode
054 VERSION
056 URL_BASE
057 OUTPUT_HTML
058 OUTPUT_RSS
060 CSS
071 urllib.URLopener.version
072 urllib.FancyURLopener.prompt_user_passwd
072 lambda
072 self
072 host
072 realm
072 None
072 None
074 categories

The corresponding source is 
(from http://inigo.katxi.org/devel/misc/googlenews.py):
    
052 PROGNAME = sys.argv[0]
053 AUTHOR = u'Iñigo Serna'.encode('utf-8')
054 VERSION = '0.3'
055 
056 URL_BASE = 'http://news.google.com'
057 OUTPUT_HTML = 'news-%s-%s.html'
058 OUTPUT_RSS = 'news-%s-%s.xml'
059 
060 CSS = """<style type="text/css">
061   body { color: black; background: white; }
062   a { color : #003399; text-decoration : none; }
063   a:hover { color : #339900; text-decoration : none; }
064   a.main { font-size : 100%; }
065   span.text { font-size : 100%; }
066   span.data { color : #666666; font-size : 80%; }
067   a.other { color : #003366; font-size : 75%; }
068   a.other:hover { color : #339900; font-size : 75%; }
069 </style>"""
070 
071 urllib.URLopener.version = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; T312461)'
072 urllib.FancyURLopener.prompt_user_passwd = lambda self, host, realm: (None, None)
073 
074 categories = ['w', 'n', 'b', 't', 's', 'e', 'm']   
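
For comparison, here is a minimal sketch of how a similar numbered token list could be produced with the standard tokenize module. It is not the code I actually used: it assumes Python 3, the function name numbered_tokens and the two-line sample are made up for illustration, and it splits dotted names (sys.argv becomes sys and argv) instead of keeping them together as in my listing.

```python
import io
import tokenize


def numbered_tokens(source):
    """Yield (line_number, text) for names and numbers only.

    String literals, comments and operators are skipped.
    """
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type in (tokenize.NAME, tokenize.NUMBER):
            yield tok.start[0], tok.string


# Hypothetical two-line sample, loosely based on the source shown above.
sample = (
    "PROGNAME = sys.argv[0]  # script name\n"
    "VERSION = '0.3'\n"
)

for lineno, text in numbered_tokens(sample):
    print("%03d %s" % (lineno, text))
```

Line numbers here start at 1 for the sample string; running it on a whole file would give the real line numbers.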
 
As you can see, there are neither literal strings nor comments (they are
saved in an accompanying file).
Now I "only" have to sort the tokens and display them in a table or
(if you prefer) in an extended display in which all the line references
are expanded...
Of course many tokens (like for, in, if, not, etc.) will be eliminated
to save space and also because I already know how they are used.
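
The sorting and filtering step could look roughly like the sketch below. It assumes Python 3 and that the tokens are already available as (line, token) pairs. The name cross_reference is hypothetical, and the pairs are just a few entries copied from the listing above. Keywords are dropped with keyword.iskeyword, which in Python 3 also covers None.

```python
import keyword
from collections import defaultdict


def cross_reference(pairs):
    """Map each non-keyword token to the sorted lines where it appears."""
    index = defaultdict(set)
    for lineno, token in pairs:
        if keyword.iskeyword(token):
            continue  # drop for, in, if, not, lambda, None, ...
        index[token].add(lineno)
    # Sort tokens alphabetically and line numbers numerically.
    return {token: sorted(lines) for token, lines in sorted(index.items())}


# A few (line, token) pairs taken from the listing above.
pairs = [
    (52, "PROGNAME"), (52, "sys.argv"), (53, "AUTHOR"),
    (72, "lambda"), (72, "self"), (72, "host"), (72, "None"),
    (74, "categories"),
]

for token, lines in cross_reference(pairs).items():
    print("%-12s %s" % (token, " ".join("%03d" % n for n in lines)))
```

Using a set per token means a name that occurs twice on the same line is listed only once for that line.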

Some years ago I applied the same approach to understand an assembler
source...

In the case of Python (and all other high-level languages), a function
tree or flow chart should also be very useful...
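
As a rough idea of what a function tree might be built from, here is a sketch using the standard ast module (Python 3 assumed). The name call_tree and the sample functions are invented for illustration. It only records direct calls to plain names, so method calls such as obj.method() are ignored, and calls inside a nested function are also counted for the enclosing one.

```python
import ast


def call_tree(source):
    """Map each function name to the sorted names it calls directly."""
    tree = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            calls = set()
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.add(inner.func.id)
            tree[node.name] = sorted(calls)
    return tree


# Hypothetical sample source with three small functions.
sample = '''
def fetch(url):
    return url

def render(items):
    return [fetch(i) for i in items]

def main():
    render(["a", "b"])
'''

for name, callees in call_tree(sample).items():
    print(name, "->", ", ".join(callees) or "(none)")
```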


