[Tutor] search/match file position q

Clayton Kirkwood crk at godblessthe.us
Tue Oct 7 21:53:19 CEST 2014



!-----Original Message-----
!From: Danny Yoo [mailto:dyoo at hashcollision.org]
!Sent: Tuesday, October 07, 2014 11:14 AM
!To: Clayton Kirkwood
!Cc: Python Tutor Mailing List
!Subject: Re: [Tutor] search/match file position q
!
!> So, what makes regex wrong for this job? question still remains: does
!> the search start at the beginning of the line each time or does it
!> step forward from the last search? I will check out beautiful soup as
!> suggested in a subsequent mail; I'd still like to finish this
!> process:<}}
!
!
!Mathematically, regular expressions can capture a certain class of text
!called the "regular languages".  Regular languages have a few
!characteristics.  As a concrete example of a limitation: you can't write
!a pattern that properly does parentheses matching with a regular
!expression alone.
!
!This isn't a challenge to your machismo: it's a matter of mathematics!
! For the precise details on the impossibility proof, you'd need to take
!a CS theory class, and in particular, learn about the "pumping lemma for
!regular languages".  Sipser's "Introduction to the Theory of
!Computation" has a good presentation.  This is one reason why CS theory
!matters: it can tell you when some approach is not a good idea.
!:P
!
!HTML is not a regular language: it has nested substructure.  The same
!problem about matching balanced parentheses is essentially that of
!matching start and end tags.
!
!So those are the objections from the purely mathematical point of view.
!This is not to say that regular expressions are useless: they work well
!for breaking down HTML into a sequence of tokens.  If you only care
!about processing individual tokens at a time, regexes might be
!appropriate.  They're just not the best tool for everything.  From a
!practical point of view: HTML parsing libraries such as Beautiful Soup
!are nicer to work with than plain regular expressions.
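
A minimal sketch of the split described above: a regex breaks the HTML into tokens, and a plain stack handles the nesting that a regex alone cannot track. The names TOKEN, tokenize, and is_balanced are made up for illustration, and the pattern is an assumption that ignores comments, self-closing tags such as <br/>, and attributes containing ">". A real parser such as Beautiful Soup covers those cases.

```python
import re

# Hypothetical token pattern: an opening or closing tag, or a run of text.
TOKEN = re.compile(r"<(/?)(\w+)[^>]*>|([^<]+)")

def tokenize(html):
    """Break html into ('open' | 'close' | 'text', value) tokens."""
    tokens = []
    for m in TOKEN.finditer(html):
        closing, name, text = m.groups()
        if name is None:
            tokens.append(("text", text))
        elif closing:
            tokens.append(("close", name))
        else:
            tokens.append(("open", name))
    return tokens

def is_balanced(html):
    """Check tag nesting with a stack; this is the part a regex cannot do."""
    stack = []
    for kind, value in tokenize(html):
        if kind == "open":
            stack.append(value)
        elif kind == "close":
            # A closing tag must match the most recently opened tag.
            if not stack or stack.pop() != value:
                return False
    return not stack

print(tokenize("<b>hi</b>"))
print(is_balanced("<div><b>x</b></div>"))
print(is_balanced("<div><b>x</div></b>"))
```

The regex only ever looks at one token at a time; all of the "which tag is still open" bookkeeping lives in the list used as a stack.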

In this case, I was able to determine which line I was interested in because it had a specific marker. From that point, I knew the specific markers to look for for each desired field. I thought the desired parenthesized group was assigned to the variable at the beginning of the match line. I thought that regexes were meant to skip over unwanted detritus and grab only the desired match within the parentheses.

Wrong?
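
For reference, a small sketch of how Python's re module behaves on both points, using a made-up sample line and pattern (neither is from the actual data). Each call to search() starts at the beginning of the string unless a start position is passed to a compiled pattern, and the match object holds the whole match in group(0) and only the parenthesized part in group(1).

```python
import re

# Hypothetical line and markers, invented for this example.
line = 'junk <td class="sym">ABC</td> more junk <td class="px">12.50</td>'
pattern = re.compile(r'<td class="\w+">([^<]*)</td>')

# search() skips leading junk, but always starts from position 0.
first = pattern.search(line)
again = pattern.search(line)
assert first.start() == again.start()   # same match both times

# group(0) is the entire matched text, markers included;
# group(1) is only what the parentheses captured.
whole = first.group(0)
field = first.group(1)

# To step forward, pass the end of the previous match as the start position.
second = pattern.search(line, first.end())

# findall() walks the whole line and returns every captured group.
fields = pattern.findall(line)

print(whole)
print(field, second.group(1))
print(fields)
```

So assigning the result of search() to a variable gives a match object, not the captured text; the captured text comes from calling group(1) on it.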

TIA,

Clayton



