[Tutor] Extract strings from a text file

Emad Nawfal (عماد نوفل) emadnawfal at gmail.com
Fri Feb 27 12:49:10 CET 2009


On Fri, Feb 27, 2009 at 2:59 AM, wesley chun <wescpy at gmail.com> wrote:

> > There is a text file that looks like this:
> >
> > text text text <ID>Joseph</text text text>
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text text text text text text text text text
> > text text text <Full name> Joseph Smith</text text text>
> > text text text <Rights> 1</text text text>
> > text text text <LDAP> 0</text text text>
> >
> > What I am trying to do is:
> >
> > 1. I need to extract the name and the full name from this text file. For
> > example: ( ID is Joseph & Full name is Joseph Smith).
>
>
> in addition to denis' suggestion of using regular expressions, you can
> also look at the xml.etree module and have ElementTree parse them into
> tags for you, so all you have to do is ask for the ID and "Full name"
> tags to get your data.
>
> good luck!
> -- wesley
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> "Core Python Programming", Prentice Hall, (c)2007,2001
> "Python Fundamentals", Prentice Hall, (c)2009
>    http://corepython.com
>
> wesley.j.chun :: wescpy-at-gmail.com
> python training and technical consulting
> cyberweb.consulting : silicon valley, ca
> http://cyberwebconsulting.com
>
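
For comparison, here is a rough sketch of the regular-expression route
mentioned above. It uses only the standard re module. The sample string,
the names ID_RE and FULL_NAME_RE, and the extract() helper are made up
for illustration, and the patterns are only tuned to the few lines quoted
above, so treat it as a starting point rather than a general solution.

```python
import re

# Assumed sample, copied from the lines quoted above.
sample = """text text text <ID>Joseph</text text text>
text text text <Full name> Joseph Smith</text text text>
text text text <Rights> 1</text text text>
"""

# Capture whatever sits between the opening tag and the next "</",
# ignoring surrounding whitespace.
ID_RE = re.compile(r"<ID>\s*(.*?)\s*</")
FULL_NAME_RE = re.compile(r"<Full name>\s*(.*?)\s*</")


def extract(text):
    """Return (id, full_name); either item is None if its tag is missing."""
    id_match = ID_RE.search(text)
    name_match = FULL_NAME_RE.search(text)
    return (id_match.group(1) if id_match else None,
            name_match.group(1) if name_match else None)


user_id, full_name = extract(sample)
print("ID is %s & Full name is %s" % (user_id, full_name))
```

To run it against a real file, the whole file contents could be passed to
extract() instead of the sample string.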


Since I'm learning Pyparsing, this was a nice exercise. I've written this
elementary script, which does the job well for the data we have:

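from pyparsing import Literal, OneOrMore, Word, alphas

# Tag markers. "<Full name>" is split into two literals so that any
# whitespace between the two words is skipped.
ID_TAG = Literal("<ID>")
FULL_NAME_TAG1 = Literal("<Full")
FULL_NAME_TAG2 = Literal("name>")
END_TAG = Literal("</")
word = Word(alphas)  # a run of letters

# Either <ID>word</ or <Full name>word word ...</
pattern1 = ID_TAG + word + END_TAG
pattern2 = FULL_NAME_TAG1 + FULL_NAME_TAG2 + OneOrMore(word) + END_TAG
result = pattern1 | pattern2

# "lines.txt" is your file name
with open("lines.txt") as lines:
    for line in lines:
        myresult = result.searchString(line)
        if myresult:
            print(myresult[0])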


# This prints out
['<ID>', 'Joseph', '</']
['<Full', 'name>', 'Joseph', 'Smith', '</']

# You can access the individual elements of the lists to pick whatever
# you want
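
As a small sketch of that last step, the two printed results can be
treated like ordinary Python lists. The lists below are just the output
shown above typed in by hand, and the variable names are made up for
illustration; a pyparsing result supports the same indexing and slicing.

```python
# The two results printed above, written out as plain lists.
id_tokens = ['<ID>', 'Joseph', '</']
full_name_tokens = ['<Full', 'name>', 'Joseph', 'Smith', '</']

# The ID is the single token between the opening tag and "</".
user_id = id_tokens[1]

# The full name is everything after the two opening-tag tokens
# and before the closing "</".
full_name = " ".join(full_name_tokens[2:-1])

print("ID is %s & Full name is %s" % (user_id, full_name))
```

Slicing with [2:-1] rather than fixed positions means a name with more
than two words would still come out whole.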


-- 
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
الغزالي
"No victim has ever been more repressed and alienated than the truth"

Emad Soliman Nawfal
Indiana University, Bloomington
http://emnawfal.googlepages.com
--------------------------------------------------------