<br><br><div class="gmail_quote">On Fri, Feb 27, 2009 at 2:59 AM, wesley chun <span dir="ltr"><<a href="mailto:wescpy@gmail.com">wescpy@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> There is a text file that looks like this:<br>
><br>
> text text text <ID>Joseph</text text text><br>
> text text text text text text text text text text text<br>
> text text text text text text text text text text text<br>
> text text text text text text text text text text text<br>
> text text text text text text text text text text text<br>
> text text text text text text text text text text text<br>
> text text text text text text text text text text text<br>
> text text text text text text text text text text text<br>
> text text text <Full name> Joseph Smith</text text text><br>
> text text text <Rights> 1</text text text><br>
> text text text <LDAP> 0</text text text><br>
><br>
> What I am trying to do is:<br>
><br>
> 1. I need to extract the name and the full name from this text file. For<br>
> example: ( ID is Joseph & Full name is Joseph Smith).<br>
<br>
<br>
in addition to denis' suggestion of using regular expressions, you can<br>
also look at the xml.etree module and have ElementTree parse them into<br>
tags for you, so all you have to do is ask for the ID and "Full name"<br>
tags to get your data.<br>
<br>
good luck!<br>
-- wesley<br>
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -<br>
"Core Python Programming", Prentice Hall, (c)2007,2001<br>
"Python Fundamentals", Prentice Hall, (c)2009<br>
<a href="http://corepython.com" target="_blank">http://corepython.com</a><br>
<br>
wesley.j.chun :: <a href="http://wescpy-at-gmail.com" target="_blank">wescpy-at-gmail.com</a><br>
python training and technical consulting<br>
cyberweb.consulting : silicon valley, ca<br>
<a href="http://cyberwebconsulting.com" target="_blank">http://cyberwebconsulting.com</a><br>
</blockquote></div><br><br clear="all">Since I'm learning Pyparsing, this was a nice exercise. I've written this elementary script, which does the job for the sample data we have:<br><br>from pyparsing import *<br>
ID_TAG = Literal("<ID>")<br>FULL_NAME_TAG1 = Literal("<Full") <br>FULL_NAME_TAG2 = Literal("name>")<br>END_TAG = Literal("</")<br>word = Word(alphas)<br>pattern1 = ID_TAG + word + END_TAG<br>
pattern2 = FULL_NAME_TAG1 + FULL_NAME_TAG2 + OneOrMore(word) + END_TAG<br>result = pattern1 | pattern2<br><br># "lines.txt" is your file name; the with block closes the file when done<br>with open("lines.txt") as lines:<br>    for line in lines:<br>        myresult = result.searchString(line)<br>
        if myresult:<br>            print(myresult[0])<br><br><br># This prints out<br>['<ID>', 'Joseph', '</']<br>['<Full', 'name>', 'Joseph', 'Smith', '</']<br>
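<br>If you only want the values and not the tag tokens, one option is to suppress the tags and name the pieces you care about. This is just a sketch under some assumptions: the names user_id, full_name, extract and SAMPLE are made up for illustration, and the sample text is copied from the question instead of being read from a file.<br>

```python
from pyparsing import Literal, OneOrMore, Suppress, Word, alphas

# Sample lines copied from the question (hypothetical stand-in for the real file).
SAMPLE = """\
text text text <ID>Joseph</text text text>
text text text text text text
text text text <Full name> Joseph Smith</text text text>
text text text <Rights> 1</text text text>
"""

word = Word(alphas)
end_tag = Suppress(Literal("</"))

# Suppress() drops the tag tokens so only the values remain.
# Calling an expression with a string gives its result a name.
id_pattern = Suppress(Literal("<ID>")) + word("user_id") + end_tag
name_pattern = (Suppress(Literal("<Full")) + Suppress(Literal("name>"))
                + OneOrMore(word)("full_name") + end_tag)
pattern = id_pattern | name_pattern


def extract(text):
    """Return a dict with whichever of 'user_id' and 'full_name' were found."""
    found = {}
    for line in text.splitlines():
        for match in pattern.searchString(line):
            if "user_id" in match:
                found["user_id"] = match["user_id"]
            if "full_name" in match:
                # The full name comes back as several words; join them.
                found["full_name"] = " ".join(match["full_name"])
    return found


info = extract(SAMPLE)
print(info)
```

With the sample above, info["user_id"] holds the ID and info["full_name"] holds the full name as a single string, so there is no need to index into the token lists by position.<br>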
<br># You can access the individual elements of the lists to pick whatever you want<br><br><br>-- <br>لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد الغزالي<br>"No victim has ever been more repressed and alienated than the truth"<br>
<br>Emad Soliman Nawfal<br>Indiana University, Bloomington<br><a href="http://emnawfal.googlepages.com">http://emnawfal.googlepages.com</a><br>--------------------------------------------------------<br>