[Tutor] python regex help

Arun Tomar tomar.arun at gmail.com
Sun Sep 28 19:49:39 CEST 2008

On Sun, 2008-09-28 at 17:26 +0100, Alan Gauld wrote:
> "Arun Tomar" <tomar.arun at gmail.com> wrote
> > I've been using shell scripting & using sed & pipes i've solved it,
> > but with python, i need to practice more ;).
> Can you show us some output as you'd like it?
> Can you show us the sed script that works?
sample data:
Contact Candidate       
Jyoti Soni - 0 Year(s) 0 Month(s)

C , C + + , Java , JSP , Oracle , S / W Testing
B.Sc Pt.Ravishanker University,Raipur
MCA Pt.Ravishanker University,Raipur

Currently in: Pune
CTC(p.a): Not Disclosed
Modified: 27 Sep 2007
Tel: 09975610476(M)

Account Information
Account Information

Contact Candidate       
Minal - 0 Year(s) 0 Month(s)

c , c + + , java , ASP . NET , VB , Oracle , Dimploma in Web Designing
B.Sc Shivaji University , Maharasthra
MCA Shivaji University , Maharashtra

Currently in: Pune
CTC(p.a): INR 0 Lac(s) 5 Thousand
Modified: 27 Jan 2006
Last Active: 06 Sep 2007
Tel: 9890498376(M)
011 02162 250553(R)

Account Information
Account Information

small shell script that works:

print $1

 sed -ne '/Contact/,+1p' -e '/Tel/p' $1 |sed -e '/Contact Candidate/d'|
sed -e 's/\-//'|sed -e '/^$/d'|sed -e 's/ *$//'|sed -e 's/Tel://g' -e
's/(M)//g' -e 's/0 Year(s) 0 Month(s)//g' -e 's/(R)//g' -e '/> Similar

sample output
Jyoti Soni  
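
The cleanup steps in the sed pipeline above (dropping the "Tel:" label, the (M)/(R) markers and the experience suffix) could be sketched in Python roughly as follows. The helper names clean_name and clean_phone are hypothetical, and matching any number of years/months (rather than only the literal "0 Year(s) 0 Month(s)" that the sed command removes) is an assumption.

```python
import re

# Hypothetical helpers; the patterns mirror the sed substitutions quoted above.
# Assumption: any "N Year(s) N Month(s)" suffix should go, not just "0 ... 0 ...".
EXPERIENCE = re.compile(r"\s*-\s*\d+ Year\(s\) \d+ Month\(s\)")
PHONE_TAG = re.compile(r"\((?:M|R)\)")


def clean_name(line):
    """Drop the trailing '- N Year(s) N Month(s)' part and surrounding spaces."""
    return EXPERIENCE.sub("", line).strip()


def clean_phone(line):
    """Drop the 'Tel:' label and the (M)/(R) markers, then trim spaces."""
    line = line.replace("Tel:", "")
    return PHONE_TAG.sub("", line).strip()


print(clean_name("Jyoti Soni - 0 Year(s) 0 Month(s)"))
print(clean_phone("Tel: 09975610476(M)"))
```

Plain str.replace is enough for the fixed "Tel:" label; the regular expressions are only needed where the text varies.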

> Also can you show us the Python code that doesn't work
> and what went wrong? It's easier to fix what's broken than
> to guess at what might do what you want :-)
python code that works so far; after that i'm a bit lost ;)

import re

filename = "script.txt"

#regex pattern
p1 = re.compile("Contact Candidate",re.IGNORECASE)
p2 = re.compile ("Tel:", re.IGNORECASE)

#open the file and read its lines into a list;
#the with block closes the file afterwards
with open(filename,'r') as fh:
    file_array = fh.readlines()

#create an empty array
new_array = []
mod_array = []
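for i in range(len(file_array)):
    if p1.search(file_array[i]):
        #like sed '/Contact/,+1p': the name is on the following line
        if i + 1 < len(file_array):
            new_array.append(file_array[i + 1].strip())
    if p2.search(file_array[i]):
        #like sed '/Tel/p': keep the telephone line
        new_array.append(file_array[i].strip())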
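
One possible way to carry this loop further, as a rough sketch only: pair each name line (the line after "Contact Candidate") with the first "Tel:" line that follows it. The function name extract_records and the inline sample list are hypothetical. The sketch assumes every candidate has a "Tel:" line and ignores a second phone line such as the (R) number in the sample data.

```python
import re

CONTACT = re.compile(r"Contact Candidate", re.IGNORECASE)
TEL = re.compile(r"Tel:", re.IGNORECASE)


def extract_records(lines):
    """Return a list of (name_line, tel_line) pairs, one per candidate."""
    records = []
    name = None
    for i, line in enumerate(lines):
        if CONTACT.search(line) and i + 1 < len(lines):
            # The candidate's name sits on the line after the marker.
            name = lines[i + 1].strip()
        elif TEL.search(line) and name is not None:
            # First Tel: line after a name closes the record.
            records.append((name, line.strip()))
            name = None
    return records


# Shortened copy of the sample data, in the shape readlines() would return.
sample = [
    "Contact Candidate\n",
    "Jyoti Soni - 0 Year(s) 0 Month(s)\n",
    "\n",
    "Currently in: Pune\n",
    "Tel: 09975610476(M)\n",
    "Contact Candidate\n",
    "Minal - 0 Year(s) 0 Month(s)\n",
    "Tel: 9890498376(M)\n",
]
for record in extract_records(sample):
    print(record)
```

Keeping name and phone together as a tuple makes the later CSV step simpler than a flat list of lines would.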

Basically I'm trying my hand at text manipulation with Python. I'm
comfortable with shell scripting, sed & awk.

After this data is extracted I would like to convert it to a CSV file
and then insert the data into a database. I hope this gives a good
idea of what I'm trying to do.
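
For the CSV and database steps, a minimal sketch using only the standard library might look like this. The records list, the column names, the candidates table and the choice of SQLite are all assumptions; the post does not say which database is intended.

```python
import csv
import io
import sqlite3

# Hypothetical cleaned records: (name, phone) pairs taken from the sample data.
records = [
    ("Jyoti Soni", "09975610476"),
    ("Minal", "9890498376"),
]

# Write CSV to an in-memory buffer. To write a real file instead, use
# open("candidates.csv", "w", newline="") (the filename is an assumption).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "phone"])
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)

# Load the same rows into an in-memory SQLite database.
# TEXT columns keep the leading zero of phone numbers intact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (name TEXT, phone TEXT)")
conn.executemany("INSERT INTO candidates (name, phone) VALUES (?, ?)", records)
conn.commit()
rows = conn.execute("SELECT name, phone FROM candidates ORDER BY name").fetchall()
print(rows)
conn.close()
```

The ? placeholders let the sqlite3 module handle quoting, which is safer than building the INSERT statement with string formatting.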
