[Tutor] python regex help

Arun Tomar tomar.arun at gmail.com
Sun Sep 28 19:49:39 CEST 2008


On Sun, 2008-09-28 at 17:26 +0100, Alan Gauld wrote:
> "Arun Tomar" <tomar.arun at gmail.com> wrote
> 
> > I've been using shell scripting, and with sed & pipes I've solved it,
> > but with Python I need to practice more ;).
> 
> Can you show us some output as you'd like it?
> Can you show us the sed script that works?
Sample data:
Contact Candidate       
Jyoti Soni - 0 Year(s) 0 Month(s)
MCA

Keyskills:
C , C + + , Java , JSP , Oracle , S / W Testing
                
B.Sc Pt.Ravishanker University,Raipur
MCA Pt.Ravishanker University,Raipur



                
Currently in: Pune
CTC(p.a): Not Disclosed
Modified: 27 Sep 2007
Tel: 09975610476(M)

Account Information
Account Information


Contact Candidate       
Minal - 0 Year(s) 0 Month(s)
MCA

Keyskills:
c , c + + , java , ASP . NET , VB , Oracle , Dimploma in Web Designing
                
B.Sc Shivaji University , Maharasthra
MCA Shivaji University , Maharashtra



                
Currently in: Pune
CTC(p.a): INR 0 Lac(s) 5 Thousand
Modified: 27 Jan 2006
Last Active: 06 Sep 2007
Tel: 9890498376(M)
011 02162 250553(R)

Account Information
Account Information

Small shell script that works:
#!/bin/bash

# print the name of the input file
echo "$1"

sed -ne '/Contact/,+1p' -e '/Tel/p' "$1" |
sed -e '/Contact Candidate/d' |
sed -e 's/\-//' |
sed -e '/^$/d' |
sed -e 's/ *$//' |
sed -e 's/Tel://g' -e 's/(M)//g' -e 's/0 Year(s) 0 Month(s)//g' \
    -e 's/(R)//g' -e '/> Similar Resumes/d'

Sample output:
Jyoti Soni  
 09975610476
Minal  
 9890498376
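For comparison, the sed pipeline above can be mirrored in Python one stage at a time with the re module. This is only a rough sketch: it assumes GNU sed semantics for the '/Contact/,+1p' range (the matching line plus the one after it), assumes the whole file fits in memory, and the function name sed_like is made up for illustration. On the two sample records it should give the same four lines as the sample output.

```python
import re


def sed_like(text):
    """Return the lines the sed pipeline would print for `text` (sketch)."""
    lines = text.splitlines()

    # sed -n -e '/Contact/,+1p' -e '/Tel/p'
    # A /Contact/ line starts a two-line range (the match plus the next
    # line); every /Tel/ line is printed as well.
    picked = []
    range_end = -1
    for i, line in enumerate(lines):
        if i <= range_end:                  # second line of the range
            picked.append(line)
        elif re.search(r"Contact", line):   # start of a new range
            picked.append(line)
            range_end = i + 1
        if re.search(r"Tel", line):
            picked.append(line)

    # The remaining sed stages work line by line, so apply them in order.
    out = []
    for line in picked:
        if re.search(r"Contact Candidate", line):   # /Contact Candidate/d
            continue
        line = re.sub(r"-", "", line, count=1)       # s/\-//  (first hyphen only)
        if line == "":                               # /^$/d
            continue
        line = re.sub(r" +$", "", line)              # s/ *$//
        line = re.sub(r"Tel:", "", line)             # s/Tel://g
        line = re.sub(r"\(M\)", "", line)            # s/(M)//g
        line = re.sub(r"0 Year\(s\) 0 Month\(s\)", "", line)
        line = re.sub(r"\(R\)", "", line)            # s/(R)//g
        if re.search(r"> Similar Resumes", line):   # /> Similar Resumes/d
            continue
        out.append(line)
    return out


# Usage (assuming the data is in script.txt):
#   with open("script.txt") as fh:
#       print("\n".join(sed_like(fh.read())))
```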

> 
> Also can you show us the Python code that doesn't work
> and what went wrong? It's easier to fix what's broken than
> to guess at what might do what you want :-)
Python code that works so far; after that I'm a bit lost ;)

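import re

filename = "script.txt"

# regex patterns
p1 = re.compile("Contact Candidate", re.IGNORECASE)
p2 = re.compile("Tel:", re.IGNORECASE)

# read the file into a list of lines; the with-block closes the file
with open(filename, 'r') as fh:
    file_array = fh.readlines()

# lists for the extracted lines (mod_array is not used yet)
new_array = []
mod_array = []
for i, line in enumerate(file_array):
    # file_array[i + 1] only exists if this is not the last line
    has_next = i + 1 < len(file_array)
    if p1.search(line) and has_next:
        # the candidate's name is on the line after "Contact Candidate"
        new_array.append(file_array[i + 1])
    if p2.search(line):
        new_array.append(line)
        # the line after "Tel:" may hold a second number, e.g. "(R)"
        if has_next:
            new_array.append(file_array[i + 1])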
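One way to carry on from where the loop stops is to pair each name in new_array with the number on the following "Tel:" line. The sketch below assumes every record looks like the sample data (name, " - ", experience on one line, then "Tel: <digits>(M)" on a later line); to_records, name_re and tel_re are hypothetical names, and the residence number that the loop also collects is simply ignored.

```python
import re

# Hypothetical patterns based on the sample records:
#   "Jyoti Soni - 0 Year(s) 0 Month(s)" -> name is the text before " - "
#   "Tel: 09975610476(M)"               -> number is the digits after "Tel:"
name_re = re.compile(r"^(.*?)\s+-\s+\d+ Year\(s\)")
tel_re = re.compile(r"Tel:\s*(\d+)", re.IGNORECASE)


def to_records(lines):
    """Pair each candidate name with the first Tel: number that follows it."""
    records = []
    name = None
    for line in lines:
        m = name_re.search(line)
        if m:
            name = m.group(1).strip()
            continue
        m = tel_re.search(line)
        if m and name is not None:
            records.append((name, m.group(1)))
            name = None  # wait for the next candidate's name
    return records


# Usage with the list built by the loop above:
#   mod_array = to_records(new_array)
```

For the sample data this should give [("Jyoti Soni", "09975610476"), ("Minal", "9890498376")].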
        

Basically, I'm trying my hand at text manipulation with Python. I'm
well versed in shell scripting, sed & awk.

After this data is extracted, I would like to convert it to a CSV file
and then insert the data into a database, etc. I hope this gives a good
idea of what I'm trying to do.
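The csv and sqlite3 modules in Python's standard library cover both of those steps. A minimal sketch, assuming Python 3, a list of (name, phone) tuples as input, and SQLite as the database; save_records and the table name candidates are placeholders, not anything from the original post. Phone numbers are stored as text so a leading zero such as the one in 09975610476 survives.

```python
import csv
import sqlite3


def save_records(records, csv_path, db_path):
    """Write (name, phone) tuples to a CSV file and to a SQLite table."""
    # CSV: the csv docs recommend newline="" for files the writer uses
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["name", "phone"])
        writer.writerows(records)

    # SQLite: hypothetical table "candidates"; phone kept as TEXT
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS candidates (name TEXT, phone TEXT)"
        )
        # "?" placeholders let sqlite3 quote the values safely
        conn.executemany(
            "INSERT INTO candidates (name, phone) VALUES (?, ?)", records
        )
        conn.commit()
    finally:
        conn.close()
```

For example, save_records(records, "candidates.csv", "candidates.db") writes both files; the file names are only placeholders.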


regds,
arun.