[Tutor] python regex help
tomar.arun at gmail.com
Sun Sep 28 19:49:39 CEST 2008
On Sun, 2008-09-28 at 17:26 +0100, Alan Gauld wrote:
> "Arun Tomar" <tomar.arun at gmail.com> wrote
> > I've solved it with shell scripting, using sed & pipes,
> > but with Python I need to practice more ;).
> Can you show us some output as you'd like it?
> Can you show us the sed script that works?
Jyoti Soni - 0 Year(s) 0 Month(s)
C , C + + , Java , JSP , Oracle , S / W Testing
B.Sc Pt.Ravishanker University,Raipur
MCA Pt.Ravishanker University,Raipur
Currently in: Pune
CTC(p.a): Not Disclosed
Modified: 27 Sep 2007
Minal - 0 Year(s) 0 Month(s)
c , c + + , java , ASP . NET , VB , Oracle , Dimploma in Web Designing
B.Sc Shivaji University , Maharasthra
MCA Shivaji University , Maharashtra
Currently in: Pune
CTC(p.a): INR 0 Lac(s) 5 Thousand
Modified: 27 Jan 2006
Last Active: 06 Sep 2007
011 02162 250553(R)
A small shell script that works:
sed -ne '/Contact/,+1p' -e '/Tel/p' $1 |sed -e '/Contact Candidate/d'|
sed -e 's/\-//'|sed -e '/^$/d'|sed -e 's/ *$//'|sed -e 's/Tel://g' -e
's/(M)//g' -e 's/0 Year(s) 0 Month(s)//g' -e 's/(R)//g' -e '/> Similar
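For comparison, the substitutions visible in the sed pipeline above could be sketched in Python roughly as follows. This is only an approximation under stated assumptions: the tail of the sed script is cut off and is not reproduced, the names `clean_line` and `clean_lines` are made up for illustration, and the final `strip()` is an addition that sed does not perform. One thing to watch: in sed's basic regex syntax `(M)` is literal text, while in Python's `re` the parentheses must be escaped.

```python
import re

# Literal strings removed everywhere on a line, mirroring the visible
# sed 's/...//g' commands. Parentheses are escaped because in Python's
# re syntax an unescaped "(" opens a group.
PATTERNS = [
    re.compile(r"Tel:"),                        # s/Tel://g
    re.compile(r"\(M\)"),                       # s/(M)//g
    re.compile(r"0 Year\(s\) 0 Month\(s\)"),    # s/0 Year(s) 0 Month(s)//g
    re.compile(r"\(R\)"),                       # s/(R)//g
]


def clean_line(line):
    """Apply the visible sed substitutions to a single line."""
    line = line.rstrip("\n")
    line = line.replace("-", "", 1)    # s/\-//  (first hyphen only)
    for pattern in PATTERNS:
        line = pattern.sub("", line)
    # Assumption: strip leftover whitespace on both sides. The sed script
    # only removes trailing spaces, and does so before the substitutions.
    return line.strip()


def clean_lines(lines):
    """Clean every line and drop the ones that end up empty (sed '/^$/d')."""
    return [cleaned for cleaned in map(clean_line, lines) if cleaned]
```

Calling `clean_lines(fh.readlines())` on an open file would give a list of cleaned strings that can be processed further.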
> Also can you show us the Python code that doesn't work
> and what went wrong? It's easier to fix what's broken than
> to guess at what might do what you want :-)
Python code that works; after that I'm a bit lost ;)
import re

filename = "script.txt"
p1 = re.compile("Contact Candidate", re.IGNORECASE)
p2 = re.compile("Tel:", re.IGNORECASE)
# open the file
fh = open(filename, 'r')
# read the contents of the file into a list
file_array = fh.readlines()
# create empty lists
new_array = []
mod_array = []
for i in range(len(file_array)):
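One possible way to continue from here, offered as a rough sketch rather than the intended loop body: iterate over the lines directly and keep the line after each "Contact" match plus any "Tel:" line, which is approximately what the first sed command does. The function name `select_lines` and the extra pattern `p_contact` are hypothetical, and the input layout (a "Contact Candidate" line followed by the name line) is assumed from the sed script, not confirmed by the sample data.

```python
import re

p1 = re.compile("Contact Candidate", re.IGNORECASE)
p2 = re.compile("Tel:", re.IGNORECASE)
# Hypothetical extra pattern for the sed address /Contact/.
p_contact = re.compile("Contact", re.IGNORECASE)


def select_lines(lines):
    """Keep the line following a 'Contact' match and every 'Tel:' line.

    Lines containing 'Contact Candidate' are dropped, like the sed
    command '/Contact Candidate/d'.
    """
    selected = []
    keep_next = False
    for line in lines:
        line = line.rstrip("\n")
        is_contact = p_contact.search(line) is not None
        if keep_next or is_contact or p2.search(line):
            if not p1.search(line):
                selected.append(line)
        # Remember whether the following line should be kept as well.
        keep_next = is_contact
    return selected
```

Iterating with `for line in lines` avoids the index bookkeeping of `range(len(file_array))`; the `keep_next` flag plays the role of sed's `,+1` address range. Unlike sed, the patterns here ignore case, as in the code above.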
Basically I'm trying my hand at text manipulation with Python. I'm
well versed in shell scripting, sed & awk.
After this data is extracted I would like to convert it to a CSV file,
then insert the data into a database, etc. I hope
this gives a good idea of what I'm trying to do.
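As a minimal sketch of the CSV step, assuming the records look like the sample shown earlier (a "Name - N Year(s) M Month(s)" line followed by "Key: value" lines), the standard `re` and `csv` modules could be combined like this. The names `parse_records`, `to_csv` and `COLUMNS` are invented for illustration, and lines without a recognised key (skills, degrees, phone numbers) are simply skipped here.

```python
import csv
import io
import re

# Start of a record, e.g. "Minal - 0 Year(s) 0 Month(s)".
HEADER = re.compile(
    r"^(?P<name>.+?) - (?P<years>\d+) Year\(s\) (?P<months>\d+) Month\(s\)$"
)
# A "Key: value" line, e.g. "Currently in: Pune".
FIELD = re.compile(r"^(?P<key>[^:]+):\s*(?P<value>.*)$")

COLUMNS = ["name", "years", "months",
           "Currently in", "CTC(p.a)", "Modified", "Last Active"]


def parse_records(lines):
    """Group lines into one dict per candidate."""
    records = []
    current = None
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.groupdict()
            records.append(current)
            continue
        field = FIELD.match(line)
        if current is not None and field and field.group("key") in COLUMNS:
            current[field.group("key")] = field.group("value")
    return records


def to_csv(records):
    """Return the records as CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Writing to a real file instead would mean passing a file opened with `newline=""` to `csv.DictWriter`. The same list of dicts could later feed a database insert, for instance through the standard `sqlite3` module.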