[Tutor] lil help Booky.py - updated (Chris or Leslie Smith)

Alan ldapguru at yahoo.com
Sun Nov 27 11:54:55 CET 2005


Smile,
Thanks.

I am sorry, I had a bad cold. I am back now.
I totally agree with you. I am sorry, I was engrossed in the 150 lines
of code.

Now I am attempting to tackle this booky.py script, with all your help,
to clean the text.


Input Filename1.txt :

00001|H A quote this is a cool python coders (some of them know Java and
c++) lucky them thereby they united the universe
|F dollar sore book of the year |R nice

00007|H C qoute this is a cool Java group coders, they live San Jose at
sun car parking lot
all what they afraid of is the sun network (this is a dog outside
|S Title I love the Canyon
the door) in sun main building |F Santa Fe Flea Market book of the year
|R very nice


00005|H B qoute this is a cool COBOL group of coders, they are loved by
IBM
some of them are trying to learn visual COBOL
|F Good morning America book of the year
|R bad


Expected output Filename2.txt

|H00001 A quote this is a cool python coders lucky them thereby they
united the universe
|F00001 dollar sore book of the year
|R00001 nice (some of them know Java and c++)

|H00002 C qoute this is a cool Java group coders, they live San Jose at
sun car parking lot all what they afraid of is the sun network in sun
main building
|F00002 Santa Fe Flea Market book of the year
|R00002 very nice (this is a dog outside the door)

|H00003 B qoute this is a cool COBOL group of coders, they are loved by
IBM some of them are trying to learn visual COBOL
|F00003 Good morning America book of the year
|R00003 bad

#!c:\python42

# Script name: Booky
# Date: 11/05
# Author: Python's good fellows
#
# Requirements:
# read Filename1.txt and produce filename2.txt, and perhaps filename3.txt
# as a small test, having accomplished the following:
# 1. sorted by |H statement
# 2. |H |F |R renumbered
# 3. () relocated from |H to after |R
# 4. each statement joined onto only one line
# 5. disregard any "|" followed by anything other than H, F or R, such as
#    |S Title I love the Canyon
#
#
# Abstract:
# 1. consider removing all line feeds, returns, etc. from the whole file
# 2. split by | to generate one line per |
# 3. disregard (kill) any line having a | which is not followed by H, F or R
# 4. construct a regular expression to find any (words) between |H and the
#    next |, even if they are in a multi-line layout
# 5. replace the (words) with one space and relocate the (words) to the end
#    of |R, just before the next |H
# 6. sort each set (H F R) by its |H statement
# 7. renumber all sets (H F R)
# done
# match, search, findall or finditer

import re
import sys

filename1 = "filename1.txt"
filename2 = "filename2.txt"
filename3 = "filetest1.txt"


f1 = open(filename1, "r")
f2 = open(filename2, "w")
f3 = open(filename3, "w")  # small test file to check the output
# f3 = open(filename1, "r+")  # alternative: update the input file in place


# raw strings keep the backslashes intact for the regex engine
reg1 = re.compile(r'\(.*', re.IGNORECASE)
reg2 = re.compile(r'.*\)', re.IGNORECASE)

# or:
leftq = re.compile(r'\(\d+\s+\w+', re.IGNORECASE)
rightq = re.compile(r'\d+\s+\w+\)', re.IGNORECASE)


for searchstring in f1.readlines():

    # which one is best?
    # (print() with one %-formatted argument runs under Python 2 and 3)

    match1 = reg1.match(searchstring)    # only matches at the start of the line
    match2 = reg2.match(searchstring)

    search1 = reg1.search(searchstring)  # first match anywhere in the line
    search2 = reg2.search(searchstring)

    findall1 = reg1.findall(searchstring)  # list of strings, has no .group()
    findall2 = reg2.findall(searchstring)

    # finditer() returns an iterator that is always true, so build a list
    finditer1 = list(reg1.finditer(searchstring))
    finditer2 = list(reg2.finditer(searchstring))

    if match1:
        print("Match 1 leftq:\t%s" % match1.group())

    if match2:
        print("Match 2 rightq:\t%s" % match2.group())
    else:
        print("No match")

    if search1:
        print("Match 3 leftq:\t%s" % search1.group())
    else:
        print("No match")

    if search2:
        print("Match 4 rightq:\t%s" % search2.group())
    else:
        print("No match")

    if findall1:
        print("Match 5 leftq:\t%s" % findall1)
    else:
        print("No match")

    if findall2:
        print("Match 6 rightq:\t%s" % findall2)
    else:
        print("No match")

    if finditer1:
        for i in finditer1:
            print(i.group())
    else:
        print("no finditer match")

    if finditer2:
        for i in finditer2:
            print(i.group())
    else:
        print("no finditer match")
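
On the "which one is best" question in the loop above: the four calls answer different questions, so a small side-by-side run may help. This is only a sketch, assuming the goal is to find whole (...) groups. The pattern and the sample line are made up for illustration and are not part of booky.py.

```python
import re

# Illustrative pattern (an assumption): one complete, non-nested "(...)" group.
PAREN = re.compile(r"\([^)]*\)")
line = "H A quote (some of them know Java) lucky them (and more)"

# match() only tries at position 0, so it fails unless the line starts with "(".
print(PAREN.match(line))            # None

# search() scans forward and returns the first match object, or None.
print(PAREN.search(line).group())   # (some of them know Java)

# findall() returns plain strings, so there is no .group() to call.
print(PAREN.findall(line))          # ['(some of them know Java)', '(and more)']

# finditer() yields match objects, handy when positions are needed as well.
for m in PAREN.finditer(line):
    print("%d %s" % (m.start(), m.group()))
```

For pulling every parenthetical out of a header, findall() or finditer() looks like the closest fit; match() is rarely what is wanted here because the parenthesis is almost never the first character.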


# use sub to remove ()  from |H
# whole = (leftq + rightq)
# whole = sub("whole", ' ')


# Unite () and relocate

# Use Smile's multiline post to make one line of the whole

# for searchstring in f1.readlines():
#  line1 = reg1
#  line2 = reg2
#  multilines
# sub(multilines, (reg1 + reg2))
# use the test file to check the output
# f3.writelines(searchstrings) # small test file to check the output
# if the output is OK then we write to f2 or f1 r+w
# f2.writelines(searchstrings) # how do we write original file + changes

# in progress
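
For what it is worth, the steps in the Abstract could be strung together roughly as below. This is a minimal sketch and not a finished booky.py. It assumes that every record starts with its old number followed by |H, that an unwanted field such as |S ends at the end of its line (as it does in the sample input), and that parentheses are never nested. The name clean_records and the sort_by_header flag are invented for the example. Sorting is off by default because the expected output above keeps the records in input order even though the requirements mention sorting.

```python
import re

def clean_records(text, sort_by_header=False):
    """Return the renumbered |H, |F and |R lines for the raw input text."""
    # Drop any field whose tag is not H, F or R.
    # Assumption: such a field stops at the end of its line.
    text = re.sub(r"\|(?![HFR]\b)[^|\n]*", "", text)
    # Remove line feeds and runs of blanks: the whole file becomes one line.
    text = " ".join(text.split())

    records = []
    # Assumption: a record starts with its old number followed by |H.
    for chunk in re.split(r"\d+(?=\|H\b)", text):
        fields = dict(re.findall(r"\|([HFR])\s+([^|]*)", chunk))
        if "H" not in fields:
            continue  # whatever came before the first record
        h = fields["H"]
        # Pull every (...) out of |H and close the gap it leaves behind.
        parens = re.findall(r"\([^)]*\)", h)
        h = " ".join(re.sub(r"\([^)]*\)", " ", h).split())
        # Append the parentheticals to the end of |R.
        r = " ".join([fields.get("R", "").strip()] + parens).strip()
        records.append((h, fields.get("F", "").strip(), r))

    if sort_by_header:
        records.sort(key=lambda rec: rec[0])

    lines = []
    for number, (h, f, r) in enumerate(records, start=1):
        lines.append("|H%05d %s" % (number, h))
        lines.append("|F%05d %s" % (number, f))
        lines.append("|R%05d %s" % (number, r))
    return lines

# Made-up miniature of the input file, just to exercise the function.
sample = """00007|H C one (aside
|S skip me
here) two |F book |R fine
00005|H B three
|F other |R bad
"""
for out in clean_records(sample):
    print(out)
```

Note that the unwanted |S line is removed before the line feeds are, otherwise the tail of the parenthetical that follows it ("here) two" in the miniature) would be thrown away along with it.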


Message: 8
Date: Fri, 25 Nov 2005 21:53:28 -0600
From: "Chris or Leslie Smith" <smiles at worksmail.net>
Subject: Re: [Tutor] lil help please - updated (fwd)
To: <tutor at python.org>
Message-ID: <001001c5f23f$9e742380$e62c4fca at csmith>
Content-Type: text/plain;	charset="iso-8859-1"

| The logic is so good so far. However, How do we move the (...) in |H 
| to end of |R and before next |H

Maybe you are thinking too literally about the moving of the
parenthetical item from the |H to the end of the |R.  Let's say you have
the 3 chunks of information in variables h, f, and r:

###
>>> h='This (you see) is the header.'
>>> f='The Book'
>>> r='cool'
###

If you could break h apart into 3 pieces, the "This ", the "(you see)",
and " is the header." and call them h1, h2, and h3 then you could add h1
and h3 back together (that's your new value of h) and add h2 to r as
your new r:

###
>>> h1='This '
>>> h2='(you see)'
>>> h3=' is the header.'
>>> h=h1+h3
>>> r=r+'\n'+h2  #the \n puts the parenthetical on a new line
>>> print h
This  is the header.
>>> print r
cool
(you see)
###

(There's a double space left over at the place where h1 and h3 were
joined, which is something you might want to fix before you add them back
together. The string method .strip() is nice for getting rid of leading
and trailing space on a string. Here's an example:

###
>>> s='   space before and after is gone      '
>>> print s
   space before and after is gone      
>>> print s.strip()
space before and after is gone
###
)
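
The h1/h2/h3 split described above does not have to be done by hand. Here is a minimal sketch, assuming the parentheses in a header are never nested. The helper name move_parenthetical is made up for the example, and it joins with a space rather than '\n' because the expected output in the original post keeps each |R on one line. Since .strip() only trims the two ends of a string, the leftover double space in the middle is squeezed out with split() and join() instead.

```python
import re

# Hypothetical helper: pull every "(...)" out of h and append it to r.
PAREN = re.compile(r"\([^)]*\)")  # assumes parentheses are not nested

def move_parenthetical(h, r):
    found = PAREN.findall(h)
    # Swap each "(...)" for a space, then squeeze every run of whitespace
    # (including the leftover double space) down to a single space.
    h = " ".join(PAREN.sub(" ", h).split())
    r = " ".join([r.strip()] + found).strip()
    return h, r

h, r = move_parenthetical('This (you see) is the header.', 'cool')
print(h)  # This is the header.
print(r)  # cool (you see)
```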

What would help maintain the spirit of the tutor list is if you would
ask specific questions about problems that you are having getting the
script to do what you want, or for clarifications about how something is
supposed to work. Right now you have a task that is defined, but you are
asking general questions about designing the program, not specific
questions about the *python* related problems.

Rather than sending or posting the script you have, why not just post
the specific problems you are running into? 

Thanks,
/c

 
