[Tutor] lil help please - updated (fwd) (Chris or Leslie Smith)

Alan ldapguru at yahoo.com
Fri Nov 25 18:30:49 CET 2005


Smile and Kent

The logic is good so far. However, how do we move the (...) in |H to
the end of |R, before the next |H?

Much respect
AD

Exceptional team:
I like and agree with all of your logic (I have no choice! Smile, you
are more advanced than I am).

Kent said: 
I think I would split this into three phases:
- collect the data into groups of HFR
- process each group by rearranging, renumbering, reporting errors
- output the processed groups

One potential problem is to resynchronize to the next group when there
is a sequence error. If there is always a blank line between groups it
is easy. Otherwise maybe just assume an H is the start of a group.
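
A rough sketch of the first phase, treating each H as the start of a
group as suggested above. It assumes every record line begins with its
marker ("|H", "|F" or "|R") and that blank lines carry no data; the real
file layout was not posted, and the name split_into_groups is made up
for illustration.

```python
def split_into_groups(lines, start_marker="|H"):
    """Collect lines into groups; a line starting with start_marker opens a new group.

    Assumption: the marker sits at the very start of the line.
    """
    groups = []
    current = []
    for line in lines:
        # A new |H line closes whatever group was being collected.
        if line.startswith(start_marker) and current:
            groups.append(current)
            current = []
        # Blank lines are dropped; they are not needed to find group boundaries.
        if line.strip():
            current.append(line)
    if current:
        groups.append(current)
    return groups


sample = ["|H head one", "|F foot one", "|R rec one", "",
          "|H head two", "|R rec two"]
for group in split_into_groups(sample):
    print(group)
```

Because a new group opens on every |H, a group with a sequence error
only affects itself, and the next |H resynchronizes the reading.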

And Smile addressed Kent's concern by saying:

Hmm...so Alan could first split the data on the "|H" values. These
*should* contain an |F and an |R, so the next step would be to break
these HFR groups into pieces and check that all the pieces are
there and, if not, perhaps print those groups to an error file for review.
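
One way the "are all the pieces there" check might look. This is only a
sketch: it assumes each group is a list of lines and that each piece
starts with its marker. The names missing_markers and partition_groups
are invented here, and the incomplete groups are returned in a list
rather than written to an actual error file.

```python
def missing_markers(group, required=("|H", "|F", "|R")):
    """Return the required markers that no line in the group starts with."""
    return [marker for marker in required
            if not any(line.startswith(marker) for line in group)]


def partition_groups(groups):
    """Split groups into complete ones and (group, missing) pairs for review."""
    good, bad = [], []
    for group in groups:
        missing = missing_markers(group)
        if missing:
            bad.append((group, missing))
        else:
            good.append(group)
    return good, bad


groups = [["|H head one", "|F foot one", "|R rec one"],
          ["|H head two", "|R rec two"]]
good, bad = partition_groups(groups)
for group, missing in bad:
    print("incomplete group:", group, "missing:", missing)
```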


Alan, regarding the extraction of the parentheticals, what have you
tried? One suggestion for this aspect is to get rid of the line breaks
in the |H chunk and then you won't have the problem of a broken
parenthetical. For example,

######
>>> multiLines = '''This (as you
... can see) is multilined.'''
>>> multiLines.splitlines()
['This (as you', 'can see) is multilined.']
>>> ' '.join(multiLines.splitlines())
'This (as you can see) is multilined.'
>>> # the above is one line and much easier to handle now.
######
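
Once the |H chunk is a single line, moving the parentheticals to the end
of the |R chunk could be sketched roughly as below. It assumes the
parentheses are not nested and that the H and R chunks are available as
separate strings; move_parentheticals is a made-up name, not something
from Alan's script.

```python
import re

# Matches one non-nested parenthetical plus any whitespace just before it.
PAREN = re.compile(r"\s*\(([^()]*)\)")


def move_parentheticals(h_text, r_text):
    """Remove every (...) from h_text and append it to the end of r_text."""
    # Join broken lines first so a parenthetical split across lines is whole.
    h_joined = " ".join(h_text.splitlines())
    r_joined = " ".join(r_text.splitlines())
    found = ["(%s)" % inner for inner in PAREN.findall(h_joined)]
    h_clean = PAREN.sub("", h_joined)
    r_moved = " ".join([r_joined] + found)
    return h_clean, r_moved


h, r = move_parentheticals("|H Title (see\nnote 1) here", "|R body")
print(h)
print(r)
```

Since the parentheticals are appended to the |R text itself, they land
after the |R content and before whatever group comes next.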

>How are you reading the data in from the file?

I use a 150-line Python script. I do not mind emailing it to you directly
so that it does not confuse these cleaning tasks; just say yes.

Much respect
AD





