[Tutor] lil help please - updated (|H|F|R) (Chris or Leslie Smith) TASK1 one relocate/move

Fri Nov 25 08:38:17 CET 2005

Smiles

Thanks a thousand!

Let us do one task at a time: task 1.
Relocate/move any words in |H between (...) to the end of the |R rating line,
before the next |H line.

Sorry, the master Python script wants a single | before H, F and R everywhere;
however, some |H, |F and |R markers are wrongly not at the beginning of a line.
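A minimal sketch of how the misplaced markers could be put back at the start of a line, assuming the markers are always written as |H, |F or |R and that a "|" followed by H, F or R never appears inside the record text itself. The function name is hypothetical, not part of the master script:

```python
import re

def put_markers_on_own_lines(text):
    """Start a new line before any |H, |F or |R that is not at the start of a line.

    Hypothetical helper: assumes '|' followed by H, F or R never occurs
    inside the record text itself.
    """
    # (?<=\S) requires a non-whitespace character somewhere before the
    # marker (spaces/tabs allowed in between), so markers already at the
    # start of a line are left alone; the spaces before a misplaced marker
    # are replaced by the newline.
    return re.sub(r'(?<=\S)[ \t]*(\|[HFR])\b', r'\n\1', text)

sample = '|H 00100 "a friend in need"\n|F Old London book |R Cool'
print(put_markers_on_own_lines(sample))
```

Here the "|R Cool" that was stuck on the end of the |F line is moved onto a line of its own; text that is already correct passes through unchanged.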

BEFORE any process
|H 00100 "a friend in need is a friend indeed (xyz words) so select the
best 
 friend as soon as you can"
|F Old London book  
|R Cool

OR

|H 00100 "a friend in need is a friend indeed (xyz 
words) so select
 the best 
 friend as soon as you can"
|F Old London book  
|R Cool

RESULT AFTER task1 process
|H 00100 "a friend in need is a friend indeed so select the best 
 friend
 as soon as you can blah"
|F Old London book (xyz blah words)  <=== parenthetical here? 
|R Cool
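A rough sketch of task 1 for a single record. Note that the description says "end of |R rating" while the RESULT example puts the parenthetical on the |F line; this sketch follows the RESULT example, and assumes |H, |F and |R each start a line and there is at most one (...) group in the |H part. The function name is hypothetical:

```python
import re

def move_parenthetical(record):
    """Move the first (...) group in the |H part to the end of the |F line.

    Sketch only: assumes one record with |H, |F and |R each starting a
    line (ValueError is raised otherwise), and at most one parenthetical
    in the |H part.
    """
    h_start = record.index('|H')
    f_start = record.index('\n|F')
    r_start = record.index('\n|R')
    h_part = record[h_start:f_start]
    f_line = record[f_start + 1:r_start]
    rest = record[r_start:]

    # [^)]* also matches newlines, so a parenthetical that was wrapped
    # across lines, like "(xyz \nwords)", is still found. The leading \s*
    # removes the space before it so "indeed (xyz words) so" -> "indeed so".
    match = re.search(r'\s*(\([^)]*\))', h_part)
    if match is None:
        return record
    # Collapse any line breaks/extra spaces inside the parenthetical.
    paren = ' '.join(match.group(1).split())
    h_part = h_part[:match.start()] + h_part[match.end():]
    return record[:h_start] + h_part + '\n' + f_line.rstrip() + ' ' + paren + rest
```

Applied to either BEFORE example, this yields an |H text without "(xyz words)" and an |F line reading "|F Old London book (xyz words)".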

Hello,

This seems like a well laid out task. If you post what you are trying
and the problems you are encountering, that would be helpful.

One suggestion I have is that you switch problems 1 and 2. If the
ordering is broken (e.g. HHFR instead of HFRH), then knowing where to put
the parenthetical comment is going to be a problem. Also, you said that
you wanted it put after the "F" reference; did you mean that it should
look like this:

BEFORE any process
|H 00100 "a friend in need is a friend indeed (xyz blah words) so select
the best 
 friend
 as soon as you can blah"
|F Old London book  
|R Cool

AFTER your process
|H 00100 "a friend in need is a friend indeed so select the best 
 friend
 as soon as you can blah"
|F Old London book (xyz blah words)  <=== parenthetical here? 
|R Cool

It's a little hard to tell from what you've said, but it looks like the
"|" was an unnecessary addition. If your record markers were always a
single character at the beginning of a line, they would be easy enough to
find, provided there is never an H, F, or R standing as a single character
at the beginning of a line that is NOT a record marker.

######
>>> text='''H This is the start.
... F here is a reference. 
... Right here is a non-reference R but it's not a single character starting the line
... so it won't be matched; and the single one in the middle isn't at the start.
... R cool'''
>>> import re
>>> text = '\n' + text     # make the first one like all the others: preceded by a newline character
>>> re.findall(r'\n([HFR])\b', text)
['H', 'F', 'R']
>>> re.split(r'\n([HFR])\b', text)
['', 'H', ' This is the start.', 'F', " here is a reference. \nRight here is a non-reference R but it's not a single character starting the line\nso it won't be matched; and the single one in the middle isn't at the start.", 'R', ' cool']

######

That last list has all the groups with the identifier preceding the
corresponding data.
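That alternating list can be turned into (marker, data) pairs by slicing. The sketch below is a variation, not the session above: it uses re.MULTILINE so that "^" matches at the start of every line (so no newline has to be prepended), and it accepts an optional "|" before the marker as in the |H/|F/|R data from the question. The function name is hypothetical:

```python
import re

def split_records(text):
    """Return a list of (marker, data) pairs from the marked-up text.

    Sketch: assumes each marker is a single H, F or R at the start of a
    line, optionally preceded by '|'.
    """
    parts = re.split(r'^\|?([HFR])\b', text, flags=re.MULTILINE)
    # parts[0] is whatever precedes the first marker; after that the list
    # alternates marker, data, marker, data, ...
    return [(marker, data.strip()) for marker, data in zip(parts[1::2], parts[2::2])]

print(split_records('|H 00100 "x"\n|F Old London book\n|R Cool'))
```

Stripping each data piece drops the newline that precedes the next marker; line breaks inside a record (such as the wrapped |H text) are kept.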

Finally, I'm not sure how you are checking the correctness of the HFR
sequence, but the findall used above suggests a way to do it:

-do the findall
-join the results together
-replace 'HFR' with '.'
-if the whole string isn't dots, then there was a problem, and the number
of dots before the first non-dot tells you how many correct records there were.

######
>>> bad='''
... H
... F
... R
... R
... '''
>>> re.findall(r'\n([HFR])\b', bad)
['H', 'F', 'R', 'R']
>>> ''.join(_)            # the _ refers to the last output
'HFRR'
>>> _.replace('HFR', '.')
'.R'
>>> len(_),_.count('.')
(2, 1)

######

Notice that since not all the HFRs were complete, not all the characters
are periods (and so the count of periods is not the same as the length of
the string). In this case there was one correct record (thus one leading
dot) before the problem occurred.
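The same check can be wrapped in a small function. This is a sketch under the same assumption as above (single-letter markers at the start of a line), extended to allow an optional "|" before each marker; the function name and its return format are hypothetical:

```python
import re

def count_good_records(text):
    """Apply the 'replace HFR with a dot' check.

    Returns (number_of_complete_leading_records, all_ok).
    """
    markers = ''.join(re.findall(r'^\|?([HFR])\b', text, flags=re.MULTILINE))
    dotted = markers.replace('HFR', '.')
    # Dots before the first non-dot character are the correct records.
    good = len(dotted) - len(dotted.lstrip('.'))
    return good, dotted.count('.') == len(dotted)

print(count_good_records('\nH\nF\nR\nR\n'))
```

For the "bad" example this reports one complete record followed by a problem, matching the interactive session.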

/c
