finding repeated data sequences in a column
yadin
conra2004 at yahoo.com
Thu May 21 03:55:45 EDT 2009
On May 20, 6:53 pm, norseman <norse... at hughes.net> wrote:
> bearophileH... at lycos.com wrote:
> > yadin:
> >> How can I build up a program that tells me that this sequence
> >> 1000028706
> >> 1000028707
> >> 1000028708
> >> is repeated somewhere in the column, and how can i know where?
>
> > Can such patterns nest? That is, can you have a repeated pattern made
> > of an already seen pattern plus something else?
> > If you don't want a complex program, then you may need to specify the
> > problem better.
>
> > You may want something like LZ77 or related (LZ78, etc.):
> > http://en.wikipedia.org/wiki/LZ77
> > This may have a bug:
> > http://code.activestate.com/recipes/117226/
>
> > Bye,
> > bearophile
>
> ============================================
> index on column
> Ndx1 is set to index #1
> Ndx2 is set to index #2
> test Ndx1 against Ndx2
> if equal write line number and column content to a file
> (that's two things on one line: 15 1000028706
> 283 1000028706 )
> Ndx1 is set to Ndx2
> Ndx2 is set to index #next
> loop to test writing out each duplicate set
>
> Then use the outfile and index on line number
>
> In a similar manner, check whether the current line and the next line have
> sequential line numbers. If so, scan forward to match the column content of
> the lower line number, then check whether the first match's line number and
> the one after it are also sequential. Print them out if so.
>
> everything in outfile has 1 or more duplicates
>
> 4 aa |--
> 5 bb |-- | thus 4/5 match 100/101
> 6 cc | |
> . | |
> 100 aa | |--
> 101 bb |--
> 102 ddd
> 103 cc there is a duplicate but not a sequence
> 200 ff
>
> mark duplicate sequences as tested and proceed on through
> seq1 may have more than one other seq in file.
> the progress is from start to finish without looking back
> thus each step forward has fewer lines to test.
> marking already knowns eliminates redundant sequence testing.
>
> By subsetting on pass 1, the expensive testing is greatly reduced.
> If you know your subset data won't exceed memory then the "outfile"
> can be held in memory to speed things up considerably.
>
> Today is: 20090520
> no code
>
> Steve
>
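The outline quoted above can be approximated with a much simpler technique: slide a fixed-length window down the column and record where each run of values starts. This is only a minimal sketch, not the two-pass method described in the quote. The function name `repeated_runs`, the `length` parameter, and the sample column are made up for illustration, and it assumes the whole column fits in memory. It is written for Python 3.

```python
from collections import defaultdict

def repeated_runs(column, length):
    """Map each run of `length` consecutive values to its start indices,
    keeping only the runs that occur more than once."""
    starts = defaultdict(list)
    for i in range(len(column) - length + 1):
        # A tuple of the window's values is hashable, so it can be a dict key.
        starts[tuple(column[i:i + length])].append(i)
    return {run: where for run, where in starts.items() if len(where) > 1}

# Hypothetical sample data, loosely following the aa/bb/cc example in the quote.
column = ["aa", "bb", "cc", "xx", "aa", "bb", "ddd", "cc", "ff"]
print(repeated_runs(column, 2))   # {('aa', 'bb'): [0, 4]}
```

Here "cc" appears twice but never as part of a repeated run of two, so it is not reported. The indices are 0-based positions in the list, so add 1 if line numbers are wanted.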
This is the program I wrote, but it is not working properly.
I have a list of valves and another list of pressures. I am asked to find
which valves use every pressure in a given set of wanted (best) pressures.
For example, asked to find all the valves that use pressures 1 "and" 2
"and" 3, it returns A, A2, A35....
The correct answer is supposed to be A and A2.
If I were asked for pressures 56 and 78, the correct answer is supposed to
be valves G and G2.
Valves = ['A', 'A', 'A', 'G', 'G', 'G',
          'C', 'A2', 'A2', 'A2', 'F', 'G2', 'G2', 'G2', 'A35', 'A345', 'A4']  ## valve names
pressures = [1, 2, 3, 4235, 56, 78, 12, 1, 2, 3, 445, 45, 56, 78, 1, 23, 7]  ## valve pressures
result = []
bestpress = [1, 2, 3]  ## wanted base pressures
print bestpress, 'len bestpress is', len(bestpress)
print len(Valves)
print len(Valves)
for j in range(len(Valves)):
    #for i in range(len(bestpress)):
    #for j in range(len(Valves)):
    for i in range(len(bestpress)-2):
        if pressures[j] == bestpress[i] and bestpress[i+1] == pressures[j+1] and bestpress[i+2] == pressures[j+2]:
            result.append(Valves[j])
            #i = i+1
            #j = j+1
            # print i, j, bestpress[i]
print "common PSVs are", result
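One possible way to get the answers described above is to stop comparing neighbouring list positions and instead collect the set of pressures seen for each valve, then keep the valves whose set contains every wanted pressure. This is a minimal sketch, not a definitive fix. It assumes that `Valves[k]` and `pressures[k]` always belong together and that the order of the pressures does not matter. The function name `valves_using_all` is made up for illustration, and the code is written for Python 3 (note `print(...)`), unlike the program above.

```python
from collections import defaultdict

def valves_using_all(valves, pressures, wanted):
    """Return the valve names whose recorded pressures include every wanted one."""
    seen = defaultdict(set)
    for name, pressure in zip(valves, pressures):
        seen[name].add(pressure)
    wanted = set(wanted)
    # `wanted <= have` is the subset test: every wanted pressure is in `have`.
    # Dicts keep insertion order (Python 3.7+), so valves come out in the
    # order they first appear in the list.
    return [name for name, have in seen.items() if wanted <= have]

# Data copied from the program above.
valves = ['A', 'A', 'A', 'G', 'G', 'G',
          'C', 'A2', 'A2', 'A2', 'F', 'G2', 'G2', 'G2', 'A35', 'A345', 'A4']
pressures = [1, 2, 3, 4235, 56, 78, 12, 1, 2, 3, 445, 45, 56, 78, 1, 23, 7]

print(valves_using_all(valves, pressures, [1, 2, 3]))   # ['A', 'A2']
print(valves_using_all(valves, pressures, [56, 78]))    # ['G', 'G2']
```

Unlike the index-based loop, this does not depend on the length of the wanted list, so it also handles the two-pressure case (56 and 78), for which `range(len(bestpress)-2)` is empty and the original loop body never runs.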