finding repeated data sequences in a column
norseman at hughes.net
Fri May 22 01:27:07 CEST 2009
> On May 20, 6:53 pm, norseman <norse... at hughes.net> wrote:
>> bearophileH... at lycos.com wrote:
>>>> How can I build up a program that tells me that this sequence
>>>> is repeated somewhere in the column, and how can i know where?
>>> Can such patterns nest? That is, can you have a repeated pattern made
>>> of an already seen pattern plus something else?
>>> If you don't want a complex program, then you may need to specify the
>>> problem better.
>>> You may want something like LZ77 or related (LZ78, etc):
>>> This may have a bug:
>> index on column
>> Ndx1 is set to index #1
>> Ndx2 is set to index #2
>> test Ndx1 against Ndx2
>> if equal write line number and column content to a file
>> (that's two things on one line: 15 1000028706
>> 283 1000028706 )
>> Ndx1 is set to Ndx2
>> Ndx2 is set to index #next
>> loop to test writing out each duplicate set
>> Then use the outfile and index on line number
>> In similar manner, check if the current line's and the next line's line
>> numbers are sequential. If so, scan forward to match the column content of
>> the lower line number, and check whether the first matched column's line
>> number and the next are sequential. Print them out if so
>> everything in outfile has 1 or more duplicates
>> 4 aa |--
>> 5 bb |-- | thus 4/5 match 100/101
>> 6 cc | |
>> . | |
>> 100 aa | |--
>> 101 bb |--
>> 102 ddd
>> 103 cc there is a duplicate but not a sequence
>> 200 ff
>> mark duplicate sequences as tested and proceed on through
>> seq1 may have more than one other seq in file.
>> the progress is from start to finish without looking back
>> thus each step forward has fewer lines to test.
>> marking already knowns eliminates redundant sequence testing.
>> By subsetting on pass1 the expensive testing is greatly reduced.
>> If you know your subset data won't exceed memory then the "outfile"
>> can be held in memory to speed things up considerably.
>> Today is: 20090520
>> no code
>> Steve
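The two-pass outline quoted above could be sketched in Python roughly as follows. This is only one reading of it, not a tested tool: pass 1 indexes the column and keeps the values that occur more than once, pass 2 walks forward once and extends each duplicate pair into a run. The function name find_repeated_runs, the min_len parameter and the 0-based line numbers are my own assumptions, and everything is held in memory rather than in an outfile.

```python
from collections import defaultdict

def find_repeated_runs(column, min_len=2):
    """Return (first_start, second_start, length) for each repeated run.

    Hypothetical helper: line numbers are 0-based list positions.
    """
    # Pass 1: index the column and keep only values seen more than once.
    positions = defaultdict(list)
    for line_no, value in enumerate(column):
        positions[value].append(line_no)
    dupes = {v: p for v, p in positions.items() if len(p) > 1}

    # Pass 2: go from start to finish without looking back.
    runs = []
    tested = set()  # (i, j) pairs already covered by an earlier run
    for i, value in enumerate(column):
        for j in dupes.get(value, ()):
            if j <= i or (i, j) in tested:
                continue
            length = 0
            # Extend while both sides agree and the runs do not overlap.
            while (j + length < len(column) and i + length < j
                   and column[i + length] == column[j + length]):
                tested.add((i + length, j + length))
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs

column = ['x', 'aa', 'bb', 'cc', 'y', 'aa', 'bb', 'ddd', 'cc', 'ff']
print(find_repeated_runs(column))
```

Here 'cc' is a duplicate but not part of a sequence, so only the aa/bb run is reported. Marking pairs in tested is what keeps the inner part of a run from being tested again.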
> this is the program...I wrote but is not working
> I have a list of valves, and another of pressures;
> If I am asked to find out which valves are using all of
> this set of pressures (the wanted best pressures),
> this is the program I wrote, but it is not working properly. It is supposed
> to return, in this case,
> all the valves that are using pressures 1 "and" 2 "and" 3.
> It returns me A, A2, A35....
looking at the data that seems correct.
there are 3 '1's in the list, 1-A, 1-A2, 1-A35
there are 2 '2's in the list, 2-A, 2-A2
there are 2 '3's in the list, 3-A, 3-A2
and so on
after the two sets are paired
indexing on the right yields 1-A,2-A,3-A,1-A2,2-A2,3-A2,7-A4...
indexing on the left yields 1-A,1-A2,1-A35,2-A,2-A2,3-A,3-A2,7-A4...
and the two 78s would pair with a G and with a G2 (78-G, 78-G2)
beyond that I'm a bit lost.
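For reference, the pairing and the two orderings described above can be reproduced with zip and sorted. This is just a sketch using the two lists from the post. The names pairs, by_pressure and by_valve are mine, and valve names sort as plain strings here, so 'A345' lands before 'A35'.

```python
valves = ['A', 'A', 'A', 'G', 'G', 'G',
          'C', 'A2', 'A2', 'A2', 'F', 'G2', 'G2', 'G2', 'A35', 'A345', 'A4']
pressures = [1, 2, 3, 4235, 56, 78, 12, 1, 2, 3, 445, 45, 56, 78, 1, 23, 7]

# Pair each pressure with the valve at the same position.
pairs = list(zip(pressures, valves))

# "Indexing on the left": order by pressure, then by valve name.
by_pressure = sorted(pairs)

# "Indexing on the right": order by valve name, then by pressure.
by_valve = sorted(pairs, key=lambda p: (p[1], p[0]))

print(by_pressure[:7])
print(by_valve[:6])
```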
> The correct answer is supposed to be A and A2...
> if I were asked for pressures 56 and 78 the correct answer is supposed to
> be valves G and G2...
> Valves = ['A','A','A','G', 'G', 'G',
>           'C','A2','A2','A2','F','G2','G2','G2','A35','A345','A4']  ## valve names
> pressures = [1,2,3,4235,56,78,12, 1, 2, 3, 445, 45,56,78,1, 23,7]  ## valve pressures
> result = []
> bestpress = [1,2,3]  ## wanted base pressures
> print bestpress,'len bestpress is' , len(bestpress)
> print len(Valves)
> for j in range(len(Valves)):
>     #for i in range(len(bestpress)):
>     #for j in range(len(Valves)):
>     for i in range(len(bestpress)-2):
>         if (pressures[j] == bestpress[i] and bestpress[i+1] == pressures[j+1]
>                 and bestpress[i+2] == pressures[j+2]):
>             #i = i+1
>             #j = j+1
>             # print i, j, bestpress[i]
>             result.append(Valves[j])  # assumed body: the original statement here is missing
> print "common PSVs are", result
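If the goal is "every valve that has all of the wanted pressures, in any order", one way to get there is to collect a set of pressures per valve and test for a subset. This is a minimal sketch under that assumption, not necessarily what the original program was meant to do. The function name valves_with_all is made up for illustration.

```python
from collections import defaultdict

def valves_with_all(valves, pressures, wanted):
    """Return the valves whose pressures include every value in wanted."""
    # Collect the set of pressures seen for each valve name.
    seen = defaultdict(set)
    for valve, pressure in zip(valves, pressures):
        seen[valve].add(pressure)

    wanted = set(wanted)
    result = []
    for valve in valves:  # keep the order of first appearance
        if valve not in result and wanted <= seen[valve]:
            result.append(valve)
    return result

valves = ['A', 'A', 'A', 'G', 'G', 'G',
          'C', 'A2', 'A2', 'A2', 'F', 'G2', 'G2', 'G2', 'A35', 'A345', 'A4']
pressures = [1, 2, 3, 4235, 56, 78, 12, 1, 2, 3, 445, 45, 56, 78, 1, 23, 7]

print(valves_with_all(valves, pressures, [1, 2, 3]))
print(valves_with_all(valves, pressures, [56, 78]))
```

With the data from the post this gives A and A2 for pressures 1, 2, 3 and G and G2 for pressures 56, 78. A35 drops out because it only has pressure 1.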