Two dimensional regexp matching?

Benjamin Goldberg goldbb2 at earthlink.net
Sun Jul 28 01:07:00 EDT 2002


Paddy wrote:
> 
> We already have the re module for regular expression matching on a string.
> 
> I am looking for pointers to references/algorithms for regular
> expression matching for files of tabular data, i.e.
> 
>      Table definition
>      ================
>      1) Samples from one point in the system
>         appears in a column of the table.
>      2) Samples encoded as characters
>      3) All points in the system are sampled at the
>         same time to produce successive rows of the table
> 
> So a system sampled at two points successively may produce the
> following file:
> 
>      GH
>      DF
>      AS
>      QW
>      FF
>      SD
> 
> I want to be able to do regular expression type searches within the
> file. Things like
>   Where can I find
>       point1 == (D or G) then point2 == W within three samples 
>   and
>       where the next sample of point2 != the earlier sample of point1?

Hmm...
To match just the first part, /[DG].\n(?:..\n){0,2}.W\n/ might do it.
And with that other requirement: /([DG]).\n(?:..\n){0,2}.W\n.(?!\1)./
(The negative lookahead is spelled (?!...), with no "=" after the "!".)
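
In Python's re module, the two patterns above might be used roughly as
follows. This is only a sketch: the table is the six-row sample from the
question, the helper name find_row_ranges is made up for illustration, and
it assumes every row is exactly two characters plus a newline, so that a
match offset divides evenly into a row number.

```python
import re

# The sample table from the question: two points, six samples.
DATA = "GH\nDF\nAS\nQW\nFF\nSD\n"
ROW_LEN = 3  # two sample characters plus the newline (an assumption)

# point1 is D or G, then point2 is W within three samples.
FIRST = re.compile(r"[DG].\n(?:..\n){0,2}.W\n")

# Same, plus: the next sample of point2 differs from the captured point1.
BOTH = re.compile(r"([DG]).\n(?:..\n){0,2}.W\n.(?!\1).")


def find_row_ranges(pattern, text, row_len=ROW_LEN):
    """Return a 0-based (first_row, last_row) pair for each match."""
    ranges = []
    for m in pattern.finditer(text):
        first = m.start() // row_len
        last = (m.end() - 1) // row_len
        ranges.append((first, last))
    return ranges
```

Note that finditer reports non-overlapping matches only, so a second
candidate starting inside an earlier match (the D in the second row, here)
is not listed separately.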

> That was a small example; in reality there are usually hundreds of
> points and tens of thousands of samples in multi-megabyte files but

The problem isn't in the size of the data (or at least, not *just* in
the size of the data), but in the complexity of the regular expressions.
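
With hundreds of columns, writing out the dots by hand gets unwieldy, so
one option is to generate the pattern from a description of the query.
The sketch below is an assumption about how that could look, not an
established technique: row_pattern and then_within are hypothetical
helper names, and it assumes fixed-width rows of single-character
samples, each ending in a newline. It only makes the patterns easier to
write; it says nothing about how fast they run.

```python
import re


def row_pattern(width, cells=None):
    """Build a regex for one table row of `width` columns.

    `cells` maps a 0-based column index to a regex fragment that matches
    exactly one character; unconstrained columns are skipped with `.{n}`.
    """
    cells = cells or {}
    parts = []
    gap = 0  # number of consecutive unconstrained columns seen so far
    for col in range(width):
        if col in cells:
            if gap:
                parts.append(".{%d}" % gap)
                gap = 0
            parts.append(cells[col])
        else:
            gap += 1
    if gap:
        parts.append(".{%d}" % gap)
    return "".join(parts) + r"\n"


def then_within(width, first, second, max_rows):
    """A `first` row, then a `second` row at most `max_rows` rows later."""
    any_row = row_pattern(width)
    return "%s(?:%s){0,%d}%s" % (
        row_pattern(width, first), any_row, max_rows - 1,
        row_pattern(width, second))


# The two-column query from the question, rebuilt from its parts.
PATTERN = then_within(2, {0: "[DG]"}, {1: "W"}, 3)
```

For a four-column table, then_within(4, {1: "[DG]"}, {2: "W"}, 3) would
ask for D or G in the second column followed by W in the third column
within three rows.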

> I'd first like to see if anyone else has considered this kind of 'two
> dimensional regexp matching'
> 
> Note: I DO NOT have queries in the date on sample points. The queries
> will always be "Find the range of sample times in which 'this'
> occurs".
> 
> I have tried Google but without success - I don't know enough to think
> of a suitable search phrase, or, (much less likely), Google doesn't
> have it ;-)
> 
> Thanks in advance, Paddy.

-- 
tr/`4/ /d, print "@{[map --$| ? ucfirst lc : lc, split]},\n" for
pack 'u', pack 'H*', 'ab5cf4021bafd28972030972b00a218eb9720000';


