Regex on a huge text

Paddy paddy3118 at googlemail.com
Sun Aug 24 13:47:54 CEST 2008


On Aug 22, 9:19 pm, "Medardo Rodriguez" <med.... at gmail.com> wrote:
> On Fri, Aug 22, 2008 at 11:24 AM, Dan <redalas... at gmail.com> wrote:
> > I'm looking for a way to apply a regex to a pretty huge input text (a file
> > of a couple of gigabytes). I found finditer, which returns results
> > iteratively, which is good, but it looks like I still need to pass it a
> > string that would be bigger than my RAM. Is there a way to apply a regex
> > directly to a file?
>
> > Any help would be appreciated.
>
> You can call the POSIX *grep* utility.
> But if the regex can only match within a single line of that file:
> #<code>
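> res = []
> with open(filename) as f:  # open() also works where file() is unavailable
>     for line in f:  # reads one line at a time, so memory use stays small
>         res.extend(getmatches(regex, line))
> #  Of course "getmatches" describes the concept;
> #  re.findall(regex, line) is one way to fill it in.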
> #</code>
>
> Regards

Try to pre-filter your file line by line to cut it down, then
apply a further filter to the result.

For example, if you were looking for consecutive SPAM records with the
same Name field, you might first extract only the SPAM records from
the gigabytes, leaving something more manageable in which to search
for consecutive Name fields.
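
A rough sketch of that two-pass idea, under assumptions of mine: each
record sits on one line, SPAM records start with "SPAM ", and the name
appears as "Name=<value>". The layout and the function names are
hypothetical, so adjust them to your real data.

```python
import re
from itertools import groupby

# Hypothetical record layout: "<TYPE> Name=<name> ..." with one record per line.
NAME_RE = re.compile(r"\bName=(\S+)")


def spam_lines(lines):
    """First pass: keep only the SPAM records, one line at a time."""
    for line in lines:
        if line.startswith("SPAM "):
            yield line


def consecutive_same_name(lines, min_run=2):
    """Second pass: yield (name, count) for runs of SPAM records sharing a Name."""
    def name_of(line):
        m = NAME_RE.search(line)
        return m.group(1) if m else None

    # groupby only groups adjacent items, which is what "consecutive" needs here.
    for name, run in groupby(spam_lines(lines), key=name_of):
        count = sum(1 for _ in run)
        if name is not None and count >= min_run:
            yield name, count
```

Both passes are generators, so passing an open file object as "lines"
should keep only about one line in memory at a time.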

- Paddy.


