[Tutor] how to match regular expression from right to left
Kent Johnson
kent37 at tds.net
Sun Sep 16 16:50:44 CEST 2007
王瘢雹超 wrote:
> The number of (us::.*?) items varies.
>
> When I use re.findall with (us::*?), only a few 'us::' are extracted.
I don't understand what is going wrong now. Please show us the code and the
data, and tell us what you get and what you want to get.
Here is an example:
Without a group you get the whole match:
In [3]: import re
In [4]: line = """38166 us::Video_Cat::Other; us::Video_Cat::Today Show;
us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf;
1003::ms://bc.wd.net/a275/video/tdy_is_.fl;"""
In [5]: re.findall('us::.*?;', line)
Out[5]: ['us::Video_Cat::Other;', 'us::Video_Cat::Today Show;',
'us::VC_Supplier::bc;']
With a group you get just the group:
In [6]: re.findall('(us::.*?);', line)
Out[6]: ['us::Video_Cat::Other', 'us::Video_Cat::Today Show',
'us::VC_Supplier::bc']
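If you need both the whole match and the captured text, re.finditer() returns match objects instead of strings, so group(0) and group(1) are both available. This is a minimal sketch, not part of the original reply. It writes the sample record as a single line, on the assumption that the line breaks in the email come from wrapping.

```python
import re

# Sample record from the thread, written as one line (assumption: the
# breaks in the email are line wrapping, not part of the data).
line = ("38166 us::Video_Cat::Other; us::Video_Cat::Today Show; "
        "us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf; "
        "1003::ms://bc.wd.net/a275/video/tdy_is_.fl;")

# finditer() yields match objects, so each hit exposes both the whole
# match (group 0, including the trailing ';') and the captured group 1.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r'(us::.*?);', line)]

for whole, item in pairs:
    print(repr(whole), '->', repr(item))
```

The group-1 values are exactly what re.findall() returns for the grouped pattern.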
Kent
>
> Daniel
>
> On 9/16/07, Kent Johnson <kent37 at tds.net> wrote:
>
> 王瘢雹超 wrote:
> > Yes, but what I mean is: if I have a line like this:
> >
> > line = """38166 us::Video_Cat::Other; us::Video_Cat::Today Show;
> > us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf;
> > 1003::ms://bc.wd.net/a275/video/tdy_is_.fl;"""
> >
> > I want to get the part "us::Video_Cat::Other; us::Video_Cat::Today
> > Show; us::VC_Supplier::bc;"
> >
> > but re.compile(r"(us::.*) .*(1002|1003).*$") will get the
> > "1002::ms://bc.wd.net/a275/video/tdy_is.asf;" included in lazy
> > mode.
>
> Of course, you have asked for all the text up to the end of the string.
>
> Not sure what you mean by lazy mode...
>
> If there will always be three items, you could just repeat the relevant
> section of the regex, something like
>
> r'(us::.*?); (us::.*?); (us::.*?);'
>
> or even
>
> r'(us::Video_Cat::.*?); (us::Video_Cat::.*?); (us::VC_Supplier::.*?);'
>
> If the number of items varies, use re.findall() with (us::.*?);
>
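> The non-greedy match is not strictly needed in the first two groups of the
> fixed patterns, but it is needed in the last group (a greedy .* there runs
> on to the final ';' of the line) and in the findall() version.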
>
> Kent
>
>
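The behaviour of the pattern quoted above can be checked directly. This sketch (again assuming the record is a single line) runs that pattern with a greedy and with a lazy first group: the greedy group runs past the us:: items up to the last space, and the lazy one stops at the first space. The last pattern is not from the thread; it is one possible alternative that makes the lazy group stop at the ';' followed by " 1002" or " 1003".

```python
import re

# Sample record, written as one line (assumption: email wrapping only).
line = ("38166 us::Video_Cat::Other; us::Video_Cat::Today Show; "
        "us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf; "
        "1003::ms://bc.wd.net/a275/video/tdy_is_.fl;")

# Original pattern: the greedy group grabs as much as it can and backs off
# only as far as needed for " .*(1002|1003).*$" to still match.
greedy = re.search(r"(us::.*) .*(1002|1003).*$", line)

# Same pattern with a lazy group: it stops at the first space after which
# the rest of the pattern can match.
lazy = re.search(r"(us::.*?) .*(1002|1003).*$", line)

# Alternative (not from the thread): lazily match up to the first ';'
# that is followed by " 1002" or " 1003".
anchored = re.search(r"(us::.*?;) 100[23]", line)

print(greedy.group(1))
print(lazy.group(1))
print(anchored.group(1))
```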
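The fixed-count patterns from the quoted reply can be tried the same way. In this sketch (single-line record assumed), both lazy patterns return the three items. With a greedy .* the first two groups still come out right, because each must be followed by "; us::", but the last group runs on to the final ';' of the line.

```python
import re

# Sample record, written as one line (assumption: email wrapping only).
line = ("38166 us::Video_Cat::Other; us::Video_Cat::Today Show; "
        "us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf; "
        "1003::ms://bc.wd.net/a275/video/tdy_is_.fl;")

# The two lazy patterns suggested in the reply.
generic = re.search(r'(us::.*?); (us::.*?); (us::.*?);', line)
specific = re.search(
    r'(us::Video_Cat::.*?); (us::Video_Cat::.*?); (us::VC_Supplier::.*?);',
    line)

# Greedy variant, for comparison only: its third group over-captures.
greedy = re.search(r'(us::.*); (us::.*); (us::.*);', line)

print(generic.groups())
print(specific.groups())
print(greedy.groups())
```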
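For a varying number of items, re.findall() with a lazy group returns however many items are present, while a greedy .* collapses the whole run into a single match. In this sketch the second record, short_line, is hypothetical and was made up only to show a line with fewer items.

```python
import re

# Sample record, written as one line (assumption: email wrapping only).
line = ("38166 us::Video_Cat::Other; us::Video_Cat::Today Show; "
        "us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf; "
        "1003::ms://bc.wd.net/a275/video/tdy_is_.fl;")

# Hypothetical record with only two us:: items (invented for illustration).
short_line = "12345 us::Video_Cat::Other; us::VC_Supplier::bc; 1002::ms://example/a.asf;"

lazy_pattern = r'(us::.*?);'   # stops at the first ';' after each us::
greedy_pattern = r'(us::.*);'  # runs on to the last ';' in the line

print(re.findall(lazy_pattern, line))        # three items
print(re.findall(lazy_pattern, short_line))  # two items
print(re.findall(greedy_pattern, line))      # one long string
```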