[Tutor] Question on re.findall usage

Mon Jan 28 20:43:56 CET 2013

Please post in plain text (not HTML); otherwise the code gets
mangled. When I paste your code into a terminal, this is what
happens:

>>> junk_list = 'tmsh list net interface 1.3 media-ca \rpabilities\r\nnet interface 1.3 {\r\n    media-capabilities {\r\n        none\r\n        auto\r\n     40000SR4-FD\r\n  10T-HD\r\n        100TX-FD\r\n        100TX-HD\r\n        1000T-FD\r\n        40000LR4-FD\r\n     1000T-HD\r\n    }\r\n}\r\n'
>>> junk_list
'tmsh list net interface 1.3 media-ca \rpabilities\r\nnet interface
1.3 {\r\n\xc2\xa0\xc2\xa0\xc2\xa0 media-capabilities
{\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0
none\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0
auto\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 40000SR4-FD\r\n\xc2\xa0
10T-HD\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0
100TX-FD\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0
100TX-HD\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0
1000T-FD\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0
40000LR4-FD\r\n \xc2\xa0\xc2\xa0\xc2\xa0
1000T-HD\r\n\xc2\xa0\xc2\xa0\xc2\xa0 }\r\n}\r\n'

Those \xc2\xa0 characters are non-breaking space characters. The
trouble is that I don't know if they were added by your email client
or are actually part of your junk string. I've assumed the former and
replaced them with spaces in the code I show below.
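
For reference, here is roughly how that replacement can be done. This is
only a sketch and assumes Python 3, where the non-breaking space is the
single character '\xa0' in a str and the two bytes '\xc2\xa0' in UTF-8
encoded bytes. The names pasted, cleaned, raw and cleaned_bytes are made
up for illustration.

```python
# Assumption: Python 3. In a str the non-breaking space is '\xa0'.
pasted = 'net interface 1.3 {\r\n\xa0\xa0\xa0 media-capabilities {\r\n'
cleaned = pasted.replace('\xa0', ' ')

# In UTF-8 encoded bytes (as in the repr shown above) it is b'\xc2\xa0'.
raw = b'{\r\n\xc2\xa0\xc2\xa0\xc2\xa0 media-capabilities'
cleaned_bytes = raw.replace(b'\xc2\xa0', b' ')

print(repr(cleaned))
print(repr(cleaned_bytes))
```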

On 28 January 2013 19:15, Dave Wilder <D.Wilder at f5.com> wrote:
> Hello,
>
> I am trying to use re.findall to parse the string below and then create a
> list from the results.
> junk_list = 'tmsh list net interface 1.3 media-ca \rpabilities\r\nnet
> interface 1.3 {\r\n    media-capabilities {\r\n        none\r\n
> auto\r\n     40000SR4-FD\r\n  10T-HD\r\n        100TX-FD\r\n
> 100TX-HD\r\n        1000T-FD\r\n        40000LR4-FD\r\n     1000T-HD\r\n
> }\r\n}\r\n'
>
> What I am doing now is obviously quite ugly, but I have not yet been able
> to rework it so that it does what I want in a more efficient and modular
> way.
> I did some research on re.findall but am still confused as to how to do
> character repetition searches, which I guess is what I need to do here.
> >>> junk_list = re.findall(r'(auto|[1|4]0+[A-Z]-[HF]D|[1|4]0+[A-Z][A-Z]-[HF]D|[1|4]0+[A-Z][A-Z][0-9])', junk_list)
> >>> junk_list
> ['auto', '40000SR4', '10T-HD', '100TX-FD', '100TX-HD', '40000LR4',
> '1000T-FD', '1000T-HD']

This output doesn't match what I would expect from the string above.
In the string, '1000T-FD' comes before '40000LR4-FD', so why does it
appear after '40000LR4' in your list? Is that the problem with the
code you posted?

>>>>
>
> Basically, all I need to search on is:
>
> auto
> anything that starts w/ ‘1’ or ‘4’ and then any number of subsequent zeroes
> e.g. 10T-HD, 40000LR4-FD, 100TX-FD

Does "any number" mean "one or more" or "zero or more"?
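
The difference matters for the regex quantifier you pick: + means "one
or more" and * means "zero or more". Here is a small sketch of what I
mean. The sample string is invented (in particular '1T-HD' is a
hypothetical name that has no zero after the leading digit), and the
patterns are simplified versions for illustration, not your full regex.

```python
import re

# Invented sample; '1T-HD' is hypothetical and has no zero after the '1'.
sample = '1T-HD 10T-HD 100TX-FD'

# 0+ requires at least one zero after the leading 1 or 4.
one_or_more = re.findall(r'[14]0+[A-Z]+-[HF]D', sample)

# 0* also accepts names with no zero at all after the leading digit.
zero_or_more = re.findall(r'[14]0*[A-Z]+-[HF]D', sample)

print(one_or_more)   # ['10T-HD', '100TX-FD']
print(zero_or_more)  # ['1T-HD', '10T-HD', '100TX-FD']
```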

Some people like to use regexes for everything. I prefer to try string
methods first as I find them easier to understand. Here's my attempt:

>>> junk_list = 'tmsh list net interface 1.3 media-ca \rpabilities\r\nnet interface 1.3 {\r\n    media-capabilities {\r\n        none\r\n        auto\r\n     40000SR4-FD\r\n  10T-HD\r\n        100TX-FD\r\n        100TX-HD\r\n        1000T-FD\r\n        40000LR4-FD\r\n     1000T-HD\r\n    }\r\n}\r\n'
>>> junk_list = [s.strip() for s in junk_list.splitlines()]
>>> junk_list = [s for s in junk_list if s == 'auto' or s[:2] in ('10', '40')]
>>> junk_list
['auto', '40000SR4-FD', '10T-HD', '100TX-FD', '100TX-HD', '1000T-FD',
'40000LR4-FD', '1000T-HD']

Does that do what you want?
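
If you would rather stay with re.findall, something along these lines
might work. It is only a sketch: the pattern is my guess at what you
mean, assuming "one or more" zeroes and that every name other than
'auto' ends in -HD or -FD. The names junk, pattern and matches are mine.
Note that inside a character class | is a literal character, so [1|4]
also matches a pipe; [14] is enough.

```python
import re

junk = ('tmsh list net interface 1.3 media-ca \rpabilities\r\n'
        'net interface 1.3 {\r\n    media-capabilities {\r\n'
        '        none\r\n        auto\r\n     40000SR4-FD\r\n  10T-HD\r\n'
        '        100TX-FD\r\n        100TX-HD\r\n        1000T-FD\r\n'
        '        40000LR4-FD\r\n     1000T-HD\r\n    }\r\n}\r\n')

# 'auto', or: a 1 or 4, one or more zeroes, any run of capital letters
# and digits, then -HD or -FD. (?:...) groups without capturing, so
# findall returns the whole match rather than just a group.
pattern = r'\b(?:auto|[14]0+[A-Z0-9]*-[HF]D)\b'

matches = re.findall(pattern, junk)
print(matches)
```

Unlike the string-method version, this does not depend on each name
being on its own line, but it is stricter about what a name looks like.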

Oscar