Regular Expression

MRAB python at mrabarnett.plus.com
Mon Apr 13 03:03:30 CEST 2015


On 2015-04-13 01:25, Pippo wrote:
> On Sunday, 12 April 2015 20:06:08 UTC-4, MRAB  wrote:
>> On 2015-04-13 00:47, Pippo wrote:
>> > On Sunday, 12 April 2015 19:44:05 UTC-4, Pippo  wrote:
>> >> On Sunday, 12 April 2015 19:28:44 UTC-4, MRAB  wrote:
>> >> > On 2015-04-12 23:49, Pippo wrote:
>> >> > > I have a text as follows:
>> >> > >
>> >> > > "#D{#C[Health] #P[Information] -
>> >> > > means any information, including #ST[genetic information],
>> >> > > whether #C[oral | (recorded in (any form | medium))], that
>> >> > > (1)#C[Is created or received by] a
>> >> > > #A[health care provider | health plan | public health authority | employer | life insurer | school | university | or health care clearinghouse];
>> >> > > (2)#C[Relates to] #C[the past, present, or future physical | mental health | condition of an individual] |
>> >> > > #C[the provision of health care to an individual] |
>> >> > > #C[the past, present, or future payment for the provision of health care to an individual].}"
>> >> > >
>> >> > > I want to get all elements that start with #C and are []  and put it in an array. For example #C[Health], I try with regex but it doesn't work:
>> >> > >
>> >> > "... it doesn't work"? In what way doesn't it work?
>> >> >
>> >> > > import re
>> >> > > import tkinter.filedialog
>> >> > > import readfile
>> >> > >
>> >> > >
>> >> > >
>> >> > > j = 0
>> >> > >
>> >> > > text = [ ]
>> >> > >
>> >> > >
>> >> > > content = readfile.pattread()
>> >> > >
>> >> > > while j < len(content):
>> >> > >
>> >> > There's a syntax error here:
>> >> >
>> >> > >      constraint = re.compile(r'(#C\[\w*\]'))
>> >> > >      result = constraint.search(content[j],re.MULTILINE)
>> >> > >      text.append(result)
>> >> > >      print(text)
>> >> > >      j = j+1
>> >> > >
>> >>
>> >> result is empty! Although it should have a content.
>> >>
>> >> What is the syntax error?
>> >
>> > I fixed the syntax error but the result shows:
>> >
>> >>>>
>> > [None]
>> > [None, None]
>> > [None, None, None]
>> > [None, None, None, None]
>> > [None, None, None, None, None]
>> > [None, None, None, None, None, None]
>> > [None, None, None, None, None, None, None]
>> > [None, None, None, None, None, None, None, None]
>> >>>>
>> >
>> >
>> > No error but if I don't call the content I posted up and call this as a content: #content = "#C[Health] #P[Information]"
>> >
>> > result gives me #C[Health]
>> >
>> What does 'readfile.pattread()' return? Does it return a list of
>> strings? I'm guessing it does.
>
> yes it reads a file of string similar to the one I posted above
>
>>
>> Try printing each string you're trying to match using 'repr', i.e.:
>>
>>      print(repr(content[j]))
>>
>> Do any look like they should match?
>
>   print(repr(content[j])) gives me the following:
>
> [None]
> '#D{#C[Health] #P[Information] - \n'
> [None, None]
> 'means any information, including #ST[genetic information], \n'
> [None, None, None]
> 'whether #C[oral | (recorded in (any form | medium))], that \n'
> [None, None, None, None]
> '(1)#C[Is created or received by] a \n'
> [None, None, None, None, None]
> '#A[health care provider | health plan | public health authority | employer | life insurer | school | university | or health care clearinghouse];  \n'
> [None, None, None, None, None, None]
> '(2)#C[Relates to] #C[the past, present, or future physical | mental health | condition of an individual] | \n'
> [None, None, None, None, None, None, None]
> '#C[the provision of health care to an individual] | \n'
> [None, None, None, None, None, None, None, None]
> '#C[the past, present, or future payment for the provision of health care to an individual].}\n'
>
> shouldn't it match "#C[Health]" in the first row? If not, what is the best way to fetch these items in an array?
>
>>
>> If one doesn't, but you think it should, post it here so that someone
>> can tell you why it doesn't! :-)
>
It took me a while to spot the problem...

You're passing re.MULTILINE as the second argument of
'constraint.search', but look at the help text:

 >>> help(constraint.search)
Help on built-in function search:

search(...) method of _sre.SRE_Pattern instance
     search(string[, pos[, endpos]]) -> match object or None.
     Scan through string looking for a match, and return a corresponding
     match object instance. Return None if no position in the string matches.

The second argument is the starting position for the search, _not_
flags.
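
To see the effect, here's a small sketch using the first line of your
data (the variable names 'line', 'wrong' and 'right' are mine).
re.MULTILINE happens to have the integer value 8, so the search quietly
starts at index 8, which is already past the start of '#C[Health]':

```python
import re

constraint = re.compile(r'#C\[\w*\]')
line = '#D{#C[Health] #P[Information] - \n'

# re.MULTILINE is an integer flag (value 8), so passing it here is
# treated as pos=8: the search starts in the middle of "#C[Health]".
wrong = constraint.search(line, re.MULTILINE)

# Without the stray argument the search starts at index 0.
right = constraint.search(line)

print(wrong)            # None
print(right.group())    # #C[Health]
```

That's why every entry in your list was None, even for lines that
clearly contain a match.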

That flag should've been passed as the second argument of re.compile
(not that it's needed by that pattern, anyway: re.MULTILINE only
changes how ^ and $ behave, and the pattern uses neither).

Actually, there's no point compiling the same regex every time; you
might as well compile it once, outside (just before) the loop.
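
Something along these lines, as a rough sketch only. I've used a
literal list in place of whatever readfile.pattread() returns (I'm
assuming a list of lines). I've also switched to findall and widened
\w* to [^\]]*; that's my guess at what you want, because \w* can't
match the spaces in most of your #C[...] items:

```python
import re

# Stand-in for readfile.pattread(), assumed to return a list of lines.
content = [
    '#D{#C[Health] #P[Information] - \n',
    'means any information, including #ST[genetic information], \n',
    '(1)#C[Is created or received by] a \n',
    '(2)#C[Relates to] #C[the past, present, or future physical | mental health | condition of an individual] | \n',
]

# Compile once, before the loop. [^\]]* matches anything up to the next ']'.
constraint = re.compile(r'#C\[[^\]]*\]')

text = []
for line in content:
    # findall returns every non-overlapping match in the line (possibly none).
    text.extend(constraint.findall(line))

print(text)
```

With findall you get the matched strings themselves instead of match
objects, and lines with no #C[...] item simply contribute nothing.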



