regular expression

Bengt Richter bokr at oz.net
Sat Mar 26 02:41:54 EST 2005


On Fri, 25 Mar 2005 23:54:32 -0500, Peter Hansen <peter at engcorp.com> wrote:

>Bengt Richter wrote:
>> On Sat, 26 Mar 2005 02:07:15 GMT, aaron <asteele at berkeley.edu> wrote:
>>>>>>pattern.sub(':', '375 mi. south of U.C.B is 3.4 degrees warmer.')
>>>'375 mi: south of U:C:B is 3.4 degrees warmer:'
>>>
>>>so this works, but not in the following case:
>>>>>>pattern.sub(':', '.3')
>>>
>> Brute force the exceptional case that happens at the start of the line?
>> 
>>  >>> import re
>>  >>> pattern = re.compile(r'^[.]|(?!\d)[.](?!\d)')
>>  >>> pattern.sub(':', '375 mi. south of U.C.B is 3.4 degrees warmer.')
>>  '375 mi: south of U:C:B is 3.4 degrees warmer:'
>>  >>> pattern.sub(':', '.3')
>>  ':3'
>>  >>> pattern.sub(':', '3.')
>>  '3:'
>
>Be careful... the OP has assumed something that isn't true,
>and Bengt's fix isn't sufficient:
>
> >>> import re
> >>> s = 'x.3'
> >>> pattern = re.compile(r'^[.]|(?!\d)[.](?!\d)')
> >>> pattern.sub(':', '.3')
>':3'
> >>> pattern.sub(':', s)
>'x.3'
>
>So the OP's "this works" comment was wrong.
>
>Suggestion: whip up a variety of automated test cases and
>make sure you run them all whenever you make changes to
>this code...
>
>(No, I don't have a solution to the continuing problem,
>other than to wonder whether the input data really requires
>all these edge cases to be handled properly.)
>
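Goes to show you ;-/ The leading (?!\d) in my earlier pattern was a lookahead,
so it tested the dot itself (never a digit) instead of the character before it.
It needed to be the lookbehind (?<!\d). Do we need more tests than these?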

 >>> import re
 >>> pattern = re.compile(r'[.](?!\d)|(?<!\d)[.]')
 >>> print pattern.sub(':', '375 mi. south of U.C.B is 3.4 degrees warmer.')
 375 mi: south of U:C:B is 3.4 degrees warmer:
 >>> for s,ss in ((s,pattern.sub(':', s)) for s in ('%s%s.%s%s'%(sp1,c1,c2,sp2)
 ...         for sp1 in ('', ' ')
 ...         for c1 in ('', 'x', '3')
 ...         for c2 in ('', 'x', '3')
 ...         for sp2 in ('', ' '))):
 ...     print '%10r => %r' %(s,ss)
 ...
        '.' => ':'
       '. ' => ': '
       '.x' => ':x'
      '.x ' => ':x '
       '.3' => ':3'
      '.3 ' => ':3 '
       'x.' => 'x:'
      'x. ' => 'x: '
      'x.x' => 'x:x'
     'x.x ' => 'x:x '
      'x.3' => 'x:3'
     'x.3 ' => 'x:3 '
       '3.' => '3:'
      '3. ' => '3: '
      '3.x' => '3:x'
     '3.x ' => '3:x '
      '3.3' => '3.3'
     '3.3 ' => '3.3 '
       ' .' => ' :'
      ' . ' => ' : '
      ' .x' => ' :x'
     ' .x ' => ' :x '
      ' .3' => ' :3'
     ' .3 ' => ' :3 '
      ' x.' => ' x:'
     ' x. ' => ' x: '
     ' x.x' => ' x:x'
    ' x.x ' => ' x:x '
     ' x.3' => ' x:3'
    ' x.3 ' => ' x:3 '
      ' 3.' => ' 3:'
     ' 3. ' => ' 3: '
     ' 3.x' => ' 3:x'
    ' 3.x ' => ' 3:x '
     ' 3.3' => ' 3.3'
    ' 3.3 ' => ' 3.3 '
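
For anyone who wants to rerun the table as automated checks, along the lines
Peter suggested, here is a minimal sketch in Python 3 syntax (the session above
is Python 2). The names FIXED, EARLIER, expected, cases and failures are made up
for illustration, and expected() encodes an assumed reading of the requirement:
a dot is kept only when it has a digit on both sides.

```python
import re
from itertools import product

# Pattern from the session above: replace a dot unless it has a digit on
# BOTH sides (no digit after it, or no digit before it).
FIXED = re.compile(r'[.](?!\d)|(?<!\d)[.]')

# Earlier attempt from the quoted text, kept only for comparison.
EARLIER = re.compile(r'^[.]|(?!\d)[.](?!\d)')


def expected(s):
    """Reference rule without regexes (assumed intent, not confirmed by the OP)."""
    out = []
    for i, ch in enumerate(s):
        between_digits = (
            ch == '.'
            and 0 < i < len(s) - 1
            and s[i - 1].isdecimal()
            and s[i + 1].isdecimal()
        )
        out.append(':' if ch == '.' and not between_digits else ch)
    return ''.join(out)


def cases():
    """Yield the same 36 strings, in the same order, as the session above."""
    for sp1, c1, c2, sp2 in product(('', ' '), ('', 'x', '3'),
                                    ('', 'x', '3'), ('', ' ')):
        yield '%s%s.%s%s' % (sp1, c1, c2, sp2)


def failures(pattern):
    """Return the test strings for which pattern disagrees with expected()."""
    return [s for s in cases() if pattern.sub(':', s) != expected(s)]


if __name__ == '__main__':
    print('FIXED failures:  ', failures(FIXED))
    print('EARLIER failures:', failures(EARLIER))
```

Calling failures(FIXED) should give an empty list, while failures(EARLIER)
should list the strings where the dot is followed by a digit but is neither at
the start of the string nor preceded by a digit, such as 'x.3'.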

Regards,
Bengt Richter


