Limits on search length

John Machin sjmachin at lexicon.net
Tue Oct 2 04:20:43 CEST 2007


On Oct 2, 3:16 am, Daryl Lee <d... at altaregos.com> wrote:
> I am trying to locate all lines in a suite of files with quoted strings of
> particular lengths.  A search pattern like r'".{15}"' finds 15-character
> strings very nicely.  But I have some very long ones, and a pattern like
> r'".{272}"' fails miserably, even though I know I have at least one
> 272-character string.
>
> In the short term, I can resort to locating the character positions of the
> quotes, but this seemed like such an elegant solution I hate to see it not
> work.  The program is given below (sans imports), in case someone can spot
> something I'm overlooking:
>
> # Example usage: search.py *.txt \".{15}\"
>
> filePattern = sys.argv[1]
> searchPattern  = sys.argv[2]

1. Learn an elementary debugging technique called "print the input", so
that you can see exactly what pattern your script received:

  print "pattern is", repr(searchPattern)

2. Fix your regular expression:

>>> import re
>>> patt = r'".{15}"'
>>> patt
'".{15}"'
>>> rx = re.compile(patt)
>>> o = rx.search('"123456789012345"'); o
<_sre.SRE_Match object at 0x00B96918>
>>> o.group()
'"123456789012345"'
>>> o = rx.search('"1234567" "12345"'); o
<_sre.SRE_Match object at 0x00B96950>
>>> o.group()
'"1234567" "12345"' ########## whoops ##########
>>>

>>> patt = r'"[^"]{15}"' # [^"] cannot match a quote, so a match stays inside one pair of quotes
>>> rx = re.compile(patt)
>>> o = rx.search('"123456789012345"'); o
<_sre.SRE_Match object at 0x00B96918>
>>> o.group()
'"123456789012345"'
>>> o = rx.search('"1234567" "12345"'); o
>>> o.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'NoneType' object has no attribute 'group'

3. Try building scripts from small TESTED parts. In this case, for
example, write a function to find all quoted strings of length n inside
a given string. If you do that, you will KNOW there is no limit that
stops you finding a string of length 272, and you can then look for your
error elsewhere.
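
A minimal sketch of the kind of small, testable function suggested in point 3 might look like the following. It is an illustration, not the original poster's code: the names QUOTED and quoted_strings_of_length are hypothetical, and it assumes the text contains no escaped quotes and that quotes pair up from left to right. It is written for current Python rather than the 2007 interpreter shown in the session above.

```python
import re

# Hypothetical helper names, chosen for this illustration only.
# Matches one double-quoted string; the body may not contain a quote.
QUOTED = re.compile(r'"([^"]*)"')


def quoted_strings_of_length(text, n):
    """Return the bodies of all double-quoted strings in text that are n characters long."""
    # Find every quoted string first, then filter by length. Matches do not
    # overlap, so the gap between one closing quote and the next opening
    # quote is never mistaken for a quoted string.
    return [body for body in QUOTED.findall(text) if len(body) == n]


if __name__ == "__main__":
    long_body = "x" * 272
    line = 'msg = "%s";' % long_body
    # A 272-character string is found just like a 15-character one.
    print(len(quoted_strings_of_length(line, 272)))
    print(quoted_strings_of_length('"1234567" "12345"', 15))
```

Finding all quoted strings and filtering by length afterwards, rather than putting {n} in the pattern, is a design choice made here so that one compiled pattern serves every length.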

HTH,
John



