[Python-bugs-list] [Bug #121654] sre regex () groups have strong memory
noreply@sourceforge.net
Fri, 05 Jan 2001 07:36:33 -0800
Bug #121654, was updated on 2000-Nov-05 00:17
Here is a current snapshot of the bug.
Project: Python
Category: Regular Expressions
Status: Open
Resolution: Fixed
Bug Group: None
Priority: 5
Submitted by: david_alessio
Assigned to : effbot
Summary: sre regex () groups have strong memory
Details: The new re module (sre) has a problem with () grouping.
It seems () groups sometimes "remember" the previous
match. Try the following:
#!/usr/bin/python
"""
This module parses a CSV string and produces a list whose
elements are the Comma-Separated Values. myCSV uses Python's
regular expression matching. The search pattern uses three
groups of (). findall() returns a list of tuples containing
three elements--the results of the search groups. The lambda
function returns the correct element of the three.
For more info on regular expressions see:
Mastering Regular Expressions
Jeffrey E. F. Friedl
O'Reilly Press
######################################################################
Created: 23-Mar-2000
Author: David S. Alessio
Last Modified: $Modtime:$
Last Modified by: $Author:$
Revision: $Revision:$
Revision History:
$Log:$
######################################################################
"""
import re

class myCSV:
    def __init__(self):
        self.srchPat = '"([^"\\\\]*(\\\\.[^"\\\\]*)*)",?|([^,]+),?|,'
        self.srchProg = re.compile(self.srchPat)

    def __str2num(self, str):
        try:
            num = int(str)
        except:
            try:
                num = long(str)
            except:
                try:
                    num = float(str)
                except:
                    num = str
        return num

    def __pick_best_value(self, t):
        value = t[0] or t[2]
        return self.__str2num(value)

    def csv2list(self, str):
        # print self.srchProg.findall(str)
        return map(lambda t: t[0] or t[2], self.srchProg.findall(str))

    def csv2numlist(self, str):
        return map(self.__pick_best_value, self.srchProg.findall(str))

    def test(self):
        str = '1,2,3,"ab\\"c","de,f",name,"123",,,"xyz","","zzz"'
        tstnumlst = [1,2,3,'ab\\"c','de,f','name',123,'','','xyz','','zzz']
        list = self.csv2list(str)
        numlst = self.csv2numlist(str)
        print str
        print list
        print numlst
        print tstnumlst
        #"""
        if tstnumlst != numlst:
            print "\nERROR:"
            if len(numlst) > len(tstnumlst):
                print 'NumList Too Long!!!'
            elif len(numlst) < len(tstnumlst):
                print 'NumList Too Short!!!'
            else:
                print ' Expected Got'
                for i in xrange(len(tstnumlst)):
                    mark = ''
                    if tstnumlst[i] != numlst[i]:
                        mark = ' <==='
                    print 'L[%2d]: %10s %10s %s' % (i, tstnumlst[i],
                                                    numlst[i], mark)
        else:
            print 'OK'
        #"""

if __name__ == '__main__':
    from sys import argv
    if len(argv) == 1:
        print 'Compiling CSV'
        x = myCSV();
        print 'Testing...'
        x.test()
    elif len(argv) == 2:
        fin = open(argv[1])
        x = myCSV();
        while 1:
            str = fin.readline()
            if not str: break
            csv = x.csv2numlist(str[:-1])
            print csv
    else:
        print 'Usage: %s [file.csv]' % argv[0]
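
A reduced check of the same pattern may make the symptom easier to see. This is only a sketch and not part of the original submission: it assumes a current Python 3 interpreter rather than the Python 2.0 sre under report, and the names CSV_FIELD, csv_groups and csv_to_list are made up for illustration. It prints the raw findall() tuples for two neighbouring quoted fields. The field "ab\"c" contains an escape, so group 2 is expected to be non-empty there; the following field "de,f" contains none, so on an engine that does not carry group contents over from an earlier match, group 2 should come back empty.

```python
import re

# Same pattern as srchPat in the script above, written as a raw string.
CSV_FIELD = re.compile(r'"([^"\\]*(\\.[^"\\]*)*)",?|([^,]+),?|,')


def csv_groups(line):
    """Return the raw (group 1, group 2, group 3) tuples from findall()."""
    return CSV_FIELD.findall(line)


def csv_to_list(line):
    """Pick the quoted field (group 1) or, failing that, the bare field (group 3)."""
    return [quoted or bare for quoted, _tail, bare in csv_groups(line)]


# The test string from the script above.
line = r'1,2,3,"ab\"c","de,f",name,"123",,,"xyz","","zzz"'
groups = csv_groups(line)
fields = csv_to_list(line)

print(groups[3])  # tuple for "ab\"c": group 2 holds the escaped tail
print(groups[4])  # tuple for "de,f": group 2 should be empty here
print(fields)
```

If group 2 of the second tuple is not empty, the engine is showing the "memory" described in the details above.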
Follow-Ups:
Date: 2001-Jan-05 07:36
By: effbot
Comment:
Fixed but not closed? Might mean that it's fixed locally, but not in the
CVS repository. I'll investigate. /F
-------------------------------------------------------
Date: 2000-Nov-09 11:54
By: effbot
Comment:
same as #117612
-------------------------------------------------------
For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=121654&group_id=5470