[Python-bugs-list] [Bug #121654] sre regex () groups have strong memory

noreply@sourceforge.net noreply@sourceforge.net
Fri, 05 Jan 2001 07:36:33 -0800


Bug #121654, was updated on 2000-Nov-05 00:17
Here is a current snapshot of the bug.

Project: Python
Category: Regular Expressions
Status: Open
Resolution: Fixed
Bug Group: None
Priority: 5
Submitted by: david_alessio
Assigned to : effbot
Summary: sre regex () groups have strong memory

Details: The new re module (sre) has a problem with () grouping.
It seems that () groups sometimes "remember" text captured by a
previous match.  Try the following:


#!/usr/bin/python

"""
	This module parses a CSV string and produces a list whose
	elements are the Comma-Separated Values.  myCSV uses Python's
	regular expression matching.  The search pattern uses three
	groups of ().  findall() returns a list of tuples containing
	three elements--the results of the search groups.  The lambda
	function returns the correct element of the three.

        For more info on regular expressions see:
            Mastering Regular Expressions
            Jeffrey E. F. Friedl
            O'Reilly Press

######################################################################
        Created:            23-Mar-2000
        Author:             David S. Alessio
        Last Modified:      $Modtime:$
        Last Modified by:   $Author:$
        Revision:           $Revision:$

	Revision History:
        $Log:$
######################################################################
"""

import re


class myCSV:
    def __init__(self):
        self.srchPat = '"([^"\\\\]*(\\\\.[^"\\\\]*)*)",?|([^,]+),?|,'
        self.srchProg = re.compile(self.srchPat)
    
    def __str2num(self, str):
        try:
            num = int(str)
        except:
            try:
                num = long(str)
            except:
                try:
                    num = float(str)
                except:
                    num = str
        return num

    def __pick_best_value(self, t):
        value = t[0] or t[2]
        return self.__str2num(value)
        
    def csv2list(self, str):
#       print self.srchProg.findall(str)
        return map(lambda t: t[0] or t[2], self.srchProg.findall(str))
        
    def csv2numlist(self, str):
        return map(self.__pick_best_value, self.srchProg.findall(str))

    def test(self):
        str = '1,2,3,"ab\\"c","de,f",name,"123",,,"xyz","","zzz"'
        tstnumlst = [1,2,3,'ab\\"c','de,f','name',123,'','','xyz','','zzz']
        list = self.csv2list(str)
        numlst = self.csv2numlist(str)
        print str
        print list
        print numlst
        print tstnumlst
#"""
        if tstnumlst != numlst:
            print "\nERROR:"
            if len(numlst) > len(tstnumlst):
                print 'NumList Too Long!!!'
            elif len(numlst) < len(tstnumlst):
                print 'NumList Too Short!!!'
            else:
                print '            Expected      Got'
                for i in xrange(len(tstnumlst)):
                    mark = ''
                    if tstnumlst[i] != numlst[i]:
                        mark = ' <==='
                    print 'L[%2d]: %10s %10s %s' % (
                        i, tstnumlst[i], numlst[i], mark)
                    
        else:
            print 'OK'
#"""

if __name__ == '__main__':
    from sys import argv
    if len(argv) == 1:
        print 'Compiling CSV'
        x = myCSV()
        print 'Testing...'
        x.test()
    elif len(argv) == 2:
        fin = open(argv[1])
        x = myCSV()
        while 1:
            str = fin.readline()
            if not str: break
            csv = x.csv2numlist(str[:-1])
            print csv
    else:
        print 'Usage: %s [file.csv]' % argv[0]
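
A smaller check of the same pattern may make the symptom easier to see. The sketch below is only an illustration, not part of the original report: the names CSV_FIELD and split_fields are hypothetical, and it assumes an interpreter whose re module is not affected by the bug. It prints the raw findall() tuples next to the value picked from each one. A group that takes no part in a given match should come back as an empty string rather than as text left over from an earlier match.

```python
import re

# Pattern copied from the report.  Group 1 is the body of a quoted field,
# group 2 is the last backslash-escaped run inside the quotes, and
# group 3 is an unquoted field.
CSV_FIELD = re.compile('"([^"\\\\]*(\\\\.[^"\\\\]*)*)",?|([^,]+),?|,')


def split_fields(line):
    """Return (raw group tuples, chosen field values) for one CSV line."""
    rows = CSV_FIELD.findall(line)
    # Same choice as the report's lambda: quoted body first, else bare field.
    fields = [quoted or bare for quoted, _escaped, bare in rows]
    return rows, fields


if __name__ == '__main__':
    sample = '"de,f",name,"",,x'
    rows, fields = split_fields(sample)
    for row, field in zip(rows, fields):
        print('%-22r -> %r' % (row, field))
```

For the sample line above, the chosen fields should be ['de,f', 'name', '', '', 'x']. If an empty field such as "" shows a non-empty value in any group, that group has kept text from an earlier match.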
		


Follow-Ups:

Date: 2001-Jan-05 07:36
By: effbot

Comment:
Fixed but not closed?  Might mean that it's fixed locally, but not in the
CVS repository.  I'll investigate. /F
-------------------------------------------------------

Date: 2000-Nov-09 11:54
By: effbot

Comment:
same as #117612
-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=121654&group_id=5470