Regexp: unexpected splitting of string into several groups

Piet pit.grinja at gmx.de
Mon May 31 07:41:11 EDT 2004


Hello,
I have a very strange problem with regular expressions. The problem
consists of analyzing the properties of columns of a MySQL database.
When I request the column type, I get back a string with the following
composition:
vartype(width[,decimals]|list) further variable attributes.
vartype is a simple string (varchar, tinyint, ...) which might be
followed by a string in parentheses. This bracketed string is
either a single number, two numbers separated by a comma,
or a list of strings separated by commas. After the bracketed string,
there might be a list of further strings (separated by blanks)
describing some more properties of the column.
Typical examples are:
char(30) binary
int(10) zerofill
float(3,2)...
I would like to extract the vartype, the bracketed string and the
further properties separately and thus defined the following regular
expression:
#snip
vartypePattern = re.compile("([a-zA-Z]+)(\(.*\))*([^(].*[^)])")
vartypeSplit = vartypePattern.match("float(3,2) not null")
#snip
That works for some inputs that contain a bracketed string. E.g. the
call above gives back:
vartypeSplit.groups() = ('float', '(3,2)', ' not null').
However, simple one-string expressions like
vartypeSplit = vartypePattern.match("float")
are always split into two strings. The result is:
vartypeSplit.groups() = ('flo', None, 'at').
I would have expected either ('float',None,None) or ('float','','').
For other strings without a bracketed part, the last two characters
likewise end up in a separate group.
Is this a bug or a feature? ;-)
Can anybody point me in the right direction to solve the problem?
Many thanks
Piet
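
One possible reading of the behaviour described above, as a minimal
sketch: the third group, ([^(].*[^)]), needs at least two characters, so
for an input like "float" the first group has to give back "at" before
the overall match can succeed. The pattern named `revised` below is an
assumption and not part of the post. It makes both the bracketed string
and the trailing attributes optional, and it assumes the bracketed string
never contains a closing parenthesis.

```python
import re

# Pattern from the post, written as a raw string so the backslashes
# reach the regex engine unchanged.
original = re.compile(r"([a-zA-Z]+)(\(.*\))*([^(].*[^)])")

# Hypothetical alternative (an assumption, not the poster's code):
#   group 1: the vartype
#   group 2: the bracketed string including its parentheses, or None
#   group 3: whatever follows, possibly the empty string
revised = re.compile(r"([a-zA-Z]+)(\([^)]*\))?\s*(.*)")

# The third group of the original pattern must consume at least two
# characters, so the first group backtracks and gives up "at".
print(original.match("float").groups())   # ('flo', None, 'at')

for column_type in ("float", "float(3,2) not null", "char(30) binary"):
    print(revised.match(column_type).groups())
# ('float', None, '')
# ('float', '(3,2)', 'not null')
# ('char', '(30)', 'binary')
```

With the revised pattern, a missing bracketed string shows up as None and
missing attributes as an empty string, so both cases can be told apart
from a failed match.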


