regex help
Gabriel Rossetti
gabriel.rossetti at arimaz.com
Wed Dec 16 12:16:45 EST 2009
Hello everyone,
I'm going nuts with a regex. Could someone please show me what I'm
doing wrong?
I have an XMPP message:
<message xmlns='jabber:client' to='node at host.com'>
<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>
<parameters>
<param1>123</param1>
<param2>456</param2>
</parameters>
<payload type='plain'>...</payload>
</mynode>
<x xmlns='jabber:x:expire' seconds='15'/>
</message>
The <parameters> node may be absent or empty (<parameters/>), and the <x> node
may be absent. I'd like to grab everything except the <payload> node and
create something new using a regex. With the XMPP message example above,
I'd get this:
<message xmlns='jabber:client' to='node at host.com'>
<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>
<parameters>
<param1>123</param1>
<param2>456</param2>
</parameters>
</mynode>
<x xmlns='jabber:x:expire' seconds='15'/>
</message>
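Since the input is well-formed XML, an XML parser may be a simpler way to get this result than a regex, because it copes with optional or empty nodes, attribute order and whitespace for free. Below is a minimal sketch using the standard library's xml.etree.ElementTree, a swapped-in technique rather than the regex approach of this post. The helper name `strip_payload` is hypothetical, and it assumes <payload> is a direct child of <mynode> in the 'myprotocol:core' namespace, as in the example.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example message (an assumption for real traffic).
MYPROTO = "myprotocol:core"


def strip_payload(xml_text):
    """Hypothetical helper: return the message re-serialized without <payload>."""
    root = ET.fromstring(xml_text)
    # ElementTree names namespaced elements as '{uri}localname'.
    mynode = root.find(f"{{{MYPROTO}}}mynode")
    if mynode is not None:
        # Remove every <payload> directly under <mynode>; everything else is kept.
        for payload in mynode.findall(f"{{{MYPROTO}}}payload"):
            mynode.remove(payload)
    return ET.tostring(root, encoding="unicode")
```

The serialized result is equivalent XML but not byte-identical to the input: ElementTree writes generated namespace prefixes (ns0, ns1, ...) instead of the original default xmlns declarations, and uses double quotes for attributes.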
For some reason my regex doesn't work correctly:
r"(<message .*?>).*?(<mynode .*?>).*?(?:(<parameters>.*?</parameters>)|<parameters/>)?.*?(<x .*/>)?"
I capture the opening <message> tag and the opening <mynode> tag; if the
<parameters> node is present and not empty I capture it too, and if the <x>
node is present I capture it as well. In practice, though, the last group
(<x>) always comes back None, even for s1, s2 and s3, where <x> is present:
>>> import re
>>> s1 = ("<message xmlns='jabber:client' to='node at host.com'>"
...       "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>"
...       "<parameters><param1>123</param1><param2>456</param2></parameters>"
...       "<payload type='plain'>...</payload></mynode>"
...       "<x xmlns='jabber:x:expire' seconds='15'/></message>")
>>> s2 = ("<message xmlns='jabber:client' to='node at host.com'>"
...       "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>"
...       "<parameters/>"
...       "<payload type='plain'>...</payload></mynode>"
...       "<x xmlns='jabber:x:expire' seconds='15'/></message>")
>>> s3 = ("<message xmlns='jabber:client' to='node at host.com'>"
...       "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>"
...       "<payload type='plain'>...</payload></mynode>"
...       "<x xmlns='jabber:x:expire' seconds='15'/></message>")
>>> s4 = ("<message xmlns='jabber:client' to='node at host.com'>"
...       "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>"
...       "<parameters><param1>123</param1><param2>456</param2></parameters>"
...       "<payload type='plain'>...</payload></mynode></message>")
>>> s5 = ("<message xmlns='jabber:client' to='node at host.com'>"
...       "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>"
...       "<parameters/>"
...       "<payload type='plain'>...</payload></mynode></message>")
>>> s6 = ("<message xmlns='jabber:client' to='node at host.com'>"
...       "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>"
...       "<payload type='plain'>...</payload></mynode></message>")
>>> exp = r"(<message .*?>).*?(<mynode .*?>).*?(?:(<parameters>.*?</parameters>)|<parameters/>)?.*?(<x .*/>)?"
>>>
>>> re.match(exp, s1).groups()
("<message xmlns='jabber:client' to='node at host.com'>",
 "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>",
 '<parameters><param1>123</param1><param2>456</param2></parameters>',
 None)
>>>
>>> re.match(exp, s2).groups()
("<message xmlns='jabber:client' to='node at host.com'>",
 "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>",
 None, None)
>>>
>>> re.match(exp, s3).groups()
("<message xmlns='jabber:client' to='node at host.com'>",
 "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>",
 None, None)
>>>
>>> re.match(exp, s4).groups()
("<message xmlns='jabber:client' to='node at host.com'>",
 "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>",
 '<parameters><param1>123</param1><param2>456</param2></parameters>',
 None)
>>>
>>> re.match(exp, s5).groups()
("<message xmlns='jabber:client' to='node at host.com'>",
 "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>",
 None, None)
>>>
>>> re.match(exp, s6).groups()
("<message xmlns='jabber:client' to='node at host.com'>",
 "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>",
 None, None)
>>>
Does anyone know what is wrong with my expression?
Thank you,
Gabriel
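A likely cause: the pattern ends with `.*?(<x .*/>)?`, and both parts can match the empty string. re.match only needs a prefix of the string to match, so the engine accepts its first successful path: the lazy `.*?` matches nothing, the optional <x> group fails at `<payload` and matches nothing too, and the match ends there with the last group set to None. Anchoring the end of the pattern on `</message>` forces `.*?` to keep expanding until it reaches `<x ...>` when one exists. The sketch below keeps the same four groups, writes them with re.VERBOSE, makes the `<x` body lazy, and rebuilds the desired output from the groups. `PATTERN` and `rebuild` are hypothetical names; it assumes single-line input with adjacent tags, as in s1 to s6, and drops an empty <parameters/> because the original grouping does not capture it.

```python
import re

# Same groups as the original pattern, in verbose form (literal spaces are "\ ").
# The trailing </message> is the key change: it stops the match from ending
# early with the lazy .*? and the optional <x> group both matching nothing.
PATTERN = re.compile(r"""
    (<message\ .*?>) .*?                # 1: opening <message> tag
    (<mynode\ .*?>) .*?                 # 2: opening <mynode> tag
    (?: (<parameters>.*?</parameters>)  # 3: non-empty <parameters>
      | <parameters/> )?                #    (an empty one is not captured)
    .*?
    (<x\ .*?/>)?                        # 4: optional <x .../> element
    </message>                          # anchor: forces .*? forward to <x>
""", re.VERBOSE)


def rebuild(msg):
    """Hypothetical helper: rebuild the message without its <payload>."""
    m = PATTERN.match(msg)
    if m is None:
        return None
    message_open, mynode_open, params, x = m.groups()
    # Missing optional parts (None) are replaced by empty strings.
    return (message_open + mynode_open + (params or "") + "</mynode>"
            + (x or "") + "</message>")
```

Because `.` does not match newlines by default, pretty-printed input like the first example would need re.DOTALL, and whitespace between the tags then lets the optional <parameters> group be skipped in the same way the <x> group was; that fragility is one more reason to consider an XML parser.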
More information about the Python-list
mailing list