[XML-SIG] Weirdness (bug?) with smart_len (was Re: Issues with Unicode type)

Eric van der Vlist vdv@dyomedea.com
25 Sep 2002 11:24:19 +0200


On Mon, 2002-09-23 at 23:16, Uche Ogbuji wrote:

> Oh, but then Python is so much simpler:
>
> SP_PAT = re.compile(u"[\uD800-\uDBFF][\uDC00-\uDFFF]")
> def smart_len(u):
>     sp_count = len(SP_PAT.findall(u))
>     return len(u) - sp_count
>

I am trying to use this with a Python compiled for UCS-2, but I am
seeing weird behavior with this function: the module seems unable to
survive being compiled to a .pyc!

I have:

test.py:
#!/usr/bin/env python
import Smart_len

print Smart_len.smart_len(u'\U00010800')

and Smart_len.py:

import re

SP_PAT = re.compile(u"[\uD800-\uDBFF][\uDC00-\uDFFF]")

def smart_len(u):
	sp_count = len(SP_PAT.findall(u))
	return len(u) - sp_count

It works the first time (or whenever I remove Smart_len.pyc) but fails
on the second run:

vdv@ibook:~/xmlschemata-cvs/downloads/python/xvif$ rm Smart_len.pyc
vdv@ibook:~/xmlschemata-cvs/downloads/python/xvif$ ./test.py
1
vdv@ibook:~/xmlschemata-cvs/downloads/python/xvif$ ./test.py
Traceback (most recent call last):
  File "./test.py", line 2, in ?
    import Smart_len
UnicodeError: UTF-8 decoding error: unexpected code byte

Weird, isn't it?
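
For reference, here is a minimal sketch of what smart_len is meant to
compute, written for Python 3 rather than the UCS-2 build used above.
Python 3 strings always hold whole code points, so the helper
utf16_units is a hypothetical stand-in (not part of the original code)
that spreads a string over its UTF-16 code units to imitate what a
narrow build stores:

```python
import re
import struct

# A high surrogate immediately followed by a low surrogate (one UTF-16 pair).
SP_PAT = re.compile("[\ud800-\udbff][\udc00-\udfff]")

def utf16_units(s):
    """Hypothetical helper: return s as a str holding one character per
    UTF-16 code unit, roughly what a UCS-2 build would store."""
    data = s.encode("utf-16-be")
    units = struct.unpack(">%dH" % (len(data) // 2), data)
    return "".join(map(chr, units))

def smart_len(u):
    """Length of u with each surrogate pair counted once."""
    return len(u) - len(SP_PAT.findall(u))

narrow = utf16_units("\U00010800")
print(len(narrow), smart_len(narrow))  # 2 1
```

U+10800 occupies two code units (a surrogate pair), so the plain len
is 2 while smart_len gives 1, matching the "1" printed by the first
run above.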

Thanks

Eric
--
Rendez-vous à Paris.
                          http://www.technoforum.fr/integ2002/index.html
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------