[ANN] pyxser-0.2r --- Python XML Serialization

Daniel Molina Wegener dmw at coder.cl
Mon Apr 20 04:52:42 CEST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Stefan Behnel <stefan_ml at behnel.de>
on Sunday 19 April 2009 15:08
wrote in comp.lang.python:


> Daniel Molina Wegener wrote:
>> Stefan Behnel <stefan_ml at behnel.de>
>> on Sunday 19 April 2009 02:25
>> wrote in comp.lang.python:
>> 
>> 
>>> Daniel Molina Wegener wrote:
>>>>         * Every serialization is made into unicode objects.
>>> Hmm, does that mean that when I serialise, I get a unicode object back?
>>> What about the XML declaration? How can a user create well-formed XML
>>> from your output? Or is that not the intention?
>> 
>>   Yes, if you serialize an object you get the XML string as a
>> unicode object, since unicode objects support UTF-8 and
>> some other encodings.
> 
> That's not what I meant. I was wondering why you chose to use a unicode
> string instead of a byte string (which XML is defined for). If your only
> intention is to deserialise the unicode string into a tree, that may be
> acceptable.

  Since libxml2's default encoding is UTF-8, and most applications use
XML encoded in UTF-8, it makes sense to define it as the default encoding
for the generated XML. Also, if you take a little time to read the
documentation, you will see that you can use any encoding supported by
Python, such as latin1, aka iso-8859-1. UTF-8 is just the default encoding.

  The original intention was to have a C14N representation of Python
objects, and since the C14N specification defines the canonical form as
UTF-8 encoded, I can't use another encoding for the C14N representation.
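
  To illustrate the C14N point, here is a minimal sketch. It is not pyxser
code: it assumes the standard-library canonicalize() function (Python 3.8
or later), and the sample document and its element names are made up.

```python
# Sketch only: standard-library C14N (Python 3.8+), not pyxser itself.
from xml.etree.ElementTree import canonicalize

# Hypothetical sample document: unordered attributes, single quotes,
# an empty-element tag and a non-ASCII character.
xml_text = "<obj type='demo' name=\"caf\u00e9\"><prop/></obj>"

# canonicalize() returns a text (unicode) string in canonical form.
canonical = canonicalize(xml_text)

# The canonical form is defined as UTF-8, so that is the codec to use
# when the text is turned into bytes.
canonical_bytes = canonical.encode("utf-8")

print(canonical)
```

  The canonical text has sorted, double-quoted attributes and an explicit
end tag for the empty element, and there is only one valid byte encoding
for it.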

> However, as soon as you start writing the data to a file or 
> through a network pipe, or pass it to an XML parser, you'd better make it
> well-formed XML. So you either need to encode it as UTF-8 (for which you
> do not need a declaration),

  I repeat, it's just the default encoding. But do you know which exception
you get with byte strings and wrongly encoded strings (think of accents and
special characters)? Unicode objects in Python support most of the common
encodings.
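
  The exceptions in question can be reproduced with a few lines. This is
only a sketch with a made-up sample string, written in Python 3 spelling;
in Python 2 the same two types are called str and unicode instead of bytes
and str.

```python
# Sketch only (Python 3 spelling; in Python 2, bytes/str were str/unicode).
latin1_bytes = "ni\u00f1o".encode("iso-8859-1")  # byte string, latin1 encoded

# Decoding with the wrong codec fails as soon as an accent shows up.
try:
    latin1_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Encoding text into a codec that lacks the character fails as well.
try:
    "\u20ac".encode("iso-8859-1")  # the euro sign is not in latin1
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

# A unicode (text) object only has to be encoded once, at the boundary.
text = latin1_bytes.decode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

print(type(decode_error).__name__, type(encode_error).__name__)
```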

> or you will need to encode it in a different 
> byte encoding, and then prepend a declaration yourself. In any case, this
> is a lot more overhead (and cumbersome for users) than writing out a
> correctly serialised byte string directly.

  No, I'm just using the default encoding for libxml2, which can be
converted or reencoded to other character sets, and if you read the
documentation, you will see that you can use most of the encodings
supported by Python.
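
  Both options discussed here (UTF-8 without a declaration, or another byte
encoding with a declaration) can be sketched as follows. This is not the
pyxser API: it assumes the standard-library ElementTree module, and the
element name is hypothetical.

```python
# Sketch only: standard-library ElementTree, not the pyxser API.
import xml.etree.ElementTree as ET

root = ET.Element("obj")   # hypothetical element name
root.text = "a\u00f1o"     # text with an accented character

# Text (unicode) output carries no XML declaration.
as_text = ET.tostring(root, encoding="unicode")

# UTF-8 bytes are well-formed XML without a declaration.
as_utf8 = as_text.encode("utf-8")

# For another byte encoding, ElementTree prepends the declaration itself.
as_latin1 = ET.tostring(root, encoding="iso-8859-1")

# Both byte strings parse back to the same text.
assert ET.fromstring(as_utf8).text == ET.fromstring(as_latin1).text
print(as_latin1)
```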

> 
> You seemed to be very interested in good performance, so I don't quite
> understand why you want to require an additional step with a relatively
> high performance impact that only makes it harder for users to use the
> tool correctly.

  Using an encoding other than the libxml2 default makes the work harder
for libxml2, since it requires every #PCDATA section to be reencoded to
the desired encoding. Comparing one string conversion in Python against
many string conversions inside libxml2, the program performs worse when it
uses an encoding other than the default. Also, since UTF-8 is the default
encoding, creating the Python string by passing the UTF-8 string buffer
and its size does not have a huge impact on performance.

> 
> Stefan

Best regards,
- -- 
 .O. | Daniel Molina Wegener   | FreeBSD & Linux
 ..O | dmw [at] coder [dot] cl | Open Standards
 OOO | http://coder.cl/        | FOSS Developer

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (FreeBSD)

iQIcBAEBCgAGBQJJ6+N6AAoJEHxqfq6Y4O5NdKAQAMXyoK/V4/bI16D9naydS4n8
IdjZ+R9MJIOKeUhuDABnk1ieyOB8Uxga86lyVOIaXnN4LK6wWioci+TxzoVgJJ8q
pUiiG9E1jq6rQ7DTJN3enoCi7odOVrKr4L69mkZ9GMLkfWI3cdvcwZIq42eev2LI
yGCnJbHCwR2tgo4YCSy/luBucHCdW8ZkV0A8WMD7f2nZJgRygzqwwx6gOUpFGj1H
UH0AfzCvZLndhh9THl4xz2eIT+6SeaNM5s9Oq04gz64jOKiHPuX1sZMAqxQgQCVQ
v7HnPBq1oBkqwX/sSF4BR+Gqitue10ya1jWHJsln2e76KGXFDCaun1F1vfoa8HZI
RE7XawXprTTpCCQ9KVv+NSeKG6dnnxhYKA0SKXCmcgh2CTjxZPFpNqXlTCof2pdp
gKLWwD5te/DaYTh/GRpTnYsJMGtrHlUQ8KEIBEg2j7cItkgpPx1siNDe0WQoXo17
+fwmKeuNDJwCWAM1n6Bgp28AkJ7Fs32E+t1zN5Ij0QrbJX/ez58Z3hGszS57zsNY
bvhcdFVvt+AOF+uL2Kubmaj3g0ta406Oic/MzCjIe9yE+pmBikcgYce0oU3b44F5
8z/w3ZsaWPCMS2V4FRqaUMQzDpE7XW/7GRU4OaHyJLfGQxj0bfDogL0WAhYhKhyf
/myumLDlCsPu1HhD6PdB
=nbx6
-----END PGP SIGNATURE-----


