xml.minidom is stripping out my CRLF's in attrib values!!

sismex01 at hebmex.com
Mon Sep 9 10:50:38 EDT 2002

> > 1. All line breaks must have been normalized on input to #xA
> >    as described in 2.11 End-of-Line Handling, so the rest of
> >    this algorithm operates on text normalized in this way.
> >
> > 2. Begin with a normalized value consisting of the empty string.
> >
> > 3. For each character, entity reference, or character reference in the
> >    unnormalized attribute value, beginning with the first and continuing
> >    to the last, do the following:
> >
> >    - For a character reference, append the referenced character to the
> >      normalized value.
> >
> >    - For an entity reference, recursively apply step 3 of this algorithm
> >      to the replacement text of the entity.
> >
> >    - For a white space character (#x20, #xD, #xA, #x9), append a space
> >      character (#x20) to the normalized value.
> >
> >    - For another character, append the character to the normalized value.
> > 
> > If the attribute type is not CDATA, then the XML processor must
> > further process the normalized attribute value by discarding any
> > leading and trailing space (#x20) characters, and by replacing
> > sequences of space (#x20) characters by a single space (#x20)
> > character. 
> > 
> > Note that if the unnormalized attribute value contains a character
> > reference to a white space character other than space (#x20), the
> > normalized value contains the referenced character itself
> > (#xD, #xA or #x9). This contrasts with the case where the
> > unnormalized value contains a white space character (not a
> > reference), which is replaced with a space character (#x20)
> > in the normalized value and also contrasts with the case where
> > the unnormalized value contains an entity reference whose
> > replacement text contains a white space character; being
> > recursively processed, the white space character is replaced
> > with a space character (#x20) in the normalized value. 
> > 
> > All attributes for which no declaration has been read should be
> > treated by a non-validating processor as if declared CDATA. 
> > 
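For anyone who wants to poke at the rules quoted above, here is a rough
Python sketch of them. The function name normalize_attribute, its
parameters, and the PREDEFINED table are made up for illustration; this is
not how xml.minidom or expat actually implement it, and the handling of the
predefined entities is simplified.

```python
import re

# Hypothetical helper for illustration only; not part of any library.
# One token is a hex char ref, a decimal char ref, an entity ref, or a char.
_TOKEN = re.compile(
    r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_:][\w.:-]*);|(.)",
    re.DOTALL,
)

# Simplification: predefined entities are mapped straight to their character.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}


def normalize_attribute(raw, entities=None, cdata=True):
    """Return the normalized value of an attribute literal.

    entities maps declared entity names to their replacement text;
    an undeclared name raises KeyError in this sketch.
    """
    entities = entities or {}
    # Step 1: end-of-line handling, CRLF and lone CR both become #xA.
    raw = raw.replace("\r\n", "\n").replace("\r", "\n")
    out = []  # Step 2: start from the empty string.

    def walk(text):
        # Step 3: process each reference or character in order.
        for match in _TOKEN.finditer(text):
            hex_ref, dec_ref, name, char = match.groups()
            if hex_ref is not None:
                out.append(chr(int(hex_ref, 16)))  # char ref: kept as-is
            elif dec_ref is not None:
                out.append(chr(int(dec_ref)))      # char ref: kept as-is
            elif name is not None:
                if name in PREDEFINED:
                    out.append(PREDEFINED[name])
                else:
                    walk(entities[name])           # entity ref: recurse
            elif char in " \r\n\t":
                out.append(" ")                    # literal white space
            else:
                out.append(char)

    walk(raw)
    value = "".join(out)
    if not cdata:
        # Non-CDATA types: trim and collapse runs of #x20 only.
        value = re.sub(r" {2,}", " ", value.strip(" "))
    return value
```

Calling normalize_attribute("a\r\nb") and normalize_attribute("a&#13;&#10;b")
side by side shows the difference between a literal line break and a
character reference.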
> So the only way to get a newline into an attribute is to
> escape it using a character reference such as &#10;.
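A quick way to see this with the standard library is the check below. It
is only a small sketch: the element name "item", the attribute name "note",
and the sample text are invented for the example.

```python
from xml.dom.minidom import parseString

# Attribute written with a literal CRLF inside the quotes.
literal = parseString('<item note="line one\r\nline two"/>')

# Same attribute, but the CRLF is written as character references.
escaped = parseString('<item note="line one&#13;&#10;line two"/>')

literal_value = literal.documentElement.getAttribute("note")
escaped_value = escaped.documentElement.getAttribute("note")

# repr() makes the surviving (or missing) control characters visible.
print(repr(literal_value))
print(repr(escaped_value))
```

The literal line break should come back as a space, while the value written
with &#13;&#10; should keep its CR and LF.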


Thanks Duncan, miniXML grows by the minute!



For those wondering, "miniXML" is a tiny pure-Python
implementation of an XML tokenizer (a generator),
a parser (a class which uses the tokenizer), a
document-object class (for in-memory, read-write access),
an event-processing class (for streaming), an object
pickling mixin class, and some other stuff tossed in
for good measure.

For those times when you don't need the big guns.

