xml.minidom is stripping out my CRLF's in attrib values!!

sismex01 at hebmex.com sismex01 at hebmex.com
Mon Sep 9 10:50:38 EDT 2002


> > All line breaks must have been normalized on input to #xA 
> > as described in 2.11 End-of-Line Handling, so the rest of
> > this algorithm operates on text normalized in this way. 
> > 
> > Begin with a normalized value consisting of the empty string.
> > 
> > For each character, entity reference, or character reference in the
> > unnormalized attribute value, beginning with the first and continuing
> > to the last, do the following: 
> > 
> > For a character reference, append the referenced character to the
> > normalized value. 
> > 
> > For an entity reference, recursively apply step 3 of this algorithm
> > to the replacement text of the entity. 
> > 
> > For a white space character (#x20, #xD, #xA, #x9), append a space
> > character (#x20) to the normalized value. 
> > 
> > For another character, append the character to the normalized value.
> > 
> > If the attribute type is not CDATA, then the XML processor must
> > further process the normalized attribute value by discarding any
> > leading and trailing space (#x20) characters, and by replacing
> > sequences of space (#x20) characters by a single space (#x20)
> > character. 
> > 
> > Note that if the unnormalized attribute value contains a character
> > reference to a white space character other than space (#x20), the
> > normalized value contains the referenced character itself
> > (#xD, #xA or #x9). This contrasts with the case where the
> > unnormalized value contains a white space character (not a
> > reference), which is replaced with a space character (#x20)
> > in the normalized value and also contrasts with the case where
> > the unnormalized value contains an entity reference whose
> > replacement text contains a white space character; being
> > recursively processed, the white space character is replaced
> > with a space character (#x20) in the normalized value. 
> > 
> > All attributes for which no declaration has been read should be
> > treated by a non-validating processor as if declared CDATA. 
> > 
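
For anyone who'd rather read that as code, here's a rough sketch of the
quoted algorithm. The names normalize_attr, entities and cdata are made
up for illustration (they are not from the spec or from any library),
and it assumes line ends were already normalized and that the caller
supplies the replacement text of every entity it references:

```python
import re

# A character reference (&#xA; or &#10;) or an entity reference (&name;).
_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_:][\w.:-]*);")

def normalize_attr(raw, entities=None, cdata=True):
    """Hypothetical helper: normalize one raw attribute value."""
    entities = entities or {}   # assumed mapping: entity name -> replacement text
    out = []
    pos = 0
    while pos < len(raw):
        m = _REF.match(raw, pos)
        if m and m.group(1) is not None:
            # Hex character reference: append the referenced character as-is.
            out.append(chr(int(m.group(1), 16)))
            pos = m.end()
        elif m and m.group(2) is not None:
            # Decimal character reference: same treatment.
            out.append(chr(int(m.group(2))))
            pos = m.end()
        elif m:
            # Entity reference: recursively process its replacement text.
            # An undeclared name raises KeyError in this sketch.
            out.append(normalize_attr(entities[m.group(3)], entities))
            pos = m.end()
        else:
            # Literal white space becomes a space; anything else is copied.
            ch = raw[pos]
            out.append(" " if ch in " \r\n\t" else ch)
            pos += 1
    value = "".join(out)
    if not cdata:
        # Non-CDATA types: trim the ends and collapse runs of spaces.
        value = re.sub(" +", " ", value.strip(" "))
    return value
```

With this, normalize_attr("a\nb") gives "a b", while
normalize_attr("a&#10;b") keeps the newline.
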
> 
> So the only way to get a newline into an attribute is to
> escape it using a character reference (e.g. &#10;).
> 

WOOHOO!!

Thanks Duncan, miniXML grows by the minute!
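
A quick way to see the difference with the stock parser (a minimal
sketch, assuming the standard xml.dom.minidom on top of expat; the
element and attribute names are arbitrary):

```python
from xml.dom import minidom

# Literal CRLF inside the attribute value: the parser normalizes it away.
literal = minidom.parseString('<doc a="one\r\ntwo"/>')

# The same line break written as character references: it survives parsing.
charref = minidom.parseString('<doc a="one&#13;&#10;two"/>')

literal_value = literal.documentElement.getAttribute("a")
charref_value = charref.documentElement.getAttribute("a")

print(repr(literal_value))   # no CR or LF left in the value
print(repr(charref_value))   # CR and LF preserved
```

Note this only covers parsing. Whether a given serializer writes the
references back out when saving is a separate question.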

-gus

P.S.:

For those wondering, "miniXML" is a tiny pure-Python
implementation of an XML tokenizer (a generator),
a parser (a class which uses the tokenizer), a
document-object class (for in-memory access, read-write),
an event-processing class (for streaming), an object
pickling mixin class, and some other stuff tossed in
for good measure.

For those times when you don't need the big guns.

-gus