[Tutor] To write data in two different fonts?
Dave Angel
davea at ieee.org
Thu Aug 13 13:21:08 CEST 2009
Nick Raptis wrote:
> Dave Angel wrote:
>> As I said, you'd probably get in trouble if any of the lines had '&'
>> or '<' characters in them. The following function from the standard
>> library can be used to escape the line directly, or of course you
>> could use the function Nick supplied.
>>
>> xml.sax.saxutils.escape(data[, entities])
>>
>> Escape '&', '<', and '>' in a string of data.
>>
>> You can escape other strings of data by passing a dictionary as the
>> optional entities parameter. The keys and values must all be
>> strings; each key will be replaced with its corresponding value. The
>> characters '&', '<' and '>' are always escaped, even if entities
>> is provided.
>>
>> Let us know if that doesn't do the trick.
>>
>> DaveA
>>
> Thanks, Dave, for the info on xml.sax.saxutils.escape.
> I didn't know about this one.
>
> For the rest, this is the source code of the xml.sax.saxutils.escape
> function:
>
> ---------------------------------------
> def __dict_replace(s, d):
>     """Replace substrings of a string using a dictionary."""
>     for key, value in d.items():
>         s = s.replace(key, value)
>     return s
>
> def escape(data, entities={}):
>     """Escape &, <, and > in a string of data.
>
>     You can escape other strings of data by passing a dictionary as
>     the optional entities parameter. The keys and values must all be
>     strings; each key will be replaced with its corresponding value.
>     """
>
>     # must do ampersand first
>     data = data.replace("&", "&amp;")
>     data = data.replace(">", "&gt;")
>     data = data.replace("<", "&lt;")
>     if entities:
>         data = __dict_replace(data, entities)
>     return data
> -----------------------------------------
>
> As you can see, it too uses str.replace to do the job.
> However, using a standard library function that already does what you
> want is preferable: it's tested, and it might also be optimized to be
> faster.
> It's easy and fun to look into the source, though, and to know exactly
> what something does.
> It's also one of the ways for a beginner (me too) to progress.
>
> From the source code I can see this for example:
> *Don't pass the entity dictionary I proposed earlier to this function:*
> entities = {'&' : '&amp;',
>             '<' : '&lt;',
>             '>' : '&gt;',
>             '"' : '&quot;',
>             "'" : '&#39;'}
> If you pass an entity for '&' into escape(), it will escape the
> ampersands of the already partially escaped string a second time,
> resulting in chaos.
>
> Come to think of it, this function not checking for an '&' entity
> passed to it might be worth qualifying as a bug :)
>
> Nick
>
>
Yes, duplicating the & entity would be a bug in the caller's code
in this case (see my posted improvements to the OP code, which removed
the variable entities entirely). The question is whether this function's
documentation should carry such a warning, or whether the function should
make sure double substitution does not happen.
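The & entity is the only predefined entity in the W3C XML standard that
has this problem, because every encoded entity contains an '&', so an '&'
key matches text produced by the earlier substitutions. No other key
does: there's no entity that replaces the letter 'a' or the semicolon, and
a quote sign is never used within an encoded entity.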
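By contrast, extra entities whose keys never occur inside an encoded entity are safe to pass. A small sketch, assuming Python 3; using &quot; and &#39; for the two quote characters is an illustrative choice.

```python
from xml.sax.saxutils import escape

# Quote characters never appear inside an encoded entity,
# so these extra substitutions cannot corrupt the earlier ones.
quote_entities = {'"': "&quot;", "'": "&#39;"}

result = escape('it\'s "hi" & <bye>', quote_entities)
print(result)  # it&#39;s &quot;hi&quot; &amp; &lt;bye&gt;
```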
I think perhaps an improved version would either ignore an '&' key in the
supplied dictionary, or throw an exception if one is encountered. The
question that must always be answered is whether this could break
existing code.
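Both options can be sketched as thin wrappers around the existing function. The names strict_escape and lenient_escape are hypothetical (they are not part of xml.sax.saxutils), and raising ValueError is just one reasonable choice of exception.

```python
from xml.sax.saxutils import escape


def strict_escape(data, entities=None):
    """Hypothetical variant: refuse an '&' key instead of double-escaping."""
    entities = dict(entities or {})
    if "&" in entities:
        raise ValueError("'&' is always escaped first; don't pass it in entities")
    return escape(data, entities)


def lenient_escape(data, entities=None):
    """Hypothetical variant: silently drop an '&' key."""
    entities = {k: v for k, v in (entities or {}).items() if k != "&"}
    return escape(data, entities)


print(lenient_escape("a & b", {"&": "&amp;"}))  # a &amp; b
try:
    strict_escape("a & b", {"&": "&amp;"})
except ValueError as exc:
    print("rejected:", exc)
```

Because these wrap the current function rather than change it, existing callers of escape() are unaffected, which sidesteps the compatibility question.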
There are legitimate reasons for a string to be escaped twice. Think of
what happens when a website wants to quote some HTML source code. Or, a
little less recursively, suppose you have a website teaching XML. The
examples posted would need to be double-escaped. However, if someone
had tried to do that in a single call to the current function, their
code would already be broken, because the dictionary doesn't preserve
order, so the & substitution might not happen first. Such a user must
call the escape function twice, without passing & at all.
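The two-call approach for deliberately double-escaped text might look like this; a minimal sketch assuming Python 3, with a made-up sample snippet.

```python
from xml.sax.saxutils import escape

snippet = "<b>bold</b> & more"

once = escape(snippet)  # safe to embed as text in a page
twice = escape(once)    # when rendered, shows the escaped source itself

print(once)   # &lt;b&gt;bold&lt;/b&gt; &amp; more
print(twice)  # &amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt; &amp;amp; more
```

Neither call passes an '&' key, so each pass escapes every ampersand exactly once.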
DaveA