[Tutor] To write data in two different fonts?

Dave Angel davea at ieee.org
Thu Aug 13 13:21:08 CEST 2009


Nick Raptis wrote:
> Dave Angel wrote:
>> As I said, you'd probably get in trouble if any of the lines had '&' 
>> or '<' characters in them.  The following function from the standard 
>> library can be used to escape the line directly, or of course you 
>> could use the function Nick supplied.
>>
>> xml.sax.saxutils.escape(data[, entities])
>>
>>    Escape '&', '<', and '>' in a string of data.
>>
>>    You can escape other strings of data by passing a dictionary as the
>>    optional entities parameter. The keys and values must all be
>>    strings; each key will be replaced with its corresponding value. The
>>    characters '&', '<' and '>' are always escaped, even if entities
>>    is provided.
>>
>> Let us know if that doesn't do the trick.
>>
>> DaveA
>>
> Thanks, Dave, for the info on xml.sax.saxutils.escape.
> I didn't know about this one.
>
> For the rest:
> This is the source code of the xml.sax.saxutils.escape function:
>
> ---------------------------------------
> def __dict_replace(s, d):
>     """Replace substrings of a string using a dictionary."""
>     for key, value in d.items():
>         s = s.replace(key, value)
>     return s
>
> def escape(data, entities={}):
>     """Escape &, <, and > in a string of data.
>
>     You can escape other strings of data by passing a dictionary as
>     the optional entities parameter.  The keys and values must all be
>     strings; each key will be replaced with its corresponding value.
>     """
>
>     # must do ampersand first
>     data = data.replace("&", "&amp;")
>     data = data.replace(">", "&gt;")
>     data = data.replace("<", "&lt;")
>     if entities:
>         data = __dict_replace(data, entities)
>     return data
> -----------------------------------------
>
> As you can see, it too uses str.replace to do the job.
> However, using an existing standard-library function that does what you
> want is preferable: it's tested and might also be optimized to be faster.
> It's easy and fun to look into the source, though, and know exactly what
> something does.
> It's also one of the ways for a beginner (me too) to progress.
>
> From the source code I can see this, for example:
> *Don't pass the entity dictionary I proposed earlier to this function:*
> entities = {'&' : '&amp;',
>             '<' : '&lt;',
>             '>' : '&gt;',
>             '"' : '&quot;',
>             "'" : '&apos;'}
> escape() always replaces '&' first, so if you also pass an entity for
> '&', it gets replaced a second time in the already escaped string
> ('<' becomes '&lt;' and then '&amp;lt;'), resulting in chaos.
>
> Come to think of it, this function not checking for an '&' entity passed
> to it might be worth qualifying as a bug :)
>
> Nick
>
>
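To make the pitfalls in Nick's message concrete, here is a small Python 3 sketch using the real `xml.sax.saxutils.escape`, which performs the same three replacements as the 2.x source quoted above. The helper `escape_wrong_order` is hypothetical, written only to show why the ampersand has to be replaced first, and `NICK_ENTITIES` is simply Nick's dictionary.

```python
from xml.sax.saxutils import escape

# Normal use: '&', '<' and '>' are always escaped, '&' first.
plain = escape("if a < b && b > c")

def escape_wrong_order(data):
    """Hypothetical helper: the same replacements, but '&' done last."""
    data = data.replace("<", "&lt;")
    data = data.replace(">", "&gt;")
    # This re-escapes the '&' that '&lt;' and '&gt;' just introduced.
    return data.replace("&", "&amp;")

wrong = escape_wrong_order("a < b")

# Nick's dictionary, which includes an '&' key.
NICK_ENTITIES = {'&': '&amp;', '<': '&lt;', '>': '&gt;',
                 '"': '&quot;', "'": '&apos;'}

# escape() has already turned '<' into '&lt;' before the dictionary is
# applied, so the '&' key escapes that ampersand a second time.
doubled = escape("a < b", NICK_ENTITIES)

print(plain)    # if a &lt; b &amp;&amp; b &gt; c
print(wrong)    # a &amp;lt; b
print(doubled)  # a &amp;lt; b
```

Passing only the extra entities, e.g. `escape(s, {'"': '&quot;', "'": '&apos;'})`, avoids the problem, since the three basic characters are handled anyway.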
Yes, duplicating the &amp; entity would be a bug in the caller's code
in this case (see my posted improvements to the OP's code, which removed
the entities variable entirely).  The question is whether this function's
documentation should carry such a warning, or whether the function should
make sure double substitution does not happen.

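The &amp; entity is the only predefined entity in the XML standard that
has this problem, because '&' is the only replaced character that also
appears in the replacement text: every entity is written as '&name;'.
For example, there's no entity that replaces the letter 'a' or the
semicolon, and a quote sign is never used within an encoded entity.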
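The argument can be checked mechanically. A minimal Python 3 sketch, assuming the five predefined XML entities are exactly the ones in Nick's dictionary; the function name `keys_found_in_replacements` is made up for illustration. A key can only cause double substitution if its text also occurs in some replacement, and for these five entities that holds for '&' alone.

```python
# The five entities predefined by XML (same as Nick's dictionary).
PREDEFINED = {'&': '&amp;', '<': '&lt;', '>': '&gt;',
              '"': '&quot;', "'": '&apos;'}

def keys_found_in_replacements(entities):
    """Hypothetical helper: keys whose text occurs inside some replacement.

    Only such keys can rewrite output that an earlier replacement produced.
    """
    return sorted(key for key in entities
                  if any(key in value for value in entities.values()))

print(keys_found_in_replacements(PREDEFINED))  # ['&']
```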

I think perhaps an improved version would either ignore an '&' key in the
supplied dictionary, or raise an exception if one is encountered.  The
question that must always be answered is whether this could break
existing code.
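Both options can be sketched as thin wrappers around the existing function. The names `strict_escape` and `lenient_escape` are hypothetical, not part of the standard library, and whether either behavior is acceptable is exactly the compatibility question raised above.

```python
from xml.sax.saxutils import escape

def strict_escape(data, entities=None):
    """Hypothetical variant: raise if the caller passes an '&' key."""
    entities = dict(entities or {})
    if '&' in entities:
        raise ValueError("'&' is always escaped; do not pass it in entities")
    return escape(data, entities)

def lenient_escape(data, entities=None):
    """Hypothetical variant: silently drop an '&' key before escaping."""
    entities = {key: value for key, value in (entities or {}).items()
                if key != '&'}
    return escape(data, entities)

full = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;'}
print(lenient_escape("a < b & c", full))  # a &lt; b &amp; c
try:
    strict_escape("a < b", full)
except ValueError as err:
    print("rejected:", err)
```

The strict version surfaces the caller's mistake; the lenient one hides it, which is friendlier but could mask code that really did expect the '&' entry to be applied.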

There are legitimate reasons for a string to be escaped twice.  Think of
what happens when a website wants to quote some HTML source code.  Or,
a little less recursively, suppose you have a website teaching XML: the
examples posted would need to be double-escaped.  However, if someone
had tried to do that in a single call to the current function, their
code would already be broken, because the dictionary doesn't preserve
order, so the & substitution might not happen at the right point relative
to the others.  Such a user must call the escape function twice, without
passing & at all.
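A minimal Python 3 sketch of the two-call approach for the XML-teaching case, assuming the only extra entity wanted is `&quot;` (the snippet is made up for illustration). Since Python 3.7 dictionaries do keep insertion order, but a single call would still only work if '&' happened to come last in the dictionary, so two calls remain the clearer choice.

```python
from xml.sax.saxutils import escape

QUOTE = {'"': '&quot;'}  # extra entity; '&', '<' and '>' are handled anyway

snippet = '<a href="x">Tom & Jerry</a>'

# First pass: the snippet as escaped character data, as the lesson shows it.
once = escape(snippet, QUOTE)
# Second pass: escape again so a browser displays that escaped text literally.
twice = escape(once, QUOTE)

print(once)   # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
print(twice)  # &amp;lt;a href=&amp;quot;x&amp;quot;&amp;gt;Tom &amp;amp; Jerry&amp;lt;/a&amp;gt;
```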

DaveA


