[I18n-sig] string encoding attribute (Strawman Proposal: Binary
Mon, 12 Feb 2001 11:53:55 +0100
Toby Dickenson wrote:
> On Sat, 10 Feb 2001 15:56:10 +0100, "M.-A. Lemburg" <email@example.com>
> >Note that changing e.g. .encode('latin-1') to return a binary string
> >doesn't really make sense, since here we know the encoding ! Instead,
> >strings should probably carry along the encoding information in an
> >additional attribute (it is not always useful, but can help in
> >a few situations) provided that it is known.
> To what use would that encoding attribute be put?
The lack of encoding information is the cause of all the problems
related to coercing 8-bit strings to Unicode. If we had this
information on a per-string basis, then we could do a *much*
> surely not to
> provide automatic encoding when these tagged strings interact with
> unicode strings (That's back towards the solution that I think we
> already ruled out)
Depends on who "we" is ;-) I believe that we should reconsider
the idea on different grounds.
Back when this was discussed on
python-dev, the main argument against adding such an attribute
was that its value would be coerced to 'binary' much too
fast to be of any value. That was certainly true at the time,
but the ideas currently being tossed around on this list suggest that
we are moving towards a clearer distinction between binary and
In the current context, the attribute could well be used to
avoid using magic when it comes to guessing the encoding of 8-bit
strings at coercion time.
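
As a rough sketch only: the names TaggedBytes and coerce_to_text below
are hypothetical, and a bytes subclass in present-day Python stands in
for the 8-bit string type. It illustrates coercion that consults the
attribute and refuses to guess when no encoding is known.

```python
class TaggedBytes(bytes):
    """Hypothetical 8-bit string that may carry its encoding along."""

    def __new__(cls, data, encoding=None):
        self = super().__new__(cls, data)
        # None means "encoding unknown", i.e. plain binary data.
        self.encoding = encoding
        return self


def coerce_to_text(value):
    """Coerce an 8-bit string to Unicode using its tag, never a guess."""
    encoding = getattr(value, "encoding", None)
    if encoding is None:
        # No tag available: fail loudly instead of assuming a default.
        raise TypeError("8-bit string has no known encoding")
    return value.decode(encoding)


tagged = TaggedBytes(b"caf\xe9", "latin-1")
text = coerce_to_text(tagged)
```

An untagged 8-bit string passed to coerce_to_text raises TypeError
here; whether that is the right fallback is a design choice the sketch
does not settle.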
> If .encode('latin1') or .encode('utf8') are going to return anything
> tagged with an encoding, then surely it should be a tagged binary
No. The encoding attribute would then return 'latin-1' and 'utf-8'
resp. -- that's the point of the attribute: it should store the
encoding information in case it is available.
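
To make that concrete, here is a minimal sketch under stated
assumptions: EncodedString and encode_tagged are hypothetical names,
not an existing API, and the 8-bit string is modelled as a bytes
subclass in present-day Python. The result of encoding simply records
the encoding name it was given.

```python
class EncodedString(bytes):
    """Hypothetical 8-bit string with an optional encoding attribute."""

    def __new__(cls, data, encoding=None):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self


def encode_tagged(text, encoding):
    """Encode text and remember which encoding produced the bytes."""
    return EncodedString(text.encode(encoding), encoding)


latin = encode_tagged("caf\xe9", "latin-1")
utf = encode_tagged("caf\xe9", "utf-8")
# The tag lets the original text be recovered without guessing.
roundtrip = utf.decode(utf.encoding)
```

Here latin.encoding is 'latin-1' and utf.encoding is 'utf-8', which
is all the attribute is meant to convey.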
Python Pages: http://www.lemburg.com/python/