[Python-Dev] Re: Allowing u.encode() to return non-strings

Terry Reedy tjreedy at udel.edu
Wed Jun 30 12:32:50 EDT 2004


"Bill Janssen" <janssen at parc.com> wrote in message
news:04Jun29.224113pdt."58612"@synergy1.parc.xerox.com...
> is that the byte vectors we tend to call strings in Python have no
> string-ness, as understood in the 21st century.

Python strings are sequences of 0 to n chars from an abstract 256-char
alphabet.  This meets my understanding of the standard 20th century CS
definition of string.  Has there been a significant change in the last few
years?

>  There is no character set associated with them,

The byte set is intentionally not any *particular* natural language char
set, but a possible carrier for any of them.  Perhaps unfortunately, it
lacks a single standard glyph set or graphic representation, but I believe
Unicode also differentiates between characters (code points?) and glyphs
(which are also not standardized).  The byte set also (fortunately) lacks
the complications of letters, capitals, signs, marks, ligatures, symbols,
and so on, which complications usually make the character set for a
particular language somewhat fuzzy.
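
To make the "carrier" point concrete, here is a minimal sketch in present-day
Python. It assumes the Python 3 bytes type and the standard "latin-1" and
"cp1251" codecs, none of which is the 2004 string type under discussion. The
same one-byte value carries a different character depending on which charset
the reader chooses to apply:

```python
# One byte with the value 0xE9; by itself it belongs to no natural-language charset.
raw = bytes([0xE9])

# The interpretation is supplied from outside, at decode time.
as_latin1 = raw.decode("latin-1")  # U+00E9, LATIN SMALL LETTER E WITH ACUTE
as_cp1251 = raw.decode("cp1251")   # U+0439, CYRILLIC SMALL LETTER SHORT I

print(len(raw), hex(ord(as_latin1)), hex(ord(as_cp1251)))  # 1 0xe9 0x439
```

The byte itself never changes; only the mapping laid over it does.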

> documentation, particularly the language manual, is extremely
> confusing on this point, in classifying "string" and "Unicode" objects
> as the same sort of thing.

I think it a matter of viewpoint whether one emphasizes the similarities or
differences.

> And then not documenting them clearly.

The subject of strings, Unicode, internationalization, and Python could use
a manual in itself.

> Unicode ... is not integrated with the file streams support.

Reading numbers other than bytes is also not integrated with the file type.
Adding a 'bytes' parameter to file(), or a readbytes(n) method, would be
generally helpful for anyone wanting to iterate thru a file in chunks other
than 'lines'.
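
As a rough sketch of what such chunked iteration might look like, the helper
below reads a binary stream in fixed-size blocks. The name iter_chunks is
hypothetical (it is not the readbytes(n) method suggested above, which does not
exist), and the code assumes present-day Python 3 with only the standard
library:

```python
import io
from functools import partial

def iter_chunks(stream, size):
    """Return an iterator over blocks of at most `size` bytes from a binary stream.

    Hypothetical helper for illustration; not part of the file type.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # iter(callable, sentinel) keeps calling stream.read(size) until it
    # returns b"", which is what a binary stream yields at end of file.
    return iter(partial(stream.read, size), b"")

# An in-memory stream stands in for a file opened in binary mode.
data = io.BytesIO(b"abcdefghij")
chunks = list(iter_chunks(data, 4))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

The final block is shorter than the requested size when the data length is not
a multiple of it, so a caller expecting fixed-width records would need to check
for that.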

Terry J. Reedy





