
Nick Coghlan writes:
On Fri, May 27, 2011 at 12:02 PM, INADA Naoki <songofacandy@gmail.com> wrote:
Then, I hope bytes has a fast and efficient "format" method like:
I still don't see a use case for a fast and efficient bytes.format() method. The latin-1 codec is O(n) with a very small coefficient. It seems to me this is "really" all about TOOWTDI: we'd like to be able to interpolate data received as arguments into a data stream using the same idiom everywhere, whether the stream consists of text, bytes, or class Froooble instances. (I admit I don't offhand know how you'd spell "{0}" in a Froooble stream.) OK, so at present only bytes is a plausible application, but I'm willing to go there. Then, if it turns out that the latin-1 codec imposes too high an overhead on .format() in some application, the concerned parties can optimize it.
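For concreteness, here is a minimal sketch of the latin-1 round trip I have in mind, using only what exists today. The helper name format_bytes_via_latin1 is made up for illustration; it is not a proposed API, and it handles only what str.format() already handles.

```python
def format_bytes_via_latin1(template, *args):
    """Hypothetical helper: interpolate into a bytes template via str.format().

    latin-1 maps bytes 0..255 to code points 0..255 one-to-one, so decoding
    and re-encoding leaves arbitrary binary data unchanged.
    """
    # Decode bytes arguments so str.format() sees text, not b'...' reprs.
    text_args = [a.decode("latin-1") if isinstance(a, bytes) else a
                 for a in args]
    text = template.decode("latin-1").format(*text_args)
    # Raises UnicodeEncodeError if a str argument falls outside latin-1.
    return text.encode("latin-1")


result = format_bytes_via_latin1(b"{0} {1}", 23, b"foo")
binary = format_bytes_via_latin1(b"<{0}>", b"\xff\x00")
```

Both the decode and the encode are single linear passes over the data, which is the "O(n) with a very small coefficient" cost mentioned above.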
b'{0} {1}'.format(23, b'foo') # accepts int, float, bytes, bool, None
I don't see a use case for accepting bool or None. I hadn't thought about float, but are you really gonna need it? On-the-fly generation of CSS "'{0}em'.format(0.5)" or something like that, I guess?
23 foo
b'{0}'.format('foo') # raises TypeError for other types.
Phillip Eby has a use case for accepting str as long as the ascii codec in strict error mode works on the particular instances of str. Although I'm not sure he would consider a .format() method efficient enough, ISTR he wanted the compiler to convert literals.
TypeError
What method is invoked to convert the numbers to text? What encoding is used to convert those numbers to text? How does this operation avoid also converting the *bytes* object to text and then reencoding it?
OTOH, Nick, aren't you making this harder than it needs to be? After all,
Bytes are not text.
Precisely. So bytes.format() need not handle *all* text-like manipulations, just protocol magic that puns ASCII-encoded text. If a bytes object is displayed sorta like text, then it *is* *all* bytes in the ASCII repertoire (not even the right half of Latin-1 is allowed). In bytes.format(), bytes are bytes, they don't get encoded, they just get interpolated into the bytes object being created. For other stuff, especially integers, if there is a conventional representation for it in ASCII, it *might* be an appropriate conversion for bytes.format() (but see above for my reservations about several common Python types). str (Unicode) might be converted via the ascii codec in strict errors mode, although the purist in me really would rather not go there. AFAICS, this handles all use cases presented so far.
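To make those rules concrete, here is a rough sketch of the semantics as a plain function. Everything in it is an assumption for illustration: the name bytes_format, the restriction to positional "{N}" fields, and the choice to reject bool, float and None outright. It is not an existing or proposed implementation.

```python
import re

# Only positional fields such as b"{0}" are recognized in this sketch.
_FIELD = re.compile(rb"\{(\d+)\}")


def bytes_format(template, *args):
    """Hypothetical sketch of the bytes.format() rules described above."""

    def convert(match):
        value = args[int(match.group(1))]
        if isinstance(value, bytes):
            # Bytes are bytes: interpolated as-is, never decoded or encoded.
            return value
        if isinstance(value, bool):
            # bool is a subclass of int, so it must be rejected first.
            raise TypeError("bool is not accepted")
        if isinstance(value, int):
            # Conventional ASCII representation: decimal digits.
            return str(value).encode("ascii")
        if isinstance(value, str):
            # Strict ascii codec; non-ASCII text raises UnicodeEncodeError.
            return value.encode("ascii", "strict")
        raise TypeError("cannot interpolate %s" % type(value).__name__)

    # With a function as the replacement, re.sub() uses the returned
    # bytes literally, with no backslash processing.
    return _FIELD.sub(convert, template)


line = bytes_format(b"{0} {1}", 23, b"foo")
raw = bytes_format(b"{0}", b"\xff\xfe")
```

Note that in this sketch a non-ASCII str fails with UnicodeEncodeError rather than TypeError; which exception a real method should raise is left open here.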
The pedagogic cost of making it even harder than it already is to convince people that bytes are not text would also need to be considered.
This bothers me quite a bit, but my sense is that practicality is going to beat purity (into a bloody pulp :-P) once again.