At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
What do you think of making the encoding attribute a mandatory part of creating an ebytes object? (e.g. ``eb = ebytes(b, 'euc-jp')``).
As long as the coercion rules force str+ebytes (or str % ebytes, ebytes % str, etc.) to result in another ebytes (and fail if the str can't be encoded in the ebytes' encoding), I'm personally fine with it, although I really like the idea of tacking the encoding onto bytes objects in the first place.

OTOH, one potential problem with having the encoding on the bytes object rather than on a separate ebytes object is that you then can't easily take bytes from a socket and say what encoding they are in without interfering with the sockets API (or whatever other place you get the bytes from).

So, on balance, making ebytes a separate type (perhaps one that's just a pointer to the bytes and a pointer to the encoding) would indeed make more sense. Having different coercion rules for interacting with strings would make more sense in that case, too.

(The ideal, of course, would still be to not let bytes objects be string-like at all, with only ebytes acting string-like. That way, you'd be forced to be explicit about your encoding when working with bytes, but all you'd need to do is make an ebytes call.)
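
To make the coercion rule concrete, here is a minimal sketch of what such a type might look like. Everything in it is an assumption for illustration: no ``ebytes`` class exists in the standard library, and the attribute names ``data`` and ``encoding``, the helper ``_to_raw``, and the use of ``codecs.lookup`` to normalize encoding names are all hypothetical choices. It covers only ``+``; the ``%`` cases are left out.

```python
import codecs


class ebytes:
    """Hypothetical sketch: a bytes payload paired with its encoding."""

    def __init__(self, data, encoding):
        self.data = bytes(data)
        # Normalize the name so 'euc-jp' and 'EUC_JP' compare equal (assumption).
        self.encoding = codecs.lookup(encoding).name

    def _to_raw(self, other):
        """Return other as bytes in our encoding, or None if unsupported."""
        if isinstance(other, ebytes):
            if other.encoding != self.encoding:
                raise ValueError("cannot mix %s with %s"
                                 % (self.encoding, other.encoding))
            return other.data
        if isinstance(other, str):
            # Raises UnicodeEncodeError if the str can't be encoded.
            return other.encode(self.encoding)
        # Plain bytes are deliberately rejected: their encoding is unknown.
        return None

    def __add__(self, other):    # ebytes + str, ebytes + ebytes
        raw = self._to_raw(other)
        if raw is None:
            return NotImplemented
        return ebytes(self.data + raw, self.encoding)

    def __radd__(self, other):   # str + ebytes
        raw = self._to_raw(other)
        if raw is None:
            return NotImplemented
        return ebytes(raw + self.data, self.encoding)

    def __str__(self):
        return self.data.decode(self.encoding)


# Bytes as they might arrive from a socket, tagged after the fact.
received = "\u65e5\u672c".encode("euc-jp")
eb = ebytes(received, "euc-jp")
combined = "\u8a9e" + eb         # str + ebytes yields another ebytes
```

Because the wrapper only holds a reference to the bytes and the encoding name, whatever API produced the bytes does not need to know about encodings at all; the caller tags them afterwards with a single ``ebytes(...)`` call.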