
On Thu, May 26, 2011 at 3:29 AM, INADA Naoki <songofacandy@gmail.com> wrote:
> There are some situation that I want to use bytes as a string in real world.

Breaking the bytes-are-text mental model is something we deliberately set out to do with Python 3 (because it is wrong). In today's global environment, programmers *need* to learn about text encoding issues, as treating bytes as text without finding out the encoding first is a surefire way to get unintelligible mojibake. If "What does 'latin-1' mean?" is a question that gets them there, then that's fine.

You *cannot* transparently handle data in arbitrary encodings, as the meanings of the bytes change based on the encoding (this is especially true when dealing with encodings that are not ASCII-compatible).

That said, decoding and reencoding via 'ascii' (strict 7-bit) or 'latin-1' (full 8-bit) is the easiest way to handle both strings and bytes input reasonably efficiently. See urllib.parse for examples of how to do that.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
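
The decode/re-encode approach mentioned in the message can be sketched roughly as follows. This is only an illustration, not the code urllib.parse actually uses: the function name normalize_path and its slash-collapsing behaviour are hypothetical, and the sketch assumes the bytes are in an ASCII-compatible encoding and that the processing only inspects ASCII characters such as "/".

```python
def normalize_path(path):
    """Hypothetical helper: collapse repeated slashes in str or bytes input.

    Returns the same type it was given.
    """
    is_bytes = isinstance(path, bytes)

    # 'latin-1' maps every byte value 0-255 to the code point with the
    # same number, so decoding never fails and the round trip is lossless.
    text = path.decode("latin-1") if is_bytes else path

    # The actual logic is written once, against str only. It touches
    # nothing but the ASCII character "/", so non-ASCII bytes pass
    # through unchanged (assumption: the input encoding is
    # ASCII-compatible).
    parts = [part for part in text.split("/") if part]
    result = "/".join(parts)

    # Re-encode with the same codec so bytes callers get bytes back.
    return result.encode("latin-1") if is_bytes else result
```

Swapping 'latin-1' for 'ascii' would instead reject any input containing bytes above 0x7F with a UnicodeDecodeError, which is the stricter of the two options.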