
On Sun, Apr 28, 2013 at 5:12 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
> I must be missing something - why not something like
>
>     def ensure_unicode(s):
>         if not isinstance(s, text_type):
>             s = s.decode('unicode_escape')
>         return s
>
> assuming you only ever pass bytestrings or Unicode in s, and text_type is unicode on 2.x and str on 3.x?
Because I don't *have* unicode in 2.x, only bytes. All the values coming from os, platform, etc. are bytes, and they don't necessarily have a specified encoding. For that matter, the strings I'm parsing are also bytes, with no specified encoding. Going to unicode is an invitation to platform-specific or machine-specific encoding errors.
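As a minimal sketch of the encoding risk described above: the same bytes decode to text under one codec and fail under another. The sample value and the try_decode helper are hypothetical, chosen only for illustration, and nothing here comes from the code under discussion.

```python
# Hypothetical bytes, standing in for a value returned by os or platform.
# Nothing records which encoding produced them.
raw = b'caf\xe9'

def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

as_latin1 = try_decode(raw, 'latin-1')  # succeeds: every byte value is valid Latin-1
as_utf8 = try_decode(raw, 'utf-8')      # None: 0xE9 opens a multi-byte sequence that never completes
as_ascii = try_decode(raw, 'ascii')     # None: 0xE9 is outside the ASCII range
```

Which of these outcomes you get depends on the codec chosen, and a codec taken from the machine's locale can differ from one machine to the next.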
> Sadly, some type checking is unavoidable if you can't control what people pass in to your code, but it seems easy enough to deal with from a pragmatic point of view.
It's not the type that's the problem, it's the *contents* that make a difference here.
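One possible illustration of contents mattering more than type, using the unicode_escape codec from the quoted suggestion. The sample path is hypothetical, and this is only one of several ways the contents of a bytestring can change the result.

```python
# A bytestring holding literal backslashes, as a Windows-style path might.
raw = b'C:\\temp\\new'

# unicode_escape interprets backslash sequences inside the data,
# so "\t" becomes a tab and "\n" becomes a newline.
decoded = raw.decode('unicode_escape')

# A plain byte-to-character mapping leaves the backslashes alone.
plain = raw.decode('latin-1')

contents_changed = decoded != plain
```

Both inputs to the two decode calls have the same type; only the presence of backslashes in the data makes the results differ.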