String is ASCII or UTF-8?

C. Benson Manica cbmanica at gmail.com
Tue Mar 9 12:17:58 EST 2010


On Mar 9, 12:07 pm, Tim Golden <m... at timgolden.me.uk> wrote:

> You can't. You can apply one or more heuristics, depending on exactly
> what your requirement is. But any valid ASCII text is also valid
> UTF8-encoded text since UTF-8 isn't "two bytes per char" but a variable
> number of bytes per char.
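
To illustrate the quoted point, here is a minimal sketch (Python 3 syntax, which is an assumption; the sample characters are arbitrary). Pure-ASCII bytes decode to the same text under either codec, and UTF-8 uses between one and four bytes per character:

```python
# Pure-ASCII bytes are also valid UTF-8 and decode to identical text.
ascii_bytes = b"hello"
same_text = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# UTF-8 is variable-width: each code point takes 1 to 4 bytes.
samples = ["a", "\u00e9", "\u20ac", "\U0001f600"]
widths = [len(ch.encode("utf-8")) for ch in samples]

print(same_text)  # True
print(widths)     # [1, 2, 3, 4]
```

Because the ASCII case is byte-for-byte identical under both encodings, there is nothing in such a string that could tell the two apart.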

Hm, well that's very unfortunate.  I'm using a database library which
seems to assume that all strings passed to it are ASCII, and I'm
attempting to use it on two different systems - one where all strings
are ASCII, and one where they seem to be UTF-8.  The strings come from
the same place, i.e. they're exclusively normal ASCII characters.
What I want is to check once whether the strings passed to
function foo() are ASCII or UTF-8, and if they are UTF-8, to assume that
all strings need to be decoded.  So that's not possible?
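
For what it's worth, one heuristic along these lines might look like the sketch below. It assumes Python 3 and byte-string input; classify() and to_text() are hypothetical helper names, not part of any database library. Since ASCII-only data decodes identically under both codecs, it may be simpler to skip the check and always decode as UTF-8:

```python
def classify(data: bytes) -> str:
    """Heuristic: return 'ascii', 'utf-8', or 'unknown' for a byte string."""
    try:
        data.decode("ascii")
        return "ascii"  # every byte is below 0x80
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"  # valid UTF-8 with at least one non-ASCII byte
    except UnicodeDecodeError:
        return "unknown"  # some other encoding, or binary data


def to_text(value):
    """Hypothetical helper: decode bytes as UTF-8, pass text through."""
    if isinstance(value, bytes):
        # Safe for ASCII-only input too, since ASCII is a subset of UTF-8.
        return value.decode("utf-8")
    return value
```

With this, to_text(b"abc") and to_text("abc") both give "abc", so foo() could call it on every argument regardless of which system it runs on.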


