[Tutor] encoding question
spir
denis.spir at gmail.com
Sun Jan 5 11:06:34 CET 2014
On 01/05/2014 03:31 AM, Alex Kleider wrote:
> I've been maintaining both a Python3 and a Python2.7 version. The latter has
> actually opened my eyes to more complexities. Specifically the need to use
> unicode strings rather than Python2.7's default ascii.
So-called Unicode strings are not the solution to all problems. Take your
'á', which can be represented either by 1 "precomposed" code (Unicode code
point) 0xe1, or basically by 2 codes (one for the "base" 'a', one for the
"combining" '´'). Imagine you search for "Bogotá": how do you know which
representation is used in the text you search? How do you know at all that
there are multiple representations, and what they are? The routine will work
iff, by chance, your *programming editor* (!) used the same representation as
the software used to create the searched text...
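
A small Python 3 sketch of the problem. The strings are written with explicit
escapes so that the result does not depend on what the editor stores; the
variable names and the sample sentence are only illustrative.

```python
precomposed = "Bogot\u00e1"    # 'á' as the single code point U+00E1
decomposed = "Bogota\u0301"    # 'a' followed by U+0301 COMBINING ACUTE ACCENT

# Sample text that happens to use the 2-code representation.
text = "Vivo en " + decomposed + "."

print(len(precomposed), len(decomposed))   # 6 7
print(precomposed == decomposed)           # False
print(precomposed in text)                 # False: the search misses it
```

Both strings look the same when displayed, yet the comparison and the search
fail because the underlying code sequences differ.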
Usually this is the case, because most text-creation software uses precomposed
codes, when they exist, for composite characters. (But this fact just makes the
issue rarer, harder to be aware of, and thus more difficult to cope with
correctly in code. As far as I know nearly no software does it.)
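
One possible way to cope, sketched with the standard unicodedata module, is to
bring both the search term and the text to the same normalization form before
comparing. The helper name normalized_find is hypothetical, and choosing NFC
(precomposed where possible) is an assumption, not the only valid choice.

```python
import unicodedata

def normalized_find(needle, haystack, form="NFC"):
    """Search after normalizing both strings to the same Unicode form.

    Hypothetical helper. The returned index refers to the normalized
    haystack, which may differ from positions in the original text.
    """
    needle = unicodedata.normalize(form, needle)
    haystack = unicodedata.normalize(form, haystack)
    return haystack.find(needle)

text = "Vivo en Bogota\u0301."                 # 'a' + combining acute accent
print(text.find("Bogot\u00e1"))               # -1: plain search misses it
print(normalized_find("Bogot\u00e1", text))   # 8
```

This only handles composed versus decomposed forms; it does not address case,
compatibility characters, or locale-specific matching.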
Denis