Python 3 is killing Python

wxjmfauth at gmail.com wxjmfauth at gmail.com
Wed Jul 16 15:22:46 CEST 2014


On Wednesday, 16 July 2014 15:11:26 UTC+2, Marko Rauhamaa wrote:
> Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
>
> > With a few exceptions, /etc is filled with text files, not binary
> > files, and half the executables on the system are text (Python, Perl,
> > bash, sh, awk, etc.).
>
> Our debate seems to stem from a different idea of what text is. To me,
> text in the Python sense is a sequence of UCS-4 character code points.
> The opposite of text is not necessarily binary.
>
> Most of those "text" files under /etc expect ASCII. In many contexts,
> they tolerate UTF-8 or Latin-3 or whatever, but it's a bit iffy (how are
> extra-ASCII passwords encoded in the /etc/shadow?). Also, the files
> under /etc, /var/log etc should not depend on the locale since they are
> typically interpreted by daemons, which typically don't possess locales.
>
> > Relatively rare. Like, um, email, news, html, Unix config files,
> > Windows ini files, source code in just about every language ever,
> > SMSes, XML, JSON, YAML, instant messenger apps,
>
> I would be especially wary of letting Python 3 interpret those files for
> me. Python's [text] strings could be a wonderful tool on the inside of
> my program, but I definitely would like to micromanage the I/O. Do I
> obey the locale or not? That's too big (and painful) a question for
> Python to answer on its own (and pretend like everything's under
> control).
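
To make the "micromanage the I/O" idea concrete, here is a minimal sketch, assuming the file is known to be UTF-8. The helper name read_text_explicit and the file name demo.conf are made up for illustration. It reads raw bytes and decodes them explicitly, so the locale never gets a say:

```python
import os
import tempfile


def read_text_explicit(path, encoding="utf-8"):
    """Read raw bytes and decode them with a stated encoding.

    Hypothetical helper: the encoding is an assumption supplied by the
    caller, not something detected from the file or the locale.
    """
    with open(path, "rb") as f:
        raw = f.read()
    return raw.decode(encoding)


# Write a small UTF-8 file to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.conf")
    with open(path, "wb") as f:
        f.write("name = Hyvää yötä\n".encode("utf-8"))

    decoded = read_text_explicit(path)

    # Text mode with an explicit encoding also ignores the locale.
    with open(path, encoding="utf-8") as f:
        via_open = f.read()

# Print only ASCII-safe representations so the demo runs on any terminal.
print(ascii(decoded))
print(decoded == via_open)
```

Either way, the encoding is stated in the program rather than taken from the environment.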
>
> > word processors... even *graphic* applications invariably have a text
> > tool.
>
> Thing is, the serious text utilities like word processors probably need
> lots of ancillary information so Python's [text] strings might be too
> naive to represent even a single character.
>
> >> More often, len(b'λ') is what I want.
> >
> > Oh really? Are you sure? What exactly is b'λ'?
>
> That's something that ought to work in the UTF-8 paradise.
> Unfortunately, Python only allows ASCII in bytes. ASCII only! In this
> day and age! Even C is not so picky:
>
>    #include <stdio.h>
>
>    int main()
>    {
>        printf("Hyvää yötä\n");
>        return 0;
>    }
>
> Marko
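
For reference, a small sketch of how Python 3 behaves on the b'λ' question, assuming UTF-8 is the encoding one wants to count bytes in (the variable names are only illustrative). The literal itself is rejected at compile time, so the byte count has to go through an explicit encode():

```python
# A bytes literal containing a non-ASCII character is a SyntaxError in
# Python 3; compile() lets us observe that without breaking this script.
try:
    compile("b'λ'", "<demo>", "eval")
    literal_accepted = True
except SyntaxError:
    literal_accepted = False

# Counting bytes means choosing an encoding explicitly (UTF-8 assumed here).
lam = "λ"
lam_utf8 = lam.encode("utf-8")

greeting = "Hyvää yötä"
greeting_utf8 = greeting.encode("utf-8")

print(literal_accepted)                   # is b'λ' accepted as a literal?
print(len(lam), len(lam_utf8))            # code points vs. UTF-8 bytes
print(len(greeting), len(greeting_utf8))  # same comparison for the C string
```

The two lengths differ whenever the text leaves ASCII, which is exactly the case being argued about.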

--------

And if you browse the bug tracker, the dev lists
and so on, you will happily find this perpetual
way of thinking:

if ascii:
    do stuff
else:
    find a work around

It's quite funny for a tool which claims
to live in the Unicode world.


jmd


