Python 3.2 has some deadly infection
wxjmfauth at gmail.com
Fri Jun 6 11:44:57 EDT 2014
On Friday, June 6, 2014 17:25:47 UTC+2, Chris Angelico wrote:
> On Fri, Jun 6, 2014 at 11:24 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> > On 06/05/2014 11:30 AM, Marko Rauhamaa wrote:
> >>
> >> How text is represented is very different from whether text is a
> >> fundamental data type. A fundamental text file is such that ordinary
> >> operating system facilities can't see inside the black box (that is,
> >> they are *not* encoded as far as the applications go).
> >
> > Of course they are. It may be an ASCII-encoding of some flavor or other, or
> > something really (to me) strange -- but an encoding is most assuredly in
> > effect.
>
> Allow me to explain what I think Marko's getting at here.
>
> In most file systems, a file exists on the disk as a set of sectors of
> data, plus some metadata including the file's actual size. When you
> ask the OS to read you that file, it goes to the disk, reads those
> sectors, truncates the data to the real size, and gives you those
> bytes.
>
> It's possible to mount a file as a directory, in which case the
> physical representation is very different, but the file still appears
> the same. In that case, the OS goes reading some part of the file,
> maybe decompresses it, and gives it to you. Same difference. These
> files still contain bytes.
>
> A "fundamental text file" would be one where, instead of reading and
> writing bytes, you read and write Unicode text. Since the hard disk
> still works with sectors and bytes, it'll still be stored as such, but
> that's an implementation detail; and you could format your disk UTF-8
> or UTF-16 or FSR or anything you like, and the only difference you'd
> see is performance.
>
> This could certainly be done, in theory. I don't know how well it'd
> fit with any of the popular OSes of today, but it could be done. And
> these files would not have an encoding; their on-platter
> representations would, but that's purely implementation - the text
> that you wrote out and the text that you read in are the same text,
> and there's been no encoding visible.
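A minimal sketch of the distinction drawn above, using two hypothetical toy classes (ByteVolume and TextVolume, not any real file-system API). The byte store reads whole fixed-size sectors and truncates to the size recorded in its metadata; the text store only ever accepts and returns str and keeps the on-disk encoding as a private detail. The 512-byte sector size is an assumption chosen for illustration.

```python
SECTOR_SIZE = 512  # assumed sector size for this toy model; real devices vary


class ByteVolume:
    """Hypothetical byte-oriented store: files are sectors plus a real size."""

    def __init__(self) -> None:
        # name -> (list of fixed-size sectors, real size in bytes)
        self._files: dict[str, tuple[list[bytes], int]] = {}

    def write(self, name: str, data: bytes) -> None:
        # Split into sectors, zero-padding the last one to the full sector size.
        sectors = [data[i:i + SECTOR_SIZE].ljust(SECTOR_SIZE, b"\0")
                   for i in range(0, len(data), SECTOR_SIZE)]
        self._files[name] = (sectors, len(data))

    def read(self, name: str) -> bytes:
        sectors, size = self._files[name]
        # Read whole sectors, then truncate to the size kept in the metadata.
        return b"".join(sectors)[:size]


class TextVolume:
    """Hypothetical "fundamental text file" store: callers only see str.

    The encoding is a private constructor argument; changing it alters how
    many bytes end up in the sectors, never the text that is read back.
    """

    def __init__(self, encoding: str = "utf-8") -> None:
        self._encoding = encoding
        self._disk = ByteVolume()

    def write(self, name: str, text: str) -> None:
        self._disk.write(name, text.encode(self._encoding))

    def read(self, name: str) -> str:
        return self._disk.read(name).decode(self._encoding)

    def stored_bytes(self, name: str) -> int:
        # Only visible here, as an implementation detail (storage cost).
        return len(self._disk.read(name))


if __name__ == "__main__":
    for enc in ("utf-8", "utf-16-le"):
        vol = TextVolume(enc)
        vol.write("greeting", "Gödel")
        print(enc, vol.read("greeting"), vol.stored_bytes("greeting"))
```

Switching the encoding changes what stored_bytes() reports but not what read() returns, which is the sense in which such a file "has no encoding" from the application's side.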
----------
Of the three (UTF-8, UTF-16, FSR), you can already eliminate one.
That's not good news.
>>> import os, sys
>>> sys.getsizeof('Gödel'.encode('utf-8'))
23
>>> sys.getsizeof('Gödel'.encode('utf-16-le'))
27
>>> sys.getsizeof('Gödel')
42
>>> os.listdir(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
['a.txt', 'kk.bat', 'kk.cmd', 'kk.py', '__pycache__']
>>> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-8'))
61
>>> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-16-le'))
79
>>> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
100
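To see what the sys.getsizeof() figures above are measuring, a sketch of a hypothetical helper (storage_report, not part of any library) that reports the encoded payload lengths next to the totals. sys.getsizeof() counts the whole object, including a fixed header that differs between bytes and str objects and between builds (32- vs 64-bit, Python version), so the absolute totals vary by platform while the payload lengths and the PEP 393 storage width (CPython 3.3+) do not.

```python
import sys


def pep393_width(text: str) -> int:
    """Bytes per code point CPython 3.3+ uses to store this str (PEP 393)."""
    widest = max(map(ord, text), default=0)
    if widest < 0x100:      # ASCII and Latin-1: 1 byte per code point
        return 1
    if widest < 0x10000:    # rest of the BMP: 2 bytes per code point
        return 2
    return 4                # characters outside the BMP: 4 bytes per code point


def storage_report(text: str) -> dict:
    """Hypothetical helper: payload sizes alongside sys.getsizeof() totals."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")
    return {
        "chars": len(text),
        "utf8_payload": len(utf8),
        "utf16_payload": len(utf16),
        "str_width": pep393_width(text),
        # The totals below include per-object overhead and are build-dependent.
        "sizeof_utf8": sys.getsizeof(utf8),
        "sizeof_utf16": sys.getsizeof(utf16),
        "sizeof_str": sys.getsizeof(text),
    }


if __name__ == "__main__":
    for s in ("Gödel", r"D:\jm\Москва\Zürich\Αθήνα\œdipe"):
        print(storage_report(s))
```

For BMP-only text such as the path, the UTF-16 payload and the str's 2-bytes-per-character storage cover the same number of bytes, so the gap between those two sys.getsizeof() totals comes from per-object overhead rather than from the character data itself.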
jmf
More information about the Python-list mailing list