[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

INADA Naoki songofacandy at gmail.com
Thu May 26 04:57:24 CEST 2011


On Thu, May 26, 2011 at 10:58 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 5/25/2011 1:29 PM, INADA Naoki wrote:
>
>> Sadly, Python 3's bytes is not bytestring.
>
> By intention.

Yes, I know. But I feel sad because it causes a lot of confusion.
Bytes supports some string methods, but not others.

>>> b"foo".capitalize()  # Oh,
b'Foo'
>>> b"foo".isalpha()   # alphabetic, in a non-string?
True
>>> b"foo%d" % 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'int'



>
>> import sys
>> fin = open(sys.stdin.fileno(), 'r', encoding='latin1')
>> fout = open(sys.stdout.fileno(), 'w', encoding='latin1')
>> for n, L in enumerate(fin):
>>     fout.write('{0:5d}\t{1}'.format(n, L))
>>
>> If using 'latin1' is Pythonic way to handle encoding transparent string,
>> I think Python should provide another alias like 'bytes'.
>
> I presume that you mean you would like to write
> fin = open(sys.stdin.fileno(), 'r', encoding='bytes')
> fout = open(sys.stdout.fileno(), 'w', encoding='bytes')
>
> If such a thing were added, the 256 bytes should directly map to the first
> 256 codepoints. I don't know if 'latin1' does that or not. In any case,

Yes, 'latin1' directly maps 256 bytes to 256 codepoints.
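
A quick check of that mapping, as a minimal sketch (the names all_bytes
and text are only for illustration):

```python
# Every possible byte value, 0 through 255.
all_bytes = bytes(range(256))

# Decoding as latin-1 never fails: byte N becomes code point N.
text = all_bytes.decode('latin-1')
assert [ord(c) for c in text] == list(range(256))

# Encoding back restores exactly the original bytes.
assert text.encode('latin-1') == all_bytes
```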

> one
> can rewrite the above without decoding input lines.
>
> with open('tem.py', 'rb') as fin, open('tem2.txt', 'wb') as fout:
>  for n, L in enumerate(fin):
>    fout.write('{0:5d}\t'.format(n).encode('ascii'))
>    fout.write(L)
>
> (sys.x.fileno raises AttributeError in IDLE.)
>

There are 2 problems.

1) Binary mode doesn't support line buffering, so I would have to disable
   buffering, which may cause a performance regression.

2) Requiring .encode('ascii') is less attractive when using Python as a
   scripting language on Unix.

But the latin1 approach has the disadvantage of worse performance and
higher memory usage.

I think Python 3 doesn't provide an easy and efficient way to implement an
encoding-transparent command like 'cat -n'. It's very sad.
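
For illustration only, a minimal sketch of the latin1 approach applied to
arbitrary binary streams. The function name number_lines and the sample
data are hypothetical; newline='\n' is an assumption added here so that
'\r\n' is not translated and the bytes pass through unchanged.

```python
import io

def number_lines(raw_in, raw_out):
    """Copy raw_in to raw_out, prefixing each line with its index."""
    # latin-1 maps each byte to the code point with the same value, so
    # decoding and re-encoding never fails and never changes a byte.
    # newline='\n' disables newline translation in both directions.
    fin = io.TextIOWrapper(raw_in, encoding='latin-1', newline='\n')
    fout = io.TextIOWrapper(raw_out, encoding='latin-1', newline='\n')
    for n, line in enumerate(fin):
        fout.write('{0:5d}\t{1}'.format(n, line))
    fout.flush()
    # Detach so the wrappers do not close the caller's streams.
    fin.detach()
    fout.detach()

# Sample input: UTF-8 bytes with a CRLF ending, then bytes that are not
# valid UTF-8 at all.
src = io.BytesIO(b'caf\xc3\xa9\r\n\xff\xfe raw\n')
dst = io.BytesIO()
number_lines(src, dst)
result = dst.getvalue()
```

The payload bytes come out exactly as they went in, but every line is
still decoded to str and encoded again, which is the performance and
memory cost mentioned above.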

-- 
INADA Naoki  <songofacandy at gmail.com>


