[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
INADA Naoki
songofacandy at gmail.com
Thu May 26 04:57:24 CEST 2011
On Thu, May 26, 2011 at 10:58 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 5/25/2011 1:29 PM, INADA Naoki wrote:
>
>> Sadly, Python 3's bytes is not a bytestring.
>
> By intention.
Yes, I know. But it makes me sad, because it causes a lot of confusion.
Bytes supports some string methods.
>>> b"foo".capitalize()  # Oh!
b'Foo'
>>> b"foo".isalpha()  # alphabetic, even though it's not a string?
True
>>> b"foo%d" % 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'int'
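To make the mismatch concrete, here is a small sketch of how those "string" methods behave on bytes: they recognize only ASCII letters, so a non-ASCII byte such as 0xE9 is treated as opaque data, while the same code point in a str is a letter. The last assertion reflects a later change, not the behavior shown above: Python 3.5 (PEP 461) added %-formatting for bytes, so that line only passes on 3.5 or newer.

```python
# bytes methods only treat ASCII letters as letters; every other byte value
# is opaque data rather than a character.
assert b"foo".capitalize() == b"Foo"
assert b"foo".isalpha()
assert not b"caf\xe9".isalpha()       # 0xE9 is not an ASCII letter
assert "caf\xe9".isalpha()            # but U+00E9 (e-acute) is a letter in str

# Case conversion likewise leaves non-ASCII bytes untouched.
assert b"caf\xe9".upper() == b"CAF\xe9"
assert "caf\xe9".upper() == "CAF\xc9"

# Python 3.5+ only (PEP 461): %-formatting now works on bytes.
assert b"foo%d" % 3 == b"foo3"
```

On Python 3.0 through 3.4 the final assertion raises the TypeError from the session above instead.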
>
>> import sys
>> fin = open(sys.stdin.fileno(), 'r', encoding='latin1')
>> fout = open(sys.stdout.fileno(), 'w', encoding='latin1')
>> for n, L in enumerate(fin):
>>     fout.write('{0:5d}\t{1}'.format(n, L))
>>
>> If using 'latin1' is the Pythonic way to handle encoding-transparent strings,
>> I think Python should provide an alias for it, such as 'bytes'.
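For illustration, such an alias could be approximated today with a codec search function. This is only a sketch: the 'bytes' encoding name and the helper `_bytes_alias` are hypothetical and do not exist in the standard library. It simply delegates to the latin-1 codec, which maps byte value i to code point i for all 256 values, so arbitrary input round-trips unchanged.

```python
import codecs


def _bytes_alias(name):
    """Hypothetical search function: treat 'bytes' as an alias for latin-1."""
    # The codec registry passes search functions a lower-cased encoding name.
    if name == "bytes":
        return codecs.lookup("latin-1")
    return None  # let the other registered search functions handle it


codecs.register(_bytes_alias)

# latin-1 maps each byte 0..255 to the code point with the same value,
# so decoding and then re-encoding gives back the original bytes.
raw = bytes(range(256))
text = raw.decode("bytes")
assert [ord(c) for c in text] == list(range(256))
assert text.encode("bytes") == raw
```

With this registered, `open(sys.stdin.fileno(), 'r', encoding='bytes')` would behave exactly like the 'latin1' version above. The registration is process-wide, which is one reason an official alias would be preferable to each script defining its own.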
>
> I presume that you mean you would like to write
> fin = open(sys.stdin.fileno(), 'r', encoding='bytes')
> fout = open(sys.stdout.fileno(), 'w', encoding='bytes')
>
> If such a thing were added, the 256 bytes should directly map to the first
> 256 codepoints. I don't know if 'latin1' does that or not. In any case,
Yes, 'latin1' maps the 256 byte values directly to the first 256 code points.
> one
> can rewrite the above without decoding input lines.
>
> with open('tem.py', 'rb') as fin, open('tem2.txt', 'wb') as fout:
>     for n, L in enumerate(fin):
>         fout.write('{0:5d}\t'.format(n).encode('ascii'))
>         fout.write(L)
>
> (sys.x.fileno raises an AttributeError in IDLE.)
>
There are two problems.
1) Binary mode doesn't support line buffering, so I would have to disable
buffering, and this may cause a performance regression.
2) Requiring .encode('ascii') is less attractive when using Python as a
scripting language on Unix.
But the latin1 approach has drawbacks in performance and memory usage.
I think Python 3 doesn't provide an easy and efficient way to implement an
encoding-transparent command like 'cat -n'. That's very sad.
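For comparison, here is a sketch of a byte-oriented 'cat -n' under one assumption: Python 3.5 or later, which postdates this thread. PEP 461 bytes %-formatting removes the .encode('ascii') step (problem 2), and flushing after every line stands in for the line buffering that binary mode lacks (problem 1). The names `numbered_lines` and `cat_n` are illustrative, not an existing API.

```python
import sys
from typing import BinaryIO, Iterable, Iterator


def numbered_lines(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Prefix each raw line with its number, like 'cat -n'.

    Lines stay bytes the whole time, so input in any encoding passes
    through unchanged.
    """
    # cat -n counts from 1; the scripts quoted above count from 0.
    for n, line in enumerate(lines, start=1):
        # bytes %-formatting requires Python 3.5+ (PEP 461).
        yield b"%5d\t" % n + line


def cat_n(fin: BinaryIO, fout: BinaryIO) -> None:
    for out_line in numbered_lines(fin):
        fout.write(out_line)
        # Binary mode does not support line buffering (buffering=1),
        # so flush by hand after each line.
        fout.flush()


if __name__ == "__main__":
    cat_n(sys.stdin.buffer, sys.stdout.buffer)
```

Flushing on every line costs roughly what line buffering would; a script that doesn't need interactive output can drop the flush and keep the default block buffering.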
--
INADA Naoki <songofacandy at gmail.com>