
On Thu, May 26, 2011 at 10:58 AM, Terry Reedy <tjreedy@udel.edu> wrote:
On 5/25/2011 1:29 PM, INADA Naoki wrote:
Sadly, Python 3's bytes is not bytestring.
By intention.
Yes, I know. But I feel sad because it causes a lot of confusion. Bytes supports some string methods.
>>> b"foo".capitalize()  # Oh,
b'Foo'
>>> b"foo".isalpha()  # alphabets in not-string?
True
>>> b"foo%d" % 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'int'
import sys
fin = open(sys.stdin.fileno(), 'r', encoding='latin1')
fout = open(sys.stdout.fileno(), 'w', encoding='latin1')
for n, L in enumerate(fin):
    fout.write('{0:5d}\t{1}'.format(n, L))
If using 'latin1' is the Pythonic way to handle encoding-transparent strings, I think Python should provide another alias for it, such as 'bytes'.
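The following is a minimal sketch of why the 'latin1' approach is encoding-transparent, run on in-memory streams rather than stdin/stdout. The function name number_lines and the sample data are hypothetical, and the newline='\n' argument is an addition not present in the code above: it is assumed here so that '\r' bytes are not translated on input.

```python
import io

def number_lines(raw_in, raw_out):
    """Copy raw_in to raw_out, prefixing each line with its index (hypothetical helper)."""
    # newline='\n' (an assumption of this sketch) disables newline translation,
    # so only b'\n' ends a line and b'\r' passes through untouched.
    fin = io.TextIOWrapper(raw_in, encoding='latin1', newline='\n')
    fout = io.TextIOWrapper(raw_out, encoding='latin1', newline='\n')
    for n, line in enumerate(fin):
        fout.write('{0:5d}\t{1}'.format(n, line))
    fout.flush()
    # Detach so the wrappers do not close the underlying binary streams.
    fin.detach()
    fout.detach()

# One UTF-8 encoded line with CRLF, and one line of bytes that is not valid UTF-8.
source = io.BytesIO('caf\u00e9\r\n'.encode('utf-8') + b'\xff\xfe raw\n')
sink = io.BytesIO()
number_lines(source, sink)
result = sink.getvalue()
```

In this sketch the payload bytes come out exactly as they went in, whatever their real encoding, because each byte is decoded to one code point and encoded back to the same byte.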
I presume that you mean you would like to write
fin = open(sys.stdin.fileno(), 'r', encoding='bytes')
fout = open(sys.stdout.fileno(), 'w', encoding='bytes')
If such a thing were added, the 256 bytes should directly map to the first 256 codepoints. I don't know if 'latin1' does that or not. In any case,
Yes, 'latin1' directly maps 256 bytes to 256 codepoints.
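A short check of that mapping, as a sketch (the variable names are only illustrative):

```python
# Every byte value 0..255, in order.
all_bytes = bytes(range(256))

# Decoding as latin1 yields one character per byte.
decoded = all_bytes.decode('latin1')

# Each character's code point equals the value of the byte it came from.
same_codepoints = [ord(ch) for ch in decoded] == list(range(256))

# Encoding back restores the original bytes exactly.
round_trip = decoded.encode('latin1') == all_bytes
```

Both same_codepoints and round_trip are True, which is what makes 'latin1' usable as a lossless pass-through for arbitrary bytes.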
one can rewrite the above without decoding input lines.
with open('tem.py', 'rb') as fin, open('tem2.txt', 'wb') as fout:
    for n, L in enumerate(fin):
        fout.write('{0:5d}\t'.format(n).encode('ascii'))
        fout.write(L)
(sys.x.fileno raises a fileno AttributeError in IDLE.)
There are two problems.

1) Binary mode doesn't support line buffering, so I would have to disable buffering, which may cause a performance regression.
2) Requiring .encode('ascii') makes Python less attractive as a scripting language on Unix.

But the latin1 approach has disadvantages in performance and memory usage.
I think Python 3 doesn't provide an easy and efficient way to implement an encoding-transparent command like 'cat -n'. It's very sad.

--
INADA Naoki <songofacandy@gmail.com>