[ python-Feature Requests-920680 ] readline not implemented for UTF-16
SourceForge.net
noreply at sourceforge.net
Wed May 26 15:52:36 EDT 2004
Feature Requests item #920680, was opened at 2004-03-21 17:37
Message generated for change (Comment added) made by etrepum
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=920680&group_id=5470
Category: Unicode
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Bob Ippolito (etrepum)
Assigned to: M.-A. Lemburg (lemburg)
Summary: readline not implemented for UTF-16
Initial Comment:
The StreamReader for UTF-16 (all three of them) doesn't
implement readline.
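
A minimal way to probe the reported behaviour is sketched below. It is written for Python 3 and is not code from the tracker item; the sample text and the helper name try_readline are made up for illustration. Whether the call raises depends on the interpreter version, so both outcomes are handled.

```python
import codecs
import io


def try_readline(text="first\nsecond\n"):
    """Return the first line read through the UTF-16 StreamReader,
    or None if readline raises NotImplementedError."""
    data = text.encode("utf-16")  # encoded bytes start with a BOM
    reader = codecs.getreader("utf-16")(io.BytesIO(data))
    try:
        return reader.readline()
    except NotImplementedError:
        return None


result = try_readline()
if result is None:
    print("readline is not implemented for this StreamReader")
else:
    print(repr(result))
```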
----------------------------------------------------------------------
>Comment By: Bob Ippolito (etrepum)
Date: 2004-05-26 15:52
Message:
Logged In: YES
user_id=139309
Also, I've moved the latest copy of the code to my public repository at:
http://svn.red-bean.com/bob/unicode/trunk/utf16reader.py
This should be free of any quirks, but I still can't reproduce whatever
problem Jim is having.
----------------------------------------------------------------------
Comment By: Bob Ippolito (etrepum)
Date: 2004-05-26 15:46
Message:
Logged In: YES
user_id=139309
Can you please give an example of a case where short lines get
concatenated? I can't fix it if I don't know what's wrong.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2004-05-26 15:41
Message:
Logged In: YES
user_id=38388
I don't have time to review this now, but will get back to
it after EuroPython if you ping me. Thanks.
----------------------------------------------------------------------
Comment By: Jim Jewett (jimjjewett)
Date: 2004-05-19 19:10
Message:
Logged In: YES
user_id=764593
It might be just an upload/download quirk, but when I tried,
this concatenated short lines. u"\n".join(...) worked better,
but I'm not sure how that plays with other line breaks.
It might work better to stick a class around the readline
functions, so that self.buff can always be a (state-preserved)
list; just return the first row, until the list length gets to one,
then concatenate to that and resplit.
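
The buffering idea described above could look roughly like the following sketch. It is written for Python 3 and is not the attached patch; the class name BufferedLineReader, the chunk_size parameter, and the use of io.StringIO as a stand-in for a decoding stream are all assumptions made for illustration. The last entry of self.buff is always treated as possibly incomplete, so a "\r" at the end of one chunk can still join a "\n" at the start of the next.

```python
import io


class BufferedLineReader:
    """Hypothetical wrapper: `stream` is any object whose read(n) returns text."""

    def __init__(self, stream, chunk_size=256):
        self.stream = stream
        self.chunk_size = chunk_size
        self.buff = [""]   # pending lines; the last entry may be incomplete
        self.eof = False

    def readline(self):
        # Refill while only the (possibly partial) tail is left.
        while len(self.buff) == 1 and not self.eof:
            chunk = self.stream.read(self.chunk_size)
            if chunk:
                # Concatenate onto the tail and resplit, keeping line endings.
                self.buff = (self.buff[0] + chunk).splitlines(True)
            else:
                self.eof = True
        if len(self.buff) > 1:
            return self.buff.pop(0)
        # End of input: hand back whatever is left ("" signals EOF).
        line, self.buff = self.buff[0], [""]
        return line


# Tiny chunk size so that line endings straddle chunk boundaries.
reader = BufferedLineReader(io.StringIO("a\nbb\r\nccc\u2028d"), chunk_size=2)
lines = []
while True:
    line = reader.readline()
    if not line:
        break
    lines.append(line)
print(lines)
```

Keeping the line endings (splitlines(True)) means short lines cannot be silently merged when the pieces are written back out.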
----------------------------------------------------------------------
Comment By: Bob Ippolito (etrepum)
Date: 2004-05-19 14:38
Message:
Logged In: YES
user_id=139309
Attaching a revised monkeypatch:
* splitlines is used (I wasn't aware of the other unicode EOL markers)
* 256 bytes is the new default buffer size
Why do you want sized and unsized to be in the same function? They're
both dispatched from readline as appropriate, and they are very different
code paths. It would be much uglier as one function, so I'm not going to
do it in my own code.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2004-05-19 04:19
Message:
Logged In: YES
user_id=38388
Thanks for the patch. Some comments:
* Unicode has a lot more line-end markers than just LF;
  you should use .splitlines() to break lines at all of them
* please collapse both methods (sized + unsized) into
  one method and default to 256 bytes for the buffer size
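
To illustrate the first point, here is a small Python 3 comparison of splitting on LF only versus .splitlines(). The sample string is made up for this example.

```python
# Sample text with four different line boundaries:
# LF, CRLF, U+2028 (LINE SEPARATOR) and U+0085 (NEXT LINE).
text = "alpha\nbeta\r\ngamma\u2028delta\x85epsilon"

# Splitting on LF only leaves a stray "\r" and misses the other markers.
lf_only = text.split("\n")

# str.splitlines() breaks at all of these boundaries.
all_breaks = text.splitlines()

print(lf_only)
print(all_breaks)
```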
----------------------------------------------------------------------
Comment By: Bob Ippolito (etrepum)
Date: 2004-05-18 19:22
Message:
Logged In: YES
user_id=139309
I've attached a monkeypatch to get readline support for the utf-16 codecs:
    import utf16reader
    utf16reader.install()
It can be trivially inserted into the utf16 encodings implementation. It
would be really cool if someone would audit the implementation and
sneak it in before Python 2.4 :)
----------------------------------------------------------------------
Comment By: Bob Ippolito (etrepum)
Date: 2004-03-21 17:54
Message:
Logged In: YES
user_id=139309
I don't need it enough to write a patch, but this is what I used instead,
and it seems like it might work:
try:
    for line in inFile:
        tline = translator(line)
        outFile.write(tline)
except NotImplementedError:
    BUFFER = 16384
    bytes = inFile.read(BUFFER)
    while bytes:
        lines = bytes.split(u'\n')
        bytes = lines.pop()
        for line in lines:
            tline = translator(line)
            outFile.write(tline)
        newbytes = inFile.read(BUFFER)
        bytes += newbytes
        if not newbytes and bytes:
            bytes += u'\n'
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2004-03-21 17:44
Message:
Logged In: YES
user_id=38388
Patches are welcome :-)
----------------------------------------------------------------------