[ python-Feature Requests-920680 ] readline not implemented for UTF-16

Wed May 26 15:52:36 EDT 2004

Feature Requests item #920680, was opened at 2004-03-21 17:37
Message generated for change (Comment added) made by etrepum
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=920680&group_id=5470

Category: Unicode
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Bob Ippolito (etrepum)
Assigned to: M.-A. Lemburg (lemburg)
Summary: readline not implemented for UTF-16

Initial Comment:
The StreamReaders for UTF-16 (all three of them) don't 
implement readline.

----------------------------------------------------------------------

>Comment By: Bob Ippolito (etrepum)
Date: 2004-05-26 15:52

Message:
Logged In: YES 
user_id=139309

Also, I've moved the latest copy of the code to my public repository at:
http://svn.red-bean.com/bob/unicode/trunk/utf16reader.py

This should be free of any quirks, but I still can't reproduce whatever 
problem Jim is having.

----------------------------------------------------------------------

Comment By: Bob Ippolito (etrepum)
Date: 2004-05-26 15:46

Message:
Logged In: YES 
user_id=139309

Can you please give an example of a case where short lines get 
concatenated?  I can't fix it if I don't know what's wrong.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-05-26 15:41

Message:
Logged In: YES 
user_id=38388

I don't have time to review this now, but will get back to
it after EuroPython if you ping me. Thanks.

----------------------------------------------------------------------

Comment By: Jim Jewett (jimjjewett)
Date: 2004-05-19 19:10

Message:
Logged In: YES 
user_id=764593

It might be just an upload/download quirk, but when I tried, 
this concatenated short lines.  u"\n".join(...) worked better, 
but I'm not sure how that plays with other line breaks.  

It might work better to stick a class around the readline 
functions, so that self.buff can always be a (state-preserved) 
list; just return the first row until the list length gets to one, 
then concatenate new data to that row and resplit.
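
The buffering scheme described above could be sketched roughly as follows. This is only an illustration, not the attached patch: the class name LineBuffer, the chunk_size parameter, and the use of io.StringIO as a stand-in for a decoding reader are all assumptions made for the example. Keeping the last row in the buffer until more data arrives means a line (or a CR LF pair) that straddles two reads is rejoined before it is returned.

```python
import io


class LineBuffer(object):
    """Hypothetical helper: serves lines from a reader that only has read()."""

    def __init__(self, reader, chunk_size=256):
        self.reader = reader          # any object whose read(n) returns text
        self.chunk_size = chunk_size
        self.buff = []                # pending lines, terminators kept

    def readline(self):
        # Refill while at most one (possibly incomplete) row is buffered.
        while len(self.buff) <= 1:
            data = self.reader.read(self.chunk_size)
            if not data:
                break                 # EOF: whatever is buffered is final
            pending = self.buff.pop() if self.buff else u""
            # Concatenate onto the leftover row and resplit.
            self.buff = (pending + data).splitlines(True)
        if self.buff:
            return self.buff.pop(0)
        return u""                    # empty string signals EOF


# Usage: a tiny chunk size forces lines to straddle several reads.
reader = LineBuffer(io.StringIO(u"short\nlines\nhere"), chunk_size=4)
lines = list(iter(reader.readline, u""))
```

Returning an empty string at end of file follows the usual file-object convention, so the object can be drained with iter(reader.readline, u"").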

----------------------------------------------------------------------

Comment By: Bob Ippolito (etrepum)
Date: 2004-05-19 14:38

Message:
Logged In: YES 
user_id=139309

Attaching a revised monkeypatch:
* splitlines is used (I wasn't aware of the other unicode EOL markers)
* 256 bytes is the new default buffer size

Why do you want sized and unsized to be in the same function?  They're 
both dispatched from readline as appropriate, and they are very different 
code paths.  It would be much uglier as one function, so I'm not going to 
do it in my own code.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-05-19 04:19

Message:
Logged In: YES 
user_id=38388

Thanks for the patch. Some comments:

* Unicode has a lot more line-end markers than just LF;
  you should use .splitlines() to break lines at all of them

* please collapse both methods (sized + unsized) into
  one method and default to 256 bytes for the buffer
  size
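
To illustrate the first point, here is a small comparison of splitting on LF alone versus .splitlines(). The sample string is made up for the example; it mixes LF, CR LF, LINE SEPARATOR (U+2028) and NEL (U+0085).

```python
# Sample text containing several different Unicode line terminators.
text = u"one\ntwo\r\nthree\u2028four\x85five"

# Splitting on LF alone leaves the CR attached and misses U+2028 and U+0085.
lf_only = text.split(u"\n")

# splitlines() breaks at every line boundary Python recognizes.
all_breaks = text.splitlines()

# Passing True keeps the terminators, which a readline needs to return.
with_ends = text.splitlines(True)
```

Here lf_only has three elements while all_breaks has five, so a readline built on split(u"\n") would return several logical lines glued together.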

----------------------------------------------------------------------

Comment By: Bob Ippolito (etrepum)
Date: 2004-05-18 19:22

Message:
Logged In: YES 
user_id=139309

I've attached a monkeypatch to get readline support for the utf-16 codecs:

import utf16reader
utf16reader.install()

It can be trivially inserted into the utf16 encodings implementation. It 
would be really cool if someone would audit the implementation and 
sneak it in before Python 2.4 :)

----------------------------------------------------------------------

Comment By: Bob Ippolito (etrepum)
Date: 2004-03-21 17:54

Message:
Logged In: YES 
user_id=139309

I don't need it enough to write a patch, but this is what I used instead, 
and it seems like it might work:

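    try:
        for line in inFile:
            tline = translator(line)
            outFile.write(tline)
    except NotImplementedError:
        # Fallback: the UTF-16 StreamReader has no readline, so read
        # decoded text in chunks and split it on LF by hand.
        BUFFER = 16384
        buf = inFile.read(BUFFER)
        while buf:
            lines = buf.split(u'\n')
            # The last piece may be an incomplete line; carry it over.
            buf = lines.pop()
            # Note: unlike iteration, these lines have no trailing LF.
            for line in lines:
                tline = translator(line)
                outFile.write(tline)
            newdata = inFile.read(BUFFER)
            buf += newdata
            if not newdata and buf:
                # At EOF, terminate the final fragment so it gets flushed.
                buf += u'\n'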

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-03-21 17:44

Message:
Logged In: YES 
user_id=38388

Patches are welcome :-)

----------------------------------------------------------------------
