[Python-bugs-list] [ python-Bugs-595350 ] string method bugs w/ 8bit, unicode args

noreply@sourceforge.net noreply@sourceforge.net
Mon, 19 Aug 2002 15:08:25 -0700


Bugs item #595350, was opened at 2002-08-14 21:56
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=595350&group_id=5470

Category: Unicode
Group: Python 2.2.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Inyeol Lee (inyeol)
Assigned to: Guido van Rossum (gvanrossum)
Summary: string method bugs w/ 8bit, unicode args

Initial Comment:
Python 2.2.1 (#1, Apr 10 2002, 18:25:16) 
[GCC 2.95.3 20010315 (release)] on sunos5

1. "abc".endswith("c") -> 1
   "abc".endswith(u"c") -> 0 # bug.
   u"abc".endswith("c") -> 1
   u"abc".endswith(u"c") -> 1

2. "aaa".rfind("a") -> 2
   "aaa".rfind(u"a") -> 0 # bug.
   u"aaa".rfind("a") -> 2
   u"aaa".rfind(u"a") -> 2

   .rindex() has the same bug.

3. "abc".rfind("") -> 3
   "abc".rfind(u"") -> 0 # bug.
   u"abc".rfind("") -> 0 # bug.
   u"abc".rfind(u"") -> 0 # bug.

   .rindex() has the same bug.

4. "abc".replace("", "x") -> ValueError
   "abc".replace(u"", "x") -> u'abcxxxx' # bug.
   u"abc".replace("", "x") -> u'abcxxxx' # bug.
   u"abc".replace(u"", "x") -> u'abcxxxx' # bug.

   They should raise ValueError, or return u'xaxbxcx'.
   BTW, how about changing the s.replace("", ...) behavior to return
   "xaxbxcx" (or u"xaxbxcx") for all 4 cases? That would be consistent
   with the other string methods and with re.sub().
   It seems that Guido doesn't mind changing this.

[Guido]
> If someone really wants 'abc'.replace('', '-') to return '-a-b-c-',
> please submit patches for both 8-bit and Unicode strings to
> SourceForge and assign to me.  I looked into this and it's
> non-trivial: the implementation used for 8-bit strings goes into an
> infinite loop when the pattern is empty, and the Unicode
> implementation tacks '----' onto the end.  Please supply doc and
> unittest patches too.  At least re does the right thing already:

5. (it's not a bug)
   Except for .replace() above, s.split() is the only string method
   that raises an exception. How about changing it to return the
   unmodified string when an empty string is given as the separator?
   This would be consistent with re.split() behavior.

- Inyeol Lee
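
The results the report treats as correct can be written down as a small
check. This is only a sketch, written for a current Python 3 interpreter,
where str is always Unicode; it therefore cannot reproduce the mixed
8-bit/Unicode calls from Python 2.2.1 and only records the expected
values. The helper replace_empty() is hypothetical, named here for
illustration, and is not the code from the uploaded patch.

```python
import re


def replace_empty(s, new):
    """Hypothetical helper: put `new` before every character and at the end.

    This mirrors the result proposed in item 4 for an empty pattern.
    """
    return new.join([""] + list(s) + [""])


# Items 1-3: the values the report marks as correct (not flagged "# bug").
checks = {
    "endswith": "abc".endswith("c"),
    "rfind": "aaa".rfind("a"),
    "rindex": "aaa".rindex("a"),
    "rfind_empty": "abc".rfind(""),
    "rindex_empty": "abc".rindex(""),
}

# Item 4: the proposed empty-pattern result, next to what re.sub() gives.
proposed = replace_empty("abc", "x")
via_re = re.sub("", "x", "abc")
```

The helper only illustrates the shape of the proposed result; Guido's
quoted note above describes why the real 8-bit and Unicode
implementations needed more work.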

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-08-19 18:08

Message:
Logged In: YES 
user_id=6380

I'm still reviewing this.

Next time, please send context diffs; diffs relative to
current (or at least fairly recent :-) CVS would also be
appreciated.

Don't add the bug number in comments for each change; I will
have to remove all those manually now...

I'm uploading a version of the patch that applies cleanly to
current CVS; it's not ready yet (the Unicode tests fail and
I have to clean up the comments).

Backporting will be a bitch, because I don't want the
changes for x.replace('', ...) to be backported (new
functionality etc.).

----------------------------------------------------------------------

Comment By: Inyeol Lee (inyeol)
Date: 2002-08-16 13:45

Message:
Logged In: YES 
user_id=595280

uploaded patch for these bugs.  -Inyeol Lee

----------------------------------------------------------------------
