[Python-bugs-list] [ python-Feature Requests-681533 ] Additional string stuff.

Fri, 27 Jun 2003 23:29:03 -0700

Feature Requests item #681533, was opened at 2003-02-06 03:14
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=681533&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jeremy Fincher (jemfinch)
Assigned to: Nobody/Anonymous (nobody)
Summary: Additional string stuff.

Initial Comment:
In a lot of my programs that mess with strings, I end up 
somewhere making a variable "ascii" via 
(string.maketrans('', '')).  It's just a 256-character string 
comprising the ascii character set. 

I use it, oftentimes, simply to be able to turn the 'translate' 
method on strings into a 'delete' method -- if I want to, 
say, remove all the spaces in aString, I'd do 
(aString.translate(ascii, string.whitespace)). 

Certainly an ascii variable in the string module couldn't 
hurt, would fit in with ascii_letters, etc. and would at least 
standardize the name of the full ascii set sure to be 
present in many Python programs. 

A little further out there, but I think just as useful, would 
be a delete method on strings.  So "foo bar 
baz".delete(string.whitespace) would return "foobarbaz".  
It would be equivalent to "foo bar baz".translate(ascii, 
string.whitespace), or the wildly inefficient: 

def delete(s, deleteChars): 
    l = [] 
    for c in s: 
        if c not in deleteChars: 
            l.append(c) 
    return ''.join(l) 
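
For comparison, the translate-based deletion described above can be sketched in present-day Python, where the 8-bit strings discussed in this report correspond most closely to bytes objects. This is only an illustration under that assumption; the names IDENTITY, WHITESPACE and delete_bytes are made up for the example and are not part of any proposal here.

```python
import string

# Identity table: all 256 byte values in order, the bytes analogue of
# the maketrans('', '') table mentioned in the report.
IDENTITY = bytes.maketrans(b"", b"")

def delete_bytes(data, delete_chars):
    """Return data with every byte listed in delete_chars removed."""
    # The second argument to translate lists the bytes to drop.
    return data.translate(IDENTITY, delete_chars)

# string.whitespace is text in modern Python, so encode it to bytes first.
WHITESPACE = string.whitespace.encode("ascii")

result = delete_bytes(b"foo bar baz", WHITESPACE)
print(result)
```

The table is built once and reused, which is the same role the "ascii" variable plays in the programs described above.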

Anyway, that's all I can think of.  Do with it what you will. 

Jeremy 

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2003-06-28 01:29

Message:
Logged In: YES 
user_id=80475

The trend is away from including character strings as 
attributes -- they are instead being replaced with functions 
like str.isascii() or str.iswhitespace().

Also, it is so easy to construct a character list that it is not 
worth cluttering the string API (everything there must be 
documented, duplicated for unicode objects, and duplicated 
again for userstrings).  

Instead for maketrans, I would use something like this:
  rhchars  = ''.join(map(chr, range(65,100)))

So, unless compelling use cases can be found, I recommend 
closing this one.   
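
As a rough illustration of how little code the construction above takes, the sketch below builds the range from the comment and then extends the same expression to all 256 8-bit code points. The name all_chars is hypothetical and chosen only for this example.

```python
# The range from the comment: code points 65 through 99 ('A' through 'c').
rhchars = "".join(map(chr, range(65, 100)))

# The same construction extended to every 8-bit code point (0 through 255).
all_chars = "".join(map(chr, range(256)))

print(repr(rhchars))
print(len(all_chars))
```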

----------------------------------------------------------------------

Comment By: Cherniavsky Beni (cben)
Date: 2003-03-10 11:22

Message:
Logged In: YES 
user_id=36166

Just make the interface like the translate method of unicode
objects: it accepts "a mapping of Unicode ordinals to
Unicode ordinals, Unicode strings or None. Unmapped
characters are left untouched. Characters mapped to None are
deleted.".
This would make the str/unicode translate methods consistent
(currently there is no way to call the method that will work
for both).
I have no opinion on whether implementing 1-to-n
translations (like Python2.3 supports for unicode objects)
is worth the trouble for plain strings.
Of course, the table_string[, deletechars] interface should
still be supported for compatibility.
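
A minimal sketch of the mapping-style interface quoted above, written against the text string type of current Python, which accepts such a mapping. The helper name delete_chars is an assumption made for the example, not an existing method.

```python
import string

def delete_chars(text, unwanted):
    """Remove every character in unwanted using a mapping-style table."""
    # Ordinals mapped to None are deleted; unmapped characters are kept.
    table = {ord(ch): None for ch in unwanted}
    return text.translate(table)

result = delete_chars("foo bar baz", string.whitespace)

# A 1-to-n translation: one character is replaced by a longer string.
expanded = "a&b".translate({ord("&"): " and "})

print(result)
print(expanded)
```

Because unmapped characters pass through unchanged, no 256-entry identity table is needed for this form.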

----------------------------------------------------------------------

Comment By: Jeremy Fincher (jemfinch)
Date: 2003-03-05 22:03

Message:
Logged In: YES 
user_id=99508

Ah, yes, you're right.  I never knew ASCII was only 7 bits per 
character.  Perhaps string.all_characters?  I just definitely think 
it should be publicly available so there can be some 
consistency between applications that need a string of all 256 
8-bit characters. 

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2003-03-05 09:13

Message:
Logged In: YES 
user_id=45365

Note that "ascii" is definitely a bad name, as it means different things to different people. To Python it usually means "7-bit ASCII" (as in the unicode "ascii" codec). To you it apparently means "8-bit something-or-other".

I have no opinion on whether this feature is a good idea, but if it is I would suggest a name with "any" or "all" in it, and possibly "8bit" too.

----------------------------------------------------------------------

Comment By: Jeremy Fincher (jemfinch)
Date: 2003-03-04 13:25

Message:
Logged In: YES 
user_id=99508

What's the status on this? 

I checked string.py, and there actually already is a value 
that's the 256 ASCII characters, but it's called _idmap.  Might 
we consider changing the name of that to "ascii"?  I'd be 
happy to make the patch. 

Jeremy 

----------------------------------------------------------------------

Comment By: Jeremy Fincher (jemfinch)
Date: 2003-02-07 01:47

Message:
Logged In: YES 
user_id=99508

I guess that's what I get for not reading the documentation :) 

Oh well, the other two suggestions stand :) 

Jeremy 

----------------------------------------------------------------------


Comment By: Tim Peters (tim_one)
Date: 2003-02-06 21:15

Message:
Logged In: YES 
user_id=31435

Just noting that you can pass None for sep if you want to 
explicitly ask for the default behavior.

>>> "      a  b   c".split(None, 1)
['a', 'b   c']
>>>

----------------------------------------------------------------------

Comment By: Jeremy Fincher (jemfinch)
Date: 2003-02-06 04:19

Message:
Logged In: YES 
user_id=99508

Let me also make one other suggestion:  the split method on a 
string object has, by default, behavior that can't be replicated 
by passing an argument as a separator.  That is, the default 
separator acts like re.split(r'\s+'), but it's impossible to pass 
any value into the method to achieve that same result. 

The problem arises when a user wants to use the maxsplit 
parameter to the method.  Because maxsplit is a positional 
parameter instead of a keyword parameter, the user *must* 
declare a separator to split on, and thus loses his ability to split 
on whitespace-in-general.  If maxsplit were changed from being 
a positional parameter to being a keyword parameter, then a 
programmer wouldn't have to give up the default behavior of 
the split method in order to pass it a maxsplit. 

At present, negative maxsplit values don't differ in any way 
from split's default behavior (with no maxsplit parameter given).  
Thus, the keyword maxsplit could default to -1 with no 
breakage of code.  I can't see any place where changing 
maxsplit to a keyword parameter would break any existing 
code. 
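
A short sketch of the split behavior under discussion, run against current Python. It assumes an interpreter in which split accepts maxsplit as a keyword, which was not the case for the release discussed in this report; the variable names are chosen only for the example.

```python
text = "      a  b   c"

# Default behavior: split on runs of whitespace, ignoring leading whitespace.
default_split = text.split()

# Passing None as the separator keeps the whitespace behavior with a limit.
limited = text.split(None, 1)

# A negative maxsplit places no limit on the number of splits.
negative = text.split(None, -1)

# Keyword form, assuming an interpreter that accepts it.
keyword = text.split(maxsplit=1)

print(default_split, limited, negative, keyword)
```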

Jeremy 

----------------------------------------------------------------------
