[Python-bugs-list] [ python-Feature Requests-681533 ] Additional string stuff.
SourceForge.net
noreply@sourceforge.net
Fri, 27 Jun 2003 23:29:03 -0700
Feature Requests item #681533, was opened at 2003-02-06 03:14
Message generated for change (Comment added) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=681533&group_id=5470
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jeremy Fincher (jemfinch)
Assigned to: Nobody/Anonymous (nobody)
Summary: Additional string stuff.
Initial Comment:
In a lot of my programs that mess with strings, I end up
somewhere making a variable "ascii" via
(string.maketrans('', '')). It's just a 256-character string
comprising the ascii character set.
I use it, oftentimes, simply to be able to turn the 'translate'
method on strings into a 'delete' method -- if I want to,
say, remove all the spaces in aString, I'd do
(aString.translate(ascii, string.whitespace)).
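For readers on Python 3, the closest equivalent of this idiom lives on bytes objects rather than str. The following is only a sketch under that assumption; the names identity_table and strip_whitespace are illustrative and not part of the original request.

```python
import string

# 256-byte identity table: byte value i maps to itself.
identity_table = bytes.maketrans(b"", b"")

def strip_whitespace(data: bytes) -> bytes:
    """Delete every ASCII whitespace byte via translate's delete argument."""
    return data.translate(identity_table, string.whitespace.encode("ascii"))

print(strip_whitespace(b"foo bar\tbaz\n"))  # b'foobarbaz'
```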
Certainly an ascii variable in the string module couldn't
hurt, would fit in with ascii_letters, etc. and would at least
standardize the name of the full ascii set sure to be
present in many Python programs.
A little further out there, but I think just as useful, would
be a delete method on strings. So "foo bar
baz".delete(string.whitespace) would return "foobarbaz".
It would be equivalent to "foo bar baz".translate(ascii,
string.whitespace), or the wildly inefficient:
    def delete(s, deleteChars):
        l = []
        for c in s:
            if c not in deleteChars:
                l.append(c)
        return ''.join(l)
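As a hedged Python 3 sketch of the proposed delete behavior: the three-argument form of str.maketrans builds a table that maps each listed character to None, and str.translate then drops those characters. The function names delete_loop and delete_translate are hypothetical; the thread does not say that any str.delete method was ever added.

```python
import string

def delete_loop(s, delete_chars):
    """Reference version: keep only characters not listed in delete_chars."""
    return "".join(c for c in s if c not in delete_chars)

def delete_translate(s, delete_chars):
    """Same result via translate; the third maketrans argument lists characters to drop."""
    return s.translate(str.maketrans("", "", delete_chars))

sample = "foo bar baz"
print(delete_translate(sample, string.whitespace))  # foobarbaz
```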
Anyway, that's all I can think of. Do with it what you will.
Jeremy
----------------------------------------------------------------------
>Comment By: Raymond Hettinger (rhettinger)
Date: 2003-06-28 01:29
Message:
Logged In: YES
user_id=80475
The trend is away from including character strings as
attributes -- they are instead being replaced with functions
like str.isascii() or str.iswhitespace().
Also, it is so easy to construct a character list that it is not
worth cluttering the string API (everything there must be
documented, duplicated for unicode objects, and duplicated
again for userstrings).
Instead for maketrans, I would use something like this:
rhchars = ''.join(map(chr, range(65,100)))
So, unless compelling use cases can be found, I recommend
closing this one.
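To make the suggestion concrete, here is a small sketch of building character strings on demand. It assumes Python 3, where chr() returns a Unicode character; the name all_chars is hypothetical.

```python
# The range from the comment above covers code points 65 through 99.
rhchars = "".join(map(chr, range(65, 100)))

# The same pattern yields the 256 characters with code points 0-255.
all_chars = "".join(map(chr, range(256)))

print(len(rhchars), len(all_chars))  # 35 256
```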
----------------------------------------------------------------------
Comment By: Cherniavsky Beni (cben)
Date: 2003-03-10 11:22
Message:
Logged In: YES
user_id=36166
Just make the interface like the translate method of unicode
objects: it accepts "a mapping of Unicode ordinals to
Unicode ordinals, Unicode strings or None. Unmapped
characters are left untouched. Characters mapped to None are
deleted.".
This would make the str/unicode translate methods consistent
(currently there is no way to call the method that will work
for both).
I have no opinion on whether implementing 1-to-n
translations (like Python2.3 supports for unicode objects)
is worth the trouble for plain strings.
Of course, the table_string[, deletechars] interface should
still be supported for compatibility.
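A brief sketch of the mapping-based interface described here, written for Python 3, where str is the Unicode type and its translate method takes such a mapping. The table contents are made up for illustration.

```python
# Keys are code points; values may be a code point, a string, or None.
table = {
    ord("a"): ord("A"),   # one-to-one: ordinal to ordinal
    ord("b"): "[bee]",    # one-to-many: ordinal to string
    ord(" "): None,       # deletion
}

# "c" has no entry in the table, so it is left untouched.
print("a b c".translate(table))  # A[bee]c
```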
----------------------------------------------------------------------
Comment By: Jeremy Fincher (jemfinch)
Date: 2003-03-05 22:03
Message:
Logged In: YES
user_id=99508
Ah, yes, you're right. I never knew ASCII was only 7 bits per
character. Perhaps string.all_characters? I just definitely think
it should be publicly available so there can be some
consistency between applications that need a string of all 256
8-bit characters.
----------------------------------------------------------------------
Comment By: Jack Jansen (jackjansen)
Date: 2003-03-05 09:13
Message:
Logged In: YES
user_id=45365
Note that "ascii" is definitely a bad name, as it means different things to different people. To Python it usually means "7-bit ASCII" (as in the unicode "ascii" codec). To you it apparently means "8-bit something-or-other".
I have no opinion on whether this feature is a good idea, but if it is I would suggest a name with "any" or "all" in it, and possibly "8bit" too.
----------------------------------------------------------------------
Comment By: Jeremy Fincher (jemfinch)
Date: 2003-03-04 13:25
Message:
Logged In: YES
user_id=99508
What's the status on this?
I checked string.py, and there actually already is a value
that's the 256 ASCII characters, but it's called _idmap. Might
we consider changing the name of that to "ascii"? I'd be
happy to make the patch.
Jeremy
----------------------------------------------------------------------
Comment By: Jeremy Fincher (jemfinch)
Date: 2003-02-07 01:47
Message:
Logged In: YES
user_id=99508
I guess that's what I get for not reading the documentation :)
Oh well, the other two suggestions stand :)
Jeremy
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-02-06 21:15
Message:
Logged In: YES
user_id=31435
Just noting that you can pass None for sep if you want to
explicitly ask for the default behavior.
>>> " a b c".split(None, 1)
['a', 'b c']
>>>
----------------------------------------------------------------------
Comment By: Jeremy Fincher (jemfinch)
Date: 2003-02-06 04:19
Message:
Logged In: YES
user_id=99508
Let me also make one other suggestion: the split method on a
string object has, by default, behavior that can't be replicated
by passing an argument as a separator. That is, the default
separator acts like re.split(r'\s+'), but it's impossible to pass
any value into the method to achieve that same result.
The problem arises when a user wants to use the maxsplit
parameter to the method. Because maxsplit is a positional
parameter instead of a keyword parameter, the user *must*
declare a separator to split on, and thus loses his ability to split
on whitespace-in-general. If maxsplit was changed from being
a positional parameter to being a keyword parameter, then a
programmer wouldn't have to give up the default behavior of
the split method in order to pass it a maxsplit.
At present, negative maxsplit values don't differ in any way
from split's default behavior (with no maxsplit parameter given).
Thus, the keyword maxsplit could default to -1 with no
breakage of code. I can't see any place where changing
maxsplit to a keyword parameter would break any existing
code.
Jeremy
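For comparison, current Python 3 releases accept maxsplit as a keyword argument to str.split. The sketch below assumes such an interpreter and also shows where re.split differs from the default whitespace behavior.

```python
import re

text = " a b c"

# Whitespace splitting with a limit; no explicit separator is needed.
print(text.split(maxsplit=1))  # ['a', 'b c']

# A negative maxsplit means "no limit", the same as omitting it.
print(text.split(maxsplit=-1) == text.split())  # True

# re.split is close but not identical: leading whitespace yields an empty first field.
print(re.split(r"\s+", text))  # ['', 'a', 'b', 'c']
```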
----------------------------------------------------------------------