Grouping function for string module?
Dinu C. Gherman
gherman at darwin.in-berlin.de
Tue Sep 14 05:27:08 EDT 1999
Hello all,
I started the following thread below with a simple request
for opinions about the usefulness of a string "grouping"
function in the string module.
Unfortunately, I did it on the wrong mailing list... so, now,
let's go public...
Regards,
Dinu
--
Dinu C. Gherman
................................................................
"An average of more than 15 % of adults in 12 industrialized
countries are functionally illiterate; in Ireland, the United
Kingdom and the United States, the rates are over 20 %."
(The State of the World's Children 1999,
UNICEF, http://www.unicef.org/sowc99)
------------------------------------------------------------
"Dinu C. Gherman" wrote:
>
> Hello,
>
> I wonder if a grouping function would be considered useful
> by more than a few people? I needed it already several times
> (and implemented it) and was always surprised it "wasn't
> there" as it seems quite a natural thing to me.
>
> I think of something like this, without giving my own current
> implementation (with a dual function for left-grouping, like
> with strip or an additional parameter for a function doing
> left and right grouping):
>
> def rgroup(str, size, sep=' '):
>     """Group a string into blocks (starting at the right).
>
>     e.g. rgroup('11000', 4) ==> '1 1000'
>     """
>
>     return str # dummy implementation...
>
> Any comments?
>
> Dinu
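
The post above deliberately leaves the implementation out. As a rough illustration only, here is one way the described behaviour could be written in present-day Python 3. The parameter name text and the ValueError check are assumptions of this sketch, not part of the original proposal.

```python
def rgroup(text, size, sep=" "):
    """Group text into blocks of `size` characters, counting from the right."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    # The leftmost block absorbs the remainder, so it may be shorter.
    first = len(text) % size
    blocks = [text[:first]] if first else []
    blocks.extend(text[i:i + size] for i in range(first, len(text), size))
    return sep.join(blocks)


print(rgroup("11000", 4))          # 1 1000
print(rgroup("1234567", 3, ","))   # 1,234,567
```

Slicing from a computed offset keeps the function to a single pass and avoids reversing the string.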
------------------------------------------------------------
Jeff Pinyan wrote:
>
> > Any comments?
>
> Coming from a perl background, it sounds vaguely like a functionality of
> the pack() and unpack() functions.
>
> Just as an exercise, I wrote up an rgroup(). Not a bad concept, Dinu.
------------------------------------------------------------
Greg Stein wrote:
>
> I don't see much utility other than for inserting commas/periods in a
> monetary value. The function seems overly specialized. Worse: the
> parameters would expand unreasonably; people will say "but I don't want
> a space" or "I want it to count from the other side" -- two more params
> all of a sudden. Or skip the latter param and bring it Yet Another
> Function.
>
> What other uses does this apply to? Why should this be part of the
> standard library? (other than for parity with Perl's function)
>
> Cheers,
> -g
------------------------------------------------------------
Tim Peters wrote:
>
> I don't know of any string language that offers this as a primitive, so am
> not surprised at its absence in Python. struct.unpack can easily be twisted
> toward this end, and is much more flexible in catering to an arbitrary mix
> of "field widths" too.
>
> import string
> import struct
>
> def rgroup(str, size, sep=' '):
>     """Group a string into blocks (starting at the right).
>
>     e.g. rgroup('11000', 4) ==> '1 1000'
>     """
>
>     whole, leftover = divmod(len(str), size)
>     fmt = (`size` + "s") * whole
>     if leftover:
>         fmt = `leftover` + "s" + fmt
>     return string.join(struct.unpack(fmt, str), sep)
>
> [...]
>
> naggingly y'rs - tim
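
The struct.unpack approach above relies on backtick repr syntax and on string.join, neither of which exists in Python 3, and struct there works on bytes rather than str. A hedged adaptation, assuming ASCII-only input (the function name rgroup_struct and the encode/decode steps are additions of this sketch), might look like this:

```python
import struct


def rgroup_struct(text, size, sep=" "):
    """Right-grouping via struct.unpack; assumes ASCII-only text."""
    data = text.encode("ascii")
    whole, leftover = divmod(len(data), size)
    # One "<size>s" field per full block, e.g. "4s4s" for two blocks of 4.
    fmt = ("%ds" % size) * whole
    if leftover:
        # The short block goes first so grouping starts at the right.
        fmt = "%ds" % leftover + fmt
    return sep.join(part.decode("ascii") for part in struct.unpack(fmt, data))


print(rgroup_struct("11000", 4))   # 1 1000

# The "arbitrary mix of field widths" mentioned above: widths 2, 3 and 4.
print(struct.unpack("2s3s4s", b"aabbbcccc"))   # (b'aa', b'bbb', b'cccc')
```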
------------------------------------------------------------
"Dinu C. Gherman" wrote:
>
> Greg Stein wrote:
> >
> > I don't see much utility other than for inserting commas/periods in a
> > monetary value. The function seems overly specialized. Worse: the
> > parameters would expand unreasonably; people will say "but I don't want
> > a space" or "I want it to count from the other side" -- two more params
> > all of a sudden. Or skip the latter param and bring it Yet Another
> > Function.
>
> I needed it for formatting binary numbers expressed as Python
> strings. And it has a rather clearly defined job, that of grouping
> a string into blocks of a given size with an optional separator.
> Having lstrip, rstrip and strip could also be regarded as
> an inflation of functions, couldn't it? After all, rstrip and
> strip are just that (with lstrip given below):
>
> def rstrip(s):
>     r = map(None, s)
>     r.reverse()
>     r = lstrip(r) # r being a list now!
>     r.reverse()
>     return string.join(r, '')
>
> def strip(s):
>     return lstrip(rstrip(s))
>
> > What other uses does this apply to? Why should this be part of the
> > standard library? (other than for parity with Perl's function)
>
> Don't get me wrong, I'm not imposing this idea on you, I just
> asked for opinions! Mine is that lgroup/rgroup would be as
> useful/convenient a pair in the standard string module as
> lstrip/rstrip.
>
> Ok, let me ask bluntly, what good applications are there for
> a function like lstrip? Why do we need it in a standard lib
> if you can write it down in a few lines like this (assuming
> string was imported before):
>
> def lstrip(s):
>     if not s: return s
>     r = s[:]
>     while 1:
>         # "r and" guards against an all-whitespace input emptying r
>         if r and r[0] in string.whitespace: r = r[1:]
>         else: return r
>
> Ok, let me give you another application, not to persuade
> you, but to be constructive. You could do things like this
> more easily:
>
> >>> from string import lgroup
> >>> s = '0001101001011100'
> >>> for b in split(lgroup(s, 4, ' '), ' '):
> ...     print b
> ...
> 0001
> 1010
> 0101
> 1100
> >>>
>
> Then, although I don't care much about Perl, it seems that
> its users seem to use/need/appreciate... such a function,
> simply because it's there - which means nothing, ok, except
> that for the same reason they can also just ignore it.
>
> Regards,
>
> Dinu
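
string.lgroup in the session above is hypothetical; no such function exists in the string module. A minimal Python 3 sketch of what it might do follows, together with an rgroup built from it by the same reverse-and-reuse trick as the rstrip example. Both function bodies are assumptions of this sketch.

```python
def lgroup(text, size, sep=" "):
    """Group text into blocks of `size` characters, counting from the left."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    return sep.join(text[i:i + size] for i in range(0, len(text), size))


def rgroup(text, size, sep=" "):
    """Right-grouping expressed through lgroup: reverse, group, reverse back."""
    # The separator is reversed too, so multi-character separators survive.
    return lgroup(text[::-1], size, sep[::-1])[::-1]


bits = "0001101001011100"
for nibble in lgroup(bits, 4).split(" "):
    print(nibble)

print(rgroup("11000", 4))   # 1 1000
```

The loop prints the same four nibbles as the session above: 0001, 1010, 0101 and 1100.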
------------------------------------------------------------
"Dinu C. Gherman" wrote:
>
> Tim Peters wrote:
> >
> > I don't know of any string language that offers this as a primitive, so am
> > not surprised at its absence in Python. struct.unpack can easily be twisted
> > toward this end, and is much more flexible in catering to an arbitrary mix
> > of "field widths" too.
>
> True, variable block sizes would add more flexibility, but
> then you get closer to a real parsing function, which would
> be at least some overkill for the string module, perhaps...
>
> [...]
>
> Dinu
------------------------------------------------------------
Michael Muller wrote:
>
> I've been following this thread and it seems to me that a more generalized
> solution to this problem would be to overload string.split() so that the
> second parameter could be an integer indicating the maximum width of each
> substring. So for example:
>
> string.split('12345678', 3) => ['123', '456', '78']
>
> Solving the problem identified here then becomes a 2-step process:
>
> string.join(string.split(someString, width), separator)
------------------------------------------------------------
"M.-A. Lemburg" wrote:
>
> Michael Muller wrote:
> >
> > I've been following this thread and it seems to me that a more generalized
> > solution to this problem would be to overload string.split() so that the
> > second parameter could be an integer indicating the maximum width of each
> > substring. So for example:
> >
> > string.split('12345678', 3) => ['123', '456', '78']
> >
> > Solving the problem identified here then becomes a 2-step process:
> >
> > string.join(string.split(someString, width), separator)
>
> That won't work since split() already has up to 3 arguments. How
> about adding two new functions to cut strings into even parts,
> e.g. cut and rcut:
>
> cut(string, snippet_length)
>     Returns a list of substrings generated by splitting string
>     at even intervals of the given length. The last entry
>     may have fewer characters.
>
> rcut(...)
>     Just like cut() except that it works from right to left.
>
> Would be a useful addition to produce human readable output
> or to format a long HEX string into multiple lines. Anyway,
> I will probably have something like this in mxTextTools sooner or
> later...
>
> --
> Marc-Andre Lemburg
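
The cut/rcut descriptions above can be sketched in a few lines of Python 3. This is an illustration of the proposal only and makes no claim about how mxTextTools implements anything. It assumes that rcut still returns its pieces in left-to-right order with the short piece first, which the description leaves open.

```python
def cut(text, length):
    """Split text at even intervals from the left; the last piece may be shorter."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    return [text[i:i + length] for i in range(0, len(text), length)]


def rcut(text, length):
    """Like cut(), but measured from the right; the first piece may be shorter."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    # Walk the end offsets from the right, then restore reading order.
    ends = range(len(text), 0, -length)
    return [text[max(end - length, 0):end] for end in ends][::-1]


print(cut("12345678", 3))    # ['123', '456', '78']
print(rcut("12345678", 3))   # ['12', '345', '678']

# Formatting a long hex string into multiple lines, as suggested above.
print("\n".join(cut("deadbeefcafebabe0123", 8)))
```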
------------------------------------------------------------
Skip Montanaro wrote:
>
> Michael> I've been following this thread and it seems to me that a more
> Michael> generalized solution to this problem would be to overload
> Michael> string.split() so that the second parameter could be an integer
> Michael> indicating the maximum width of each substring. So for
> Michael> example:
>
> Michael> string.split('12345678', 3) => ['123', '456', '78']
>
> Michael> Solving the problem identified here then becomes a 2-step process:
>
> Michael> string.join(string.split(someString, width), separator)
>
> string.split already takes two other optional parameters (string to split on
> and max number of splits). Why not just use re.split:
>
> >>> l = re.split("(...)", string.lowercase)
> >>> map(l.remove, [""]*l.count(""))
> [None, None, None, None, None, None, None, None]
> >>> l
> ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx', 'yz']
>
> ?
>
> Skip
------------------------------------------------------------
Skip Montanaro wrote:
>
> Actually, the more I think about it, why not encapsulate it in a function?
>
> import re
> def splitn(s, n):
>     """split string s into chunks no more than n characters long"""
>     l = re.split("(.{%d,%d})" % (n,n), s)
>     map(l.remove, [""]*l.count(""))
>     return l
>
> Skip
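
One caveat when carrying splitn over to Python 3: map() is lazy there, so the map(l.remove, ...) line would remove nothing unless its result were consumed. A hedged Python 3 rendering of the same re.split idea follows; the list comprehension, the re.DOTALL flag and the ValueError check are additions of this sketch.

```python
import re


def splitn(s, n):
    """Split s into chunks of at most n characters (the last may be shorter)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # The capturing group makes re.split keep each n-character match;
    # the empty strings left between adjacent matches are dropped below.
    # re.DOTALL lets "." match newlines as well.
    pieces = re.split("(.{%d})" % n, s, flags=re.DOTALL)
    return [piece for piece in pieces if piece]


print(splitn("abcdefghijklmnopqrstuvwxyz", 3))
```

For the lowercase alphabet this yields the same nine chunks as the interactive session earlier in the thread, ending in 'yz'.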
------------------------------------------------------------
Guido van Rossum wrote:
>
> > Actually, the more I think about it, why not encapsulate it in a function?
> >
> > import re
> > def splitn(s, n):
> >     """split string s into chunks no more than n characters long"""
> >     l = re.split("(.{%d,%d})" % (n,n), s)
> >     map(l.remove, [""]*l.count(""))
> >     return l
>
> Can we please stop this silly thread? This will never become a
> standard function as long as I am in charge.
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
------------------------------------------------------------
Ken Manheimer wrote:
>
> I can't resist:
>
> > -----Original Message-----
> > From: Skip Montanaro [mailto:skip at mojam.com]
> > Sent: Thursday, September 09, 1999 11:07 AM
> > [...]
> > >>> l = re.split("(...)", string.lowercase)
> > >>> map(l.remove, [""]*l.count(""))
> > [None, None, None, None, None, None, None, None]
> > >>> l
> > ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx', 'yz']
>
> Here's a case where filter() is your friend - instead of the map
> expression:
>
> >>> l = filter(None, l)
>
> or, composing it all:
>
> >>> l = filter(None, re.split("(...)", string.lowercase))
>
> (I only mention this because it seems like the filter expr is a lot less
> complicated than the map you constructed. I have no feel for the
> performance implications of this construct w.r.t., eg, an explicit loop,
> though - nor enough concern to investigate, i must say:-)
>
> Ken
------------------------------------------------------------
Perry Stoll wrote:
>
> I've been following this thread and it seems to me that a more generalized
> solution to this problem would be to overload string.split() so that the
> second parameter could be an integer indicating the maximum width of each
> substring. So for example:
>
> string.split('12345678', 3) => ['123', '456', '78']
>
> More generalized? How about something that operates on sequence objects,
> something like the following.
>
> -Perry
>
> def regroup(seq, size):
>     """Return a list of sequences of length SIZE which are
>     subsequences of input IN.
>
>     LIST out = regroup( SEQUENCE in , INT size)
>
>     So, for i < len(IN) / size:
>
>         out[i] = in[ (i * size) : (i + 1) * size ]
>
>     Note, OUT[-1] will be shorter than SIZE iff len(IN) % size != 0
>
>     >>> regroup('Pythonistas Unite!', 2)
>     ['Py', 'th', 'on', 'is', 'ta', 's ', 'Un', 'it', 'e!']
>
>     >>> regroup(range(1,10), 2)
>     [[1, 2], [3, 4], [5, 6], [7, 8], [9]]
>
>     """
>     out = []
>     outlen = len(seq) / size
>     save = out.append # shortcut function lookup in loop
>     end = 0 # initialize for first loop
>     for i in range(outlen):
>         start = end # start where we left off last time
>         end = end + size # compute new end point
>         save( seq[ start : end ] ) # save the sub sequence
>     if len(seq) % size != 0: # if there is any remaining
>         save( seq[ end : ] ) # grab everything remaining
>     return out
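
In Python 3 the function above would need // instead of / for the integer division, and range() no longer returns a list. A compact sketch of the same idea for any sliceable sequence, assuming Python 3 and adding a ValueError check that is not in the original, could be:

```python
def regroup(seq, size):
    """Return consecutive slices of seq, each of length size (last may be shorter)."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    # Stepping the start index by size covers the trailing remainder as well.
    return [seq[start:start + size] for start in range(0, len(seq), size)]


print(regroup("Pythonistas Unite!", 2))
print(regroup(list(range(1, 10)), 2))   # [[1, 2], [3, 4], [5, 6], [7, 8], [9]]
```

Because it only slices, the same function works on strings, lists and tuples, and each chunk has the type of the input.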
------------------------------------------------------------
Skip Montanaro wrote:
>
> Skip> Actually, the more I think about it, why not encapsulate it in a
> Skip> function?
> ...
> Guido> Can we please stop this silly thread? This will never become a
> Guido> standard function as long as I am in charge.
>
> I don't recall proposing it as a standard function. Others proposed
> extending string.split. I merely pointed out that re.split+map (or
> re.split+filter as Ken explained) already does what people asked for.
> Encapsulating it as a function instead of always having two somewhat
> mystical lines of code seemed to make sense.
>
> [...]
>
> Skip
------------------------------------------------------------
"Barry A. Warsaw" wrote:
>
> >>>>> "Guido" == Guido van Rossum <guido at cnri.reston.va.us> writes:
>
> Guido> This will never become a standard function as long as I am
> Guido> in charge.
>
> You should add a "Mr. Bond" and maniacal laugh when you say that. :)
>
> -Barry
------------------------------------------------------------
"Dinu C. Gherman" wrote:
>
> Guido van Rossum wrote:
> >
> > Can we please stop this silly thread? This will never become a
> > standard function as long as I am in charge.
>
> Bondish or not ;-), I agree this is just about the wrong place to
> discuss, so please do ME a favour and stop contributing to this
> thread, be it silly or not!
>
> I take all the blame on me for starting it here. Yes, I'm guilty,
> mea culpa, mea maxima culpa! It will never happen again! But a
> good overview of all Python mailing lists and their charters
> would, perhaps, be indeed an idea, if it's not already there.
>
> As some sort of self-punishment I will wrap-up the contributions
> so far and post them to c.l.p., so we can all peacefully shake
> and enjoy our Martinis again...
>
> Cheers,
>
> Dinu