Iterable String Redux (aka String ABC)

Hi,

Strings are currently iterable, and it has been stated multiple times that this is a good idea and shouldn't change. While I still don't think it's a good idea, I would like to propose a solution to the problem many people are experiencing: introducing an abstract base class for strings.

Basically *the* problematic situation with iterable strings is something like a `flatten` function that flattens out every iterable object except strings. Imagine it's implemented in a way similar to this::

    def flatten(iterable):
        for item in iterable:
            try:
                if isinstance(item, basestring):
                    raise TypeError()
                iterator = iter(item)
            except TypeError:
                yield item
            else:
                for i in flatten(iterator):
                    yield i

A problem comes up as soon as a user-defined string (such as UserString) is passed to the function. In my opinion a good solution would be a "String" ABC one could test against.

Regards,
Armin
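For illustration only (no such class was ever agreed on in this thread), a minimal version of the proposed ABC plus the matching flatten() could look roughly like this, using the PEP 3119 abc machinery:

    from abc import ABCMeta
    from collections import UserString  # collections.UserString in Python 3

    class String(metaclass=ABCMeta):
        """Hypothetical marker ABC: anything registered here is string-like."""

    # Register the concrete string types flatten() should treat as atomic.
    String.register(str)
    String.register(UserString)

    def flatten(iterable):
        for item in iterable:
            if isinstance(item, String):
                yield item
                continue
            try:
                iterator = iter(item)
            except TypeError:
                yield item
            else:
                for i in flatten(iterator):
                    yield i

With this, the isinstance() check covers user-defined strings such as UserString without hard-coding any concrete type.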

[+python-3000] On Tue, May 27, 2008 at 12:32 PM, Armin Ronacher <armin.ronacher@active-4.com> wrote:
Strings are currently iterable, and it has been stated multiple times that this is a good idea and shouldn't change. While I still don't think it's a good idea, I would like to propose a solution to the problem many people are experiencing: introducing an abstract base class for strings.
Basically *the* problematic situation with iterable strings is something like a `flatten` function that flattens out every iterable object except strings. Imagine it's implemented in a way similar to this::
    def flatten(iterable):
        for item in iterable:
            try:
                if isinstance(item, basestring):
                    raise TypeError()
                iterator = iter(item)
            except TypeError:
                yield item
            else:
                for i in flatten(iterator):
                    yield i
A problem comes up as soon as a user-defined string (such as UserString) is passed to the function. In my opinion a good solution would be a "String" ABC one could test against.
I'm not against this, but so far I've not been able to come up with a good set of methods to endow the String ABC with. Another problem is that not everybody draws the line in the same place -- how should instances of bytes, bytearray, array.array, memoryview (buffer in 2.6) be treated?

--Guido van Rossum (home page: http://www.python.org/~guido/)

On Tue, May 27, 2008 at 3:42 PM, Guido van Rossum <guido@python.org> wrote:
[+python-3000]
On Tue, May 27, 2008 at 12:32 PM, Armin Ronacher <armin.ronacher@active-4.com> wrote:
Basically *the* problematic situation with iterable strings is something like a `flatten` function that flattens out every iterable object except strings. Imagine it's implemented in a way similar to this::
I'm not against this, but so far I've not been able to come up with a good set of methods to endow the String ABC with. Another problem is that not everybody draws the line in the same place -- how should instances of bytes, bytearray, array.array, memoryview (buffer in 2.6) be treated?
Maybe the opposite approach would be more fruitful. Flattening is about removing nested "containers", so perhaps there should be an ABC that things like lists and tuples provide, but strings don't. No idea what that might be. -- Benji York
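For concreteness, a rough sketch of that opposite approach (the marker name is made up; the thread never settles on one): lists and tuples register as flattenable containers, strings simply never do:

    from abc import ABCMeta

    class FlattenableContainer(metaclass=ABCMeta):
        """Hypothetical marker ABC: containers flatten() may descend into."""

    for cls in (list, tuple, set, frozenset):
        FlattenableContainer.register(cls)

    def flatten(iterable):
        for item in iterable:
            if isinstance(item, FlattenableContainer):
                for i in flatten(item):
                    yield i
            else:
                yield item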

On 5/27/08, Benji York wrote:
Guido van Rossum wrote:
Armin Ronacher wrote:
Basically *the* problematic situation with iterable strings is something like a `flatten` function that flattens out every iterable object except strings.
I'm not against this, but so far I've not been able to come up with a good set of methods to endow the String ABC with. Another problem is that not everybody draws the line in the same place -- how should instances of bytes, bytearray, array.array, memoryview (buffer in 2.6) be treated?
Maybe the opposite approach would be more fruitful. Flattening is about removing nested "containers", so perhaps there should be an ABC that things like lists and tuples provide, but strings don't. No idea what that might be.
It isn't really stringiness that matters, it is that you have to terminate even though you still have an iterable container. The test is roughly (1 == len(v) and v[0] == v), except that you want to stop a layer sooner.

Guido had at least a start in Searchable, back when ABCs were still in the sandbox: http://svn.python.org/view/sandbox/trunk/abc/abc.py?rev=55321&view=auto

Searchable represented the fact that (x in c) =/=> (x in iter(c)), because of sequence searches like ("Error" in results).

-jJ
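For illustration (not code from the thread), a flattener built only on that self-containment test does terminate on strings, but only after splitting them into single characters - which is exactly the "stop a layer sooner" caveat:

    def flatten(iterable):
        for item in iterable:
            try:
                iterator = iter(item)
            except TypeError:
                yield item          # not iterable at all: a leaf
                continue
            try:
                # Jim's test: a one-element sequence whose only element is
                # itself (e.g. a single-character string) must be a leaf,
                # or the recursion never bottoms out.
                if len(item) == 1 and item[0] == item:
                    yield item
                    continue
            except TypeError:
                pass                # no len()/indexing: just recurse
            for i in flatten(iterator):
                yield i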

"Jim Jewett"
It isn't really stringiness that matters, it is that you have to terminate even though you still have an iterable container.
Well said.
Guido had at least a start in Searchable, back when ABC were still in the sandbox:
Have to disagree here. An object cannot know in general whether a flattener wants to split it or not. That is an application-dependent decision. A better answer is to be able to tell the flattener what should be considered atomic in a given circumstance.

Raymond
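One way to read that suggestion, sketched with a hypothetical registration function (the names are illustrative, not from the thread):

    _atomic_types = [str, bytes]

    def register_atomic(*types):
        """Tell the flattener which extra types to treat as leaves."""
        _atomic_types.extend(types)

    def flatten(iterable):
        for item in iterable:
            if isinstance(item, tuple(_atomic_types)):
                yield item
                continue
            try:
                iterator = iter(item)
            except TypeError:
                yield item
            else:
                for i in flatten(iterator):
                    yield i

For example, register_atomic(dict) would make flatten() stop at mappings instead of iterating over their keys.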

Perhaps drawing a distinction between containers (or maybe "collections"?) and non-container iterables is appropriate? I would define containers as objects that can be iterated over multiple times and for which iteration does not instantiate new objects. By this definition generators would not be considered containers (but views would), and for practicality it may be worth also having an ABC for containers-and-generators (no idea what to name it). This would result in the following hierarchy::

    iterables
    - strings, bytes, etc.
    - containers-and-generators
      - containers
        - tuple, list, set, dict views, etc.
      - generators

I don't think there needs to be different operations defined for the different ABCs. They're all just iterables with different iteration semantics. (A sketch of this hierarchy as ABCs follows at the end of this message.)

Jamie

On Tue, May 27, 2008 at 3:54 PM, Raymond Hettinger <python@rcn.com> wrote:
"Jim Jewett"
It isn't really stringiness that matters, it is that you have to terminate even though you still have an iterable container.
Well said.
Guido had at least a start in Searchable, back when ABC
were still in the sandbox:
Have to disagree here. An object cannot know in general whether a flattener wants to split it or not. That is an application-dependent decision. A better answer is to be able to tell the flattener what should be considered atomic in a given circumstance.
Raymond
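A speculative sketch of how Jamie's hierarchy might be spelled with marker ABCs; every name here is a placeholder (he explicitly has no name for the containers-and-generators level):

    from abc import ABCMeta
    import types

    class ReiterableOrGenerator(metaclass=ABCMeta):
        """Placeholder for the unnamed containers-and-generators ABC."""

    class Reiterable(ReiterableOrGenerator):
        """Containers: iterable many times, iteration creates no new objects."""

    for cls in (tuple, list, set, frozenset):
        Reiterable.register(cls)
    Reiterable.register(type({}.keys()))                  # dict views
    ReiterableOrGenerator.register(types.GeneratorType)   # generators

Strings and bytes are deliberately left unregistered, so they sit directly under plain iterables in the hierarchy.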

Raymond Hettinger wrote:
"Jim Jewett"
It isn't really stringiness that matters, it is that you have to terminate even though you still have an iterable container.
Well said.
Guido had at least a start in Searchable, back when ABC were still in the sandbox:
Have to disagree here. An object cannot know in general whether a flattener wants to split it or not. That is an application-dependent decision. A better answer is to be able to tell the flattener what should be considered atomic in a given circumstance.
Raymond
A while back (a couple of years I think), we had a discussion on python-list about flatten in which I posted the following version of a flatten function. It turned out to be nearly twice as fast as any other version.

    def flatten(L):
        """ Flatten a list in place. """
        i = 0
        while i < len(L):
            while type(L[i]) is list:
                L[i:i+1] = L[i]
            i += 1
        return L

For this to work the object to be flattened needs to be both mutable and list-like. At the moment I can't think of any reason I would want to flatten anything that was not list-like. To make it a bit more flexible it could be changed just a bit.

    def flatten(L):
        """ Flatten a list in place. """
        objtype = type(L)
        i = 0
        while i < len(L):
            while type(L[i]) is objtype:
                L[i:i+1] = L[i]
            i += 1
        return L

Generally, I don't think you would want to flatten dissimilar objects.

Cheers,
Ron
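A quick usage example (not from the thread) showing what the in-place version does and what it deliberately leaves alone:

    nested = [1, [2, [3, 4]], (5, 6), "abc"]
    print(flatten(nested))
    # [1, 2, 3, 4, (5, 6), 'abc'] -- only nested lists are spliced in place;
    # tuples and strings stay untouched because their type is not list.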

(just my 2 eurocents) Guido van Rossum <guido <at> python.org> writes:
I'm not against this, but so far I've not been able to come up with a good set of methods to endow the String ABC with.
If we stay minimalistic we could consider that the three basic operations that define a string are:
- testing for substring containment
- splitting on a substring into a list of substrings
- slicing in order to extract a substring

Which gives us ['__contains__', 'split', '__getitem__'], and expands intuitively to ['__contains__', 'find', 'index', 'split', 'rsplit', '__getitem__'].
Another problem is that not everybody draws the line in the same place -- how should instances of bytes, bytearray, array.array, memoryview (buffer in 2.6) be treated?
In the followup of the flatten() example, bytes and bytearray should be Strings, but array.array and memoryview shouldn't. array.array is really a different kind of container rather than a proper string, and as for memoryview... well, since it's not documented I don't know what it's supposed to do :-) Regards Antoine.

Antoine Pitrou schrieb:
(just my 2 eurocents)
Guido van Rossum <guido <at> python.org> writes:
I'm not against this, but so far I've not been able to come up with a good set of methods to endow the String ABC with.
If we stay minimalistic we could consider that the three basic operations that define a string are:
- testing for substring containment
- splitting on a substring into a list of substrings
- slicing in order to extract a substring
Which gives us ['__contains__', 'split', '__getitem__'], and expands intuitively to ['__contains__', 'find', 'index', 'split', 'rsplit', '__getitem__'].
I'd argue that "find" is more primitive than "split" -- split is intuitively implemented using find and slicing, but implementing find using split and len is unintuitive. (Of course, "index" can be used instead of "find".)
Another problem is that not everybody draws the line in the same place -- how should instances of bytes, bytearray, array.array, memoryview (buffer in 2.6) be treated?
In the followup of the flatten() example, bytes and bytearray should be Strings, but array.array and memoryview shouldn't. array.array is really a different kind of container rather than a proper string, and as for memoryview... well, since it's not documented I don't know what it's supposed to do :-)
This is really a problem -- since the PEP 3118 authors don't seem to bother, I'll have to write up something based on the PEP, but I don't know if it is still up-to-date. Georg
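To make the point concrete, a sketch (not from the thread) of split() written in terms of find() and slicing; the reverse direction, building find() from split(), is much more awkward:

    def split_on(s, sep):
        """Split s on sep using only find() and slicing (no maxsplit handling)."""
        parts = []
        start = 0
        while True:
            i = s.find(sep, start)
            if i < 0:
                parts.append(s[start:])
                return parts
            parts.append(s[start:i])
            start = i + len(sep)

    assert split_on("a,b,,c", ",") == "a,b,,c".split(",")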

Georg Brandl <g.brandl <at> gmx.net> writes:
I'd argue that "find" is more primitive than "split" -- split is intuitively implemented using find and slicing, but implementing find using split and len is unintuitive. (Of course, "index" can be used instead of "find".)
I meant semantically primitive. I think the difference between a String and a plain Sequence is that, in a String, the existence and relative position of substrings has a meaning. This is true for character strings but it can also be true for other kinds of strings (think genome strings: they are usually represented using ASCII letters, but that's only out of convenience - they could be made of opaque objects instead).

That's why, in string classes, you have methods like split() to deal with the processing of substrings - methods which you do not have on lists, not because they're more difficult to implement or algorithmically less efficient, but because they would be pointless there.

Well, I hope it makes at least a bit of sense :-)

Regards

Antoine.

Antoine Pitrou schrieb:
Georg Brandl <g.brandl <at> gmx.net> writes:
I'd argue that "find" is more primitive than "split" -- split is intuitively implemented using find and slicing, but implementing find using split and len is unintuitive. (Of course, "index" can be used instead of "find".)
I meant semantically primitive. I think the difference between a String and a plain Sequence is that, in a String, the existence and relative position of substrings has a meaning. This is true for character strings but it can also be true for other kinds of strings (think genome strings: they are usually represented using ASCII letters, but that's only out of convenience - they could be made of opaque objects instead).
That's why, in string classes, you have methods like split() to deal with the processing of substrings - methods which you do not have on lists, not because they're more difficult to implement or algorithmically less efficient, but because they would be pointless there.
Well I hope it makes at least a bit of sense :-)
It does, but I don't see how it contradicts my proposition. find() takes a substring as well. Georg

Georg Brandl <g.brandl <at> gmx.net> writes:
It does, but I don't see how it contradicts my proposition. find() takes a substring as well.
Well, I'm not sure what your proposal was :-) Did you mean to keep split() out of the String interface, or to provide a default implementation of it based on find() and slicing?

Antoine Pitrou schrieb:
Georg Brandl <g.brandl <at> gmx.net> writes:
It does, but I don't see how it contradicts my proposition. find() takes a substring as well.
Well, I'm not sure what your proposal was :-) Did you mean to keep split() out of the String interface, or to provide a default implementation of it based on find() and slicing?
You wrote:
If we stay minimalistic we could consider that the three basic operations that define a string are:
- testing for substring containment
- splitting on a substring into a list of substrings
- slicing in order to extract a substring
I argued that, instead of split, find belongs in that list. (BTW, length inquiry would be a fourth.) That the other methods, split among them, can be implemented in terms of those follows from either set of basic operations.

Georg

Georg Brandl <g.brandl <at> gmx.net> writes:
You wrote:
If we stay minimalistic we could consider that the three basic operations that define a string are:
- testing for substring containment
- splitting on a substring into a list of substrings
- slicing in order to extract a substring
I argued that, instead of split, find belongs in that list. (BTW, length inquiry would be a fourth.)
Well, find() does test for substring containment, so in essence it is in that list, although in my first post I chose '__contains__' as the canonical representative of substring containment :-) And, you are right, length inquiry belongs into it too.
That the other methods, among them split, can be implemented in terms of those, follows from both sets of basic operations.
When I wrote "the three basic operations that define a string", perhaps I should have written "the three essential operations" instead. I was not attempting to give implementation guidelines but to propose a semantic definition of what constitutes a string and distinguishes it from other kinds of objects.

Anyway, I think we are just arguing over words here. Do we agree on the following basic String interface: ['__len__', '__contains__', '__getitem__', 'find', 'index', 'split', 'rsplit']?

cheers

Antoine.
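A sketch of that interface as an ABC (this is a reading of the thread, not a proposal anyone wrote up); find(), __len__ and __getitem__ are the abstract core, and split() gets a default implementation on top of find() and slicing as Georg suggests (rsplit omitted for brevity):

    from abc import ABCMeta, abstractmethod

    class String(metaclass=ABCMeta):
        @abstractmethod
        def __len__(self): ...
        @abstractmethod
        def __getitem__(self, index): ...
        @abstractmethod
        def find(self, sub, start=0): ...

        def __contains__(self, sub):
            return self.find(sub) >= 0

        def index(self, sub, start=0):
            i = self.find(sub, start)
            if i < 0:
                raise ValueError("substring not found")
            return i

        def split(self, sep):
            parts, start = [], 0
            while True:
                i = self.find(sep, start)
                if i < 0:
                    parts.append(self[start:])
                    return parts
                parts.append(self[start:i])
                start = i + len(sep)

A concrete class only has to supply __len__, __getitem__ and find(); str itself could simply be registered.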

Hi, Georg Brandl <g.brandl <at> gmx.net> writes:
I'd argue that "find" is more primitive than "split" -- split is intuitively implemented using find and slicing, but implementing find using split and len is unintuitive. (Of course, "index" can be used instead of "find".)
It surely is, but it would probably make sense to require both. Maybe have something like this:
    class SymbolSequence(Sequence)
    class String(SymbolSequence)

String would be the base of str/unicode, and SymbolSequence of str/bytes. A SymbolSequence is basically a sequence based on one type of symbol that implements slicing, getting symbols by index, count() and index(). A String is basically everything str/unicode provides as methods, except those which depend on information based on the symbol. For example upper() / isupper() etc. would go. Additionally I guess it makes sense to get rid of encode() / decode() / format().

Regards,
Armin
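A speculative sketch of that two-level hierarchy (the method selection is a guess at what "symbol-independent" means; nothing below was actually specified in the thread):

    from abc import abstractmethod
    from collections.abc import Sequence  # collections.abc in today's Python 3

    class SymbolSequence(Sequence):
        """A sequence over one kind of symbol: indexing, slicing, count() and
        index() all come from Sequence; nothing here knows what a symbol is."""

    class String(SymbolSequence):
        """Adds the str methods that do not depend on the nature of the symbols."""
        @abstractmethod
        def find(self, sub, start=0): ...
        @abstractmethod
        def split(self, sep): ...
        @abstractmethod
        def startswith(self, prefix): ...
        # upper()/isupper()/encode()/decode()/format() deliberately left out,
        # as suggested above.

    String.register(str)            # str/unicode would be Strings
    SymbolSequence.register(bytes)  # bytes only a SymbolSequence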

I'm not against this, but so far I've not been able to come up with a good set of methods to endow the String ABC with.
If we stay minimalistic we could consider that the three basic operations that define a string are:
- testing for substring containment
- splitting on a substring into a list of substrings
- slicing in order to extract a substring
Which gives us ['__contains__', 'split', '__getitem__'], and expands intuitively to ['__contains__', 'find', 'index', 'split', 'rsplit', '__getitem__'].
With the Sequence ABC, you already get index, __contains__, __len__, count, __iter__, and __getitem__. What's the benefit of an additional ABC with just three more methods? What can be learned from any known use cases for abstract strings? (IIRC, idlelib has an interesting example of subclassing UserString.) Is there anything about this proposal that is intrinsically texty?

In the 3.0 world, text is an abstract sequence of code points. Would you want to require an encode() method so there will always be a way to make it concrete?

The split()/rsplit() methods have a complex specification. Including them may make it hard to write a compliant class.
From what's been discussed so far, I don't see any advantage of isinstance(o, String) over hasattr(o, 'encode') or somesuch.
Raymond

I'm not against this, but so far I've not been able to come up with a good set of methods to endow the String ABC with.
If we stay minimalistic we could consider that the three basic operations that define a string are:
- testing for substring containment
- splitting on a substring into a list of substrings
- slicing in order to extract a substring
Which gives us ['__contains__', 'split', '__getitem__'], and expands intuitively to ['__contains__', 'find', 'index', 'split', 'rsplit', '__getitem__'].
With the Sequence ABC, you already get index, __contains__, __len__, count, __iter__, and __getitem__. What's the benefit of an additional ABC with just three more methods? What can be learned from any known use cases for abstract strings? (IIRC, idlelib has an interesting example of subclassing UserString.) Is there anything about this proposal that is intrinsically texty?
In the 3.0 world, text is an abstract sequence of code points. Would you want to require an encode() method so there will always be a way to make it concrete?
I would.
The split()/rsplit() methods have a complex specification. Including them may make it hard to write a compliant class.
From what's been discussed so far, I don't see any advantage of isinstance(o, String) over hasattr(o, 'encode') or somesuch.
Look, even if there were *no* additional methods, it's worth adding the base class, just to differentiate the class from the Sequence, as a marker, so that those of us who want to ask "isinstance(o, String)" can do so. Personally, I'd add in all the string methods to that class, in all their gory complexity. Those who need a compliant class should subclass the String base class, and override/add what they need. Bill

On 28-May-08, at 2:33 PM, Bill Janssen wrote:
From what's been discussed so far, I don't see any advantage of isinstance(o, String) over hasattr(o, 'encode') or somesuch.
Look, even if there were *no* additional methods, it's worth adding the base class, just to differentiate the class from the Sequence, as a marker, so that those of us who want to ask "isinstance(o, String)" can do so.
Personally, I'd add in all the string methods to that class, in all their gory complexity. Those who need a compliant class should subclass the String base class, and override/add what they need.
I'm not sure I agree with you on the solution, but I definitely agree that although str/unicode are conceptually sequences of characters, it is rarely useful to think of them as iterables of objects, unlike all other Sequences. (Note: I don't dispute that it is occasionally useful to treat them as such.)

In my perfect world, strings would be indexable and sliceable, but not iterable. A character iterator could be obtained using a new method, such as .chars().

    s = 'hello world'
    list(s)                      # exception
    set(s)                       # exception
    tuple(s)                     # exception
    for char in s:               # exception
    [ord(c) for c in s]          # exception
    s[2]                         # ok
    s[::-1]                      # ok
    for char in s.chars():       # ok
    [ord(c) for c in s.chars()]  # ok

Though an argument could be made against this, I consider the current behaviour of strings one of the few instances where purity beats practicality in Python. It is often the cause of errors that fail too late in my experience.

-Mike
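As a thought experiment only (not something posted in the thread, and a real change would have to happen in the str type itself), the proposed behaviour can be mocked up in a subclass:

    class NonIterableStr(str):
        """Sketch of the proposal: indexing and slicing work, iteration doesn't."""
        def __iter__(self):
            raise TypeError("strings are not directly iterable; use .chars()")
        def chars(self):
            # The hypothetical explicit character iterator.
            return (self[i] for i in range(len(self)))

    s = NonIterableStr('hello world')
    assert s[2] == 'l' and s[::-1] == 'dlrow olleh'
    assert [ord(c) for c in s.chars()][:2] == [104, 101]
    # list(s), set(s), tuple(s) and "for c in s" now raise TypeError.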

Mike Klaas wrote:
In my perfect world, strings would be indexable and sliceable, but not iterable.
An object that was indexable but not iterable would be a very strange thing. If it has __len__ and __getitem__, there's nothing to stop you iterating over it by hand anyway, so disallowing __iter__ would just seem perverse. A case could be made for strings being sliceable but neither indexable nor iterable, but it's probably too late to make such a radical change now. -- Greg

On 28-May-08, at 5:44 PM, Greg Ewing wrote:
Mike Klaas wrote:
In my perfect world, strings would be indexable and sliceable, but not iterable.
An object that was indexable but not iterable would be a very strange thing. If it has __len__ and __getitem__, there's nothing to stop you iterating over it by hand anyway, so disallowing __iter__ would just seem perverse.
Python has a beautiful abstraction in iteration: iter() is a generic function that allows you to lazily consume a sequence of objects, whether it be lists, tuples, custom iterators, generators, or what have you. It is trivial to write your code to be agnostic to the type of iterable passed in. Almost anything else a consumer of your code passes in will result in an immediate exception.

Unfortunately, Python has two extremely common data types which do not fail when this generic function is applied to them, and instead almost always return a result which is not desired. Instead, it iterates over the characters of the string, a behaviour which is rarely needed in practice due to the wealth of methods available.

I agree that it would be perverse to disallow iterating over a string. I just wish that the way to do that wasn't glommed onto the object-iteration abstraction. As it stands, any consumer of iterables has to keep strings in mind. It is particularly irksome when the target input is an iterable of strings. I recall a function that accepts a list/iterable of item keys, hashes them, and then retrieves values based on the item hashes (usually over the network, so it is necessary to batch requests). This function is often used in the interactive interpreter, and it is thus very prone to being passed a string rather than a list. There was no good way to prevent the (frequent) mysterious "not found" errors save adding an explicit type check for basestring.

Strings already behave slightly differently from the way other sequences act: a string is the only sequence for which 'seq in seq' is true, and the only sequence for which 'x in seq' can be true but 'any(x == item for item in seq)' is false. Abstractions are sometimes imperfect: this is why there is an explicit type check for strings in the sum() builtin.

I'll stop here as I realize that the likelihood that this will be accepted is terribly small, especially considering the late stage of the process. But I would be willing to develop a patch that implements this behaviour on the off chance it is.

-Mike

Mike Klaas wrote:
I agree that it would be perverse to disallow iterating over a string.
Just to be clear, I'm saying that it would be perverse to disallow iterating *but* to allow indexing of individual characters. Either you should have both or you should have neither. -- Greg

[Armin Ronacher]
Basically *the* problematic situation with iterable strings is something like a `flatten` function that flattens out every iterable object except strings.
Stated more generally: The problematic situation is that flatten() implementations typically need some way to decide what kinds of objects are atomic. Different apps draw the line in different places (chars, words, paragraphs, blobs, files, directories, xml elements with attributes, xml bodies, csv records, csv fields, etc.).
A problem comes up as soon as a user-defined string (such as UserString) is passed to the function. In my opinion a good solution would be a "String" ABC one could test against.
Conceptually, this is a fine idea, but three things bug me.

First, there is a mismatch between the significance of the problem being addressed versus the weight of the solution. The tiny "problem" is a sense that the simplest version of a flatten recipe isn't perfectly general. The "solution" is to introduce yet another ABC, require adherence to the huge string API and require that everything that purports to be a string register itself. IMO, that is trying to kill a mosquito with a cannon.

Second, this seems like the wrong solution to the problem as it places the responsibility in the wrong place and thereby hardwires its notion of what kind of objects should be split. A flatten() implementation doesn't really care about whether an input is a string which supports all the string-like methods such as capitalize(). Wouldn't it be better to write your version of flatten() with a registration function so that a user could specify which objects are atomic? Otherwise, you will have to continually re-edit your flatten() code as you run across other non-stringlike objects that also need to be treated as atomic.

Third, I thought ABCs were introduced as an optional feature to support large apps that needed both polymorphic object flexibility and rigorous API matching. Now, it seems that even the tiniest recipe is going to expose its internals and insist on objects being registered as one of several supported abstract types. I suppose this is better than insisting on one of several concrete types, but it still smells like an anti-pattern.

Raymond

(If you receive this twice, please excuse the duplicate email. User-error on my part, sorry.) On Wed, 28 May 2008 08:23:38 am Raymond Hettinger wrote:
A flatten() implementation doesn't really care about whether an input is a string which supports all the string-like methods such as capitalize(). Wouldn't it be better to write your version of flatten() with a registration function so that a user could specify which objects are atomic? Otherwise, you will have to continually re-edit your flatten() code as you run across other non-stringlike objects that also need to be treated as atomic.
Just throwing a suggestion out there...

    def atomic(obj, _atomic=(basestring,)):
        try:
            return bool(obj.__atomic__)
        except AttributeError:
            if isinstance(obj, _atomic):
                return True
            else:
                try:
                    iter(obj)
                except TypeError:
                    return True
        return False

    assert atomic("abc")
    assert not atomic(['a', 'b', 'c'])

If built-in objects grew an __atomic__ attribute, you could simplify the atomic() function greatly:

    def atomic(obj):
        return bool(obj.__atomic__)

However atomic() is defined, now flatten() is easy:

    def flatten(obj):
        if atomic(obj):
            yield obj
        else:
            for item in obj:
                for i in flatten(item):
                    yield i

If you needed more control, you could customise it using standard techniques, e.g. shadow the atomic() function with your own version, sub-class the types you wish to treat differently, make __atomic__ a computed property instead of a simple attribute, etc.

Re-writing the above to match Python 3 is left as an exercise.

--
Steven
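For what it's worth, a sketch of the Python 3 rewrite left as an exercise above (basestring is gone, so str and bytes stand in; otherwise nothing changes):

    def atomic(obj, _atomic=(str, bytes)):
        try:
            return bool(obj.__atomic__)
        except AttributeError:
            if isinstance(obj, _atomic):
                return True
            try:
                iter(obj)
            except TypeError:
                return True
        return False

    def flatten(obj):
        if atomic(obj):
            yield obj
        else:
            for item in obj:
                yield from flatten(item)  # 'yield from' replaces the inner loop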

"Steven D'Aprano" <steve@pearwood.info> wrote in message news:200805281007.59902.steve@pearwood.info... Just throwing a suggestion out there... def atomic(obj, _atomic=(basestring,)): try: return bool(obj.__atomic__) except AttributeError: if isinstance(obj, _atomic): return True else: try: iter(obj) except TypeError: return True return False assert atomic("abc") assert not atomic(['a', 'b', 'c']) If built-in objects grew an __atomic__ attribute, you could simplify the atomic() function greatly: def atomic(obj): return bool(obj.__atomic__) However atomic() is defined, now flatten() is easy: def flatten(obj): if atomic(obj): yield obj else: for item in obj: for i in flatten(item): yield i If you needed more control, you could customise it using standard techniques e.g. shadow the atomic() function with your own version, sub-class the types you wish to treat differently, make __atomic__ a computed property instead of a simple attribute, etc. ================== This is a lot of work to avoid being explicit about either atomic or non-atomic classes on an site, package, module, or call basis ;-)

On 27/05/2008, Raymond Hettinger <python@rcn.com> wrote:
Conceptually, this is a fine idea, but three things bug me.
First, there is a mismatch between the significance of the problem being addressed versus the weight of the solution.
Agreed, absolutely.
Second, this seems like the wrong solution to the problem as it places the responsibility in the wrong place and thereby hardwires its notion of what kind of objects should be split.
Again, agreed. The flatten function is one of the canonical examples of the visitor pattern. I see no generalisation of this proposal to other visitor patterns. I'd rather see a solution which addressed the wider visitor use case (I think I just sprained my back bending over backwards to avoid mentioning generic functions :-))
Third, I thought ABCs were introduced as an optional feature [...]
Again, I agree absolutely. Paul.

Paul Moore wrote:
I'd rather see a solution which addressed the wider visitor use case (I think I just sprained my back bending over backwards to avoid mentioning generic functions :-))
Speaking of generic functions, while thinking about the recent discussion on proxy objects, it occurred to me that this is something you can do with an OO system that you can't do so easily with a generic function system. If the operations being proxied were generic functions rather than methods, you'd have to override them all individually instead of having a central point to catch them all. -- Greg

Greg Ewing wrote:
Paul Moore wrote:
I'd rather see a solution which addressed the wider visitor use case (I think I just sprained my back bending over backwards to avoid mentioning generic functions :-))
Speaking of generic functions, while thinking about the recent discussion on proxy objects, it occurred to me that this is something you can do with an OO system that you can't do so easily with a generic function system. If the operations being proxied were generic functions rather than methods, you'd have to override them all individually instead of having a central point to catch them all.
I don't think it would actually be that much worse - something like typetools.ProxyMixin would just involve a whole series of register calls instead of method definitions. I wouldn't expect the total amount of code involved to change much.

That said, a recursive flatten() implementation is indeed a problem that generic functions are well suited to solving - have the default implementation attempt to iterate over the passed-in object, yielding its contents, and yield the object itself only if iteration fails; then, for the types the application wishes to consider atomic, register an alternative implementation that just yields the object without attempting to iterate over it.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
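A sketch of that generic-function flatten using functools.singledispatch, which only reached the standard library years after this thread (in 2008 this would have needed a third-party generic function library):

    from functools import singledispatch

    @singledispatch
    def flatten(obj):
        # Default implementation: try to iterate and recurse into the contents;
        # if the object is not iterable, yield the object itself.
        try:
            items = iter(obj)
        except TypeError:
            yield obj
            return
        for item in items:
            yield from flatten(item)

    @flatten.register(str)
    @flatten.register(bytes)
    def _(obj):
        # Types the application considers atomic: yield without iterating.
        yield obj

    print(list(flatten([1, "ab", [2, (3, "cd")]])))  # [1, 'ab', 2, 3, 'cd']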

Nick Coghlan wrote:
I don't think it would actually be that much worse - something like typetools.ProxyMixin would just involve a whole series of register calls instead of method definitions. I wouldn't expect the total amount of code involved to change much.
I'm not thinking about the __xxx__ methods, they're an aberration. I'm thinking about all the user-defined methods and attributes that get caught in one go by the __getattr__ method of the proxy.
That said, a recursive flatten() implementation is indeed a problem that generic functions are well suited to solving
Yes, I agree with that. It was just something I thought of that shows that generic functions and OO are not quite equivalent in general. -- Greg

Greg Ewing wrote:
Nick Coghlan wrote:
I don't think it would actually be that much worse - something like typetools.ProxyMixin would just involve a whole series of register calls instead of method definitions. I wouldn't expect the total amount of code involved to change much.
I'm not thinking about the __xxx__ methods, they're an aberration. I'm thinking about all the user-defined methods and attributes that get caught in one go by the __getattr__ method of the proxy.
Ah, I see what you mean. That's where the generic system itself needs to be based on generic functions - then you can hook the lookup function so that proxies get looked up based on their target type rather than the fact they're a proxy. It all gets very brain bending and self referential, which is when folks tend to throw generics in the 'too complicated' basket ;) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

Nick Coghlan wrote:
That's where the generic system itself needs to be based on generic functions - then you can hook the lookup function so that proxies get looked up based on their target type rather than the fact they're a proxy. It all gets very brain bending and self referential, which is when folks tend to throw generics in the 'too complicated' basket ;)
Yep. :-) Also, when hooking into things at such a deep level, I'd be a bit worried that I was catching too *much*, or that I was imposing performance penalties on unrelated things. At least with __getattr__ you know what you're doing can *only* affect the proxy object and nothing else. -- Greg

Greg Ewing wrote:
Paul Moore wrote:
I'd rather see a solution which addressed the wider visitor use case (I think I just sprained my back bending over backwards to avoid mentioning generic functions :-))
Speaking of generic functions, while thinking about the recent discussion on proxy objects, it occurred to me that this is something you can do with an OO system that you can't do so easily with a generic function system. If the operations being proxied were generic functions rather than methods, you'd have to override them all individually instead of having a central point to catch them all.
It depends on your dispatch rules. Say the implementation orders the candidates lexically (like default CLOS). This is equivalent to choosing as first candidates the set of functions with the most specific first argument. Resolution for a generic function call and generic method call are semantically the same, so there's no reason not to have the latter, and proxying by __getattr__ tricks becomes doable again. Neil

"Armin Ronacher" <armin.ronacher@active-4.com> wrote in message news:loom.20080527T192243-415@post.gmane.org... | Basically *the* problematic situation with iterable strings is something like | a `flatten` function that flattens out every iterable object except of strings. In most real cases I can imagine, this is way too broad. For instance, trying to 'flatten' an infinite iterable makes the flatten output one also. Flattening a set imposes an arbitrary order (but that is ok if one feeds the output to set(), which de-orders it). Flattening a dict decouples keys and values. Flattening iterable set-theoretic numbers (0={}, n = {n-1, {n-1}}, or something like that) would literaly yield nothing. | Imagine it's implemented in a way similar to that:: | | def flatten(iterable): | for item in iterable: | try: | if isinstance(item, basestring): | raise TypeError() | iterator = iter(item) | except TypeError: | yield item | else: | for i in flatten(iterator): | yield i I can more easily imagine wanting to flatten only certain classes, such and tuples and lists, or frozensets and sets. def flatten(iterable, classes): for item in iterable: if type(item) in classes: for i in flatten(item, classes): yield i else: yield item | A problem comes up as soon as user defined strings (such as UserString) is | passed to the function. In my opinion a good solution would be a "String" | ABC one could test against. This might be a good idea regardless of my comments. tjr

Armin Ronacher wrote:
Basically *the* problematic situation with iterable strings is something like a `flatten` function that flattens out every iterable object except of strings.
To flesh out the span of your "something like": recently I had a WSGI-based app that for some requests mistakenly returned a 200K string instead of the same string wrapped as a 1-element list, and the WSGI layer - according to spec - served it back character by character. Which "worked" - and durably confused not only me but IIS and a network router as well. While blame can certainly be assigned elsewhere - WSGI spec or implementation (wsgiref included) - the unwelcome iterability of strings was a necessary cause.

Cheers, BB
participants (18)
- Antoine Pitrou
- Armin Ronacher
- Benji York
- Bill Janssen
- Boris Borcic
- Georg Brandl
- Greg Ewing
- Guido van Rossum
- Jamie Gennis
- Jim Jewett
- Mike Klaas
- Neil Toronto
- Nick Coghlan
- Paul Moore
- Raymond Hettinger
- Ron Adam
- Steven D'Aprano
- Terry Reedy