
Hello,

Over the last 10 years, Python has made steady progress in the convenience of assembling strings. Still, it seems to me that joining is, when possible, the cleanest way to write string assembly. However, I'm still sometimes confused by the different signatures of the join functions:

0. os.path.join takes *args
1. str.join takes a single iterable argument; this inconsistency makes it easy to mix it up with the os.path.join signature

Also, I still think that:

    '_'.join(['cancel', name])

would be more readable as:

    ['cancel', name].join('_')

Not only would this fix both of my issues with the current status quo, it would also be completely backward compatible, and probably not very hard to implement: just add a join method to list.

Thanks in advance for your reply. Have a great day

-- ∞

PS: sorry for my silly example; I know that example could also be written f'cancel_{name}', which is awesome, thank you for that! But for more complex strings I'm trying to avoid:

    def foo():
        return textwrap.dedent(f'''
            some
            {more(complex)}
            {st.ri("ng")}
        ''').strip()

For some reason, I prefer:

    def foo():
        return '\n'.join(['some', more(complex), st.ri('ng')])

But this would be even more readable (less nesting of calls):

    def foo():
        return ['some', more(complex), st.ri('ng')].join('\n')

Hope this makes sense. Have a great day
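For concreteness, a minimal sketch of what's being proposed, using a hypothetical list subclass (the builtin list can't grow methods from Python code, so this only illustrates the ergonomics):

    class JList(list):
        def join(self, sep):
            # mirror str.join, but called on the sequence itself
            return sep.join(self)

    name = 'order'
    assert JList(['cancel', name]).join('_') == 'cancel_order'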

On Mon, Jan 28, 2019 at 8:44 PM Jamesie Pic <jpic@yourlabs.org> wrote:
['cancel', name].join('_')
This is a frequent suggestion. It is also one that makes no sense whatsoever if you think about Python's semantics. What would you expect to happen with this line:

    ['foo', b'foo', 37, re.compile('foo')].join('_')

Lists are not restricted to containing only strings (or things that are string-like enough that they might play well with joining). Growing a method that pertains only to that specialized sort of list breaks the mental model of Python. Moreover, there is no way to TELL if a particular list is a "list of strings" other than checking each item inside it (unlike in many languages).

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On 2019-01-28 18:22, David Mertz wrote:
That problem already exists with str.join, though. It's just currently spelled this way:

    ','.join(['foo', b'foo', 37, re.compile('foo')])

. . . and the result is an error. I don't see how it's semantically any less sensible to call list.join on a list of non-string things than it is to pass a list of non-string things to str.join.

Personally, what I find perverse is that .join is a method of strings but does NOT call str() on the items to be joined. The cases where I would have been surprised or bitten by something accidentally being converted to a string are massively outweighed by the cases where I want everything to be converted into a string, because, dangit, I'm joining them into a bigger string.

I agree that a list method would be nice, but then we have to think about whether we should add similar methods to all iterable types, since str.join can take any iterable (not just a list).

-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

If there is a more Pythonic way of joining lists, tuples, sets, etc., it is by using a keyword and not a method. For example, using a keyword, say *joins*:

    '-' joins ['list', 'of', 'strings']
This is more readable than using the join() method, since you can read it as "dash joins a list of strings". Although the current way of joining lists is almost similar to this, it is somewhat "confusing" for beginners or for people who come from other languages. BTW, this is just what comes to mind and is not supported by Python. On Tue, Jan 29, 2019 at 1:22 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:

One could always write

    str.join('_', ['list', 'of', 'strings'])

I'm not advocating for this syntax, but perhaps it is clarifying. Also, a quick search finds this thread from 20 years ago on this very issue: https://mail.python.org/pipermail/python-dev/1999-June/095366.html On Mon, Jan 28, 2019 at 9:37 PM Ronie Martinez <ronmarti18@gmail.com> wrote:

On Tue, Jan 29, 2019, 12:22 AM Brendan Barnwell
This feels like an important asymmetry to me. There is a difference between the object itself being the wrong kind of thing and the arguments to a method being wrong. In the first case, the object (a heterogeneous list) can NEVER support a .join() method. It's simply the wrong kind of object. Of course, it's right as far as the basic type system goes, but its deeper (maybe "structural") type cannot support that method. On the other hand, sure, almost any function, including methods, will choke on bad arguments. But no string *object* rules out joining if good arguments can be found.

I am sure readers will immediately reply, "what about list.sort()?" Unfortunately, that really will simply fail on lists of the wrong "type." After all these years, I still think that change in Python 2.3 or so was the wrong choice (for those with fewer gray hairs: when the hills were young, Python objects were arbitrarily comparable under inequality, even when the answer didn't "mean" anything).

I actually agree that a 'cast_to_string_and_join()' function sounds useful. Of course, you can write one easily enough; it doesn't need to be a method. For that matter, I think I'd rather that str.join() were simply a function in the string module or somewhere similar, with a signature like 'join(delim, iter_of_strings)'.

On Tue, Jan 29, 2019 at 4:48 PM David Mertz <mertz@gnosis.cx> wrote:
Considering that you can provide a key function to sort(), there is by definition no list of objects which utterly cannot be sorted. That said, though, I don't think this is an overly strong argument. The main reason lists don't have a join method is that str.join() can take *any iterable*, so it's perfectly legal to join tuples or generators without needing to listify them. Consider:

    # Join the parts, ignoring empty ones
    "_".join(filter(None, parts))

    c = collections.Counter(...)
    "_".join(item for item, count in c.most_common())

    # solving Brendan's complaint of perversity
    "_".join(map(str, stuff))

If these were flipped around, you'd have to explicitly call list() on them just to get a join method.

BTW, Ronie: I would disagree. Python uses syntactic elements only where functions are incapable of providing equivalent functionality. That's why print became a function in 3.0 - it didn't need to be magical syntax any more.

ChrisA

Yeah, that's a good reason to use .format when you have a fixed number of arguments:

    "{}, {}, {}, {}".format(some, random, stuff, here)

And then there is map. Otherwise .join is very common on iterables, like:

    '\n'.join(make_string(object) for object in something)
    '\n'.join(map(make_string, something))
    '\n'.join(map(str, nonstr))
    '\n'.join('{}: {}'.format(x, y) for x, y in blabla)
    '\n'.join(map('[{}]'.format, stuff))

A "join format" construct is very typical in code that produces strings from iterables. I agree with the point that "a list doesn't always contain strings, so why would it have a join method".

Thanks for your feedback! So, do you think anything can be done to make assembling strings less confusing / fix the inconsistency between the signatures of os.path.join and str.join? Have a great day

Thanks for the advice Jonathan. Can you clarify the documentation topic you think should be improved or created? "Assembling strings" or "inconsistencies between os.path.join and str.join"? I've written an article to summarize, but I don't want to publish it because my blog serves my lobbying for Python, not against it. Also, I don't feel confident about it because I never had the luck to work closely with core devs or other people with a lot more experience than me, like I can so easily find on the internet (thank you all, I love you!). So, I deliver it here under the WTFPL license.

The mistake I'm still making after 10 years of Python

I love Python, really, but there's a mistake I've been making over and over again while assembling strings of all sorts in Python, and that I have unconsciously ignored until now. Love it or hate it, when you start with Python it's hard to be completely indifferent to:

    '\n'.join(['some', 'thing'])

But then you read the kilometers of justifications that the Python devs have written about it over the past 20 years and, well, grow indifferent to it: "that's the way it's going to be if I want to use Python". Recently, though, I started to tackle one of the dissatisfactions I have with my own code: how I assemble strings doesn't make me feel great compared to the rest of what I'm doing with Python. It strikes me that assembling strings in Python is something I do many times a day, and have for 10 years, so taking some time to question my own practice could prove helpful in the long run. The little story of a little obsession...

## `os.path.join(*args)` vs. `str.join(arg)`

I'm living a dream with os.path.join:

    >>> os.path.join('some', 'path')
    'some/path'

But then I decide that cross-platform is going to be too much work, so why not join with slashes directly and only support free operating systems:

    >>> '/'.join('some', 'path')
    TypeError: join() takes exactly one argument (2 given)

"Well! I forgot about this for a minute; let's 'fix' it and move on":

    >>> '/'.join(['some', 'path'])
    'some/path'

Ohhh, I'm not really sure in this case: isn't my code going to look more readable with the os.path.join notation after all? Ten years later, I still make the same mistake, because 2 seconds before doing a str join I was doing a path join. The fix is easy because the error message is clear, so it's easy to ignore the inconsistency, fix it, and move on. But what if this was an elephant in the room that was just easy to look away from?

## Long f-strings vs. join

The new Python format syntax with f-strings is pretty awesome; let's see how we can assemble a triple-quoted f-string:

    foo = f'''
    some
    {more(complex)}
    {st.ri("ng")}
    '''.strip()

Pretty cool, right? In a function it would look like this:

    def foo():
        return f'''
    some
    {more(complex)}
    {st.ri("ng")}
    '''.strip()

Ok, that would also work, but we're going to have to import a module from the standard library to restore visual indentation on that code:

    import textwrap

    def foo():
        return textwrap.dedent(f'''
            some
            {more(complex)}
            {st.ri("ng")}
        ''').strip()

Let's compare this to the join notation:

    def foo():
        return '\n'.join('some', more(complex), st.ri('ng'))

Needless to say, I prefer the join notation for this use case. Not only does it fit on a single line, it doesn't require dedenting the text with an imported function, nor does it require juggling with quotes, and it also sort of looks like it would be more performant. All in all, I prefer the join notation to assemble longer strings.
Note that in practice I use f-strings for the "pieces" that I want to assemble, and that works great:

    def foo():
        return '\n'.join('some', more(complex), f'_{other}_')

Anyway, ok, good-enough-looking code! Let's see what you have to say:

    TypeError: join() takes exactly one argument (2 given)

Oh, that again, kk gotfix:

    def foo():
        return '\n'.join(['some', more(complex), f'_{other}_'])

I should take metrics about the number of times I make this mistake during a day, because it looks like it would be a lot (I switch between os.path.join and str.join a lot).

## The 20-year-old jurisprudence

So, which looks more ergonomic between these two syntaxes:

    [
        'some',
        more(complex),
        f'_{other}_'
    ].join('\n')

    '\n'.join([
        'some',
        more(complex),
        f'_{other}_'
    ])

It seems there is a lot of friction when proposing to add a convenience join method to the list type. I won't go over the reasons for this here; there's already a lot to read about it on the internet, written over the last 20 years.

## Conclusion

I have absolutely no idea what should be done about this; the purpose of this article was just to share a bit of one of my obsessions with string assembly. Maybe what strikes me is assembling strings multiple times a day, with a language I've got 10 years of full-time experience in, and still repeating the same mistakes. Not because I don't understand the jurisprudence, not because I don't understand the documentation, or because the documentation is wrong, but probably just because I switch between os.path.join and str.join, which take different signatures, I think. Perhaps the most relevant proposal here would be to extend the str.join signature, which currently supports this notation:

    str.join(iterable)

to also support this notation:

    str.join(arg1, ...argN)

So at least people won't make mistakes when switching between os.path.join and str.join. Perhaps something else? Have a great day

A couple notes: On Tue, Jan 29, 2019 at 5:31 AM Jamesie Pic <jpic@yourlabs.org> wrote:
can you clarify the documentation topic you think should be improved or created ? "Assembling strings"
I would think "assembling strings", though there is a lot out there already.
or "inconsistencies between os.path.join and str.join" ?
well, if we're talking about moving forward, then the Path object is probably the "right" way to join paths anyway :-)

    a_path / "a_dir" / "a_filename"

But to the core language issue -- I started using Python with 1.5.* and back then join() was in the string module (and it's still there in 2.7). And yes, I did expect it to be a list method... Then it was added as a method of the string object. And I thought THAT was odd -- but I really appreciated that I didn't need to import a module to do something fundamental. The fact is that joining strings is fundamentally a string operation, so it makes sense for it to be there.

In early py2, I would have thought, maybe it should be a list method -- it's pretty darn common to join lists of strings, yes? But what about tuples? Python was kind of all about sequences -- so maybe all sequences could have that method -- i.e. part of the sequence ABC.

But with py3k, Python is more about iterables than sequences -- and join (and many other methods and functions) operate on any iterable -- and this is a really good thing. So add join to ALL iterables? That makes little sense, and really isn't possible -- an iterable is something that conforms to the iterator protocol -- it's not a type, or even an ABC.

So in the end, join really does only make sense as a string method. Or maybe as a built-in -- but we really don't need any more of those.

If you want to argue that str.join() should take multiple arguments, like os.path.join does, then, well, we could do that -- it currently takes one and only one argument, so it could be extended to join multiple arguments -- but I have hardly ever seen a use case for that. The mistake I'm still making after 10 years of Python
hmm -- I've seen a lot of newbies struggle with this, but haven't had an issue with it for years myself.
>>> '/'.join('some', 'path')
TypeError: join() takes exactly one argument (2 given)
pathlib aside, that really isn't the right way to join paths..... os.path.join exists for (good) reasons. One of which is this:

    In [22]: os.path.join("this/", "that")
    Out[22]: 'this/that'

-CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

I've not been following closely, so please forgive me if I'm repeating something already said in this thread.

Summary: str.join allows us to easily avoid, when assembling strings,
1. Quadratic running time.
2. Redundant trailing comma syntax error.

The inbuilt help(str.join) gives:

    S.join(iterable) -> str
    Return a string which is the concatenation of the strings in the
    iterable. The separator between elements is S.

This is different from sum in two ways. The first is the separator S. The second is performance related. Consider

    s = 0
    for i in range(100):
        s += 1

and

    s = ''
    for i in range(100):
        s += 'a'

The first has linear running time (in the parameter represented by 100). The second has quadratic running time (unless string addition is doing something clever, like being lazy in evaluation).

The separator S is important. In Python a redundant trailing comma, like so,

    val = [0, 1, 2, 3,]

is both allowed and useful. (For example, when the entries are each on a simple line, it reduces the noise that arises when an entry is added at the end, and when the entries are reordered.) For some languages, the redundant trailing comma is a syntax error. To serialise data for such languages, you can do this:

    >>> '[{}]'.format(', '.join(map(str, v)))
    '[0, 1, 2, 3]'

From here, by all means repackage for your own convenience in your own library, or use a third party library that already has what you want. (A widely used pypi package has, I think, a head start for adoption into the standard library.) By the way, a search for "python strtools" gives me

    https://pypi.org/project/extratools/
    https://www.chuancong.site/extratools/functions/strtools/
    https://pypi.org/project/str-tools/  # This seems to be an empty stub.

-- Jonathan
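To make the performance point concrete, a rough timing sketch (indicative only; as noted later in the thread, CPython can sometimes optimize the concatenation loop in place, so the gap varies):

    import timeit

    def concat(parts):
        s = ''
        for p in parts:
            s = s + p  # may copy the whole string on each step
        return s

    parts = ['a'] * 10_000
    print(timeit.timeit(lambda: ''.join(parts), number=100))  # one pass
    print(timeit.timeit(lambda: concat(parts), number=100))   # potentially quadratic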

On Tue, Jan 29, 2019 at 09:21:48PM +0000, Jonathan Fine wrote:
The lack of a syntax error for trailing commas is a language-wide feature that has nothing to do with str.join.
Three ways. sum() intentionally doesn't support strings at all:

    py> sum(['a', 'b', 'c'], '')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: sum() can't sum strings [use ''.join(seq) instead]

unless you cleverly defeat this intentional limitation. (How to do this is left as an exercise for the reader.)
In CPython, string addition does often do something clever. But not by being lazy -- it optimizes the string concatenation by appending to the strings in place if and only if it is safe to do so. -- Steve
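For the curious, one known way to do the exercise Steven leaves above: sum() only type-checks its start argument, so a non-string start that delegates addition slips through (a toy, not a recommendation):

    class EmptyStart:
        def __add__(self, other):
            # the first addition replaces this placeholder with the item itself
            return other

    assert sum(['a', 'b', 'c'], EmptyStart()) == 'abc'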

Thank you Jonathan. Performance is one of the various reasons I prefer join for assembling strings over, say, triple-quoted dedented f-strings or concatenation. It also plays well syntactically, even though there is still a little room for improvement. For example, in PHP implode('-', array(2, 'a')) returns '2-a', and now that I think of it, it's the only thing I regret from PHP's stdlib... And assembling a string like that really looks like a common problem programmers face every day of their journey... The chuancong.site design for the extratools documentation is really beautiful! I found the smartsplit function but no smartjoin. On my side I have requested comments on a PR in the boltons repo already; let's see if they find a refutation before proposing a smartjoin implementation to extratools. https://github.com/mahmoud/boltons/pull/197 Would you recommend releasing it on its own? I.e. from implode import implode? Thanks

On Tue, Jan 29, 2019 at 9:50 PM Chris Barker via Python-ideas <python-ideas@python.org> wrote:
I would think "assembling strings", though there is a lot out there already.
Which one do you prefer ?
So in the end, join really does only make sense as a string method.
What do you think of list.stringify(delim)? Thanks for your reply. I reckon using paths makes the article more confusing; it was meant as an example to illustrate the kind of problem a programmer who cares about user experience runs into. It makes the article look like the point was to build cross-platform paths, and distracts the reader from the whole purpose of assembling a string with code. Have a great day ;)

On Tue, Jan 29, 2019 at 10:51:26PM +0100, Jamesie Pic wrote:
What do you think of list.stringify(delim) ?
What's so special about lists? What do you think of:

    tuple.stringify
    deque.stringify
    iterator.stringify
    dict.keys.stringify

etc. And what's so special about strings that lists have to support a stringify method and not every other type?

    list.floatify
    list.intify
    list.tuplify
    list.setify
    list.iteratorify

Programming languages should be about composable, re-usable general-purpose components more than special cases.

-- Steve

1) I'm in favor of adding a stringify method to all collections
2) strings are special and worthy of a "special case" because strings tend to be human readable and are used in all kinds of user interfaces.

-------- Original Message -------- On Jan 29, 2019, 16:04, Steven D'Aprano wrote:

"Not every five line function needs to be in the standard library" ... even more true for every one line function. I can think of a few dozen variations of similar but not quite identical behavior to my little stringify() that "could be useful." Python gives us easy composition to create each of them. It's not PHP, after all. On Tue, Jan 29, 2019 at 8:52 PM Alex Shafer <ashafer@pm.me> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Frankly this sounds like resistance to adaptation and evolution. How long ago was that adage written? Or perhaps this is a pathological instance of the snowball fallacy? Adding one widely requested feature does not imply that all requested features will be added. -------- Original Message -------- On Jan 29, 2019, 18:57, David Mertz wrote:

Of course not! The request was for something that worked on Python *collections*. If the OP wanted something that worked on iterables in general, we'd need a different function with different behavior. Of course, it also doesn't work on dictionaries. I don't really have any ideas what the desired behavior might be for dicts. Various things are conceivable, none obvious. But it's fine on lists, sets, tuples, deques, and some other things that are roughly sequence-like. On Tue, Jan 29, 2019, 10:38 PM Robert Vanden Eynde <robertve92@gmail.com wrote:

The point really is that something called 'stringify()' could do a lot of different reasonable and useful things. None of them is universally what users would want. Unless you give the function scads of optional keyword arguments, its behavior would surprise many users and not suit their purpose. On Tue, Jan 29, 2019, 10:46 PM David Mertz <mertz@gnosis.cx wrote:

I love it when the discussion goes fast like here! :D The messages are short or long-structured-and-explaining, I love it :) -- Sorry if I may look like a troll sometimes, I truly like the conversation and I want to share the excitement :)

On Wed, Jan 30, 2019 at 2:45 AM David Mertz <mertz@gnosis.cx> wrote:
Done! Does that really need to be in the STDLIB?
Well, Robert suggested defining it in the Python startup script. The issue I'm having with that is that it will make my software harder to distribute: it will require the user to hack their startup script, or even worse: do it ourselves in setup.py! Jonathan suggested adding it to an external package like strtools, which has a smartsplit() function but no smartjoin(). So far I have a PR in boltons; I've requested their comments, so I'll let you know if they have a refutation to provide. Otherwise, I will try to submit it to the strtools package. Or else I can make a custom package for that one-liner, as is fairly common with NPM packages. Do you have any suggestions on the API? I see that the implode name is available on PyPI; do you think it would be nice to import the one-liner like this?

    from implode import implode

Thanks for your reply -- ∞

To be fair, we could add an implementation to the sequence ABC and get pretty far. Not that I'm suggesting that -- as I said earlier, Python is all about iterables, not sequences, anyway.

Also, despite some folks' insistence that this "stringify" method is something many folks want -- I'm not even sure what it is. I was thinking it was:

    def stringify(self, sep):
        return sep.join(str(i) for i in self)

Which, by the way, would work for any iterable :-)

If you want a language designed specifically for text processing, use Perl. Python is deliberately strongly typed, so that:

    2 + "2"

raises an error. Why should:

    "".join([2, "2"])

not raise an error as well? And aside from repr or ascii, when I turn numbers into text, I usually want to control the formatting anyway:

    " ".join(f"{n:.2f}" for n in seq)

So having str() called automatically by join wouldn't be that useful.

-CHB

def stringify(self, sep):
    return sep.join(str(i) for i in self)
That's just sep.join(map(str, self)). However, some folks want:

    def stringify(*args, *, sep:str=SomeDefault):
        return sep.join(map(str, args))

in order to have:
stringify(1, 2, "3", sep="-")
1-2-3
And I agree about the formatting. We know that str(x) and format(x) are synonyms, so I'd suggest:

    def stringify(*args, *, sep:str=SomeDefault, fmt=''):
        return sep.join(format(x, fmt) for x in args)

And the implicit call to str is really not surprising for a function called stringify, IMO.

If you want a language designed specifically for text processing, use Perl.
True ! However typing python -cp "1+1" is really tempting...
Python is deliberately strongly typed, so that:
I agree

def stringify(*args, *, sep:str=SomeDefault):
I meant

    def stringify(*args, sep:str=SomeDefault)

So an idea would be to use duck typing to find out whether we have one iterable or multiple things:

    def stringify(*args, sep:str=SomeDefault, fmt=''):
        it = args[0] if len(args) == 1 and hasattr(args[0], '__iter__') else args
        return sep.join(format(x, fmt) for x in it)

But 🦆 duck typing is nasty... I don't want that in the stdlib (but in a pip package, sure!)

On Wed, Jan 30, 2019 at 7:14 AM Robert Vanden Eynde <robertve92@gmail.com> wrote:
But 🦆 duck typing is nasty... I don't want that in the stdlib (but in a pip package, sure!)
Not only do I actually like your implementation, but I also love duck typing. For me duck typing means freedom, not a barrier. -- ∞

On Wed, Jan 30, 2019 at 7:03 AM Robert Vanden Eynde <robertve92@gmail.com> wrote:
What do you think could be the developer intent when they write ",".join([2, "2']) ? If the intent is clearly to assemble a string, as it appears to be, then I see no disadvantage in automating this task for them. -- ∞

On 1/30/2019 5:07 AM, Jamesie Pic wrote:
Your examples show literals, but I literally (heh) never use str.join this way. I always pass it some variable. And 100% of the time, if that variable (say it's a list) contains something that's not a string, I want it to raise an exception. I do not want this to succeed:

    lst = ['hello', None]
    ', '.join(lst)

lst is usually computed a long way from where the join happens. So, I do not want this task automated for me.

Eric

On Wed, Jan 30, 2019 at 11:24 AM Eric V. Smith <eric@trueblade.com> wrote:
That's a really good point! So, maybe we could have a parameter for that:

    from implode import implode
    assert implode('-', [3, None, 2], none_str='') == '3-2'

Even that still seems pretty fuzzy to me. Please, can you share an idea for improvement? -- ∞

On Wed, Jan 30, 2019 at 11:07:52AM +0100, Jamesie Pic wrote:
What do you think could be the developer intent when they do ",".join([2, "2']) ?
I don't know what your intent was, although I can guess, but I do know that I sure as hell don't want a dumb piece of software like the interpreter running code that I didn't write because it tried to guess what I possibly may have meant. http://www.catb.org/jargon/html/D/DWIM.html

And from the Zen:

    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.

Don't think about toy examples where you put literals in the code. Sure, we want a string, but that's trivial. What sort of string, and what should it look like? Think about non-trivial code like this:

    header = generate_header()
    body = template.format(','.join(strings))
    document = make(header, body)

and imagine that somehow a non-string slips into something which is supposed to be a string. Now what do you think my intent is? It isn't enough to just say "I want a string dammit, and I don't care what's in it!". If a non-string slips in there, I sure as hell want to know how and why, because that's a bug, not a minor inconvenience. The most junior developer in the team could easily paper over the bug by adding in a call to map(str, strings), but that doesn't fix the bug, it just hides it and all but guarantees the document generated is corrupt, or worse, wrong.

"I find it amusing when novice programmers believe their main job is preventing programs from crashing. ... More experienced programmers realize that correct code is great, code that crashes could use improvement, but incorrect code that doesn't crash is a horrible nightmare." -- Chris Smith

If we look at where the strings come from:

    strings = [format_record(obj) for obj in datasource if condition(obj)]

we're now two steps away from the simplistic "we want a string" guess of your example. When we look at format_record and find this:

    def format_record(record):
        if record.count < 2:
            ...
        elif record.type in ('spam', 'eggs'):
            ...
        elif record.next() is None:
            ...
        # and so on for half a page

we're just getting further and further away from the trivial cases of "just give me a string dammit!".

Going back to your example (correcting the syntax error):

    ",".join([2, "2"])

To save you about a quarter of a second by avoiding having to type quote marks around the first item, you would cost me potentially hours or days of hair-tearing debugging trying to work out why the document I'm generating is occasionally invalid or corrupt in hard-to-find ways. That's not a trade-off I have any interest in making.

-- Steve

Wow, thanks for your great reply Steven! It really helps me get a better understanding of what I'm trying to do and move forward in my research! Some values are not going to be nice as strings, so I think I'm instead going to try to make a convenience shortcut for str-map-join, for when I want to generate a human-readable string. I.e.: mapjoin(*args, sep='\n', key=str). Then I could replace:

    readable = '\n'.join(map(str, [
        'hello',
        f'__{name}__',
        # etc...
    ]))

OR:

    def foo():
        readable = textwrap.dedent(f'''
            hello
            __{name}__
        ''').strip()

with:

    readable = mapjoin(
        'hello',
        f'__{name}__'
        sep='\n',
        # map=format_record could be used
    )

That removes the "fuzzy" feeling I get from my previous proposals. So, after a while, if people are using that mapjoin that we could have on PyPI, we could perhaps propose it as an improvement to str.join. Or do you think adding such features to str.join is still discussable?

Oops, fixing my last example:

    readable = mapjoin(
        'hello',
        f'__{name}__',
        sep='\n',
        # key=format_record, could be used here
    )

Signature would be like (illustrating defaults):

    mapjoin(*args, sep='\n', key=str)
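A minimal sketch of that signature, under the semantics described above:

    def mapjoin(*args, sep='\n', key=str):
        # join the arguments, passing each one through key() first
        return sep.join(key(arg) for arg in args)

    assert mapjoin('hello', 42, sep='-') == 'hello-42'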

The intent is not clear. How is the 2 to be formatted? I fixed a nasty bug recently where a join of a list of strings contained a non-string in some cases. If the str(bad_value) had been the default I would not have been able to track this down from the traceback in a few minutes. I'm -1 on this idea as it would hide bugs in my experience. Barry

Thanks for your email Barry. This is indeed a good point and the proposal has changed a bit since then. It's more "add a key kwarg to str.join where you can set key=str yourself if you want".

Let's see if this gets any downloads at all: https://pypi.org/project/mapjoin/ Sorry for this obscenity xD Thank you all for your replies! Have a great day. Best regards

On Wed, Jan 30, 2019 at 12:09:55AM +0000, Alex Shafer wrote:
2) strings are special and worthy of a "special case" because strings tend to be human readable and are used in all kinds of user interface.
So are ints, floats, bools, lists, tuples, sets, dicts, etc. We already have a "stringify" function that applies to one object at a time. It's spelled str(), or if you prefer a slightly different format, repr(). To apply the stringify function of your choice to more than one object, you can use a for-loop, or a list comprehension, or a set comprehension, or map(). This is called composition of re-usable components, and it is a Good Thing. If you don't like the two built-in stringify functions, you can write your own, and they still work with for-loops, comprehensions and map(). Best of all, we're not even limited to strings. Change your mind and want floats instead of strings? Because these are re-usable, composable components, you don't have to wait for Python 4.3 to get a list floatify() method, you can just unplug the str() component and replace it with the float() component. -- Steve

On Wed, Jan 30, 2019 at 9:21 AM Steven D'Aprano <steve@pearwood.info> wrote:
If you don't like the two built-in stringify functions, you can write your own, and they still work with for-loops, comprehensions and map().
I don't disagree; after all, there are many NPM packages that contain really short functions, so we could package the function on its own. I see that the "implode" namespace is not taken on PyPI, so what do you suggest it should look like? from implode import implode? Or can you suggest better names?
Best of all, we're not even limited to strings. Change your mind and want floats instead of strings?
To be user friendly, software needs to build proper text output, and most of the time joining a sequence is the best way to go. But I often make mistakes because I switch between os.path.join and str.join. -- ∞

On Wed, Jan 30, 2019 at 11:17 AM Jamesie Pic <jpic@yourlabs.org> wrote:
I often make mistakes because I switch between os.path.join and str.join.
I didn't mean "replacing an os.path.join call with a str.join call"; I mean that I'm calling str.join 2 seconds after os.path.join and forget about the inconsistency between the two. Does this make any sense? Thanks for your great replies -- ∞

Thanks Steven for your reply. For me, assembling a string from various variables is a much more common programming task, because that's how users expect software to communicate with them, be it on a CLI, a GUI, or the Web. It doesn't matter if your software works when the user doesn't understand it. It doesn't matter if your software doesn't work, as long as the user understands it. I wonder what makes my use case so special; perhaps that when I make software it's always for the purpose of serving an actual human being's need?

On Wed, Jan 30, 2019 at 8:50 PM Jamesie Pic <jpic@yourlabs.org> wrote:
Most places where you need to talk to humans, you'll end up either interpolating the values into a template of some sort (see: percent formatting, the format method, and f-strings), or plug individual values straight into method calls (eg when building a GUI). I'm not sure why or how your use-case is somehow different here. It's generally best to provide simple low-level functionality, and then let people build it into whatever they like. For example, VLC Media Player and Counter-Strike: Global Offensive don't have any means of interacting, but with some simple Python programming in between, it's possible to arrange it so that the music automatically pauses while you're in a match. But there does NOT need to be a game feature "automatically pause VLC while in a match". Joining a collection of strings is possible. Stringifying a collection of arbitrary objects is possible. There doesn't need to be a single feature that does both at once. ChrisA

On Wed, Jan 30, 2019 at 11:06 AM Chris Angelico <rosuav@gmail.com> wrote:
[snip -- the "mistake I'm still making after 10 years of Python" article quoted in full]
Even for a program without a user interface: you still want proper logs in case your software crashes, for example. So even if you're not building a user interface, you still want to assemble human-readable strings. If it's such a common task, why not automate what's obvious to automate? -- ∞

On Wed, Jan 30, 2019 at 11:06 AM Chris Angelico <rosuav@gmail.com> wrote:
Actually we're moving away from templates, in favor of a functional, decorating, component-based pattern pretty much like React, in an R&D open source project. Not only do we get much better performance than with a template rendering engine, but we also get all the power of a good programming language: Python :) -- ∞

On Wed, Jan 30, 2019 at 10:33 PM Jamesie Pic <jpic@yourlabs.org> wrote:
Well, I've no idea how your component-based system works, but in React itself, under the covers, the values end up going straight into function calls, which was the other common suggestion I gave :) There's a reason that those two styles, rather than join() itself, will tend to handle most situations. ChrisA

On 2019-01-29 16:14, MRAB wrote:
Then you can still convert them yourself beforehand, and any stringifying that .join did would be a no-op. If you want to call repr on all your stuff beforehand, great, then you'll get strings and you can join them just like anything else. But you'll ADDITIONALLY be able to not pre-stringify them in a custom way, in which case they'll be stringified in the default way. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On 2019-01-29 15:38, Greg Ewing wrote:
Oh please. Because it also RETURNS a string. Of course count won't return a string, it returns a count. But str.join is for "I want to join these items into a single string separated by this delimiter". If the output is to a be a string obtained by combining other items, there is nothing lost by converting them to strings. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

So you'd propose to add some kind of

    def Join(sep, *args):
        return sep.join(map(str, args))

to the standard lib? Or to add another method to the str class that does that?

    class str:
        ...
        def Join(self, *args):
            return self.join(map(str, args))

I agree such a function is super convenient, but does it need to be added to the standard lib? I have it in my custom utils.py and my PYTHONSTARTUP file so that I can use it everywhere. Call it Join, superjoin, joinargs... On Tue, 29 Jan 2019, 02:43 Jamesie Pic <jpic@yourlabs.org wrote:

Oh, and if you want to write

    ['a', 'b', 'c'].join('.')

check out pip install funcoperators, and you can write:

    ['a', 'b', 'c'] |join('.')

given you defined the function below:

    from funcoperators import postfix

    def join(sep):
        return postfix(lambda it: sep.join(map(str, it)))

You can even choose the operator:

    ['a', 'b', 'c'] -join('.')
    ['a', 'b', 'c'] /join('.')
    ['a', 'b', 'c'] @join('.')
    ...

Disclaimer: I'm the creator of funcoperators. On Tue, 29 Jan 2019, 02:43 Jamesie Pic <jpic@yourlabs.org wrote:

funcoperators is pretty neat! But at this stage of the discussion I would also try to get automatic string casting, since the purpose is to assemble a string. It would be great in the stdlib because switching between os.path.join and str.join is so error-prone, and assembling strings seems like a pretty common task. It's not uncommon to find str.join in arguments against Python. Monkey patching str in PYTHONSTARTUP would work, but then that would require users pulling my package to also hack their startup script. Or even worse: we could patch the startup script upon package installation. It seems like it would make redistribution a lot harder than it should be. Another approach would be to add a stringify(delim='\n') method to iterables: it would accept a delimiter argument and return a string of all items cast to string and separated by the delimiter. That would certainly be more backward-compatible than supporting an alternate str.join(1, 'b') call. Meanwhile, I've opened a PR on boltons, but, well, it looks a lot like php.net/implode, and I'm not really sure we want that :D https://github.com/mahmoud/boltons/pull/197/commits/2b4059855ab4ceae54032bff... -- ∞

On 29/01/2019 01:40, Jamesie Pic wrote:
It seems fairly consistent to make:

    os.path.join('a', 'b', 'c')

short for:

    os.path.sep.join(['a', 'b', 'c'])
Please, no. This would be un-Pythonic in my view. It makes so much more sense that str should have a method that takes an iterable, returning str, than that every iterable should have a join(str) returning str. Consider, you get this kind of thing for free:

    "-".join(str(i) for i in range(10))

I learned enough Groovy last year to use Gradle, and was so disappointed to find myself having to write:

    excludes: exclusions.join(',')  // Yes, it's that way round :o

Even Java agrees (since 1.8) with Python.

Jeff Allen

I'm not disagreeing by any means. I'm just saying assembling strings is a common programming task, that we have two different functions with the same name and inconsistent signatures, and that it's error-prone. I'm most certainly *not* advocating for breaking compatibility or whatnot.

Hi, in the end this long thread exists because two functions doing quite the same thing have the same name but not the same signature, and it's confusing for some people (I'm one of them):

    str.join(iterable)
    os.path.join(path, *paths)

There are strong arguments about why it's implemented like that and why it's very difficult to change. Maybe some change could be giving str.join one iterable or many args. About str.join:

a - if 0 args: error
b - if 1 arg: process, or error if not iterable
c - if > 1 arg: do (b) using all the args as one iterable

(a sketch follows this message) Maybe some performance issues could go against it. I agree with the fact that this is a minor need and it should not allow a major change. Le 30/01/2019 à 11:01, Jamesie Pic a écrit :
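Here is a sketch of cases (a)-(c) as a plain function (hypothetical, not a change to str itself):

    def join(sep, *args):
        if not args:                   # case a: no argument is an error
            raise TypeError('join() takes at least one argument')
        if len(args) == 1:             # case b: one iterable of strings
            return sep.join(args[0])   # (sep.join raises if not iterable)
        return sep.join(args)          # case c: the args form the iterable

    assert join('-', ['a', 'b']) == 'a-b'
    assert join('-', 'a', 'b') == 'a-b'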

Thanks for your reply Jimmy! As suggested by Chris and Steven, we might also want to throw in a "key" kwarg, which could be None by default to keep backwards compatibility, but would also allow typecasting:

    ' '.join('a', 2, key=str)

-- ∞

On Wed, Jan 30, 2019 at 10:14 PM Chris Angelico <rosuav@gmail.com> wrote:
I didn't, but I don't know if Chris Barker did.
nope -- not me either :-)
(Can't swing a cat without hitting someone named Steve or Chris, in some spelling or another!)
good thing there aren't a lot of cats being swung around, then.

One more note about this whole thread: I do a lot of numerical programming, and used to use MATLAB and now use numpy a lot. So I am very used to "vectorization" -- i.e. having operations that work on a whole collection of items at once. Example:

    a_numpy_array * 5

multiplies every item in the array by 5. In pure Python, you would do something like:

    [i * 5 for i in a_regular_list]

You can imagine that for more complex expressions the "vectorized" approach can make for much clearer and easier-to-parse code. Also much faster, which is what is usually talked about, but I think the readability is the bigger deal.

So what does this have to do with the topic at hand? I know that when I'm used to working with numpy and then need to do some string processing or some such, I find myself missing this "vectorization" -- if I want to do the same operation on a whole bunch of strings, why do I need to write a loop or comprehension or map? That is:

    [s.lower() for s in a_list_of_strings]

rather than:

    a_list_of_strings.lower()

(NOTE: I prefer comprehension syntax to map, but map would work fine here, too.)

It strikes me that that is the direction some folks want to go. If so, then I think the way to do it is not to add a bunch of stuff to Python's str or sequence types, but rather to make a new library that provides quick and easy manipulation of sequences of strings -- kind of a stringpy -- analogous to numpy. At the core of numpy is the ndarray: a "multidimensional, homogeneous array of fixed-size items". A strarray could be simpler -- I don't see any reason for more than 1-D, nor more than one datatype. But it could be a "vector" of strings that was guaranteed to be all strings, and provide operations that acted on the entire collection in one fell swoop. If it turned out to be useful, you could even make a version in C or Cython that might give significant performance benefits. I don't have a use case for this -- but if someone does, it's an idea.

-CHB

Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
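For what it's worth, a tiny pure-Python sketch of such a string vector (hypothetical names, no performance claims):

    class StrArray(list):
        def __getattr__(self, name):
            meth = getattr(str, name)  # AttributeError if str has no such method
            def vectorized(*args, **kwargs):
                # apply the str method to every element, keeping the type
                return StrArray(meth(s, *args, **kwargs) for s in self)
            return vectorized

    a = StrArray(['  Some ', ' STRINGS'])
    assert a.strip().lower() == ['some', 'strings']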

On Thu, Jan 31, 2019 at 12:52 PM Chris Barker via Python-ideas < python-ideas@python.org> wrote:
Isn't what you want called "Pandas"? E.g.:
>>> type(strs)
<class 'pandas.core.series.Series'>
>>> strs.str.lower()  # vectorized string methods via the .str accessor
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Fri, Feb 1, 2019 at 4:51 AM Chris Barker <chris.barker@noaa.gov> wrote:
Here's a simpler and more general approach: a "vector" type. Any time you attempt to look up any attribute, it returns a vector of that attribute for each of its elements. When you call a vector, it calls each element (with the same args) and returns a vector of the results. So the vector would, in effect, have a .lower() method that returns .lower() of all its elements. (David, your mail came in as I was typing mine, so it looks fairly similar, except that this proposed vector type wouldn't require you to put ".str" in the middle of it, so it would work with any type.) ChrisA

On Thu, Jan 31, 2019 at 09:51:20AM -0800, Chris Barker via Python-ideas wrote:
Julia has a special "dot" vectorization operator that looks like this:

    L .+ 1    # adds 1 to each item in L
    func.(L)  # calls func on each item in L

https://julialang.org/blog/2017/01/moredots The beauty of this is that you can apply it to any function or operator, and the compiler will automatically vectorize it. The function doesn't have to be written to specifically support vectorization.
Using Julia syntax, that might become a_list_of_strings..lower(). If you don't like the double dot, perhaps str.lower.(a_list_of_strings) would be less ugly. -- Steven

I accidentally replied only to Steven - sorry! - this is what I said, with a typo corrected:
a_list_of_strings..lower()
str.lower.(a_list_of_strings)
I much prefer this solution to any of the other things discussed so far. I wonder, though, would it be general enough to simply have this new '.' operator interact with __iter__, or would there have to be new magic methods like __veccall__, __vecgetattr__, etc.? Would a single __vectorize__ magic method be enough? For example, I would expect (1, 2, 3) .** 2 to evaluate as a tuple, [1, 2, 3] .** 2 to evaluate as a list, and some_generator() .** 2 to still be a generator. If there were a __vectorize__(self, func) which returned the iterable result of applying func on each element of self:

    class list:
        def __vectorize__(self, func):
            return [func(e) for e in self]

Then:

    some_list .* other      becomes  some_list.__vectorize__(lambda e: e * other)
    some_string..lower()    becomes  some_string.__vectorize__(str.lower)
    some_list..attr         becomes  some_list.__vectorize__(operator.attrgetter('attr'))

Perhaps there would be a better name for such a magic method, but I believe it would allow existing sequences to behave as one might expect, while not requiring each operator to have its own definition. I might also be over-complicating this, but I'm not sure how else to allow different sequences to give results of their own type. On Thu, Jan 31, 2019 at 6:24 PM Steven D'Aprano <steve@pearwood.info> wrote:

I love moredots ❤️ With pip install funcoperators, one can implement *dotmul*, iff dotmul can be implemented as a function:

    L *dotmul* 1

would work. Or even a simple tweak to the library would allow L *dot* s to be [x*s for x in L] and L /dot/ s to be [x/s for x in L]. I'd implement something like "if left is iterable and right is not, apply [x*y for x in left]; else if both are iterable, apply [x*y for x, y in zip(left, right)]; etc." Disclaimer: I'm the creator of funcoperators. On Fri, 1 Feb 2019, 00:23 Steven D'Aprano <steve@pearwood.info wrote:

пт, 1 февр. 2019 г. в 02:24, Steven D'Aprano <steve@pearwood.info>:
IMO, the beauty of a vector type is that it contains homogeneous data. That allows you to ensure that the method is present for each element in the vector. The first given example is what numpy is all about, and without some guarantee that L consists of homogeneous data it hardly makes sense. The second one is just `map`. So I can't catch what you are proposing:

1. To make an operator form of `map`.
2. To pull numpy into the stdlib.
3. Or something else, which is not obvious to me from the examples given.

With kind regards, -gdg

I think the actual proposal is having a new type of list (i.e. vectors) that works like numpy but for any data. Instead of a list where the user has to be sure all the data is the same type, a vector ensures it's full of the same kind of data that can be processed using a particular function (as one would do with map). I think the syntax proposed is not cool; it's kind of unique in Python and doesn't feel Pythonic to me. A thing I thought about, but am not satisfied with, is using the new matrix-multiplication operator:

    my_string_vector @ str.lower

    def compute_grad(a_student):
        return "you bad"

    my_student_vector @ compute_grad

But it's a bit confusing to me. Le ven. 1 févr. 2019 à 17:04, Kirill Balunov <kirillbalunov@gmail.com> a écrit :
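That spelling is easy to prototype today with __matmul__ (a sketch, not an endorsement of the operator choice):

    class Vec(list):
        def __matmul__(self, func):
            # v @ f maps f over the elements, keeping the Vec type
            return Vec(map(func, self))

    assert Vec(['A', 'B']) @ str.lower == ['a', 'b']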

On Fri, Feb 1, 2019, 6:16 PM Adrien Ricocotam <ricocotam@gmail.com wrote:
This is certainly doable. But why would it be better than:

    map(str.lower, my_string_vector)
    map(compute_grad, my_student_vector)

These latter seem obvious, clear, and familiar.

On Fri, Feb 1, 2019 at 5:00 PM David Mertz <mertz@gnosis.cx> wrote:
Or:

    [s.lower() for s in my_string_vector]

Side note: It's really interesting to me that Python introduced comprehension syntax some years ago, and even "hid" reduce(), and now there seems to be a big interest in / revival of "map". Even numpy supports inhomogeneous data:
well, no -- it doesn't -- look carefully: that is an array of type '|S4' -- i.e. a 4-character string -- every element in that array is that same type. Also note that numpy's support for strings is not very complete. numpy does support an "object" type, which can be inhomogeneous -- it's still a single type, but that type is a Python object (under the hood it's an array of pointers to pyobjects):

    In [3]: a = np.array([1, 'spam'], dtype=np.object)

    In [4]: a
    Out[4]: array([1, 'spam'], dtype=object)

And it does support vectorization to some extent:

    In [5]: a * 5
    Out[5]: array([5, 'spamspamspamspamspam'], dtype=object)

But not with any performance benefits. I think there are good reasons to have a "string_vector" that is known to be homogeneous:

Performance -- it could be significantly optimized (are there many use cases for that? I don't know).

Clear API: a string_vector would have all the relevant string methods. You could easily write a list subclass that passed on method calls to the enclosed objects, but then you'd have a fair bit of confusion as to what might be a vector method vs a method on the objects. Which I suppose leaves us with something like:

    list.elements.upper()
    list.elements * 5

hmm -- not sure how much I like this, but it's pretty doable.

"I still haven't seen any examples that aren't already spelled 'map(fun, it)'" -- and I don't think you will. I *think* I get credit for starting this part of the thread, and I started by saying I have often longed for essentially a more concise way to spell map() or comprehensions. Performance aside, I use numpy because:

    c = np.sqrt(a**2 + b**2)

is a heck of a lot easier to read, write, and get correct than:

    c = list(map(math.sqrt,
                 map(lambda x, y: x + y,
                     map(lambda x: x**2, a),
                     map(lambda x: x**2, b))))

or:

    [math.sqrt(x) for x in (a + b for a, b in zip((x**2 for x in a),
                                                  (x**2 for x in b)))]

Note: it took me quite a while to get those right! (And I know I could have used the operator module to make the map version maybe a bit cleaner, but the point stands.)

Does this apply to string processing? I'm not sure, though I do a fair bit of chaining of string operations:

    my_string.strip().lower().title()

If you wanted to do that to a list of strings:

    a_list_of_strings.strip().lower().title()

is a lot nicer than:

    [s.title() for s in (s.lower() for s in [s.strip() for s in a_list_of_strings])]

or:

    list(map(str.title, map(str.lower, map(str.strip, a_list_of_strings))))  # untested

How common is that use case? Not common enough for me to go any further with this.

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
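The list.elements spelling is indeed doable with a small proxy; a sketch with hypothetical names:

    class _Elements:
        def __init__(self, seq):
            self._seq = seq

        def __getattr__(self, name):
            def forward(*args, **kwargs):
                # forward the method call to every element
                return [getattr(el, name)(*args, **kwargs) for el in self._seq]
            return forward

        def __mul__(self, other):
            return [el * other for el in self._seq]

    class EList(list):
        @property
        def elements(self):
            return _Elements(self)

    assert EList(['a', 'b']).elements.upper() == ['A', 'B']
    assert EList([1, 2]).elements * 5 == [5, 10]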

On Sat, Feb 2, 2019 at 3:23 PM Christopher Barker <pythonchb@gmail.com> wrote:
You can also write:

    c = [math.sqrt(x**2 + y**2) for x, y in zip(a, b)]

or:

    c = list(map(lambda x, y: math.sqrt(x**2 + y**2), a, b))

or, since math.hypot exists:

    c = list(map(math.hypot, a, b))

In recent Python versions you can write [*map(...)] instead of list(map(...)), which I find more readable.

a_list_of_strings.strip().lower().title()
In this case you can write:

    [s.strip().lower().title() for s in a_list_of_strings]

-- Ben

On Sun, Feb 3, 2019 at 10:36 AM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:
What if it's a more complicated example?

    len(sorted(a_list_of_strings.casefold())[:100])

where the len() is supposed to give back a list of the lengths of the first hundred strings, sorted case-insensitively? (Okay, it's a horribly contrived example. Bear with me.) With current syntax, this would need multiple map calls or comprehensions:

    [len(s) for s in sorted(s.casefold() for s in a_list_of_strings)[:100]]

(Better examples welcomed.)

ChrisA

Here is a very toy proof-of-concept:
My few lines are at https://github.com/DavidMertz/stringpy One thing I think I'd like to be different is to have some way of accessing EITHER the collection being held OR each element. So now I just get:
>>> v.__len__()
<Vector of [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]>
Yes, that's an ugly spelling of `len(v)`, but let's bracket that for the moment. It would be nice also to be able to ask "what's the length of the vector, in a non-vectorized way" (i.e. 12 in this case). Maybe some naming convention like:
>>> v.collection__len__()
12
This last is just a possible behavior, not in the code I just uploaded. On Sat, Feb 2, 2019 at 6:47 PM Chris Angelico <rosuav@gmail.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Slightly more on my initial behavior:
Vector(37)
TypeError: Vector can only be initialized with an iterable
Vector("hello") <Vector of 'hello'>
I'm wondering if maybe making a vector out of a scalar should simply be a length-one vector. What do you think? Also, should a single string be treated like a vector of characters or like a scalar? It feels kinda pointless to make a vector of characters since I cannot think of anything it would do better than a plain string already does (largely just the same thing slower). On Sat, Feb 2, 2019 at 8:54 PM David Mertz <mertz@gnosis.cx> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
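[One possible answer to both questions, sketched with assumed semantics rather than what the toy module currently does: treat a bare string, and any other non-iterable scalar, as a length-one vector.]

class Vector:
    def __init__(self, items):
        if isinstance(items, str):
            items = [items]        # a string counts as one scalar, not characters
        try:
            self._it = list(items)
        except TypeError:
            self._it = [items]     # any other scalar becomes a length-one vector

Vector(37)._it        # [37]
Vector("hello")._it   # ['hello']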

Trying to make iterators behave in a semi-nice way also. I kinda like this (example remains silly, but it shows idea).
On Sat, Feb 2, 2019 at 9:03 PM David Mertz <mertz@gnosis.cx> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On 2019-02-03 02:03, David Mertz wrote:
[snip] I think it should follow the pre-existing behaviour of list, set, tuple, etc.
Vector("hello") <Vector of ['h', 'e', 'l', 'l', 'o']>
Why is it pointless for a vector, but not for a list?

I try to keep the underlying datatype of the wrapped collection as much as possible. Casting a string to a list changes that.
Strings are already a Collection; there is no firm need to cast them to a list to live inside a Vector. I like the idea of maintaining the original type if someone wants it back later (possibly after transformations of the values). Why is it pointless for a vector, but not for a list?
I guess it really isn't. I was thinking of just .upper() and .lower() where upper/lower-casing each individual letter is the same as doing so to the whole string. But for .replace() or .count() or .title() or .swapcase() the meaning is very different if it is letter-at-a-time. I guess a string gets unstringified pretty quickly no matter what though. E.g. this seems like right behavior once we transform something:
I dunno... I suppose I *could* do `self._it = "".join(self._it)` whenever I do a transform on a string to keep the underlying iterable as a string. But the point of a Vector really is sequences of strings not sequences of characters. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Nice that you implemented it! I think all the issues you have right now would go away using another operator. I proposed the @ notation, which is clear and different from everything else, plus the operator is called "matmul" so it completely makes sense. The examples would be:
We still have some issues: how do we treat operators like v[1:]? I suggest using the same syntax: if we don't use @, the operation is done on the vector and not on its elements. Therefore, v[1:] will remove "Jan" from the vector, whereas v @ operator.getitem(slice
That little example shows the need to configure functions so they only accept one argument. It's actually not a new problem, since map has the same "issue". A vector of one element should still be a vector, as a list/tuple/dict of one element is still a list/tuple/dict, imo. I suggested Vector objects inherit from lists, and therefore be iterable. It would be handy to iterate over their elements, and simple loops, maps, etc. should still be available to them. It might be clearer to use "old" notations for some operations. About `Vector("A Super String")`: if we want it to be a vector of one element, we should use `Vector(["A Super String"])`, as we would do in any other function taking an iterable as input. Side note: honestly, I don't think this is the right thread to debate whether we should use ["in", "un", "an", "non"]-homogeneous or heterogeneous. As long as it's clear, does it matter? On Sun, 3 Feb 2019 at 04:19, David Mertz <mertz@gnosis.cx> wrote:

On Sun, Feb 3, 2019 at 3:54 AM Adrien Ricocotam <ricocotam@gmail.com> wrote:
plus the operator is called "matmul" so it completely makes sense. The
examples would be:
I cannot really see how using the @ operator helps anything here. If this were a language that isn't Python (or conceivably some future version of Python, but that doesn't feel likely or desirable to me), I could imagine @ as an operator to vectorize any arbitrary sequence (or iterator). But given that we've already made the sequence into a Vector, there's no need for extra syntax to say it should act in a vectorized way. Moreover, your syntax is awkward for methods with arguments. How would I spell:

v.replace('foo', 'bar')

in the @ syntax? I actually made an error on my first pass where simply naming a method was calling it. I thought about keeping it for a moment, but that really only allows zero-argument calls. I think the principled thing to do here is add the minimal number of methods to Vector itself, and have everything else pass through as vectorized calls. Most of that minimal number are "magic methods": __len__(), __contains__(), __str__(), __repr__(), __iter__(), __reversed__(). I might have forgotten a couple. All of those should not be called directly, normally, but act as magic for operators or built-in functions. I think I should then create regular methods of the same name that perform the vectorized version. So we would have:

len(v)    # -> 12
v.len()   # -> <Vector of [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]>
list(v)   # -> ["Jan", "Feb", "Mar", "Apr", "May", "Jul" ...]
v.list()  # -> <Vector of [["J", "a", "n"], ["F", "e", "b"] ... >

I can't implement every single constructor that users might conceivably want, of course, but I can do it for the basic types in builtins and the common standard library. E.g. I might do:

v.deque()  # -> <Vector of [deque(["J", "a", "n"]), deque(["F", "e", "b"]) ... >

But I certainly won't manually add:

v.custom_linked_list()  # From my_inhouse_module.py

Hmm... maybe I could even look at names of maybe-constructors in the current namespace and try them. That starts to feel too magic. Falling back to this feels better:

map(custom_linked_list, v)  # From my_inhouse_module.py

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
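[A minimal sketch of that dual spelling -- the class body is mine and assumed, not the actual stringpy code:]

class Vector:
    def __init__(self, it):
        self._it = list(it)
    def __repr__(self):
        return f'<Vector of {self._it!r}>'
    def __len__(self):              # len(v): length of the vector itself
        return len(self._it)
    def len(self):                  # v.len(): vectorized len of each item
        return Vector(len(x) for x in self._it)
    def __iter__(self):             # list(v): the items themselves
        return iter(self._it)
    def list(self):                 # v.list(): cast each item to a list
        return Vector(list(x) for x in self._it)

v = Vector(["Jan", "Feb", "Mar"])
len(v)     # 3
v.len()    # <Vector of [3, 3, 3]>
v.list()   # <Vector of [['J', 'a', 'n'], ['F', 'e', 'b'], ['M', 'a', 'r']]>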

I honestly don’t understand what you don’t like about the @ syntax. My idea is using functions that take one argument: an object of the type the vector holds. That’s actually how map works. What I understood from your previous message is that there’s ambiguity when using magic functions on whether it’s applied to each element of the vector or the vector itself. That was the first thing I saw. While reading your examples, I noticed that you were using « my_vec.function() ». You just said that we will not code the « .function » for every function. That’s the other problem I wanted to address with the @ notation. Functions that could be used are then the same we can use in map. But I do agree it’s not easy to have functions with parameters. That’s why I used functools.partial. On Sun 3 Feb 2019 at 19:23, David Mertz <mertz@gnosis.cx> wrote:

On Sun, Feb 3, 2019 at 1:38 PM Adrien Ricocotam <ricocotam@gmail.com> wrote:
I honestly don’t understand what you don’t like about the @ syntax.
Can you show any single example that would work with the @ syntax that would not work in almost exactly the same way without it? I have not seen any yet, and none seem obvious. Adding new syntax for its own sake is definitely to be avoided when possible (even though technically the operator exists, so it wouldn't be actual new syntax).
My idea is using functions that take one argument: an object of the type the vector holds. That’s actually how map works.
I do not understand this. Spell my simple example using @ notation. I.e. my_vec @ replace {something? here for 'foo' with 'bar'}
I decided there really isn't. I think that any function applied to the vector should operate on the sequence as a whole. E.g. what length does it have? Cast it to a different kind of sequence. Print it out. Serialize it. Etc. The things that are vectorized should always be methods of the vector instead. And ALMOST every method should in fact be a vectorized operation. In most cases, those will be a "pass through" to the methods of the items inside of the vector. We won't write every possible method in the Vector class. My toy so far only works with methods that the items actually have. In the examples, string methods. But actually, I should add one method like this:

my_vec.apply(lambda x: x*2)

That is, we might want to vectorize custom functions also. Maybe in that example we should name the function 'double' for clarity: `my_vec.apply(double)`. I do think that just a few methods need to be custom programmed because they correspond to magic methods of the items rather than regular names (or not even directly to magic methods, but more machinery). So:

my_vec.list()   # -> cast each item to a list
my_vec.tuple()  # -> cast each item to a tuple
my_vec.set()    # -> cast each item to a set

Maybe that's doing too much though. We could always do that with map() or comprehensions; it's not clear it's a common enough use case.

Functions that could be used are then the same we can use in map. But I do
agree it’s not easy to have functions with parameters. That’s why I used functools.partial
I really did not understand how that was meant to work. But it was a whole lot of lines to accomplish something very small either way.
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
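[A sketch of how `.apply()` plus duck-typed method pass-through could coexist -- names are assumed and the real stringpy code may differ:]

class Vector:
    def __init__(self, it):
        self._it = list(it)
    def __repr__(self):
        return f'<Vector of {self._it!r}>'
    def apply(self, fn):
        # Vectorize an arbitrary user-provided function.
        return Vector(fn(x) for x in self._it)
    def __getattr__(self, name):
        # Any unknown attribute becomes a vectorized method call on the items.
        def method(*args, **kwargs):
            return Vector(getattr(x, name)(*args, **kwargs) for x in self._it)
        return method

v = Vector(['foam', 'bar'])
v.apply(lambda s: s * 2).replace('a', 'b').upper()
# <Vector of ['FOBMFOBM', 'BBRBBR']>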

Adrien Ricocotam wrote:
I honestly don’t understand what you don’t like about the @ syntax.
Another problem with @ is that it already has an intended meaning, i.e. matrix multiplication. What if you have two vectors of matrices and you want to multiply corresponding ones? -- Greg

On Sun, 3 Feb 2019 at 21:23, David Mertz <mertz@gnosis.cx> wrote:
Hi David! Thank you for taking the time to implement this idea. Sorry, I'm on a trip now and can't try it. From what I've read in this thread, I think I mostly agree with your perception of how the vector should work: that `len(v) # -> 12` and that a `.some_method()` call must apply to the elements (although pedants may argue that in this case there is not much difference). The only thing that I don't like is `v.len(), v.list() and ...`, for the same reasons - in general this will not work. I also don't like the option with `.apply` - what if an `.apply` method is already defined for the elements in a vector?
Actually, my thoughts on this. At first I thought that for these purposes it is possible to use __call__:

len(v)    # -> 12
v(len)    # -> <Vector of [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]>

But somehow this idea did not fit in my head. Then I found another way, and I think I even like it - to reuse `__getitem__`: when its argument is a function, it means that you apply this function to every element in the vector.

len(v)    # -> 12
v[len]    # -> <Vector of [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]>

In this case you can apply any function, even custom_linked_list from my_inhouse_module.py. From this thread I did not understand what the desired behavior is for operations like `vector + 1` and the others. Also, what is the desired behaviour for `vector[1:5]`? Considering the above, I would treat this operation the opposite way:
With kind regards, -gdg
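[A rough sketch of that `__getitem__` dispatch, with assumed semantics: callables vectorize, everything else indexes the vector itself.]

class Vector:
    def __init__(self, it):
        self._it = list(it)
    def __repr__(self):
        return f'<Vector of {self._it!r}>'
    def __len__(self):
        return len(self._it)
    def __getitem__(self, key):
        if callable(key):
            # v[len], v[str.upper], v[custom_linked_list] all vectorize.
            return Vector(key(x) for x in self._it)
        result = self._it[key]
        return Vector(result) if isinstance(key, slice) else result

v = Vector(["Jan", "Feb", "Mar"])
len(v)    # 3
v[len]    # <Vector of [3, 3, 3]>
v[1:]     # <Vector of ['Feb', 'Mar']>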

On Mon, Feb 4, 2019 at 7:14 AM Kirill Balunov <kirillbalunov@gmail.com> wrote:
I think I really like this idea. Maybe as an extra spelling, but still allow .apply() to do the same thing. It feels reasonably intuitive to me. Not *identical to* indexing in NumPy and Pandas, but sort of in the same spirit as predicate-based or selection-based indices. What do other people on this thread think? Would you learn that easily? Could you teach it?
This feels more forced, unfortunately. Something short would be good, but I'm not sure I like this. This is really just a short spelling of pandas.IndexSlice or numpy.s_. It came up in another thread some months ago, but there is another proposal to allow the obvious spelling `slice[start:stop:step]` as a way of creating slices. Actually, I guess that's already halfway to the above. We'd need to do this still:

v[itemgetter(IndexSlicer[1:])]

That's way too noisy. I guess I just don't find the lowercase `i` to be iconic enough. I think with a better SHORT name, I'd like:

v[Item[1:]]

Maybe that's not the name? -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Hi, I'm not sure I understand the real purpose of Vector. Is it a new collection? Is it a list with a builtin map() function? Is it a wrapper to other types? Should it be iterable? The clear need explained before is using a fluent interface on a collection:

MyVector.strip().replace("A","E")

Why do we need Vector to behave like list? We just want to work on our strings but with a cleaner/shorter/nicer syntax. My idea (not totally clear in my mind) is that Vector should behave quite like the type it wraps, so having only one type. I don't want a collection of strings, I want a MegaString (...) which I can use exactly like a lone string. An iteration on Vector would iterate like itertools.chain does. At the end, I would only need one more method which would return an iterable of the items, like MyVector.explode(). For me Vector should be something like that:

class Vector:
    def __init__(self, a_list):
        self.data = a_list
        self._type = type(self.data[0])
        for data in self.data:
            if type(data) != self._type:
                raise TypeError

    def __getattr__(self, name):
        fn = getattr(self._type, name)
        def wrapped(*args, **kwargs):
            self.data = [fn(i, *args, **kwargs) for i in self.data]
            return self
        return wrapped

    def explode(self):
        return iter(self.data)

I'm not saying it should only handle strings but it seems to be the major use case. Jimmy
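[For illustration, a quick usage sketch of Jimmy's class as defined above; note that, as written, its methods mutate the vector in place and return self for chaining:]

v = Vector(['  A ', ' b  '])
v.strip().replace('A', 'E')   # v.data is now ['E', 'b']
list(v.explode())             # ['E', 'b']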
On 04/02/2019 at 17:12, David Mertz wrote:

Before I respond to a specific point below, I'd like to make a general observation. I changed the subject line of this sub-thread to discuss a feature of Julia, which allows one to write vectorized code in standard infix arithmetic notation, that applies to any array type, using any existing function or operator, WITHOUT having to wrap your data in a special delegate class like this "Vector". So as far as I'm concerned, this entire discussion about this wrapper class misses the point. (Aside: why is this class called "Vector" when it doesn't implement a vector?) Anyway, on to my response to a specific point: On Mon, Feb 04, 2019 at 11:12:08AM -0500, David Mertz wrote:
obj[len] already has an established meaning as obj.__getitem__(len). There's going to be a clash here between key lookup and applying a function:

obj[len]   # look up key=len
obj[len]   # apply function len

Mathematica does use square brackets for calling functions, but it uses ordinary arithmetic order len[obj] rather than postfix order obj[len]. At the risk of causing confusion^1, we could have a "vector call" syntax:

# apply len to each element of obj, instead of obj itself
len[obj]

which has the advantage that it only requires that we give functions a __getitem__ method, rather than adding new syntax. But it has the disadvantage that it doesn't generalise to operators, without which I don't think this is worth bothering with.

^1 Cue a thousand Stackoverflow posts asking whether they should use round brackets or square when calling a function, and why they get weird error messages sometimes and not other times.

-- Steven
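[Built-in functions can't grow a __getitem__, so in today's Python this could only be tried with a wrapper. A minimal sketch; the `vectorcall` name and the mapping semantics are my assumptions:]

class vectorcall:
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, *args, **kwargs):
        # Round brackets: the normal call.
        return self.fn(*args, **kwargs)
    def __getitem__(self, obj):
        # Square brackets: apply to each element instead.
        return [self.fn(x) for x in obj]

vlen = vectorcall(len)
vlen("hello")           # 5
vlen[["Jan", "Feb"]]    # [3, 3]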

On Thu, Feb 7, 2019 at 4:03 PM Steven D'Aprano <steve@pearwood.info> wrote:
Generalizing to operators is definitely going to require new syntax, since both operands can be arbitrary objects. So if that's essential to the idea, we can instantly reject anything that's based on functions (like "make multiplying a function by a tuple equivalent to blah blah blah"). In that case, we come straight to a few key questions:

1) Is this feature even worth adding syntax for? (My thinking: "quite possibly", based on matmul's success despite having an even narrower field of use than this.)

2) Should it create a list? a generator? something that depends on the type of the operand? (Me: "no idea")

3) Does the Julia-like "x." syntax pass the grit test? (My answer: "nope")

4) If not, what syntax would be more appropriate? This is a general purpose feature akin to comprehensions (and, in fact, can be used in place of some annoyingly-verbose comprehensions). It needs to be easy to type and read.

Pike's automap syntax is to subscript an array with [*], implying "subscript this with every possible value". It's great if you want to do just one simple thing:

f(stuff[*])     # [f(x) for x in stuff]
stuff[*][1]     # [x[1] for x in stuff]

but clunky for chained operations:

(f(stuff[*])[*] * 3)[*] + 1    # [f(x) * 3 + 1 for x in stuff]

That might not be a problem in Python, since you can always just use a comprehension if vectorized application doesn't suit you. I kinda like the idea, but the devil's in the details.

ChrisA

On 2019-02-07 05:27, Chris Angelico wrote:
Would it be possible, at compile time, to retain it as an automap throughout the expression?

stuff[*]                  # [x for x in stuff]
f(stuff[*])               # [f(x) for x in stuff]
(f(stuff[*]) * 3) + 1     # [f(x) * 3 + 1 for x in stuff]

There could also be a way to 'collapse' it again. An uncollapsed automap would be collapsed at the end of the expression. (Still a bit fuzzy about the details...)

Here are some alternate syntaxes. These are all equivalent to print(len(list)):

(len | print)(list)
(len |> print)(list)
(print <| len)(list)
print <| len << list
list >> print <| len
list >> len |> print

## Traditional argument order

print <| len << list

## Stored functions

print_lengths = len | print
print_lengths = len |> print
print_lengths = print <| len

These can be called using callable syntax. These can be called using << syntax. These can be called using >> syntax.

## Lightweight traditional syntax order

(print | len)()

# Explanation

The pipeline operators (|, |>, <|) create an object. That object implements, depending on the chosen implementation, some combination of the __call__ operator, the __rshift__ operator, and/or the __lshift__ operator. — I am not proposing Python has all these operators at the same time, just putting these ideas out there for discussion.
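[For the | variant, a minimal sketch of such a pipeline object; the Pipe name is assumed, and note the leftmost operand must already be wrapped, since two plain functions cannot be piped with | today:]

class Pipe:
    def __init__(self, *fns):
        self.fns = fns
    def __or__(self, other):
        # Compose: data flows left to right through the stored functions.
        fns = other.fns if isinstance(other, Pipe) else (other,)
        return Pipe(*self.fns, *fns)
    def __call__(self, arg):
        for fn in self.fns:
            arg = fn(arg)
        return arg

print_lengths = Pipe(len) | print
print_lengths([1, 2, 3])    # prints 3, i.e. print(len([1, 2, 3]))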

Many apologies if people got one or more encrypted versions of this.

On 2/7/19 12:13 AM, Steven D'Aprano wrote:
It wasn't a concrete proposal, just food for thought. Unfortunately the thinking seems to have missed the point of the Julia syntax and run off with the idea of a wrapper class.

I did not miss the point! I think adding new syntax à la Julia is a bad idea—or at very least, not something we can experiment with today (and wrote as much). Therefore, something we CAN think about and experiment with today is a wrapper class. This approach is pretty much exactly the same thing I tried in a discussion of PEP 505 a while back (None-aware operators). In the same vein as that—where I happen to dislike PEP 505 pretty strongly—one approach to simulate or avoid new syntax is precisely to use a wrapper class.

As a footnote, I think my demonstration of PEP 505 got derailed by lots of comments along the lines of "Your current toy library gets the semantics of the proposed new syntax wrong in these edge cases." Those comments were true (and I think I didn't fix all the issues since my interest faded with the active thread)... but none of them were impossible to fix, just small errors I had made.

With my *very toy* stringpy.Vector class, I'm just experimenting with usage ideas. I have shown a number of uses that I think could be useful to capture most or all of what folks want in "string vectorization." Most of what I've put in this list is what the little module does already, but some is just ideas for what it might do if I add the code (or someone else makes a PR at https://github.com/DavidMertz/stringpy).

One of the principles I had in mind in my demonstration is that I want to wrap the original collection type (or keep it an iterator if it started as one). A number of other ideas here, whether for built-in syntax or different behaviors of a wrapper, effectively always reduce every sequence to a list under the hood. This makes my approach less intrusive to move things in and out of "vector mode." For example:

v1 = Vector(set_of_strings)
set_of_strings = v1.lower().apply(my_str_fun)._it    # Get a set back

v2 = Vector(list_of_strings)
list_of_strings = v2.lower().apply(my_str_fun)._it   # Get a list back

v3 = Vector(deque_of_strings)
deque_of_strings = v3.lower().apply(my_str_fun)._it  # Get a deque back

v4 = Vector(iter_of_strings)
iter_of_strings = v4.lower().apply(my_str_fun)._it   # stays lazy!

So this is round-tripping through vector-land. Small note: I use the attribute `._it` to store the "sequential thing." That feels internal, so maybe some better way of spelling "get the wrapped thing" would be desirable.

I've also lost track of whether anyone is proposing a "vector of strings" as opposed to a vector of arbitrary objects. Nothing I wrote is actually string-specific. That is just the main use case stated. My `stringpy.Vector` might be misnamed in that it is happy to contain any kind of items. But we hope they are all items with the particular methods we want to vectorize. I showed an example where a list might contain a custom string-like object that happens to have methods like `.lower()` as an illustration.

Inasmuch as I want to handle iterators here, it is impossible to do any type check upon creating a Vector. For concrete `collections.abc.Sequence` objects we could check, in principle. But I'd rather it be "we're all adults here" ... or at most provide some `check_type_uniformity()` function or method that had to be called explicitly.

On Thu, Feb 07, 2019 at 03:17:18PM -0500, David Mertz wrote:
I'm sorry, I did not see your comment that you thought new syntax was a bad idea. If I had, I would have responded directly to that. Why is it an overtly *bad* (i.e. harmful) idea? As opposed to merely not sufficiently useful, or unnecessary? You're certainly right that we can't easily experiment in the interpreter with new syntax, but we can perform thought-experiments and we don't need anything but a text editor for that. As far as I'm concerned, the thought experiment of comparing these two snippets:

((seq .* 2)..name)..upper()

versus

map(str.upper, map(operator.attrgetter('name'), map(lambda a: a*2, seq)))

demonstrates conclusively that even with the ugly double dot syntax, infix syntax easily beats map. If I recall correctly, the three maps here were originally proposed by you as examples of why map() alone was sufficient and there was no benefit to the Julia syntax. I suggested composing them together as a single operation instead of considering them in isolation.
Therefore, something we CAN think about and experiment with today is a wrapper class.
Again, I apologise, I did not see where you said that this was intended as a proof-of-concept to experiment with the concept. [...]
If the Vector class is only a proof of concept, then we surely don't need to care about moving things in and out of "vector mode". We can take it as a given that "the real thing" will work that way: the syntax will be duck-typed and work with any iterable, and there will not be any actual wrapper class involved and consequently no need to move things in and out of the wrapper. I had taken note of this functionality of the class before, and that was one of the things which led me to believe that you thought that a wrapper class was in and of itself a solution to the problem. If you had been proposing this Vector class as a viable working solution (or at least a first alpha version towards a viable solution) then worrying about round-tripping would be important. But as a proof-of-concept of the functionality, then:

set( Vector(set_of_stuff) + spam )
list( Vector(list_of_stuff) + spam )

should be enough to play around with the concept. [...]
Why do you care about type uniformity or type-checking the contents of the iterable? Comments like this suggest to me that you haven't understood the idea as I have tried to explain it. I'm sorry that I have failed to explain it better. Julia is (if I understand correctly) statically typed, and that allows it to produce efficient machine code because it knows that it is iterating over (let's say) an array of 32-bit ints. While that might be important for the efficiency of the generated machine code, that's not important for the semantic meaning of the code. In Python, we duck-type and resolve operations at runtime. We don't typically validate types in advance:

for x in sequence:
    if not isinstance(x, Spam):
        raise TypeError('not Spam')
for x in sequence:
    process(x)

(except under unusual circumstances). More to the point, when we write a for-loop:

result = []
for a_string in seq:
    result.append(a_string.upper())

we don't expect that the interpreter will validate that the sequence contains nothing but strings in advance. So if I write this using Julia syntax:

result = seq..upper()

I shouldn't expect the interpreter to check that seq contains nothing but strings either.

-- Steven

On Thu, Feb 7, 2019 at 6:48 PM Steven D'Aprano <steve@pearwood.info> wrote:
I'm sorry, I did not see your comment that you thought new syntax was a bad idea. If I had, I would have responded directly to that.
Well... I don't think it's the worst idea ever. But adding more operators is something I am generally wary about. Plus there's the "grit on Uncle Timmy's screen" test. Actually, if I wanted an operator, I think that @ is more intuitive than extra dots. Vectorization isn't matrix multiplication, but they are sort of in the same ballpark, so the iconography is not ruined.
OK... now compare:

(Vec(seq) * 2).name.upper()

Or:

vec_seq = Vector(seq)
(vec_seq * 2).name.upper()
# ... bunch more stuff
seq = vec_seq.unwrap()

I'm not saying the double dots are terrible, but they don't read *better* than wrapping (and optionally unwrapping) to me. If we were to take @ as "vectorize", it might be:

(seq @* 2) @.name @.upper()

I don't hate that.

demonstrates conclusively that even with the ugly double dot syntax,
infix syntax easily beats map.
Agreed.
Well... your maps are kinda deliberately ugly. Even in that direction, I'd write:

map(lambda s: (s*2).name.upper(), seq)

I don't *love* that, but it's a lot less monstrous than what you wrote. A comprehension is probably even better:

[(s*2).name.upper() for s in seq]

Again, I apologise, I did not see where you said that this was intended
as a proof-of-concept to experiment with the concept.
All happy. Puppies and flowers.
Well... I at least moderately think that a wrapper class is BETTER than new syntax. So I'd like the proof-of-concept to be at least moderately functional. In any case, there is ZERO code needed to move in/out of "vector mode." The wrapped thing is simply an attribute of the object. When we call vectorized methods, it's just `getattr(type(item), attr)` to figure out the method in a duck-typed way.

one of the things which led me to believe that you thought that a

Yes, I consider the Vector class a first alpha version of a viable solution. I haven't seen anything that makes me prefer new syntax. I feel like a wrapper makes it more clear that we are "living in vector land" for a while. The same is true for NumPy, in my mind. Maybe it's just familiarity, but I LIKE the fact that I know that when my object is an ndarray, operations are going to be vectorized ones. Maybe 15 years ago different decisions could have been made, and some "vectorize this operation" syntax could have made the ndarray structure just a behavior of lists instead. But I think the separation is nice.
That's fine. But there's no harm in the class *remembering* what it wraps either. We might want to distinguish:

set(Vector(some_collection) + spam)        # Make it a set after the operations
(Vector(some_collection) + spam).unwrap()  # Recover whatever type it was before
Why do you care about type uniformity or type-checking the contents of the iterable?
Because some people have said "I want my vector to be specifically a *sequence of strings*, not of other stuff." And MAYBE there is some optimization to be had if we know we'll never have a non-footype in the sequence (after all, NumPy is hella optimized). That's why the `stringpy` name someone suggested. Maybe we'd bypass most of the Python-land calls when we did the vectorized operations, but ONLY if we assume type uniformity. But yes, I generally care about duck-typing only. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Thu, Feb 7, 2019 at 4:27 PM David Mertz <mertz@gnosis.cx> wrote:
well, vectorization is kinda the *opposite* of matrix multiplication -- matrix multiplication is treating the matrix as a whole, rather than applying multiplication to each element. And it is certainly the opposite in the numpy case. Which gives me an idea -- we could make an object that applied operators (and methods??) to each element individually, and use the @ operator when you wanted the method to act on the whole object instead. Note: I haven't thought about the details at all -- may not be practical to use an operator for that.
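[A tiny sketch of that inversion, with assumed semantics and details unexplored, exactly as Christopher says: arithmetic operators map over elements, while @ escapes to the whole object.]

class ElementWise:
    def __init__(self, it):
        self._it = list(it)
    def __mul__(self, x):
        # Elementwise: multiply each item.
        return ElementWise(i * x for i in self._it)
    def __matmul__(self, fn):
        # Whole-object escape hatch: apply fn to the underlying list.
        return fn(self._it)

e = ElementWise([1, 2, 3])
(e * 2) @ sum    # 12: doubles each element, then sums the whole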
(Vec(seq) * 2).name.upper()
Or:
I'm not saying the double dots are terrible, but they don't read *better*
what type would .unwrap() return? One of the strengths of the "operator" approach is that it could apply to any (appropriately mutable) sequence and keep that sequence. I'm not sure how much that actually matters, as I'm expecting this is a 99% list case anyway. And why would .unwrap() be required at all -- as opposed to, say:

seq = list(vec_seq)

than wrapping (and optionally unwrapping) to me.

nor to me.
Well... your maps are kinda deliberately ugly.
That's actually pretty key -- in fact, if you wanted to apply a handful of operations to each item in a sequence, you would probably use a single expression (if possible) in a lambda in a map, or in a comprehension, rather than chaining the map. Even if it was more complex, you could write a function, and then apply that with a map or comprehension. In the numpy case, compare:

c = sqrt(a**2 + b**2)

to

c = [sqrt(x**2 + y**2) for x, y in zip(a, b)]

so still a single comprehension. But:

1) given the familiarity of math expressions -- the first really does read a LOT better

2) the first version can be better optimized (by numpy)

So the questions become:

* For other than math with numbers (which we have numpy for), are there use cases where we'd really get that much extra clarity?

* Could we better optimize, say, a sequence of strings enough to make it all worth it?

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Fri, Feb 8, 2019 at 3:17 PM Christopher Barker <pythonchb@gmail.com> wrote:
The idea—and the current toy implementation/alpha—has .unwrap return whatever type went into the Vector creation. Might be a tuple, list, set, deque, or it might be an iterator. It might even be some custom collection that isn't in the standard library. But you can also explicitly make a Vector into something else by using that constructor. Pretty much as in the examples I gave before:

set(Vector(a_list))      # Get a set
Vector(a_list).unwrap()  # Get a list (without needing to know the type to call .unwrap())

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Has anyone thought about my proposal yet? I think it's worth considering because it allows chained function calls to be stored, which is probably something that's common; I imagine people turning the same series of chained functions into a lambda of its own once it's used more than once in a program. Arguably, the lambda syntax is more readable and imposes less visual burden. Sent from my iPhone

Just a quick idea. Wouldn't an arrow operator -> be less of an eyesore?

On Fri, 8 Feb 2019 at 18:16, Christopher Barker <pythonchb@gmail.com> wrote:
-- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario@gmail.com linked-in : https://www.linkedin.com/in/eliziario/

Christopher Barker writes:
well, vectorization is kinda the *opposite* of matrix multiplication -- matrix multiplication is treating the matrix as a whole,
When I think of treating the matrix as a whole, I think of linear algebra. Matrix multiplication is repeated application of the inner product, which is in turn a sum over vectorized multiplication. I share David's intuition about this, although it might not be the common one. Steve

The @ operator is meant for matrix multiplication (see PEP 465) and is already used for that in NumPy. IMHO just that is a good enough reason for not using @ as an elementwise application operator (ignoring if having an such an operator is a good idea in the first place). Ronald

On Sun, Feb 3, 2019 at 3:16 PM Ronald Oussoren <ronaldoussoren@mac.com> wrote:
Co-opting operators is pretty common in Python. For example, the `.__truediv__()` operator spelled '/' is most often used for some kind of numeric division. Some variations on that, for example vectorized in NumPy. And different numeric types operate a bit differently. The name of the magic method obviously suggests division. And yet, in the standard library we have pathlib which we can use like this (from the module documentation):
>>> p = Path('/etc')
>>> q = p / 'init.d' / 'reboot'
That use is reasonable and iconic, even if it is nothing like division. The `.__mod__()` operator spelled '%' means something very different in relation to a float or int object versus a string object, i.e. modulo division versus string interpolation. I've even seen documentation of some library that co-opts `.__matmul__()` to do something with email addresses. It's not a library I use, just something I once saw the documentation on, so I'm not certain of details. But you can imagine that e.g.:

email = user @ domain

could be helpful and reasonable (exact behavior and purpose could vary, but it's "something about email" iconically). In other words, I'm not opposed to using the @ operator in my stringpy.Vector class out of purity about the meaning of operators. I just am not convinced that it actually adds anything that is not easier without it. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

I know, but if an element-wise operator is useful it would also be useful for libraries like NumPy that already support the @ operator for matrix multiplication. Using @ both for matrix multiplication and element-wise application could be made to work, but would be very confusing. Ronald — Twitter: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/

Ronald Oussoren via Python-ideas wrote:
The way @ is defined in numpy does actually work for both. E.g. v1 @ v2 where v1 and v2 are 3-dimensional arrays is equivalent to multiplying two 1D arrays of 2D matrices elementwise. Is this confusing? Maybe, but it's certainly useful. -- Greg

A bit of history: A fair amount of inspiration (or at least experience) for numpy came from MATLAB. MATLAB has essentially two complete sets of math operators: the regular version, and the dot version.

A * B

means matrix multiplication, and

A .* B

means elementwise multiplication. And there is a full set of matrix and elementwise operators. Back in the day, Numeric (numpy's predecessor) used the math operators for elementwise operations, and doing matrix math was unwieldy. There was a lot of discussion and a number of proposals for a full set of additional operators in Python that could be used for matrix operations (side note: there was (is) a numpy.matrix class that defines __mul__ as matrix multiplication). Someone at some point realized that we didn't need a full set, because multiplication was really the only compelling use case. So the @ operator was added. End history.

Numpy, of course, is but one third party package, but it is an important one — major inconsistency with it is a bad idea.

-CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

@David Mertz <mertz@gnosis.cx> I think I can't explain my ideas well ^^. I'll try to be really detailed to make sure I'm actually saying what I'm thinking. Let's consider the idea of that Vector class this way: Vectors are lists of a defined type (maybe immutable?) and add syntactic sugar for vectorized operations. Based on this small and not complete enough definition, we should be able to apply any function to that vector. I identify two ways functions are used with vectors: they are either applied to the vector as an iterable/list, or to the elements of this vector. Thus, we need to have different notations for those two uses. To keep it coherent with Python, if a function is applied to the vector as an iterable, the vector is given as a parameter:
len(v) # Number of elements in the Vector `v`
If we want to apply a function to each element of the list, we should then use another notation. So far, several have been proposed. In the following examples showing the different notations, we use the generic form so it also applies to user-defined functions:
Another example with parameters
My personal opinion is that the two notations feel good. One is standard, the other is not, but is less verbose, which is a good point. Now that I detailed everything in my brain and by mail, I guess we are just saying the same thing! There's something I didn't mention on purpose: the use of `v.lower()`. I think having special cases of how vectors work is not a good idea: it's confusing. If we want the user to be able to use user-defined functions we need a notation. Having something different for some of the functions feels weird to me. And obviously, if the user can't use their own functions, this whole thing is pretty useless. Tell me if I got anything wrong. NB: I found a way to simplify my previous example using lambda instead of partial.

On Sun, 3 Feb 2019 at 21:34, David Mertz <mertz@gnosis.cx> wrote:

len(v) # Number of elements in the Vector `v`
Agreed, this should definitely be the behavior. So how do we get a vector of lengths of each element?
Also possible is:

v.len()

We couldn't do that for every possible function, but this one is special inasmuch as we expect the items each to have a .__len__() but don't want to spell the dunders. Likewise for just a handful of other methods/functions. The key difference though is that *I* would want a way to use both methods already attached to the objects/items in a vector and also a generic user-provided function that operates on the items. I guess you disagree about "method pass-through", but it reads more elegantly to me:
Compare these with:

v.replace("a", "b")

Since we already know v is a Vector, we kinda expect methods to be vectorized. This feels like the "least surprise" and also the least extra code. Moreover, spelling chained methods with many .apply() calls (even if spelled '@') feels very cumbersome:

(A) v.apply(lambda s: s.replace("a", "b")).apply(str.upper).apply(lambda s: s.count("B"))

(B) v @ lambda s: s.replace("a", "b") @ str.upper @ lambda s: s.count("B")

(C) v.replace("a","b").upper().count("B")

Between these, (C) feels a heck of a lot more intuitive and readable to me. Here we put an emphasis on the methods already attached to objects. But this isn't terrible:

def double(x):
    return x*2

v.apply(double).replace("a","b").upper().count("B")

In @ notation it would be:

v @ double @ lambda s: s.replace("a", "b") @ str.upper @ lambda s: s.count("B")

The 'double' is slightly easier, but the method calls are much worse. MOREOVER, the model of "everything is apply/@" falls down terribly once we have duck typing. This is a completely silly example, but it's one that apply/@ simply cannot address because it assumes it is the SAME function/method applied to each object:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On 2019-02-03 22:58, David Mertz wrote:
Do they need multiple uses of apply and @?
(A) v.apply(lambda s: s.replace("a", "b")).apply(str.upper).apply(lambda s: s.count("B"))
v.apply(lambda s: s.replace("a", "b").upper().count("B"))
(B) v @ lambda s: s.replace("a", "b") @ str.upper @ lambda s: s.count("B")
v @ lambda s: s.replace("a", "b").upper().count("B")
(C) v.replace("a","b").upper().count("B")
Between these, (C) feels a heck of a lot more intuitive and readable to me.
[snip]

I've lost track of who is advocating what, but:
# Replace all "a" by "b"
v.apply(lambda s: s.replace("a", "b"))
I do not get the point of this at all -- we already have map:

map(v, lambda s: s.replace("a", "b"))

these seem equally expressive and easy to me, and map doesn't require a custom class or anything new at all.
v.replace("a", "b")
This is adding something - maybe just compactness, but I also think readability.

I've also lost track of whether anyone is proposing a "vector of strings" as opposed to a vector of arbitrary objects. I think a vector of strings could be useful, and then it would be easier to decide which string methods should be applied to the items vs the vector as a whole. If you want to handle generic items, it becomes a lot harder. I think numpy has had the success it has because it assumes all dtypes are numerical and thus support (mostly) the same operations.

-CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Mon, Feb 4, 2019, 12:47 AM Christopher Barker
I've lost track of who is advocating what, but:
Well, I made a toy implementation of a Vector class. I'm not sure what that means I advocate, other than the existence of a module on GitHub. FWIW, I called the repo 'stringpy' as a start, so that expressed some interest in it being about vectors of strings. But so far, I haven't found anything that actually needs to be string-like. In general, methods get passed through to their underlying objects and deliberately duck typed, like:

v.replace("a", "b")
As an extra, we could enforce homogeneity, or even string-ness specifically. I don't really know what homogeneity means though, once we consider ABCs, subclasses, and duck types that don't use inheritance or ABC registration. At least so far, I haven't coded anything that would get a performance gain from enforcing the string-ness of items (but it's all pure Python so far, no Cython or C). This is adding something - maybe just compactness, but I also think
readability.
I think with chained methods the win gets greater:

v.replace("a", "b").upper().apply(myfun)

If you want to handle generic items, it becomes a lot harder.
So far, generic has been a lot easier to code than hand-rolled methods.

On Sun, Feb 03, 2019 at 09:46:44PM -0800, Christopher Barker wrote:
I've lost track of who is advocating what, but:
Ironically, I started this sub-thread in response to your complaint that you didn't like having to explicitly write loops/maps. So I pointed out that in Julia, people can use (almost) ordinary infix syntax using operators and function calls and have it apply automatically to each item in arrays. It wasn't a concrete proposal, just food for thought. Unfortunately the thinking seems to have missed the point of the Julia syntax and run off with the idea of a wrapper class. [...]
I do not get the point of this at all -- we already have map:

map(v, lambda s: s.replace("a", "b"))
The order of arguments is the other way around. And you did say you didn't like map. Wouldn't you rather write:

items.replace("a", "b")

rather than

map(lambda s: s.replace("a", "b"), items)

or

[s.replace("a", "b") for s in items]

I know I would. Provided of course we could distinguish between operations which apply to a single string, and those which apply to a generic collection of strings. Besides, while a single map or comprehension is bearable, more complex operations are horrible to read when written that way, but trivially easy to read when written in standard infix arithmetic notation. See my earlier posts for examples.
Indeed. In Julia that also offers opportunities for the compiler to optimize the code, bringing it to within 10% or so of a C loop. Maybe PyPy could get there as well, but CPython probably can't.
I've also lost track of whether anyone is proposing a "vector of strings' as opposed to a vector of arbitrary objects.
Not me. -- Steven

On Sun, Feb 3, 2019, 6:36 PM Greg Ewing
What syntax would you like? Not necessarily new syntax per se, but what calling convention. I can think of a few useful cases.

vec1.replace("PLACEHOLDER", vec2)

Maybe that would transform one vector using the corresponding strings from another vector. What should happen if the vector lengths mismatch? I think this should probably be an exception... unlike what zip() and itertools.zip_longest() do. But maybe not.

concat = vec1 + vec2

Again the vector length question is there. But assuming the same length, this seems like a reasonable way to get a new vector concatenating each corresponding element. Other uses? Are they different in general pattern?
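[One way the broadcast case could look, as a sketch with assumed names and with a length mismatch raising, per the suggestion above:]

class Vector:
    def __init__(self, it):
        self._it = list(it)
    def replace(self, old, new):
        if isinstance(new, Vector):
            # Broadcast: pair up corresponding elements; lengths must match.
            if len(new._it) != len(self._it):
                raise ValueError("vector lengths must match")
            return Vector(s.replace(old, n) for s, n in zip(self._it, new._it))
        return Vector(s.replace(old, new) for s in self._it)

v1 = Vector(["x=PLACEHOLDER", "y=PLACEHOLDER"])
v2 = Vector(["1", "2"])
v1.replace("PLACEHOLDER", v2)._it    # ['x=1', 'y=2']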

On Sat, Feb 2, 2019 at 10:00 PM MRAB <python@mrabarnett.plus.com> wrote:
I like that! But I'm not sure if '.self' is misleading. I use an attribute called '._it' already that does exactly this. But since we're asking the length of the list or tuple or set or deque or etc. that the Vector wraps, does it feel like it would be deceptive to call them all '.self'? I'm really not sure. I could just rename '._it' to '.self' and get the behavior you show (well, I still need a little checking whether the thing wrapped is a collection or an iterator ... I guess a '.self' property. Or some other name to do that). You remind me that I need to add .__getitem__() too so I can slice and index Vectors. But I know how to do that easily enough. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Sat, Feb 02, 2019 at 03:22:12PM -0800, Christopher Barker wrote: [This bit was me]
So it is. I wondered what the cryptic '|S4' symbol meant, and I completely missed the quotes around the 1. Thanks for the correction. [...]
Indeed. This hypothetical syntax brings the readability advantages of infix operators to code that operates on iterables, without requiring every iterable to support arbitrary functions and methods. -- Steve

On Sat, Feb 2, 2019, 6:23 PM Christopher Barker
I'm warming up some. But is this imagined as vectors of strings, or as generically homogeneous objects? And what is homogeneity exactly in the face of duck typing? Absent the vector wrapping, I think I might write this for your example:

map(lambda s: s.strip().lower().title(), a_list_of_strings)

That's slightly longer, but just by the length of the word lambda. One could write a wrapper to vectorize pretty easily. So maybe:

Vector(a_list_of_strings).strip().lower().title()

This would just pass along the methods to the individual items, and wouldn't need to think about typing per se. Maybe other objects happen to have those three methods, so are string-like in a duck way.

On Fri, Feb 01, 2019 at 07:02:30PM +0300, Kirill Balunov wrote:
I didn't say anything about a vector type. "Vectorization" does not mean "vector type". Please read the link I posted, it talks about what Julia does and how it works. There are two relevant meanings for vectorization here: https://en.wikipedia.org/wiki/Vectorization

- a style of computer programming where operations are applied to whole arrays instead of individual elements
- a compiler optimization that transforms loops to vector operations

Given that none of my examples involved writing loops by hand, I could only be talking about the first. The link I posted has examples which should be fairly clear even if you don't know Julia well.
Of course it makes sense. Even numpy supports inhomogeneous data:

py> a = np.array([1, 'spam'])
py> a
array(['1', 'spam'], dtype='|S4')

Inhomogeneous data may rule out some optimizations, but that hardly means that it "doesn't make sense" to use it. Again, if you read the link I posted, they make it clear that Julia can vectorize code which supports any type: "Our function f accepts any x type". I don't know Julia well enough to tell whether it supports inhomogeneous arrays. My experiments suggest that it forces all the elements to a single type. But that's not really the point: you can call the function f on an array of one type (say, Spam), then call it again on an array of another type (say, Eggs). So long as the function supports both Spam and Eggs types, it will just work, without having to re-write your array handling code.
The second one is just `map`. So I can't catch what you are proposing:
I'm not proposing anything, I'm drawing people's attention to something another language does to solve an annoyance that Chris has. If someone else likes that solution and wishes to make a concrete proposal for Python, we can consider it. Otherwise it is just food for thought. It may or may not lead anywhere.
1. To make an operator form of `map`. 2. To pull numpy into stdlib.
I cannot imagine how you got that conclusion from anything I said. I was talking about syntax for vectorization, and didn't mention numpy once. I didn't mention django or beautifulsoup either. I hope that you didn't conclude that I wanted to pull them into the stdlib too. -- Steven

On 2019-02-02 04:32, Steven D'Aprano wrote: [snip]
Of course it makes sense. Even numpy supports inhomogeneous data:
[snip] "inhomogeneous"? Who came up with that? <pendantic> "in-" is a negative prefix in Latin words, but "homogeneous" comes from Greek, where the negative prefix is "a-" (or "an-" before a vowel). I'd go with either "heterogeneous" or "non-homogeneous". </pedantic>

On Sat, Feb 02, 2019 at 02:06:56AM -0500, Alex Walters wrote:
"Television" as a word must annoy you :) I mentally replaced "inhomogeneous" with "heterogeneous"
They don't mean the same thing. https://english.stackexchange.com/questions/194906/heterogeneous-vs-inhomoge... -- Steven

On Sat, Feb 02, 2019 at 05:10:14AM +0000, MRAB wrote:
I don't know, but it has been used since at least the early 1920s https://english.stackexchange.com/questions/194906/heterogeneous-vs-inhomoge... and the Oxford dictionary describes "inhomogeneity" as being used from the late 19th century. So my guess is, probably people who were more familiar with Latin and Greek than we are. There are many words that are derived from both Latin and Greek. There's no rule that says that because a word was derived from Greek, we must use Greek grammatical forms for it. We are speaking English, not Greek, and in English, we can negate words using the "in" prefix. -- Steven

On Sat, 2 Feb 2019 at 07:33, Steven D'Aprano <steve@pearwood.info> wrote:
I didn't say anything about a vector type.
I agree you did not say so. But since you started a new thread from the one where the vector type was discussed a little, it seemed appropriate to me to mention it here. Sorry about that.
Yes, numpy, to some degree, supports heterogeneous arrays. But not in the way you presented it. Your example just shows a homogeneous array of type `'|S4'`. In the same way, `np.array([1, 1.234])` will be homogeneous. Of course you can say np.array([1, 'spam'], dtype='object'), but in this case it will also be a homogeneous array, but of type `object`.
Inhomogeneous data may rule out some optimizations, but that hardly means that it "doesn't make sense" to use it.
I did not say that it "doesn't make sense". I only said that you would need to be lucky to call `..method()` on collections of heterogeneous data. And therefore, usually this kind of operation implies that you are working with homogeneous data. Unfortunately, built-in containers cannot provide such a guarantee without self-checking. Therefore, in my opinion, at the moment such an operator is not needed. With kind regards, -gdg

@D’Aprano I think you were misled by what I said, sorry for not being crystal clear. I just read the link on Julia (which I hadn’t done before) and I get what you mean now; it’s not quite different from what I said. I proposed introducing a new type: « vector ». A few steps have been made in Python for typing and I think the next step is having typed collections. Keeping everything unchecked is better imo. So if we take this next step, we’ll get a vector type with *not-guaranteed* homogeneous data. Whether its type is « object », « int » or anything else doesn’t matter as long as it’s supposed to be the same. This doesn’t change anything in terms of usage. Of course we should/could use map and the usual operators on collections. What I was then proposing, to complete what you suggested and because I don’t like the dot notation, is using matrix multiplication the same way the dots are used in Julia. But I have a question. I never coded anything at C-level nor a compiler; is it possible for user-defined types to make the vectorization optimized the same way it’s done with numbers in numpy? If yes, I think it would benefit the community. If no, it’s less likely, though it’s pursuing the steps made with typing. On Sat 2 Feb 2019 at 10:23, Kirill Balunov <kirillbalunov@gmail.com> wrote:

On 2019-02-02 09:22, Kirill Balunov wrote:
Here's a question: when you use a subscript on a vector, does it apply to the vector itself, or its members? For example, given:
my_strings = Vector(['one', 'two', 'three'])
what is:
my_strings[1:]
? Is it: Vector(['ne', 'wo', 'hree']) or: Vector(['two', 'three']) ?

On 2019-02-02 17:31, Adrien Ricocotam wrote:
I personally would want the first option to be the case. But then vectors shouldn't be list-like but more generator-like.
OK, here's another one: if you use 'list(...)' on a vector, does it apply to the vector itself or its members?
list(my_strings)
You might be wanting to convert a vector into a list:

['one', 'two', 'three']

or convert each of its members into lists:

Vector([['one'], ['two'], ['three']])

That's tough. I'd say convert the vector to a list. But:

my_vector.list()

would apply list on each element of the vector. Globally, I'd say if the vector is used as an argument, it's a usual iterable; if you use a member function (or any other notation like @ or .. or whatever) it's like map. Note that it's just my opinion.

On Sat, 2 Feb 2019 at 19:46, MRAB <python@mrabarnett.plus.com> wrote:

On 02/02/2019 18:44, MRAB wrote:
More likely you mean:
>>> [list(i) for i in ['one', 'two', 'three']]
[['o', 'n', 'e'], ['t', 'w', 'o'], ['t', 'h', 'r', 'e', 'e']]
The problem, of course, is that list() now has to understand Vector specially, and so does any function you think of applying to it. Operators are easier (even those like [1:]) because Vector can make its own definition of each through (a finite set of) dunder methods. To make a Vector accept an arbitrarily-named method call like my_strings.upper() to mean:
>>> [i.upper() for i in ['one', 'two', 'three']]
['ONE', 'TWO', 'THREE']
is perhaps just about possible by manipulating __getattribute__ to resolve names matching methods on the underlying type to a callable that loops over the content. Jeff
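A minimal sketch of that idea, using __getattr__ (which is only consulted for names the instance doesn't already have) rather than __getattribute__ (which intercepts everything):

    class Vector(list):
        def __getattr__(self, name):
            # Resolve an unknown attribute name to a callable that maps
            # the corresponding method over the content.
            def mapped(*args, **kwargs):
                return Vector(getattr(item, name)(*args, **kwargs)
                              for item in self)
            return mapped

so that Vector(['one', 'two', 'three']).upper() returns Vector(['ONE', 'TWO', 'THREE']). Only a sketch: it cannot express plain attribute access on the items, and any name that list already defines (append, index, ...) keeps its ordinary meaning.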

On 2019-02-02 12:31, David Mertz wrote:
I still haven't seen any examples that aren't already spelled 'map(fun, it)'
The problem with this is the same problem with having a function called "add" instead of an operator. There is little gain when you're applying ONE function, but if you're applying multiple functions you get a thicket of parentheses. I would rather see this:

    some_list @ str.lower @ tokenize @ remove_stopwords

...than this:

    map(remove_stopwords, map(tokenize, map(str.lower, some_list)))

That said, I don't necessarily think this needs to be added to the language. Things like pandas already provide this and so much more that it's unclear whether the gain from adding vectorization on its own would be worth it.

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Sat, 2 Feb 2019, 21:46 Brendan Barnwell <brenbarn@brenbarn.net> wrote: Yeah, it's called pip install funcoperators:
some_list @ str.lower @ tokenize @ remove_stopwords
→ some_list @ to(str.lower) @ to(tokenize) @ to(remove_stopwords), where: from funcoperators import postfix as to
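For the curious, the trick needs little more than __rmatmul__; a minimal sketch (not necessarily how funcoperators implements it):

    class to:
        """Wrap a function so that `iterable @ to(func)` maps func over it."""
        def __init__(self, func):
            self.func = func
        def __rmatmul__(self, left):
            # list @ to(func): list has no __matmul__ for this operand,
            # so Python falls back to our __rmatmul__ and we map.
            return [self.func(item) for item in left]

    ['Foo', 'Bar'] @ to(str.lower) @ to(lambda s: s + '!')
    # ['foo', 'bar'], then ['foo!', 'bar!']

Chaining works because each stage returns a plain list, which again defers to the next wrapper's __rmatmul__.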

On Sat, Feb 02, 2019 at 03:31:29PM -0500, David Mertz wrote:
I still haven't seen any examples that aren't already spelled 'map(fun, it)'
You might be right. But then there's nothing that map() can do that couldn't be written as a comprehension, and nothing that you can do with a comprehension that can't be written as a for-loop. And nothing that can't be written as a for-loop that couldn't be written as a while-loop. The only loop construct we really need is a while loop. And even that is redundant if we had GOTO. It's not about the functionality, but expressibility and readability. This hypothetical vectorization syntax might have a performance advantage as well. My understanding is that Julia is able to efficiently vectorize code, bringing it to within 10% of the speed of unrolled C loops. It may be that CPython cannot do anything that fast, but there may be some opportunities for optimization that we cannot apply to for-loops or comprehensions due to the way they are defined. But primarily it is about the readability of the code:

    result = process.(vector .+ sequence) .* items

versus:

    # Ouch!
    result = map(operator.mul,
                 map(process, map(operator.add, vector, sequence)),
                 items)

Here's the comprehension version:

    result = [a*b for a, b in zip(
        [process(c) for c in [d+e for d, e in zip(vector, sequence)]],
        items)]

We can improve that comprehension a tiny bit by splitting it into multiple steps:

    temp1 = [d+e for d, e in zip(vector, sequence)]
    temp2 = [process(c) for c in temp1]
    result = [a*b for a, b in zip(temp2, items)]

but none of these are as elegant or readable as the vectorized syntax:

    result = process.(vector .+ sequence) .* items

-- Steve

On 2019-02-02 18:11, Steven D'Aprano wrote:
The following reads a little better:

| result = [
|     process(v+s)*i
|     for v, s, i in zip(vector, sequence, items)
| ]

Vector operations will promote the use of data formats that work well with vector operations. So, I would expect data to appear like rows in a table, rather than in the columnar form shown above. Even if columnar form must be dealt with, we can extend our Vector class (or whatever abstraction you are using to enter vector space) to naturally zip() columns:

| Vector(zip(vector, sequence, items))
|     .map(lambda v, s, i: process(v+s)*i)

If we let Vector represent a list of tuples instead of a list of values, we can make construction simpler:

| Vector(vector, sequence, items)
|     .map(lambda v, s, i: process(v+s)*i)

If we have zip() to extend the tuples in the Vector, then we can be verbose to demonstrate how to use columnar data:

| Vector(vector)
|     .zip(sequence)
|     .map(operator.add)
|     .map(process)
|     .zip(items)
|     .map(operator.mul)

This looks verbose, but it is not too far from the vectorized syntax: the Vector() brings us to vector mode, and the two zip()s convert from columnar form. This verbose form may be *better* than the vectorized syntax because the operations are in order, rather than mixing the infix and functional forms seen in the vectorized syntax. I suggest this discussion include vector operations on (frozen) dicts/objects and (frozen) lists/tuples. Then we can have an interesting discussion about the meaning of group_by, join, and window functions, plus other operations we find in database query languages. I am interested in vector operations. I have situations where I want to perform some conceptually simple operations on a series of not-defined-by-me objects to make a series of conclusions. The calculations can be done succinctly in SQL, but Python makes them difficult. Right now, my solution is to describe the transformations in JSON, and have an interpreter do the processing: https://github.com/klahnakoski/SpotManager/blob/65f2c5743f3a9cfd1363cafec258...
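A rough sketch of such a fluent Vector, assuming zip() widens each element into a tuple and map() unpacks tuples into positional arguments (starmap semantics):

    import operator

    class Vector:
        def __init__(self, items):
            self.items = list(items)
        def __iter__(self):
            return iter(self.items)
        def zip(self, other):
            # widen each element into a tuple with the matching element of other
            return Vector(
                ((*a, b) if isinstance(a, tuple) else (a, b))
                for a, b in zip(self.items, other))
        def map(self, func):
            # tuples are unpacked into positional arguments
            return Vector(
                (func(*a) if isinstance(a, tuple) else func(a))
                for a in self.items)

    assert list(Vector([1, 2]).zip([10, 20]).map(operator.add)) == [11, 22]

Only a sketch to make the chained semantics concrete, not a worked-out API.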

On Sun, Feb 10, 2019 at 10:06 AM Kyle Lahnakoski <klahnakoski@mozilla.com> wrote:
I am interested in vector operations. I have situations where I want to perform some conceptually simple operations on a series of not-defined-by-me objects to make a series of conclusions. The calculations can be done succinctly in SQL, but Python makes them difficult.

So I want to point out that this was proposed way back when for numpy: MATLAB, for instance, has the usual operators *, +, etc. meaning "matrix math", and then another set of "itemwise" operators with a dot form (.*, .+, .-) for "itemwise" math. numpy, on the other hand, uses the regular operators for itemwise operations (what we're calling vectorized here), and Python lacked an extra set of operators that could be used for matrix math. Adding another full set (.*, .+, etc.) was discussed A LOT, and the Python community did not want that. Then someone had the brilliant observation that matrix multiplication was the only one that was really useful, and presto! the @ operator was born. Anyway -- just suggesting that a full set of "vectorized" operators will likely see a lot of resistance, and for my part, having these operators mean the opposite of what they do for numpy would be unfortunate as well.

Bringing real world examples of this would be a good idea for this discussion. I'm inclined to think that something like pandas (maybe more generally SQL-like than the number-crunching focus of pandas) might be better than new syntax for the language -- but only real examples will tell. I don't work with data like that much, but I'm pretty sure I've seen Python packages that attempt to address these use cases (that being querying and processing tabular data). -CHB

--
Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython

On 2019-02-10 18:30, Steven D'Aprano wrote:
Can you post a simplified example of how you would do it in SQL, compared to what you would have to do in standard Python?
Can I do the same in standard Python? If I did, then I would use Pandas: it has groupby, and some primitive joining, and window functions may come naturally because of its imperative nature, but I have not tried it. If I can not use Pandas, then I would write the groupby and window functions and call them in sequence. This is similar to what you see in my code now: a number of properties whose values get dispatched to Python functions. My code is more complicated only because those structures can be dispatched to translators for databases too. I am certain there are many variations of groupby out in the wild, and it would be nice to have the concept standardized when/if Python has vector operations. Join would be nice to have too, but I do not use it much; dictionary lookup seems to fill that need. Window functions (which are like mini queries) are powerful, but like Pandas, may end up being free because Python is imperative.

My code I pointed to has two parts. Here is the first part in SQL (well, an approximation of SQL, since I did not test this, and now I am rusty). A detailed description is below.

| WITH time_range AS (
|     SELECT num
|     FROM all_integers
|     WHERE
|         num % 60 = 0 AND
|         num >= floor(<<now>>/60/60)*60*60 - <<start_of_history>> AND
|         num < floor(<<now>>/60/60) + 60*60
| )
| SELECT
|     availability_zone,
|     instance_type,
|     time_range.num AS time,
|     MAX(price) AS price,
|     COUNT(1) AS `count`,
|     LAST(current_price) OVER (
|         PARTITION BY availability_zone, instance_type
|         ORDER BY timestamp
|     ) AS current_price
| FROM (
|     SELECT
|         *,
|         COALESCE(LAG(timestamp, 1), <<end_of_day>>) OVER (
|             PARTITION BY availability_zone, instance_type
|             ORDER BY timestamp
|         ) AS expire,
|         timestamp - <<expected_uptime>> AS effective
|     FROM prices
| ) temp
| RIGHT JOIN
|     time_range ON time_range.num BETWEEN temp.effective AND temp.expire
| GROUP BY
|     availability_zone,
|     instance_type,
|     time_range.num
| WHERE
|     expire > floor(<<now>>/60/60)*60*60 - <<start_of_history>>

Now, for the same, with description:

This WITH clause is not real SQL; it is meant to stand in for a temporary table that contains all hours of the time range I am interested in. Definitely easier to do in Python. All time is assumed to be in seconds since epoch.

| WITH time_range AS (
|     SELECT num
|     FROM all_integers
|     WHERE
|         num % 60 = 0 AND
|         num >= floor(<<now>>/60/60)*60*60 - <<start_of_history>> AND
|         num < floor(<<now>>/60/60) + 60*60
| )

We will select the three dimensions we are interested in (see GROUP BY below), along with the MAX price we have seen in the given hour, and the current_price for any (availability_zone, instance_type) pair.

| SELECT
|     availability_zone,
|     instance_type,
|     time_range.num AS time,
|     MAX(price) AS price,
|     COUNT(1) AS `count`,
|     LAST(current_price) OVER (
|         PARTITION BY availability_zone, instance_type
|         ORDER BY timestamp
|     ) AS current_price
| FROM

The prices coming from Amazon only have a timestamp for when that price takes effect; so this sub-query adds an `effective` start time, and an `expire` time, so the rest of the query need only deal with ranges. The timestamp - <<expected_uptime>> pushes the start time further into the past so the past can "see" future pricing.
| (
|     SELECT
|         *,
|         COALESCE(LAG(timestamp, 1), <<end_of_day>>) OVER (
|             PARTITION BY availability_zone, instance_type
|             ORDER BY timestamp
|         ) AS expire,
|         timestamp - <<expected_uptime>> AS effective
|     FROM prices
| ) temp

This is the point where we use the time_range from above and find every hour a price is effective. This could have been a sub-query, but I am rusty at SQL:

| RIGHT JOIN
|     time_range ON time_range.num BETWEEN temp.effective AND temp.expire

These are the three dimensions we are interested in:

| GROUP BY
|     availability_zone,
|     instance_type,
|     time_range.num

and we are only interested in calculating back to a certain point:

| WHERE
|     expire > floor(<<now>>/60/60)*60*60 - <<start_of_history>>
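For comparison, the groupby part of this (the MAX price per hour per dimension pair) is compact in plain Python; a sketch assuming a hypothetical record shape of (availability_zone, instance_type, hour, price) tuples:

    from collections import defaultdict

    def max_price_per_group(records):
        groups = defaultdict(list)
        for az, itype, hour, price in records:
            groups[az, itype, hour].append(price)
        return {key: max(prices) for key, prices in groups.items()}

It is the window functions (LAST ... OVER, LAG ... OVER) and the range join that have no similarly compact spelling in the standard library.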

Do take a look in the fairly recent archives of this list for a big discussion of groupby -- it kind of petered out but there were a couple options on the table. -CHB On Sun, Feb 10, 2019 at 9:23 PM Kyle Lahnakoski <klahnakoski@mozilla.com> wrote:
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

CHB, Thank you! I had forgotten that discussion at the beginning of July [1]. Googling the list [2] also shows mention of PythonQL [3], which may point to use cases that can guide a Vectorization idea. [1] groupby discussion - https://mail.python.org/pipermail/python-ideas/2018-July/051786.html [2] google search - https://www.google.ca/search?q=group+by+site%3Ahttps%3A%2F%2Fmail.python.org%2Fpipermail%2Fpython-ideas%2F&oq=group+by+site%3Ahttps%3A%2F%2Fmail.python.org%2Fpipermail%2Fpython-ideas%2F [3] PythonQL - https://github.com/pythonql/pythonql On 2019-02-11 10:43, Christopher Barker wrote:

On Sat, Feb 02, 2019 at 07:58:34PM +0000, Jeff Allen wrote: [MRAB asked]
OK, here's another one: if you use 'list(...)' on a vector, does it apply to the vector itself or its members?
With the Julia vectorization operator, there is no puzzle there. list(vector) applies list to the vector itself. list.(vector) applies list to each component of vector.
The problem, of course, is that list() now has to understand Vector specially, and so does any function you think of applying to it.
*The whole point* of the Julia syntax is that no function has to understand any sequence. When we write:

    for item in vector:
        func(item)

func only has to understand item, not vector. The same applies to the Julia syntax func.(vector). There's no puzzle here, no tricky cases, because it is completely deterministic and explicit: func(x) always calls func with x as argument, func.(x) always calls func with each of x's items as argument.
With the Julia syntax, there is no need for vectors (or lists, or generators, or tuples, or sets, or any other iterator...) to accept arbitrary method calls. So long as vectors can be iterated over, func.(vector) will work.

Beyond possibly saving 3-5 characters, I continue not to see anything different from map in this discussion. list(vector) applies list to the vector itself.
list.(vector) applies list to each component of vector.
In Python:

    list(seq)         applies list to the sequence itself
    map(list, seq)    applies list to each component of seq

In terms of other examples:

    map(str.upper, seq)                     uppercases each item
    map(operator.attrgetter('name'), seq)   gets the name attribute of each item
    map(lambda a: a*2, seq)                 doubles each item
    (lambda a: a*2)(seq)                    doubles the sequence itself

The last two might benefit from a named function 'double'.

On Sat, Feb 02, 2019 at 06:08:24PM -0500, David Mertz wrote:
Now compose those operations:

    ((seq .* 2)..name)..upper()

versus:

    # Gag me with a spoon!
    map(str.upper, map(operator.attrgetter('name'), map(lambda a: a*2, seq)))

The comprehension version isn't awful:

    [(a*2).name.upper() for a in seq]

but not all vectorized operations can be written as a chain of calls on a single sequence. There are still some open issues that I don't have good answers for. Consider ``x .+ y``. In Julia, I think that the compiler has enough type information to distinguish between the array plus scalar and array plus array cases, but I don't think Python will have that. So possibly there will still be some runtime information needed to make this work. The dot arguably fails the "syntax should not look like grit on Tim's monitor" test (although attribute access already fails that test). I think the double-dot syntax looks like a typo, which is unfortunate.
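For illustration, the runtime dispatch for ``x .+ y`` might look something like this rough sketch, standing in for the type information Julia's compiler has:

    def elementwise_add(x, y):
        # decide array-vs-scalar at runtime by checking iterability
        x_iter = hasattr(x, '__iter__')
        y_iter = hasattr(y, '__iter__')
        if x_iter and y_iter:
            return [a + b for a, b in zip(x, y)]  # array .+ array
        if x_iter:
            return [a + y for a in x]             # array .+ scalar
        if y_iter:
            return [x + b for b in y]             # scalar .+ array
        return x + y                              # scalar .+ scalar

which is exactly the kind of per-call check that a compiler with static type information can avoid.

-- Steve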

On Sun, Feb 3, 2019 at 10:31 AM Steven D'Aprano <steve@pearwood.info> wrote:
Agreed, so I would like to see a different spelling of it. Pike has an automap syntax that looks a lot like subscripting: numbers[*] * 2 Borrowing that syntax would pass the grit test, and it currently isn't valid syntax. ChrisA

On Sat, Feb 2, 2019 at 3:31 PM Steven D'Aprano <steve@pearwood.info> wrote:
If they are strictly parallel (no dot products) and you know when writing the code which variables hold vectors, then (denoting the vector variables by v1, ..., vn) you can always write [(expr with x1, ..., xn substituted for v1, ..., vn) for x1, ..., xn in zip(v1, ..., vn)] which seems not much worse than the auto-vectorized version (with or without special syntax). Haskell (GHC) has parallel list comprehension syntax ( https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts...) so you don't have to explicitly call zip. I wouldn't mind having that in Python but I don't know what the syntax would be.

On 1/31/2019 12:51 PM, Chris Barker via Python-ideas wrote:
To me, thinking of strings as being in lists is Python 1 thinking. Interactive applications work with *streams* of input strings.
I think an iterator (stream) of strings would be better. Here is a start.

    class StringIt:
        """Iterator of strings.

        A StringIt wraps an iterator of strings to provide methods that
        apply the corresponding string method to each string in the
        iterator.  StringIt methods do not enforce the positional-only
        restrictions of some string methods.  The join method reverses
        the order of the arguments.  Except for join(joiner), which
        returns a single string, the return values are iterators of the
        return values of the string methods.  An iterator of strings is
        returned as a StringIt so that further methods can be applied.
        """

        def __init__(self, objects, nogen=False):
            """Return a wrapped iterator of strings.

            Objects must be an iterator of strings or an iterable of
            objects with good __str__ methods.  All builtin objects have
            a good __str__ method and all non-buggy user-defined objects
            should.  When *objects* is an iterator of strings, passing
            nogen=True avoids a layer of wrapping by claiming that str
            calls are not needed.  StringIt methods that return a
            StringIt do this.  An iterable of strings, such as
            ['a', 'b', 'c'], can be turned into an iterator with
            iter(iterable).  Users who pass nogen=True do so at their
            own risk, because checking the claim would empty the iterator.
            """
            if not hasattr(objects, '__iter__'):
                raise ValueError('objects is not an iterable')
            if nogen and not hasattr(objects, '__next__'):
                raise ValueError('objects is not an iterator')
            if nogen:
                self.it = objects
            else:
                self.it = (str(ob) for ob in objects)

        def __iter__(self):
            return self.it.__iter__()

        def __next__(self):
            return self.it.__next__()

        def upper(self):
            return StringIt((s.upper() for s in self.it), nogen=True)

        def count(self, sub, start=0, end=None):
            return (s.count(sub, start, end or len(s)) for s in self.it)

        def join(self, joiner):
            return joiner.join(self.it)

    for si, out in (
            (StringIt(iter(('a', 'b', 'c')), nogen=True), ['a', 'b', 'c']),
            (StringIt((1, 2, 3)), ['1', '2', '3']),
            (StringIt((1, 2, 3)).count('1'), [1, 0, 0]),
            (StringIt(('a', 'b', 'c')).upper(), ['A', 'B', 'C']),
            ):
        assert list(si) == out
    assert StringIt(('a', 'b', 'c')).upper().join('-') == 'A-B-C'
    # asserts all pass

-- Terry Jan Reedy

I really don't get the "two different signatures" concern. The two functions do different things; why would we expect them to share a signature? There are a zillion different open() functions and methods in the standard library, and far more in third party software. They each have different signatures and functionality because they "open" different things. So what? Use the interface of the function you are using, not of something else that happens to share a name (in a different namespace). On Wed, Jan 30, 2019, 5:06 AM Jamesie Pic <jpic@yourlabs.org> wrote:

I'm just saying assembling strings is
a common programing task and that we have two different methods with the same name and inconsistent signatures
No, we don't -- one is for assembling paths, one for generic strings. And in recent versions, there is a new, totally different way to assemble paths. Also, the primary use cases are different: when I use os.path.join(), I usually have the components in variables or literals, so the *args convention is most natural. When I am assembling text with str.join(), I usually have the parts in an iterable, so that is most natural. And besides, Python (necessarily) has some inconsistencies -- we don't need to correct them all. There have been multiple changes to str.join() discussed in this thread, mostly orthogonal to each other. If anyone wants to move them forward, I suggest you be clear about which you are advocating for.

1) That there be a join() method on lists (or sequences) -- frankly, I think that's a non-starter; I wouldn't waste any more time on it.

2) That str.join() take multiple positional arguments to join (similar to os.path.join) -- this could probably be added without much disruption, so if you really want it, make your case. I personally don't think it's worth it -- it would make the API more confusing, with little gain.

3) That str.join() (or some new method/function) "stringify" (probably by calling str()) the items, so that non-strings could be joined in one call -- we've had a fair bit of discussion on this one, and given Python's strong typing and the many ways one might want to convert an arbitrary type to a string, this seems like a bad idea. Particularly bad to add to str.join(). (Or was "stringify" supposed to only do the string conversion, not the joining? If so, even more pointless.)

Any others? -CHB

--
Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython

Thank you Christopher for the heads up. Using paths as an example was really poor; it distracts readers from the actual problem, which is assembling a human-readable string. Maybe pointless, but at 30 seconds to run my test suite, see my failure, correct it, and run it again to be back where I would have been without the mistake, times 300 workdays a year, this spans 25 hours over 10 years, and I have already strategized my R&D to capitalize on Python for another 10 years. So spending less than 25 hours on this would seem profitable, despite how pointless it looks to actual programmers. Anyway, at this point the proposal could also look like str.joinmap(*args, key=str). But I don't know; I can iterate on mapjoin for a while and open a new topic when I stabilize it. Meanwhile, I'm always happy to read y'all, so feel free to keep posting :P Have a great day

On 1/28/2019 8:40 PM, Jamesie Pic wrote:
0. os.path.join takes *args
Since at least one path is required, the signature is join(path, *paths). I presume this is the Python version of the Unix convention for composing paths. The hidden argument is os.sep. It is roughly equivalent to os.sep.join((path,) + paths) (though one would not write it this way).
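The equivalence only holds for simple relative components, though; os.path.join has extra rules. On a POSIX system:

    >>> import os
    >>> os.path.join('some', 'path') == os.sep.join(('some', 'path'))
    True
    >>> os.path.join('some', '/absolute')   # join restarts at an absolute part
    '/absolute'
    >>> os.sep.join(('some', '/absolute'))
    'some//absolute'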
1. str.join takes a list argument,
The premise behind the (repeated) request is false. str.join's arguments are a string (the joiner) and an *iterable of strings*, which is an abstract subclass of the abstract concept 'iterable'. And only a small fraction of lists are lists of strings, and therefore iterables of strings. -- Terry Jan Reedy

On 2019-01-29 23:30, Terry Reedy wrote:
One of the examples given was writing:
'/'.join('some', 'path')
To me, this suggests that what the OP _really_ wants is for str.join to accept multiple arguments, much as os.path.join does. I thought that there would be a problem with that because currently the single argument is an iterable, and you wouldn't want to iterate the first argument of '/'.join('some', 'path'). However, both min and max will accept either a single argument that's iterated over or multiple arguments that are not, so there's a precedent there.
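To make the precedent concrete, a str.join that mimicked min()/max() could dispatch on the argument count; a sketch as a plain function:

    def join(sep, *args):
        # one argument: treat it as an iterable, like min(iterable);
        # several arguments: join them directly, like min(a, b, c)
        if len(args) == 1:
            return sep.join(args[0])
        return sep.join(args)

    join('/', ['some', 'path'])   # 'some/path'
    join('/', 'some', 'path')     # 'some/path'

with the same wart min() and max() have: a call with a single string argument iterates over its characters.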

On 1/29/2019 7:12 PM, MRAB wrote:
I have done things like this in private code, but it makes for messy signatures. The doc pretends that min has two signatures, given in the docstring:

    min(iterable, *[, default=obj, key=func]) -> value
    min(arg1, arg2, *args, *[, key=func]) -> value

I believe that the actual signature is the uninformative min(*args, **kwargs). The arg form, without key, is the original. If min were being written today, I don't think it would be included. -- Terry Jan Reedy


If there is a more Pythonic way of joining lists, tuples, sets, etc., it is by using a keyword and not a method. For example, using a keyword, say *joins*: '-' joins ['list', 'of', 'strings']
This is more readable than using the method join(), since you can read it as "dash joins a list of strings". Although the current method of joining lists is almost similar to this, it is somewhat "confusing" for beginners or for people who come from other languages. BTW, this is just what comes to my mind and is not supported by Python. On Tue, Jan 29, 2019 at 1:22 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:

One could always write str.join('_', ['list', 'of', 'strings']) I'm not advocating for this syntax, but perhaps it is clarifying. Also, a quick search finds this thread from 20 years ago on this very issue: https://mail.python.org/pipermail/python-dev/1999-June/095366.html On Mon, Jan 28, 2019 at 9:37 PM Ronie Martinez <ronmarti18@gmail.com> wrote:

On Tue, Jan 29, 2019, 12:22 AM Brendan Barnwell
This feels like an important asymmetry to me. There is a difference between the object itself being the wrong kind of thing and the arguments to a method being wrong. In the first case, the object (a heterogeneous list) can NEVER support a .join() method. It's simply the wrong kind of object. Of course, it's right as far as the basic type system goes, but its deeper (maybe "structural") type cannot support that method. On the other hand, sure, almost any function, including methods, will choke on bad arguments. But no string *object* rules out joining if good arguments can be found. I am sure readers will immediately reply, "what about list.sort()?" Unfortunately, that really will simply fail on lists of the wrong "type." After all these years, I still think that change in Python 2.3 or so was the wrong choice (for those with fewer gray hairs: when the hills were young, Python objects were arbitrarily comparable under inequality, even when the answer didn't "mean" anything). I actually agree that a 'cast_to_string_and_join()' function sounds useful. Of course, you can write one easily enough; it doesn't need to be a method. For that matter, I think I'd rather that str.join() were simply a function in the string module or somewhere similar, with a signature like 'join(delim, iter_of_strings)'.

On Tue, Jan 29, 2019 at 4:48 PM David Mertz <mertz@gnosis.cx> wrote:
Considering that you can provide a key function to sort(), there is by definition no list of objects which utterly cannot be sorted. That said, though, I don't think this is an overly strong argument. The main reason lists don't have a join method is that str.join() can take *any iterable*, so it's perfectly legal to join tuples or generators without needing to listify them. Consider:

    # Join the parts, ignoring empty ones
    "_".join(filter(None, parts))

    c = collections.Counter(...)
    "_".join(item for item, count in c.most_common())

    # solving Brendan's complaint of perversity
    "_".join(map(str, stuff))

If these were flipped around, you'd have to explicitly call list() on them just to get a join method. BTW, Ronie: I would disagree. Python uses syntactic elements only where functions are incapable of providing equivalent functionality. That's why print became a function in 3.0 - it didn't need to be magical syntax any more. ChrisA

Yeah, that's a good reason to use .format when you have a fixed number of arguments:

    "{}, {}, {}, {}".format(some, random, stuff, here)

And then there is map. Otherwise .join is very common on iterables, like:

    '\n'.join(make_string(object) for object in something)
    '\n'.join(map(make_string, something))
    '\n'.join(map(str, nonstr))
    '\n'.join('{}: {}'.format(x, y) for x, y in blabla)
    '\n'.join(map('[{}]'.format, stuff))

A "join format" construct is very typical in code producing strings from iterables. I agree on the part "a list doesn't always contain strings, so why would it have a join method".

Thanks for your feedback ! So, do you think anything can be done to make assembling strings less confusing / fix the inconsistency between the signatures of os.path.join and str.join ? Have a great day

Thanks for the advice Jonathan, can you clarify the documentation topic you think should be improved or created ? "Assembling strings" or "inconsistencies between os.path.join and str.join" ? I've written an article to summarize, but I don't want to publish it because my blog serves my lobbying for Python and not against it. Also, I don't feel confident about it because I never had the luck to work closely with core devs or other people with a lot more experience than me, like I can so easily find on the internet (thank you all, I love you !). So, I deliver it here under the WTFPL license.

The mistake I'm still doing after 10 years of Python

I love Python, really, but there's a mistake I've been making over and over again while assembling strings of all sorts in Python, and that I have unconsciously ignored until now. Love it or hate it, when you start with Python it's hard to be completely indifferent to:

    '\n'.join(['some', 'thing'])

But then you read the kilometers of justifications that the Python devs have written about it over the past 20 years and, well, grow indifferent to it: "that's the way it's gonna be if I want to use Python". Recently though, I started to tackle one of the dissatisfactions I have with my own code: how I assemble strings doesn't make me feel as great as the rest of what I'm doing with Python. It strikes me that assembling strings is something I have done many times a day for 10 years, so taking some time to question my own practice could prove helpful in the long run. The little story of a little obsession...

## `os.path.join(*args)` vs. `str.join(arg)`

I'm living a dream with os.path.join:

    >>> os.path.join('some', 'path')
    'some/path'

But then I decide that cross-platform is going to be too much work, so why not join with slashes directly and only support free operating systems:

    >>> '/'.join('some', 'path')
    TypeError: join() takes exactly one argument (2 given)

"Well ! I forgot about this for a minute, let's "fix" it and move on":

    >>> '/'.join(['some', 'path'])
    'some/path'

Ohhh, I'm not really sure in this case: isn't my code going to look more readable with the os.path.join notation after all ? Ten years later, I still make the same mistake, because 2 seconds before doing a str join I was doing a path join. The fix is easy because the error message is clear, so it's easy to ignore the inconsistency, fix it and move on. But what if this was an elephant in the room that was simply easy to look away from ?

## Long f-strings vs. join

The new Python format syntax with f-strings is pretty awesome; let's see how we can assemble a triple-quoted f-string:

    foo = f'''
    some
    {more(complex)}
    {st.ri("ng")}
    '''.strip()

Pretty cool right ? In a function it would look like this:

    def foo():
        return f'''
    some
    {more(complex)}
    {st.ri("ng")}
    '''.strip()

Ok, that would also work, but we're going to have to import a module from the standard library to restore visual indentation on that code:

    import textwrap

    def foo():
        return textwrap.dedent(f'''
            some
            {more(complex)}
            {st.ri("ng")}
        ''').strip()

Let's compare this to the join notation:

    def foo():
        return '\n'.join('some', more(complex), st.ri('ng'))

Needless to say, I prefer the join notation for this use case. Not only does it fit in a single line, but it doesn't require dedenting the text with an imported function, nor juggling with quotes, and it also sort of looks like it would be more performant. All in all, I prefer the join notation to assemble longer strings.
Note that in practice, I use f-strings for the "pieces" that I want to assemble, and that works great:

    def foo():
        return '\n'.join('some', more(complex), f'_{other}_')

Anyway, ok, good-enough looking code ! Let's see what you have to say:

    TypeError: join() takes exactly one argument (2 given)

Oh, that again, kk gotfix:

    def foo():
        return '\n'.join(['some', more(complex), f'_{other}_'])

I should take metrics about the number of times where I make this mistake during a day, cause it looks like it would be a lot (I switch between os.path.join and str.join a lot).

## The 20-year-old jurisprudence

So, which looks more ergonomic between these two syntaxes:

    [
        'some',
        more(complex),
        f'_{other}_',
    ].join('\n')

    '\n'.join([
        'some',
        more(complex),
        f'_{other}_',
    ])

It seems there is a lot of friction when proposing to add a convenience join method to list. I won't go over the reasons for this here; there's already a lot to read about it on the internet, written over the last 20 years.

## Conclusion

I have absolutely no idea what should be done about this; the purpose of this article was just to share a bit of one of my obsessions with string assembling. Maybe what strikes me is that I assemble strings multiple times a day, in a language I've got 10 years of full-time experience with, and I still repeat the same mistakes. Not because I don't understand the jurisprudence, not because I don't understand the documentation, or because the documentation is wrong, but probably just because I switch between os.path.join and str.join, which take different signatures, I think. Perhaps the most relevant proposal here would be to extend the str.join signature, which currently supports this notation:

    str.join(iterable)

to also support this notation:

    str.join(arg1, ...argN)

So at least people won't make mistakes when switching between os.path.join and str.join. Perhaps, something else ? Have a great day

A couple notes: On Tue, Jan 29, 2019 at 5:31 AM Jamesie Pic <jpic@yourlabs.org> wrote:
can you clarify the documentation topic you think should be improved or created ? "Assembling strings"
I would think "assembling strings", though there is a lot out there already.
or "inconsistencies between os.path.join and str.join" ?
well, if we're talking about moving forward, then the Path object is probably the "right" way to join paths anyway :-)

    a_path / "a_dir" / "a_filename"

But to the core language issue -- I started using Python with 1.5.*, and back then join() was in the string module (and is still there in 2.7). And yes, I did expect it to be a list method... Then it was added as a method of the string object. And I thought THAT was odd -- but I really appreciated that I didn't need to import a module to do something fundamental. But the fact is that joining strings is fundamentally a string operation, so it makes sense for it to be there. In early py2, I would have thought, maybe it should be a list method -- it's pretty darn common to join lists of strings, yes? But what about tuples? Python was kind of all about sequences -- so maybe all sequences could have that method -- i.e. part of the sequence ABC. But with py3k, Python is more about iterables than sequences -- and join (and many other methods and functions) operates on any iterable -- and this is a really good thing. So add join to ALL iterables? That makes little sense, and really isn't possible -- an iterable is something that conforms to the iterator protocol -- it's not a type, or even an ABC. So in the end, join really does only make sense as a string method. Or maybe as a builtin -- but we really don't need any more of those. If you want to argue that str.join() should take multiple arguments, like os.path.join does, then, well, we could do that -- it currently takes one and only one argument, so it could be extended to join multiple arguments -- but I have hardly ever seen a use case for that.

The mistake I'm still doing after 10 years of Python
hmm -- I've seen a lot of newbies struggle with this, but haven't had an issue with it for years myself.
>>> '/'.join('some', 'path')
TypeError: join() takes exactly one argument (2 given)
pathlib aside, that really isn't the right way to join paths..... os.path.join exists for (good) reasons. One of which is this:

    In [22]: os.path.join("this/", "that")
    Out[22]: 'this/that'

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

I've not been following closely, so please forgive me if I'm repeating something already said in this thread.

Summary: str.join allows us to easily avoid, when assembling strings, 1. quadratic running time, and 2. the redundant trailing comma syntax error.

The inbuilt help(str.join) gives:

    S.join(iterable) -> str

    Return a string which is the concatenation of the strings in the
    iterable. The separator between elements is S.

This is different from sum in two ways. The first is the separator S. The second is performance related. Consider

    s = 0
    for i in range(100):
        s += 1

and

    s = ''
    for i in range(100):
        s += 'a'

The first has linear running time (in the parameter represented by 100). The second has quadratic running time (unless string addition is doing something clever, like being lazy in evaluation).

The separator S is important. In Python a redundant trailing comma, like so,

    val = [0, 1, 2, 3,]

is both allowed and useful. (For example, when the entries are each on a line of their own, it reduces the noise that arises when an entry is added at the end, and when the entries are reordered.) For some languages, the redundant trailing comma is a syntax error. To serialise data for such languages, you can do this:

    >>> '[{}]'.format(', '.join(map(str, v)))
    '[0, 1, 2, 3]'

From here, by all means repackage for your own convenience in your own library, or use a third party library that already has what you want. (A widely used pypi package has, I think, a head start for adoption into the standard library.) By the way, a search for "python strtools" gives me

    https://pypi.org/project/extratools/
    https://www.chuancong.site/extratools/functions/strtools/
    https://pypi.org/project/str-tools/  # This seems to be an empty stub.
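A rough timing sketch of the two loops above (note that CPython often optimizes ``s += 'a'`` by appending in place, as mentioned later in this thread, so the quadratic behaviour may be masked):

    import timeit

    def concat(n):
        s = ''
        for i in range(n):
            s += 'a'        # may copy the whole string on each iteration
        return s

    def join(n):
        return ''.join('a' for i in range(n))

    for n in (10_000, 100_000):
        print(n,
              timeit.timeit(lambda: concat(n), number=10),
              timeit.timeit(lambda: join(n), number=10))

-- Jonathan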

On Tue, Jan 29, 2019 at 09:21:48PM +0000, Jonathan Fine wrote:
The lack of a syntax error for trailing commas is a language-wide feature that has nothing to do with str.join.
Three ways. sum() intentionally doesn't support strings at all:

    py> sum(['a', 'b', 'c'], '')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: sum() can't sum strings [use ''.join(seq) instead]

unless you cleverly defeat this intentional limitation. (How to do this is left as an exercise for the reader.)
In CPython, string addition does often do something clever. But not by being lazy -- it optimizes string concatenation by appending to the string in place, if and only if it is safe to do so. -- Steve

Thank you Jonathan, performance is one of the various reasons I prefer join for assembling strings over, say, triple-quoted dedented f-strings or concatenation. It also plays well syntactically, even though there is still a little room for improvement. For example, in PHP implode('-', array(2, 'a')) returns '2-a', and now that I think of it, it's the only thing I regret from PHP's stdlib... And assembling a string like that really looks like a common problem programmers face every day. The chuancong.site design for the extratools documentation is really beautiful ! I found the smartsplit function but no smartjoin. On my side I have requested comments on a PR in the boltons repo already; let's see if they find a refutation before I propose a smartjoin implementation to extratools. https://github.com/mahmoud/boltons/pull/197 Would you recommend releasing it on its own ? I.e. from implode import implode ? Thanks

On Tue, Jan 29, 2019 at 9:50 PM Chris Barker via Python-ideas <python-ideas@python.org> wrote:
I would think "assembling strings", though there is a lot out there already.
Which one do you prefer ?
So in the end, join really does only make sense as string method.
What do you think of list.stringify(delim) ? Thanks for your reply. I reckon using paths does make the article more confusing; it was meant as an example to illustrate the kind of problems a programmer who cares about user experience runs into. It makes the article look like the point was to build cross-platform paths, and distracts the reader from the whole purpose of assembling a string with code. Have a great day ;)

On Tue, Jan 29, 2019 at 10:51:26PM +0100, Jamesie Pic wrote:
What do you think of list.stringify(delim) ?
What's so special about lists? What do you think of:

    tuple.stringify
    deque.stringify
    iterator.stringify
    dict.keys.stringify

etc. And what's so special about strings, that lists have to support a stringify method and not every other type?

    list.floatify
    list.intify
    list.tuplify
    list.setify
    list.iteratorify

Programming languages should be about composable, re-usable general purpose components more than special cases.

-- Steve

1) I'm in favor of adding a stringify method to all collections. 2) Strings are special and worthy of a "special case" because strings tend to be human readable and are used in all kinds of user interfaces. -------- Original Message -------- On Jan 29, 2019, 16:04, Steven D'Aprano wrote:

"Not every five line function needs to be in the standard library" ... even more true for every one line function. I can think of a few dozen variations of similar but not quite identical behavior to my little stringify() that "could be useful." Python gives us easy composition to create each of them. It's not PHP, after all. On Tue, Jan 29, 2019 at 8:52 PM Alex Shafer <ashafer@pm.me> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Frankly this sounds like resistance to adaptation and evolution. How long ago was that adage written? Or perhaps this is a pathological instance of the snowball fallacy? Adding one widely requested feature does not imply that all requested features will be added. -------- Original Message -------- On Jan 29, 2019, 18:57, David Mertz wrote:

Of course not! The request was for something that worked on Python *collections*. If the OP wanted something that worked on iterables in general, we'd need a different function with different behavior. Of course, it also doesn't work on dictionaries. I don't really have any ideas what the desired behavior might be for dicts. Various things are conceivable, none obvious. But it's fine on lists, sets, tuples, deques, and some other things that are roughly sequence-like. On Tue, Jan 29, 2019, 10:38 PM Robert Vanden Eynde <robertve92@gmail.com wrote:

The point really is that something called 'stringify()' could do a lot of different reasonable and useful things. None of them are universally what users would want. Unless you give the function scads of optional keyword arguments, its behavior would surprise many users and not fit their purpose. On Tue, Jan 29, 2019, 10:46 PM David Mertz <mertz@gnosis.cx wrote:

I love it when the discussion goes fast like here! :D The messages are short or long-structured-and-explaining, I love it :) -- Sorry if I may look like a troll sometimes, I truly like the conversation and I want to share the excitement :)

On Wed, Jan 30, 2019 at 2:45 AM David Mertz <mertz@gnosis.cx> wrote:
Done! Does that really need to be in the STDLIB?
Well, Robert suggested defining it in the Python startup script. The issue I'm having with that is that it will make my software harder to distribute: it will require the user to hack their startup script, or even worse, do it ourselves in setup.py ! Jonathan suggested adding it to an external package like strtools, which has a smartsplit() function but no smartjoin(). So far I have a PR in boltons; I've requested their comments, so I'll let you know if they have a refutation to provide. Otherwise, I will try to submit it to the strtools package. Or I can make a custom package for that one-liner, like it's fairly common to do with NPM packages. Do you have any suggestions on the API ? I see that the implode name is available on PyPI; do you think it would be nice to import the one-liner like this ? from implode import implode Thanks for your reply -- ∞

To be fair, we could add an implementation to the sequence ABC and get pretty far. Not that I'm suggesting that -- as I said earlier, Python is all about iterables, not sequences, anyway. Also, despite some folks' insistence that this "stringify" method is something many folks want, I'm not even sure what it is. I was thinking it was:

    def stringify(self, sep):
        return sep.join(str(i) for i in self)

which, by the way, would work for any iterable :-) If you want a language designed specifically for text processing, use Perl. Python is deliberately strongly typed, so that:

    2 + "2"

raises an error. Why should:

    "".join([2, "2"])

not raise an error as well? And aside from repr or ascii, when I turn numbers into text, I usually want to control the formatting anyway:

    " ".join(f"{n:.2f}" for n in seq)

So having str() called automatically for a join wouldn't be that useful. -CHB

def stringify(self, sep): return sep.join(str(i) for i in self)
= sep.join(map(str, self))

However some folks want:

    def stringify(*args, *, sep:str=SomeDefault):
        return sep.join(map(str, args))

in order to have:
>>> stringify(1, 2, "3", sep="-")
'1-2-3'
And I agree about the formatting; we know that str(x) and format(x) are synonyms, so I'd suggest:

    def stringify(*args, *, sep:str=SomeDefault, fmt=''):
        return sep.join(format(x, fmt) for x in args)

And the implicit call to str is really not surprising for a function called stringify IMO.
True ! However typing python -cp "1+1" is really tempting...
Python is deliberately strongly typed, so that:
I agree

def stringify(*args, *, sep:str=SomeDefault):
I meant:

    def stringify(*args, sep:str=SomeDefault)

So an idea would be to use duck typing to find out if we have one iterable or multiple things:

    def stringify(*args, sep:str=SomeDefault, fmt=''):
        it = args[0] if len(args) == 1 and hasattr(args[0], '__iter__') else args
        return sep.join(format(x, fmt) for x in it)

But 🦆 duck typing is nasty... I don't want that in the stdlib (but in a pip package, sure!)

On Wed, Jan 30, 2019 at 7:14 AM Robert Vanden Eynde <robertve92@gmail.com> wrote:
But 🦆 duck typing is nasty... I don't want that in the stdlib (but in a pip package, sure!)
Not only do I actually like your implementation, but I also love duck typing. For me duck typing means freedom, not a barrier. -- ∞

On Wed, Jan 30, 2019 at 7:03 AM Robert Vanden Eynde <robertve92@gmail.com> wrote:
What do you think could be the developer's intent when they do ",".join([2, "2"]) ? If the intent is clearly to assemble a string, as it appears to be, then I don't see any disadvantage in automating this task for them. -- ∞

On 1/30/2019 5:07 AM, Jamesie Pic wrote:
Your examples show literals, but I literally (heh) never use str.join this way. I always pass it some variable. And 100% of the time, if that variable (say it's a list) contains something that's not a string, I want it to raise an exception. I do not want this to succeed:

    lst = ['hello', None]
    ', '.join(lst)

lst is usually computed a long way from where the join happens. So, I do not want this task automated for me. Eric

On Wed, Jan 30, 2019 at 11:24 AM Eric V. Smith <eric@trueblade.com> wrote:
That's a really good point ! So, maybe we have a parameter for that...

    from implode import implode
    assert implode('-', [3, None, 2], none_str='') == '3-2'

Even that still seems pretty fuzzy to me. Please, can you share an idea for improvement ?
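A sketch of what those semantics might look like (none_str and its exact behaviour are hypothetical, not an existing implode API):

    def implode(sep, items, none_str='None'):
        # hypothetical: None becomes none_str; empty results are dropped,
        # so none_str='' removes None items entirely
        parts = (none_str if item is None else str(item) for item in items)
        return sep.join(part for part in parts if part != '')

    assert implode('-', [3, None, 2], none_str='') == '3-2'

-- ∞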

On Wed, Jan 30, 2019 at 11:07:52AM +0100, Jamesie Pic wrote:
What do you think could be the developer's intent when they do ",".join([2, "2"]) ?
I don't know what your intent was, although I can guess, but I do know that I sure as hell don't want a dumb piece of software like the interpreter running code that I didn't write because it tried to guess what I possibly may have meant. http://www.catb.org/jargon/html/D/DWIM.html And from the Zen:

    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.

Don't think about toy examples where you put literals in the code. Sure, we want a string, but that's trivial. What sort of string, and what should it look like? Think about non-trivial code like this:

    header = generate_header()
    body = template.format(','.join(strings))
    document = make(header, body)

and imagine that somehow a non-string slips into something which is supposed to be a string. Now what do you think my intent is? It isn't enough to just say "I want a string dammit, and I don't care what's in it!". If a non-string slips in there, I sure as hell want to know how and why, because that's a bug, not a minor inconvenience. The most junior developer in the team could easily paper over the bug by adding in a call to map(str, strings), but that doesn't fix the bug, it just hides it, and all but guarantees the document generated is corrupt, or worse, wrong.

    "I find it amusing when novice programmers believe their main job is
    preventing programs from crashing. ... More experienced programmers
    realize that correct code is great, code that crashes could use
    improvement, but incorrect code that doesn't crash is a horrible
    nightmare." -- Chris Smith

If we look at where the strings come from:

    strings = [format_record(obj) for obj in datasource if condition(obj)]

we're now two steps away from the simplistic "we want a string" guess of your example. When we look at format_record and find this:

    def format_record(record):
        if record.count < 2:
            ...
        elif record.type in ('spam', 'eggs'):
            ...
        elif record.next() is None:
            ...
        # and so on for half a page

we're just getting further and further away from the trivial cases of "just give me a string dammit!". Going back to your example (correcting the syntax error):

    ",".join([2, "2"])

To save you about a quarter of a second by avoiding having to type quote marks around the first item, you would cost me potentially hours or days of hair-tearing debugging, trying to work out why the document I'm generating is occasionally invalid or corrupt in hard to find ways. That's not a trade-off I have any interest in making.

-- Steve

Wow, thanks for your great reply Steven ! It really helps me get a better understanding of what I'm trying to do and move forward in my research ! Some values are not going to be nice as strings, so I think I'm going to try to make a convenience shortcut for str-map-join instead, for when I want to generate a human-readable string. I.e.: mapjoin(*args, sep='\n', key=str). Then I could replace:

    readable = '\n'.join(map(str, [
        'hello',
        f'__{name}__',
        etc...
    ]))

or:

    def foo():
        readable = textwrap.dedent(f'''
            hello
            __{name}__
        ''').strip()

with:

    readable = mapjoin(
        'hello',
        f'__{name}__'
        sep='\n',
        # key=format_record could be used
    )

That removes the "fuzzy" feeling I get from my previous proposals. So, after a while, if people are using that mapjoin that we could have on PyPI, we could perhaps propose it as an improvement to str.join. Or, do you think adding such features to str.join is still discussable ?

Oops, fixing my last example (a comma was missing):

    readable = mapjoin(
        'hello',
        f'__{name}__',
        sep='\n',
        # key=format_record could be used here
    )

Signature would be like (illustrating defaults):

    mapjoin(*args, sep='\n', key=str)
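The whole helper is a one-liner under that signature; a sketch:

    def mapjoin(*args, sep='\n', key=str):
        # convert each argument with key(), then join with sep
        return sep.join(key(arg) for arg in args)

    assert mapjoin('hello', 42, sep='-') == 'hello-42'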

The intent is not clear. How is the 2 to be formatted? I fixed a nasty bug recently where a join of a list of strings contained a non-string in some cases. If str(bad_value) had been the default, I would not have been able to track it down from the traceback in a few minutes. I'm -1 on this idea, as it would hide bugs in my experience. Barry

Thanks for your email Barry. This is indeed a good point and the proposal has changed a bit since then. It's more "add a key kwarg to str.join where you can set key=str yourself if you want".

Let's see if this gets any downloads at all: https://pypi.org/project/mapjoin/ Sorry for this obscenity xD Thank you all for your replies ! Have a great day Best regards

On Wed, Jan 30, 2019 at 12:09:55AM +0000, Alex Shafer wrote:
2) strings are special and worthy of a "special case" because strings tend to be human readable and are used in all kinds of user interface.
So are ints, floats, bools, lists, tuples, sets, dicts, etc. We already have a "stringify" function that applies to one object at a time. It's spelled str(), or if you prefer a slightly different format, repr(). To apply the stringify function of your choice to more than one object, you can use a for-loop, or a list comprehension, or a set comprehension, or map(). This is called composition of re-usable components, and it is a Good Thing. If you don't like the two built-in stringify functions, you can write your own, and they still work with for-loops, comprehensions and map(). Best of all, we're not even limited to strings. Change your mind and want floats instead of strings? Because these are re-usable, composable components, you don't have to wait for Python 4.3 to get a list floatify() method, you can just unplug the str() component and replace it with the float() component. -- Steve

On Wed, Jan 30, 2019 at 9:21 AM Steven D'Aprano <steve@pearwood.info> wrote:
If you don't like the two built-in stringify functions, you can write your own, and they still work with for-loops, comprehensions and map().
I don't disagree; after all, there are many NPM packages that contain really short functions, so we could package the function on its own. I see that the "implode" namespace is not taken on PyPI, so what do you suggest it would look like ? from implode import implode ? Or can you suggest better names ?
Best of all, we're not even limited to strings. Change your mind and want floats instead of strings?
To be user friendly, software needs to build proper text output, and most of the time joining a sequence is the best way to go. But I often make mistakes because I switch between os.path.join and str.join. -- ∞

On Wed, Jan 30, 2019 at 11:17 AM Jamesie Pic <jpic@yourlabs.org> wrote:
make mistakes because I switch between os.path.join and str.join.
I didn't mean "replacing an os.path.join call with an str.join call", I mean that I'm calling str.join 2 seconds after os.path.join, and forgot about the inconsistency we have between the two. Does this make any sense? Thanks for your great replies -- ∞

Thanks Steven for your reply. For me, assembling a string from various variables is a much more common programming task, because that's how users expect software to communicate with them, be it on a CLI, in a GUI, or through the Web. It doesn't matter if your software works and the user doesn't understand it. It doesn't matter if your software doesn't work, as long as the user understands it. I wonder what makes my use case so special; perhaps it's because when I make software, it's always for the purpose of serving an actual human being's need ?

On Wed, Jan 30, 2019 at 8:50 PM Jamesie Pic <jpic@yourlabs.org> wrote:
Most places where you need to talk to humans, you'll end up either interpolating the values into a template of some sort (see: percent formatting, the format method, and f-strings), or plug individual values straight into method calls (eg when building a GUI). I'm not sure why or how your use-case is somehow different here. It's generally best to provide simple low-level functionality, and then let people build it into whatever they like. For example, VLC Media Player and Counter-Strike: Global Offensive don't have any means of interacting, but with some simple Python programming in between, it's possible to arrange it so that the music automatically pauses while you're in a match. But there does NOT need to be a game feature "automatically pause VLC while in a match". Joining a collection of strings is possible. Stringifying a collection of arbitrary objects is possible. There doesn't need to be a single feature that does both at once. ChrisA

On Wed, Jan 30, 2019 at 11:06 AM Chris Angelico <rosuav@gmail.com> wrote:
The new Python format syntax with f-strings is pretty awesome; let's see how we can assemble a triple-quoted f-string:

    foo = f'''
    some
    {more(complex)}
    {st.ri("ng")}
    '''.strip()

Pretty cool, right ? In a function it would look like this:

    def foo():
        return f'''
        some
        {more(complex)}
        {st.ri("ng")}
        '''.strip()

Ok, so that would also work, but we're going to have to import a module from the standard library to restore visual indentation on that code:

    import textwrap

    def foo():
        return textwrap.dedent(f'''
        some
        {more(complex)}
        {st.ri("ng")}
        ''').strip()

Let's compare this to the join notation:

    def foo():
        return '\n'.join('some', more(complex), st.ri('ng'))

Needless to say, I prefer the join notation for this use case. Not only does it fit on a single line, it doesn't require dedenting the text with an imported function, nor does it require juggling with quotes, and it also sort of looks like it would be more performant. All in all, I prefer the join notation for assembling longer strings. Note that in practice I use f-strings for the "pieces" that I want to assemble, and that works great:

    def foo():
        return '\n'.join('some', more(complex), f'_{other}_')

Anyway, ok, good-enough looking code ! Let's see what you have to say:

    TypeError: join() takes exactly one argument (2 given)

Oh, that again, kk gotfix:

    def foo():
        return '\n'.join(['some', more(complex), f'_{other}_'])

I should take metrics about the number of times where I make this mistake during a day, because it looks like it would be a lot (I switch between os.path.join and str.join a lot). It seems there is a lot of friction when proposing to add a convenience join method to the list type. I won't go over the reasons for this here; there's already a lot to read about it on the internet, written over the last 20 years.

## Conclusion

I have absolutely no idea what should be done about this; the purpose of this article was just to share a bit of one of my obsessions with string assembly. Maybe it strikes me that I assemble strings multiple times a day in a language I've got 10 years of full-time experience with and still repeat the same mistake, because I coded an os.path.join call 3 seconds before assembling a string with str.join, silly me ^^ Not because I don't understand the jurisprudence, not because I don't understand the documentation, or because the documentation is wrong, but probably just because I switch between os.path.join and str.join, which take different signatures, I think.
Even for a program without a user interface: you still want proper logs in case your software crashes, for example. So even if you're not building a user interface, you still want to assemble human-readable strings. If it's such a common task, why not automate what's obvious to automate ? -- ∞

On Wed, Jan 30, 2019 at 11:06 AM Chris Angelico <rosuav@gmail.com> wrote:
Actually we're moving away from templates, in favor of a functional, decorating, component-based pattern pretty much like React, in some R&D open source project. Not only do we get much better performance than with a template rendering engine, but we also get all the power of a good programming language: Python :) -- ∞

On Wed, Jan 30, 2019 at 10:33 PM Jamesie Pic <jpic@yourlabs.org> wrote:
Well, I've no idea how your component-based system works, but in React itself, under the covers, the values end up going straight into function calls, which was the other common suggestion I gave :) There's a reason that those two styles, rather than join() itself, will tend to handle most situations. ChrisA

On 2019-01-29 16:14, MRAB wrote:
Then you can still convert them yourself beforehand, and any stringifying that .join did would be a no-op. If you want to call repr on all your stuff beforehand, great, then you'll get strings and you can join them just like anything else. But you'll ADDITIONALLY be able to not pre-stringify them in a custom way, in which case they'll be stringified in the default way. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On 2019-01-29 15:38, Greg Ewing wrote:
Oh please. Because it also RETURNS a string. Of course count won't return a string, it returns a count. But str.join is for "I want to join these items into a single string separated by this delimiter". If the output is to be a string obtained by combining other items, there is nothing lost by converting them to strings. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

So you'd propose to add some kind of:

    def Join(sep, *args):
        return sep.join(map(str, args))

to the standard lib ? Or to add another method to the str class that does that ?

    class str:
        ...
        def Join(self, *args):
            return self.join(map(str, args))

I agree such a function is super convenient, but does it need to be added to the standard lib ? I have it in my custom utils.py and my PYTHONSTARTUP file so that I can use it everywhere. Call it Join, superjoin, joinargs... On Tue, 29 Jan 2019, 02:43 Jamesie Pic <jpic@yourlabs.org wrote:

Oh and if you want to write:

    ['a', 'b', 'c'].join('.')

Check out pip install funcoperators, and you can write:

    ['a', 'b', 'c'] |join('.')

Given you defined the function below:

    from funcoperators import postfix

    def join(sep):
        return postfix(lambda it: sep.join(map(str, it)))

You can even choose the operator:

    ['a', 'b', 'c'] -join('.')
    ['a', 'b', 'c'] /join('.')
    ['a', 'b', 'c'] @join('.')
    ...

Disclaimer : I'm the creator of funcoperators On Tue, 29 Jan 2019, 02:43 Jamesie Pic <jpic@yourlabs.org wrote:

funcoperators is pretty neat ! But at this stage of the discussion I would also try to get automatic string casting, since the purpose is to assemble a string. It would be great in the stdlib because switching between os.path.join and str.join is so error-prone, and assembling strings seems like a pretty common task. It's not uncommon to find str.join in arguments against Python. Monkey-patching str in PYTHONSTARTUP would work, but then that would require users pulling my package to also hack their startup script. Or even worse: we could patch the startup script upon package installation. It seems like it would make redistribution a lot harder than it should be. Another approach would be to add a stringify(delim='\n') method to iterables: it would accept a delimiter argument and would return a string of all items cast to string and separated by the delimiter. That would certainly be more backward-compatible than supporting an alternate str.join(1, 'b') call. Meanwhile I've opened a PR on boltons, but, well, it looks a lot like php.net/implode, and I'm not really sure we want that :D https://github.com/mahmoud/boltons/pull/197/commits/2b4059855ab4ceae54032bff... -- ∞
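A sketch of that stringify idea as a free function (as a method on built-in iterables it would need interpreter support, so this is only an approximation):

    def stringify(iterable, delim='\n'):
        # cast every item to str, then join with the delimiter
        return delim.join(str(item) for item in iterable)

    stringify(['a', 2, None], delim=', ')   # -> 'a, 2, None'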

On 29/01/2019 01:40, Jamesie Pic wrote:
It seems fairly consistent to make:

    os.path.join('a', 'b', 'c')

short for:

    os.path.sep.join(['a', 'b', 'c'])
Please, no. This would be un-Pythonic in my view. It makes so much more sense that str should have a method that takes an iterable, returning str, than that every iterable should have a join(str) returning str. Consider you get this kind of thing for free:

    "-".join(str(i) for i in range(10))

I learned enough Groovy last year to use Gradle and was so disappointed to find myself having to write:

    excludes: exclusions.join(',')  // Yes, it's that way round :o

Even Java agrees (since 1.8) with Python. Jeff Allen

I'm not disagreeing by any means. I'm just saying that assembling strings is a common programming task, that we have two different methods with the same name and inconsistent signatures, and that it's error-prone. I'm most certainly *not* advocating for breaking compatibility or whatnot.

Hi, in the end this long thread exists because 2 functions doing quite the same thing have the same name but not the same signature, and it's confusing for some people (I'm one of those):

    str.join(iterable)
    os.path.join(path, *paths)

There are strong arguments about why it's implemented like that and why it's very difficult to change. Maybe one change could be letting str.join take 1 iterable or many args. About str.join:

a - if 0 args: error
b - if 1 arg: process, or return an error if it's not iterable
c - if > 1 arg: do (b) using all args as one iterable

Maybe some performance issues could count against it. I agree that this is a minor need and it should not drive a major change. On 30/01/2019 at 11:01, Jamesie Pic wrote:

Thanks for your reply Jimmy ! As suggested by Chris and Steven, we might also want to throw in a "key" kwarg, which could be None by default to keep backward compatibility, but would also allow typecasting:

    ' '.join('a', 2, key=str)

-- ∞
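A sketch of that behavior as a plain function, since str.join itself accepts neither *args nor a key kwarg today:

    def join(sep, *args, key=None):
        # key=None keeps today's strict behavior; key=str opts into casting
        if key is not None:
            args = [key(a) for a in args]
        return sep.join(args)

    join(' ', 'a', 2, key=str)   # -> 'a 2'
    join(' ', 'a', 2)            # TypeError, as with str.join today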

On Wed, Jan 30, 2019 at 10:14 PM Chris Angelico <rosuav@gmail.com> wrote:
I didn't, but I don't know if Chris Barker did.
nope -- not me either :-)
(Can't swing a cat without hitting someone named Steve or Chris, in some spelling or another!)
good thing there aren't a lot of cats being swung around, then.

One more note about this whole thread: I do a lot of numerical programming, and used to use MATLAB and now numpy a lot. So I am very used to "vectorization" -- i.e. having operations that work on a whole collection of items at once. Example:

    a_numpy_array * 5

multiplies every item in the array by 5. In pure Python, you would do something like:

    [i * 5 for i in a_regular_list]

You can imagine that for more complex expressions the "vectorized" approach can make for much clearer and easier-to-parse code. Also much faster, which is what is usually talked about, but I think the readability is the bigger deal.

So what does this have to do with the topic at hand? I know that when I'm used to working with numpy and then need to do some string processing or some such, I find myself missing this "vectorization" -- if I want to do the same operation on a whole bunch of strings, why do I need to write a loop or comprehension or map? That is:

    [s.lower() for s in a_list_of_strings]

rather than:

    a_list_of_strings.lower()

(NOTE: I prefer comprehension syntax to map, but map would work fine here, too.)

It strikes me that that is the direction some folks want to go. If so, then I think the way to do it is not to add a bunch of stuff to Python's str or sequence types, but rather to make a new library that provides quick and easy manipulation of sequences of strings -- kind of a "stringpy", analogous to numpy.

At the core of numpy is the ndarray: a "multidimensional, homogeneous array of fixed-size items". A strarray could be simpler -- I don't see any reason for more than 1-D, nor more than one datatype. But it could be a "vector" of strings that was guaranteed to be all strings, and provide operations that acted on the entire collection in one fell swoop. If it turned out to be useful, you could even make a version in C or Cython that might give significant performance benefits.

I don't have a use case for this -- but if someone does, it's an idea.

-CHB

Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
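A minimal sketch of what such a strarray might look like (the name and the single method shown are hypothetical):

    class StrArray(list):
        """A homogeneous 1-D collection of strings with vectorized methods."""
        def __init__(self, items):
            items = list(items)
            if not all(isinstance(s, str) for s in items):
                raise TypeError("StrArray may only contain strings")
            super().__init__(items)

        def lower(self):
            # acts on the entire collection in one fell swoop
            return StrArray(s.lower() for s in self)

    StrArray(["Jan", "Feb"]).lower()   # -> ['jan', 'feb']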

On Thu, Jan 31, 2019 at 12:52 PM Chris Barker via Python-ideas < python-ideas@python.org> wrote:
Isn't what you want called "Pandas"? E.g.:
>>> type(strs)
<class 'pandas.core.series.Series'>
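The elided example above was presumably along the lines of pandas' vectorized string methods, which hang off the .str accessor:

    import pandas as pd
    strs = pd.Series(['Jan', 'Feb', 'Mar'])
    strs.str.lower()   # a Series containing ['jan', 'feb', 'mar']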
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Fri, Feb 1, 2019 at 4:51 AM Chris Barker <chris.barker@noaa.gov> wrote:
Here's a simpler and more general approach: a "vector" type. Any time you attempt to look up any attribute, it returns a vector of that attribute for each of its elements. When you call a vector, it calls each element (with the same args) and returns a vector of the results. So the vector would, in effect, have a .lower() method that returns .lower() of all its elements. (David, your mail came in as I was typing mine, so it looks fairly similar, except that this proposed vector type wouldn't require you to put ".str" in the middle of it, so it would work with any type.) ChrisA
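A minimal sketch of that vector type (one possible reading: attribute lookups and calls both fan out to the elements; the class name is from the thread, the body is illustrative):

    class Vector:
        def __init__(self, items):
            self._items = list(items)
        def __getattr__(self, name):
            # look up the attribute on each element, not on the vector
            return Vector(getattr(item, name) for item in self._items)
        def __call__(self, *args, **kwargs):
            # calling a vector calls each element with the same args
            return Vector(item(*args, **kwargs) for item in self._items)
        def __repr__(self):
            return f"<Vector of {self._items!r}>"

    Vector(['Foo', 'Bar']).lower()   # -> <Vector of ['foo', 'bar']>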

On Thu, Jan 31, 2019 at 09:51:20AM -0800, Chris Barker via Python-ideas wrote:
Julia has special "dot" vectorize operator that looks like this: L .+ 1 # adds 1 to each item in L func.(L) # calls f on each item in L https://julialang.org/blog/2017/01/moredots The beauty of this is that you can apply it to any function or operator and the compiler will automatically vectorize it. The function doesn't have to be written to specifically support vectorization.
Using Julia syntax, that might become:

    a_list_of_strings..lower()

If you don't like the double dot, perhaps:

    str.lower.(a_list_of_strings)

would be less ugly. -- Steven

I accidentally replied only to Steven - sorry! - this is what I said, with a typo corrected:
a_list_of_strings..lower()
str.lower.(a_list_of_strings)
I much prefer this solution to any of the other things discussed so far. I wonder, though, would it be general enough to simply have this new '.' operator interact with __iter__, or would there have to be new magic methods like __veccall__, __vecgetattr__, etc? Would a single __vectorize__ magic method be enough? For example, I would expect (1, 2, 3) .** 2 to evaluate as a tuple, [1, 2, 3] .** 2 to evaluate as a list, and some_generator() .** 2 to still be a generator. If there were a __vectorize__(self, func) which returned the iterable result of applying func to each element of self:

    class list:
        def __vectorize__(self, func):
            return [func(e) for e in self]

then:

    some_list .* other    becomes  some_list.__vectorize__(lambda e: e * other)
    some_string..lower()  becomes  some_string.__vectorize__(str.lower)
    some_list..attr       becomes  some_list.__vectorize__(operator.attrgetter('attr'))

Perhaps there would be a better name for such a magic method, but I believe it would allow existing sequences to behave as one might expect, without requiring each operator to have its own definition (a runnable sketch follows below). I might also be over-complicating this, but I'm not sure how else to allow different sequences to give results of their same type. On Thu, Jan 31, 2019 at 6:24 PM Steven D'Aprano <steve@pearwood.info> wrote:
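A runnable version of the __vectorize__ sketch above, with a helper function standing in for the proposed dot syntax (all names are hypothetical):

    class VecList(list):
        def __vectorize__(self, func):
            # the list version returns a list, per the sketch above
            return VecList(func(e) for e in self)

    def vec(seq, func):
        # stand-in for the proposed ".* / ..method / ..attr" spellings
        return seq.__vectorize__(func)

    vec(VecList([1, 2, 3]), lambda e: e ** 2)   # [1, 4, 9]
    vec(VecList(['a', 'B']), str.lower)         # ['a', 'b']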

I love moredots ❤️ With pip install funcoperators, one can implement *dotmul*, if dotmul can be implemented as a function:

    L *dotmul* 1

would work. Or even a simple tweak to the library would allow:

    L *dot* s   # to be [x*s for x in L]
    L /dot/ s   # to be [x/s for x in L]

I'd implement something like "if left is iterable and right is not, apply [x*y for x in left]; else if both are iterable, apply [x*y for x, y in zip(left, right)]; etc." Disclaimer : I'm the creator of funcoperators On Fri, 1 Feb 2019, 00:23 Steven D'Aprano <steve@pearwood.info wrote:
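The scalar/iterable dispatch described there might look like this as a plain function (the funcoperators infix wrapping is left out):

    def dotmul(left, right):
        try:
            it = iter(right)
        except TypeError:
            return [x * right for x in left]      # right is a scalar
        return [x * y for x, y in zip(left, it)]  # both sides are iterable

    dotmul([1, 2, 3], 10)    # [10, 20, 30]
    dotmul([1, 2], [3, 4])   # [3, 8]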

On Fri, Feb 1, 2019 at 02:24, Steven D'Aprano <steve@pearwood.info> wrote:
IMO, the beauty of a vector type is that it contains homogeneous data. Therefore, it allows you to ensure that the method is present for each element in the vector. The first given example is what numpy is all about, and without some guarantee that L consists of homogeneous data it hardly makes sense. The second one is just `map`. So I can't catch what you are proposing: 1. To make an operator form of `map`. 2. To pull numpy into the stdlib. 3. Or something else, which is not obvious to me from the examples given. With kind regards, -gdg

I think the actual proposal is having a new type of list (i.e. vectors) that works like numpy but for any data. Instead of a list where the user has to be sure all the data is the same type, vectors make sure for them that it's full of the same kind of data, which can be processed using a particular function (as they would do with map). I think the syntax proposed is not cool; it's kind of unique in Python and doesn't feel pythonic to me. A thing I thought about, but am not satisfied with, is using the new matrix-multiplication operator:

    my_string_vector @ str.lower

    def compute_grad(a_student):
        return "you bad"

    my_student_vector @ compute_grad

But it's a bit confusing to me. On Fri, Feb 1, 2019 at 5:04 PM, Kirill Balunov <kirillbalunov@gmail.com> wrote:
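That reading of @ is easy to prototype on a list subclass (a sketch; the Vec name is illustrative):

    class Vec(list):
        def __matmul__(self, func):
            # v @ func maps func over the elements
            return Vec(map(func, self))

    my_string_vector = Vec(['Jan', 'Feb'])
    my_string_vector @ str.lower   # ['jan', 'feb']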

On Fri, Feb 1, 2019, 6:16 PM Adrien Ricocotam <ricocotam@gmail.com wrote:
This is certainly doable. But why would it be better than:

    map(str.lower, my_string_vector)
    map(compute_grad, my_student_vector)

These latter seem obvious, clear, and familiar.

On Fri, Feb 1, 2019 at 5:00 PM David Mertz <mertz@gnosis.cx> wrote:
or [s.lower() for s in my_string_vector] Side note: It's really interesting to me that Python introduced comprehension syntax some years ago, and even "hid" reduce(), and now there seems to be a big interest in / revival of "map". Even numpy supports inhomogeneous data:
well, no -- it doesn't -- look carefully: that is an array of type '!S4' -- i.e. a 4-element-long string -- every element in that array is that same type. Also note that numpy's support for strings is not very complete. numpy does support an "object" type, which can be inhomogeneous -- it's still a single type, but that type is a Python object (under the hood it's an array of pointers to PyObjects):

    In [3]: a = np.array([1, 'spam'], dtype=np.object)
    In [4]: a
    Out[4]: array([1, 'spam'], dtype=object)

And it does support vectorization to some extent:

    In [5]: a * 5
    Out[5]: array([5, 'spamspamspamspamspam'], dtype=object)

But not with any performance benefits. I think there are good reasons to have a "string_vector" that is known to be homogeneous:

Performance -- it could be significantly optimized (are there many use cases for that? I don't know).

Clear API: a string_vector would have all the relevant string methods. You could easily write a list subclass that passed on method calls to the enclosed objects, but then you'd have a fair bit of confusion as to what might be a vector method vs a method on the objects. Which I suppose leaves us with something like:

    list.elements.upper()
    list.elements * 5

hmm -- not sure how much I like this, but it's pretty doable.

I still haven't seen any examples that aren't already spelled 'map(fun, it)'

and I don't think you will -- I *think* I get credit for starting this part of the thread, and I started by saying I have often longed for essentially a more concise way to spell map() or comprehensions. Performance aside, I use numpy because:

    c = np.sqrt(a**2 + b**2)

is a heck of a lot easier to read, write, and get correct than:

    c = list(map(math.sqrt,
                 map(lambda x, y: x + y,
                     map(lambda x: x**2, a),
                     map(lambda x: x**2, b))))

or:

    [math.sqrt(x) for x in (a + b for a, b in zip((x**2 for x in a),
                                                  (x**2 for x in b)))]

Note: it took me quite a while to get those right! (And I know I could have used the operator module to get the map version maybe a bit cleaner, but the point stands.) Does this apply to string processing? I'm not sure, though I do a fair bit of chaining of string operations:

    my_string.strip().lower().title()

If you wanted to do that to a list of strings:

    a_list_of_strings.strip().lower().title()

is a lot nicer than:

    [s.title() for s in (s.lower() for s in [s.strip() for s in a_list_of_strings])]

or:

    list(map(str.title, map(str.lower, map(str.strip, a_list_of_strings))))  # untested

How common is that use case? Not common enough for me to go any further with this.

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sat, Feb 2, 2019 at 3:23 PM Christopher Barker <pythonchb@gmail.com> wrote:
You can also write:

    c = [math.sqrt(x**2 + y**2) for x, y in zip(a, b)]

or:

    c = list(map(lambda x, y: math.sqrt(x**2 + y**2), a, b))

or, since math.hypot exists:

    c = list(map(math.hypot, a, b))

In recent Python versions you can write [*map(...)] instead of list(map(...)), which I find more readable.

a_list_of_strings.strip().lower().title()
In this case you can write:

    [s.strip().lower().title() for s in a_list_of_strings]

-- Ben

On Sun, Feb 3, 2019 at 10:36 AM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:
What if it's a more complicated example?

    len(sorted(a_list_of_strings.casefold())[:100])

where the len() is supposed to give back a list of the lengths of the first hundred strings, sorted case insensitively? (Okay, so it's a horribly contrived example. Bear with me.) With current syntax, this would need multiple map calls or comprehensions:

    [len(s) for s in sorted(s.casefold() for s in a_list_of_strings)[:100]]

(Better examples welcomed.) ChrisA

Here is a very toy proof-of-concept:
My few lines are at https://github.com/DavidMertz/stringpy One thing I think I'd like to be different is to have some way of accessing EITHER the collection being held OR each element. So now I just get:
>>> v.__len__()
<Vector of [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]>
Yes, that's an ugly spelling of `len(v)`, but let's bracket that for the moment. It would be nice also to be able to ask "what's the length of the vector, in a non-vectorized way" (i.e. 12 in this case). Maybe some naming convention like:
>>> v.collection__len__()
12
This last is just a possible behavior, not in the code I just uploaded. On Sat, Feb 2, 2019 at 6:47 PM Chris Angelico <rosuav@gmail.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Slightly more on my initial behavior:
>>> Vector(37)
TypeError: Vector can only be initialized with an iterable
Vector("hello") <Vector of 'hello'>
I'm wondering if maybe making a vector out of a scalar should simply be a length-one vector. What do you think? Also, should a single string be treated like a vector of characters or like a scalar? It feels kinda pointless to make a vector of characters since I cannot think of anything it would do better than a plain string already does (largely just the same thing slower). On Sat, Feb 2, 2019 at 8:54 PM David Mertz <mertz@gnosis.cx> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Trying to make iterators behave in a semi-nice way also. I kinda like this (example remains silly, but it shows idea).
On Sat, Feb 2, 2019 at 9:03 PM David Mertz <mertz@gnosis.cx> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On 2019-02-03 02:03, David Mertz wrote:
[snip] I think it should follow the pre-existing behaviour of list, set, tuple, etc.
Vector("hello") <Vector of ['h', 'e', 'l', 'l', 'o']>
Why is it pointless for a vector, but not for a list?

I try to keep the underlying datatype of the wrapped collection as much as possible. Casting a string to a list changes that.
Strings are already a Collection; there is no firm need to cast them to a list to live inside a Vector. I like the idea of maintaining the original type if someone wants it back later (possibly after transformations of the values). Why is it pointless for a vector, but not for a list?
I guess it really isn't. I was thinking of just .upper() and .lower() where upper/lower-casing each individual letter is the same as doing so to the whole string. But for .replace() or .count() or .title() or .swapcase() the meaning is very different if it is letter-at-a-time. I guess a string gets unstringified pretty quickly no matter what though. E.g. this seems like right behavior once we transform something:
I dunno... I suppose I *could* do `self._it = "".join(self._it)` whenever I do a transform on a string to keep the underlying iterable as a string. But the point of a Vector really is sequences of strings not sequences of characters. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Nice that you implemented it ! I think all the issues you have right now would go away by using another operation. I proposed the @ notation, which is clear and different from everything else, plus the operator is called "matmul" so it completely makes sense. Then the examples would be :
We still have some issues : how do we treat operators like v[1:]? I suggest using the same syntax : if we don't use @, the operation is done on the vector and not on its elements. Therefore, v[1:] will remove "Jan" from the vector, whereas v @ operator.getitem(slice
That little example shows the need to configure functions so they only accept one argument. It's actually not a new problem, since map has the same "issue". A vector of one element should still be a vector, as a list/tuple/dict of one element is a list/tuple/dict, imo. I suggested that Vector objects inherit from lists, and therefore be iterable. It would be handy to iterate over its elements, and simple loops, maps, etc. should still be available to them. It might be clearer to use "old" notations for some operations. About `Vector("A Super String")`: if we want it to be a vector of one element, we should use `Vector(["A Super String"])`, as we would do with any other function taking an iterable as input. Side note : Honestly, I don't think this is the right thread to debate whether we should use ["in", "un", "an", "non"]-homogeneous or heterogeneous. As long as it's clear, does it matter ? On Sun, Feb 3, 2019 at 4:19 AM, David Mertz <mertz@gnosis.cx> wrote:

On Sun, Feb 3, 2019 at 3:54 AM Adrien Ricocotam <ricocotam@gmail.com> wrote:
plus the operator is called "matmul" so it completely makes sense. The the
examples would be :
I cannot really see how using the @ operator helps anything here. If this were a language that isn't Python (or conceivably some future version of Python, but that doesn't feel likely or desirable to me), I could imagine @ as an operator to vectorize any arbitrary sequence (or iterator). But given that we've already made the sequence into a Vector, there's no need for extra syntax to say it should act in a vectorized way. Moreover, your syntax is awkward for methods with arguments. How would I spell:

    v.replace('foo', 'bar')

in the @ syntax? I actually made an error on my first pass where simply naming a method was calling it. I thought about keeping it for a moment, but that really only allows zero-argument calls. I think the principled thing to do here is add the minimal number of methods to Vector itself, and have everything else pass through as vectorized calls. Most of that minimal number are "magic methods": __len__(), __contains__(), __str__(), __repr__(), __iter__(), __reversed__(). I might have forgotten a couple. All of those should not be called directly, normally, but act as magic for operators or built-in functions. I think I should then create regular methods of the same name that perform the vectorized version. So we would have:

    len(v)    # -> 12
    v.len()   # -> <Vector of [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]>
    list(v)   # -> ["Jan", "Feb", "Mar", "Apr", "May", "Jul" ...]
    v.list()  # -> <Vector of [["J", "a", "n"], ["F", "e", "b"] ... >

I can't implement every single constructor that users might conceivably want, of course, but I can do it for the basic types in builtins and the common standard library. E.g. I might do:

    v.deque()  # -> <Vector of [deque(["J", "a", "n"]), deque(["F", "e", "b"]) ... >

But I certainly won't manually add:

    v.custom_linked_list()  # From my_inhouse_module.py

Hmm... maybe I could even look at names of maybe-constructors in the current namespace and try them. That starts to feel too magic. Falling back to this feels better:

    map(custom_linked_list, v)  # From my_inhouse_module.py

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

I honestly don't understand what you don't like about the @ syntax. My idea is using functions that take one argument : an object of the type of the vector. That's actually how map works. What I understood from your previous message is that there's an ambiguity when using magic functions, on whether it's applied to each element of the vector or the vector itself. That was the first thing I saw. While reading your examples, I noticed that you were using « my_vec.function() ». You just said that we will not code the « .function » for every function. That's the other problem I wanted to address with the @ notation. Functions that could be used are then the same ones we can use in map. But I do agree it's not easy to have functions with parameters. That's why I used functools.partial On Sun 3 Feb 2019 at 19:23, David Mertz <mertz@gnosis.cx> wrote:

On Sun, Feb 3, 2019 at 1:38 PM Adrien Ricocotam <ricocotam@gmail.com> wrote:
I honestly don’t understand what you don’t like the @ syntax.
Can you show any single example that would work with the @ syntax that would not work in almost exactly the same way without it? I have not seen any yet, and none seem obvious. Adding new syntax for its own sake is definitely to be avoided when possible (even though technically the operator exists, so it wouldn't be actual new syntax).
My idea is using functions that takes on argument : an object of the type of the vector. That’s actually how map works.
I do not understand this. Spell my simple example using @ notation. I.e. my_vec @ replace {something? here for 'foo' with 'bar'}
I decided there really isn't. I think that any function applied to the vector should operate on the sequence as a whole. E.g. what length does it have? Cast it to a different kind of sequence. Print it out. Serialize it. Etc. The things that are vectorized should always be methods of the vector instead. And ALMOST every method should in fact be a vectorized operation. In most cases, those will be a "pass through" to the methods of the items inside of the vector. We won't write every possible method in the Vector class. My toy so far only works with methods that the items actually have; in the examples, string methods. But actually, I should add one method like this:

    my_vec.apply(lambda x: x*2)

That is, we might want to vectorize custom functions also. Maybe in that example we should name the function 'double' for clarity: my_vec.apply(double). I do think that just a few methods need to be custom programmed, because they correspond to magic methods of the items rather than regular names (or not even directly to magic methods, but more machinery). So:

    my_vec.list()   # -> cast each item to a list
    my_vec.tuple()  # -> cast each item to a tuple
    my_vec.set()    # -> cast each item to a set

Maybe that's doing too much though. We could always do that with map() or comprehensions; it's not clear it's a common enough use case.
agree it’s not easy to have functions with parameters. That’s why I used functools.partial
I really did not understand how that was meant to work. But it was a whole lot of lines to accomplish something very small either way.
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Adrien Ricocotam wrote:
I honestly don’t understand what you don’t like the @ syntax.
Another problem with @ is that it already has an intended meaning, i.e. matrix multiplication. What if you have two vectors of matrices and you want to multiply corresponding ones? -- Greg

On Sun, Feb 3, 2019 at 21:23, David Mertz <mertz@gnosis.cx> wrote:
Hi David! Thank you for taking the time to implement this idea. Sorry, I'm on a trip now and can't try it. From what I've read in this thread, I think I mostly agree with your perception of how the vector should work: that `len(v) # -> 12`, and that a `.some_method()` call must apply to the elements (although pedants may argue that in this case there is not much difference). The only part that I don't like is `v.len(), v.list() and ...`, for the same reasons - in general this will not work. I also don't like the option with `.apply` - what if an `.apply` method is already defined for the elements in a vector?
Actually, my thoughts on this. At first I thought that for these purposes it is possible to use __call__:

    len(v)  # -> 12
    v(len)  # -> <Vector of [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]>

But somehow this idea did not fit in my head. Then I found another way, and I think I even like it - to reuse `__getitem__`: when its argument is a function, it means that you apply this function to every element in the vector.

    len(v)  # -> 12
    v[len]  # -> <Vector of [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]>

In this case you can apply any function, even custom_linked_list from my_inhouse_module.py. From this thread I did not understand what the desired behavior is for operations like `vector + 1` and the others. Also, what is the desired behaviour for `vector[1:5]`? Considering the above, I would like to take this operation on the contrary:
With kind regards, -gdg
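A sketch of the __getitem__ reuse described in the previous message (indexing with a callable means elementwise application, while ordinary keys keep their usual meaning):

    class Vector:
        def __init__(self, items):
            self._items = list(items)
        def __len__(self):
            return len(self._items)        # the length of the vector itself
        def __getitem__(self, key):
            if callable(key):
                # v[func] applies func to every element
                return Vector(key(item) for item in self._items)
            return self._items[key]        # ordinary indexing and slicing

    v = Vector(['Jan', 'Feb', 'Mar'])
    len(v)   # 3
    v[len]   # Vector of [3, 3, 3]
    v[0]     # 'Jan'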

On Mon, Feb 4, 2019 at 7:14 AM Kirill Balunov <kirillbalunov@gmail.com> wrote:
I think I really like this idea. Maybe as an extra spelling but still allow .apply() to do the same thing. It feels reasonably intuitive to me. Not *identical to* indexing in NumPy and Pandas, but sort of in the same spirit as predicative or selection based indices. What do other people on this thread think? Would you learn that easily? Could you teach it?
This feels more forced, unfortunately. Something short would be good, but I'm not sure I like this. This is really just a short spelling of pandas.IndexSlice or numpy.s_. It came up in another thread some months ago, but there is another proposal to allow the obvious spelling `slice[start:stop:step]` as a way of creating slices. Actually, I guess that's already halfway to the above. We'd still need to do this:

    v[itemgetter(IndexSlicer[1:])]

That's way too noisy. I guess I just don't find the lowercase `i` to be iconic enough. I think with a better SHORT name, I'd like:

    v[Item[1:]]

Maybe that's not the name? -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Hi, I'm not sure I understand the real purpose of Vector. Is it a new collection ? Is it a list with a builtin map() function ? Is it a wrapper for other types ? Should it be iterable ? The clear need explained before is using a fluent interface on a collection :

    MyVector.strip().replace("A", "E")

Why do we need Vector to behave like a list ? We just want to work on our strings, but with a cleaner/shorter/nicer syntax. My idea (not totally clear in my mind) is that Vector should behave quite like the type it wraps, so having only one type. I don't want a collection of strings, I want a MegaString (...) which I can use exactly like a lone string. An iteration on a Vector would iterate like itertools.chain does. At the end, I would only need one more method, which would return an iterable of the items, like MyVector.explode(). For me, Vector should be something like this:

    class Vector:
        def __init__(self, a_list):
            self.data = a_list
            self._type = type(self.data[0])
            for data in self.data:
                if type(data) != self._type:
                    raise TypeError

        def __getattr__(self, name):
            fn = getattr(self._type, name)
            def wrapped(*args, **kwargs):
                self.data = [fn(i, *args, **kwargs) for i in self.data]
                return self
            return wrapped

        def explode(self):
            return iter(self.data)

I'm not saying it should only handle strings, but that seems to be the major use case. Jimmy On 04/02/2019 at 17:12, David Mertz wrote:
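Under that sketch, usage would look like the following (note that the wrapped methods mutate the data and return self, which is what enables the fluent chain):

    v = Vector(["  A ", " b "])
    v.strip().replace("A", "E")   # applies str.strip, then str.replace
    list(v.explode())             # -> ['E', 'b']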

Before I respond to a specific point below, I'd like to make a general observation. I changed the subject line of this sub-thread to discuss a feature of Julia, which allows one to write vectorized code in standard infix arithmetic notation, that applies to any array type, using any existing function or operator, WITHOUT having to wrap your data in a special delegate class like this "Vector". So as far as I'm concerned, this entire discussion about this wrapper class misses the point. (Aside: why is this class called "Vector" when it doesn't implement a vector?) Anyway, on to my response to a specific point: On Mon, Feb 04, 2019 at 11:12:08AM -0500, David Mertz wrote:
obj[len] already has an established meaning as obj.__getitem__(len). There's going to be a clash here between key lookup and applying a function:

    obj[len]  # look up key=len
    obj[len]  # apply function len

Mathematica does use square brackets for calling functions, but it uses ordinary arithmetic order len[obj] rather than postfix order obj[len]. At the risk of causing confusion^1, we could have a "vector call" syntax:

    # apply len to each element of obj, instead of obj itself
    len[obj]

which has the advantage that it only requires that we give functions a __getitem__ method, rather than adding new syntax. But it has the disadvantage that it doesn't generalise to operators, without which I don't think this is worth bothering with. ^1 Cue a thousand Stackoverflow posts asking whether they should use round brackets or square when calling a function, and why they get weird error messages sometimes and not other times. -- Steven
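That "vector call" spelling only needs a small wrapper today (a sketch):

    class vectorized:
        """Give a function a __getitem__ so that f[seq] maps f over seq."""
        def __init__(self, func):
            self.func = func
        def __call__(self, arg):       # a normal call still works
            return self.func(arg)
        def __getitem__(self, obj):    # the "vector call"
            return [self.func(x) for x in obj]

    vlen = vectorized(len)
    vlen("hello")                  # 5
    vlen[["Jan", "Feb", "March"]]  # [3, 3, 5]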

On Thu, Feb 7, 2019 at 4:03 PM Steven D'Aprano <steve@pearwood.info> wrote:
Generalizing to operators is definitely going to require new syntax, since both operands can be arbitrary objects. So if that's essential to the idea, we can instantly reject anything that's based on functions (like "make multiplying a function by a tuple equivalent to blah blah blah"). In that case, we come straight to a few key questions:

1) Is this feature even worth adding syntax for? (My thinking: "quite possibly", based on matmul's success despite having an even narrower field of use than this.)

2) Should it create a list? A generator? Something that depends on the type of the operand? (Me: "no idea")

3) Does the Julia-like "x." syntax pass the grit test? (My answer: "nope")

4) If not, what syntax would be more appropriate? This is a general-purpose feature akin to comprehensions (and, in fact, can be used in place of some annoyingly-verbose comprehensions). It needs to be easy to type and read. Pike's automap syntax is to subscript an array with [*], implying "subscript this with every possible value". It's great if you want to do just one simple thing:

    f(stuff[*])    # [f(x) for x in stuff]
    stuff[*][1]    # [x[1] for x in stuff]

but clunky for chained operations:

    (f(stuff[*])[*] * 3)[*] + 1    # [f(x) * 3 + 1 for x in stuff]

That might not be a problem in Python, since you can always just use a comprehension if vectorized application doesn't suit you. I kinda like the idea, but the devil's in the details. ChrisA

On 2019-02-07 05:27, Chris Angelico wrote:
Would it be possible, at compile time, to retain it as an automap throughout the expression?

    stuff[*]                # [x for x in stuff]
    f(stuff[*])             # [f(x) for x in stuff]
    (f(stuff[*]) * 3) + 1   # [f(x) * 3 + 1 for x in stuff]

There could also be a way to 'collapse' it again. An uncollapsed automap would be collapsed at the end of the expression. (Still a bit fuzzy about the details...)

Here are some alternate syntaxes. These are all equivalent to print(len(list)):

    (len | print)(list)
    (len |> print)(list)
    (print <| len)(list)
    print <| len << list
    list >> print <| len
    list >> len |> print

## Traditional argument order

    print <| len << list

## Stored functions

    print_lengths = len | print
    print_lengths = len |> print
    print_lengths = print <| len

These can be called using callable syntax, using << syntax, or using >> syntax.

## Lightweight traditional syntax order

    (print | len)()

# Explanation

The pipeline operators (|, |>, <|) create an object. That object implements, depending on the chosen implementation, some combination of the __call__ operator, the __rshift__ operator, and/or the __lshift__ operator. — I am not proposing Python has all these operators at the same time, just putting these ideas out there for discussion.
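One way to prototype the stored-pipeline idea from that message (only the | spelling, and the first function must be wrapped; the names are illustrative):

    class Pipe:
        def __init__(self, func):
            self.func = func
        def __or__(self, other):
            # (Pipe(len) | print) applies len first, then print
            g = other.func if isinstance(other, Pipe) else other
            return Pipe(lambda x: g(self.func(x)))
        def __call__(self, x):
            return self.func(x)

    print_lengths = Pipe(len) | print
    print_lengths([1, 2, 3])   # prints 3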

Many apologies if people got one or more encrypted versions of this. On 2/7/19 12:13 AM, Steven D'Aprano wrote:

It wasn't a concrete proposal, just food for thought. Unfortunately the thinking seems to have missed the point of the Julia syntax and run off with the idea of a wrapper class.

I did not miss the point! I think adding new syntax à la Julia is a bad idea—or at very least, not something we can experiment with today (and I wrote as much). Therefore, something we CAN think about and experiment with today is a wrapper class. This approach is pretty much exactly the same thing I tried in a discussion of PEP 505 a while back (None-aware operators). In the same vein as that—where I happen to dislike PEP 505 pretty strongly—one approach to simulate or avoid new syntax is precisely to use a wrapper class. As a footnote, I think my demonstration of PEP 505 got derailed by lots of comments along the lines of "Your current toy library gets the semantics of the proposed new syntax wrong in these edge cases." Those comments were true (and I think I didn't fix all the issues, since my interest faded with the active thread)... but none of them were impossible to fix, just small errors I had made.

With my *very toy* stringpy.Vector class, I'm just experimenting with usage ideas. I have shown a number of uses that I think could capture most or all of what folks want in "string vectorization." Most of what I've put in this list is what the little module does already, but some is just ideas for what it might do if I add the code (or someone else makes a PR at https://github.com/DavidMertz/stringpy).

One of the principles I had in mind in my demonstration is that I want to wrap the original collection type (or keep it an iterator if it started as one). A number of other ideas here, whether for built-in syntax or different behaviors of a wrapper, effectively reduce every sequence to a list under the hood. This makes my approach less intrusive for moving things in and out of "vector mode." For example:

    v1 = Vector(set_of_strings)
    set_of_strings = v1.lower().apply(my_str_fun)._it    # Get a set back

    v2 = Vector(list_of_strings)
    list_of_strings = v2.lower().apply(my_str_fun)._it   # Get a list back

    v3 = Vector(deque_of_strings)
    deque_of_strings = v3.lower().apply(my_str_fun)._it  # Get a deque back

    v4 = Vector(iter_of_strings)
    iter_of_strings = v4.lower().apply(my_str_fun)._it   # stays lazy!

So this is round-tripping through vector-land. Small note: I use the attribute `._it` to store the "sequential thing." That feels internal, so maybe some better way of spelling "get the wrapped thing" would be desirable.

I've also lost track of whether anyone is proposing a "vector of strings" as opposed to a vector of arbitrary objects. Nothing I wrote is actually string-specific. That is just the main use case stated. My `stringpy.Vector` might be misnamed in that it is happy to contain any kind of items. But we hope they are all items with the particular methods we want to vectorize. I showed an example where a list might contain a custom string-like object that happens to have methods like `.lower()` as an illustration.

Inasmuch as I want to handle iterators here, it is impossible to do any type check upon creating a Vector. For concrete `collections.abc.Sequence` objects we could check, in principle. But I'd rather it be "we're all adults here" ... or at most provide some `check_type_uniformity()` function or method that had to be called explicitly.

On Thu, Feb 07, 2019 at 03:17:18PM -0500, David Mertz wrote:
I'm sorry, I did not see your comment that you thought new syntax was a bad idea. If I had, I would have responded directly to that. Why is it an overtly *bad* (i.e. harmful) idea? As opposed to merely not sufficiently useful, or unnecessary? You're certainly right that we can't easily experiment in the interpreter with new syntax, but we can perform thought-experiments and we don't need anything but a text editor for that. As far as I'm concerned, the thought experiment of comparing these two snippets:

    ((seq .* 2)..name)..upper()

versus:

    map(str.upper, map(operator.attrgetter('name'), map(lambda a: a*2, seq)))

demonstrates conclusively that even with the ugly double dot syntax, infix syntax easily and conclusively beats map. If I recall correctly, the three maps here were originally proposed by you as examples of why map() alone was sufficient and there was no benefit to the Julia syntax. I suggested composing them together as a single operation instead of considering them in isolation.
Therefore, something we CAN think about and experiment with today is a wrapper class.
Again, I apologise, I did not see where you said that this was intended as a proof-of-concept to experiment with the concept. [...]
If the Vector class is only a proof of concept, then we surely don't need to care about moving things in and out of "vector mode". We can take it as a given that "the real thing" will work that way: the syntax will be duck-typed and work with any iterable, and there will not be any actual wrapper class involved and consequently no need to move things in and out of the wrapper. I had taken note of this functionality of the class before, and that was one of the things which led me to believe that you thought that a wrapper class was in and of itself a solution to the problem. If you had been proposing this Vector class as a viable working solution (or at least a first alpha version towards a viable solution) then worrying about round-tripping would be important. But as a proof-of-concept of the functionality, then:

    set( Vector(set_of_stuff) + spam )
    list( Vector(list_of_stuff) + spam )

should be enough to play around with the concept. [...]
Why do you care about type uniformity or type-checking the contents of the iterable? Comments like this suggest to me that you haven't understood the idea as I have tried to explain it. I'm sorry that I have failed to explain it better. Julia is (if I understand correctly) statically typed, and that allows it to produce efficient machine code because it knows that it is iterating over (let's say) an array of 32-bit ints. While that might be important for the efficiency of the generated machine code, it's not important for the semantic meaning of the code. In Python, we duck-type and resolve operations at runtime. We don't typically validate types in advance:

    for x in sequence:
        if not isinstance(x, Spam):
            raise TypeError('not Spam')
    for x in sequence:
        process(x)

(except under unusual circumstances). More to the point, when we write a for-loop:

    result = []
    for a_string in seq:
        result.append(a_string.upper())

we don't expect the interpreter to validate that the sequence contains nothing but strings in advance. So if I write this using Julia syntax:

    result = seq..upper()

I shouldn't expect the interpreter to check that seq contains nothing but strings either. -- Steven

On Thu, Feb 7, 2019 at 6:48 PM Steven D'Aprano <steve@pearwood.info> wrote:
I'm sorry, I did not see your comment that you thought new syntax was a bad idea. If I had, I would have responded directly to that.
Well... I don't think it's the worst idea ever. But in general adding more operators is something I am generally wary about. Plus there's the "grit on Uncle Timmy's screen" test. Actually, if I wanted an operator, I think that @ is more intuitive than extra dots. Vectorization isn't matrix multiplication, but they are sort of in the same ballpark, so the iconography is not ruined.
OK... now compare:

    (Vec(seq) * 2).name.upper()

Or:

    vec_seq = Vector(seq)
    (vec_seq * 2).name.upper()
    # ... bunch more stuff
    seq = vec_seq.unwrap()

I'm not saying the double dots are terrible, but they don't read *better* than wrapping (and optionally unwrapping) to me. If we were to take @ as "vectorize", it might be:

    (seq @* 2) @.name @.upper()

I don't hate that.
infix syntax easily and conclusively beats map.
Agreed.
Well... your maps are kinda deliberately ugly. Even in that direction, I'd write:

    map(lambda s: (s*2).name.upper(), seq)

I don't *love* that, but it's a lot less monstrous than what you wrote. A comprehension is probably even better:

    [(s*2).name.upper() for s in seq]
as a proof-of-concept to experiment with the concept.
All happy. Puppies and flowers.
Well... I at least moderately think that a wrapper class is BETTER than new syntax. So I'd like the proof-of-concept to be at least moderately functional. In any case, there is ZERO code needed to move in/out of "vector mode." The wrapped thing is simply an attribute of the object. When we call vectorized methods, it's just `getattr(type(item), attr)` to figure out the method in a duck-typed way.

one of the things which led me to believe that you thought that a
Yes, I consider the Vector class a first alpha version of a viable solution. I haven't seen anything that makes me prefer new syntax. I feel like a wrapper makes it more clear that we are "living in vector land" for a while. The same is true for NumPy, in my mind. Maybe it's just familiarity, but I LIKE the fact that I know that when my object is an ndarray, operations are going to be vectorized ones. Maybe 15 years ago different decisions could have been made, and some "vectorize this operation syntax" could have made the ndarray structure just a behavior of lists instead. But I think the separation is nice.
That's fine. But there's no harm in the class *remembering* what it wraps either. We might want to distinguish:

    set(Vector(some_collection) + spam)        # Make it a set after the operations
    (Vector(some_collection) + spam).unwrap()  # Recover whatever type it was before
Why do you care about type uniformity or type-checking the contents of the iterable?
Because some people have said "I want my vector to be specifically a *sequence of strings* not of other stuff" And MAYBE there is some optimization to be had if we know we'll never have a non-footype in the sequence (after all, NumPy is hella optimized). That's why the `stringpy` name that someone suggested. Maybe we'd bypass most of the Python-land calls when we did the vectorized operations, but ONLY if we assume type uniformity. But yes, I generally care about duck-typing only. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Thu, Feb 7, 2019 at 4:27 PM David Mertz <mertz@gnosis.cx> wrote:
well, vectorization is kinda the *opposite* of matrix multiplication -- matrix multiplication is treating the matrix as a whole, rather than applying multiplication to each element. And it is certainly the opposite in the numpy case. Which gives me an idea -- we could make an object that applies operators (and methods??) to each element individually, and uses the @ operator when you want the method to act on the whole object instead. Note: I haven't thought about the details at all -- it may not be practical to use an operator for that.
(Vec(seq) * 2).name.upper()
Or:
I'm not saying the double dots are terrible, but they don't read *better*
what type would .unwrap() return? One of the strengths of the "operator" approach is that it could apply to any (appropriately mutable) sequence and keep that sequence. I'm not sure how much that actually matters, as I'm expecting this is a 99% list case anyway. And why would .unwrap() be required at all -- as opposed to, say:

    seq = list(vec_seq)

than wrapping (and optionally unwrapping) to me.

nor to me.
Well... your maps are kinda deliberately ugly.
That's actually pretty key -- in fact, if you wanted to apply a handful of operations to each item in a sequence, you would probably use a single expression (if possible) in a lambda in a map, or in a comprehension, rather than chaining maps. Even if it was more complex, you could write a function, and then apply that with a map or comprehension. In the numpy case, compare:

    c = sqrt(a**2 + b**2)

to:

    c = [sqrt(a**2 + b**2) for a, b in zip(a, b)]

so still a single comprehension. But: 1) given the familiarity of math expressions, the first really does read a LOT better; 2) the first version can be better optimized (by numpy). So the questions become:

* For anything other than math with numbers (which we have numpy for), are there use cases where we'd really get that much extra clarity?

* Could we better optimize, say, a sequence of strings enough to make it all worth it?

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Fri, Feb 8, 2019 at 3:17 PM Christopher Barker <pythonchb@gmail.com> wrote:
The idea—and the current toy implementation/alpha—has .unwrap() return whatever type went into the Vector creation. Might be a tuple, list, set, deque, or it might be an iterator. It might even be some custom collection that isn't in the standard library. But you can also explicitly make a Vector into something else by using that constructor. Pretty much as in the example I gave before:

    set(Vector(a_list))      # Get a set
    Vector(a_list).unwrap()  # Get a list (without needing to know the type to call .unwrap())

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Has anyone thought about my proposal yet? I think it has merit because it allows chained function calls to be stored, which is probably common; I imagine people turning the same series of chained functions into a lambda of its own once it's used more than once in a program. Arguably, the lambda syntax is more readable and puts less visual burden on the reader. Sent from my iPhone

Just a quick idea. Wouldn't an arrow operator -> be less of an eyesore? On Fri, Feb 8, 2019 at 18:16, Christopher Barker <pythonchb@gmail.com> wrote:
-- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario@gmail.com linked-in : https://www.linkedin.com/in/eliziario/

Christopher Barker writes:
well, vectorization is kinda the *opposite* of matrix multiplication -- matrix multiplication is treating the matrix as a whole,
When I think of treating the matrix as a whole, I think of linear algebra. Matrix multiplication is repeated application of the inner product, which is in turn a sum over vectorized multiplication. I share David's intuition about this, although it might not be the common one. Steve

The @ operator is meant for matrix multiplication (see PEP 465) and is already used for that in NumPy. IMHO just that is a good enough reason for not using @ as an elementwise application operator (ignoring whether having such an operator is a good idea in the first place). Ronald

On Sun, Feb 3, 2019 at 3:16 PM Ronald Oussoren <ronaldoussoren@mac.com> wrote:
Co-opting operators is pretty common in Python. For example, the `.__div__()` operator spelled '/' is most often used for some kind of numeric division. There are some variations on that, for example the vectorized version in NumPy, and different numeric types operate a bit differently. The name of the magic method obviously suggests division. And yet, in the standard library we have pathlib which we can use like this (from the module documentation):
>>> p = Path('/etc')
>>> q = p / 'init.d' / 'reboot'
That use is reasonable and iconic, even if it is nothing like division. The `.__mod__()` operator spelled '%' means something very different in relation to a float or int object versus a string object. I.e. modulo division versus string interpolation. I've even seen documentation of some library that co-opts `.__matmul__()` to do something with email addresses. It's not a library I use, just something I once saw the documentation on, so I'm not certain of details. But you can imagine that e.g.: email = user @ domain Could be helpful and reasonable (exact behavior and purpose could vary, but it's "something about email" iconically). In other words, I'm not opposed to using the @ operator in my stringpy.Vector class out of purity about the meaning of operators. I just am not convinced that it actually adds anything that is not easier without it. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
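As a hedged toy of that email-ish co-opting (User and Domain are invented here purely for illustration; no real library's API is implied):

    class Domain:
        def __init__(self, name):
            self.name = name

    class User:
        def __init__(self, local):
            self.local = local

        def __matmul__(self, domain):
            # user @ domain -> "local@domain" as a plain string
            return f'{self.local}@{domain.name}'

    email = User('jane') @ Domain('example.com')
    assert email == 'jane@example.com'

The point is only that @ can be given any meaning a class chooses; whether that meaning is iconic enough to read well is a separate question.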

I know, but if an element-wise operator is useful it would also be useful for libraries like NumPy that already support the @ operator for matrix multiplication. Using @ both for matrix multiplication and element-wise application could be made to work, but would be very confusing. Ronald — Twitter: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/

Ronald Oussoren via Python-ideas wrote:
The way @ is defined in numpy does actually work for both. E.g. v1 @ v2 where v1 and v2 are 3-dimensional arrays is equivalent to multiplying two 1D arrays of 2D matrices elementwise. Is this confusing? Maybe, but it's certainly useful. -- Greg

A bit of history: A fair amount of inspiration (or at least experience) for numpy came from MATLAB. MATLAB has essentially two complete sets of math operators: the regular version, and the dot version. A * B means matrix multiplication, and A .* B means elementwise multiplication. And there is a full set of matrix and elementwise operators. Back in the day, Numeric (numpy's predecessor) used the math operators for elementwise operations, and doing matrix math was unwieldy. There was a lot of discussion and a number of proposals for a full set of additional operators in Python that could be used for matrix operations (side note: there was (is) a numpy.matrix class that defines __mul__ as matrix multiplication). Someone at some point realized that we didn't need a full set, because multiplication was really the only compelling use case. So the @ operator was added. End history. Numpy, of course, is but one third party package, but it is an important one — major inconsistency with it is a bad idea. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

@David Mertz <mertz@gnosis.cx> I think I can't explain my ideas well ^^. I'll try to be really detailed, since I'm not sure I'm actually saying what I'm thinking. Let's consider the idea of that Vector class this way: Vectors are lists of a defined type (maybe immutable?) that add syntactic sugar for vectorized operations. Based on this small and incomplete definition, we should be able to apply any function to that vector. I identify two ways functions are used with vectors: they are either applied on the vector as an iterable/list, or on the elements of this vector. Thus, we need to have different notations for those two uses. To keep it coherent with Python, if a function is applied on the vector as an iterable, the vector is given as a parameter:
len(v) # Number of elements in the Vector `v`
If we want to apply a function on each element of the list, we should then use another notation. So far, several have been proposed. In the following example showing the different notations, we use the generic way so we can apply it to user-defined functions:
Another example with parameters
My personal opinion is that the two notations feel good. One is standard; the other is not, but it is less verbose, and that's a good point. Now that I have detailed everything in my brain and by mail, I guess we are just saying the same thing! There's something I didn't mention on purpose: the use of `v.lower()`. I think having special cases of how vectors work is not a good idea: it's confusing. If we want the user to be able to use user-defined functions, we need a notation. Having something different for some of the functions feels weird to me. And obviously, if the user can't use their own functions, this whole thing is pretty useless. Tell me if I got anything wrong. NB: I found a way to simplify my previous example using lambda instead of partial. On Sun, Feb 3, 2019 at 21:34, David Mertz <mertz@gnosis.cx> wrote:

len(v) # Number of elements in the Vector `v`
Agreed, this should definitely be the behavior. So how do we get a vector of lengths of each element?
Also possible is: v.len() We couldn't do that for every possible function, but this one is special inasmuch as we expect the items each to have a .__len__() but don't want to spell the dunders. Likewise for just a handful of other methods/functions. The key difference, though, is that *I* would want a way to use both the methods already attached to the objects/items in a vector and also a generic user-provided function that operates on the items. I guess you disagree about "method pass-through", but it reads more elegantly to me:
Compare these with: v.replace("a", "b") Since we already know v is a Vector, we kinda expect methods to be vectorized. This feels like the "least surprise" and also the least extra code. Moreover, spelling chained methods with many .apply() calls (even if spelled '@') feels very cumbersome:

(A) v.apply(lambda s: s.replace("a", "b")).apply(str.upper).apply(lambda s: s.count("B"))
(B) v @ lambda s: s.replace("a", "b") @ str.upper @ lambda s: s.count("B")
(C) v.replace("a","b").upper().count("B")

Between these, (C) feels a heck of a lot more intuitive and readable to me. Here we put an emphasis on the methods already attached to objects. But this isn't terrible:

def double(x):
    return x*2

v.apply(double).replace("a","b").upper().count("B")

In @ notation it would be:

v @ double @ lambda s: s.replace("a", "b") @ str.upper @ lambda s: s.count("B")

The 'double' is slightly easier, but the method calls are much worse. MOREOVER, the model of "everything is apply/@" falls down terribly once we have duck typing. This is a completely silly example, but it's one that apply/@ simply cannot address because it assumes it is the SAME function/method applied to each object:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On 2019-02-03 22:58, David Mertz wrote:
Do they need multiple uses of apply and @?
(A) v.apply(lambda s: s.replace("a", "b")).apply(str.upper).apply(lambda s: s.count("B"))
v.apply(lambda s: s.replace("a", "b").upper().count("B"))
(B) v @ lambda s: s.replace("a", "b") @ str.upper @ lambda s: s.count("B")
v @ lambda s: s.replace("a", "b").upper().count("B")
(C) v.replace("a","b").upper().count("B")
Between these, (C) feels a heck of a lot more intuitive and readable to me.
[snip]

I've lost track of who is advocating what, but:
# Replace all "a" by "b"
v.apply(lambda s: s.replace("a", "b"))
I do not get the point of this at all -- we already have map:

map(v, lambda s: s.replace("a", "b"))

These seem equally expressive and easy to me, and map doesn't require a custom class or anything new at all.
v.replace("a", "b")
This is adding something - maybe just compactness, but I also think readability. I've also lost track of whether anyone is proposing a "vector of strings" as opposed to a vector of arbitrary objects. I think a vector of strings could be useful, and then it would be easier to decide which string methods should be applied to the items vs the vector as a whole. If you want to allow any generic items, it becomes a lot harder. I think numpy has had the success it has because it assumes all dtypes are numerical and thus support (mostly) the same operations. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Mon, Feb 4, 2019, 12:47 AM Christopher Barker
I've lost track of who is advocating what, but:
Well, I made a toy implementation of a Vector class. I'm not sure what that means I advocate, other than the existence of a module on GitHub. FWIW, I called the repo 'stringpy' as a start, so that expressed some interest in it being about vectors of strings. But so far, I haven't found anything that actually needs to be string-like. In general, methods get passed through to their underlying objects and deliberately duck typed, like: v.replace("a", "b")
As an extra, we could enforce homogeneity, or even string-ness specifically. I don't really know what homogeneity means, though, once we consider ABCs, subclasses, and duck types that don't use inheritance or ABC registration. At least so far, I haven't coded anything that would get a performance gain from enforcing the string-ness of items (but it's all pure Python so far, no Cython or C). This is adding something - maybe just compactness, but I also think
readability.
I think with chained methods the win gets greater: v.replace("a", "b").upper().apply(myfun) If you want to allow any generic items, it becomes a lot harder.
So far, generic has been a lot easier to code than hand-rolled methods.

On Sun, Feb 03, 2019 at 09:46:44PM -0800, Christopher Barker wrote:
I've lost track of who is advocating what, but:
Ironically, I started this sub-thread in response to your complaint that you didn't like having to explicitly write loops/maps. So I pointed out that in Julia, people can use (almost) ordinary infix syntax using operators and function calls and have it apply automatically to each item in arrays. It wasn't a concrete proposal, just food for thought. Unfortunately the thinking seems to have missed the point of the Julia syntax and run off with the idea of a wrapper class. [...]
I do not get the point of this at all -- we already have map:
map(v, lambda s: s.replace("a", "b"))
The order of arguments is the other way around. And you did say you didn't like map. Wouldn't you rather write:

items.replace("a", "b")

than:

map(lambda s: s.replace("a", "b"), items)

or:

[s.replace("a", "b") for s in items]

I know I would. Provided of course we could distinguish between operations which apply to a single string, and those which apply to a generic collection of strings. Besides, while a single map or comprehension is bearable, more complex operations are horrible to read when written that way, but trivially easy to read when written in standard infix arithmetic notation. See my earlier posts for examples.
Indeed. In Julia that also offers opportunities for the compiler to optimize the code, bringing it to within 10% or so of a C loop. Maybe PyPy could get there as well, but CPython probably can't.
I've also lost track of whether anyone is proposing a "vector of strings" as opposed to a vector of arbitrary objects.
Not me. -- Steven

On Sun, Feb 3, 2019, 6:36 PM Greg Ewing
What syntax would you like? Not necessarily new syntax per se, but what calling convention. I can think of a few useful cases.

vec1.replace("PLACEHOLDER", vec2)

Maybe that would transform one vector using the corresponding strings from another vector. What should happen if the vector lengths mismatch? I think this should probably be an exception... unlike what zip() and itertools.zip_longest() do. But maybe not.

concat = vec1 + vec2

Again the vector length question is there. But assuming the same length, this seems like a reasonable way to get a new vector concatenating each corresponding element. Other uses? Are they different in general pattern?
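A sketch of the strict-pairing rule suggested above (assuming a length mismatch should raise; the sentinel with itertools.zip_longest detects the mismatch lazily, and the helper name pairwise is invented here for illustration):

    import operator
    from itertools import zip_longest

    _MISSING = object()

    def pairwise(op, vec1, vec2):
        """Apply op to corresponding items; raise if the lengths differ."""
        out = []
        for a, b in zip_longest(vec1, vec2, fillvalue=_MISSING):
            if a is _MISSING or b is _MISSING:
                raise ValueError('vector lengths differ')
            out.append(op(a, b))
        return out

    assert pairwise(lambda s, r: s.replace('X', r),
                    ['aX', 'bX'], ['1', '2']) == ['a1', 'b2']
    # concat = vec1 + vec2 falls out of the same helper:
    assert pairwise(operator.add, ['a', 'b'], ['c', 'd']) == ['ac', 'bd']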

On Sat, Feb 2, 2019 at 10:00 PM MRAB <python@mrabarnett.plus.com> wrote:
I like that! But I'm not sure if '.self' is misleading. I use an attribute called '._it' already that does exactly this. But since we're asking the length of the list or tuple or set or deque or etc. that the Vector wraps, does it feel like it would be deceptive to call them all '.self'? I'm really not sure. I could just rename '._it' to '.self' and get the behavior you show (well, I still need a little checking whether the thing wrapped is a collection or an iterator ... I guess a '.self' property, or some other name, to do that). You remind me that I need to add .__getitem__() too so I can slice and index Vectors. But I know how to do that easily enough. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Sat, Feb 02, 2019 at 03:22:12PM -0800, Christopher Barker wrote: [This bit was me]
So it is. I wondered what the cryptic '|S4' symbol meant, and I completely missed the quotes around the '1'. Thanks for the correction. [...]
Indeed. This hypothetical syntax brings the readability advantages of infix operators to code that operates on iterables, without requiring every iterable to support arbitrary functions and methods. -- Steve

On Sat, Feb 2, 2019, 6:23 PM Christopher Barker
I'm warming up some. But is this imagined as vectors of strings, or as generically homogeneous objects? And what is homogeneity exactly in the face of duck typing? Absent the vector wrapping, I think I might write this for your example:

map(lambda s: s.strip().lower().title(), a_list_of_strings)

That's slightly longer, but just by the length of the word lambda. One could write a wrapper to vectorize pretty easily. So maybe:

Vector(a_list_of_strings).strip().lower().title()

This would just pass along the methods to the individual items, and wouldn't need to think about typing per se. Maybe other objects happen to have those three methods, so are string-like in a duck way.

On Fri, Feb 01, 2019 at 07:02:30PM +0300, Kirill Balunov wrote:
I didn't say anything about a vector type. "Vectorization" does not mean "vector type". Please read the link I posted, it talks about what Julia does and how it works. There are two relevant meanings for vectorization here: https://en.wikipedia.org/wiki/Vectorization - a style of computer programming where operations are applied to whole arrays instead of individual elements - a compiler optimization that transforms loops to vector operations Given that none of my examples involved writing loops by hand, I could only be talking about the first. The link I posted has examples which should be fairly clear even if you don't know Julia well.
Of course it makes sense. Even numpy supports inhomogeneous data:

py> a = np.array([1, 'spam'])
py> a
array(['1', 'spam'], dtype='|S4')

Inhomogeneous data may rule out some optimizations, but that hardly means that it "doesn't make sense" to use it. Again, if you read the link I posted, they make it clear that Julia can vectorize code which supports any type: "Our function f accepts any x type" I don't know Julia well enough to tell whether it supports inhomogeneous arrays. My experiments suggest that it forces all the elements to a single type. But that's not really the point: you can call the function f on an array of one type (say, Spam), then call it again on an array of another type (say, Eggs). So long as the function supports both Spam and Eggs types, it will just work, without having to re-write your array handling code.
The second one is just `map`, so I can't tell what you are proposing:
I'm not proposing anything, I'm drawing people's attention to something another language does to solve an annoyance that Chris has. If someone else likes that solution and wishes to make a concrete proposal for Python, we can consider it. Otherwise it is just food for thought. It may or may not lead anywhere.
1. To make an operator form of `map`. 2. To pull numpy into stdlib.
I cannot imagine how you got that conclusion from anything I said. I was talking about syntax for vectorization, and didn't mention numpy once. I didn't mention django or beautifulsoup either. I hope that you didn't conclude that I wanted to pull them into the stdlib too. -- Steven

On 2019-02-02 04:32, Steven D'Aprano wrote: [snip]
Of course it makes sense. Even numpy supports inhomogeneous data:
[snip] "inhomogeneous"? Who came up with that? <pendantic> "in-" is a negative prefix in Latin words, but "homogeneous" comes from Greek, where the negative prefix is "a-" (or "an-" before a vowel). I'd go with either "heterogeneous" or "non-homogeneous". </pedantic>

On Sat, Feb 02, 2019 at 02:06:56AM -0500, Alex Walters wrote:
"Television" as a word must annoy you :) I mentally replaced "inhomogeneous" with "heterogeneous"
They don't mean the same thing. https://english.stackexchange.com/questions/194906/heterogeneous-vs-inhomoge... -- Steven

On Sat, Feb 02, 2019 at 05:10:14AM +0000, MRAB wrote:
I don't know, but it has been used since at least the early 1920s https://english.stackexchange.com/questions/194906/heterogeneous-vs-inhomoge... and the Oxford dictionary describes "inhomogeneity" as being used from the late 19th century. So my guess is, probably people who were more familiar with Latin and Greek than we are. There are many words that are derived from both Latin and Greek. There's no rule that says that because a word was derived from Greek, we must use Greek grammatical forms for it. We are speaking English, not Greek, and in English, we can negate words using the "in" prefix. -- Steven

On Sat, Feb 2, 2019 at 07:33, Steven D'Aprano <steve@pearwood.info> wrote:
I didn't say anything about a vector type.
I agree you did not say that. But since you started a new thread from the one where the vector type was discussed a little, it seemed appropriate to mention it here. Sorry about that.
Yes, numpy, to some degree, supports heterogeneous arrays. But not in the way you brought it up. Your example just shows a homogeneous array of type `'|S4'`, in the same way as `np.array([1, 1.234])` will be homogeneous. Of course you can say np.array([1, 'spam'], dtype='object'), but in this case it will also be a homogeneous array, but of type `object`.
Inhomogeneous data may rule out some optimizations, but that hardly means that it "doesn't make sense" to use it.
I did not say that it "doesn't make sense". I only said that you would need to be lucky to call `..method()` on collections of heterogeneous data. And therefore, usually this kind of operation implies that you are working with homogeneous data. Unfortunately, built-in containers cannot provide such a guarantee without self-checking. Therefore, in my opinion, at the moment such an operator is not needed. With kind regards, -gdg

@D'Aprano I think you were misled by what I said, sorry for not being crystal clear. I just read the link on Julia (which I hadn't done before) and I get what you mean now; it's not that different from what I said. I proposed introducing a new type: "vector". A few steps have been made in Python toward typing, and I think the next step is having typed collections. Keeping everything unchecked is better imo. So if we take this next step, we'll get a vector type with *not-guaranteed* homogeneous data. Whether its type is "object", "int" or anything else doesn't matter, as long as it's supposed to be the same. This doesn't change anything in terms of usage. Of course we should/could use map and the usual operators on collections. What I was then proposing, to complete what you suggested and because I don't like the dot notation, is to use the matrix-multiplication operator the same way the dots are used in Julia. But I have a question. I have never coded anything at the C level, nor a compiler: is it possible for user-defined types to have vectorization optimized the same way it's done with numbers in numpy? If yes, I think it would benefit the community. If no, it's less likely, though it would still pursue the steps made with typing. On Sat 2 Feb 2019 at 10:23, Kirill Balunov <kirillbalunov@gmail.com> wrote:

On 2019-02-02 09:22, Kirill Balunov wrote:
Here's a question: when you use a subscript on a vector, does it apply to the vector itself, or its members? For example, given:
my_strings = Vector(['one', 'two', 'three'])
what is:
my_strings[1 : ]
? Is it: Vector(['ne', 'wo', 'hree']) or: Vector(['two', 'three']) ?

On 2019-02-02 17:31, Adrien Ricocotam wrote:
I personally would want the first option to be the case. But then vectors shouldn't be list-like but more generator-like.
OK, here's another one: if you use 'list(...)' on a vector, does it apply to the vector itself or its members?
list(my_strings)
You might be wanting to convert a vector into a list:

['one', 'two', 'three']

or convert each of its members into lists:

Vector([['one'], ['two'], ['three']])

That's tough. I'd say convert the vector to a list. But: my_vector.list() would apply list to each element of the vector. Globally, I'd say if the vector is used as an argument, it's a usual iterable; if you use a member function (or any other notation like @ or .. or whatever) it's like map. Note that it's just my opinion. On Sat, Feb 2, 2019 at 19:46, MRAB <python@mrabarnett.plus.com> wrote:

On 02/02/2019 18:44, MRAB wrote:
More likely you mean:
>>> [list(i) for i in ['one', 'two', 'three']]
[['o', 'n', 'e'], ['t', 'w', 'o'], ['t', 'h', 'r', 'e', 'e']]
The problem, of course, is that list() now has to understand Vector specially, and so does any function you think of applying to it. Operators are easier (even those like [1:]) because Vector can make its own definition of each through (a finite set of) dunder methods. To make a Vector accept an arbitrarily-named method call like my_strings.upper() to mean:
>>> [i.upper() for i in ['one', 'two', 'three']]
['ONE', 'TWO', 'THREE']
is perhaps just about possible by manipulating __getattribute__ to resolve names matching methods on the underlying type to a callable that loops over the content. Jeff
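A minimal sketch of that pass-through (using __getattr__, the simpler hook that is consulted only for names Vector itself does not define; it assumes the looked-up name is a method on every item):

    class Vector:
        """Toy sketch: forward unknown method calls to each item."""
        def __init__(self, items):
            self._items = list(items)

        def __getattr__(self, name):
            # Only reached for names not found on Vector itself.
            def method(*args, **kwargs):
                return Vector([getattr(item, name)(*args, **kwargs)
                               for item in self._items])
            return method

    v = Vector(['one', 'two', 'three'])
    assert v.upper()._items == ['ONE', 'TWO', 'THREE']
    assert v.replace('o', '0')._items == ['0ne', 'tw0', 'three']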

On 2019-02-02 12:31, David Mertz wrote:
I still haven't seen any examples that aren't already spelled 'map(fun, it)'
The problem with this is the same problem with having a function called "add" instead of an operator. There is little gain when you're applying ONE function, but if you're applying multiple functions you get a thicket of parentheses. I would rather see this: some_list @ str.lower @ tokenize @ remove_stopwords . . .than this: map(remove_stopwords, map(tokenize, map(str.lower, some_list))) That said, I don't necessarily think this needs to be added to the language. Things like pandas already provide this and so much more that it's unclear whether the gain from adding vectorization on its own would be worth it. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Sat, 2 Feb 2019, 21:46 Brendan Barnwell <brenbarn@brenbarn.net> wrote: Yeah, it's called pip install funcoperators:
some_list @ str.lower @ tokenize @ remove_stopwords
→ some_list @ to(str.lower) @ to(tokenize) @ to(remove_stopwords) where: from funcoperators import postfix as to

On Sat, Feb 02, 2019 at 03:31:29PM -0500, David Mertz wrote:
I still haven't seen any examples that aren't already spelled 'map(fun, it)'
You might be right. But then there's nothing that map() can do that couldn't be written as a comprehension, and nothing that you can't do with a comprehension that can't be written as a for-loop. And nothing that can't be written as a for-loop that couldn't be written as a while-loop. The only loop construct we really need is a while loop. And even that is redundant if we had GOTO. It's not about the functionality, but expressibility and readability. This hypothetical vectorization syntax might have a performance advantage as well. My understanding is that Julia is able to efficiently vectorize code, bringing it to within 10% of the speed of unrolled C loops. It may be that CPython cannot do anything that fast, but there may be some opportunities for optimization that we cannot apply to for-loops or comprehensions due to the way they are defined. But primarily it is about the readability of the code:

result = process.(vector .+ sequence) .* items

versus:

# Ouch!
result = list(map(operator.mul,
                  map(process, map(operator.add, vector, sequence)),
                  items))

Here's the comprehension version:

result = [a*b for a, b in zip(
    [process(c) for c in [d+e for d, e in zip(vector, sequence)]],
    items)]

We can improve that comprehension a tiny bit by splitting it into multiple steps:

temp1 = [d+e for d, e in zip(vector, sequence)]
temp2 = [process(c) for c in temp1]
result = [a*b for a, b in zip(temp2, items)]

but none of these are as elegant or readable as the vectorized syntax:

result = process.(vector .+ sequence) .* items

-- Steve

On 2019-02-02 18:11, Steven D'Aprano wrote:
The following reads a little better:

| result = [
|     process(v+s)*i
|     for v, s, i in zip(vector, sequence, items)
| ]

Vector operations will promote the use of data formats that work well with vector operations. So, I would expect data to appear like rows in a table, rather than in the columnar form shown above. Even if columnar form must be dealt with, we can extend our Vector class (or whatever abstraction you are using to enter vector space) to naturally zip() columns.

| Vector(zip(vector, sequence, items))
|     .map(lambda v, s, i: process(v+s)*i)

If we let Vector represent a list of tuples instead of a list of values, we can make construction simpler:

| Vector(vector, sequence, items)
|     .map(lambda v, s, i: process(v+s)*i)

If we have zip() to extend the tuples in the Vector, then we can be verbose to demonstrate how to use columnar data:

| Vector(vector)
|     .zip(sequence)
|     .map(operator.add)
|     .map(process)
|     .zip(items)
|     .map(operator.mul)

This looks verbose, but it is not too far from the vectorized syntax: the Vector() brings us to vector mode, and the two zip()s convert from columnar form. This verbose form may be *better* than the vectorized syntax because the operations are in order, rather than mixing infix and functional forms as the vectorized syntax does. I suggest this discussion include vector operations on (frozen) dicts/objects and (frozen) lists/tuples. Then we can have an interesting discussion about the meaning of group_by, join, and window functions, plus other operations we find in database query languages. I am interested in vector operations. I have situations where I want to perform some conceptually simple operations on a series of not-defined-by-me objects to make a series of conclusions. The calculations can be done succinctly in SQL, but Python makes them difficult. Right now, my solution is to describe the transformations in JSON, and have an interpreter do the processing: https://github.com/klahnakoski/SpotManager/blob/65f2c5743f3a9cfd1363cafec258...
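A rough sketch of that chainable pipeline (a hypothetical Vector holding rows of tuples; zip() appends a column to each row, map() collapses each row through a function):

    import operator

    class Vector:
        """Toy sketch: rows of tuples with chainable zip/map steps."""
        def __init__(self, *columns):
            if len(columns) > 1:
                self.rows = list(zip(*columns))        # Vector(a, b, c)
            else:
                self.rows = [(x,) for x in columns[0]]

        def zip(self, other):
            # Extend each row with the corresponding item of `other`.
            self.rows = [row + (x,) for row, x in zip(self.rows, other)]
            return self

        def map(self, func):
            # Apply func to each row's fields, keeping rows as 1-tuples.
            self.rows = [(func(*row),) for row in self.rows]
            return self

        def unwrap(self):
            return [row[0] for row in self.rows]

    result = (Vector([1, 2, 3])
              .zip([10, 20, 30])
              .map(operator.add)     # elementwise sums: 11, 22, 33
              .zip([2, 2, 2])
              .map(operator.mul)     # elementwise products
              .unwrap())
    assert result == [22, 44, 66]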

On Sun, Feb 10, 2019 at 10:06 AM Kyle Lahnakoski <klahnakoski@mozilla.com> wrote:
I am interested in vector operations. I have situations where I want to
So I want to point out that this was proposed way back when for numpy: MATLAB, for instance, has the usual operators: *, +, etc. meaning "matrix math", and then another set of "itemwise" operators with a dot form: .*, .+, .- for "itemwise" math. numpy, on the other hand, uses the regular operators for itemwise operations (what we're calling vectorized here), and Python lacked an extra set of operators that could be used for matrix math. Adding another full set (.*, .+, etc.) was discussed A LOT and the Python community did not want that. Then someone had the brilliant observation that matrix multiplication was the only one that was really useful and presto! the @ operator was born. Anyway -- just suggesting that a full set of "vectorized" operators will likely see a lot of resistance. And for my part, having the plain operators mean the opposite of what they do for numpy would be unfortunate as well.

perform some conceptually simple operations on a series of not-defined-by-me objects to make a series of conclusions. The calculations can be done succinctly in SQL, but Python makes them difficult.

Bringing real world examples of this would be a good idea for this discussion. I'm inclined to think that something like pandas (or maybe something more generally SQL-like than the number-crunching focus of Pandas) might be better than new syntax for the language -- but only real examples will tell. I don't work with data like that much, but I'm pretty sure I've seen Python packages that attempt to address these use cases (that being querying and processing tabular data). -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On 2019-02-10 18:30, Steven D'Aprano wrote:
Can you post a simplified example of how you would do it in SQL, compared to what you would have to do in standard Python?
Can I do the same in standard Python? If I did, then I would use Pandas: it has groupby, and some primitive joining, and window functions may come naturally because of its imperative nature, but I have not tried it. If I cannot use Pandas, then I would write the groupby and window functions and call them in sequence. This is similar to what you see in my code now: a number of properties whose values get dispatched to Python functions. My code is more complicated only because those structures can be dispatched to translators for databases too. I am certain there are many variations of groupby out in the wild, and it would be nice to have the concept standardized when/if Python has vector operations. Join would be nice to have too, but I do not use it much; dictionary lookup seems to fill that need. Window functions (which are like mini queries) are powerful, but like Pandas, may end up being free because Python is imperative. The code I pointed to has two parts. Here is the first part in SQL (well, an approximation of SQL, since I did not test this and I am rusty). A detailed description is below.

| WITH time_range AS (
|     SELECT
|         num
|     FROM
|         all_integers
|     WHERE
|         num % 60 = 0 AND
|         num >= floor(<<now>>/60/60)*60*60 - <<start_of_history>> AND
|         num < floor(<<now>>/60/60) + 60*60
| )
| SELECT
|     availability_zone,
|     instance_type,
|     time_range.num AS time,
|     MAX(price) AS price,
|     COUNT(1) AS `count`,
|     LAST(current_price) OVER (
|         PARTITION BY
|             availability_zone,
|             instance_type
|         ORDER BY
|             timestamp
|     ) AS current_price
| FROM
|     (
|         SELECT
|             *,
|             COALESCE(LAG(timestamp, 1), <<end_of_day>>) OVER (
|                 PARTITION BY
|                     availability_zone,
|                     instance_type
|                 ORDER BY
|                     timestamp
|             ) AS expire,
|             timestamp - <<expected_uptime>> AS effective
|         FROM
|             prices
|     ) temp
| RIGHT JOIN
|     time_range ON time_range.num BETWEEN temp.effective AND temp.expire
| GROUP BY
|     availability_zone,
|     instance_type,
|     time_range.num AS time
| WHERE
|     expire > floor(<<now>>/60/60)*60*60 - <<start_of_history>>

Now, for the same, with description: This WITH clause is not real SQL; it is meant to stand in for a temporary table that contains all hours of the time range I am interested in. Definitely easier to do in Python. All time is assumed to be in seconds since epoch.

| WITH time_range AS (
|     SELECT
|         num
|     FROM
|         all_integers
|     WHERE
|         num % 60 = 0 AND
|         num >= floor(<<now>>/60/60)*60*60 - <<start_of_history>> AND
|         num < floor(<<now>>/60/60) + 60*60
| )

We will select the three dimensions we are interested in (see GROUP BY below), along with the MAX price we have seen in the given hour, and the current_price for any (availability_zone, instance_type) pair.

| SELECT
|     availability_zone,
|     instance_type,
|     time_range.num AS time,
|     MAX(price) AS price,
|     COUNT(1) AS `count`,
|     LAST(current_price) OVER (
|         PARTITION BY
|             availability_zone,
|             instance_type
|         ORDER BY
|             timestamp
|     ) AS current_price
| FROM

The prices coming from Amazon only have a timestamp for when that price is effective; so this sub-query adds an `effective` start time, and an `expire` time, so the rest of the query need only deal with ranges. The timestamp - <<expected_uptime>> is putting the start time back further into the past so the past can "see" future pricing.
|     (
|         SELECT
|             *,
|             COALESCE(LAG(timestamp, 1), <<end_of_day>>) OVER (
|                 PARTITION BY
|                     availability_zone,
|                     instance_type
|                 ORDER BY
|                     timestamp
|             ) AS expire,
|             timestamp - <<expected_uptime>> AS effective
|         FROM
|             prices
|     ) temp

This is the point where we use the time_range from above and find every hour a price is effective. This could have been a sub-query, but I am rusty at SQL.

| RIGHT JOIN
|     time_range ON time_range.num BETWEEN temp.effective AND temp.expire

These are the three dimensions we are interested in:

| GROUP BY
|     availability_zone,
|     instance_type,
|     time_range.num AS time

and we are only interested in calculating back to a certain point:

| WHERE
|     expire > floor(<<now>>/60/60)*60*60 - <<start_of_history>>

Do take a look in the fairly recent archives of this list for a big discussion of groupby -- it kind of petered out but there were a couple options on the table. -CHB On Sun, Feb 10, 2019 at 9:23 PM Kyle Lahnakoski <klahnakoski@mozilla.com> wrote:
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

CHB, Thank you! I had forgotten that discussion at the beginning of July [1]. Googling the list [2] also shows mention of PythonQL [3], which may point to use cases that can guide a Vectorization idea. [1] groupby discussion - https://mail.python.org/pipermail/python-ideas/2018-July/051786.html [2] google search - https://www.google.ca/search?q=group+by+site%3Ahttps%3A%2F%2Fmail.python.org%2Fpipermail%2Fpython-ideas%2F&oq=group+by+site%3Ahttps%3A%2F%2Fmail.python.org%2Fpipermail%2Fpython-ideas%2F [3] PythonQL - https://github.com/pythonql/pythonql On 2019-02-11 10:43, Christopher Barker wrote:

On Sat, Feb 02, 2019 at 07:58:34PM +0000, Jeff Allen wrote: [MRAB asked]
OK, here's another one: if you use 'list(...)' on a vector, does it apply to the vector itself or its members?
With the Julia vectorization operator, there is no puzzle there. list(vector) applies list to the vector itself. list.(vector) applies list to each component of vector.
The problem, of course, is that list() now has to understand Vector specially, and so does any function you think of applying to it.
*The whole point* of the Julia syntax is that no function has to understand any sequence. When we write: for item in vector: func(item) func only has to understand item, not vector. The same applies to the Julia syntax func.(vector) There's no puzzle here, no tricky cases, because it is completely deterministic and explicit: func(x) always calls func with x as argument, func.(x) always calls func with each of x's items as arguments.
With the Julia syntax, there is no need for vectors (or lists, or generators, or tuples, or sets, or any other iterator...) to accept arbitrary method calls. So long as vectors can be iterated over, func.(vector) will work.

Beyond possibly saving 3-5 characters, I continue not to see anything different from map in this discussion. list(vector) applies list to the vector itself.
list.(vector) applies list to each component of vector.
In Python:

list(seq)        applies list to the sequence itself
map(list, seq)   applies list to each component of seq

In terms of other examples:

map(str.upper, seq)                    uppercases each item
map(operator.attrgetter('name'), seq)  gets the name attribute of each item
map(lambda a: a*2, seq)                doubles each item
(lambda a: a*2)(seq)                   doubles the sequence itself

... The last two might enjoy a named function 'double'.

On Sat, Feb 02, 2019 at 06:08:24PM -0500, David Mertz wrote:
Now compose those operations: ((seq .* 2)..name)..upper() versus # Gag me with a spoon! map(str.upper, map(operator.attrgetter('name'), map(lambda a: a*2, seq))) The comprehension version isn't awful: [(a*2).name.upper() for a in seq] but not all vectorized operations can be written as a chain of calls on a single sequence. There are still some open issues that I don't have good answers for. Consider ``x .+ y``. In Julia, I think that the compiler has enough type information to distinguish between the array plus scalar and array plus array cases, but I don't think Python will have that. So possibly there will still be some runtime information needed to make this work. The dot arguably fails the "syntax should not look like grit on Tim's monitor" test (although attribute access already fails that test). I think the double-dot syntax looks like a typo, which is unfortunate. -- Steve

On Sun, Feb 3, 2019 at 10:31 AM Steven D'Aprano <steve@pearwood.info> wrote:
Agreed, so I would like to see a different spelling of it. Pike has an automap syntax that looks a lot like subscripting: numbers[*] * 2 Borrowing that syntax would pass the grit test, and it currently isn't valid syntax. ChrisA

On Sat, Feb 2, 2019 at 3:31 PM Steven D'Aprano <steve@pearwood.info> wrote:
If they are strictly parallel (no dot products) and you know when writing the code which variables hold vectors, then (denoting the vector variables by v1, ..., vn) you can always write [(expr with x1, ..., xn substituted for v1, ..., vn) for x1, ..., xn in zip(v1, ..., vn)] which seems not much worse than the auto-vectorized version (with or without special syntax). Haskell (GHC) has parallel list comprehension syntax ( https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts...) so you don't have to explicitly call zip. I wouldn't mind having that in Python but I don't know what the syntax would be.

On 1/31/2019 12:51 PM, Chris Barker via Python-ideas wrote:
To me, thinking of strings as being in lists is Python 1 thinking. Interactive applications work with *streams* of input strings.
I think an iterator (stream) of strings would be better. Here is a start.

class StringIt:
    """Iterator of strings.

    A StringIt wraps an iterator of strings to provide methods that
    apply the corresponding string method to each string in the
    iterator.  StringIt methods do not enforce the positional-only
    restrictions of some string methods.  The join method reverses the
    order of the arguments.  Except for join(joiner), which returns a
    single string, the return values are iterators of the return value
    of the string methods.  An iterator of strings is returned as a
    StringIt so that further methods can be applied.
    """

    def __init__(self, objects, nogen=False):
        """Return a wrapped iterator of strings.

        Objects must be an iterator of strings or an iterable of
        objects with good __str__ methods.  All builtin objects have
        good __str__ methods and all non-buggy user-defined objects
        should.  When *objects* is an iterator of strings, passing
        nogen=True avoids a layer of wrapping by claiming that str
        calls are not needed.  StringIt methods that return a StringIt
        do this.  An iterable of strings, such as ['a', 'b', 'c'], can
        be turned into an iterator with iter(iterable).  Users who pass
        nogen=True do so at their own risk because checking the claim
        would empty the iterable.
        """
        if not hasattr(objects, '__iter__'):
            raise ValueError('objects is not an iterable')
        if nogen and not hasattr(objects, '__next__'):
            raise ValueError('objects is not an iterator')
        if nogen:
            self.it = objects
        else:
            self.it = (str(ob) for ob in objects)

    def __iter__(self):
        return self.it.__iter__()

    def __next__(self):
        return self.it.__next__()

    def upper(self):
        return StringIt((s.upper() for s in self.it), nogen=True)

    def count(self, sub, start=0, end=None):
        return (s.count(sub, start, end or len(s)) for s in self.it)

    def join(self, joiner):
        return joiner.join(self.it)

for si, out in (
        (StringIt(iter(('a', 'b', 'c')), nogen=True), ['a', 'b', 'c']),
        (StringIt((1, 2, 3)), ['1', '2', '3']),
        (StringIt((1, 2, 3)).count('1'), [1, 0, 0]),
        (StringIt(('a', 'b', 'c')).upper(), ['A', 'B', 'C']),
        ):
    assert list(si) == out

assert StringIt(('a', 'b', 'c')).upper().join('-') == 'A-B-C'
# asserts all pass

-- Terry Jan Reedy

I really don't get the "two different signatures" concern. The two functions do different things; why would we expect them to automatically share a signature? There are a zillion different open() functions or methods in the standard library, and far more in third party software. They each have various different signatures and functionality because they "open" different things. So what? Use the interface of the function you are using, not of something else that happens to share a name (in a different namespace). On Wed, Jan 30, 2019, 5:06 AM Jamesie Pic <jpic@yourlabs.org> wrote:

I'm just saying assembling strings is
a common programing task and that we have two different methods with the same name and inconsistent signatures
No, we don’t — one is for assembling paths, one for generic strings. And in recent versions, there is a new, totally different way to assemble paths. Also: the primary use cases are different — when I use os.path.join(), I usually have the components in variables or literals, so the *args convention is most natural. When I am assembling text with str.join(), I usually have the parts in an iterable, so that is the most natural. And besides, Python (necessarily) has some inconsistencies — we don’t need to correct them all. There have been multiple changes to str.join() discussed in this thread, mostly orthogonal to each other. If anyone wants to move them forward, I suggest you be clear about which you are advocating for.

1) That there be a join() method on lists (or sequences) — frankly, I think that’s a non-starter; I wouldn’t waste any more time on it.

2) That str.join() take multiple positional arguments to join (similar to os.path.join) — this could probably be added without much disruption, so if you really want it, make your case. I personally don’t think it’s worth it — it would make the API more confusing, with little gain. (See the sketch after this message for what the dual signature looks like.)

3) That str.join() (or some new method/function) “stringify” (probably by calling str()) the items so that non-strings could be joined in one call — we’ve had a fair bit of discussion on this one, and given Python’s strong typing and the many ways one might want to convert an arbitrary type to a string, this seems like a bad idea. Particularly bad to add to str.join(). (Or was “stringify” supposed to only do the string conversion, not the joining? If so, even more pointless.)

Any others? -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
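For a feel of what option 2 would look like, here is a hedged free-function sketch (the name and the dual-signature dispatch are illustrative only, not a proposed str API):

    def join(sep, first, *rest):
        """Join strings, accepting either an iterable or *args."""
        if rest:
            # os.path.join style: join the positional arguments.
            return sep.join((first,) + rest)
        # str.join style: first is an iterable of strings.
        return sep.join(first)

    assert join('/', 'some', 'path') == 'some/path'
    assert join('_', ['cancel', 'order']) == 'cancel_order'

The dual dispatch is exactly the kind of messiness the min/max precedent discussed later in the thread brings along.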

Thank you Christopher for the heads up. Using paths as an example was really poor; it distracts readers from the actual problem, which is assembling a human-readable string. Maybe pointless, but on the basis of 30 seconds to run my test suite, see my failure, correct it, and run it again to be back where I would have been without the mistake, 300 workdays a year, that spans 25 hours over 10 years, and I have already strategized my R&D to capitalize on Python for another 10 years. So spending less than 25 hours on this would seem profitable, despite how pointless it is to actual programmers. Anyway, at this point the proposal could also look like str.joinmap(*args, key=str). But I don't know; I can iterate on mapjoin for a while and open a new topic when I stabilize it. Meanwhile, I'm always happy to read y'all, so feel free to keep posting :P Have a great day

On 1/28/2019 8:40 PM, Jamesie Pic wrote:
0. os.path.join takes *args
Since at least 1 path is required, the signature is join(path, *paths). I presume that this is the Python version of the Unix version of the system call that it wraps. The hidden argument is os.sep. It is equivalent to os.sep.join((path,) + paths) (though one would not write it this way).
1. str.join takes a list argument,
This premise behind the (repeated) request is false. str.join's arguments are a string (the joiner) and an *iterable of strings*, which is an abstract subclass of the abstract concept 'iterable'. And only a small fraction of lists are lists of strings and therefore iterables of strings. -- Terry Jan Reedy

On 2019-01-29 23:30, Terry Reedy wrote:
One of the examples given was writing:
'/'.join('some', 'path')
To me, this suggests that what the OP _really_ wants is for str.join to accept multiple arguments, much as os.path.join does. I thought that there would be a problem with that because currently the single argument is an iterable, and you wouldn't want to iterate the first argument of '/'.join('some', 'path'). However, both min and max will accept either a single argument that's iterated over or multiple arguments that are not, so there's a precedent there.
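That precedent, for the record:

    assert min([3, 1, 2]) == 1   # one iterable argument: iterated over
    assert min(3, 1, 2) == 1     # several arguments: compared directly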

On 1/29/2019 7:12 PM, MRAB wrote:
I have done things like this in private code, but it makes for messy signatures. The doc pretends that min has two signatures, given in the docstring:

min(iterable, *[, default=obj, key=func]) -> value
min(arg1, arg2, *args, *[, key=func]) -> value

I believe that the actual signature is the uninformative min(*args, **kwargs). The arg form, without key, is the original. If min were being written today, I don't think the multiple-argument form would be included. -- Terry Jan Reedy
participants (31)
- Adrien Ricocotam
- Alex Shafer
- Alex Walters
- Anders Hovmöller
- Barry Scott
- Ben Rudiak-Gould
- Brendan Barnwell
- Chris Angelico
- Chris Barker
- Chris Barker - NOAA Federal
- Christopher Barker
- David Allemang
- David Mertz
- Eric V. Smith
- Greg Ewing
- Henry Chen
- James Lu
- Jamesie Pic
- Jeff Allen
- Jimmy Girardet
- Jonathan Fine
- Kirill Balunov
- Kyle Lahnakoski
- Marcos Eliziario
- MRAB
- Robert Vanden Eynde
- Ronald Oussoren
- Ronie Martinez
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy