More alternate constructors for builtin types

Constructors for builtin types are too overloaded. For example:

int constructor:

* Converts a number (with truncation) to an integer.
* Parses a human-readable representation of an integer from a string or bytes-like object. An optional base can be specified. Note that there is an alternate constructor for converting bytes to int in a different way: int.from_bytes().
* Without arguments, returns 0.

str constructor:

* Converts an object to a human-readable representation.
* Decodes a bytes-like object using the specified encoding.
* Without arguments, returns an empty string.

bytes constructor:

* Converts a bytes-like object to a bytes object.
* Creates a bytes object from an iterable of integers.
* Encodes a string using the specified encoding. The same as str.encode().
* Creates a bytes object of the specified length consisting of zeros. Equal to b'\0' * n.

dict constructor:

* Creates a dict from a mapping.
* Creates a dict from an iterable of key-value pairs.
* Without arguments, returns an empty dict.

The problem with supporting many different types of input is that we can get a wrong result instead of an error, or that we can get an error later, far from the place where we handle the input. For example, if our function should accept an arbitrary bytes-like object, and we call bytes() on the argument because we need the length and indexing, and we pass an integer instead, we will get an unexpected result. If our function expects a number, and we call int() on the argument, we may prefer to get an error if we pass a string.

I suggest adding limited versions of the constructors as named constructors:

* int.parse() -- parses a string or bytes to an integer. I do not know whether separate int.parsestr() and int.parsebytes() are needed. I think round(), math.trunc(), math.floor() and math.ceil() are enough for lossy conversion of numbers to integers. operator.index() should be used for lossless conversion.
* bytes.frombuffer() -- accepts only bytes-like objects.
* bytes.fromvalues() -- accepts only an iterable of integers.
* dict.frommapping() -- accepts only a mapping, but not key-value pairs. Uses __iter__() instead of keys() for iterating keys, and can take an optional iterable of keys. Equal to {k: m[k] for k in m} or {k: m[k] for k in keys}.
* dict.fromitems() -- accepts only key-value pairs. Equal to {k: v for k, v in iterable}.
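To make the proposal above concrete, here is a minimal sketch of the suggested constructors written as plain functions. All five function names are hypothetical stand-ins (int.parse(), bytes.frombuffer(), bytes.fromvalues(), dict.frommapping() and dict.fromitems() do not exist in Python), and the specific type checks are assumptions about the intended semantics, not a reference implementation.

```python
import operator
from collections.abc import Mapping


def int_parse(text, base=10):
    """Hypothetical int.parse(): parse str or bytes, reject numbers."""
    if not isinstance(text, (str, bytes, bytearray)):
        raise TypeError(f"expected str or bytes, got {type(text).__name__}")
    return int(text, base)


def bytes_frombuffer(obj):
    """Hypothetical bytes.frombuffer(): accept only bytes-like objects."""
    # memoryview() raises TypeError for int, str, list and other
    # objects that do not support the buffer protocol.
    with memoryview(obj) as view:
        return view.tobytes()


def bytes_fromvalues(values):
    """Hypothetical bytes.fromvalues(): accept only an iterable of integers."""
    # Iterating an int raises TypeError, so bytes_fromvalues(3) cannot
    # silently produce three zero bytes the way bytes(3) does.
    return bytes([operator.index(v) for v in values])


def dict_frommapping(mapping, keys=None):
    """Hypothetical dict.frommapping(): accept only a mapping."""
    if not isinstance(mapping, Mapping):
        raise TypeError(f"expected a mapping, got {type(mapping).__name__}")
    if keys is None:
        keys = mapping  # iterate with __iter__(), not keys()
    return {k: mapping[k] for k in keys}


def dict_fromitems(iterable):
    """Hypothetical dict.fromitems(): accept only key-value pairs."""
    return {k: v for k, v in iterable}
```

With these helpers, passing an integer where a buffer is expected fails at the call site: bytes_frombuffer(3) raises TypeError, whereas bytes(3) quietly returns b'\x00\x00\x00'.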

20-25 years ago this might have been a good idea. Unfortunately there's so much code (including well-publicized example code) that I'm not sure it's a good use of anyone's time to try and fix this. Exception: I am often in need of a constructor for a bytes object from an integer using the decimal representation, e.g. bytes.fromint(42) == b"42". (Especially when migrating code from Python 2, where I've found a lot of str(n) that cannot be translated to bytes(n) but must instead be written as b"%d" % n, which is ugly and unintuitive when coming from Python 2.) On Mon, May 6, 2019 at 2:50 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

The other bytes object constructor I often find myself in need of, without being able to remember how to do it, is creating a length-1 bytes object from a known ordinal. The "obvious":

    someordinal = ...
    bytes(someordinal)

creates a zeroed bytes of that length, which is clearly wrong. I eventually remember that wrapping it in a tuple (or list) before passing it to the bytes constructor works, but it's far from intuitive:

    bytes((someordinal,))

Unfortunately, the most obvious name for the alternate constructor to fill this niche is *also* bytes.fromint, which conflicts with Guido's use case. On Mon, May 6, 2019 at 2:40 PM Guido van Rossum <guido@python.org> wrote:

06.05.19 17:49, Guido van Rossum wrote:
I do not propose to change the current behavior. I propose to add new named constructors. In most cases default constructors can be used, but in cases when we use a type check or different tricks to limit the type of the argument, we could use named constructors.

Current named constructors:

* dict.fromkeys()
* int.from_bytes()
* float.fromhex()
* bytes.fromhex()
* bytearray.fromhex()
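For reference, the existing named constructors listed above can be exercised as follows. This is only an illustration of current behaviour; the variable names are arbitrary.

```python
# Each call below uses a named constructor that already exists in Python 3.

# dict.fromkeys: one dict entry per key, all sharing the same value.
d = dict.fromkeys("ab", 0)

# int.from_bytes: interpret raw bytes as a big-endian binary integer.
n = int.from_bytes(b"\x01\x00", "big")

# float.fromhex: parse a hexadecimal floating-point literal (1.5 * 2**1).
x = float.fromhex("0x1.8p1")

# bytes.fromhex / bytearray.fromhex: two hex digits per byte.
raw = bytes.fromhex("deadbeef")
buf = bytearray.fromhex("00ff")

print(d, n, x, raw, buf)
```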

On Mon, May 6, 2019 at 11:14 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
Understood. My point is that we won't be able to remove the original behavior, so we'd end up with two ways to do it. :-( -- --Guido van Rossum (python.org/~guido)

On Mon, May 6, 2019 at 7:29 PM Guido van Rossum <guido@python.org> wrote:
With all respect, I disagree. There are ways to evolve Python, such as deprecation policies, which have proven to be effective. There are ways to monitor the adoption of current features on PyPI to see whether it is safe to remove deprecated things. I'd understand if some feature were not accepted into Python because it is kind of bad. What I refuse to accept as a user is that behavior considered bad and ready to be improved is preserved through time just because it is there already. Please don't get me wrong. I totally agree that this will bring up two ways of performing the same thing, but we can deprecate one of them, keep track of the adoption of the new way, and finally get Python to a better state if that is really desired.

On Mon, May 6, 2019 at 7:48 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 6 May 2019 19:39:39 +0300 Serge Matveenko <s@matveenko.ru> wrote:
I have no intention of starting a long hypothetical discussion here, really. There are a lot of things which were broken at some point, even apart from the 2to3 crusade. Not to mention: the `except` syntax, the restriction of the `async` keyword, u-strings back and forth. Usually it doesn't matter much why one cannot upgrade the interpreter to the next version. Often it just stops working and forces the user to dig into a dependency mess. I agree that there is no hope of making a change when there is no intention to make it. If this change is needed, there are ways to achieve it. The path could be almost infinite, but it surely cannot be walked if nobody is willing to take it.

Mon, 6 May 2019 at 19:48, Antoine Pitrou <solipsis@pitrou.net> wrote:
Especially the `bytes` constructor: `bytes(int)` is so convenient and obvious that in the entire Python standard library it is used only in tests of its own existence: 'cpython-master/Lib/test/test_bytes.py' on line 1023 and 'cpython-master/Lib/test/test_bytes.py' on line 1234. Currently, `bytes` is the most ambiguous resident in Python 3. It seems that in the final version (which is very different from the initial idea), it was created to surprise:

1. Those wishing to take it as a string in Python 2.
2. Those who wish to perceive it as an array of integers from 0 to 255.
3. And those who want to see `bytes` as something else...

with kind regards, -gdg

On 06May2019 18:47, Antoine Pitrou <solipsis@pitrou.net> wrote:
I don't find that compelling. I for one would welcome a small suite of unambiguous factories that can't be misused. bytes() can easily be misused by accident, introducing bugs and requiring debug work. I'd be very happy for my own future code to be able to take advantage of hard-to-misuse constructors. Of course we could all write tiny factories for these modes but (a) we'd all have to write and debug them and (b) they'd all have different spellings and signatures and (c) everyone else would not have easy access to them (yes, PyPI notwithstanding: finding a specific module in there can be haphazard) and (d) the bytes type is a natural place to have these constructors/factories. All these arguments apply to the other types too. Cheers, Cameron Simpson <cs@cskk.id.au>

On Tue, May 07, 2019 at 09:54:03AM +1000, Cameron Simpson wrote:
There is a difference between *adding* new constructor methods, and what Antoine is saying: that we cannot realistically remove existing uses of the current constructors. I think that Antoine is right: short of another major 2to3 backwards-incompatible version, the benefit of actually removing any of the built-in constructor behaviours is too small and the cost is too great. So I think removal of existing behaviour should be off the table. Or at least, taken on a case-by-case basis. Propose a specific API you want to remove, and we'll discuss that specific API. As for adding *new* constructors:
Probably because everyone will want them to do something different. We've already seen two different semantics for the same desired constructor call:

    bytes(10) -> b'10'    # like str(), but returning bytes
    bytes(10) -> b'\x0A'  # like ord(), but returning a byte

That suggests a possible pair of constructors:

    bytes.from_int(n) -> equivalent to b'%d' % n
    bytes.ord(n) -> equivalent to bytes((n,))

The proposal in this thread seems to me to be a blanket call to add new constructors everywhere, and I don't think that's appropriate. I think that each proposed new constructor should live or die on its own merits. The two above for bytes seem like simple, obvious APIs that do something useful which is otherwise a small pain point. Both are syntactic sugar for something otherwise ugly or hard to discover. I think that, if somebody is willing to do the work (it can't be me, sorry) adding two new class methods to bytes for the above two cases would be a nett win, and they should be minor enough that it doesn't need a PEP. Thoughts? -- Steven
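The two semantics discussed above can be sketched as small helper functions. The names bytes_from_int and byte_from_ordinal are hypothetical; neither they nor bytes.from_int() and bytes.ord() exist in Python. The bodies are taken directly from the equivalences given in the message.

```python
def bytes_from_int(n):
    """Decimal ASCII representation of n as bytes (hypothetical helper)."""
    return b"%d" % n


def byte_from_ordinal(n):
    """Length-1 bytes object whose single element is n (hypothetical helper)."""
    return bytes((n,))


# What the existing constructor does today, for comparison:
zeros = bytes(10)                 # ten NUL bytes, neither of the above
decimal = bytes_from_int(10)      # the digits "10" as ASCII bytes
single = byte_from_ordinal(10)    # one byte with value 10 (a newline)

print(zeros, decimal, single)
```

Note that byte_from_ordinal raises ValueError for values outside range(256), because bytes() itself enforces that range on iterables of integers.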

bytes.ord is a bad name, given the behavior would be the opposite of ord (ord converts length one str to int, not int to length one str). PEP467 (currently deferred to 3.9 or later) does have proposals for this case, either bytes.byte (old proposal: https://legacy.python.org/dev/peps/pep-0467/#addition-of-explicit-single-byt... ) or bytes.fromord/a top level built-in named bchr in the new version of the PEP ( https://www.python.org/dev/peps/pep-0467/#addition-of-bchr-function-and-expl... ). So if that's the way we want to go, we could just push forward on PEP467. It's only a subset of Serhiy's broader proposal, though admittedly one of the cases where the existing design is unusually weak and improvements would better fill niches currently occupied by non-obvious solutions. On Tue, May 7, 2019 at 12:23 AM Steven D'Aprano <steve@pearwood.info> wrote:

Oddly, it seems everyone in this thread thinks it would be "Better" to have a bunch of constructors, rather than the overloading, if only we didn't have backward compatibility to worry about. I disagree -- these efficiencies are what I LIKE about Python: int() tries to turn whatever you pass into it into an integer -- a float, a string, whatever. Do people really think that it would be better if we all had to write:

    int.parse("123")

rather than just

    int("123")

? I know my newbie students wouldn't like that.

The one exception to all this in my mind is bytes -- which has come up the most in this discussion. But that's not because the bytes constructor is overloaded but rather that the bytes object is overloaded: is it storage for binary data, which its name seems to imply? Or is it an old-style text type? It's probably too late to do anything about that, but if we do want to "clean up" something, that would be it.

"""
str(n) that cannot be translated to bytes(n) but must instead be written as b"%d" % n,
"""

If bytes were really, well, bytes, then that would be:

    bytes((ord(c) for c in str(n)))

or

    str(n).encode('ascii')

Ugly, I know. I kinda wish we'd simply kept a single_byte_string object (essentially a py2 string) as its own thing, separate from a bytes object, which could then keep its nominal meaning -- a sequence of individual arbitrary bytes. Oh well, lots of water under that bridge..... -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Tue, 7 May 2019 at 06:42, Christopher Barker <pythonchb@gmail.com> wrote:
It depends whether you want to have an exception if the wrong type of input is encountered. The alternative to x = int.parse(s) might instead be

    if not isinstance(s, str):
        raise TypeError('need a string')
    x = int(s)

One of the things I like about Python compared to e.g. C is that it often tells me when something is going wrong in some code rather than silently doing the wrong thing. The int function accepts all kinds of things, e.g.

    >>> int('๒')
    2

However in my own code if that character ever got passed to int then it would definitely indicate either a bug in the code or data corruption so I'd rather have an exception. Admittedly the non-ASCII unicode digit example is not one that has actually caused me a problem but what I have had a problem with is floats. Given that a user of my code can pass in a float in place of a string, the fact that int(1.5) gives 1 can lead to bugs or confusion. For precisely these reasons there is the __index__ method to get around this for the situations where you want to convert a non-int integer type to int (without accepting strings or floats). There is math.trunc for the situation where you are not parsing strings and actually do want to accept non-integer numbers and truncate them. There is no function to parse a decimal integer string without also accepting floats though. -- Oscar
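The three conversion paths described above can be compared side by side. The helper parse_int_str is a hypothetical name wrapping the isinstance() guard from the message; operator.index() and math.trunc() are the existing standard-library functions it mentions.

```python
import math
import operator


def parse_int_str(s):
    """Parse an integer from a str only (hypothetical helper)."""
    if not isinstance(s, str):
        raise TypeError("need a string")
    return int(s)


# int() silently truncates a float that was passed by mistake.
truncated = int(1.5)

# The guarded helper reports the mistake instead.
try:
    parse_int_str(1.5)
except TypeError as exc:
    guard_message = str(exc)

# operator.index() converts integer-like objects losslessly and
# rejects both floats and strings.
try:
    operator.index(1.5)
except TypeError:
    index_rejects_float = True

# math.trunc() is the explicit way to ask for truncation of a number.
explicit = math.trunc(-1.5)

print(truncated, guard_message, index_rejects_float, explicit)
```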

On Tue, May 07, 2019 at 07:17:00PM +0100, Oscar Benjamin wrote:
If you ever get Thai users who would like to enter numbers in their own language, they may be a tad annoyed that you consider that a bug.
Admittedly the non-ASCII unicode digit example is not one that has actually caused me a problem
I don't see why it should cause a problem. An int is an int, regardless of how it was spelled before conversion. You probably don't lose any sleep over the relatively high probability of a single flipped bit changing a '7' digit into a '6' digit, say. It seems strange to worry about the enormously less likely data corruption which just so happens to result in valid non-ASCII digits. If your application can support user-data of "123" for the int 123, why would it matter if the user spelled it '๑๒๓' instead? You're not obligated to output Thai digits if you don't have many Thai users, but it just seems mean to reject Thai input if It Just Works.
So write a helper function and use that. Or specify a base: int(string, 0) will support the usual Python formats (e.g. any of '123', '0x7b', '0o173', '0b1111011') without converting non-strings. If for some reason you only want to support a single base, say, base 7, you can specify a non-zero argument as the base:

    py> int('234', 7)
    123

But user input seems like a good place to apply Postel's Law: "Be conservative in what you output, be liberal in what you accept." It shouldn't be any skin off your nose to accept '0x7b' or '๑๒๓' as well as '123'. -- Steven
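The behaviour of int() with an explicit base, as described above, can be checked with a few lines. Only the builtin int() is used; the variable names are arbitrary.

```python
# With base 0, int() interprets the prefix the way Python source code does.
literals = ["123", "0x7b", "0o173", "0b1111011"]
values = [int(text, 0) for text in literals]

# An explicit non-zero base restricts parsing to that base.
base7 = int("234", 7)

# Passing an explicit base makes int() refuse non-string input.
try:
    int(1.5, 0)
except TypeError:
    rejects_float = True

# Non-ASCII decimal digits are accepted by int() as well.
thai = int("๑๒๓")

print(values, base7, rejects_float, thai)
```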

On Tue, May 07, 2019 at 12:57:49AM +0000, Josh Rosenberg wrote:
bytes.ord is a bad name, given the behavior would be the opposite of ord (ord converts length one str to int, not int to length one str).
D'oh! I mean, well done, that was a test and you passed! *wink* Sorry for the confusion, yes, I got ord/chr confused in my head.
My personal sense is that generating a single byte from an ordinal is not common enough to justify a new builtin, but I don't have a strong opposition to the concept as such. But I am very averse to the suggested name. Not only is "bchr" inelegant, but now that regular strings are Unicode, we really ought to avoid perpetuating the idea that single bytes are characters and text strings are bytes more than we really have to. -- Steven

On Wed, May 08, 2019 at 12:18:34AM +1200, Greg Ewing wrote:
Don't be shy, give us your suggested name. I'm not married to the suggestion of "from_int"; if somebody with better skills at naming than me can come up with a short but descriptive name, that would be great. But my other suggestion was going to be:

    bytes.return_bytes_representing_the_integer_input_as_an_ascii_bytestring()

*wink*

But seriously, it's nice when a method or function name is completely self-explanatory, and some day I hope to find one which is. But until then, names are more like mnemonics:

- dict.fromkeys returns a dict created from keys in what way?
- str.encode encodes a string into what kind of data?
- bytes.fromhex does what precisely?
- what on earth is a glob?

We say "import spam", not "import module spam and bind it to the name spam". We use "import" as a short name that, once we've learned what it does, reminds us what it does. It doesn't have to document every bit of functionality, just enough to associate the name with the semantics.

In this case, the problem we are trying to solve is this:

- convert an int like 12345 into a byte-string like b'12345'.

Imagine you're looking at the dir(bytes) to find an appropriate method that perhaps does what you want, and you see methods like:

- count
- decode
- fromhex
- isdigit
- split

etc, plus the suggested "fromint". Which method are you going to associate with the problem you're trying to solve? -- Steven

On 08May2019 00:18, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
How about from_size(n) and from_ord(n)? The former to make a NUL filled bytes of size n, the latter to make a single byte bytes with element 0 having value n. Preemptively one could argue for from_size having an optional fill value, default 0. I think I'm -0 on that because I can't imagine a likely use case and it would slightly slow down every use. I think I'd argue for a from_iter instead, to support an arbitrary fill pattern/sequence. That avoids slowing the common case and provides flexibility at the same time. Cheers, Cameron Simpson <cs@cskk.id.au>

On Wed, May 08, 2019 at 08:14:50AM +1000, Cameron Simpson wrote:
We already have from_size, it's just spelled bytes(n). I don't dislike from_ord as a name, although perhaps it ought to be fromord to match fromhex.
Wanting to fill a bytes object with something other than zeroes is probably uncommon. But for those who need it b'\xFF'*n should do the job.
We already have that, it's spelled bytes(iterable).

    py> bytes(range(5))
    b'\x00\x01\x02\x03\x04'

-- Steven
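The existing spellings mentioned in this exchange can be shown together. Everything below uses current Python behaviour; the repeating-pattern example with itertools is an added illustration of "an arbitrary fill pattern", not something proposed in the thread.

```python
import itertools

n = 5

# "from_size": a NUL-filled bytes object of length n.
zero_filled = bytes(n)

# A constant fill other than zero: repeat a length-1 bytes literal.
ff_filled = b"\xff" * n

# "from_iter": bytes() already accepts any iterable of ints in range(256).
counted = bytes(range(n))

# An arbitrary repeating fill pattern, truncated to n bytes.
pattern = bytes(itertools.islice(itertools.cycle(b"\xde\xad"), n))

print(zero_filled, ff_filled, counted, pattern)
```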

On Mon, May 06, 2019 at 07:39:39PM +0300, Serge Matveenko wrote:
Monitoring adoption on PyPI tells you absolutely nothing about the millions of lines of Python code which is not on PyPI. Not every Python programmer writes open source software available to the public. They are not second-class citizens -- we have a responsibility to them too, and that's why we don't break backwards-compatibility lightly.
Oh, you "refuse to accept" it do you? How nice. Compared to languages like C that have ISO standards, Python's attitude towards removing old features might be seen as recklessly fast. You should read this to get another perspective: https://www.curiousefficiency.org/posts/2011/04/musings-on-culture-of-python...
Everything you say there is true in principle. That doesn't mean it will happen in practice. For what it's worth, I'm less concerned than Guido is about having two ways to get the same result. There's two (or three, or ...) ways to do many things in Python and "Only One Way To Do It" has never been part of the Zen of Python, it was a put-down from Perl users misrepresenting Python's philosophy. But duplication has costs: more to learn, more decisions to make, more code, more tests, more documentation. Duplicated APIs can become bloat, and bloat is not free. If a function or method doesn't bring *at least* enough benefit to outweigh the costs, then it is harmful. Code churn is not free. Forcing people to change code that works because we felt like breaking it should not be done lightly. Keeping old, suboptimal APIs versus forcing code churn is a matter of balance, and choosing the right balance is not a black and white decision to make. None of this is to say that we will never decide to deprecate or remove a feature. But it isn't clear that this proposal brings enough benefit to justify such deprecations. -- Steven

One pain point in Python that constantly rears its head (at least for me) is this:

    def eat_iterable(yummy_iterable):
        if isinstance(yummy_iterable, str):
            raise TypeError("BLECH! Strings are ATOMIC in this context, mmkay??")
        tasty_list = list(yummy_iterable)
        # digest list...

The design decision to make strs sequence-like was definitely a good one with a huge number of advantages. But we all know that sometimes, treating strs just like all other kinds of sequences is very inconvenient and isn't really what we want. This leads to isinstance() checking, which feels very non-python every time I do it. It would be kind of nice if list, tuple, and set had alt constructors that disallowed things that aren't REALLY sequences of individual atomic items -- like strs, and perhaps dicts as well (which is another instance check often needed to guard against errors). Perhaps:

    list.atomics
    tuple.atomics
    set.atomics

But naming things is hard and I'm terrible at it.
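One way to picture the suggested list.atomics is as a standalone function. The name list_atomics is hypothetical, and the exact set of rejected types (str and any Mapping) is an assumption drawn from the "strs, and perhaps dicts" wording above.

```python
from collections.abc import Mapping


def list_atomics(iterable):
    """Build a list, refusing str and mapping arguments (hypothetical helper)."""
    if isinstance(iterable, (str, Mapping)):
        raise TypeError(
            f"{type(iterable).__name__} is treated as atomic, not as a "
            "collection of items"
        )
    return list(iterable)


items = list_atomics(("spam", "eggs"))

# A lone string is rejected instead of being split into characters.
try:
    list_atomics("spam")
except TypeError:
    rejected_str = True

print(items, rejected_str)
```

The same guard could be reused for tuple and set variants; whether bytes should also be rejected is left open here, since the thread does not say.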

On Tue, May 7, 2019 at 11:43 AM Ricky Teachey <ricky@teachey.org> wrote:
I've always thought this was a wart in Python, and the one type error that does show up a lot. (With true division now, it's the only type error that shows up a lot...) But the "problem" isn't that strings are a sequence -- as you say, that is handy. The problem is that there is no character type -- so a string is not a sequence of characters, it's a sequence of length-1 strings -- which leads to potentially infinite recursion as you drill down to get the single item. But introducing a character type has been rejected multiple times. Oh well.
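The recursion problem described above shows up as soon as a generic "flatten" routine meets a string. Both functions below are illustrative sketches with made-up names; the second adds the isinstance() guard that the thread is complaining about.

```python
def naive_flatten(obj):
    """Flatten nested iterables, treating anything iterable as a container."""
    try:
        items = iter(obj)
    except TypeError:
        return [obj]  # not iterable: a leaf value
    result = []
    for item in items:
        result.extend(naive_flatten(item))
    return result


def flatten(obj):
    """Same as naive_flatten, but treats str as an atomic leaf."""
    if isinstance(obj, str):
        return [obj]
    try:
        items = iter(obj)
    except TypeError:
        return [obj]
    result = []
    for item in items:
        result.extend(flatten(item))
    return result


# "a"[0] is again the length-1 string "a", so the naive version never
# reaches a non-iterable leaf and exhausts the recursion limit.
try:
    naive_flatten(["ab"])
except RecursionError:
    recursed = True
else:
    recursed = False

flat = flatten([1, ["ab", [2]]])
print(recursed, flat)
```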
This leads to isinstance() checking, which feels very non-python every time I do it.
Agreed. And it's only an adequate solution because for all intents and purposes, there is only one string type -- it would be very rare to duck type a string.
It would be kind of nice if list, tuple, and set had alt constructors that disallowed things that aren't REALLY sequences of individual atomic items
yeach! As you said, a string is a sequence, and that's a good thing. And dicts being an iterable of the keys is kind of nice, too -- and once you have that, they should be able to be passed into sequence constructors. (And I actually have done list(a_dict) on purpose when I needed a list of the keys in a dict.) -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

(Don't seem to be getting much interest or traction on this so I'm about ready to give up on it, but I should still respond)
yeach! as you said, a string is a sequence, and that's a good thing.
Yes but as you said: it's a sequence of infinitely recursive length 1 sequences. It's definitely a different animal. And the type checking is a chore regardless of whether it's the only string type (though there's also the byte string).

and dicts being an iterable of the keys is kind of nice, too -- and once you have that, they should be able to be passed into sequence constructors.

But the other side to this is you end up on occasion also guarding against dict with a type check, or both dict AND str -- and I could even see guarding against set. Though certainly not nearly as often as str by itself.

(and I actually have done list(a_dict) on purpose when I needed a list of the keys in a dict.)

I think that is done all the time by lots of people. But if you are dealing with dicts, a strict alt constructor would simply be the wrong choice. Thinking about it more: if such a strict non-str-sequence constructor existed, perhaps it should also exclude sets, which aren't even sequences.

20-25 years ago this might have been a good idea. Unfortunately there's so much code (including well-publicized example code) that I'm not sure it's a good use of anyone's time to try and fix this. Exception: I am often in need of a constructor for a bytes object from an integer using the decimal representation, e.g. bytes.fromint(42) == b"42". (Especially when migrating code from Python 2, where I've found a lot of str(n) that cannot be translated to bytes(n) but must instead be written as b"%d" % n, which is ugly and unintuitive when coming from Python 2.) On Mon, May 6, 2019 at 2:50 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him/his **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

The other bytes object constructor I often find myself in need of without being able to remember how to do it is creating a a length 1 bytes object from a known ordinal. The "obvious": someordinal = ... bytes(someordinal) creates a zeroed bytes of that length, which is clearly wrong. I eventually remember that wrapping it in a tuple (or list) before passing to the bytes constructor works, but it's far from intuitive: bytes((someordinal,)) Unfortunately, the most obvious name for the alternate constructor to fill this niche is *also* bytes.fromint, which conflicts with Guido's use case. On Mon, May 6, 2019 at 2:40 PM Guido van Rossum <guido@python.org> wrote:

06.05.19 17:49, Guido van Rossum пише:
I do not propose to change the current behavior. I propose to add new named constructors. In most cases default constructors can be used, but in cases when we use type check or different tricks to limit the type of the argument, we could use named constructors. Current named constructors: * dict.fromkeys() * int.from_bytes() * float.fromhex() * bytes.fromhex() * bytearray.fromhex()

On Mon, May 6, 2019 at 11:14 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
Understood. My point is that we won't be able to remove the original behavior, so we'd end up with two ways to do it. :-( -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him/his **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On Mon, May 6, 2019 at 7:29 PM Guido van Rossum <guido@python.org> wrote:
With all respect, I disagree. There are ways to evolve Python such as deprecation policies which proven to be effective. There are ways to monitor current features adoption on PyPI to see whether it is safe to remove deprecated things. I'd understand if some feature is not accepted to Python if it is kinda bad. What I refuse to accept as a user is that behavior considered bad and ready to be improved is preserved through time just because it is there already. Please, get me right. I totally agree that this will bring up two ways of performing the same thing but we can deprecate one of them, keep track of the new way adoption and finally get Python to a better state if it is really desired.

On Mon, May 6, 2019 at 7:48 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 6 May 2019 19:39:39 +0300 Serge Matveenko <s@matveenko.ru> wrote:
I have no intention to start a long hypothetical discussion here, really. There are a lot of things which were broken at some point even despite 2to3 crusade. Not to count: `except` syntax, restriction of `async` keyword, u-strings forth and back. Usually, It doesn't matter much why one cannot upgrade the interpreter to the next version. Often, It just stops working and forces a user to dig into dependencies mess. I agree that there is no hope in making a change when there is no intention to make this change. If this change is needed there are ways to achieve that. The path could be almost infinite but it surely cannot be walked if nobody willing to take it.

пн, 6 мая 2019 г. в 19:48, Antoine Pitrou <solipsis@pitrou.net>:
Especially `bytes` constructor: `bytes(int)`, it is so convenient and obvious that in the entire Python standard library it is used only in tests of its own existence: 'cpython-master/Lib/test/test_bytes.py' on line: 1023' 'cpython-master/Lib/test/test_bytes.py' on line: 1234'. Currently, `bytes` is the most ambiguous resident in Python 3. It seems that in the final version (which is very different from the initial idea), it was created to surprise: 1. Those wishing to take it as a string in Python 2 2. Those who wish to perceive it as an array of integers from 0 to 255. 3. And those who want to see `bytes` as something else... with kind regards, -gdg

On 06May2019 18:47, Antoine Pitrou <solipsis@pitrou.net> wrote:
I don't find that compelling. I for one would welcome a small suite of unambiguous factories that can't be misused. bytes() can easily be misused by accident, introducing bugs and requiring debug work. I'd be very happy for my own future code to be able to take advantage of hard to misuse constructors. Of course we could all write tiny factories for these modes but (a) we'd all have to write and debug them and (b) they'de all have different spellings and signatures and (c) everyone else would not have easy access to them (yes, PyPI notwithstanding: finding a specific module in there can be haphazard) and (d) the bytes type is a natural place to have these constructors/factories. All these arguments apply to the other types too. Cheers, Cameron Simpson <cs@cskk.id.au>

On Tue, May 07, 2019 at 09:54:03AM +1000, Cameron Simpson wrote:
There is a difference between *adding* new constructor methods, and what Antoine is saying: that we cannot realistically remove existing uses of the current constructors. I think that Antoine is right: short of another major 2to3 backwards- incompatible version, the benefit of actually removing any of the built-in constructor behaviours is too small and the cost is too great. So I think removal of existing behaviour should be off the table. Or at least, taken on a case-by-case basis. Propose a specific API you want to remove, and we'll discuss that specific API. As for adding *new* constructors:
Probably because everyone will want them to do something different. We've already seen two different semantics for the same desired constructor call: bytes(10) -> b'10' # like str(), but returning bytes bytes(10) -> b'\x0A' # like ord(), but returning a byte That suggests a possible pair of constructors: bytes.from_int(n) -> equivalent to b'%d' % n bytes.ord(n) -> equivalent to bytes((n,)) The proposal in this thread seems to me to be a blanket call to add new constructors everywhere, and I don't think that's appropriate. I think that each proposed new constructor should live or die on its own merits. The two above for bytes seem like simple, obvious APIs that do something useful which is otherwise a small pain point. Both are syntactic sugar for something otherwise ugly or hard to discover. I think that, if somebody is willing to do the work (it can't be me, sorry) adding two new class methods to bytes for the above two cases would be a nett win, and they should be minor enough that it doesn't need a PEP. Thoughts? -- Steven

bytes.ord is a bad name, given the behavior would be the opposite of ord (ord converts length one str to int, not int to length one str). PEP467 (currently deferred to 3.9 or later) does have proposals for this case, either bytes.byte (old proposal: https://legacy.python.org/dev/peps/pep-0467/#addition-of-explicit-single-byt... ) or bytes.fromord/a top level built-in named bchr in the new version of the PEP ( https://www.python.org/dev/peps/pep-0467/#addition-of-bchr-function-and-expl... ). So if that's the way we want to go, we could just push forward on PEP467. It's only a subset of Serhiy's broader proposal, though admittedly one of the cases where the existing design is unusually weak and improvements would better fill niches currently occupied by non-obvious solutions. On Tue, May 7, 2019 at 12:23 AM Steven D'Aprano <steve@pearwood.info> wrote:

Oddly, it seems everyone in this thread thinks it would be "Better" to have a bunch of constructors, ratehr than the overloading, of only we didn't have backward compatibility to worry about. I disagree -- these efficiencies are what I LIKE about python: int() tries to turn whatever you pass into into an integer -- a float, a string, whatever. Do people really think that it would be better if we all had to write: int.parse("123") Rather than just int("123") ? I know my newbie students wouldn't like that. The one exception to all this in my mind is bytes -- which has come up the most in this discussion. But that's not because the bytes constructor is overloaded but rather that the bytes object is overloaded: is is storage for binary data, which its name seems to imply? or is an old-style text type? It's probably too late to do anything about that, but if we do want to "clean up" something, that would be it. """ str(n) that cannot be translated to bytes(n) but must instead be written as b"%d" % n, """ if bytes were really well, bytes, then that would be: bytes((ord(c) for c in str(n))) or str(n).encode('ascii') ugly, I know. I kinda wish we'd simply kept a single_byte_string object (essentially a py2 string) as its own thing, separate from a bytes object, which could then keep its nominal meaning -- a sequence of individual arbitrary bytes. Oh well, lots of water under that bridge..... -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Tue, 7 May 2019 at 06:42, Christopher Barker <pythonchb@gmail.com> wrote:
It depends whether you want to have an exception if the wrong type of input is encountered. The alternative to x=int.parse(s) might instead be

    if not isinstance(s, str):
        raise TypeError('need a string')
    x = int(s)

One of the things I like about Python compared to e.g. C is that it often tells me when something is going wrong in some code rather than silently doing the wrong thing. The int function accepts all kinds of things e.g.

    >>> int('๒')
    2

However in my own code if that character ever got passed to int then it would definitely indicate either a bug in the code or data corruption so I'd rather have an exception.

Admittedly the non-ASCII unicode digit example is not one that has actually caused me a problem but what I have had a problem with is floats. Given that a user of my code can pass in a float in place of a string the fact that int(1.5) gives 1 can lead to bugs or confusion.

For precisely these reasons there is the __index__ method to get around this for the situations where you want to convert a non-int integer type to int (without accepting strings or floats). There is math.trunc for the situation where you are not parsing strings and actually do want to accept non-integer numbers and truncate them. There is no function to parse a decimal integer string without also accepting floats though.

-- Oscar
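A minimal sketch of the distinctions drawn above. The helper name parse_decimal_int is hypothetical (it is not the proposed int.parse), and its ASCII-only rule is an assumption that goes beyond the isinstance check shown in the message.

```python
import math
import operator


def parse_decimal_int(s):
    """Hypothetical strict parser: str of ASCII digits with an optional sign."""
    if not isinstance(s, str):
        raise TypeError("need a string")
    body = s[1:] if s[:1] in ("+", "-") else s
    if not (body.isascii() and body.isdigit()):
        raise ValueError(f"not an ASCII decimal integer: {s!r}")
    return int(s)


# int() accepts floats (truncating) and non-ASCII digits.
truncated = int(1.5)
thai_two = int("๒")

# math.trunc() is the explicit spelling of lossy truncation.
explicit_trunc = math.trunc(1.5)

# operator.index() goes through __index__, so floats and strings are rejected.
try:
    operator.index(1.5)
    index_rejects_float = False
except TypeError:
    index_rejects_float = True

# The strict helper rejects both a float and a non-ASCII digit string.
rejected = []
for bad in (1.5, "๒"):
    try:
        parse_decimal_int(bad)
    except (TypeError, ValueError) as exc:
        rejected.append(type(exc).__name__)

print(truncated, thai_two, explicit_trunc, index_rejects_float, rejected)
```

Note that this helper is deliberately stricter than int(): it also refuses surrounding whitespace and underscores, which int() accepts in strings.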

On Tue, May 07, 2019 at 07:17:00PM +0100, Oscar Benjamin wrote:
If you ever get Thai users who would like to enter numbers in their own language, they may be a tad annoyed that you consider that a bug.
Admittedly the non-ASCII unicode digit example is not one that has actually caused me a problem
I don't see why it should cause a problem. An int is an int, regardless of how it was spelled before conversion. You probably don't lose any sleep over the relatively high probability of a single flipped bit changing a '7' digit into a '6' digit, say. It seems strange to worry about the enormously less likely data corruption which just so happens to result in valid non-ASCII digits. If your application can support user-data of "123" for the int 123, why would it matter if the user spelled it '๑๒๓' instead? You're not obligated to output Thai digits if you don't have many Thai users, but it just seems mean to reject Thai input if It Just Works.
So write a helper function and use that. Or specify a base: int(string, 0) will support the usual Python formats (e.g. any of '123', '0x7b', '0o173', '0b1111011') without converting non-strings.

If for some reason you only want to support a single base, say, base 7, you can specify a non-zero argument as the base:

    py> int('234', 7)
    123

But user input seems like a good place to apply Postel's Law: "Be conservative in what you output, be liberal in what you accept." It shouldn't be any skin off your nose to accept '0x7b' or '๑๒๓' as well as '123'.

-- Steven
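The behaviour described above can be checked directly. This is only an illustration of the existing int() built-in; the variable names are arbitrary.

```python
# Base 0 means "interpret the string as a Python integer literal".
literals = ("123", "0x7b", "0o173", "0b1111011")
parsed = [int(text, 0) for text in literals]

# An explicit non-zero base restricts parsing to that base.
base_seven = int("234", 7)

# Non-ASCII decimal digits are accepted by plain int().
thai = int("๑๒๓")

# With an explicit base, non-string input is rejected instead of converted.
try:
    int(1.5, 0)
    float_rejected = False
except TypeError:
    float_rejected = True

print(parsed, base_seven, thai, float_rejected)
```

One caveat with base 0: because it follows literal syntax, a string with leading zeros such as '010' is rejected, whereas int('010') returns 10.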

On Tue, May 07, 2019 at 12:57:49AM +0000, Josh Rosenberg wrote:
bytes.ord is a bad name, given the behavior would be the opposite of ord (ord converts length one str to int, not int to length one str).
D'oh! I mean, well done, that was a test and you passed! *wink* Sorry for the confusion, yes, I got ord/chr confused in my head.
My personal sense is that generating a single byte from an ordinal is not common enough to justify a new builtin, but I don't have a strong opposition to the concept as such. But I am very averse to the suggested name. Not only is "bchr" inelegant, but now that regular strings are Unicode, we really ought to avoid perpetuating the idea that single bytes are characters and text strings are bytes more than we really have to.

-- Steven

On Wed, May 08, 2019 at 12:18:34AM +1200, Greg Ewing wrote:
Don't be shy, give us your suggested name. I'm not married to the suggestion of "from_int"; if somebody with better skills at naming than me can come up with a short but descriptive name, that would be great. But my other suggestion was going to be:

    bytes.return_bytes_representing_the_integer_input_as_an_ascii_bytestring()

*wink*

But seriously, it's nice when a method or function name is completely self-explanatory, and some day I hope to find one which is. But until then, names are more like mnemonics:

- dict.fromkeys returns a dict created from keys in what way?
- str.encode encodes a string into what kind of data?
- bytes.fromhex does what precisely?
- what on earth is a glob?

We say "import spam", not "import module spam and bind it to the name spam". We use "import" as a short name that, once we've learned what it does, reminds us what it does. It doesn't have to document every bit of functionality, just enough to associate the name with the semantics.

In this case, the problem we are trying to solve is this:

- convert an int like 12345 into a byte-string like b'12345'.

Imagine you're looking at the dir(bytes) to find an appropriate method that perhaps does what you want, and you see methods like:

- count
- decode
- fromhex
- isdigit
- split

etc, plus the suggested "fromint". Which method are you going to associate with the problem you're trying to solve?

-- Steven

On 08May2019 00:18, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
How about from_size(n) and from_ord(n)? The former to make a NUL filled bytes of size n, the latter to make a single byte bytes with element 0 having value n. Preemptively one could argue for from_size having an optional fill value, default 0. I think I'm -0 on that because I can't imagine a likely use case and it would slightly slow down every use. I think I'd argue for a from_iter instead, to support an arbitrary fill pattern/sequence. That avoids slowing the common case and provides flexibility at the same time. Cheers, Cameron Simpson <cs@cskk.id.au>

On Wed, May 08, 2019 at 08:14:50AM +1000, Cameron Simpson wrote:
We already have from_size, it's just spelled bytes(n). I don't dislike from_ord as a name, although perhaps it ought to be fromord to match fromhex.
Wanting to fill a bytes object with something other than zeroes is probably uncommon. But for those who need it b'\xFF'*n should do the job.
We already have that, it's spelled bytes(iterable).

    py> bytes(range(5))
    b'\x00\x01\x02\x03\x04'

-- Steven
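Put side by side, the existing spellings for the operations discussed in this sub-thread look like this. It is a sketch using only current bytes behaviour; the names from_size, from_ord and from_iter appear only in comments, as the suggestions they correspond to.

```python
n = 4

# "from_size": a zero-filled bytes object of length n.
zero_filled = bytes(n)

# A non-zero fill: repeat a one-byte bytes literal.
ff_filled = b"\xff" * n

# "from_iter": build from an iterable of integers in range(256).
from_iterable = bytes(range(5))

# "from_ord": a single byte from its integer value, via a one-item iterable.
single_byte = bytes([65])

print(zero_filled, ff_filled, from_iterable, single_byte)
```

The last two lines show why the overloading is easy to trip over: bytes([65]) is one byte, while bytes(65) would be sixty-five zero bytes.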

On Mon, May 06, 2019 at 07:39:39PM +0300, Serge Matveenko wrote:
Monitoring adoption on PyPI tells you absolutely nothing about the millions of lines of Python code which is not on PyPI. Not every Python programmer writes open source software available to the public. They are not second-class citizens -- we have a responsibility to them too, and that's why we don't break backwards-compatibility lightly.
Oh, you "refuse to accept" it do you? How nice. Compared to languages like C that have ISO standards, Python's attitude towards removing old features might be seen as recklessly fast. You should read this to get another perspective: https://www.curiousefficiency.org/posts/2011/04/musings-on-culture-of-python...
Everything you say there is true in principle. That doesn't mean it will happen in practice. For what it's worth, I'm less concerned than Guido is about having two ways to get the same result. There's two (or three, or ...) ways to do many things in Python and "Only One Way To Do It" has never been part of the Zen of Python, it was a put-down from Perl users misrepresenting Python's philosophy. But duplication has costs: more to learn, more decisions to make, more code, more tests, more documentation. Duplicated APIs can become bloat, and bloat is not free. If a function or method doesn't bring *at least* enough benefit to outweigh the costs, then it is harmful. Code churn is not free. Forcing people to change code that works because we felt like breaking it should not be done lightly. Keeping old, suboptimal APIs versus forcing code churn is a matter of balance, and choosing the right balance is not a black and white decision to make. None of this is to say that we will never decide to deprecate or remove a feature. But it isn't clear that this proposal brings enough benefit to justify such deprecations. -- Steven

One pain point in Python that constantly rears its head (at least for me) is this:

    def eat_iterable(yummy_iterable):
        if isinstance(yummy_iterable, str):
            raise TypeError("BLECH! Strings are ATOMIC in this context, mmkay??")
        tasty_list = list(yummy_iterable)
        # digest list...

The design decision to make strs sequence-like was definitely a good one with a huge number of advantages. But we all know that sometimes, treating strs just like all other kinds of sequences is very inconvenient and isn't really what we want. This leads to isinstance() checking, which feels very non-python every time I do it.

It would be kind of nice if list, tuple, and set had alt constructors that disallowed things that aren't REALLY sequences of individual atomic items -- like strs, and perhaps dicts as well (which is another instance check often needed to guard against errors). Perhaps:

    list.atomics
    tuple.atomics
    set.atomics

But naming things is hard and I'm terrible at it.
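As a rough sketch of what such a constructor might do, here is a plain function standing in for the suggested list.atomics. The function name and the exact set of rejected types are assumptions made for illustration; the message above only names strs and perhaps dicts.

```python
def list_atomics(iterable):
    """Hypothetical stand-in for list.atomics: refuse str-like and dict input."""
    # The rejected types are an assumption; the idea is only sketched in prose.
    if isinstance(iterable, (str, bytes, bytearray, dict)):
        raise TypeError(
            f"{type(iterable).__name__} is treated as atomic in this context"
        )
    return list(iterable)


items = list_atomics(("spam", "eggs"))

try:
    list_atomics("spam")
    string_rejected = False
except TypeError:
    string_rejected = True

print(items, string_rejected)
```

The guard is the same isinstance() check as in eat_iterable; a named constructor would only move it into one shared place.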

On Tue, May 7, 2019 at 11:43 AM Ricky Teachey <ricky@teachey.org> wrote:
I've always thought this was a wart in Python, and the one type error that does show up a lot. (with true division now, it's the only type error that shows up a lot...) But the "problem" isn't that strings are a sequence -- as you say that is handy. The problem is that there is no character type -- so a string is not a sequence of characters, it's a sequence of length-1 strings -- which leads to potentially infinite recursion as you drill down to get the single item. But introducing a character type has been rejected multiple times. Oh well.
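The recursion problem can be shown with a naive flatten function. This is a sketch written for illustration (the function is not from the thread); it deliberately omits the usual str guard.

```python
def flatten(obj):
    """Naive recursive flatten with no special case for str."""
    try:
        items = iter(obj)
    except TypeError:
        return [obj]  # not iterable: treat as a leaf
    out = []
    for item in items:
        out.extend(flatten(item))
    return out


# Indexing a str yields another str, so there is no leaf to stop at.
s = "abc"
first = s[0]
still_a_str = type(first) is str and first[0] == first

numbers = flatten([1, [2, [3]]])

try:
    flatten(["ab"])
    hit_limit = False
except RecursionError:
    hit_limit = True

print(still_a_str, numbers, hit_limit)
```

Nested lists of numbers flatten fine, but any string inside the input recurses until the interpreter's recursion limit is reached, which is why the isinstance() guard keeps appearing.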
This leads to isinstance() checking, which feels very non-python every time I do it.
Agreed. And it's only an adequate solution because for all intents and purposes, there is only one string type -- it would be very rare to duck type a string.
It would be kind of nice if list, tuple, and set had alt constructors that disallowed things that aren't REALLY sequences of individual atomic items
yeach! as you said, a string is a sequence, and that's a good thing. and dicts being an iterable of the keys is kind of nice, too -- and once you have that, they should be able to be passed into sequence constructors. (and I actually have done list(a_dict) on purpose when I needed a list of the keys in a dict.)

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

(Don't seem to be getting much interest or traction on this so I'm about ready to give up on it, but I should still respond)
yeach! as you said, a string is a sequence, and that's a good thing.
Yes but as you said: it's a sequence of infinitely recursive length 1 sequences. It's definitely a different animal. And the type checking is a chore regardless of whether it's the only string type (though there's also the byte string).

and dicts being an iterable of the keys is kind of nice, too -- and once you have that, they should be able to be passed into sequence constructors.

But the other side to this is you end up on occasion also guarding against dict with a type check, or both dict AND str -- and I could even see guarding against set. Though certainly not nearly as often as str by itself.

(and I actually have done list(a_dict) on purpose when I needed a list of the keys in a dict.)

I think that is done all the time by lots of people. But if you are dealing with dicts, a strict alt constructor would simply be the wrong choice.

Thinking about it more: if such a strict non-str-sequence constructor existed, perhaps it should also exclude sets, which aren't even sequences.

participants (13)
- Antoine Pitrou
- Cameron Simpson
- Chris Angelico
- Christopher Barker
- Greg Ewing
- Guido van Rossum
- Josh Rosenberg
- Kirill Balunov
- Oscar Benjamin
- Ricky Teachey
- Serge Matveenko
- Serhiy Storchaka
- Steven D'Aprano