Re: Integer concatenation to byte string

We were no longer on the ideas list...

You mean the above is what you timed? It is not a realistic problem you are measuring. You also did not share how you measured the code. If you are not experienced in benchmarking, there are variables you must control for to get meaningful results.

All function calls take time and resources; it is not possible to streamline a function call enough to make it faster than having the operation built in. This goes for most languages, including C.

All Python byte code is interpreted by calling functions. They take time and resources.

Barry
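
A minimal sketch of the kind of controlled measurement being asked for here, using only the stdlib timeit module (the payload, loop counts and variable names are illustrative, not taken from this thread):

    import timeit

    n = 1_000_000  # loops per timing run

    # Cost of constructing a bytearray from a bytes literal.
    best = min(timeit.repeat("bytearray(b'abcdefghijklmnop')", repeat=5, number=n))
    print(f"bytearray(...): {best / n * 1e9:.0f} ns per loop")

    # Baseline: the bytes literal alone, which involves no name lookup or call.
    best = min(timeit.repeat("b'abcdefghijklmnop'", repeat=5, number=n))
    print(f"bytes literal:  {best / n * 1e9:.0f} ns per loop")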

Memz,

Please keep your responses on the mailing list.
"Lack of efficiency" doesn't speak for itself. You haven't shown how you benchmarked this, so we don't know if it is a valid comparison or not, but generally speaking I will allow that there is some function call overhead in Python. In this case you have: - create a bytes string object; - look up the name bytesarray, which requires two dict lookups (one in the global scope that fails, one in the builtins scope that succeeds); - then call the function with the bytes string object as argument; - and finally the bytes object is garbage collected. So it's reasonable to assume that this has some overhead. The overhead might even be significant if, for example, you create a temporary 10 GB byte string so you can append one byte to the end. But we don't typically care about optimizing for such unusual and extreme cases. If you are trying to squeeze out every last nanosecond of performance, you're probably using the wrong language. Or at least the wrong interpreter. You might like to try PyPy, or some of the other specialising interpreters. Or write your critical code in Cython, or use ctypes, or write it as a C extension. But honestly, I expect that you are falling into the trap of premature optimization. I presume that once you have your mutable bytearray object, you're actually going to do some work with it. It is quite likely that for any real example, not made-up Mickey-Mouse toy code, the time it takes to initialise the byte array object will be a negligible fraction of the time it takes your application to actually process the byte array object. Who cares if it takes 130 nanoseconds to initialise the byte array object, if you then go on to spend ten million nanoseconds working with it? We don't typically make large language changes for the sake of micro-benchmarks. [steve ~]$ python3.9 -m timeit "bytearray(b'abcdefghijklmnop')" 2000000 loops, best of 5: 131 nsec per loop It's not inconceivable that in a tight loop where you have to make many bytearrays but do very little with them, the initialisation cost is significant. But to justify adding literal syntax to the language we would need to see some strong justification that the function call overhead not only is significant, but it is *frequently* a bottleneck in the code. [Barry]
On Tue, Mar 02, 2021 at 08:07:39PM +0000, Barry Scott wrote:
All Python byte code is interpreted by calling functions. They take time and resources.
That's not entirely correct. Literals such as text strings, ints and floats get compiled directly into the byte-code. Now of course there is some overhead while executing the byte-code, but that doesn't include the heavy cost of a Python function call.

-- Steve
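
By contrast, a sketch of what "compiled directly into the byte-code" means in practice: the bytes literal on its own is built at compile time and reduces to a single constant load, with no name lookup and no call (again, opcode names vary by CPython version):

    import dis

    # The bytes object lives in the code object's co_consts, so all that
    # remains at run time is:
    #
    #     LOAD_CONST       0 (b'abcdefghijklmnop')
    #     RETURN_VALUE
    dis.dis(compile("b'abcdefghijklmnop'", "<sketch>", "eval"))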

On Wed, Mar 3, 2021, at 04:09, Barry Scott wrote:
I was thinking of the C functions that are executed in ceval.c to run the interpreter for any byte code.
In that case, it's not clear how your proposed syntax would avoid that same overhead [especially your suggestion of a += operator].
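
A sketch of that point (buf is a hypothetical bytearray; the in-place-add opcode is named INPLACE_ADD or BINARY_OP depending on the CPython version): an augmented assignment on a bytearray is itself ordinary byte-code, dispatched by the same interpreter loop in ceval.c.

    import dis

    def append_one(buf: bytearray) -> bytearray:
        # Compiles to LOAD_FAST / LOAD_CONST / an in-place-add opcode /
        # STORE_FAST, each dispatched by the eval loop in ceval.c.
        buf += b"\x01"
        return buf

    dis.dis(append_one)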

participants (3):
- Barry Scott
- Random832
- Steven D'Aprano