When to remove BytesWarning?

Hi, all.
To avoid BytesWarning, the compiler needs some hacks when it has to store bytes and str constants in one dict or set. BytesWarning has maintenance costs. They are not huge, but significant.
When can we remove it? My idea is:
3.10: Deprecate the -b option.
3.11: Make the -b option a no-op. BytesWarning is never emitted.
3.12: Remove the -b option.
BytesWarning will be deprecated in the documentation, but not removed. Users who want to use the -b option during 2->3 conversion need to use Python ~3.10 for a while.
Regards,
--
Inada Naoki <songofacandy@gmail.com>
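For readers who have not used the option under discussion, here is a minimal sketch of what -b and -bb do. It assumes an interpreter where -b is still active, as it is in the versions discussed in this thread; SNIPPET and run_with_flags are names made up for this illustration.

```python
import subprocess
import sys

# Code run in a child interpreter: one bytes/str comparison, one str(bytes) call.
SNIPPET = "print(b'abc' == 'abc'); print(str(b'abc'))"


def run_with_flags(*flags):
    """Run SNIPPET in a child interpreter with the given command line flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


plain = run_with_flags()        # no flag: both operations are silent
warned = run_with_flags("-b")   # -b: BytesWarning is reported on stderr
strict = run_with_flags("-bb")  # -bb: BytesWarning is raised as an error

for name, result in (("plain", plain), ("-b", warned), ("-bb", strict)):
    print(name, result.returncode)
    print(result.stderr)
```

Running the snippet in a child process keeps the flags from affecting the calling interpreter.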

On 24/10/2020 05.19, Inada Naoki wrote:
In my experience it would be useful to keep the bytes warning for implicit representation of bytes in string formatting. It's still a common source of issues in code. Bytes / str comparison or dict lookup is a less common issue.
Christian
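A small sketch of the "implicit representation" problem mentioned above: formatting bytes into a str silently falls back to the repr. The variable names are illustrative only.

```python
payload = b"hello"

# Formatting bytes into a str uses the repr, so the b'...' wrapper leaks into the text.
implicit = "received: %s" % payload
also_implicit = f"received: {payload}"

# Decoding explicitly yields the intended text.
explicit = "received: %s" % payload.decode("utf-8")

print(implicit)
print(also_implicit)
print(explicit)
```

Without -b the first two lines run silently, which is why this kind of bug can go unnoticed.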

On Sat, Oct 24, 2020 at 6:18 AM Christian Heimes <christian@python.org> wrote:
I am with Christian here. I still see a possibility of people running into this, because not all Python 2 code is dead yet. Perhaps this warning might stay for a long time.
> BytesWarning has maintenance costs. It is not huge, but significant.
Should we know by how much, so that the proposal for the `-b` switch can be weighed against it?
Thank you,
Senthil

On Mon, Oct 26, 2020 at 4:35 PM Senthil Kumaran <senthil@uthcode.com> wrote:
Do you mean you are OK with removing BytesWarning from b"abc" == u"def" and b"abc" == 42?
> Still notice a possibility of people running into this because all the Python2 code is not dead yet. Perhaps this warning might stay for a long time.
I never proposed removing it "now", but in 3.11. 3.10 will become security only mode at 2022-04, and EOL at 2026-10. But you can use Python 3.10 after EOL for porting Python 2 code, because security fixes are not required while porting.
>> BytesWarning has maintenance costs. It is not huge, but significant.
> Should we know by how much so that the proposal of `-b` switch can be weighted against?
It is difficult to say "how much". We need to keep in mind that `a == b` is not safe even for builtin types every time we write a patch or review a pull request. In particular, when u"foo" and b"bar" are used as keys of the same dict, a BytesWarning happens only on a (randomized) hash collision. Such a bug is very hard to find. Of course, there are some runtime costs too.
https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380...
https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380...
(maybe more, but I'm not sure)
Regards,
--
Inada Naoki <songofacandy@gmail.com>

On Sat, Oct 24, 2020 at 15:13, Christian Heimes <christian@python.org> wrote:
IMO it's not a big deal to investigate such bugs without the -b / -bb command line option. It should be easy to identify where bytes are formatted as a string.
Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

24.10.20 06:19, Inada Naoki wrote:
I agree that it should be removed, and that BytesWarning should be kept (maybe we will reuse it for other purposes in the future). But I do not see how deprecating it before removing it could help. Using it with -We will no longer work, and without -We it will just add noise. We can just make -b a no-op at any moment and remove it a few versions later. Or maybe first make it a no-op, then deprecate, then remove. But that looks like too much. -b is still usable in 3.9, so it can be removed no earlier than the EOL of 3.9. Users that use it should be able to use it with all maintained Python versions if it makes sense with at least one of them.
3.x: Make the -b option a no-op. BytesWarning is never emitted.
3.x+4: Remove the -b option.

Hi,
Which operations are impacted by -b and -bb? str == bytes, bytes == str, dict lookup using str or bytes keys? 'unicode' < b'bytes' always raises a TypeError.
On Sat, Oct 24, 2020 at 05:20, Inada Naoki <songofacandy@gmail.com> wrote:
os.get_exec_path() must temporarily modify the warnings filters to ignore BytesWarning when it looks for 'PATH' (unicode) or b'PATH' (bytes) in the 'env' dictionary, which may contain unicode or bytes strings. Modifying warnings filters impacts all threads, which is bad. I dislike having to work around this annoying behavior for dict lookup when -b or -bb is used. I'm quite sure that almost nobody uses -b or -bb when running their test suite or to develop. I expect that nobody uses it. According to replies, it seems like porting Python 2 code to Python 3 is the only use case. Python 3.9 and older can be used for that, no?
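The workaround described above can be sketched roughly as follows. This is not the actual os.get_exec_path() source; lookup_path and its env argument are hypothetical names, and the sketch only shows the pattern of silencing BytesWarning around a mixed str/bytes dict lookup.

```python
import warnings


def lookup_path(env):
    """Return env['PATH'] or env[b'PATH'], whichever exists, else None.

    env is a hypothetical mapping that may mix str and bytes keys.
    """
    with warnings.catch_warnings():
        # The filter list changed here is shared state, so other threads
        # are affected as well while this block runs.
        warnings.simplefilter("ignore", BytesWarning)
        value = env.get("PATH")
        if value is None:
            value = env.get(b"PATH")
    return value


print(lookup_path({"PATH": "/usr/bin"}))
print(lookup_path({b"PATH": b"/usr/bin"}))
```

The context manager restores the previous filters on exit, but it offers no isolation between threads, which is the concern raised in the message.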
> When can we remove it? My idea is:
> 3.10: Deprecate the -b option.
Do you mean writing a message into stderr? Or just deprecate it in the documentation?
> 3.11: Make the -b option no-op. Bytes warning never emits. 3.12: Remove the -b option.
There is no _need_ to raise an error when -b is used. The -t option was kept even after the feature was removed (in Python 3.0 ?). -J ("used by Jython" says a comment) is a second command line option which is silently ignored.
> BytesWarning will be deprecated in the document, but not to be removed.
I don't see what you mean here. I dislike the idea of deprecating a feature without scheduling its removal. I don't see the point of deprecating it in this case. I only see that as an annoyance. I'm fine with removing the exception. If you don't plan to remove it, just leave it unchanged (not deprecated), no?
Victor

On Tue, Oct 27, 2020 at 5:23 AM Victor Stinner <vstinner@python.org> wrote:
Completely agree with you.
I think so. But I became a bit conservative when writing this proposal.
I thought document only.
I see.
A documentation-only deprecation is useful for readers. Readers can know "I can just ignore this."
> I'm fine with removing the exception. If you don't plan to remove it, just leave it unchanged (not deprecated), no?
OK, my new proposal is:
3.10: Stop emitting BytesWarning for the bytes == unicode case, because this is the most annoying part.
3.11: Stop emitting BytesWarning in core and stdlib.
4.0: Remove the `-b` option, `sys.flags.bytes_warning`, and `BytesWarning`.
Regards,
--
Inada Naoki <songofacandy@gmail.com>

> I'm quite sure that almost nobody uses -b or -bb when running their test suite or to develop
I noticed this thread and just thought I'd give one example of someone who does use -bb: https://github.com/johnthagen/python-blueprint/blob/210b89fe011d172104e9f1ba... I admit I am in a very small minority however. That being said, I have discovered a few minor bugs in my code or in third party libraries over the years using -bb. But I would understand still wanting to remove this feature to lower maintenance burden.

On Wed, Nov 11, 2020 at 10:19 AM John Hagen <johnthagen@gmail.com> wrote:
Which warning helped you? str(bytes)? bytes == str? or bytes == int? I am not very concerned with removing the str(bytes) warning anytime soon. Only the bytes == warning is a significant maintenance burden.
Regards,
--
Inada Naoki <songofacandy@gmail.com>

If I recall, it was the str(bytes) warning that flagged a few places that were missing a .decode() call or similar. It seems like the bytes == warnings could be implemented in a type checker such as mypy, if it doesn't already do this. Assuming you have correct type coverage/inference on your project, you could potentially catch this at static analysis time rather than at runtime.
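As a rough illustration of the static-analysis idea, the comparison below is the kind of code a type checker can reject once annotations are present. The assumption here is a checker that prohibits equality checks between non-overlapping types, such as mypy run with --strict-equality; the function names are made up.

```python
def is_ok(status: bytes) -> bool:
    # A checker that rejects comparisons between non-overlapping types
    # should flag this line: a bytes value never equals a str value.
    return status == "OK"


def is_ok_fixed(status: bytes) -> bool:
    # Comparing bytes with bytes is well-typed and behaves as intended.
    return status == b"OK"


print(is_ok(b"OK"), is_ok_fixed(b"OK"))
```

At runtime the first function simply returns False for every input, so without -b or a type checker the bug stays silent.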

11.11.20 15:05, John Hagen wrote:
There were several bugs like sep=='/' (where sep can be bytes) in the stdlib. These cases were not covered by tests, so they were fixed only in 3.3 or even later. I hope all such bugs are already fixed, but I cannot guarantee it. And there were bugs with str(bytes) too.
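A minimal sketch of the sep=='/' class of bug and one common way to avoid it, by picking the separator according to the type of the path. get_sep and ends_with_sep are hypothetical helpers, not stdlib functions.

```python
def get_sep(path):
    """Return a separator of the same type as path (str or bytes)."""
    return b"/" if isinstance(path, bytes) else "/"


def ends_with_sep(path):
    # The buggy form would be: path[-1:] == "/"
    # which is always False when path is bytes.
    return path[-1:] == get_sep(path)


print(ends_with_sep("usr/"), ends_with_sep(b"usr/"))
print(b"usr/"[-1:] == "/")  # the buggy comparison, silently False without -b
```

Such comparisons only fail on code paths that receive bytes, which is why missing test coverage let them survive for several releases.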

participants (6)
- Christian Heimes
- Inada Naoki
- John Hagen
- Senthil Kumaran
- Serhiy Storchaka
- Victor Stinner