[issue9873] Allow bytes in some APIs that use string literals internally

Thu Sep 30 12:54:33 CEST 2010

Nick Coghlan <ncoghlan at gmail.com> added the comment:

>From a function *user* perspective, the latter API (bytes->bytes, str->str) is exactly what I'm doing.

Antoine's point is that there are two ways to achieve that:

Option 1 (what my patch currently does):
- provide bytes and str variants of all constants
- choose which set to use at the start of each function
- be careful never to index, only slice (even for single characters)
- a few other traps that I don't remember off the top of my head

Option 2 (the alternative Antoine suggested and I'm considering):
- "decode" the ASCII compatible bytes to str objects by treating them as nominally latin-1
- use the same str-based constants as are used to handle actual str inputs
- be able to index to your heart's content inside the algorithm
- *ensure* that any bytes-as-pseudo-str objects are "encoded" back to actual bytes before they are returned

>From outside the function, a user shouldn't be able to tell which approach we're using internally.

The nice thing about option 2 is to make sure you're doing it correctly, you only need to check three kinds of location:
- the initial parameter handling in each function
- any return statements, raise statements that allow a value to leave the function
- any yield expressions (both input and output)

The effects of option 1 are scattered all over your algorithms, so it's hard to be sure you've caught everything.

The downside of option 2 is if you make a mistake and let your bytes-as-pseudo-str objects escape from the confines of your function, you're going to see some very strange behaviour.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9873>
_______________________________________