[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

15 Nov 2021

      On Mon, Nov 15, 2021 at 10:22 PM Abdur-Rahmaan Janhangeer
 wrote:
...
Greetings,
...
Now what happens? where do you go from there to a vunerability or
backdoor? I think it might be a bit obvious that there is something
funny going on if I see:
if (user.admin == "root" and check_password_securely()
            or user.admin == "root"
            # Second string has hidden characters, do not remove it.
            ):
        elevate_privileges()
Well, it's not so obvious. From Ross Anderson and Nicholas Boucher
src: https://trojansource.codes/trojan-source.pdf
See appendix H. for Python.
with implementations:
https://github.com/nickboucher/trojan-source/tree/main/Python
Rely precisely on bidirectional control chars and/or replacing look alikes
The point of those kinds of attacks is that syntax highlighters and
related code review tools would misinterpret them. So I pulled them
all up in both GitHub's view and the editor I personally use (SciTE,
albeit a fairly old version now). GitHub specifically flags it as a
possible exploit in a couple of cases, but also syntax highlights the
return keyword appropriately. SciTE doesn't give any sort of warnings,
but again, correctly highlights the code - early-return shows "return"
as a keyword, invisible-function shows the name "is_" as the function
name and the rest not, homoglyph-function shows a quite
distinct-looking letter that definitely isn't an H.

The problems here are not Python's, they are code reviewers', and that
means they're really attacks against the code review tools. It's no
different from using the variable m in one place and rn in another,
and hoping that code review uses a proportionally-spaced font that
makes those look similar. So to count as a viable attack, there needs
to be at least one tool that misparses these; so far, I haven't found
one, but if I do, wouldn't it be more appropriate to raise the bug
report against the tool?
...
...
There is no reason why linters and code checkers shouldn't check for
invisible characters, Unicode confusables or mixed script identifiers
and flag them. The interpreter shouldn't concern itself with such purely
stylistic issues unless there is a concrete threat that can only be
handled by the interpreter itself.
I mean current linters. But it will be good to check those for sure.
As a programmer, i don't want a language which bans unicode stuffs.
If there's something that should be fixed, it's the unicode standard, maybe
defining a sane mode where weird unicode stuffs are not allowed. Can also
be from language side in the event where it's not being considered in the standard
itself.
Uhhm..... "weird unicode stuffs"? Please clarify.
...
I don't see it as a language fault nor as a client fault as they are considering
the unicode docs but the response was mixed with some languages decided to patch it
from their side, some linters implementing detection for it as well as some editors flagging
it and rendering it as the exploit intended.
I see it as an editor issue (or code review tool, as the case may be).
You'd be hard-pressed to get something past code review if it looks to
everyone else like you slipped a "return" statement at the end of a
docstring.

So far, I've seen fewer problems from "weird unicode stuffs" than from
the quoted-printable encoding, and that's an attack that involves
nothing but ASCII text. It's also an attack that far more code review
tools seem to be vulnerable to.

ChrisA

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

Chris Angelico