On Mon, Nov 15, 2021 at 10:22 PM Abdur-Rahmaan Janhangeer
Greetings,
Now what happens? Where do you go from there to a vulnerability or backdoor? I think it might be a bit obvious that there is something funny going on if I see:
    if (user.admin == "root" and check_password_securely() or user.admin == "root"
        # Second string has hidden characters, do not remove it.
    ):
        elevate_privileges()
Well, it's not so obvious. From Ross Anderson and Nicholas Boucher, source: https://trojansource.codes/trojan-source.pdf
See Appendix H for Python, with implementations:
https://github.com/nickboucher/trojan-source/tree/main/Python
These rely precisely on bidirectional control characters and/or substituting look-alike characters.
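To make the bidirectional control characters concrete, here is a minimal sketch that scans source text for the explicit bidi formatting characters (embeddings, overrides, isolates and their terminators). The function name `find_bidi_controls` and the choice of character set are assumptions for illustration, not something taken from the paper's repository.

```python
import unicodedata

# Explicit bidirectional formatting characters: embeddings, overrides,
# isolates, and the two "pop" terminators. This set is an assumption
# chosen for illustration.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(source):
    """Return (line, column, character name) for each bidi control found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    sample = 'x = "a\u202e b"\ny = 1\n'
    for lineno, col, name in find_bidi_controls(sample):
        print(f"line {lineno}, column {col}: {name}")
```

Because these characters are invisible in most renderers, reporting them by name and position is probably more useful than trying to display them.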
The point of those kinds of attacks is that syntax highlighters and related code review tools would misinterpret them. So I pulled them all up in both GitHub's view and the editor I personally use (SciTE, albeit a fairly old version now). GitHub specifically flags it as a possible exploit in a couple of cases, but also syntax highlights the return keyword appropriately. SciTE doesn't give any sort of warnings, but again, correctly highlights the code - early-return shows "return" as a keyword, invisible-function shows the name "is_" as the function name and the rest not, homoglyph-function shows a quite distinct-looking letter that definitely isn't an H.

The problems here are not Python's, they are code reviewers', and that means they're really attacks against the code review tools. It's no different from using the variable m in one place and rn in another, and hoping that code review uses a proportionally-spaced font that makes those look similar.

So to count as a viable attack, there needs to be at least one tool that misparses these; so far, I haven't found one, but if I do, wouldn't it be more appropriate to raise the bug report against the tool?
There is no reason why linters and code checkers shouldn't check for invisible characters, Unicode confusables or mixed script identifiers and flag them. The interpreter shouldn't concern itself with such purely stylistic issues unless there is a concrete threat that can only be handled by the interpreter itself.
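As a rough illustration of the kind of check a linter could do, here is a sketch that flags identifiers mixing letters from more than one script. The standard library does not expose the Unicode Script property, so the first word of each character's Unicode name is used as a stand-in; that heuristic, and the names `rough_script` and `suspicious_identifiers`, are assumptions made for this example rather than how any existing linter works.

```python
import io
import tokenize
import unicodedata


def rough_script(ch):
    """Approximate a character's script by the first word of its Unicode name.

    This is a heuristic: the stdlib has no Script property lookup.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def suspicious_identifiers(source):
    """Yield (line, name, scripts) for names mixing letters of several scripts."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type != tokenize.NAME:
            continue
        # Digits and underscores are ignored; only letters are compared.
        scripts = {rough_script(ch) for ch in tok.string if ch.isalpha()}
        if len(scripts) > 1:
            yield tok.start[0], tok.string, sorted(scripts)


if __name__ == "__main__":
    # The function name contains U+041D (CYRILLIC CAPITAL LETTER EN).
    code = "def say_\u041dello():\n    pass\n"
    for lineno, name, scripts in suspicious_identifiers(code):
        print(f"line {lineno}: {name!r} mixes {scripts}")
```

A real tool would more likely follow the confusables data and restriction levels from the Unicode security recommendations; the point here is only that this sort of check fits comfortably outside the interpreter.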
I mean current linters. But it will be good to check those for sure. As a programmer, I don't want a language which bans unicode stuffs. If there's something that should be fixed, it's the Unicode standard, maybe by defining a sane mode where weird unicode stuffs are not allowed. It could also be handled on the language side in the event that the standard itself doesn't consider it.
Uhhm..... "weird unicode stuffs"? Please clarify.
I don't see it as a language fault or as a client fault, since both are following the Unicode docs. But the response was mixed: some languages decided to patch it on their side, some linters implemented detection for it, and some editors flag it and render it as the exploit intended.
I see it as an editor issue (or code review tool, as the case may be). You'd be hard-pressed to get something past code review if it looks to everyone else like you slipped a "return" statement at the end of a docstring. So far, I've seen fewer problems from "weird unicode stuffs" than from the quoted-printable encoding, and that's an attack that involves nothing but ASCII text. It's also an attack that far more code review tools seem to be vulnerable to.

ChrisA