Mailman 3 Secure string disposal (maybe other inmutable seq types too?) - Python-ideas

newer
Re: [Python-ideas] [issue33865]...

Secure string disposal (maybe other inmutable seq types too?)

Ezequiel Brizuela [aka EHB or qlixed]

June 22, 2018

5:31 p.m.

As all the string in python are immutable, is impossible to overwrite the value or to make a "secure disposal" (overwrite-then-free) of a string using something like:

...

...
...
a = "something to hide" a = "x"*len(a)

This will lead on the process memory "something to hide" and "x" repeated len(a) times. - Who cares? Why is this relevant? Well if you handle some sensitive information like CC numbers, Passwords, PINs, or other kind of information you wanna minimize the chance of leaking any of it. - How this "leak" can happen? If you get a core/memory dump of an app handling sensitive information you will get all the information on that core exposed! - Well, so what we can do about this? I propose to make the required changes on the string objects to add an option to overwrite the underlying buffer. To do so: * Add a wiped as an attribute that is read-only to be set when the string is overwrited. * Add a wipe() method that overwrite the internal string buffer. So this will work like this:

...

...
...
pwd =getpass.getpass('Set your password:') # could be other sensitive data. encrypted_pwd = crypt.crypt(pwd) # crypt() just as example. pwd.wiped # Check if pwd was wiped. False pwd.wipe() # Overwrite the underlying buffer pwd.wiped # Check if pwd was wiped. True print(pwd) # Print noise (or empty str?) del pwd # Now is in hands of the GC.

The wipe method immediately overwrite the underlying string buffer, setting wiped as True for reference so if the string is further used this can be checked to confirm that the change was made by a wipe and not by another procedure. Also initially the idea is to use unicode NULL datapoint to overwrite the string, but this could be change to let the user parametrize it over wipe() method. An alternative to this is to add a new exception "WipedError" that could be throw where the string is accessed again, but I found this method too disruptive to implement for a normal/standard string workflow usage. Quick & Dirty FAQ: - You do it wrong!, the correct code to do that in a secure way is:

...

...
...
pwd = crypt.crypt(getpass.getpass('Set your password')) Don't you know that fool?

Well no, the code still generate a temporary string in memory to pass to crypt. But now this string is lying there and can't be accessed for an overwrite with wipe() - Why not create a new type like in C# or Java? I see that this tend to disrupt the usual workflow of string usage. Also the idea here is not to offer secure storage of string in memory because there is already a few mechanism to achieve with the current Python base. I just want to have the hability to overwrite the buffer. - Why don't use one of the standard algorithms to overwrite like DoD5220 or MIL-STD-414? This kind of standard usually are oriented for usage on persistent storage, specially on magnetic media for where the data could be "easily" recoverd. But this could ve an option that could be implemented adding the option to plug a function that do the overwrite work inside the wipe method. - This is far beyond of the almost implementation-agnostic definition of the python lang. How about to you make a module with this functionality and left the lang as is? Well I already do it: https://github.com/qlixed/python-memwiper/ But i hit a lot of problems in the road, I was working on me free time over the last year on this and make it "almost" work, but that is not relevant to the proposal. I think that this kind of security things needs to be tackled from within the language itself specially when the lang have GC. I firmly believe that the security and protections needs to be part of the "with batteries" offer of Python. And I think that this is one little thing that could help a lot to secure our apps. Let me know what do you think! ~ Ezequiel (Ezekiel) Brizuela [ aka Qlixed ] ~

Attachments:

attachment.htm (text/html — 5.6 KB)

Show replies by date

Chris Angelico

June 2018

5:45 p.m.

On Sat, Jun 23, 2018 at 10:31 AM, Ezequiel Brizuela [aka EHB or qlixed] <qlixed@gmail.com> wrote:

...

Since strings are immutable, it's entirely possible for them to be shared in various ways. Having the string be wiped while still existing seems to be a risky approach.

...

Would it suffice to flag the string as "this contains sensitive data, please overwrite its buffer when it gets deallocated"? The only difference, in your example, would be that the last print would show the original data, and the wipe would happen afterwards. Advantages of this approach include that getpass can automatically flag the string as sensitive, and the "sensitive" flag can infect other strings (so <<pwd + "x">> would be automatically flagged to be wiped). Downside: You can't say "I'm done with this string, destroy it immediately". ChrisA

Guido van Rossum

6:30 p.m.

A wipe() method that mutates a string while it can still be referenced elsewhere is unacceptable -- it breaks an abstraction that is widely assumed. Chris's proposal can be implemented, it would set a hidden flag. Hopefully there's room for the flag without increasing the object header size. On Fri, Jun 22, 2018 at 5:46 PM Chris Angelico <rosuav@gmail.com> wrote:

...

-- --Guido van Rossum (python.org/~guido)

Chris Angelico

6:35 p.m.

On Sat, Jun 23, 2018 at 11:30 AM, Guido van Rossum <guido@python.org> wrote:

...

Chris's proposal can be implemented, it would set a hidden flag. Hopefully there's room for the flag without increasing the object header size.

If I'm reading the include file correctly, the 'state' bitstruct has eight bits with defined meanings, and then 24 of padding to ensure alignment. Allocating one of those bits to say "sensitive" should be 100% backward-compatible. ChrisA

Greg Ewing

6:33 p.m.

Chris Angelico wrote:

...

Downside: You can't say "I'm done with this string, destroy it immediately".

Also it would be hard to be sure there wasn't another copy of the data somewhere from a time before you got around to marking the string as sensitive, e.g. in a file buffer. -- Greg

Steven D'Aprano

6:45 p.m.

On Sat, Jun 23, 2018 at 01:33:59PM +1200, Greg Ewing wrote:

...

Don't let the perfect be the enemy of the good. We know there's at least one place that a string could leak private information. Just because there could hypothetically be other such places, doesn't make it useless to wipe that known potential leak. Attackers are not always omniscient. Even if an application leaks private data in ten places, some attacker may only know of, or be capable of, attacking *one* leak. If we can, we ought to plug it, and leave those hypothetical other leaks for another day. (Burglars can lift the tiles off my roof, climb into the ceiling, and hence down into my house. Nevertheless I still lock my front door.) -- Steve

Nathaniel Smith

10:21 p.m.

On Fri, Jun 22, 2018 at 6:45 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

That's true, but for security features it's important to have a proper analysis of the threat and when the mitigation will and won't work; otherwise, you don't know whether it's even "good", and you don't know how to educate people on what they need to do to make effective use of it (or where it's not worth bothering). Another issue: I believe it'd be impossible for this proposal to work correctly on implementations with a compacting GC (e.g., PyPy), because with a compacting GC strings might get copied around in memory during their lifetime. And crucially, this might have already happened before the interpreter was told that a particular string object contained sensitive data. I'm guessing this is part of why Java and C# use a separate type. There's a lot of prior art on this in other languages/environments, and a lot of experts who've thought hard about it. Python-{ideas,dev} doesn't have a lot of security experts, so I'd very much want to see some review of that work before we go running off designing something ad hoc. The PyCA cryptography library has some discussion in their docs: https://cryptography.io/en/latest/limitations/ One possible way to move the discussion forward would be to ask the pyca devs what kind of API they'd like to see in the interpreter, if any. -n -- Nathaniel J. Smith -- https://vorpus.org

Steve Barnes

12:31 a.m.

On 23/06/2018 06:21, Nathaniel Smith wrote:

...

All good points - I would think that for this to be effective the string, or secure string, would need to be marked at create time and all operations on it would have to honour the wipe before free flag and include it forward in any copies made. This needs to be implemented at a very low level so that, e.g.: adding to a string, (which makes a copy if the string is growing beyond the current allocation), will have to check for the flag add it to the new string, copy the expanded contents and then wipe the old before freeing it - any normal string which is being added to a secure string should probably get the flag added automatically as well. Of course adding or assigning a secure string to a normal string should automatically make the target string secure as well. This sounds like a lot of overhead to be adding to every string operation secure or not. That being the case it probably makes a lot of sense to use a separate base class - while this will result in a certain amount of bloat in software that makes use of it it will avoid the overhead of checking the flag on the vast majority of software which does not use it. I do know that I have heard in the past of security breaches in both C & Pascal strings where the problem was tracked down to the "delete" mechanism being just setting the first byte to 0x00, (which in both cases would result in the length of the string being 0 but the contents being untouched in Pascal strings and only the first character being lost in C. -- Steve (Gadget) Barnes Any opinions in this message are my personal opinions and do not reflect those of my employer. --- This email has been checked for viruses by AVG. https://www.avg.com

Christian Heimes

12:54 p.m.

On 2018-06-23 07:21, Nathaniel Smith wrote:

...

A while ago, I spent a good amount of time to investigate memory wiping for hashlib and hmac module. Although I was only interested to perform memory wiping in C code [1], I eventually gave up. It was too annoying to create a platform and architecture independent implementation. Because compilers do funny things and memset_s() isn't universally available yet, it it requires code like static void * (* const volatile __memset_vp)(void *, int, size_t) = (memset); or assembler code like asm volatile("" : : "r"(s) : "memory"); to just work around compiler optimization. This doesn't even handle CPU architecture, virtual memory, paging, core dumps, debuggers or other things that can read memory or dump memory to disk. I honestly believe, that memory wiping with the current standard memory allocator won't do the trick. It might be possible to implement a 90% solution with a special memory allocator. Said allocator would a specially configured, mmap memory arena and perform wiping on realloc() and free(). The secure area can be prevented from swapping with mlock(), protected with mprotect() and possible hardware encrypted with pkey_mprotect(). It's just a 90% secure solution, because the data will eventually land in public buffers. If you need to protect sensitive data like private keys, then don't load them into memory of your current process. It's that simple. :) Bugs like heartbleed were an issue, because private key were in the same process space as the TLS/SSL code. Solutions like gpg-agent, ssh-agent, TPM, HSM, Linux's keyring and AF_ALG socket all aim to offload operations with private key material into a secure subprocess, Kernel space or special hardware. [1] https://bugs.python.org/issue17405

Gregory P. Smith

1:04 p.m.

On Sat, Jun 23, 2018 at 12:57 PM Christian Heimes <christian@python.org> wrote:

...

+10 It is fundamentally impossible for a Python VM (certainly CPython) to implement any sort of guaranteed erasure of data and/or control over data to prevent copying that is ever stored in a Python object. This is not unique to Python. All interpreted and jitted VMs share this trait, as do most languages with garbage collection. ex: Java, Ruby, Go, etc. Trying to pretend we could offer tracking and wiping of sensitive data in-process is harmful at best as it cannot be guaranteed and thus gives the wrong impression and will lead to misuse by people who ignore that. -gps

Greg Ewing

5:31 p.m.

Christian Heimes wrote:

...

It's just a 90% secure solution, because the data will eventually land in public buffers.

Seems like the only completely foolproof solution would have to involve some kind of quantum storage that can't be copied without destroying it. -- Greg

Steven D'Aprano

7:04 p.m.

On Sat, Jun 23, 2018 at 09:54:43PM +0200, Christian Heimes wrote:

...

If you need to protect sensitive data like private keys, then don't load them into memory of your current process. It's that simple. :)

How do ordinary Python programmers, like me, who want to do the Right Thing but without thinking too hard about it (or years of study), do this in a more-or-less platform independent way? We have the secrets module that is supposed to be the "batteries included" solution for sensitive data. Should it be involved? -- Steve

Terry Reedy

9 p.m.

On 6/22/2018 8:45 PM, Chris Angelico wrote:

...

But one can be careful about creating references, and in current CPython, deleting the last reference does mean destroy, and possibly wipe, immediately. -- Terry Jan Reedy

Chris Angelico

9:08 p.m.

On Sat, Jun 23, 2018 at 2:00 PM, Terry Reedy <tjreedy@udel.edu> wrote:

...

Yes, you can, for the most part. It's certainly possible to get stung (eg exceptions retaining locals), but mostly it should be fine. How will other Pythons handle this? ChrisA

Guido van Rossum

9:42 p.m.

On Fri, Jun 22, 2018 at 9:11 PM Chris Angelico <rosuav@gmail.com> wrote:

...

How will other Pythons handle this?

It could be optional behavior. ISTR that in Jython, strings are pretty much just Java strings. Does Java have such a feature? If not, do Java apps worry about this? If not, perhaps Python needn't either. -- --Guido van Rossum (python.org/~guido)

M.-A. Lemburg

5:11 a.m.

On 23.06.2018 02:45, Chris Angelico wrote:

...

I think the flag is an excellent idea. I'm not so sure about the automatic propagation of the flag, though. If a string gets interned with the flag set, this could lead to a lot of other strings receiving the flag without intent. Then again, you will probably not want such strings to be interned in the first place. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jun 23 2018)

...

::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

Chris Angelico

12:55 p.m.

On Sat, Jun 23, 2018 at 10:11 PM, M.-A. Lemburg <mal@egenix.com> wrote:

...

Yeah, I'm not entirely sure about the semantics of infection. There might need to be a special case, such as "an empty string is never sensitive", to prevent absolutely EVERYTHING from being infected. What do other languages do there? But even if the rules are extremely simple to start with, I think this will be of value. ChrisA

Ezequiel Brizuela [aka EHB or qlixed]

12:11 p.m.

El vie., 22 de jun. de 2018 21:46, Chris Angelico <rosuav@gmail.com> escribió:.

...

(...)

...

Downside: You can't say "I'm done with this string, destroy it immediately".

That is the main issue with this approach. The proposed one is inmmediate but I understand that is risky.

Terry Reedy

6:32 p.m.

On 6/22/2018 8:31 PM, Ezequiel Brizuela [aka EHB or qlixed] wrote:

...

As all the string in python are immutable, is impossible to overwrite the value

Not if one uses ctypes. Is that what you did?

...

Well I already do it:

https://github.com/qlixed/python-memwiper/

...

But i hit a lot of problems in the road, I was working on me free time over the last year on this and make it "almost" work, but that is not relevant to the proposal.

I think it is. A very small fraction of Python users need such wiping. And I doubt that it can be complete. For instance, I suspect that a password entered into getpass, for instance, first exists in OS form before being copied into a Python string objects. Wiping the Python string would not wipe the original copy. So this really should be attacked at the OS level, not the language level. I have read that phones use separate memory for critical data to try to protect critical data. -- Terry Jan Reedy

Ezequiel Brizuela [aka EHB or qlixed]

1:02 p.m.

El vie., 22 de jun. de 2018 22:33, Terry Reedy <tjreedy@udel.edu> escribió:

...

On 6/22/2018 8:31 PM, Ezequiel Brizuela [aka EHB or qlixed] wrote:

...
As all the string in python are immutable, is impossible to overwrite the value

Not if one uses ctypes. Is that what you did?

No. I was using exclusivelly python strings functions from the C api.

...

Well I already do it:

...
https://github.com/qlixed/python-memwiper/

...
But i hit a lot of problems in the road, I was working on me free time over the last year on this and make it "almost" work, but that is not relevant to the proposal.

I think it is. A very small fraction of Python users need such wiping.

And I doubt that it can be complete. For instance, I suspect that a

...

password entered into getpass, for instance, first exists in OS form before being copied into a Python string objects. Wiping the Python string would not wipe the original copy.

Agree. It migth be more places to search. So this really should be

...

attacked at the OS level, not the language level.

This need to be tackled from all the sides. Ensuring the minimal attack surface possible for anyone.

Paul Moore

4:13 a.m.

On 23 June 2018 at 01:31, Ezequiel Brizuela [aka EHB or qlixed] <qlixed@gmail.com> wrote:

...

Is there any reason this could not be implemented as a 3rd party class (implemented in C, of course) which subclasses str? So you'd do from safestring import SafeStr a = SafeStr("my secret data") ... work with a as if it were a string del a When the refcount of a goes to zero, before releasing the memory, the custom class wipes that memory. There are obvious questions around theres_a_copy_here = "prefix " + a + " suffix" which will copy the secure data, but those issues will be just as much of a problem with a change to the builtin string, unless you propose some mechanism for propagating "secureness" from one value to another. And then you get questions like, is a[0] still "secret"? What about sha256(a)? Having a mechanism for handling this seems like a good idea, but my feeling is that even with a mechanism, handling secure data needs care and specialised knowledge from the programmer, and supporting that is better done with a dedicated class rather than having the language runtime try to solve the problem automatically (which runs the risk that a naive programmer expects the language to do the job, and then *doesn't* think about the risks). Paul Paul

Paul Moore

4:16 a.m.

On 23 June 2018 at 12:13, Paul Moore <p.f.moore@gmail.com> wrote:

...

By the way, Perl has a concept of "tainted strings" which track string values (in Perl's case, whether they came from "external input") in a similar way. Anyone intending to take this proposal forward should almost certainly research that case - my recollection is that taintedness was a mixed success, in that it at best only partially solved the problems and was quite complex to implement and document. But it's probably 15 years or more since I looked at Perl's taint mechanism, so don't trust my recollection without checking :-) Paul

Greg Ewing

5:14 p.m.

Paul Moore wrote:

...

But in order to create the SafeStr, you need to first have the data in the form of an ordinary non-safe string. How do you dispose of that safely? -- Greg

Terry Reedy

7:44 p.m.

On 6/23/2018 8:14 PM, Greg Ewing wrote:

...

getpass could return a SafeStr (or SafeBytes?). SafeStr could be initialized from a sequence of ints. -- Terry Jan Reedy

Paul Moore

1:10 p.m.

On 24 June 2018 at 03:44, Terry Reedy <tjreedy@udel.edu> wrote:

...

That's certainly a possibility. It's basically what the .net SecureString class does. But the initialisation problem is definitely a big flaw in the idea that I hadn't thought of :-( The moral of this is probably for me to leave security design to the experts :-) Paul

Stephan Houben

6:57 a.m.

Would it not be much simpler and more secure to just disable core dumps? /etc/security/limits.conf on Linux. If the attacker can cause and read a core dump, the game seems over anyway since sooner or later he will catch the core dump at a time the string was not yet deleted. Stephan Op za 23 jun. 2018 02:32 schreef Ezequiel Brizuela [aka EHB or qlixed] < qlixed@gmail.com>:

...

As all the string in python are immutable, is impossible to overwrite the value or to make a "secure disposal" (overwrite-then-free) of a string using something like:

...
...
...
a = "something to hide" a = "x"*len(a)

This will lead on the process memory "something to hide" and "x" repeated len(a) times.

- Who cares? Why is this relevant? Well if you handle some sensitive information like CC numbers, Passwords, PINs, or other kind of information you wanna minimize the chance of leaking any of it.

- How this "leak" can happen? If you get a core/memory dump of an app handling sensitive information you will get all the information on that core exposed!

- Well, so what we can do about this? I propose to make the required changes on the string objects to add an option to overwrite the underlying buffer. To do so:

* Add a wiped as an attribute that is read-only to be set when the string is overwrited. * Add a wipe() method that overwrite the internal string buffer.

So this will work like this:

...
...
...
pwd =getpass.getpass('Set your password:') # could be other sensitive data. encrypted_pwd = crypt.crypt(pwd) # crypt() just as example. pwd.wiped # Check if pwd was wiped. False pwd.wipe() # Overwrite the underlying buffer pwd.wiped # Check if pwd was wiped. True print(pwd) # Print noise (or empty str?) del pwd # Now is in hands of the GC.

The wipe method immediately overwrite the underlying string buffer, setting wiped as True for reference so if the string is further used this can be checked to confirm that the change was made by a wipe and not by another procedure. Also initially the idea is to use unicode NULL datapoint to overwrite the string, but this could be change to let the user parametrize it over wipe() method. An alternative to this is to add a new exception "WipedError" that could be throw where the string is accessed again, but I found this method too disruptive to implement for a normal/standard string workflow usage.

Quick & Dirty FAQ:

- You do it wrong!, the correct code to do that in a secure way is:

...
...
...
pwd = crypt.crypt(getpass.getpass('Set your password')) Don't you know that fool?

Well no, the code still generate a temporary string in memory to pass to crypt. But now this string is lying there and can't be accessed for an overwrite with wipe()

- Why not create a new type like in C# or Java?

I see that this tend to disrupt the usual workflow of string usage. Also the idea here is not to offer secure storage of string in memory because there is already a few mechanism to achieve with the current Python base. I just want to have the hability to overwrite the buffer.

- Why don't use one of the standard algorithms to overwrite like DoD5220 or MIL-STD-414?

This kind of standard usually are oriented for usage on persistent storage, specially on magnetic media for where the data could be "easily" recoverd. But this could ve an option that could be implemented adding the option to plug a function that do the overwrite work inside the wipe method.

- This is far beyond of the almost implementation-agnostic definition of the python lang. How about to you make a module with this functionality and left the lang as is?

Well I already do it:

https://github.com/qlixed/python-memwiper/

But i hit a lot of problems in the road, I was working on me free time over the last year on this and make it "almost" work, but that is not relevant to the proposal. I think that this kind of security things needs to be tackled from within the language itself specially when the lang have GC. I firmly believe that the security and protections needs to be part of the "with batteries" offer of Python. And I think that this is one little thing that could help a lot to secure our apps. Let me know what do you think!

~ Ezequiel (Ezekiel) Brizuela [ aka Qlixed ] ~

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Christian Heimes

12:28 p.m.

On 2018-06-23 15:57, Stephan Houben wrote:

...

That's not sufficient. You'd also need to ensure that the memory page is never paged to disk or a visible to gdb, ptrace, or any other kind of debugger. POSIX has mprotect(), but it doesn't necessarily work with malloc()ed memory and requires mmap() memory. Christian

Greg Ewing

5:26 p.m.

Christian Heimes wrote:

...

If the attacker can attach a debugger to your process, they can already do a lot worse than snoop on your secret strings. -- Greg

Ezequiel Brizuela [aka EHB or qlixed]

12:55 p.m.

El sáb., 23 de jun. de 2018 10:58, Stephan Houben <stephanh42@gmail.com> escribió:

...

Thing is that this could be leaked in other ways, not just on a core. Additiinally there is the case when you need a core to debug the issue, you could be sharing sensitive info without knowing it. Also is not always an option disabling core generation.

Christian Heimes

1:43 p.m.

On 2018-06-23 21:55, Ezequiel Brizuela [aka EHB or qlixed] wrote:

...

If you have core dumps enabled, then memory wiping will not help against accidental leakage of sensitive data.

Kyle Lahnakoski

12:20 p.m.

Ezequiel (Ezekiel) Brizuela, How is the secret "password" getting into a Python variable? It is coming from disk, or network? Do the buffers of those systems have a copy? How about methods that operate on the secrets? Do they internally decrypt secrets to perform the necessary operations? I had this problem, and the only solution was a hardware security module (HSM): Private keys do not leave the module; encryption/decryption/verification are all done on the module. Passwords enter the secure system via hardware keypads; which encrypt the password before transmitting bytes to the local computer. I do not think you can trust a network connected machine to have private keys; all private keys end their life stolen, lost or expired. On 2018-06-22 20:31, Ezequiel Brizuela [aka EHB or qlixed] wrote:

...

As all the string in python are immutable, is impossible to overwrite the value or to make a "secure disposal" (overwrite-then-free) of a string using something like:

...
...
...
a = "something to hide" a = "x"*len(a)

This will lead on the process memory "something to hide" and "x" repeated len(a) times.

- Who cares? Why is this relevant? Well if you handle some sensitive information like CC numbers, Passwords, PINs, or other kind of information you wanna minimize the chance of leaking any of it.

- How this "leak" can happen? If you get a core/memory dump of an app handling sensitive information you will get all the information on that core exposed!

- Well, so what we can do about this? I propose to make the required changes on the string objects to add an option to overwrite the underlying buffer. To do so:

* Add a wiped as an attribute that is read-only to be set when the string is overwrited. * Add a wipe() method that overwrite the internal string buffer.

So this will work like this:

...
...
...
pwd =getpass.getpass('Set your password:') # could be other sensitive data. encrypted_pwd = crypt.crypt(pwd) # crypt() just as example. pwd.wiped # Check if pwd was wiped. False pwd.wipe() # Overwrite the underlying buffer pwd.wiped # Check if pwd was wiped. True print(pwd) # Print noise (or empty str?) del pwd # Now is in hands of the GC.

The wipe method immediately overwrite the underlying string buffer, setting wiped as True for reference so if the string is further used this can be checked to confirm that the change was made by a wipe and not by another procedure. Also initially the idea is to use unicode NULL datapoint to overwrite the string, but this could be change to let the user parametrize it over wipe() method. An alternative to this is to add a new exception "WipedError" that could be throw where the string is accessed again, but I found this method too disruptive to implement for a normal/standard string workflow usage. Quick & Dirty FAQ:

- You do it wrong!, the correct code to do that in a secure way is:

...
...
...
pwd = crypt.crypt(getpass.getpass('Set your password')) Don't you know that fool?

Well no, the code still generate a temporary string in memory to pass to crypt. But now this string is lying there and can't be accessed for an overwrite with wipe()

- Why not create a new type like in C# or Java?

I see that this tend to disrupt the usual workflow of string usage. Also the idea here is not to offer secure storage of string in memory because there is already a few mechanism to achieve with the current Python base. I just want to have the hability to overwrite the buffer.

- Why don't use one of the standard algorithms to overwrite like DoD5220 or MIL-STD-414?

This kind of standard usually are oriented for usage on persistent storage, specially on magnetic media for where the data could be "easily" recoverd. But this could ve an option that could be implemented adding the option to plug a function that do the overwrite work inside the wipe method.

- This is far beyond of the almost implementation-agnostic definition of the python lang. How about to you make a module with this functionality and left the lang as is?

Well I already do it:

https://github.com/qlixed/python-memwiper/ <https://github.com/qlixed/python-memwiper/>

But i hit a lot of problems in the road, I was working on me free time over the last year on this and make it "almost" work, but that is not relevant to the proposal. I think that this kind of security things needs to be tackled from within the language itself specially when the lang have GC. I firmly believe that the security and protections needs to be part of the "with batteries" offer of Python. And I think that this is one little thing that could help a lot to secure our apps. Let me know what do you think!

~ Ezequiel (Ezekiel) Brizuela [ aka Qlixed ] ~

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Chris Angelico

June 2018

12:45 a.m.

On Sat, Jun 23, 2018 at 10:31 AM, Ezequiel Brizuela [aka EHB or qlixed] <qlixed@gmail.com> wrote:

...

Since strings are immutable, it's entirely possible for them to be shared in various ways. Having the string be wiped while still existing seems to be a risky approach.

...

Guido van Rossum

1:30 a.m.

...

-- --Guido van Rossum (python.org/~guido)

Chris Angelico

1:35 a.m.

On Sat, Jun 23, 2018 at 11:30 AM, Guido van Rossum <guido@python.org> wrote:

...

Chris's proposal can be implemented, it would set a hidden flag. Hopefully there's room for the flag without increasing the object header size.

Greg Ewing

1:33 a.m.

Chris Angelico wrote:

...

Downside: You can't say "I'm done with this string, destroy it immediately".

Also it would be hard to be sure there wasn't another copy of the data somewhere from a time before you got around to marking the string as sensitive, e.g. in a file buffer. -- Greg

Steven D'Aprano

1:45 a.m.

On Sat, Jun 23, 2018 at 01:33:59PM +1200, Greg Ewing wrote:

...

Nathaniel Smith

5:21 a.m.

On Fri, Jun 22, 2018 at 6:45 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

Steve Barnes

June 2018

12:31 a.m.

On 23/06/2018 06:21, Nathaniel Smith wrote:

...

Christian Heimes

12:54 p.m.

On 2018-06-23 07:21, Nathaniel Smith wrote:

...

Gregory P. Smith

1:04 p.m.

On Sat, Jun 23, 2018 at 12:57 PM Christian Heimes <christian@python.org> wrote:

...

Greg Ewing

5:31 p.m.

Christian Heimes wrote:

...

It's just a 90% secure solution, because the data will eventually land in public buffers.

Seems like the only completely foolproof solution would have to involve some kind of quantum storage that can't be copied without destroying it. -- Greg

Steven D'Aprano

7:04 p.m.

On Sat, Jun 23, 2018 at 09:54:43PM +0200, Christian Heimes wrote:

...

If you need to protect sensitive data like private keys, then don't load them into memory of your current process. It's that simple. :)

Terry Reedy

9 p.m.

On 6/22/2018 8:45 PM, Chris Angelico wrote:

...

But one can be careful about creating references, and in current CPython, deleting the last reference does mean destroy, and possibly wipe, immediately. -- Terry Jan Reedy

Chris Angelico

June 2018

4:08 a.m.

On Sat, Jun 23, 2018 at 2:00 PM, Terry Reedy <tjreedy@udel.edu> wrote:

...

Yes, you can, for the most part. It's certainly possible to get stung (eg exceptions retaining locals), but mostly it should be fine. How will other Pythons handle this? ChrisA

Guido van Rossum

4:42 a.m.

On Fri, Jun 22, 2018 at 9:11 PM Chris Angelico <rosuav@gmail.com> wrote:

...

How will other Pythons handle this?

M.-A. Lemburg

12:11 p.m.

On 23.06.2018 02:45, Chris Angelico wrote:

...

Chris Angelico

7:55 p.m.

On Sat, Jun 23, 2018 at 10:11 PM, M.-A. Lemburg <mal@egenix.com> wrote:

...

Ezequiel Brizuela [aka EHB or qlixed]

7:11 p.m.

El vie., 22 de jun. de 2018 21:46, Chris Angelico <rosuav@gmail.com> escribió:.

...

(...)

...

Downside: You can't say "I'm done with this string, destroy it immediately".

That is the main issue with this approach. The proposed one is inmmediate but I understand that is risky.

Terry Reedy

1:32 a.m.

On 6/22/2018 8:31 PM, Ezequiel Brizuela [aka EHB or qlixed] wrote:

...

As all the string in python are immutable, is impossible to overwrite the value

Not if one uses ctypes. Is that what you did?

...

Well I already do it:

https://github.com/qlixed/python-memwiper/

...

But i hit a lot of problems in the road, I was working on me free time over the last year on this and make it "almost" work, but that is not relevant to the proposal.

Ezequiel Brizuela [aka EHB or qlixed]

June 2018

1:02 p.m.

El vie., 22 de jun. de 2018 22:33, Terry Reedy <tjreedy@udel.edu> escribió:

...

On 6/22/2018 8:31 PM, Ezequiel Brizuela [aka EHB or qlixed] wrote:

...
As all the string in python are immutable, is impossible to overwrite the value

Not if one uses ctypes. Is that what you did?

No. I was using exclusivelly python strings functions from the C api.

...

Well I already do it:

...
https://github.com/qlixed/python-memwiper/

...
But i hit a lot of problems in the road, I was working on me free time over the last year on this and make it "almost" work, but that is not relevant to the proposal.

I think it is. A very small fraction of Python users need such wiping.

And I doubt that it can be complete. For instance, I suspect that a

...

password entered into getpass, for instance, first exists in OS form before being copied into a Python string objects. Wiping the Python string would not wipe the original copy.

Agree. It migth be more places to search. So this really should be

...

attacked at the OS level, not the language level.

This need to be tackled from all the sides. Ensuring the minimal attack surface possible for anyone.

Paul Moore

4:13 a.m.

On 23 June 2018 at 01:31, Ezequiel Brizuela [aka EHB or qlixed] <qlixed@gmail.com> wrote:

...

Paul Moore

4:16 a.m.

On 23 June 2018 at 12:13, Paul Moore <p.f.moore@gmail.com> wrote:

...

Greg Ewing

5:14 p.m.

Paul Moore wrote:

...

But in order to create the SafeStr, you need to first have the data in the form of an ordinary non-safe string. How do you dispose of that safely? -- Greg

Terry Reedy

7:44 p.m.

On 6/23/2018 8:14 PM, Greg Ewing wrote:

...

getpass could return a SafeStr (or SafeBytes?). SafeStr could be initialized from a sequence of ints. -- Terry Jan Reedy

Paul Moore

1:10 p.m.

On 24 June 2018 at 03:44, Terry Reedy <tjreedy@udel.edu> wrote:

...

Stephan Houben

June 2018

6:57 a.m.

...

As all the string in python are immutable, is impossible to overwrite the value or to make a "secure disposal" (overwrite-then-free) of a string using something like:

...
...
...
a = "something to hide" a = "x"*len(a)

This will lead on the process memory "something to hide" and "x" repeated len(a) times.

- Who cares? Why is this relevant? Well if you handle some sensitive information like CC numbers, Passwords, PINs, or other kind of information you wanna minimize the chance of leaking any of it.

- How this "leak" can happen? If you get a core/memory dump of an app handling sensitive information you will get all the information on that core exposed!

- Well, so what we can do about this? I propose to make the required changes on the string objects to add an option to overwrite the underlying buffer. To do so:

* Add a wiped as an attribute that is read-only to be set when the string is overwrited. * Add a wipe() method that overwrite the internal string buffer.

So this will work like this:

...
...
...
pwd =getpass.getpass('Set your password:') # could be other sensitive data. encrypted_pwd = crypt.crypt(pwd) # crypt() just as example. pwd.wiped # Check if pwd was wiped. False pwd.wipe() # Overwrite the underlying buffer pwd.wiped # Check if pwd was wiped. True print(pwd) # Print noise (or empty str?) del pwd # Now is in hands of the GC.

The wipe method immediately overwrite the underlying string buffer, setting wiped as True for reference so if the string is further used this can be checked to confirm that the change was made by a wipe and not by another procedure. Also initially the idea is to use unicode NULL datapoint to overwrite the string, but this could be change to let the user parametrize it over wipe() method. An alternative to this is to add a new exception "WipedError" that could be throw where the string is accessed again, but I found this method too disruptive to implement for a normal/standard string workflow usage.

Quick & Dirty FAQ:

- You do it wrong!, the correct code to do that in a secure way is:

...
...
...
pwd = crypt.crypt(getpass.getpass('Set your password')) Don't you know that fool?

Well no, the code still generate a temporary string in memory to pass to crypt. But now this string is lying there and can't be accessed for an overwrite with wipe()

- Why not create a new type like in C# or Java?

I see that this tend to disrupt the usual workflow of string usage. Also the idea here is not to offer secure storage of string in memory because there is already a few mechanism to achieve with the current Python base. I just want to have the hability to overwrite the buffer.

- Why don't use one of the standard algorithms to overwrite like DoD5220 or MIL-STD-414?

This kind of standard usually are oriented for usage on persistent storage, specially on magnetic media for where the data could be "easily" recoverd. But this could ve an option that could be implemented adding the option to plug a function that do the overwrite work inside the wipe method.

- This is far beyond of the almost implementation-agnostic definition of the python lang. How about to you make a module with this functionality and left the lang as is?

Well I already do it:

https://github.com/qlixed/python-memwiper/

But i hit a lot of problems in the road, I was working on me free time over the last year on this and make it "almost" work, but that is not relevant to the proposal. I think that this kind of security things needs to be tackled from within the language itself specially when the lang have GC. I firmly believe that the security and protections needs to be part of the "with batteries" offer of Python. And I think that this is one little thing that could help a lot to secure our apps. Let me know what do you think!

~ Ezequiel (Ezekiel) Brizuela [ aka Qlixed ] ~

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Christian Heimes

12:28 p.m.

On 2018-06-23 15:57, Stephan Houben wrote:

...

Greg Ewing

5:26 p.m.

Christian Heimes wrote:

...

If the attacker can attach a debugger to your process, they can already do a lot worse than snoop on your secret strings. -- Greg

Ezequiel Brizuela [aka EHB or qlixed]

12:55 p.m.

El sáb., 23 de jun. de 2018 10:58, Stephan Houben <stephanh42@gmail.com> escribió:

...

Christian Heimes

1:43 p.m.

On 2018-06-23 21:55, Ezequiel Brizuela [aka EHB or qlixed] wrote:

...

If you have core dumps enabled, then memory wiping will not help against accidental leakage of sensitive data.

Kyle Lahnakoski

12:20 p.m.

...

As all the string in python are immutable, is impossible to overwrite the value or to make a "secure disposal" (overwrite-then-free) of a string using something like:

...
...
...
a = "something to hide" a = "x"*len(a)

This will lead on the process memory "something to hide" and "x" repeated len(a) times.

- Who cares? Why is this relevant? Well if you handle some sensitive information like CC numbers, Passwords, PINs, or other kind of information you wanna minimize the chance of leaking any of it.

- How this "leak" can happen? If you get a core/memory dump of an app handling sensitive information you will get all the information on that core exposed!

- Well, so what we can do about this? I propose to make the required changes on the string objects to add an option to overwrite the underlying buffer. To do so:

* Add a wiped as an attribute that is read-only to be set when the string is overwrited. * Add a wipe() method that overwrite the internal string buffer.

So this will work like this:

...
...
...
pwd =getpass.getpass('Set your password:') # could be other sensitive data. encrypted_pwd = crypt.crypt(pwd) # crypt() just as example. pwd.wiped # Check if pwd was wiped. False pwd.wipe() # Overwrite the underlying buffer pwd.wiped # Check if pwd was wiped. True print(pwd) # Print noise (or empty str?) del pwd # Now is in hands of the GC.

The wipe method immediately overwrite the underlying string buffer, setting wiped as True for reference so if the string is further used this can be checked to confirm that the change was made by a wipe and not by another procedure. Also initially the idea is to use unicode NULL datapoint to overwrite the string, but this could be change to let the user parametrize it over wipe() method. An alternative to this is to add a new exception "WipedError" that could be throw where the string is accessed again, but I found this method too disruptive to implement for a normal/standard string workflow usage. Quick & Dirty FAQ:

- You do it wrong!, the correct code to do that in a secure way is:

...
...
...
pwd = crypt.crypt(getpass.getpass('Set your password')) Don't you know that fool?

Well no, the code still generate a temporary string in memory to pass to crypt. But now this string is lying there and can't be accessed for an overwrite with wipe()

- Why not create a new type like in C# or Java?

I see that this tend to disrupt the usual workflow of string usage. Also the idea here is not to offer secure storage of string in memory because there is already a few mechanism to achieve with the current Python base. I just want to have the hability to overwrite the buffer.

- Why don't use one of the standard algorithms to overwrite like DoD5220 or MIL-STD-414?

This kind of standard usually are oriented for usage on persistent storage, specially on magnetic media for where the data could be "easily" recoverd. But this could ve an option that could be implemented adding the option to plug a function that do the overwrite work inside the wipe method.

- This is far beyond of the almost implementation-agnostic definition of the python lang. How about to you make a module with this functionality and left the lang as is?

Well I already do it:

https://github.com/qlixed/python-memwiper/ <https://github.com/qlixed/python-memwiper/>

But i hit a lot of problems in the road, I was working on me free time over the last year on this and make it "almost" work, but that is not relevant to the proposal. I think that this kind of security things needs to be tackled from within the language itself specially when the lang have GC. I firmly believe that the security and protections needs to be part of the "with batteries" offer of Python. And I think that this is one little thing that could help a lot to secure our apps. Let me know what do you think!

~ Ezequiel (Ezekiel) Brizuela [ aka Qlixed ] ~

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

2460

Age (days ago)

2461

Last active (days ago)

List overview

Download

30 comments

14 participants

participants (14)

Chris Angelico
Christian Heimes
Ezequiel Brizuela [aka EHB or qlixed]
Greg Ewing
Gregory P. Smith
Guido van Rossum
Kyle Lahnakoski
M.-A. Lemburg
Nathaniel Smith
Paul Moore
Stephan Houben
Steve Barnes
Steven D'Aprano
Terry Reedy

Secure string disposal (maybe other inmutable seq types too?)

Stephan Houben

Stephan Houben

tags

participants (14)