[Python-ideas] Secure string disposal (maybe other immutable seq types too?)

Christian Heimes christian at python.org
Sat Jun 23 15:54:43 EDT 2018


On 2018-06-23 07:21, Nathaniel Smith wrote:
> On Fri, Jun 22, 2018 at 6:45 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> On Sat, Jun 23, 2018 at 01:33:59PM +1200, Greg Ewing wrote:
>>> Chris Angelico wrote:
>>>> Downside:
>>>> You can't say "I'm done with this string, destroy it immediately".
>>>
>>> Also it would be hard to be sure there wasn't another
>>> copy of the data somewhere from a time before you
>>> got around to marking the string as sensitive, e.g.
>>> in a file buffer.
>>
>> Don't let the perfect be the enemy of the good.
> 
> That's true, but for security features it's important to have a proper
> analysis of the threat and when the mitigation will and won't work;
> otherwise, you don't know whether it's even "good", and you don't know
> how to educate people on what they need to do to make effective use of
> it (or where it's not worth bothering).
> 
> Another issue: I believe it'd be impossible for this proposal to work
> correctly on implementations with a compacting GC (e.g., PyPy),
> because with a compacting GC strings might get copied around in memory
> during their lifetime. And crucially, this might have already happened
> before the interpreter was told that a particular string object
> contained sensitive data. I'm guessing this is part of why Java and C#
> use a separate type.
> 
> There's a lot of prior art on this in other languages/environments,
> and a lot of experts who've thought hard about it. Python-{ideas,dev}
> doesn't have a lot of security experts, so I'd very much want to see
> some review of that work before we go running off designing something
> ad hoc.
> 
> The PyCA cryptography library has some discussion in their docs:
> https://cryptography.io/en/latest/limitations/
> 
> One possible way to move the discussion forward would be to ask the
> pyca devs what kind of API they'd like to see in the interpreter, if
> any.

A while ago, I spent a good amount of time investigating memory wiping
for the hashlib and hmac modules. Although I was only interested in
performing memory wiping in C code [1], I eventually gave up. It was too
annoying to create a platform- and architecture-independent
implementation. Because compilers do funny things and memset_s() isn't
universally available yet, it requires code like

   static void * (* const volatile __memset_vp)(void *, int, size_t) = (memset);

or assembler code like

   asm volatile("" : : "r"(s) : "memory");

just to work around compiler optimizations. And this doesn't even
address CPU architecture, virtual memory, paging, core dumps, debuggers,
or other things that can read memory or dump it to disk.
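A minimal sketch of how the two tricks quoted above are typically combined into one helper. The name `secure_wipe` is hypothetical, and the inline-asm barrier assumes GCC or Clang. Where they exist, explicit_bzero() (glibc 2.25+, OpenBSD, FreeBSD) or C11 Annex K memset_s() are the better choice.

```c
#include <stddef.h>
#include <string.h>

/* Calling memset through a volatile function pointer: the pointer has to be
 * re-read at run time, so the compiler cannot treat the call as a dead store
 * to memory that is about to go out of scope and silently drop it. */
static void *(*const volatile memset_vp)(void *, int, size_t) = memset;

/* Hypothetical helper: overwrite n bytes at p with zeros. */
void secure_wipe(void *p, size_t n)
{
    if (p == NULL || n == 0)
        return;
    memset_vp(p, 0, n);
#if defined(__GNUC__) || defined(__clang__)
    /* Compiler barrier: tells the compiler the asm may read memory reachable
     * through p, so the zeroing stores above must really be performed. */
    __asm__ __volatile__("" : : "r"(p) : "memory");
#endif
}
```

This only clears the one copy you point it at. As noted above, it does nothing about swapped-out pages, core dumps, debuggers, or copies the compiler or libc made elsewhere.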


I honestly believe that memory wiping with the current standard memory
allocator won't do the trick. It might be possible to implement a 90%
solution with a special memory allocator. Such an allocator would use a
specially configured mmap() memory arena and wipe memory on realloc()
and free(). The secure area can be kept out of swap with mlock(),
protected with mprotect(), and possibly further isolated in hardware
with memory protection keys via pkey_mprotect(). It's still only a 90%
solution, because the data will eventually end up in ordinary,
unprotected buffers.
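A much-simplified sketch of the allocator idea, assuming Linux/POSIX. Instead of a real arena with realloc() support, each secret gets its own page-aligned mmap() mapping. That mapping is mlock()ed, excluded from core dumps with MADV_DONTDUMP where that flag exists, optionally made read-only with mprotect(), and wiped before munmap(). pkey_mprotect() is left out. The `secure_buf` type and its functions are hypothetical names, not an existing API.

```c
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS, madvise() on glibc */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical type: one page-aligned private mapping holding one secret. */
struct secure_buf {
    void  *base;    /* start of the mapping, NULL when unused */
    size_t len;     /* mapping length, rounded up to whole pages */
    int    locked;  /* 1 if mlock() succeeded */
};

static void *(*const volatile memset_vp)(void *, int, size_t) = memset;

/* Map at least n bytes outside the normal heap. Returns 0 on success. */
int secure_buf_alloc(struct secure_buf *b, size_t n)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || n > SIZE_MAX - (size_t)page)
        return -1;
    size_t len = (n + (size_t)page - 1) / (size_t)page * (size_t)page;
    if (len == 0)
        len = (size_t)page;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    /* Keep the pages out of swap. This may fail if RLIMIT_MEMLOCK is small,
     * in which case we carry on without that guarantee. */
    b->locked = (mlock(p, len) == 0);
#ifdef MADV_DONTDUMP
    (void)madvise(p, len, MADV_DONTDUMP);   /* Linux: omit from core dumps */
#endif
    b->base = p;
    b->len = len;
    return 0;
}

/* Once the secret is written, drop write access so stray writes fault. */
int secure_buf_seal(struct secure_buf *b)
{
    return mprotect(b->base, b->len, PROT_READ);
}

/* Wipe, unlock and unmap; the wipe happens before the pages go back to the OS. */
void secure_buf_free(struct secure_buf *b)
{
    if (b->base == NULL)
        return;
    (void)mprotect(b->base, b->len, PROT_READ | PROT_WRITE);
    memset_vp(b->base, 0, b->len);
    if (b->locked)
        (void)munlock(b->base, b->len);
    (void)munmap(b->base, b->len);
    b->base = NULL;
    b->len = 0;
    b->locked = 0;
}
```

This is exactly the "90%" case described above. Only this buffer is protected, and any copy that gets made into a Python str/bytes object or an I/O buffer is outside it.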

If you need to protect sensitive data like private keys, then don't load
them into the memory of your current process. It's that simple. :) Bugs
like Heartbleed were an issue because private keys were in the same
process space as the TLS/SSL code. Solutions like gpg-agent, ssh-agent,
TPMs, HSMs, Linux's kernel keyring and AF_ALG sockets all aim to offload
operations involving private key material to a separate, more secure
process, to kernel space, or to special hardware.
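A toy sketch of the "separate process" approach, assuming POSIX fork() and socketpair(). A child process owns the key and answers one request; the parent only ever receives the result. `toy_keyed_digest` is an FNV-1a hash over key and message, used purely as a stand-in for a real signing or MAC operation, and is NOT cryptographically secure. `agent_digest` and the key literal are hypothetical. Also, because the key is a literal here, it is still mapped in the parent's executable image; a real agent would load it only inside the child.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define AGENT_MAX_MSG 256

/* Toy keyed digest (FNV-1a over key, then message). NOT a real MAC. */
static uint64_t toy_keyed_digest(const unsigned char *key, size_t klen,
                                 const unsigned char *msg, size_t mlen)
{
    uint64_t h = 14695981039346656037ULL;          /* FNV-1a offset basis */
    for (size_t i = 0; i < klen; i++) { h ^= key[i]; h *= 1099511628211ULL; }
    for (size_t i = 0; i < mlen; i++) { h ^= msg[i]; h *= 1099511628211ULL; }
    return h;
}

/* Child side: owns the key, answers one request, then exits. */
static void agent_main(int fd)
{
    /* Demo only: a real agent would load the key here, after fork(). */
    static const unsigned char key[] = "hypothetical-demo-key";
    unsigned char buf[AGENT_MAX_MSG];
    size_t len = 0;
    ssize_t n = 0;
    /* Read the whole request; the parent signals its end with shutdown(). */
    while (len < sizeof buf && (n = read(fd, buf + len, sizeof buf - len)) > 0)
        len += (size_t)n;
    if (n < 0)
        _exit(1);
    uint64_t d = toy_keyed_digest(key, sizeof key - 1, buf, len);
    _exit(write(fd, &d, sizeof d) == (ssize_t)sizeof d ? 0 : 1);
}

/* Parent side: send msg to a freshly forked agent and read back the digest.
 * Returns 0 on success and stores the result in *out. */
int agent_digest(const char *msg, uint64_t *out)
{
    size_t mlen = strlen(msg);
    int sv[2];
    if (mlen > AGENT_MAX_MSG || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) { close(sv[0]); close(sv[1]); return -1; }
    if (pid == 0) { close(sv[0]); agent_main(sv[1]); }
    close(sv[1]);
    int ok = write(sv[0], msg, mlen) == (ssize_t)mlen;
    shutdown(sv[0], SHUT_WR);                 /* end of request */
    size_t got = 0;
    ssize_t n;
    while (ok && got < sizeof *out &&
           (n = read(sv[0], (char *)out + got, sizeof *out - got)) > 0)
        got += (size_t)n;
    close(sv[0]);
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (ok && got == sizeof *out && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Real agents such as ssh-agent keep a long-lived process and speak a defined protocol over a Unix socket; forking per request just keeps the sketch short.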


[1] https://bugs.python.org/issue17405


