[New-bugs-announce] [issue23215] MemoryError with custom error handlers and multibyte codecs

Aleksi Torhamo report at bugs.python.org
Sat Jan 10 04:33:03 CET 2015

New submission from Aleksi Torhamo:

Using a multibyte codec and a custom error handler that ignores errors to encode a string that contains characters not representable in said encoding causes exponential growth of the output buffer, raising MemoryError.
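A minimal sketch of the kind of input described above, for Python 3. It is not the attached python_codec_crasher.py; the handler name "sketch.ignore", the choice of euc_kr, and the Thai sample character are assumptions made for illustration. On an affected interpreter the encode call would be expected to raise MemoryError; on a fixed one it simply drops the unencodable characters.

```python
import codecs


def ignore_unencodable(exc):
    # Hypothetical handler: replace each unencodable run with an empty
    # string and resume encoding right after it.
    if isinstance(exc, UnicodeEncodeError):
        return ("", exc.end)
    raise exc


# The handler name is made up for this sketch.
codecs.register_error("sketch.ignore", ignore_unencodable)


def encode_with_ignore(text, encoding="euc_kr"):
    # Every character the codec cannot represent triggers the handler once,
    # and each call returns a zero-length replacement.
    return text.encode(encoding, "sketch.ignore")


if __name__ == "__main__":
    # U+0E01 (a Thai letter) is assumed here to be outside euc_kr.
    sample = "abc" + "\u0e01" * 60 + "def"
    print(encode_with_ignore(sample))
```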

The problem is in multibytecodec_encerror() and REQUIRE_ENCODEBUFFER() in Modules/cjkcodecs/multibytecodec.c. multibytecodec_encerror() always uses REQUIRE_ENCODEBUFFER() to ensure there's enough space for the replacement string, and if more space is needed, REQUIRE_ENCODEBUFFER() calls expand_encodebuffer(), which in turn always grows the buffer by at least 50%. However, if size < 1, REQUIRE_ENCODEBUFFER() doesn't check whether more space is actually needed and expands the buffer anyway. (The macro is used with negative values in other places.)

I have no idea why the condition was originally size < 1 instead of size < 0, but changing it seems to fix this. The replacement string case is also the only use of the macro that may use 0 as the argument. 
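The growth can be illustrated with a simplified model in Python. This is only a sketch of the arithmetic, not the C code: the function names, the initial buffer size of 64 bytes, and the exact rounding in grow() are assumptions. The only facts taken from the report are that each expansion adds at least 50% and that the space check is skipped when size < 1.

```python
def grow(size):
    # Model of "grow by at least 50%"; the exact rounding is an assumption.
    return size + max(size // 2, 1)


def simulate(errors, initial_size, threshold):
    """Return the modelled buffer size after `errors` ignored errors.

    threshold=1 models the reported condition (size < 1 forces expansion);
    threshold=0 models the suggested condition (size < 0).
    """
    size = initial_size
    used = 0  # the ignoring handler never writes any output
    for _ in range(errors):
        requested = 0  # empty replacement string
        if requested < threshold or used + requested > size:
            size = grow(size)
    return size


if __name__ == "__main__":
    print("size < 1:", simulate(40, 64, threshold=1))
    print("size < 0:", simulate(40, 64, threshold=0))
```

With the size < 1 condition every ignored character multiplies the buffer by roughly 1.5, so a few dozen errors are enough to reach an unreasonable allocation; with size < 0 the empty replacement leaves the buffer untouched.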

In the patch, I've instead wrapped the REQUIRE_ENCODEBUFFER() (and memcpy) in an if (size > 0), since that's what the corresponding part in multibytecodec_decerror() did in the past:

Not sure which one makes more sense.

As for the tests, I'm not sure 1) whether all of the affected encodings should be tested or only one (or even all encodings, affected or not?) and 2) whether it should be a new test or if I should just add it to test_longstrings in Lib/test/test_codeccallbacks.py. (Structurally it's a perfect fit, but it really isn't a "long string" test, as it can happen with fewer than 50 characters.) At the moment, the patch is testing affected encodings in a separate test.
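A separate test along those lines might look roughly like the following sketch for Python 3. It is not the test from the patch: the class name, the handler name, and the list of encodings are assumptions, and the list is not meant to be the exact set of affected codecs.

```python
import codecs
import unittest

# Hypothetical selection of multibyte codecs; U+0E01 (a Thai letter) is
# assumed to be unencodable in all of them.
ENCODINGS = ("big5", "euc_jp", "euc_kr", "gb2312", "shift_jis")


def _ignore(exc):
    # Drop unencodable characters and resume after them.
    if isinstance(exc, UnicodeEncodeError):
        return ("", exc.end)
    raise exc


codecs.register_error("sketch.test_ignore", _ignore)


class IgnoreHandlerTest(unittest.TestCase):
    def test_ignored_errors_do_not_grow_output(self):
        text = "abc" + "\u0e01" * 60 + "def"
        for encoding in ENCODINGS:
            with self.subTest(encoding=encoding):
                encoded = text.encode(encoding, "sketch.test_ignore")
                self.assertEqual(encoded, b"abcdef")


def run_sketch():
    # Run the test case programmatically and return the result object.
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(IgnoreHandlerTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_sketch()
```

Using subTest keeps one test method while still reporting each encoding separately, which is one way to cover several codecs without much extra running time.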

Is the test philosophy "as thorough as possible" or "as fast as possible"?

components: Interpreter Core
files: python_codec_crasher.py
messages: 233800
nosy: alexer
priority: normal
severity: normal
status: open
title: MemoryError with custom error handlers and multibyte codecs
type: resource usage
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5
Added file: http://bugs.python.org/file37659/python_codec_crasher.py

Python tracker <report at bugs.python.org>
