[New-bugs-announce] [issue1359] py3k: out of bounds read in PyUnicode_DecodeUnicodeEscape

Amaury Forgeot d'Arc report at bugs.python.org
Mon Oct 29 22:21:29 CET 2007

New submission from Amaury Forgeot d'Arc:

A correction for the problem found by GvR in change 58692:

> There's one mystery: if I remove ob_sstate from the PyStringObject struct,
> some (unicode) string literals are mutilated, e.g. ('\\1', '\1') prints
> ('\\1', '\t').  This must be an out of bounds write or something that I
> can't track down.  (It doesn't help that it doesn't occur in debug mode.
> And no, make clean + recompilation doesn't help either.)
> So, in the mean time, I just keep the field, renamed to 'ob_placeholder'.

I think I found the problem. It reproduces on Windows, with a slightly
different input:
    >>> ('\\2','\1')
    ('\\2', '\n')
(The win32 release build is of the kind "optimized with debug info", so
using the debugger is possible.)

The problem is in unicodeobject.c::PyUnicode_DecodeUnicodeEscape:
- the input buffer is not null-terminated;
- when decoding an octal escape, we increment s without checking whether
  it is still within the bounds of the buffer.
In my case, the "\1" was followed by a "2" in memory, hence the bogus
chr(0o12) == '\n'.

The patch also corrects a potential problem when the string ends with a
backslash: PyUnicode_DecodeUnicodeEscape("\\t", 1) should return an error,
because with a size of 1 only the backslash belongs to the input.
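
The trailing-backslash case can be sketched the same way. This is only an
illustration under simplifying assumptions, not the patch itself: the
function name ends_with_dangling_backslash is hypothetical, and it only
detects the condition instead of raising a Python error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check, not the CPython code: returns 1 if the buffer of
   `size` bytes ends with a lone backslash, i.e. an escape sequence whose
   second character would have to be read from beyond the buffer. */
static int
ends_with_dangling_backslash(const char *s, size_t size)
{
    assert(s != NULL);
    const char *end = s + size;
    while (s < end) {
        if (*s++ != '\\')
            continue;       /* ordinary character */
        if (s >= end)
            return 1;       /* the backslash was the last byte */
        s++;                /* skip the character that was escaped */
    }
    return 0;
}
```

For the input "\\t" with size 1 this returns 1; with size 2 the escape is
complete and it returns 0.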

components: Interpreter Core
files: unicodeEscape.diff
messages: 56933
nosy: amaury.forgeotdarc, gvanrossum
severity: normal
status: open
title: py3k: out of bounds read in PyUnicode_DecodeUnicodeEscape
versions: Python 3.0
Added file: http://bugs.python.org/file8658/unicodeEscape.diff

Tracker <report at bugs.python.org>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unicodeEscape.diff
Type: application/octet-stream
Size: 1688 bytes
Desc: not available
Url : http://mail.python.org/pipermail/new-bugs-announce/attachments/20071029/a1541f32/attachment.obj 
