[Patches] [ python-Patches-1101726 ] Patch for potential buffer overrun in tokenizer.c
SourceForge.net
noreply at sourceforge.net
Wed Feb 23 20:48:15 CET 2005
Patches item #1101726, was opened at 2005-01-13 16:45
Message generated for change (Comment added) made by doerwalter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1101726&group_id=5470
Category: Core (C code)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Chapman (glchapman)
Assigned to: Nobody/Anonymous (nobody)
Summary: Patch for potential buffer overrun in tokenizer.c
Initial Comment:
The fp_readl function in tokenizer.c has a potential
buffer overrun; see:
www.python.org/sf/1089395
It is also triggered by trying to generate the pycom
file for the Excel 11.0 typelib, which is where I ran
into it.
The attached patch allows successful generation of the
Excel file; it also runs the fail.py script from the
above report without an access violation. It also
doesn't break any of the tests in the regression suite
(probably something like fail.py should be added as a
test).
It is not as efficient as it might be; with a function
for determining the number of Unicode characters in a
UTF-8 string, you could avoid some memory allocations.
Perhaps such a function should be added to unicodeobject.c?
And, of course, the patch definitely needs review. I'm
especially concerned that my use of
tok->decoding_buffer might be violating some kind of
assumption that I missed.
----------------------------------------------------------------------
>Comment By: Walter Dörwald (doerwalter)
Date: 2005-02-23 20:48
Message:
Logged In: YES
user_id=89016
I think the patch looks good. After staring at it for a
while, I believe the reference counts are OK. Can you
incorporate fail.py into the test suite?
----------------------------------------------------------------------
Comment By: Greg Chapman (glchapman)
Date: 2005-01-23 15:26
Message:
Logged In: YES
user_id=86307
I'm attaching a new patch (tokenizer.c.2.diff), which should
be better since it avoids some unnecessary allocations and
decoding/encoding. Hopefully, I haven't made any
unwarranted assumptions about UTF-8.
----------------------------------------------------------------------