[issue20574] Implement incremental decoder for cp65001

STINNER Victor report at bugs.python.org
Sun Feb 9 14:18:25 CET 2014


New submission from STINNER Victor:

(Follow-up to issues #20538 and #20571.) The attached patch implements incremental decoders for multibyte code pages (on Windows), especially for CP_UTF8, aka "cp65001" in Python.

Code pages 932, 936, 949, 950 and 1361 have had an incremental decoder since:
---
changeset:   38817:549c547700af
branch:      legacy-trunk
user:        Martin v. Löwis <martin at v.loewis.de>
date:        Wed Jun 14 05:21:04 2006 +0000
files:       Doc/api/concrete.tex Include/unicodeobject.h Lib/encodings/mbcs.py Misc/NEWS Modules/_codecsmodule.c Objects/unicodeobject.c
description:
Patch #1455898: Incremental mode for "mbcs" codec.
---

Python currently uses IsDBCSLeadByteEx():
http://msdn.microsoft.com/en-us/library/windows/desktop/dd318667%28v=vs.85%29.aspx

And CharPrevA():
http://msdn.microsoft.com/en-us/library/windows/desktop/ms647471%28v=vs.85%29.aspx

But IsDBCSLeadByteEx() only supports code pages 932, 936, 949, 950 and 1361.
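
To illustrate why a lead-byte check matters for UTF-8 as well, here is a minimal Python sketch of the general idea: hold back the trailing bytes of a chunk when they start a character that is not complete yet. This is not the code from the attached patch (which is not reproduced here); the helper names incomplete_utf8_tail and decode_chunks are hypothetical and only look at UTF-8 bit patterns.

```python
def incomplete_utf8_tail(data: bytes) -> int:
    """Return how many trailing bytes of data start an unfinished UTF-8 sequence.

    Hypothetical helper for illustration only.
    """
    # A UTF-8 sequence is at most 4 bytes long, so an incomplete
    # one can contribute at most 3 bytes at the end of the buffer.
    for back in range(1, min(3, len(data)) + 1):
        byte = data[-back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte: keep looking for the lead byte
        if byte & 0xE0 == 0xC0:
            need = 2
        elif byte & 0xF0 == 0xE0:
            need = 3
        elif byte & 0xF8 == 0xF0:
            need = 4
        else:
            return 0  # ASCII or invalid lead byte: nothing to hold back
        # Hold the bytes back only if the sequence is still too short.
        return back if back < need else 0
    return 0


def decode_chunks(chunks):
    """Decode an iterable of byte chunks, buffering incomplete characters."""
    pending = b""
    out = []
    for chunk in chunks:
        data = pending + chunk
        split = len(data) - incomplete_utf8_tail(data)
        out.append(data[:split].decode("utf-8"))
        pending = data[split:]
    return "".join(out)


text = decode_chunks([b"caf\xc3", b"\xa9 \xe2", b"\x82\xac"])
```

Here the chunk boundaries fall in the middle of "é" and "€"; the sketch keeps the partial bytes until the next chunk arrives instead of raising a decoding error.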

Python has supported code page 65001 (codec "cp65001") since Python 3.3. New tests on incremental decoders were added in Python 3.4: I added a skip for cp65001 since it was not supported (#20571). This issue implements the incremental decoder and so removes the skip.
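
For readers unfamiliar with incremental decoders, the behaviour such a test exercises can be sketched as follows. The example uses the portable "utf-8" codec as a stand-in, since "cp65001" is only available on Windows; that substitution is an assumption made so the snippet runs anywhere, and it is not taken from the test suite.

```python
import codecs

# Stand-in codec: "utf-8" instead of the Windows-only "cp65001".
decoder = codecs.getincrementaldecoder("utf-8")()

data = "café €".encode("utf-8")

# Feed the decoder one byte at a time. It returns an empty string while
# a multibyte character is still incomplete, then the whole character.
parts = [decoder.decode(data[i:i + 1]) for i in range(len(data))]

# final=True tells the decoder that no more input will follow.
parts.append(decoder.decode(b"", final=True))

result = "".join(parts)
```

Byte-by-byte feeding is the harshest case for an incremental decoder, because every multibyte character is split across calls.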

I would prefer to wait for Python 3.5 rather than rush to add this new feature after 3.4 beta 3. cp65001 is mostly used for output (sys.stdout/sys.stderr) on Windows, not for input.

----------
files: incremental_cp_utf8.patch
keywords: patch
messages: 210759
nosy: haypo, larry, loewis, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Implement incremental decoder for cp65001
type: enhancement
versions: Python 3.5
Added file: http://bugs.python.org/file34008/incremental_cp_utf8.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20574>
_______________________________________

