[Python-ideas] [issue33865] [EASY] Missing code page aliases: "unknown encoding: 874"

Karthikeyan tir.karthi at gmail.com
Mon Jun 18 11:07:45 EDT 2018


> BTW. “cp874” does exist according to the unicode consortium: 
> https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT, 
> and appears to be a codepage for a (the?) Thai language.  The user might 
> therefore be running Windows with a Thai locale.

This page 
<https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx> 
also lists 874, with windows-874 as its .NET name, as belonging to the Thai 
language, and it doesn't mention cp-874. I have no knowledge of .NET; I 
just wanted to add this as a reference.

Another disadvantage of patching the search function (or of adding an 
alias for every digit-only encoding name on the assumption that it means 
cpXXXX) is that it simply prepends "cp", and it only applies when 
aliases.py, which takes precedence, fails to resolve the name. Some 
digit-only names, such as '936', which corresponds to 'gbk', are already 
listed in aliases.py, so for now they are not resolved as 'cp936'. But if 
new digit-only, non-cp encodings are added in the future, they will have to 
be added to that file so that precedence works; otherwise they will always 
resolve to a cpXXXX encoding. I think this is noted at 
https://bugs.python.org/issue33865#msg319617.
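
To make the precedence point concrete, here is a minimal sketch of such a 
fallback. The name resolve_digit_only is hypothetical and this is not the 
search_function.patch from the issue; it only assumes the documented 
behaviour of codecs.register and codecs.lookup, where the standard search 
function of the encodings package (which reads aliases.py) is consulted 
before any function registered later.

```python
import codecs


def resolve_digit_only(name):
    """Hypothetical fallback: treat a digit-only name as code page cpXXXX."""
    if not name.isdigit():
        return None  # not a digit-only name; let other search functions try
    try:
        return codecs.lookup("cp" + name)
    except LookupError:
        return None  # there is no such cpXXXX codec either


# A search function registered here is only consulted after the standard
# one in the encodings package has failed to resolve the name.
codecs.register(resolve_digit_only)

gbk_name = codecs.lookup("936").name   # found through the alias table
thai_name = codecs.lookup("874").name  # alias table if present, else the fallback
print(gbk_name, thai_name)
```

With this in place, '936' still resolves through aliases.py, while a 
digit-only name that aliases.py does not know falls through to the "cp" 
guess, which is exactly the case that would go wrong for a future 
digit-only encoding that is not a code page.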

It would be nice if the original poster provided more context, or an 
environment in which to reproduce the problem, beyond the screenshot, which 
contains limited information. I am setting search_function.patch aside for 
now and look forward to the OP replying on the issue.

Thanks

PS: This is my first mailing list post. Please excuse me if I am using the 
wrong quoting mechanism.

On Monday, June 18, 2018 at 12:01:01 AM UTC+5:30, Ronald Oussoren wrote:
>
>
>
> On 17 Jun 2018, at 14:02, Stephen J. Turnbull <turnbull.... at u.tsukuba.ac.jp> wrote:
>
> > Folks.  There are standards.  "1252" *is not* an alias for
> > "windows-1252" according to the IANA, while "866" *is* an alias for
> > "IBM866" according to the same authority.  Most 3-digit "IBMxxx" ARE
> > aliased to both "cpxxx" and just "xxx", but not all.  None of
> > "IBM874", "874", or "cp874" exists according to the IANA.
>
>
> Sure, but for at least one user Python 3.6 fails to start because 
> initialising the sys.std* streams fails due to not finding a “874” 
> encoding.   
>
> The user sadly enough didn’t provide more information on his machine, 
> other than that it is running some version of Windows. 
>
> BTW. “cp874” does exist according to the unicode consortium: 
> https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT, 
> and appears to be a codepage for a (the?) Thai language.  The user might 
> therefore be running Windows with a Thai locale.
>
> Ronald
>
>
>