[docs] [issue36789] Unicode HOWTO incorrectly states that UTF-8 contains no zero bytes
report at bugs.python.org
Fri May 3 20:00:17 EDT 2019
New submission from mbiggs <pythonbugs at doubleplum.net>:
The Unicode HOWTO (http://docs.python.org/3.3/howto/unicode.html) says the following:
"UTF-8 has several convenient properties:
2. A Unicode string is turned into a sequence of bytes containing no embedded zero bytes. This avoids byte-ordering issues, and means UTF-8 strings can be processed by C functions such as strcpy() and sent through protocols that can’t handle zero bytes."
This is not right. UTF-8 encodes the Unicode code point U+0000 (the ASCII NUL character) as a single zero byte. This is a valid character in UTF-8, and Python's UTF-8 encoding and decoding handle it just fine. The quoted statement only holds for strings that do not contain U+0000.
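A quick check in a Python 3 interpreter illustrates the point. This is only a minimal sketch; the variable names are made up for illustration:

```python
# A string with an embedded U+0000 between two ASCII letters.
text = "a\x00b"

# Encode to UTF-8 and decode back again.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")

print(encoded)          # b'a\x00b'
print(0 in encoded)     # True: the encoded bytes contain a zero byte
print(decoded == text)  # True: the round trip preserves U+0000
```

A C function such as strcpy() would stop at that embedded zero byte, so the property described in the HOWTO does not apply to this string.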
assignee: docs at python
nosy: docs at python, mbiggs
title: Unicode HOWTO incorrectly states that UTF-8 contains no zero bytes
versions: Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.8
Python tracker <report at bugs.python.org>