Can utf-8 encoded character contain a byte of TAB?
Random832
random832 at fastmail.com
Mon Jan 15 10:09:41 EST 2018
On Mon, Jan 15, 2018, at 09:35, Peter Otten wrote:
> Peng Yu wrote:
>
> > Can utf-8 encoded character contain a byte of TAB?
>
> Yes; ascii is a subset of utf8.
>
> If you want to allow fields containing TABs in a file where TAB is also the
> field separator you need a convention to escape the TABs occurring in the
> values. Nothing I see in your post can cope with that, but the csv module
> can, by quoting fields containing the delimiter:
Just to be clear, TAB *only* appears in UTF-8 as the encoding of the actual TAB character (byte 0x09), never as part of any other character's encoding. The encoding of a non-ASCII character consists only of a lead byte in the range 0xC2 through 0xF4, followed by one to three continuation bytes in the range 0x80 through 0xBF. Every one of those bytes is 0x80 or higher, so none of them can be mistaken for TAB or any other ASCII byte.
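If you would rather check this than take my word for it, here is a rough brute-force sketch. It encodes every non-ASCII code point and records which bytes show up. The function name scan_utf8 and the variable names are made up for this illustration, and it assumes Python 3.

```python
TAB = 0x09

def scan_utf8(byte=TAB):
    """Encode every non-ASCII code point and collect what we see.

    Returns (offenders, lead_bytes, continuation_bytes), where offenders
    lists the code points whose UTF-8 encoding contains `byte`.
    """
    offenders = []
    lead_bytes = set()
    continuation_bytes = set()
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable as UTF-8
        encoded = chr(cp).encode("utf-8")
        lead_bytes.add(encoded[0])
        continuation_bytes.update(encoded[1:])
        if byte in encoded:
            offenders.append(cp)
    return offenders, lead_bytes, continuation_bytes

offenders, lead_bytes, continuation_bytes = scan_utf8()
print("code points containing TAB:", offenders)
print("lead bytes: 0x%02X..0x%02X" % (min(lead_bytes), max(lead_bytes)))
print("continuation bytes: 0x%02X..0x%02X"
      % (min(continuation_bytes), max(continuation_bytes)))
```

The practical upshot is that splitting UTF-8 encoded bytes on b"\t" cannot cut a multi-byte character in half. It does not help with fields that contain a real TAB character; that still needs quoting or escaping, as Peter said.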