Report on non-breaking spaces in posts
Rhodri James
rhodri at kynesim.co.uk
Tue Oct 31 13:55:09 EDT 2017
On 31/10/17 17:23, Stefan Ram wrote:
> Ned Batchelder <ned at nedbatchelder.com> writes:
>> Â Â Â def wrapped_join(values, sep):
>
> Ok, here's a report on me seeing non-breaking spaces in
> posts in this NG. I have written this report so that you
> can see that it's not my newsreader that is converting
> something, because there is no newsreader involved.
>
> Here are some relevant lines from Ned's above post:
>
> |From: Ned Batchelder <ned at nedbatchelder.com>
> |Newsgroups: comp.lang.python
> |Subject: Re: How to join elements at the beginning and end of the list
> |Message-ID: <mailman.95.1509464977.1490.python-list at python.org>
Hm. That suggests the mail-to-news gateway has a hand in things.
> |Content-Type: text/plain; charset=utf-8; format=flowed
> |Content-Transfer-Encoding: 8bit
> | Â Â Â def wrapped_join(values, sep):
[snippety snip]
> |od -c tmp.txt
> |...
> |0012620 s u l a t e i t : \n \n  Â
> |0012640 Â d e f w r a p p e d _
> |...
> |
> |od -x tmp.txt
> |...
> |0012620 7573 616c 6574 6920 3a74 0a0a c220 c2a0
> |0012640 c2a0 20a0 6564 2066 7277 7061 6570 5f64
> |...
>
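One thing worth checking before reading octet pairs off the od -x dump above: od -x prints 16-bit words in host byte order, so on a little-endian machine the two bytes of every word appear swapped. The following is only a sketch under that assumption. Nothing in the post says which machine od ran on, but the od -c line agrees with it, since 7573 616c spells "sula" only when each word is swapped. The names OD_X_WORDS and words_to_bytes are made up for this illustration.

```python
# Words copied from the two od -x lines quoted above.
OD_X_WORDS = (
    "7573 616c 6574 6920 3a74 0a0a c220 c2a0 "
    "c2a0 20a0 6564 2066 7277 7061 6570 5f64"
).split()


def words_to_bytes(words, byteorder="little"):
    """Rebuild the byte stream from od -x style 16-bit hex words.

    byteorder="little" is an assumption about the machine od ran on.
    """
    return b"".join(int(w, 16).to_bytes(2, byteorder) for w in words)


raw = words_to_bytes(OD_X_WORDS)

# Print the bytes in file order, then try to decode them as UTF-8.
print(" ".join("%02x" % b for b in raw))
text = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid input
print(ascii(text))
```

If the little-endian assumption holds, the bytes in front of "def" come out as 20 c2 a0 c2 a0 c2 a0 20, that is, a space, three non-breaking spaces and another space. The whole run then decodes cleanly as UTF-8, and the word printed as "c220" is a space followed by the first byte of the next character, not a c2 20 sequence.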
> And you can see, there are two octet pairs »c220« and
> »c2a0« in the post (directly preceding »def wrapped«).
> (Compare with the Content-Type and Content-Transfer-Encoding
> given above.) (Read table with a monospaced font:)
>
>                       corresponding
>   Codepoint   UTF-8   ISO-8859-1   interpretation
>
>   U+0020?     c2 20   20?          SPACE?
>   U+00A0      c2 a0   a0           NON-BREAKING SPACE
>
> This makes it clear that there really are codepoints
> U+00A0 in what I get from the server, i.e., non-breaking
> spaces directly in front of »def wrapped«.
And? Why does that bother you? A non-breaking space is a perfectly
valid thing to put into a UTF-8 encoded message. The 0xc2 0x20 byte
pair that you misidentify as a space is another matter entirely.
0xc2 0x20 is not a space in UTF-8. It is an invalid code sequence. I
don't know how or where it was generated, but it really shouldn't have
been. It might have been Ned's MUA, or some obscure bug in the
mail-to-news gateway. Does anyone in a position to know have any opinions?
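The claim about 0xc2 0x20 is easy to check with a strict decoder. In UTF-8, 0xc2 is the lead byte of a two-byte sequence, so the byte after it must be a continuation byte in the range 0x80 to 0xbf, and 0x20 is not. A minimal sketch in Python 3 follows; describe is a hypothetical helper written for this illustration.

```python
def describe(raw):
    """Return the code points in raw if it is valid UTF-8, else the error reason."""
    try:
        return ["U+%04X" % ord(ch) for ch in raw.decode("utf-8")]
    except UnicodeDecodeError as exc:
        return "invalid UTF-8: " + exc.reason


print(describe(b"\xc2\xa0"))  # the two-byte encoding of NO-BREAK SPACE
print(describe(b"\x20"))      # a plain space is the single byte 0x20
print(describe(b"\xc2\x20"))  # lead byte 0xc2 followed by a non-continuation byte
```

The first two calls return a single code point each (U+00A0 and U+0020), while the third reports a decoding error instead of a space.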
--
Rhodri James *-* Kynesim Ltd