Flexible string representation, unicode, typography, ...
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sun Aug 26 16:13:21 EDT 2012
On Sun, 26 Aug 2012 09:40:13 -0600, Ian Kelly wrote:
> I think the documentation for those functions is simply badly worded.
> The "width in bytes" it returns is not the width of the rune (which as
> jmf notes is simply an alias for int32 that stores a single code point).
Is this documented somewhere?
I can't tell you how long I spent unsuccessfully googling for variations
on "go language runes", which unsurprisingly mostly came back with pages
about Germanic runes and elf runes but not Go runes. I read the golang
FAQs, which mentioned Unicode *once* and runes not at all. Obviously Go
language programmers don't care much about Unicode.
> It means the UTF-8 width of the character, i.e. the number of UTF-8
> bytes the function "consumed", presumably so that the caller can then
> reslice the data with that many bytes fewer.
That makes sense, given the lousy string implementation and API they're
working with.
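For what it's worth, here is a rough Python sketch of the "consumed width" idea Ian describes: decode the first character of a UTF-8 byte string, report how many bytes it occupied, and let the caller reslice by that amount. The names decode_first and walk are made up for illustration and are not part of Go's API or anyone's library; the lead-byte ranges are the standard UTF-8 ones.

```python
from typing import List, Tuple


def decode_first(data: bytes) -> Tuple[int, int]:
    """Return (code point, UTF-8 width in bytes) of the first character.

    Hypothetical helper, for illustration only.
    """
    if not data:
        raise ValueError("empty input")
    first = data[0]
    # The lead byte alone determines how many bytes the character occupies.
    if first < 0x80:
        width = 1
    elif 0xC2 <= first <= 0xDF:
        width = 2
    elif 0xE0 <= first <= 0xEF:
        width = 3
    elif 0xF0 <= first <= 0xF4:
        width = 4
    else:
        raise ValueError("invalid UTF-8 lead byte: %#x" % first)
    # Let the strict codec validate the continuation bytes; it raises
    # UnicodeDecodeError (a ValueError) on truncated or malformed input.
    char = data[:width].decode("utf-8")
    return ord(char), width


def walk(data: bytes) -> List[Tuple[int, int]]:
    """Decode every character, reslicing by the consumed width each time."""
    out = []
    while data:
        code_point, width = decode_first(data)
        out.append((code_point, width))
        data = data[width:]  # drop the bytes just consumed
    return out


if __name__ == "__main__":
    sample = "a\u00e9\u20ac\U0001F600".encode("utf-8")
    for code_point, width in walk(sample):
        print("U+%04X took %d byte(s)" % (code_point, width))
```

The width belongs to the encoded form, not to the code point itself, which is why it has to be handed back to the caller separately.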
I note that not all 32-bit ints are valid code points. I suppose I can
see sense in having rune be a 32-bit integer value limited to those valid
code points. (But, dammit, why not call it a code point?) But if rune is
merely an alias for int32, why not just call it int32?
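To put numbers on "not all 32-bit ints are valid code points", here is a small Python sketch. The function names are my own invention; the limits (code points run from 0 to 0x10FFFF, and the surrogate range 0xD800 to 0xDFFF is excluded from the Unicode scalar values) come from the Unicode standard.

```python
MAX_CODE_POINT = 0x10FFFF            # highest Unicode code point
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def is_code_point(n: int) -> bool:
    """True if n lies in the Unicode code space (hypothetical helper)."""
    return 0 <= n <= MAX_CODE_POINT


def is_scalar_value(n: int) -> bool:
    """True if n is a code point other than a surrogate (hypothetical helper)."""
    return is_code_point(n) and not (0xD800 <= n <= 0xDFFF)


if __name__ == "__main__":
    for n in (0x41, 0xD800, MAX_CODE_POINT, MAX_CODE_POINT + 1, -1, INT32_MAX):
        print("%#x: code point=%s, scalar value=%s"
              % (n, is_code_point(n), is_scalar_value(n)))
    # Fraction of the int32 range that is a code point at all.
    total_int32 = INT32_MAX - INT32_MIN + 1
    print("%d of %d int32 values are code points"
          % (MAX_CODE_POINT + 1, total_int32))
```

Python enforces the same upper limit itself: chr(0x110000) raises ValueError, whereas a bare int32 alias would accept any of those out-of-range values.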
--
Steven