hex dump w/ or w/out utf-8 chars
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Tue Jul 9 02:46:40 EDT 2013
On Tue, 09 Jul 2013 00:32:00 +0100, MRAB wrote:
> On 08/07/2013 23:02, Joshua Landau wrote:
>> On 8 July 2013 22:38, MRAB <python at mrabarnett.plus.com> wrote:
>>> On 08/07/2013 21:56, Dave Angel wrote:
>>>> Characters do not have a width.
>>>
>>> [snip]
>>>
>>> It depends what you mean by "width"! :-)
>>>
>>> Try this (Python 3):
>>>
>>> >>> print("A\N{FULLWIDTH LATIN CAPITAL LETTER A}")
>>> AＡ
>>
>> Serious question: How would one find the width of a character by that
>> definition?
>>
> >>> import unicodedata
> >>> unicodedata.east_asian_width("A")
> 'Na'
> >>> unicodedata.east_asian_width("\N{FULLWIDTH LATIN CAPITAL LETTER A}")
> 'F'
>
> The possible widths are:
>
> N = Neutral
> A = Ambiguous
> H = Halfwidth
> W = Wide
> F = Fullwidth
> Na = Narrow
>
> All you then need to do is find out what those actually mean...
In some East Asian encodings, there are code points for Latin characters
in two forms: "half-width" and "full-width". The half-width forms took up
a single fixed-width column each; the full-width forms took up two
fixed-width columns each, so they would line up nicely in columns with
Asian characters.
See also:
http://www.unicode.org/reports/tr11/
and search Wikipedia for "full-width" and "half-width".
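
As a rough illustration of the column idea above, here is a minimal
sketch that estimates how many fixed-width columns a string occupies.
The function name display_width is made up for this example, and the
rule it applies is an assumption: Wide ('W') and Fullwidth ('F')
characters count as two columns, combining marks count as zero, and
everything else (including Ambiguous) counts as one. Real terminals
may disagree, especially for Ambiguous and control characters.

```python
import unicodedata


def display_width(text):
    """Estimate the number of fixed-width columns `text` occupies.

    Hypothetical helper, not part of the standard library.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the previous character.
            continue
        # Assumption: only Wide and Fullwidth take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


print(display_width("A"))                                      # 1
print(display_width("\N{FULLWIDTH LATIN CAPITAL LETTER A}"))   # 2
print(display_width("A\N{FULLWIDTH LATIN CAPITAL LETTER A}"))  # 3
```

This only uses unicodedata.east_asian_width() and
unicodedata.combining(), both in the standard library; the mapping from
width class to column count is the part you may need to adjust.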
--
Steven