hex dump w/ or w/out utf-8 chars

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Jul 9 02:46:40 EDT 2013


On Tue, 09 Jul 2013 00:32:00 +0100, MRAB wrote:

> On 08/07/2013 23:02, Joshua Landau wrote:
>> On 8 July 2013 22:38, MRAB <python at mrabarnett.plus.com> wrote:
>>> On 08/07/2013 21:56, Dave Angel wrote:
>>>> Characters do not have a width.
>>>
>>> [snip]
>>>
>>> It depends what you mean by "width"! :-)
>>>
>>> Try this (Python 3):
>>>
>>>>>> print("A\N{FULLWIDTH LATIN CAPITAL LETTER A}")
>>> AＡ
>>
>> Serious question: How would one find the width of a character by that
>> definition?
>>
>  >>> import unicodedata
>  >>> unicodedata.east_asian_width("A")
> 'Na'
>  >>> unicodedata.east_asian_width("\N{FULLWIDTH LATIN CAPITAL LETTER A}")
> 'F'
> 
> The possible widths are:
> 
>      N  = Neutral
>      A  = Ambiguous
>      H  = Halfwidth
>      W  = Wide
>      F  = Fullwidth
>      Na = Narrow
> 
> All you then need to do is find out what those actually mean...

Some East Asian encodings have code points for Latin characters in two
forms: "half-width" and "full-width". Each half-width form took up a
single fixed-width column; each full-width form took up two fixed-width
columns, so that it would line up neatly in columns with Asian characters.
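
As a rough sketch of how that translates into code, here is one way to
estimate how many fixed-width columns a string occupies. The function name
display_width is my own invention, not anything in the standard library,
and counting Ambiguous ('A') characters as one column is an assumption:
in an East Asian context a terminal may well render them as two. It also
ignores combining characters and control characters.

```python
import unicodedata

def display_width(text):
    """Estimate the number of fixed-width columns that text occupies.

    Assumption: Wide ('W') and Fullwidth ('F') characters take two
    columns; everything else, including Ambiguous ('A'), takes one.
    """
    width = 0
    for ch in text:
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width

print(display_width("A"))
print(display_width("\N{FULLWIDTH LATIN CAPITAL LETTER A}"))
print(display_width("A\N{FULLWIDTH LATIN CAPITAL LETTER A}"))
```

The three calls should report 1, 2 and 3 columns respectively.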

See also:

http://www.unicode.org/reports/tr11/

and search Wikipedia for "full-width" and "half-width".


-- 
Steven


