How to get a "screen" length of a multibyte string?

Steven D'Aprano steve+comp.lang.python at
Sun Nov 25 14:30:24 CET 2012

On Sun, 25 Nov 2012 22:12:33 +1100, Chris Angelico wrote:

> On Sun, Nov 25, 2012 at 9:19 PM, kobayashi <pg.koba at> wrote:
>> Hello,
>> Under platform that has fixed pitch font, I want to get a "screen"
>> length of a multibyte string
>> --- sample ---
>> s1 = u"abcdef"
>> s2 = u"あいう" # It has same "screen" length as s1's.
>> print len(s1)  # Got 6
>> print len(s2)  # Got 3, but I want get 6.
>> --------------
>> Above can get a "character" length of a multibyte string. Is there a
>> way to get a "screen" length of a multibyte string?
> What do you mean by screen length? Do you mean the length in bytes? That
> depends on your encoding. Do you mean width of the displayed version?
> That depends on your font.

That's what I thought, but on doing some experimentation in my terminal, 
and doing some googling, I have come to the understanding that so-called 
monospaced (fixed-width) fonts may support *double column* characters as 
well as single column.

So the OP's example has:

s1 = u"abcdef"
s2 = u"あいう"

s1 has six single-column ("narrow") characters, while s2 has three double-
column ("wide") characters, and both strings should take up the same 
horizontal space on screen.
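
One way to see the difference is to ask the unicodedata module for each character's East Asian Width category. This is only a small sketch: the helper name widths is made up for illustration, and the OP's あいう is written with escapes so the snippet does not depend on the source encoding.

```python
from unicodedata import east_asian_width

s1 = u"abcdef"
s2 = u"\u3042\u3044\u3046"  # the OP's three hiragana characters

def widths(s):
    # Hypothetical helper: East Asian Width category of each character.
    return [east_asian_width(c) for c in s]

print(widths(s1))  # six 'Na' (narrow) entries
print(widths(s2))  # three 'W' (wide) entries
```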

If you are reading this in a non-monospaced font, the width of each 
character is not fixed, the idea of columns doesn't really work, and the 
strings may not be the same width.

See for more detail.

Interestingly, Unicode supports wide versions of many non-EastAsian 
characters (presumably because pre-Unicode EastAsian encodings supported 
them). For example, run this code in Python:


which should output:


If your font supports this, you should see a single "A" as wide as the 
double "AA" beneath it. 
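
A minimal sketch of such a comparison, assuming the wide character meant is U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A). It is a reconstruction for illustration, not necessarily the snippet originally posted:

```python
import unicodedata

wide_a = u"\uff21"  # assumed: FULLWIDTH LATIN CAPITAL LETTER A

print(unicodedata.name(wide_a))
print(wide_a)   # one fullwidth A ...
print(u"AA")    # ... above two ordinary A's
print(unicodedata.east_asian_width(wide_a))  # 'F' (fullwidth)
```

Whether the two printed lines actually line up depends on the terminal and font.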

Curiously, in the monospaced font I am using to type this, the 
"fullwidth" (wide, two-column) A is actually 2/3rds the width of the 
standard ("halfwidth", narrow, one-column) A. Font designers -- can't 
live with them, can't take them out and shoot them.

Hans Mulder's suggestion:

from unicodedata import east_asian_width

def screen_length(s):
    return sum(2 if east_asian_width(c) == 'W' else 1 for c in s)

is almost right, but it counts only 'W' characters as wide. The Unicode 
document above states:

[quote]
In a broad sense, wide characters include W, F, and A (when in East Asian 
context), and narrow characters include N, Na, H, and A (when not in East 
Asian context).
[end quote]

So something like this:

from unicodedata import east_asian_width

def columns(s, eastasian_context=True):
    # Ambiguous ('A') characters count as wide only in an East Asian context.
    if eastasian_context:
        wide = 'WFA'
    else:
        wide = 'WF'
    return sum(2 if east_asian_width(c) in wide else 1 for c in s)

ought to do it for all but the most sophisticated text layout 
applications. For those needing much more sophistication, see here:
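
As a rough check of the simple columns() approach (not the more sophisticated treatment just mentioned), the sketch below compares it with the 'W'-only version. The test characters are my own choice: U+FF21 is category 'F' and U+03B1 (Greek small alpha) is category 'A'. Both functions are repeated so that the snippet runs on its own.

```python
from unicodedata import east_asian_width

def screen_length(s):
    # Hans Mulder's version: only category 'W' counts as two columns.
    return sum(2 if east_asian_width(c) == 'W' else 1 for c in s)

def columns(s, eastasian_context=True):
    # Ambiguous ('A') characters are wide only in an East Asian context.
    wide = ('W', 'F', 'A') if eastasian_context else ('W', 'F')
    return sum(2 if east_asian_width(c) in wide else 1 for c in s)

fullwidth_a = u"\uff21"  # category 'F'
alpha = u"\u03b1"        # category 'A' (ambiguous)

print(screen_length(fullwidth_a))                # 1: 'F' is missed
print(columns(fullwidth_a))                      # 2
print(columns(alpha))                            # 2 in East Asian context
print(columns(alpha, eastasian_context=False))   # 1 otherwise
print(columns(u"abcdef"))                        # 6
print(columns(u"\u3042\u3044\u3046"))            # 6
```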

