[New-bugs-announce] [issue8859] split() splits on non-whitespace char when there is no separator given.

Peter Landgren report at bugs.python.org
Sun May 30 20:54:14 CEST 2010


New submission from Peter Landgren <peter.talken at telia.com>:

When the variable label is equal to '\xc5\xa0 Z\nX W',
the following sequence of lines
label = " ".join(label.split())
label = unicode(label)
results in:
7347: ERROR: gramps.py: line 138: Unhandled exception
Traceback (most recent call last):
  File "C:\Program Files (x86)\gramps\gui\views\listview.py", line 660, in row_changed
    self.uistate.modify_statusbar(self.dbstate)
  File "C:\Program Files (x86)\gramps\DisplayState.py", line 521, in modify_statusbar
    name, obj = navigation_label(dbstate.db, nav_type, active_handle)
  File "C:\Program Files (x86)\gramps\Utils.py", line 1358, in navigation_label
    label = unicode(label)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data

By contrast, this sequence of lines:
label = unicode(label)
label = " ".join(label.split())
gives the correct result and raises no error.
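
A minimal sketch of the working order (decode first, then collapse whitespace), written so that it also runs on newer Python versions. The helper name normalize_label and the assumption that the label is UTF-8 encoded are illustrative only and do not come from the gramps source:

```python
# -*- coding: utf-8 -*-

def normalize_label(raw_bytes, encoding='utf-8'):
    """Decode the byte string first, then collapse whitespace on the text.

    Hypothetical helper; the encoding default is an assumption.
    """
    text = raw_bytes.decode(encoding)
    # Splitting the decoded text never cuts through a multi-byte sequence.
    return u' '.join(text.split())


label = normalize_label(b'\xc5\xa0 Z\nX W')
print(repr(label))
```

Because the split operates on decoded text, the two bytes '\xc5\xa0' are already a single character by the time whitespace is examined.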

When the error occurs, the variable label changes from
'\xc5\xa0 Z\nX W'
to
'\xc5 Z X W'
by the line:
label = " ".join(label.split())
Note that '\xa0' has been dropped; is it being interpreted as whitespace?
This happens on Windows. The same code works correctly on Linux.
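
The transformation above can be imitated with a byte-level split that counts 0xA0 as whitespace. This is only a sketch of one possible explanation (a locale-dependent whitespace test on Windows), which this report does not confirm. The regular expression and the name split_like_report are stand-ins invented for illustration:

```python
import re

# Assumption: the platform treats byte 0xA0 as whitespace in addition to
# the ASCII whitespace bytes. This pattern only imitates that behaviour.
_ASSUMED_SPACE = re.compile(b'[ \t\n\r\x0b\x0c\xa0]+')


def split_like_report(raw_bytes):
    """Split a byte string on the assumed whitespace set, dropping empties."""
    return [part for part in _ASSUMED_SPACE.split(raw_bytes) if part]


raw = b'\xc5\xa0 Z\nX W'
collapsed = b' '.join(split_like_report(raw))
print(repr(collapsed))

try:
    collapsed.decode('utf-8')
except UnicodeDecodeError as exc:
    # The lone lead byte 0xC5 is no longer followed by a continuation byte.
    print('decode failed: %s' % exc)
```

Under that assumption the second byte of the two-byte UTF-8 sequence is consumed as a separator, which leaves an orphaned '\xc5' and would account for the UnicodeDecodeError in the traceback.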

----------
components: Library (Lib)
messages: 106773
nosy: PeterL
priority: normal
severity: normal
status: open
title: split() splits on non-whitespace char when there is no separator given.
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8859>
_______________________________________

