New GitHub issue #101421 from xmo-odoo:<br>

<hr>

<pre>

I'm not sure whether it's a bug or expected behaviour, but it seems odd so I figure reporting it is a good idea: a precomposed character is matched as a word character by the regex engine (specifically `\w`), but its decomposed form is not, because the combining diacritic on its own is not considered a word character.

```python
>>> import re
>>> import unicodedata
>>> s = "ö"
>>> list(s)
['ö']
>>> list(unicodedata.normalize('NFD', s))
['o', '̈']
>>> re.fullmatch(r'\w+', s)
<re.Match object; span=(0, 1), match='ö'>
>>> # Returns None, so the REPL prints nothing:
>>> re.fullmatch(r'\w+', unicodedata.normalize('NFD', s))
```
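
As far as I can tell, the mismatch comes down to the second code point of the decomposed form: U+0308 COMBINING DIAERESIS has general category `Mn` (nonspacing mark), and `\w` does not match it on its own. A minimal sketch that checks each code point separately (the variable names are mine, purely for illustration):

```python
import re
import unicodedata

# Decompose U+00F6 (ö) into 'o' followed by U+0308 COMBINING DIAERESIS.
decomposed = unicodedata.normalize("NFD", "\u00f6")

# General category of each code point in the decomposed string.
categories = [unicodedata.category(ch) for ch in decomposed]

# Whether each code point, taken alone, is matched by \w.
word_flags = [re.fullmatch(r"\w", ch) is not None for ch in decomposed]

print(categories)
print(word_flags)
```

So `\w+` stops after the base letter, and `fullmatch` fails on the trailing mark.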

This leads to odd effects when ingesting and filtering decomposed data.
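
One possible workaround on the ingesting side, sketched under the assumption that the data can be normalized before filtering: recompose to NFC first, then match. `is_word` is a hypothetical helper name, not something from the standard library.

```python
import re
import unicodedata


def is_word(text: str) -> bool:
    """Return True if `text`, recomposed to NFC, consists only of \\w characters.

    Hypothetical helper for illustration; it only helps for sequences that
    have a precomposed form.
    """
    composed = unicodedata.normalize("NFC", text)
    return re.fullmatch(r"\w+", composed) is not None


precomposed = "\u00f6"       # ö as a single code point
decomposed = "o\u0308"       # 'o' + COMBINING DIAERESIS

print(is_word(precomposed))
print(is_word(decomposed))
```

This is not a general fix: base-plus-mark sequences with no precomposed equivalent stay decomposed under NFC and would still fail to match.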

Tested on 3.8.13, 3.10.6, and 3.11.1 (all installed via pyenv) on Mint 21.1.

</pre>

<hr>

<a href="https://github.com/python/cpython/issues/101421">View on GitHub</a>

<p>Labels: type-bug</p>

<p>Assignee: </p>