Matching horizontal white space

Ben Finney bignose+hates-spam at benfinney.id.au
Sun Sep 14 18:55:20 EDT 2008


Magnus.Moraberg at gmail.com writes:

> multipleSpaces = re.compile(u'\\h+')
> 
> importantTextString = '\n  \n  \n \t\t  '
> importantTextString = multipleSpaces.sub("M", importantTextString)

Please get into the habit of following the Python coding style guide
<URL:http://www.python.org/dev/peps/pep-0008>.

For literal strings that you expect to contain backslashes, it's often
clearer to use the "raw" string syntax:

    multiple_spaces = re.compile(ur'\h+')
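
As a quick check that the raw spelling changes only readability, the
following sketch compares the two literals. It drops the 'u' prefix,
which is an assumption made so that the same lines also run on Python 3,
where the 'ur' prefix is no longer accepted:

```python
# Both literals spell the same three characters: backslash, 'h', '+'.
escaped = '\\h+'
raw = r'\h+'

assert escaped == raw
assert list(raw) == ['\\', 'h', '+']
```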

> I would have expected consecutive spaces and tabs to be replaced by
> M

Why, what leads you to expect that? Python's 're' module has no '\h'
escape, so your regular expression doesn't specify spaces or tabs. It
specifies "the character 'h', one or more times".
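
A minimal sketch for checking this on your own interpreter follows. The
helper name is hypothetical. On Python 2 the unknown escape is treated
as a plain 'h'; newer Python 3 releases instead reject unknown letter
escapes in a pattern, so the sketch allows for both outcomes:

```python
import re


def describe_backslash_h(text):
    """Report what the pattern '\\h+' does to text on this interpreter."""
    try:
        pattern = re.compile(r'\h+')
    except re.error:
        # Newer interpreters refuse the unknown escape outright.
        return 'rejected'
    # Older interpreters read the pattern as "one or more 'h'".
    return pattern.sub('M', text)


outcome = describe_backslash_h('hhh \t')
# Either way, the space and the tab are never what gets matched.
assert outcome in ('rejected', 'M \t')
```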

For "space or tab", specify a character class of space and tab:

    >>> multiple_spaces = re.compile(u'[\t ]+')
    >>> important_text_string = u'\n  \n  \n \t\t  '
    >>> multiple_spaces.sub("M", important_text_string)
    u'\nM\nM\nM'
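
The same character class can be wrapped in a small function if you need
it in more than one place. This is only a sketch: the names
'HORIZONTAL_SPACE' and 'replace_horizontal_space' are made up for
illustration, and the 'u' prefix is omitted on the assumption that the
code should also run on Python 3:

```python
import re

# Hypothetical name: one or more consecutive spaces or tabs.
HORIZONTAL_SPACE = re.compile(r'[ \t]+')


def replace_horizontal_space(text, replacement):
    """Replace each run of spaces and tabs; leave line breaks alone."""
    return HORIZONTAL_SPACE.sub(replacement, text)


result = replace_horizontal_space('\n  \n  \n \t\t  ', 'M')
assert result == '\nM\nM\nM'

# Letters, including 'h', are left untouched.
assert replace_horizontal_space('hh  hh', 'M') == 'hhMhh'
```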


You probably want to read the documentation for the Python 're' module
<URL:http://www.python.org/doc/lib/module-re>. This is standard
practice when using any unfamiliar module from the standard library.

-- 
 \           “If you do not trust the source do not use this program.” |
  `\                                —Microsoft Vista security dialogue |
_o__)                                                                  |
Ben Finney


