str.title question after '

John Machin sjmachin at lexicon.net
Mon Nov 13 12:21:32 CET 2006


Antoon Pardon wrote:
> I have a text in ASCII. I use the ' for an apostrophe. The problem is
> that this trips up the title method.  I don't want letters
> after a ' to be uppercased. Here are some examples:
>
>    argument       result          expected
>
>   't smidje       'T Smidje       't Smidje
>   na'ama          Na'Ama          Na'ama
>   al pi tnu'at    Al Pi Tnu'At    Al Pi Tnu'at
>
>
> Is there an easy way to get what I want?

Depends on your definition of "easy". Writing your own function that
will regard the apostrophe as a letter would be "easy" in my book.
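
As a minimal sketch of that idea, assuming plain ASCII text as in the
question: treat each run of letters and apostrophes as one word, then
uppercase its first character and lowercase the rest. The name
title_apos and the regex are illustrative only, not anything in the
standard library.

```python
import re

# Assumption: a "word" is a run of ASCII letters and apostrophes.
_WORD = re.compile(r"[A-Za-z']+")

def title_apos(text):
    """Titlecase text, treating the apostrophe as if it were a letter."""
    def fix(match):
        word = match.group(0)
        # upper() leaves a leading apostrophe unchanged, so "'t" stays "'t".
        return word[0].upper() + word[1:].lower()
    return _WORD.sub(fix, text)

for sample in ["'t smidje", "na'ama", "al pi tnu'at"]:
    print(title_apos(sample))
```

On the three examples in the question this gives 't Smidje, Na'ama and
Al Pi Tnu'at. It also turns o'brien into O'brien, so it trades one
problem for another.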

>
> Should the current behaviour be considered a bug?

Its limitations could use some documentation.

> I would be inclined to answer yes, but that may be
> because this behaviour would be wrong in Dutch. I'm
> not so sure about English.
>

It's not very appropriate for English, either:

| >>> "didn't".title()
| "Didn'T"

It's OK for the English way of writing Irish surnames e.g. O'Brien, but
not IMHO very good behaviour for anything else.

The docs say: "Return a titlecased version of the string: words start
with uppercase characters, all remaining cased characters are
lowercase." Evidently the definition of "word" is the culprit.
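
A quick way to see what str.title counts as a "word" is to feed it
strings with different non-letter characters in the middle. The sample
strings here are my own; the point is that any character that is not a
letter ends the current word, so the next letter is uppercased.

```python
# Each sample has a non-letter in the middle: apostrophe, digit, hyphen.
samples = ["didn't", "o'brien", "abc1def", "well-known"]

# Collect the built-in result for each sample.
results = {s: s.title() for s in samples}

for original, titled in results.items():
    print(repr(original), "->", repr(titled))
```

The apostrophe is treated no differently from the digit or the hyphen,
which is why O'Brien comes out right and Didn'T does not.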

Doing titlecasing properly depends heavily on the language/locale and
what data you are working on. For example, in the UK and anywhere that
Scots have migrated in reasonable numbers, you would probably want to
produce McDonald and MacDonald. Avoiding nonsenses like MacE and MacHin :-)
takes some effort and a look-up table, and may not be cost-effective.
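
A rough sketch of the prefix-plus-look-up-table approach, for single
surnames only. The function name title_surname and the contents of the
exception table are made up for illustration; a real table would have
to be built from the data you actually handle.

```python
# Hypothetical exception table: names that merely happen to start
# with "mac" or "mc" and should be capitalised normally.
NOT_PREFIXED = {"mace", "machin", "mack", "macy"}

def title_surname(name):
    """Capitalise one surname, giving Mc/Mac names an inner capital."""
    lower = name.lower()
    if lower in NOT_PREFIXED:
        return lower.capitalize()
    for prefix in ("mac", "mc"):
        # Require at least one letter after the prefix.
        if lower.startswith(prefix) and len(lower) > len(prefix):
            rest = lower[len(prefix):]
            return prefix.capitalize() + rest.capitalize()
    return lower.capitalize()

for surname in ["mcdonald", "MACDONALD", "machin", "mace", "smith"]:
    print(title_surname(surname))
```

Without the table the last-but-two entries would come out as MacHin and
MacE, which is exactly the nonsense described above.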

A related problem: some people mistakenly try too hard to correct
perceived data entry errors and also produce nonsenses -- a colleague
of Dutch extraction occasionally received mail addressed to Mr O'Belt
:-)

Cheers,
John



