str.title() fails with words containing apostrophes
jussi.piitulainen at helsinki.fi
Tue Mar 7 02:12:38 EST 2017
Steve D'Aprano writes:
> On Tue, 7 Mar 2017 03:28 am, Grant Edwards wrote:
>> Besides locale-aware, it'll need to be style-guide-aware so that it
>> knows whether you want MLA, Chicago, Strunk & White, NYT, Gregg,
>> Mrs. Johnson from 9th grade English class, or any of a dozen or two
>> others. And that's just for US English. [For all I know, most of
>> the ones I listed agree completely on "title case", but I doubt it.]
> As far as I am aware, there are only two conventions for title case in
> Initial Capitals For All The Words In A Sentence.
> Initial Capitals For All the Significant Words in a Sentence.
> For some unstated, subjective rule for "significant" which usually
> means "three or more letters, excluding the definite article ('the')".
That's where the variation is hidden. I browsed three sites to see what
they do. One doesn't title-capitalize anything. One capitalizes
One was more interesting. I think it has human editors who pay attention
to these matters. They do not capitalize these short words: 'a', 'an',
'at', 'the', 'in', 'of', 'on', 'for', 'to', 'and', 'vs.'; they
capitalize longer prepositions: 'From', 'Into', 'With', 'Through'. They
also capitalize auxiliary verbs and copulas, even when short.
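A rough sketch of that site's apparent rule, to make it concrete. The names SMALL_WORDS and title_case are made up for illustration, the word list is just the one I observed above, and treating the first word as always capitalized is my assumption, not something the site documents.

```python
# Hypothetical sketch: the short words the site left in lower case.
SMALL_WORDS = {"a", "an", "at", "the", "in", "of", "on", "for", "to", "and", "vs."}

def title_case(text):
    """Capitalize each whitespace-separated word unless it is a listed short word.

    Assumption: the first word is always capitalized.
    """
    result = []
    for index, word in enumerate(text.split()):
        if index > 0 and word.lower() in SMALL_WORDS:
            result.append(word.lower())
        else:
            # Touch only the first character, so "don't" becomes "Don't"
            # rather than the "Don'T" that str.title() produces.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)

print(title_case("they're coming from the north"))
```

Everything after the first character is left alone, so short verbs like 'is' still get their capital and an apostrophe no longer starts a new word. Runs of whitespace are collapsed by split(), which may or may not be what you want.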
A 'Nor' was capitalized in the middle of a title, but there was a
sentence boundary just before the 'Nor'. I'd classify 'nor' with 'and'
otherwise, but they might base the non-capitalization on frequency for
all I know.
Some two-letter words that they do capitalize: 'Is', 'Am', 'Do', 'So',
'No', 'He', 'We', 'It', 'My', 'Up'; also 'Au Revoir', 'Oi Oi Oi',
'Ay Ay Ay'.
Then there is 'Grown-Ups' and 'Contrary-to-Fact' but 'X-ing'. Sometimes
a hyphen makes a word boundary, sometimes not.
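One way to get those three results is to treat the hyphen as a word boundary unless the next part is a suffix. This is only a guess at a rule that fits the examples; SUFFIXES and cap_hyphenated are hypothetical names, and the suffix set contains just the one case I saw.

```python
# Hypothetical sketch: short words stay in lower case inside a compound.
SMALL_WORDS = {"a", "an", "at", "the", "in", "of", "on", "for", "to", "and"}
# Assumption: a part like "ing" is a suffix, so the hyphen before it
# does not start a new word.
SUFFIXES = {"ing"}

def cap_hyphenated(word):
    """Capitalize the parts of a hyphenated word, skipping short words and suffixes."""
    parts = word.split("-")
    out = []
    for index, part in enumerate(parts):
        low = part.lower()
        if index > 0 and (low in SMALL_WORDS or low in SUFFIXES):
            out.append(low)
        else:
            out.append(part[:1].upper() + part[1:])
    return "-".join(out)

for sample in ("grown-ups", "contrary-to-fact", "x-ing"):
    print(cap_hyphenated(sample))
```

A real site presumably has more cases than a single suffix set can cover, so take this as a starting point only.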
> But of course there are exceptions: words which are necessarily in
> all-caps should stay in all-caps (e.g. NASA) and names.
There may be lots of these if you are handling something like a tech
news site that talks about people and companies and institutions from
all over the world. Names are tricky.
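About the only mechanical approach I can think of is a lookup table of known spellings that is consulted before any capitalization rule. A minimal sketch, where KNOWN_FORMS and cap_word are hypothetical names and the table entries other than NASA are illustrative examples of mine:

```python
# Hypothetical table of spellings that must be preserved exactly.
# Only "NASA" comes from the thread; the other entries are illustrative.
KNOWN_FORMS = {"nasa": "NASA", "ibm": "IBM", "ebay": "eBay"}

def cap_word(word):
    """Return the known spelling if there is one, else capitalize the first letter."""
    known = KNOWN_FORMS.get(word.lower())
    if known is not None:
        return known
    return word[:1].upper() + word[1:]

print(" ".join(cap_word(w) for w in "nasa and ebay sign deal".split()))
```

The table has to be maintained by hand, which is rather the point: somebody has to know the names.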