From JoyceUlysses.txt -- words occurring exactly once
Thomas Passin
list1 at tompassin.net
Sat Jun 1 09:38:51 EDT 2024
On 6/1/2024 4:04 AM, Peter J. Holzer via Python-list wrote:
> On 2024-05-30 19:26:37 -0700, HenHanna via Python-list wrote:
>> hard to decide what to do with hyphens
>> and apostrophes
>> (I'd, he's, can't, haven't, A's and B's)
>
> Especially since the same character is used as both an apostrophe and a
> closing quotation mark. And while that's pretty unambiguous between two
> characters, it isn't at the end of a word:
>
> This is Alex’ house.
> This type of building is called an ‘Alex’ house.
> The sentence ‘We are meeting at Alex’ house’ contains an apostrophe.
>
> (using proper Unicode quotation marks. It gets worse if you stick to
> ASCII.)
>
> Personally I like to use U+0027 APOSTROPHE as an apostrophe and U+2018
> LEFT SINGLE QUOTATION MARK and U+2019 RIGHT SINGLE QUOTATION MARK as
> single quotation marks[1], but despite the suggestive names, this is not
> the common typographical convention, so your texts are unlikely to make
> this distinction.
>
> hp
>
> [1] Which I use rarely, anyway.
My usual approach is to replace punctuation with spaces and then to
discard anything remaining that is only one character long (or sometimes
two, depending on what I'm working on). Yes, OK, I will miss words like
"I". Usually I don't care about them. Make exceptions to the policy if
you like.
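A minimal sketch of that approach, applied to the original question of finding words that occur exactly once. The function names, the extra Unicode quote and dash characters added to the punctuation set, and the `keep` exception set are all illustrative assumptions, not code from this thread:

```python
import string
from collections import Counter

# Characters to blank out: ASCII punctuation plus curly quotes and
# en/em dashes (an assumption; extend this for your own texts).
PUNCTUATION = string.punctuation + "\u2018\u2019\u201c\u201d\u2013\u2014"
# Translation table that maps every punctuation character to a space.
_TO_SPACE = str.maketrans({ch: " " for ch in PUNCTUATION})


def tokenize(text, min_len=2, keep=frozenset()):
    """Replace punctuation with spaces, lowercase, split on whitespace,
    and drop tokens shorter than min_len unless they appear in keep
    (hypothetical helper)."""
    words = text.translate(_TO_SPACE).lower().split()
    return [w for w in words if len(w) >= min_len or w in keep]


def words_occurring_once(text, **kwargs):
    """Return the sorted tokens that appear exactly once (hypothetical helper)."""
    counts = Counter(tokenize(text, **kwargs))
    return sorted(w for w, n in counts.items() if n == 1)


if __name__ == "__main__":
    sample = "Alex' house, Alex's house, and I can't go."
    print(words_occurring_once(sample))               # ['and', 'can', 'go']
    print(words_occurring_once(sample, keep={"i"}))   # ['and', 'can', 'go', 'i']
```

Note that "can't" becomes "can" plus a stray "t", which the length filter then discards. This is exactly the apostrophe ambiguity discussed above: the approach trades some accuracy for simplicity. Raising `min_len` to 3 corresponds to also dropping two-character leftovers.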