[Tutor] finding special character string

Dinesh B Vadhia dineshbvadhia at hotmail.com
Tue Jun 3 11:13:02 CEST 2008


Yes, I'm happy because I found a non-regex way to solve the problem (see below).

No, I'm not a student or worn out but wish I was back at college and partying!

Yes, this is an interesting problem and here is the requirement:

- A text document contains special words that start and end with a period ("."); the text between the two periods contains no punctuation or spaces, except for a hyphen in some special words.
- Examples of special words include ".thrfore.", ".because.", ".music-sharp.", ".music-flat.", ".dbd.", ".vertline.", ".uparw.", ".hoarfrost.", etc.
- In most cases, the special words have a space (" ") before and after.
- In some cases, a special word is followed directly by one or two other special words, e.g. ".dbd..vertline." or ".music-flat..dbd..vertline."
- In some cases, a special word is followed directly by an ordinary word (with or without punctuation), e.g. ".music-flat.mozart" or ".vertline.isn't"
- A special word followed by an ordinary word (with or without punctuation) may come at the end of a sentence and so carry a trailing full stop ("."), e.g. ".music-flat.mozart." or ".vertline.isn't."
- The number of characters in a special word, excluding the two periods, is greater than 1
- Find and remove all special words from the text document (by processing one line at a time)
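
For anyone who wants to try the requirements above without a word list, here is a rough regex sketch. It is not the approach I ended up using. It assumes special words contain only ASCII letters and single hyphens, and that a special word starts at the beginning of a line, after whitespace, or right after another period; neither assumption is stated in the requirement. The names SPECIAL_RE and strip_special_words are made up for illustration.

```python
import re

# Assumptions (not part of the stated requirement): letters and hyphens only,
# and a special word begins at line start, after whitespace, or after a ".".
SPECIAL_RE = re.compile(
    r"(?<![^\s.])"               # previous char is absent, whitespace, or "."
    r"\."                        # opening period
    r"(?=[A-Za-z-]{2,}\.)"       # more than one character between the periods
    r"[A-Za-z]+(?:-[A-Za-z]+)*"  # letters, optionally joined by hyphens
    r"\."                        # closing period
)

def strip_special_words(line):
    """Remove special words from one line and collapse leftover whitespace."""
    cleaned = SPECIAL_RE.sub("", line)
    return " ".join(cleaned.split())

if __name__ == "__main__":
    print(strip_special_words("so .thrfore. play .music-flat.mozart."))
    # so play mozart.
    print(strip_special_words(".music-flat..dbd..vertline."))
    # (prints an empty line)
```

A pattern like this can still misfire on ordinary text, for example an ellipsis running straight into a word, so it is only a starting point.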

How did I solve it?  I found a list of all the special words, created a set from it, and then checked whether each word in the text belonged to that set.  Without such a list, the problem would be an interesting one to solve in its own right.
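
A minimal sketch of the set-based idea, not the exact code. SPECIAL_WORDS here holds only the examples quoted above as a stand-in for the full list, and the function names are hypothetical. Because special words can be glued to the front of another word, the sketch strips known special words from the start of each whitespace-separated token rather than testing whole tokens only.

```python
# Stand-in for the real list of special words (only the examples from the post).
SPECIAL_WORDS = {
    ".thrfore.", ".because.", ".music-sharp.", ".music-flat.",
    ".dbd.", ".vertline.", ".uparw.", ".hoarfrost.",
}

def strip_leading_specials(token, specials=SPECIAL_WORDS):
    """Drop every known special word glued to the front of a token."""
    changed = True
    while changed and token:
        changed = False
        for word in specials:
            if token.startswith(word):
                token = token[len(word):]
                changed = True
                break
    return token

def remove_known_specials(line, specials=SPECIAL_WORDS):
    """Process one line: strip special words, drop tokens that become empty."""
    cleaned = (strip_leading_specials(tok, specials) for tok in line.split())
    return " ".join(tok for tok in cleaned if tok)

if __name__ == "__main__":
    print(remove_known_specials("play .music-flat.mozart. now"))
    # play mozart. now
```

Note that splitting and re-joining normalises runs of whitespace to single spaces, which may or may not matter for the document at hand.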

Cheers!

Dinesh


--------------------------------------------------------------------------------

Date: Sun, 1 Jun 2008 21:56:26 -0400
From: "Kent Johnson" <kent37 at tds.net>
Subject: Re: [Tutor] finding special character string
To: "Marilyn Davis" <marilyn at deliberate.com>
Cc: tutor at python.org
Message-ID: <1c2a2c590806011856x1875665ep690353c7c2ebc3da at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Sun, Jun 1, 2008 at 9:41 PM, Marilyn Davis <marilyn at deliberate.com> wrote:

> Yeh, we need a better spec. I was wondering if the stuff between the text
> ought not include white space, or even a word boundary.  A character class
> might be better, if we knew.

Hmm, yes, my regex will find many ordinary sentences in plain text.

> Anyhow, I think we wore out the student. :^)

He went away happy after my first reply.

Kent

