[Tutor] about regular expression

Magnus Lycka magnus@thinkware.se
Wed Mar 26 21:25:03 2003


At Wed, 26 Mar 2003 03:07:06 -0800 (PST), Abdirizak abdi wrote:
>Can anyone suggest a regular expression that can distinguish between a
>full stop (e.g. he got married last week.) and an abbreviated word (e.g. Al.
>Brown)? I have tried some but I couldn't get them right. Please give an
>example of each of these two.

There's no magic in regular expressions.

Would you be able to teach a person who could read
Latin letters, but didn't understand English at all,
how to do that? If you can do that in a clear and
unambiguous way, you can do the same with a computer.

But I think I'd be able to show you an example that
breaks your rule--for any rule you come up with...

Even if you knew all possible abbreviations, how would
you know whether an abbreviation ended a sentence or not
if it's followed by a word that is always capitalized?

Natural languages are full of ambiguities. This simply
cannot be done in a reliable way. There are probably
heuristic approaches that give a high probability
of correct guesses, but there is no way of knowing for
sure all the time. Not even if you know the language...
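
To make the heuristic idea concrete, here is a minimal
sketch of one such approach. The abbreviation list and the
function name split_sentences are made up for illustration,
and the rule (full stop, whitespace, capital letter, unless
the word is a known abbreviation) is only one of many
possible guesses. It is not a reliable solution.

```python
import re

# Hypothetical and deliberately incomplete list of abbreviations.
KNOWN_ABBREVIATIONS = {"al.", "dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

# A full stop followed by whitespace and a capital letter is only
# a *candidate* sentence boundary.
CANDIDATE = re.compile(r"\.\s+(?=[A-Z])")

def split_sentences(text):
    """Guess sentence boundaries in text.

    The guess is wrong whenever a known abbreviation really does
    end a sentence, or when an unknown abbreviation appears.
    """
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        # The word that ends with this full stop.
        last_word = text[start:match.start() + 1].split()[-1].lower()
        if last_word in KNOWN_ABBREVIATIONS:
            continue  # assume the full stop belongs to the abbreviation
        sentences.append(text[start:match.start() + 1])
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

for sentence in split_sentences(
        "He got married last week. Al. Brown was there. It rained."):
    print(sentence)
```

This splits the text into three sentences and keeps
"Al. Brown" together, but only because "al." happens to be
in the list.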

We use a lot of context to disambiguate written text, and
for spoken language we can't manage without the cues in
the voice and in gestures etc. (Just read transcripts of
what people have said in different situations. They are
usually full of ambiguity.) But since people are intelligent,
we can usually figure things out even if the message is
ambiguous.

Computers aren't clever, though. Just look at automatic
translators, such as Babelfish... Understanding language
is too tough for machines...

Look at the following cases:

a) I really liked Al. Washington is not as nice.

b) I really liked Al. Washington in that movie.

Case a) is two sentences, and b) is only one. Not that
people typically use a full stop when they abbreviate
Albert to Al, but that was your suggestion... I'm sure
we can make up other cases where it's unclear whether an
abbreviation ends a sentence unless we perform some
kind of non-trivial grammar analysis.
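
A small sketch of why a regular expression can't tell
cases a) and b) apart: a pattern that looks at the word
before the full stop and the capitalized word after it
sees exactly the same thing in both. The pattern below is
just an illustrative assumption, not a recommended rule.

```python
import re

case_a = "I really liked Al. Washington is not as nice."
case_b = "I really liked Al. Washington in that movie."

# Candidate boundary: a word ending in a full stop, whitespace,
# then a capitalized word.
pattern = re.compile(r"(\w+\.)\s+([A-Z]\w*)")

match_a = pattern.search(case_a)
match_b = pattern.search(case_b)

# Both matches capture the same two words at the same position,
# so any decision based on them must be the same for a) and b).
print(match_a.groups())
print(match_b.groups())
print(match_a.span() == match_b.span())
```

Whatever the rule decides for one case, it decides for the
other too, so it gets at least one of them wrong.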

We can't even start sentences with capital letters all
the time:

"mm is the correct abbreviation for millimetre; Mm, on
the other hand, means megametre, or one million metres."

"2001 will go down in history as a year that changed public
attitudes on threats against western countries."
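
As a quick illustration, a naive "sentences start with a
capital letter" check (the function name here is made up)
rejects the openings of both of these perfectly valid
sentences:

```python
def starts_with_capital(sentence):
    """Naive check: is the first character an uppercase letter?"""
    return sentence[:1].isupper()

# Openings of the two example sentences above, plus an ordinary one.
openings = [
    "mm is the correct abbreviation",
    "2001 will",
    "There's no magic in regular expressions.",
]

for opening in openings:
    print(starts_with_capital(opening), opening)
```

Only the last one passes the check, since "m" is lowercase
and a digit is not an uppercase letter.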

Then there are cases where the same word / abbreviation
can mean different things. I'm pretty sure words like
"as" and "is" are used as abbreviations in some contexts.


-- 
Magnus Lycka, Thinkware AB
Alvans vag 99, SE-907 50 UMEA, SWEDEN
phone: int+46 70 582 80 65, fax: int+46 70 612 80 65
http://www.thinkware.se/  mailto:magnus@thinkware.se