Converting a string to the most probable type

Lie Lie.1296 at gmail.com
Sat Mar 8 12:27:34 EST 2008


On Mar 8, 8:34 am, Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
> On Fri, 07 Mar 2008 16:13:04 -0800, Paul Rubin wrote:
> > Pierre Quentel <quentel.pie... at wanadoo.fr> writes:
> >> I would like to know if there is a module that converts a string to a
> >> value of the "most probable type"
>
> >     Python 2.4.4 (#1, Oct 23 2006, 13:58:00)
> >     >>> import this
> >     The Zen of Python, by Tim Peters
> >     ...
> >     In the face of ambiguity, refuse the temptation to guess.
>
> Good advice, but one which really only applies to libraries. At the
> application level, sometimes (but not always, or even most times!)
> guessing is the right thing to do.

Guessing should be done only when it has to be done. Users should
input data in an unambiguous way (such as 4-digit years and textual
month names; this is the preferred solution, since flexibility is
retained but ambiguity is ruled out), or be forced to use a certain
convention, or at least be aware of how to input the date properly.
Guessing should be kept to a minimum. Personally, when I'm working
with spreadsheet applications (in MS Office or OpenOffice) I always
input dates unambiguously, using a 4-digit year and a textual month
name (usually the 3-letter abbreviation, for quicker input); then I
can confidently rely on the spreadsheet to convert it to its internal
format correctly.
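
To illustrate what "unambiguous input" could look like in code, here is
a minimal sketch that accepts only a day, a 3-letter month abbreviation
and a 4-digit year, and refuses everything else instead of guessing.
The function name parse_unambiguous_date and the exact "D Mon YYYY"
layout are my own assumptions for the example, not something any
spreadsheet actually uses.

```python
from datetime import date, datetime

def parse_unambiguous_date(text):
    """Parse 'D Mon YYYY' (e.g. '8 Mar 2008'); raise ValueError otherwise."""
    # %b matches the abbreviated month name of the current locale
    # (English names are assumed here); %Y requires a 4-digit year,
    # so a 2-digit year is rejected rather than guessed at.
    return datetime.strptime(text.strip(), "%d %b %Y").date()

print(parse_unambiguous_date("8 Mar 2008"))   # 2008-03-08
```

Something like "03/08/08" raises ValueError here, which is the point:
the caller has to ask the user again rather than pick day/month order
silently.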

General parsers like the one the OP wanted are easy to write if dates
aren't involved.
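
For example, a rough sketch of such a parser for the non-date case
might simply try the built-in converters from most to least specific.
The name guess_type is made up for this example, and the int, float,
str order is just one reasonable choice.

```python
def guess_type(text):
    """Return text converted to int or float if possible, else unchanged."""
    s = text.strip()
    # Try the most specific type first: every int literal is also a
    # valid float literal, so int has to come before float.
    for convert in (int, float):
        try:
            return convert(s)
        except ValueError:
            pass
    # Nothing matched, so keep the original string.
    return text

print(repr(guess_type("42")))     # 42
print(repr(guess_type("3.14")))   # 3.14
print(repr(guess_type("spam")))   # 'spam'
```

Even this has surprises: float() also accepts strings such as "nan"
and "inf", so a stricter application may want to check for those
explicitly.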

> E.g. spreadsheet applications don't insist on different syntax for
> strings, dates and numbers. You can use syntax to force one or the other,
> but by default the application will auto-detect what you want according
> to relatively simple, predictable and intuitive rules:
>
> * if the string looks like a date, it's a date;
> * if it looks like a number, it's a number;
> * otherwise it's a string.
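
The three rules quoted above can be sketched as an ordered chain of
converters where the first one that succeeds wins. This is only an
illustration, not how any real spreadsheet does it: the names as_date,
RULES and autodetect are hypothetical, and the single ISO-style date
format is an assumption chosen to keep the date rule unambiguous.

```python
from datetime import date, datetime

def as_date(s):
    # Assumed format: YYYY-MM-DD only. Anything else raises ValueError.
    return datetime.strptime(s, "%Y-%m-%d").date()

# Tried in order: date first, then the numeric types.
RULES = (as_date, int, float)

def autodetect(text):
    """Apply each rule in turn; fall back to the original string."""
    s = text.strip()
    for rule in RULES:
        try:
            return rule(s)
        except ValueError:
            continue
    return text

print(repr(autodetect("2008-03-08")))  # datetime.date(2008, 3, 8)
print(repr(autodetect("12")))          # 12
print(repr(autodetect("hello")))       # 'hello'
```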

The worst thing that can happen is when we input a date in a format we
know but the application can't parse, so it treats it as a string
instead. This kind of thing can easily slip past us. I remember I
once formatted a column in Excel to display dates in a certain style,
but when I tried to input a date in that same style, Excel couldn't
recognize it, leaving the whole column as useless strings and
requiring me to enter the dates again.

> Given the user-base of the application, the consequences of a wrong
> guess, and the ease of fixing it, guessing is the right thing to do.
>
> Auto-completion is another good example of guessing in the face of
> ambiguity. It's not guessing that is bad, but what you do with the guess.
>
> --
> Steven


