[Tutor] value range checker

Fri Aug 28 12:03:21 CEST 2015

----------------------------------------
> To: tutor at python.org
> From: alan.gauld at btinternet.com
> Date: Wed, 26 Aug 2015 17:29:08 +0100
> Subject: Re: [Tutor] value range checker
>
> On 26/08/15 14:19, Albert-Jan Roskam wrote:
>
>> I have written a function that checks the validity of values.
>> The ranges of valid values are stored in a database table.
>
> That's an unusual choice because:
>
> 1) using a database normally only makes sense in the case
> where you are already using the database to store the
> other data. But in that case you would normally get
> validation done using a database constraint.

The other data are indeed also stored in the database. CHECK constraints do seem an attractive mechanism to get validation done. It should be possible to define a CHECK constraint with CASE expressions in the CREATE TABLE definition.

This page even describes how to do a modulus-11 check, which I also need: 

https://www.simple-talk.com/sql/learn-sql-server/check-your-digits/
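
As a rough illustration of what a modulus-11 check looks like on the Python side, here is a minimal sketch using the ISBN-10 scheme. The weights (10 down to 1) and the function name is_valid_isbn10 are assumptions chosen for the example; other identifiers use different weights, and this is not necessarily the scheme described on the linked page.

```python
def is_valid_isbn10(text):
    """Return True if text passes the ISBN-10 modulus-11 check."""
    chars = text.replace("-", "")
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            digit = 10                      # 'X' means 10, check digit only
        elif char in "0123456789":
            digit = int(char)
        else:
            return False
        total += digit * (10 - position)    # weights 10, 9, ..., 1
    # The weighted sum must be divisible by 11.
    return total % 11 == 0
```

The same weighted-sum idea can be expressed inside a CHECK constraint, but a Python version is handy for testing values before they reach the database.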

Will the Python exceptions be clear enough if a record is rejected because one or more constraints are not met? I mean, if I only get a ValueError or a TypeError because *one* of the 100 or so columns is invalid, without any hint as to which one, this would be annoying.
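
One way to find out is to try it. The sketch below assumes the standard-library sqlite3 module (the thread does not say which database is in use) and uses hypothetical table, column and constraint names. With sqlite3 a failed CHECK raises sqlite3.IntegrityError rather than ValueError or TypeError; the exact wording of the message depends on the SQLite version, and giving the constraint a name should help in tracing which rule was violated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurements (
        category TEXT NOT NULL,
        val      INTEGER NOT NULL,
        -- hypothetical limits; unknown categories are rejected
        CONSTRAINT val_in_range CHECK (
            CASE category
                WHEN 'age'    THEN val BETWEEN 0 AND 120
                WHEN 'weight' THEN val BETWEEN 1 AND 500
                ELSE 0
            END
        )
    )
""")


def try_insert(category, val):
    """Insert one row; return None on success or the error text on failure."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO measurements VALUES (?, ?)", (category, val)
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


print(try_insert("age", 30))    # accepted
print(try_insert("age", 300))   # rejected by the CHECK constraint
```

Other database drivers raise their own exception classes, so the behaviour there would need to be checked separately.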

> 2) For small amounts of data the database introduces
> a significant overhead. Databases are good for handling
> large amounts of data.
>
> 3) A database is rather inflexible since you need to
> initialise it, create it, etc. Which limits the number
> of environments where it can be used.
>
>> Such a table contains three columns: category, min and max. ...
>> a category may be spread out over multiple records.
>
> And searching multiple rows is even less efficient.
>
>> Would yaml be a better choice? Some of the tables are close to 200 records.
>
> Mostly I wouldn't use a data format per-se (except for
> persistence between sessions). I'd load the limits into
> a Python set and let the validation be a simple member-of check.
>
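
A minimal sketch of the set-based idea, assuming integer codes and hypothetical (min, max) pairs for one category; build_allowed is a name made up for this example. It only makes sense while the ranges stay small, which matches the caveat that follows.

```python
def build_allowed(ranges):
    """Expand inclusive integer (low, high) pairs into one set of values."""
    allowed = set()
    for low, high in ranges:
        allowed.update(range(low, high + 1))
    return allowed


# Hypothetical limits for one category, spread over two records.
allowed_ages = build_allowed([(0, 17), (65, 99)])

print(10 in allowed_ages)   # membership test replaces the range comparison
print(30 in allowed_ages)
```
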
> Unless you are dealing with large ranges rather than sets
> of small ranges. Even with complex options I'd still
> opt for a two tier data structure. But mostly I'd query
> any design that requires a lot of standalone data validation.
> (Unless its function is to be a bulk data loader or similar.)
> I'd probably be looking to having the data stored as
> objects that did their own validation at creation/modification
> time.
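
To make the "objects that do their own validation" idea concrete, here is a small sketch using a property setter. The class name Measurement, the LIMITS table and its values are all hypothetical; the point is only that the check runs both at creation and at every later modification.

```python
class Measurement:
    # Hypothetical limits per category: (low, high), both inclusive.
    LIMITS = {"age": (0, 120), "weight": (1, 500)}

    def __init__(self, category, value):
        self.category = category
        self.value = value          # goes through the setter below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        low, high = self.LIMITS[self.category]
        if not low <= new_value <= high:
            raise ValueError(
                f"{self.category}: {new_value!r} is outside {low}..{high}"
            )
        self._value = new_value
```

Because the error message names the category and the allowed range, a rejected value is easy to trace back.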

The data are collected electronically, but also by paper-and-pencil. With a web page you can check all kinds of things right at the beginning. But that's not true for paper-and-pencil data collection.

The bottom line is that collecting data *only* electronically would make things easier.

> If I was doing a bulk data loader/checker I'd probably create
> a validation function for each category and add it to a
> dictionary. So I'd write a make_validator() function that
> took the validation data and created a specific validator
> function for that category. Very simple example:
>
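> def make_validator(min, max, *values):
>     # note: min and max shadow the builtins inside this function
>     def validate(value):
>         # valid if inside the range or among the extra allowed values
>         return (min <= value <= max) or value in values
>     return validate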

This looks simple and therefore attractive! Thank you!

> for category in categories:
>     # unpack the list so it matches the *values parameter
>     lookup[category] = make_validator(min, max, *valueList)
> ...
> if lookup[category](my_value):
>     # process valid value
> else:
>     raise ValueError
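
Putting the pieces together for a table with category, min and max columns where a category may span several records, a sketch could look like the following. This is a variant of make_validator that takes a list of (low, high) pairs; the sample rows and the helper find_invalid are hypothetical. Returning the names of all failing fields, rather than raising on the first one, addresses the worry about not knowing which of many columns was invalid.

```python
from collections import defaultdict


def make_validator(ranges):
    """Return a function that checks a value against inclusive ranges."""
    def validate(value):
        return any(low <= value <= high for low, high in ranges)
    return validate


# Hypothetical rows as they might come from the limits table.
rows = [("age", 0, 17), ("age", 65, 99), ("score", 1, 5)]

# Collect all ranges that belong to the same category.
grouped = defaultdict(list)
for category, low, high in rows:
    grouped[category].append((low, high))

lookup = {category: make_validator(ranges)
          for category, ranges in grouped.items()}


def find_invalid(record):
    """Return the names of all fields whose value fails its validator."""
    return [name for name, value in record.items()
            if name in lookup and not lookup[name](value)]


print(find_invalid({"age": 30, "score": 9}))
```
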
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos