[Baypiggies] quick question: regex to stop naughty control characters

Kelly Yancey kelly at nttmcl.com
Wed Apr 25 23:32:48 CEST 2007

Shannon -jj Behrens wrote:
> Hi,
> I'm doing some form validation.  I accept UTF-8 strings and decode
> them to unicode objects.  I would like to check that the strings are
> no longer than 128 characters, and that they are "reasonable".  I'm
> using FormEncode with a regex that looks like r".{1,128}$".  By
> "reasonable", I think the only thing I want to prevent are control
> characters.  Now, I'm sure some Unicode whiz out there knows how to do
> this with some funky Unicode regex magic, but I don't know how.
> Anyone know the right way to do this?  Should I be worried about more
> than just control characters?  I'm already taking care of HTML
> escaping, SQL injection, etc.
> Thanks,
> -jj


   It ain't pretty, but how about this:
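   One way to write such a pattern is sketched below. It assumes "control characters" means the C0 range (U+0000 to U+001F), DEL (U+007F) and the C1 range (U+0080 to U+009F), which together make up exactly Unicode's Cc category. It replaces jj's `.` with a negated character class and anchors with `\Z` instead of `$`, because `$` also matches just before a trailing newline. `is_reasonable` is a hypothetical name, and the code is modern Python 3 rather than the Python 2 of 2007.

```python
import re

# Negated class: any character except C0 controls, DEL and C1 controls.
# Anchored at both ends, so re.match and re.search behave the same.
# \Z is the true end of string; $ would also accept "...\n".
REASONABLE = re.compile(r"^[^\x00-\x1f\x7f-\x9f]{1,128}\Z")


def is_reasonable(text: str) -> bool:
    """Return True if text is 1-128 characters with no Cc control characters."""
    return REASONABLE.search(text) is not None
```

Note that this also rejects tab and newline. If a field should allow them, remove `\x09` and `\x0a` from the class.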


   If Python's re module implemented POSIX named character classes, you could do this:
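   Python's standard re module still does not support POSIX classes such as `[[:cntrl:]]`. The third-party `regex` package does. Without it, you can emulate a named class with the standard library. The sketch below scans every code point with `unicodedata.category` and folds consecutive matches into ranges for a character class. It assumes the class should mean Unicode's Cc category. `build_class` and `CNTRL` are hypothetical names.

```python
import re
import sys
import unicodedata


def build_class(categories):
    """Return a character-class body (no brackets) covering every code point
    whose Unicode general category is in `categories`."""
    ranges = []
    start = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) in categories:
            if start is None:
                start = cp  # open a new run of matching code points
        elif start is not None:
            ranges.append((start, cp - 1))  # close the current run
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    # \UXXXXXXXX escapes keep the pattern printable; re understands them.
    return "".join(
        f"\\U{lo:08x}" if lo == hi else f"\\U{lo:08x}-\\U{hi:08x}"
        for lo, hi in ranges
    )


# Stand-in for [[:cntrl:]]; computed once, at import time (a full scan takes a moment).
CNTRL = build_class({"Cc"})
REASONABLE = re.compile("^[^" + CNTRL + "]{1,128}\\Z")
```

For Cc the scan collapses to the same two ranges you would write by hand. The same helper can build classes for other categories, too.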

Or if it supported Unicode regular expressions as detailed in 
http://www.unicode.org/unicode/reports/tr18/, you could do this:
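   With TR18-style properties you could write `[^\p{Cc}]`, or go broader with `[^\p{C}]`. The broader form also rejects format characters (Cf, e.g. U+202E RIGHT-TO-LEFT OVERRIDE), surrogates, private-use and unassigned code points. That is one possible answer to jj's "more than just control characters?" question. The standard re module has no `\p{...}` (the third-party `regex` package does). The sketch below does the same check without a regex, using `unicodedata.category`. Rejecting all of `\p{C}` is an assumption, not something settled here, and `no_other_chars` is a hypothetical name.

```python
import unicodedata


def no_other_chars(text: str, max_len: int = 128) -> bool:
    r"""Rough stand-in for matching ^[^\p{C}]{1,max_len}$ per UTS #18.

    Rejects every general category starting with "C": Cc control,
    Cf format, Cs surrogate, Co private use, Cn unassigned.
    """
    if not 1 <= len(text) <= max_len:
        return False
    return not any(unicodedata.category(ch).startswith("C") for ch in text)
```

Python 3's `str.isprintable()` comes close. However, it also rejects separators other than the ASCII space (such as U+00A0 NO-BREAK SPACE), and it returns True for the empty string.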

But alas, we aren't there yet. :(

   I hope that works for you,

