[Baypiggies] quick question: regex to stop naughty control characters

Shannon -jj Behrens jjinux at gmail.com
Wed Apr 25 21:44:35 CEST 2007


I'm doing some form validation.  I accept UTF-8 strings and decode
them to unicode objects.  I would like to check that the strings are
no longer than 128 characters, and that they are "reasonable".  I'm
using FormEncode with a regex that looks like r".{1,128}$".  By
"reasonable", I think the only thing I need to reject is control
characters.  Now, I'm sure some Unicode whiz out there knows how to do
this with some funky Unicode regex magic, but I don't know how.
Anyone know the right way to do this?  Should I be worried about more
than just control characters?  I'm already taking care of HTML
escaping, SQL injection, etc.
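
To make the question concrete, here is a minimal sketch of the kind of
check being asked about.  It assumes "control characters" means the
Unicode general category Cc (U+0000-U+001F and U+007F-U+009F), which
also rejects tab and newline.  The names is_reasonable and
REASONABLE_RE are made up for illustration and are not part of
FormEncode.

```python
import re
import unicodedata

MAX_LEN = 128  # length limit from the post, counted in characters


def is_reasonable(text, max_len=MAX_LEN):
    """Hypothetical helper: True if text has 1..max_len characters
    and none of them is a control character (category "Cc")."""
    if not 1 <= len(text) <= max_len:
        return False
    # "Cc" covers U+0000-U+001F and U+007F-U+009F.
    return not any(unicodedata.category(ch) == "Cc" for ch in text)


# Regex with the same intent, usable with pattern.match().
# \Z is used instead of $ because $ also matches just before a
# trailing newline.
REASONABLE_RE = re.compile(r"[^\x00-\x1f\x7f-\x9f]{1,128}\Z")
```

If tabs or newlines should be allowed in some fields, they would have
to be carved out of the excluded ranges explicitly.  Other categories
such as Cf (format characters) are not covered by this sketch.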



