[Baypiggies] quick question: regex to stop naughty control characters
dyoo at cs.wpi.edu
Wed Apr 25 22:22:51 CEST 2007
The question is still slightly underdefined; do you mind if I ask a few
questions?
> I accept UTF-8 strings and decode them to unicode objects.
Ok, so what we really have are bytes whose intended interpretation is
utf-8, yes? Is the input a unicode string? Or is it rather a sequence of
bytes (which Python often uses a regular string for)?
> I would like to check that the strings are no longer than 128 characters
Unfortunately, "characters" is ambiguous and has at least two meanings
these days. Do you mean 128 bytes, or 128 unicode characters? There's a
slight ambiguity here that needs to be cleared up before this problem can
be solved.
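To make the two readings concrete, here is a small sketch in present-day Python 3, where bytes and text are separate types (the thread itself predates that split). The sample string is made up for illustration.

```python
# Hypothetical sample input: "cafe" with an accented e (U+00E9),
# which takes two bytes in UTF-8.
raw = "caf\u00e9".encode("utf-8")   # what arrives on the wire: bytes
text = raw.decode("utf-8")          # after decoding: a unicode string

byte_length = len(raw)    # counts bytes
char_length = len(text)   # counts unicode code points

print(byte_length, char_length)
```

The two lengths differ for any input containing non-ASCII characters, so a limit of 128 means different things depending on which one is measured.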
Also, what part of this really requires regular expressions here? What
you've shown so far restricts a string by length, but that's already a
plain comparison:
len(some_string) < 128
I have to assume it has something to do with the definition of
"reasonable".
Does the check for reasonableness have to happen at the same time as the
test for length? Must the check for reasonableness happen before decoding
bytes assuming a utf-8 interpretation? Or can something like:
return (len(some_string) < 128 and ...)
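One possible shape for the decode-first ordering, as a rough sketch only. The names is_acceptable, is_reasonable, and limit are hypothetical; the "reasonable" test is passed in as a parameter because its definition is still open, and the strict comparison simply mirrors the `< 128` used in this message.

```python
def is_acceptable(raw, is_reasonable, limit=128):
    """Decode the bytes as UTF-8, then test length and reasonableness.

    `is_reasonable` is a caller-supplied predicate on the decoded text;
    its definition is deliberately left open here.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 at all, so reject before any other check.
        return False
    # Length is measured in code points, not bytes, because we decoded first.
    # Change `<` to `<=` if a string of exactly `limit` characters should pass.
    return len(text) < limit and is_reasonable(text)
```

For example, is_acceptable(b"hello", lambda t: True) passes, while is_acceptable(b"\xff", lambda t: True) is rejected at the decoding step.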
> By "reasonable", I think the only thing I want to prevent are control
> characters.
What do you mean by a "control character"? Can you be more specific about
the context that you're trying to guard?
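To show why the definition matters, here are two candidate readings of "control character", sketched in present-day Python 3. Neither is claimed to be the one the original poster wants; the function names are made up for illustration.

```python
import unicodedata


def has_control_chars(text):
    """Reading 1: any code point in Unicode general category Cc.

    This covers U+0000-U+001F and U+007F-U+009F, so it also flags
    tab and newline.
    """
    return any(unicodedata.category(ch) == "Cc" for ch in text)


def has_ascii_control_bytes(raw):
    """Reading 2: any raw byte in 0x00-0x1F, or 0x7F (ASCII controls only)."""
    return any(b < 0x20 or b == 0x7F for b in raw)
```

The two disagree on some inputs: U+0085 (NEXT LINE) is a control character under the first reading, but its UTF-8 encoding contains no ASCII control byte, so the second reading lets it through. Whether tab and newline count as "naughty" likewise depends on where the string ends up.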
I apologize for being pedantic, but form validation needs to be handled
methodically to be valuable.
Best of wishes!