Finding Peoples' Names in Files
byte8bits at gmail.com
Thu Oct 11 21:50:00 CEST 2007
Chris Mellon wrote:
> In case you're doing this for PCI validation, be aware that just the
> CC number is considered sensitive and you'd get some false negatives
> if you filter on anything except that.
> Random strings that match CC checksums are really quite rare and false
> positives from that alone are unlikely to be a problem. Unless I
> deployed this and there was a significant false positive rate I
> wouldn't risk the false negatives, personally.
Yes, it is for PCI. Our rate of false positives is low, very low. I
wasn't aware that a number alone was a PCI violation. Thank you! On
another note, we're a university (Virginia Tech) and we're subject to
FERPA, HIPAA, GLBA, etc. in addition to PCI. So we do these checks for
U.S. Social Security Numbers too in an effort to prevent or lessen the
chance of ID theft. Unfortunately, there is no Luhn check for SSNs. We
follow the Social Security Administration verification guideline
religiously... here's a web front-end to my logic:
but we still have many false positives on SSNs, so being able to id *names
and numbers* in files would still be a benefit to us.
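
For anyone following along, here is a rough sketch of the two kinds of
checks discussed above. It is not the logic behind the front-end mentioned
in this message; the function names (luhn_ok, plausible_ssns) are made up
for illustration, and the SSN rules are only the basic structural ones
(area not 000, 666 or 900-999; group not 00; serial not 0000), assuming
numbers are written as NNN-NN-NNNN. A real scanner would need more than
this.

```python
import re

def luhn_ok(digits):
    """Return True if a string of ASCII digits passes the Luhn checksum."""
    if not re.fullmatch(r"[0-9]+", digits):
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0

# Assumption: candidates appear in the dashed form NNN-NN-NNNN.
SSN_RE = re.compile(r"\b([0-9]{3})-([0-9]{2})-([0-9]{4})\b")

def plausible_ssns(text):
    """Return dashed SSN-like strings that pass basic structural rules."""
    found = []
    for m in SSN_RE.finditer(text):
        area, group, serial = m.groups()
        # Areas 000, 666 and 900-999 are never assigned.
        if area in ("000", "666") or area >= "900":
            continue
        # Group 00 and serial 0000 are never assigned.
        if group == "00" or serial == "0000":
            continue
        found.append(m.group(0))
    return found
```

Since nothing like a checksum exists for SSNs, plausible_ssns can only
discard structurally impossible values, which is why false positives stay
high compared with the card-number check.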