Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
Dave Angel
d at davea.name
Mon Oct 31 17:47:06 EDT 2011
On 10/31/2011 03:54 PM, python at bdurham.com wrote:
> Wondering if there's a fast/efficient built-in way to determine
> if a string has any chars outside the set of ASCII 32-127,
> CR, LF, and Tab?
>
> I know I can look at the chars of a string individually and
> compare them against a set of legal chars using standard Python
> code (and this works fine), but I will be working with some very
> large files, from hundreds of GB to several TB, so I thought I'd
> check whether there is a built-in implemented in C that might
> handle this type of check more efficiently.
>
> Does this sound like a use case for cython or pypy?
>
> Thanks,
> Malcolm
>
How about a .replace() call for each of those characters (or a single
.translate() call that deletes them all), turning them into '', and
then checking whether anything is left?
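
A minimal sketch of that idea, assuming Python 3 and data read as bytes.
It swaps the repeated .replace() calls for one bytes.translate() call
with a delete argument, since that removes the whole allowed set in a
single pass in C. The names ALLOWED, has_disallowed_bytes and
file_has_disallowed_bytes, and the 1 MiB chunk size, are illustrative
and not from the thread.

```python
# Allowed bytes: ASCII 32-127 plus Tab, LF and CR (the set named in the question).
ALLOWED = bytes(range(32, 128)) + b"\t\n\r"


def has_disallowed_bytes(data: bytes) -> bool:
    """Return True if data contains any byte outside ALLOWED."""
    # translate(None, delete) drops every byte listed in `delete`;
    # whatever survives must be outside the allowed set.
    return bool(data.translate(None, ALLOWED))


def file_has_disallowed_bytes(path, chunk_size=1 << 20):
    """Scan a file in fixed-size chunks so huge files never sit in memory."""
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            if has_disallowed_bytes(chunk):
                return True  # stop at the first offending chunk
    return False
```

Because the check is per byte, chunk boundaries do not matter, and the
scan stops as soon as one bad chunk turns up. Whether this is fast
enough for multi-TB inputs would need measuring on the real data.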
--
DaveA