[Python-3000] basestring removal, __file__ and co_filename

Thu Oct 11 23:23:22 CEST 2007

Gregory P. Smith wrote:
> That's pretty much what Christian pondered at the start of this thread but
> with a defined "failure" mode.
> 
> +1 from me, give it a try and see what 3.0a2 testers say.  Are there OSes
> and filesystems out there that'd store in anything other than one of the
> popular codecs (UTF-8, 16, 32, latin1, mbcs)?  That seems like a bad idea to
> me but obviously I don't run the world.

I've implemented the method, but my C is a bit rusty and not very good.
I'm not happy with the code, especially the large if/else block.

PyObject*
PyUnicode_DecodeFSDefault(const char *string, Py_ssize_t length,
                          const char *errors)
{
    PyObject *v = NULL;
    char encoding[32], mangled[32], *encptr, *manptr;
    char tmp;

    if (errors != NULL)
        Py_FatalError("non-NULL errors in PyUnicode_DecodeFSDefault");
    if ((length == 0) && *string)
        length = (Py_ssize_t)strlen(string);

    strncpy(encoding,
           Py_FileSystemDefaultEncoding ?
           Py_FileSystemDefaultEncoding : "UTF-8",
           31);
    encoding[31] = '\0';

    encptr = encoding;
    manptr = mangled;
    /* lowercase the name and drop non-alphanumeric chars like '-' */
    while (*encptr) {
        tmp = *encptr++;
        /* cast to unsigned char: ctype functions are undefined for
           negative char values */
        if (!isalnum((unsigned char)tmp))
            continue;
        *manptr++ = (char)tolower((unsigned char)tmp);
    }
    *manptr = '\0';

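    /* compare contents with strcmp(); == would only compare pointers */
    if (strcmp(mangled, "utf8") == 0)
        v = PyUnicode_DecodeUTF8(string, length, NULL);
    else if (strcmp(mangled, "utf16") == 0)
        v = PyUnicode_DecodeUTF16(string, length, NULL, NULL);
    else if (strcmp(mangled, "utf32") == 0)
        v = PyUnicode_DecodeUTF32(string, length, NULL, NULL);
    else if ((strcmp(mangled, "latin1") == 0) ||
             (strcmp(mangled, "iso88591") == 0) ||
             (strcmp(mangled, "iso885915") == 0))
        /* note: ISO-8859-15 differs from Latin-1 in eight code points,
           so decoding it as Latin-1 is only an approximation */
        v = PyUnicode_DecodeLatin1(string, length, NULL);
    else if (strcmp(mangled, "ascii") == 0)
        v = PyUnicode_DecodeASCII(string, length, NULL);
#ifdef MS_WIN32
    else if (strcmp(mangled, "mbcs") == 0)
        v = PyUnicode_DecodeMBCS(string, length, NULL);
#endif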

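    /* unknown codec or failed decode: drop any pending error, then
       fall back to UTF-8 with replacement characters */
    if (v == NULL) {
        PyErr_Clear();
        v = PyUnicode_DecodeUTF8(string, length, "replace");
    }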

    return (PyObject*)v;
}
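
One way to shrink the large if/else block might be a small lookup table
keyed on the mangled encoding name. The following is only a standalone
sketch under assumptions, not part of the patch: the names codec_id,
normalize_encoding, codec_table and lookup_codec are hypothetical, and the
enum stands in for the PyUnicode_Decode* call a real version would make.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codec identifiers; a real patch would dispatch to the
   matching PyUnicode_Decode* function instead. */
typedef enum {
    CODEC_UNKNOWN = 0,
    CODEC_UTF8,
    CODEC_UTF16,
    CODEC_UTF32,
    CODEC_LATIN1,
    CODEC_ASCII
} codec_id;

/* Lowercase `name` and drop every non-alphanumeric character, writing at
   most size-1 characters plus a terminating NUL into `out`. */
static void
normalize_encoding(const char *name, char *out, size_t size)
{
    size_t n = 0;

    if (size == 0)
        return;
    for (; *name != '\0' && n + 1 < size; name++) {
        /* ctype functions need a value representable as unsigned char */
        unsigned char c = (unsigned char)*name;
        if (isalnum(c))
            out[n++] = (char)tolower(c);
    }
    out[n] = '\0';
}

/* Mangled names and the codec each one maps to. */
static const struct {
    const char *name;
    codec_id id;
} codec_table[] = {
    {"utf8", CODEC_UTF8},
    {"utf16", CODEC_UTF16},
    {"utf32", CODEC_UTF32},
    {"latin1", CODEC_LATIN1},
    {"iso88591", CODEC_LATIN1},
    {"ascii", CODEC_ASCII},
};

/* Return the codec for `encoding`, or CODEC_UNKNOWN so the caller can
   take its fallback path. */
static codec_id
lookup_codec(const char *encoding)
{
    char mangled[32];
    size_t i;

    normalize_encoding(encoding, mangled, sizeof(mangled));
    for (i = 0; i < sizeof(codec_table) / sizeof(codec_table[0]); i++) {
        if (strcmp(mangled, codec_table[i].name) == 0)
            return codec_table[i].id;
    }
    return CODEC_UNKNOWN;
}
```

With this shape, lookup_codec("UTF-8") and lookup_codec("utf_8") both
resolve to CODEC_UTF8, and adding an alias is a one-line change to the
table. Anything not listed comes back as CODEC_UNKNOWN, which corresponds
to the UTF-8 "replace" fallback in the function above.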
