[Python-3000] setup.py fails in the py3k-struni branch

Thu Jun 7 22:54:07 CEST 2007

Guido van Rossum wrote:
 > On 6/7/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote:
 >> On 6/7/07, Guido van Rossum <guido at python.org> wrote:
 >>> It's time to look at the original traceback (attached as "tb", after
 >>> fixing the formatting problems). It looks like any call to
 >>> encodings.normalize_encoding() causes this problem.
 >> Don't know if this will help, but it seems that adding a
 >> debugging print() to the normalize_encoding function makes Python act
 >> weird:
 >>
 >>   >>> print("hello")  # no output
 >>   [38357 refs]
 >>   >>> hello?          # note the exception is not shown
 >>   [30684 refs]
 >>   >>> exit()          # does quit
 >
 > That's a bootstrapping issue. normalize_encoding() is apparently
 > called in order to set up stdin/stdout/stderr, so it shouldn't attempt
 > to touch those (or raise errors).
 >
 >>> I don't know why linking an extension avoids this, and why it's only
 >>> a problem for you and not for me, but that's probably a locale
 >>> setting (if you mail me the values of all your locale-specific
 >>> environment variables I can try to reproduce it).
 >> I don't think it is related to locale settings, since even with a
 >> minimal set of environment variables I can still reproduce the
 >> problem.
 >>
 >>   % sh
 >>   $ for v in `set | egrep -v 'OPTIND|PS|PATH' | cut -d "=" -f1`
 >>   > do unset $v; done
 >>   $ make
 >>   make: *** [sharedmods] Error 1
 >
 > Well, then it is up to you to come up with a hypothesis for why it
 > doesn't happen on my system. (I tried the above thing and it still
 > works.)

There are a couple of things going on here.

The "sharedmods" target in the makefile doesn't run on every make; whether
it runs depends on which options are set and which targets are built.  That
is why the error doesn't occur on the first run after a 'make clean', and
why it doesn't occur if some targets, like _struct.so, are rebuilt.  I'm not
sure why it matters which files are built in this case.  <shrug>

Also, if you have some make flags set, they may be avoiding that
particular problem because the default 'all' target is never run.

Does setup.py run without an error for you?  (Without the
encodings.__init__.py patch.)  How about "make test"?


I ran across the same zero-arg split error a while back when attempting
to run 'make test'.  Below is the solution I came up with.  Is there going
to be a unicode equivalent to the str.translate() method?

Cheers,
    Ron



Index: Lib/encodings/__init__.py
===================================================================

--- Lib/encodings/__init__.py   (revision 55388)
+++ Lib/encodings/__init__.py   (working copy)
@@ -34,19 +34,16 @@
  _cache = {}
  _unknown = '--unknown--'
  _import_tail = ['*']
-_norm_encoding_map = ('                                              . '
-                      '0123456789       ABCDEFGHIJKLMNOPQRSTUVWXYZ     '
-                      ' abcdefghijklmnopqrstuvwxyz                     '
-                      '                                                '
-                      '                                                '
-                      '                ')
+_norm_encoding_map = ('.0123456789'
+                      'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
+                      'abcdefghijklmnopqrstuvwxyz')
+
  _aliases = aliases.aliases

  class CodecRegistryError(LookupError, SystemError):
      pass

  def normalize_encoding(encoding):
-
      """ Normalize an encoding name.

          Normalization works as follows: all non-alphanumeric
@@ -54,18 +51,12 @@
          collapsed and replaced with a single underscore, e.g. '  -;#'
          becomes '_'. Leading and trailing underscores are removed.

-        Note that encoding names should be ASCII only; if they do use
-        non-ASCII characters, these must be Latin-1 compatible.
+        Note that encoding names should be ASCII characters only; if they
+        do use non-ASCII characters, these must be Latin-1 compatible.

      """
-    # Make sure we have an 8-bit string, because .translate() works
-    # differently for Unicode strings.
-    if isinstance(encoding, str):
-        # Note that .encode('latin-1') does *not* use the codec
-        # registry, so this call doesn't recurse. (See unicodeobject.c
-        # PyUnicode_AsEncodedString() for details)
-        encoding = encoding.encode('latin-1')
-    return '_'.join(encoding.translate(_norm_encoding_map).split())
+    return ''.join([ch if ch in _norm_encoding_map else '_'
+                        for ch in encoding])


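The normalization rule described in the docstring (runs of disallowed
characters collapse to a single underscore, and leading and trailing
underscores are dropped) can be sketched on plain unicode strings without
str.translate().  This is only an illustration, not what was committed;
the names _ALLOWED and normalize_encoding_sketch are hypothetical.

```python
# Hypothetical sketch of the docstring's rule; not the CPython implementation.
_ALLOWED = frozenset(
    '.0123456789'
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
)


def normalize_encoding_sketch(encoding):
    """Collapse runs of disallowed characters into single underscores."""
    # Map every disallowed character to a space ...
    spaced = ''.join(ch if ch in _ALLOWED else ' ' for ch in encoding)
    # ... then let split() drop leading/trailing blanks and collapse runs,
    # and rejoin the remaining pieces with one underscore each.
    return '_'.join(spaced.split())


print(normalize_encoding_sketch('  -;#utf-8  '))  # utf_8
print(normalize_encoding_sketch('ISO 8859--1'))   # ISO_8859_1
```

Going through a space and split() keeps the shape of the original
'_'.join(...split()) expression, so the collapsing and stripping come for
free instead of needing a separate pass.
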
 >>> The trail leads back to the optparse module using the gettext module
 >>> to translate its error messages. That seems overengineered to me,
 >>> but I won't argue too strongly.
 >>>
 >>> In any case, the root cause is that normalize_encoding() is badly
 >>> broken. I've attached a hack that might fix it. Can you try if that
 >>> helps?
 >> Yep, that worked. What is this new str8 type for, btw? It is the second
 >> time I've encountered it today.
 >
 > It is the temporary new name for the old 8-bit str type. The plan is
 > to rename unicode->str and delete the old str type, but in the short
 > term that doesn't quite work because there is too much C code that
 > requires 8-bit strings (and can't be made to work with the bytes type
 > either). So for the time being I've renamed the old str type to str8
 > rather than deleting it altogether. Once we have things 99% working
 > this way we'll make another pass to get rid of str8 completely -- or
 > perhaps keep it around under some other name with reduced
 > functionality (since there have been requests for an immutable bytes
 > type).