[Python-Dev] status of development documentation

Sun Dec 25 07:29:36 CET 2005

On 12/24/05, Tim Peters <tim.peters at gmail.com> wrote:
> [Tim]
> >> FWIW, test_builtin and test_pep263 both passed on WinXP in rev 39757.
> >> That's the last revision before the AST branch was merged.
> >>
> >> I can't build rev 39758 on WinXP (VC complains that pythoncore.vcproj
> >> can't be loaded -- looks like it got checked in with unresolved SVN
> >> conflict markers -- which isn't easy to do under SVN ;-( ), so don't
> >> know about that.
> >>
> >> The first revision at which Python built again was 39791 (23 Oct), and
> >> test_builtin and test_pep263 both fail under that the same way they
> >> fail today.
>
> [Brett]
> > Both syntax errors, right?
>
> In test_builtin, yes, two syntax errors.  test_pep263 is different:
>
> test test_pep263 failed -- Traceback (most recent call last):
>   File "C:\Code\python\lib\test\test_pep263.py", line 12, in test_pep263
>     '\xd0\x9f\xd0\xb8\xd1\x82\xd0\xbe\xd0\xbd'
> AssertionError:
> '\xc3\xb0\xc3\x89\xc3\x94\xc3\x8f\xc3\x8e' !=
> '\xd0\x9f\xd0\xb8\xd1\x82\xd0\xbe\xd0\xbd'
>
> That's not a syntax error, it's a wrong result.  There are other
> parsing-related test failures, but those are the only two I've written
> up so far (partly because I expect they all have the same underlying
> cause, and partly because nobody so far seems to understand the code
> well enough to explain why the first one works on any platform ;-)).
>
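
A note on the test_pep263 bytes, offered as a hypothesis rather than a diagnosis: the wrong result is exactly what comes out if the KOI8-R bytes of the string are treated as Latin-1 before being encoded to UTF-8, as though the coding declaration had been ignored. The check below is only a sketch. It assumes the test source stores the literal as KOI8-R, it is written for a current Python 3 interpreter rather than the trunk under discussion, and the variable names are illustrative.

```python
# Expected and actual values copied from the AssertionError above.
expected = b'\xd0\x9f\xd0\xb8\xd1\x82\xd0\xbe\xd0\xbd'
actual = b'\xc3\xb0\xc3\x89\xc3\x94\xc3\x8f\xc3\x8e'

# Assumption: the literal sits in the source file as KOI8-R bytes.
raw = expected.decode('utf-8').encode('koi8-r')

# Honouring the coding declaration reproduces the expected value.
honoured = raw.decode('koi8-r').encode('utf-8')

# Ignoring it (each byte taken as Latin-1) reproduces the failing value.
ignored = raw.decode('latin-1').encode('utf-8')

print(honoured == expected, ignored == actual)
```

If both comparisons hold, the failure is consistent with the encoding-aware path in the tokenizer being skipped. That would fit the BOM failure in test_builtin, but it does not prove a common cause.
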
> > My mind is partially gone thanks to being on vacation so following this thread
> > has been abnormally hard.  =)
> >
> > Since it is a syntax error there won't be any bytecode to compare against.
>
> Shouldn't be needed.  The snippet:
>
>     bom = '\xef\xbb\xbf'
>     compile(bom + 'print 1\n', '', 'exec')
>
> treats the `bom` prefix like any other sequence of illegal characters.
> That's why it raises SyntaxError:
>
>     It peels off the first character (\xef), and says "syntax
>     error" at that point:
>
>     Py_CompileStringFlags ->
>     PyParser_ASTFromString ->
>     PyParser_ParseStringFlagsFilename ->
>     parsetok ->
>     PyTokenizer_Get
>
>     That sets `a` to point at the start of the string, `b` to point at the
>     second character, and returns type==51.  Then `len` is set to 1,
>     `str` is malloc'ed to hold 2 bytes, and `str` is filled in with
>     "\xef\x00" (the first byte of the input, as a NUL-terminated C
>     string).
>
>     PyParser_AddToken then calls classify(), which falls off the end of
>     its last loop and returns -1:  syntax error.
>
> and later:
>
>     I'm getting a strong suspicion that I'm the only developer to _try_
>     building the trunk on WinXP since the AST merge was done, and that
>     something obscure is fundamentally broken with it on this box.  For
>     example, in tokenizer.c, these functions don't even exist on Windows
>     today (because an enclosing #ifdef says not to compile them):
>
>     error_ret
>     new_string
>     get_normal_name
>     get_coding_spec
>     check_coding_spec
>     check_bom
>     fp_readl
>     fp_setreadl
>     fp_getc
>     fp_ungetc
>     decoding_fgets
>     decoding_feof
>     buf_getc
>     buf_ungetc
>     buf_setreadl
>     translate_into_utf8
>     decode_str
>
>     OK, that's not quite true.  "Degenerate" forms of three of those
>     functions exist on Windows:
>
>     static char *
>     decoding_fgets(char *s, int size, struct tok_state *tok)
>     {
>            return fgets(s, size, tok->fp);
>     }
>
>     static int
>     decoding_feof(struct tok_state *tok)
>     {
>            return feof(tok->fp);
>     }
>
>     static const char *
>     decode_str(const char *str, struct tok_state *tok)
>     {
>           return str;
>     }
>
>     In the simple failing test, that degenerate decode_str() is getting
>     called.  If the "fancy" decode_str() were being used instead, that one
>     _does_ call check_bom().  Why do we have two versions of these
>     functions?  Which set is supposed to be in use now?  What's the
>     meaning of "#ifdef PGEN" today?  Should it be true or false?
>

Looking at the logs for tokenizer.c, tokenizer.h, and
tokenizer_pgen.c, it looks like none of them has changed much
since Martin did the PEP 263 work.

> >> I'm darned near certain that we're not using the _intended_ parsing
> >> code on Windows now -- PGEN is still #define'd when the "final"
> >> parsing code is compiled into python25.dll.  Don't know how to fix
> >> that (I don't understand it).
>
> > But the AST branch didn't touch the parser (unless you are grouping
> > ast.c and compile.c under the "parser" umbrella just to throw me off
> > =).
>
> Possibly.  See above for unanswered questions about tokenizer.c, which
> appears to be the whole problem wrt test_builtin.  Python couldn't be
> built under VC7.1 on Windows after the AST merge.  However that got
> repaired, the repair left parsing/tokenizing broken on Windows wrt (at least) some
> encoding gimmicks.  Since the tests passed immediately before the AST
> merge, and failed the first time Python could be built again after
> that merge, it's the only natural candidate for finger-wagging.
>

Did the merge (or the repair to the build) cause tokenizer_pgen.c to
suddenly be used for the build instead of tokenizer.c?  The former
seems to be the only place where PGEN is defined.
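
One way to double-check where a macro such as PGEN is defined in the source tree is a small script like the one below. It is a rough sketch: `find_defines` is a hypothetical helper, not part of the Python build tools, and it only sees `#define` lines in the files it is given.

```python
import re
from pathlib import Path


def find_defines(paths, macro):
    """Return (path, line_number) pairs where `macro` is #define'd."""
    # Allow optional whitespace around '#', and require a word boundary
    # so that e.g. PGEN does not match PGEN_EXTRA.
    pattern = re.compile(r'^\s*#\s*define\s+' + re.escape(macro) + r'\b')
    hits = []
    for path in paths:
        # latin-1 never fails to decode, which is enough for a line scan.
        text = Path(path).read_text(encoding='latin-1')
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.match(line):
                hits.append((str(path), lineno))
    return hits
```

Run from the top of a checkout, it could be called as `find_defines(Path('Parser').glob('tokenizer*.c'), 'PGEN')`. It will not notice a macro defined on the compiler command line or in a project file's preprocessor settings, so an empty result for the .c files does not rule out PGEN being set by the Windows build itself.
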

> > What can I do to help?
>
> I don't know.  Enjoying Christmas couldn't hurt :-)  What this needs
> is someone who understands how
>
>     bom = '\xef\xbb\xbf'
>     compile(bom + 'print 1\n', '', 'exec')
>
> is supposed to work at the front-end level.
>

Hopefully Martin will have some inkling since he committed the phase 1
stuff for PEP 263.
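
For reference, here is a sketch of the behaviour the BOM snippet appears to expect from the front end: if the source starts with the three-byte UTF-8 BOM, skip it and compile the rest as ordinary source. This is presumably what the check_bom() call in the full decode_str() arranges, but that is an assumption. `strip_utf8_bom` is a hypothetical name, not a function in tokenizer.c, and the code targets a current Python 3 (source passed as bytes) rather than the trunk.

```python
import codecs


def strip_utf8_bom(source):
    """Return (source_without_bom, saw_bom) for a bytes object."""
    if source.startswith(codecs.BOM_UTF8):
        return source[len(codecs.BOM_UTF8):], True
    return source, False


bom = b'\xef\xbb\xbf'
body, saw_bom = strip_utf8_bom(bom + b'x = 1\n')

# With the BOM removed this is ordinary source and compiles cleanly.
namespace = {}
exec(compile(body, '<string>', 'exec'), namespace)

# A current Python 3 also accepts the BOM-prefixed bytes directly.
code_with_bom = compile(bom + b'x = 1\n', '<string>', 'exec')
```

The degenerate decode_str() quoted above returns its input unchanged, so nothing equivalent to the stripping step runs and the tokenizer sees \xef as the first character of the program.
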

> >  Do you need me to step through something?
>
> Why doesn't the little code snippet above fail anywhere else?
> "Should" the degenerate decode_str() be getting called during it -- or
> should the other decode_str() be getting called?  If the latter, what
> got broke on Windows during the merge so that the wrong one is getting
> called now?
>

> > Do you need to know how gcc is preprocessing some file?
>
> No, I just need to know how to fix Python on Windows ;-)

=)

-Brett