[Expat-bugs] [ expat-Bugs-653180 ] problem with column and line numbers

noreply at sourceforge.net noreply at sourceforge.net
Sun Dec 15 08:20:05 EST 2002


Bugs item #653180, was opened at 2002-12-13 11:01
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=653180&group_id=10127

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: MM Zeeman (mmzeeman)
Assigned to: Karl Waclawek (kwaclaw)
Summary: problem with column and line numbers 

Initial Comment:

XML_GetCurrentColumnNumber returns twice the actual column
number. The same holds for XML_GetCurrentLineNumber. The
XML_GetCurrentByteIndex function returns the correct value.

The expat version I'm using is expat_1.95.5.



----------------------------------------------------------------------

>Comment By: MM Zeeman (mmzeeman)
Date: 2002-12-15 17:20

Message:
Logged In: YES 
user_id=350634

No, this does not solve the problem.

tests/runtests
Running suite(s): basic
93%: Checks: 31, Failures: 2, Errors: 0
tests/runtests.c:342:F:basic tests: expected 4 lines, saw 7
tests/runtests.c:357:F:basic tests: expected 11 columns, saw 22
make: *** [check] Error 1

Btw, this whole line/column counting seems to be one big
side effect ;) Why is it not implemented as part of the
normal scanning routines? It seems like the buffer passed to
XML_Parse is checked (again) just to adjust the line and
column numbers. During the "normal" scanning phase nothing
is done to adjust the line and column numbers.
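
The idea of tracking the position as part of scanning can be
sketched as below. This is only an illustration of the suggestion,
not expat's code: the Pos struct and the pos_init and scan_chunk
functions are hypothetical names, the input is assumed to be
single-byte characters with "\n" line ends, and the sketch keeps a
1-based line with a 0-based column.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical position record: line is 1-based, column is 0-based. */
typedef struct {
  long line;
  long column;
} Pos;

/* Set the position to the start of the document. */
static void pos_init(Pos *pos)
{
  assert(pos != NULL);
  pos->line = 1;
  pos->column = 0;
}

/* Consume one chunk and update the position while scanning, so that
 * no second pass over the buffer is needed. A real scanner would also
 * recognise tokens in this loop; this sketch only looks for newlines.
 */
static void scan_chunk(Pos *pos, const char *buf, size_t len)
{
  size_t i;
  assert(pos != NULL && (buf != NULL || len == 0));
  for (i = 0; i < len; i++) {
    if (buf[i] == '\n') {
      pos->line++;
      pos->column = 0;
    } else {
      pos->column++;
    }
  }
}
```

Feeding "<tag>\n" and then "\n\n</tag>" as two chunks leaves the
position at line 4, column 6, because each byte is visited exactly
once regardless of how the input is split.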

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-12-15 07:54

Message:
Logged In: YES 
user_id=290026

Try this - not tested:

in XML_Parse(), right after XmlUpdatePosition,
insert this line:

  eventPtr = eventEndPtr = end;

and in XML_ParseBuffer(), also right after XmlUpdatePosition,
insert (as part of the conditional statement):

  eventPtr = eventEndPtr = bufferPtr;

That should hopefully prevent double counting.
Haven't really checked possible side-effects.


----------------------------------------------------------------------

Comment By: MM Zeeman (mmzeeman)
Date: 2002-12-15 06:48

Message:
Logged In: YES 
user_id=350634

Hmm,

It looks more like the line and column numbers are counted
twice outside the handlers. When I added the piece of code
below at the start of the update routine, the counting was
correct. Looking at the source I found that outside the
handlers the update is called twice. It is first called in
parse, and again when you call either
XML_GetCurrentLineNumber or XML_GetCurrentColumnNumber,
because for some reason the eventPtr is set. (It seems to me
that this should only be the case in handlers, but I'm not
sure.)

static void FASTCALL
PREFIX(updatePosition)(const ENCODING *enc,
                       const char *ptr,
                       const char *end,
                       POSITION *pos)
{
  /* THIS IS NOT CORRECT
   * Yes, this is lame, but it helps to show that the newlines
   * and column numbers are being counted twice.
   */
  static const char *s_ptr = NULL;
  static const char *s_end = NULL;
  if ((s_ptr == ptr) && (s_end == end))
    return; /* we already counted this range */
  s_ptr = ptr;
  s_end = end;

  /* The rest of the function */
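
The double counting described above can be reproduced with a toy
model. None of this is expat's actual code: ToyParser, toy_parse,
toy_line_number, toy_column_number and the field names are made up
for illustration. The model also assumes that the pointer marking
where counting starts is not advanced after the update in the parse
call, which is a guess at the mechanism rather than something
confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  const char *position_ptr; /* where counting starts (hypothetical) */
  const char *event_ptr;    /* non-NULL once a buffer has been parsed */
  long line;                /* 0-based internally */
  long column;              /* 0-based */
} ToyParser;

/* Add the newlines and columns found in [ptr, end) to the position. */
static void toy_update_position(ToyParser *p, const char *ptr,
                                const char *end)
{
  for (; ptr < end; ptr++) {
    if (*ptr == '\n') {
      p->line++;
      p->column = 0;
    } else {
      p->column++;
    }
  }
}

/* Models a single parse call on a fresh parser.
 * First update: done at the end of the parse call.
 */
static void toy_parse(ToyParser *p, const char *text)
{
  const char *end;
  assert(p != NULL && text != NULL);
  end = text + strlen(text);
  p->position_ptr = text;
  p->line = 0;
  p->column = 0;
  toy_update_position(p, p->position_ptr, end);
  /* Assumption: event_ptr stays set and position_ptr is not advanced. */
  p->event_ptr = end;
}

/* Second update: done in the getter because event_ptr is set, so the
 * same range [position_ptr, event_ptr) is counted again.
 */
static void toy_sync(ToyParser *p)
{
  if (p->event_ptr != NULL) {
    toy_update_position(p, p->position_ptr, p->event_ptr);
    p->position_ptr = p->event_ptr;
  }
}

static long toy_line_number(ToyParser *p)
{
  toy_sync(p);
  return p->line + 1; /* reported 1-based */
}

static long toy_column_number(ToyParser *p)
{
  toy_sync(p);
  return p->column; /* reported 0-based */
}
```

With the two test strings from this report, the model gives 7 for
the line number of "<tag>\n\n\n</tag>" and 22 for the column number
of "<tag></tag>", the same values as in the failing test output.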

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-12-14 17:41

Message:
Logged In: YES 
user_id=290026

I have looked a little closer.
It seems the numbers are reported OK only if the
parser is at the end or beginning of a token, but not
in-between. Btw, the line/column number relates
to the start of the token, as far as I can tell from the source.

In your example, isFinal = 0, so the parser is in-between 
tokens and expects at least one more call to XML_Parse.
I am not sure it is OK to call the line/column number
functions in between calls to XML_Parse, since at this point
the parser is in the middle of processing a token, while
in the other situations (inside a handler, and when having an 
error) the parser knows exactly what the current token is
or is not.

One thing is sure: the documentation is neither clear nor
sufficient. Also, from what I can tell, line numbers are
1-based and column numbers are 0-based. I don't think
that is a good idea. It should probably be consistent
with SAX, where both numbers are 1-based.
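
If line numbers are indeed 1-based and column numbers 0-based, a
caller who wants SAX-style positions only needs to shift the column.
A minimal sketch, with the caveat that the convention is taken from
the reading of the source given above, and that ReportedPos and
to_sax_position are hypothetical helpers, not part of expat:

```c
#include <assert.h>

/* Position in SAX style: both line and column are 1-based. */
typedef struct {
  long line;
  long column;
} ReportedPos;

/* Convert a (1-based line, 0-based column) pair to SAX style. */
static ReportedPos to_sax_position(long line_1based, long column_0based)
{
  ReportedPos out;
  assert(line_1based >= 1 && column_0based >= 0);
  out.line = line_1based;         /* already 1-based, unchanged */
  out.column = column_0based + 1; /* shift 0-based to 1-based */
  return out;
}
```

For example, a token reported at line 1, column 0 (the very first
character of the document) becomes line 1, column 1.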


----------------------------------------------------------------------

Comment By: MM Zeeman (mmzeeman)
Date: 2002-12-14 17:22

Message:
Logged In: YES 
user_id=350634

When I set a start element handler the line number reporting
is fine, as long as no trailing newlines are inserted after
the last tag. When an end element handler is set instead of
a start element handler, the line number is always wrong.
Inside the handlers the line numbers are always OK.

I did not check the behavior for XML_Parse errors, or the
column number (yet).



----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-12-14 03:18

Message:
Logged In: YES 
user_id=290026

Does this also happen in other situations?
For instance:
- inside a handler
- when XML_Parse returns with an error?

----------------------------------------------------------------------

Comment By: MM Zeeman (mmzeeman)
Date: 2002-12-13 12:01

Message:
Logged In: YES 
user_id=350634

I forgot to add the tests:

START_TEST(test_line_number_maas)
{  
  char *text = "<tag>\n"
               "\n"
               "\n</tag>";
  int lineno;
  if (XML_Parse(parser, text, strlen(text), 0) == XML_STATUS_ERROR)
      xml_failure(parser);
  lineno = XML_GetCurrentLineNumber(parser);
  if (lineno != 4) {
      char buffer[100];
      sprintf(buffer, "expected 4 lines, saw %d", lineno);
      fail(buffer);
  }
}
END_TEST

START_TEST(test_column_number_maas)
{
  char *text = "<tag></tag>";
  int colno;
  if (XML_Parse(parser, text, strlen(text), 0) == XML_STATUS_ERROR)
      xml_failure(parser);
  colno = XML_GetCurrentColumnNumber(parser);
  if (colno != 11) {
      char buffer[100];
      sprintf(buffer, "expected 11 columns, saw %d", colno);
      fail(buffer);
  }
}
END_TEST

Added to the test suite this resulted in:

Running suite(s): basic
93%: Checks: 31, Failures: 2, Errors: 0
tests/runtests.c:342:F:basic tests: expected 4 lines, saw 7
tests/runtests.c:357:F:basic tests: expected 11 columns, saw 22

----------------------------------------------------------------------
