[Doc-SIG] broken \ref links

Trent Mick trentm@ActiveState.com
Thu, 17 Oct 2002 19:08:04 -0700


[Neal Norwitz wrote]
> The second oldest bug on SF is:  http://python.org/sf/217195
> 
> \ref links are broken when there are multiple \refs on the same line.
> The problem seems to be in Doc/tools/node2label.pl around lines 47-57.
> 
> I really don't know perl.  I'm afraid to learn, :-) otherwise I'd
> suggest a fix.  If someone has suggestions though, I will try them.

... I might eventually have gotten there, but then I saw that Neal had found
the problem (use chomp() instead of chop()). I can confirm that this is the
problem. Read on if you want to see why.
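The difference between the two builtins: chop() always removes the last
character, while chomp() removes a trailing newline (strictly, the value of
$/) only if one is there. A small Perl sketch using the two kinds of values
node2label.pl ends up with (the helper names strip_with_chop and
strip_with_chomp are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helpers, named for illustration only.
sub strip_with_chop  { my ($s) = @_; chop($s);  return $s; }   # always drops the last character
sub strip_with_chomp { my ($s) = @_; chomp($s); return $s; }   # drops a trailing $/ ("\n") only if present

# An HREF in the middle of a line vs. the last HREF on the line, after
# everything from the first #, " or ' has already been stripped.
for my $s ("node87.html", "node77.html\n") {
    print "chop: '", strip_with_chop($s), "'  chomp: '", strip_with_chomp($s), "'\n";
}
```

Only the value that still carries the line's newline survives chop() intact.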


Here is a quick attempt at explaining it, with the code in question annotated:

    while (<>) {

# This while loop runs once for each input line (the input files are
# '*.html', as node2label.pl is called by Doc/tools/mkhowto).

      # don't want to do one s/// per line per node
      # so look for lines with hrefs, then do s/// on nodes present
      if (/(HREF|href)=[\"\']node\d+\.html[\#\"\']/) {

# The current line ($_) being processed has one or more HREF="..."
# strings in it. The line mentioned in the bug (from my Python 2.2 doc
# build) is:
#  '<A HREF="node87.html#try">7.4</A> and <tt class="keyword">raise</tt> statement in section <A href="node77.html#raise">6.9</A>.\n'
#

        @parts = split(/(HREF|href)\=[\"\']/);

# In Python list syntax, @parts is now:
# parts = ['<A ', 'HREF',
#          'node87.html#try">7.4</A> and <tt class="keyword">raise</tt> statement in section <A ',
#          'href', 'node77.html#raise">6.9</A>.\n']

        shift @parts;

# shift drops the leading '<A ':
# parts = ['HREF',
#          'node87.html#try">7.4</A> and <tt class="keyword">raise</tt> statement in section <A ',
#          'href', 'node77.html#raise">6.9</A>.\n']

        for $node (@parts) {

# One pass for each element ($node) of parts.

          $node =~ s/[\#\"\'].*$//g;

# After this:
#   node = 'HREF'
#   node = 'node87.html'
#   node = 'href'
#   node = 'node77.html\n'

          chop($node);    # Neal was right: the bug is here (see WRONG
                          # below).

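# We just want the foo.html part. The s/// above already removed
# everything from the first ", ' or # on, so chop() is only meant to
# strip the trailing newline. But chop() removes the last character
# whether or not it is a newline: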
#   node = 'HRE'
#   node = 'node87.htm'   <---- WRONG
#   node = 'hre'
#   node = 'node77.html'

          if (defined($nodes{$node})) {
            $label = $nodes{$node};

# This runs only if $node is a key in the %nodes hash, which is built
# from labels.pl. In my build it looks like this (Python dict syntax):
#   nodes = {
#       'node87.html' : 'try',
#       'node77.html' : 'raise',
#       ...
#   }
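# Because chop() mangled "node87.html" into "node87.htm", this lookup
# fails. Only the last HREF on a line still ends in "\n" at this point,
# so it is the only one chop() leaves intact; that is why the bug shows
# up only with multiple \refs on the same line.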

            if (s/(HREF|href)=([\"\'])$node([\#\"\'])/href=$2$label.html$3/g) {
              s/(HREF|href)=([\"\'])$label.html/href=$2$label.html/g;
              $newnames{$node} = "$label.html";
            }
          }
        }
      }
      print;
    }
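For comparison, a minimal self-contained Perl sketch of the same per-line
logic with chomp() in place of chop(). It is only an illustration, not a
patch for node2label.pl: it is packaged as a hypothetical rewrite_hrefs()
sub, the %nodes table is hard-coded to the two entries from my build
instead of being read from labels.pl, the %newnames bookkeeping is left
out, and $node and $label are wrapped in \Q...\E so the '.' in the file
name matches literally (the original interpolates them unquoted).

```perl
use strict;
use warnings;

# Hypothetical sub: the body of node2label.pl's while loop for one line,
# with chop() replaced by chomp(). $nodes maps "nodeNN.html" to a label.
sub rewrite_hrefs {
    my ($line, $nodes) = @_;
    return $line unless $line =~ /(HREF|href)=[\"\']node\d+\.html[\#\"\']/;

    my @parts = split(/(HREF|href)\=[\"\']/, $line);
    shift @parts;                    # drop the text before the first HREF
    for my $node (@parts) {
        $node =~ s/[\#\"\'].*$//g;   # keep "foo.html" (plus "\n" if last on the line)
        chomp($node);                # the fix: remove "\n" only when it is there
        next unless defined $nodes->{$node};
        my $label = $nodes->{$node};
        # \Q...\E makes the '.' in the file name match literally.
        if ($line =~ s/(HREF|href)=([\"\'])\Q$node\E([\#\"\'])/href=$2$label.html$3/g) {
            $line =~ s/(HREF|href)=([\"\'])\Q$label\E\.html/href=$2$label.html/g;
        }
    }
    return $line;
}

# The line from the bug report, with a hard-coded stand-in for the
# table that node2label.pl builds from labels.pl.
my %nodes = ('node87.html' => 'try', 'node77.html' => 'raise');
my $line  = '<A HREF="node87.html#try">7.4</A> and <tt class="keyword">raise</tt>'
          . ' statement in section <A href="node77.html#raise">6.9</A>.' . "\n";
print rewrite_hrefs($line, \%nodes);
```

On the line from the bug report both links come out rewritten, to
try.html#try and raise.html#raise. With chop() instead of chomp(), only
the second one would be.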



Cheers,
Trent


-- 
Trent Mick
TrentM@ActiveState.com