[pypy-svn] r63443 - in pypy/extradoc/talk/icooolps2009: . code

cfbolz at codespeak.net
Tue Mar 31 11:26:34 CEST 2009


Author: cfbolz
Date: Tue Mar 31 11:26:33 2009
New Revision: 63443

Added:
   pypy/extradoc/talk/icooolps2009/
   pypy/extradoc/talk/icooolps2009/Makefile
   pypy/extradoc/talk/icooolps2009/acm_proc_article-sp.cls
   pypy/extradoc/talk/icooolps2009/code/
   pypy/extradoc/talk/icooolps2009/code/full.txt   (contents, props changed)
   pypy/extradoc/talk/icooolps2009/code/no-green-folding.txt   (contents, props changed)
   pypy/extradoc/talk/icooolps2009/code/normal-tracing.txt   (contents, props changed)
   pypy/extradoc/talk/icooolps2009/code/tlr-paper-full.py   (contents, props changed)
   pypy/extradoc/talk/icooolps2009/code/tlr-paper.py   (contents, props changed)
   pypy/extradoc/talk/icooolps2009/paper.bib
   pypy/extradoc/talk/icooolps2009/paper.tex
Log:
A draft for a paper about the tracing JIT that I will maybe submit to ICOOOLPS.


Added: pypy/extradoc/talk/icooolps2009/Makefile
==============================================================================
--- (empty file)
+++ pypy/extradoc/talk/icooolps2009/Makefile	Tue Mar 31 11:26:33 2009
@@ -0,0 +1,10 @@
+
+pypy-jit.pdf: paper.tex paper.bib
+	pdflatex paper
+	bibtex paper
+	pdflatex paper
+	pdflatex paper
+	mv paper.pdf pypy-jit.pdf
+
+view: pypy-jit.pdf
+	evince pypy-jit.pdf &
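
For orientation, here is a minimal sketch of a paper.tex that the Makefile above could build with the class added in this commit. It is an assumption for illustration only, not the paper.tex actually committed here: the title, author, affiliation, address, category and keywords are placeholders. The commands themselves are ones the class file defines (\numberofauthors, \alignauthor, \affaddr, \email, \category, \terms, \keywords) or that come from the LaTeX kernel.

```latex
% Hypothetical skeleton; all text content below is placeholder material.
\documentclass{acm_proc_article-sp}

\begin{document}

\numberofauthors{1}          % the class lays out up to three author boxes
\title{Placeholder Title}
\author{
  \alignauthor Placeholder Author\\
    \affaddr{Placeholder Institution}\\
    \email{author@example.org}
}
\maketitle                   % must come after \title and \author

\begin{abstract}
Placeholder abstract text.
\end{abstract}

\category{D.3.4}{Programming Languages}{Processors}
\terms{Languages, Performance}
\keywords{placeholder, keywords}

\section{Introduction}
Placeholder body text.

% Only needed in this stripped-down sketch so that bibtex has
% entries to process; a real paper would use \cite instead.
\nocite{*}
\bibliographystyle{abbrv}
\bibliography{paper}         % reads paper.bib, as the Makefile expects

\end{document}
```

With such a file in place, running make would go through the pdflatex, bibtex, pdflatex, pdflatex sequence listed in the Makefile and rename the result to pypy-jit.pdf.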

Added: pypy/extradoc/talk/icooolps2009/acm_proc_article-sp.cls
==============================================================================
--- (empty file)
+++ pypy/extradoc/talk/icooolps2009/acm_proc_article-sp.cls	Tue Mar 31 11:26:33 2009
@@ -0,0 +1,1391 @@
+% ACM_PROC_ARTICLE-SP.CLS - VERSION 2.7SP
+% COMPATIBLE WITH THE "ACM_PROC_ARTICLE.CLS" V2.5
+% Gerald Murray October 15th., 2004
+%
+% ---- Start of 'updates'  ----
+%
+% Allowance made to switch default fonts between those systems using
+% METAFONT and those using 'Type 1' or 'Truetype' fonts.
+% See LINE NUMBER 266 for details.
+% Also provided for enumerated/annotated Corollaries 'surrounded' by
+% enumerated Theorems (line 838).
+% Gerry November 11th. 1999
+%
+% Made the Permission Statement / Conference Info / Copyright Info
+% 'user definable' in the source .tex file OR automatic if
+% not specified.
+% This 'sp' version does NOT produce the permission block.
+%
+% Major change in January 2000 was to include a "blank line" in between
+% new paragraphs. This involved major changes to the, then, acmproc-sp.cls  1.0SP
+% file, precipitating a 'new' name: "acm_proc_article-sp.cls" V2.01SP.
+%
+% Georgia fixed bug in sub-sub-section numbering in paragraphs (July 29th. 2002)
+% JS/GM fix to vertical spacing before Proofs (July 30th. 2002)
+%
+% Footnotes inside table cells using \minipage (Oct. 2002)
+%
+% ---- End of 'updates' ----
+%
+\def\fileversion{V2.7SP}          % for ACM's tracking purposes
+\def\filedate{October 15, 2004}    % Gerry Murray's tracking data
+\def\docdate {Friday 15th. October 2004} % Gerry Murray (with deltas to doc}
+\usepackage{epsfig}
+\usepackage{amssymb}
+\usepackage{amsmath}
+\usepackage{amsfonts}
+%
+% ACM_PROC_ARTICLE-SP  DOCUMENT STYLE
+% G.K.M. Tobin August-October 1999
+%    adapted from ARTICLE document style by Ken Traub, Olin Shivers
+%    also using elements of esub2acm.cls
+% LATEST REVISION V2.7SP - OCTOBER 2004
+% ARTICLE DOCUMENT STYLE -- Released 16 March 1988
+%    for LaTeX version 2.09
+% Copyright (C) 1988 by Leslie Lamport
+%
+%
+%%% ACM_PROC_ARTICLE-SP is a document style for producing two-column camera-ready pages for
+%%% ACM conferences, according to ACM specifications.  The main features of
+%%% this style are:
+%%%
+%%% 1)  Two columns.
+%%% 2)  Side and top margins of 4.5pc, bottom margin of 6pc, column gutter of
+%%%     2pc, hence columns are 20pc wide and 55.5pc tall.  (6pc = 1in, approx)
+%%% 3)  First page has title information, and an extra 6pc of space at the
+%%%     bottom of the first column for the ACM copyright notice.
+%%% 4)  Text is 9pt on 10pt baselines; titles (except main) are 9pt bold.
+%%%
+%%%
+%%% There are a few restrictions you must observe:
+%%%
+%%% 1)  You cannot change the font size; ACM wants you to use 9pt.
+%%% 3)  You must start your paper with the \maketitle command.  Prior to the
+%%%     \maketitle you must have \title and \author commands.  If you have a
+%%%     \date command it will be ignored; no date appears on the paper, since
+%%%     the proceedings will have a date on the front cover.
+%%% 4)  Marginal paragraphs, tables of contents, lists of figures and tables,
+%%%     and page headings are all forbidden.
+%%% 5)  The `figure' environment will produce a figure one column wide; if you
+%%%     want one that is two columns wide, use `figure*'.
+%%%
+%
+%%% Copyright Space:
+%%% This style automatically leaves 1" blank space at the bottom of page 1/
+%%% column 1.  This space can optionally be filled with some text using the
+%%% \toappear{...} command.  If used, this command must be BEFORE the \maketitle
+%%% command.  If this command is defined AND [preprint] is on, then the
+%%% space is filled with the {...} text (at the bottom); otherwise, it is
+%%% blank.  If you use \toappearbox{...} instead of \toappear{...} then a
+%%% box will be drawn around the text (if [preprint] is on).
+%%%
+%%% A typical usage looks like this:
+%%%     \toappear{To appear in the Ninth AES Conference on Medievil Lithuanian
+%%%               Embalming Technique, June 1991, Alfaretta, Georgia.}
+%%% This will be included in the preprint, and left out of the conference
+%%% version.
+%%%
+%%% WARNING:
+%%% Some dvi-ps converters heuristically allow chars to drift from their
+%%% true positions a few pixels. This may be noticeable with the 9pt sans-serif
+%%% bold font used for section headers.
+%%% You may turn this hackery off via the -e option:
+%%%     dvips -e 0 foo.dvi >foo.ps
+%%%
+\typeout{Document Class 'acm_proc_article-sp' <15th. October '04>.  Modified by G.K.M. Tobin}
+\typeout{Based in part upon document Style `acmconf' <22 May 89>. Hacked 4/91 by}
+\typeout{shivers at cs.cmu.edu, 4/93 by theobald at cs.mcgill.ca}
+\typeout{Excerpts were taken from (Journal Style) 'esub2acm.cls'.}
+\typeout{****** Bugs/comments/suggestions  to Gerry Murray -- murray at hq.acm.org ******}
+
+\oddsidemargin 4.5pc
+\evensidemargin 4.5pc
+\advance\oddsidemargin by -1in  % Correct for LaTeX gratuitousness
+\advance\evensidemargin by -1in % Correct for LaTeX gratuitousness
+\marginparwidth 0pt             % Margin pars are not allowed.
+\marginparsep 11pt              % Horizontal space between outer margin and
+                                % marginal note
+
+                                % Top of page:
+\topmargin 4.5pc                % Nominal distance from top of page to top of
+                                % box containing running head.
+\advance\topmargin by -1in      % Correct for LaTeX gratuitousness
+\headheight 0pt                 % Height of box containing running head.
+\headsep 0pt                    % Space between running head and text.
+                                % Bottom of page:
+\footskip 30pt                  % Distance from baseline of box containing foot
+                                % to baseline of last line of text.
+\@ifundefined{footheight}{\newdimen\footheight}{}% this is for LaTeX2e
+\footheight 12pt                % Height of box containing running foot.
+
+
+%% Must redefine the top margin so there's room for headers and
+%% page numbers if you are using the preprint option. Footers
+%% are OK as is. Olin.
+\advance\topmargin by -37pt     % Leave 37pt above text for headers
+\headheight 12pt                % Height of box containing running head.
+\headsep 25pt                   % Space between running head and text.
+
+\textheight 666pt       % 9 1/4 column height
+\textwidth 42pc         % Width of text line.
+                        % For two-column mode:
+\columnsep 2pc          %    Space between columns
+\columnseprule 0pt      %    Width of rule between columns.
+\hfuzz 1pt              % Allow some variation in column width, otherwise it's
+                        % too hard to typeset in narrow columns.
+
+\footnotesep 5.6pt      % Height of strut placed at the beginning of every
+                        % footnote = height of normal \footnotesize strut,
+                        % so no extra space between footnotes.
+
+\skip\footins 8.1pt plus 4pt minus 2pt  % Space between last line of text and
+                                        % top of first footnote.
+\floatsep 11pt plus 2pt minus 2pt       % Space between adjacent floats moved
+                                        % to top or bottom of text page.
+\textfloatsep 18pt plus 2pt minus 4pt   % Space between main text and floats
+                                        % at top or bottom of page.
+\intextsep 11pt plus 2pt minus 2pt      % Space between in-text figures and
+                                        % text.
+\@ifundefined{@maxsep}{\newdimen\@maxsep}{}% this is for LaTeX2e
+\@maxsep 18pt                           % The maximum of \floatsep,
+                                        % \textfloatsep and \intextsep (minus
+                                        % the stretch and shrink).
+\dblfloatsep 11pt plus 2pt minus 2pt    % Same as \floatsep for double-column
+                                        % figures in two-column mode.
+\dbltextfloatsep 18pt plus 2pt minus 4pt% \textfloatsep for double-column
+                                        % floats.
+\@ifundefined{@dblmaxsep}{\newdimen\@dblmaxsep}{}% this is for LaTeX2e
+\@dblmaxsep 18pt                        % The maximum of \dblfloatsep and
+                                        % \dbltextfloatsep.
+\@fptop 0pt plus 1fil    % Stretch at top of float page/column. (Must be
+                         % 0pt plus ...)
+\@fpsep 8pt plus 2fil    % Space between floats on float page/column.
+\@fpbot 0pt plus 1fil    % Stretch at bottom of float page/column. (Must be
+                         % 0pt plus ... )
+\@dblfptop 0pt plus 1fil % Stretch at top of float page. (Must be 0pt plus ...)
+\@dblfpsep 8pt plus 2fil % Space between floats on float page.
+\@dblfpbot 0pt plus 1fil % Stretch at bottom of float page. (Must be
+                         % 0pt plus ... )
+\marginparpush 5pt       % Minimum vertical separation between two marginal
+                         % notes.
+
+\parskip 0pt                % Extra vertical space between paragraphs.
+                    % Set to 0pt outside sections, to keep section heads
+                    % uniformly spaced.  The value of parskip is set
+                    % to leading value _within_ sections.
+                    % 12 Jan 2000 gkmt
+\parindent 0pt                % Width of paragraph indentation.
+\partopsep 2pt plus 1pt minus 1pt% Extra vertical space, in addition to
+                                 % \parskip and \topsep, added when user
+                                 % leaves blank line before environment.
+
+\@lowpenalty   51       % Produced by \nopagebreak[1] or \nolinebreak[1]
+\@medpenalty  151       % Produced by \nopagebreak[2] or \nolinebreak[2]
+\@highpenalty 301       % Produced by \nopagebreak[3] or \nolinebreak[3]
+
+\@beginparpenalty -\@lowpenalty % Before a list or paragraph environment.
+\@endparpenalty   -\@lowpenalty % After a list or paragraph environment.
+\@itempenalty     -\@lowpenalty % Between list items.
+
+\@namedef{ds@10pt}{\@latexerr{The `10pt' option is not allowed in the `acmconf'
+  document style.}\@eha}
+\@namedef{ds@11pt}{\@latexerr{The `11pt' option is not allowed in the `acmconf'
+  document style.}\@eha}
+\@namedef{ds@12pt}{\@latexerr{The `12pt' option is not allowed in the `acmconf'
+  document style.}\@eha}
+
+\@options
+
+\lineskip 2pt           % \lineskip is 1pt for all font sizes.
+\normallineskip 2pt
+\def\baselinestretch{1}
+
+\abovedisplayskip 9pt plus2pt minus4.5pt%
+\belowdisplayskip \abovedisplayskip
+\abovedisplayshortskip  \z@ plus3pt%
+\belowdisplayshortskip  5.4pt plus3pt minus3pt%
+\let\@listi\@listI     % Setting of \@listi added 9 Jun 87
+
+\def\small{\@setsize\small{9pt}\viiipt\@viiipt
+\abovedisplayskip 7.6pt plus 3pt minus 4pt%
+\belowdisplayskip \abovedisplayskip
+\abovedisplayshortskip \z@ plus2pt%
+\belowdisplayshortskip 3.6pt plus2pt minus 2pt
+\def\@listi{\leftmargin\leftmargini %% Added 22 Dec 87
+\topsep 4pt plus 2pt minus 2pt\parsep 2pt plus 1pt minus 1pt
+\itemsep \parsep}}
+
+\def\footnotesize{\@setsize\footnotesize{9pt}\ixpt\@ixpt
+\abovedisplayskip 6.4pt plus 2pt minus 4pt%
+\belowdisplayskip \abovedisplayskip
+\abovedisplayshortskip \z@ plus 1pt%
+\belowdisplayshortskip 2.7pt plus 1pt minus 2pt
+\def\@listi{\leftmargin\leftmargini %% Added 22 Dec 87
+\topsep 3pt plus 1pt minus 1pt\parsep 2pt plus 1pt minus 1pt
+\itemsep \parsep}}
+
+\newcount\aucount
+\newcount\originalaucount
+\newdimen\auwidth
+\auwidth=\textwidth
+\newdimen\auskip
+\newcount\auskipcount
+\newdimen\auskip
+\global\auskip=1pc
+\newdimen\allauboxes
+\allauboxes=\auwidth
+\newtoks\addauthors
+\newcount\addauflag
+\global\addauflag=0 %Haven't shown additional authors yet
+
+\newtoks\subtitletext
+\gdef\subtitle#1{\subtitletext={#1}}
+
+\gdef\additionalauthors#1{\addauthors={#1}}
+
+\gdef\numberofauthors#1{\global\aucount=#1
+\ifnum\aucount>3\global\originalaucount=\aucount \global\aucount=3\fi %g}
+\global\auskipcount=\aucount\global\advance\auskipcount by 1
+\global\multiply\auskipcount by 2
+\global\multiply\auskip by \auskipcount
+\global\advance\auwidth by -\auskip
+\global\divide\auwidth by \aucount}
+
+% \and was modified to count the number of authors.  GKMT 12 Aug 1999
+\def\alignauthor{%                  % \begin{tabular}
+\end{tabular}%
+  \begin{tabular}[t]{p{\auwidth}}\centering}%
+
+%  *** NOTE *** NOTE *** NOTE *** NOTE ***
+%  If you have 'font problems' then you may need
+%  to change these, e.g. 'arialb' instead of "arialbd".
+%  Gerry Murray 11/11/1999
+%  *** OR ** comment out block A and activate block B or vice versa.
+% **********************************************
+%
+%  -- Start of block A -- (Type 1 or Truetype fonts)
+%\newfont{\secfnt}{timesbd at 12pt} % was timenrb originally - now is timesbd
+%\newfont{\secit}{timesbi at 12pt}   %13 Jan 00 gkmt
+%\newfont{\subsecfnt}{timesi at 11pt} % was timenrri originally - now is timesi
+%\newfont{\subsecit}{timesbi at 11pt} % 13 Jan 00 gkmt -- was times changed to timesbi gm 2/4/2000
+%                         % because "normal" is italic, "italic" is Roman
+%\newfont{\ttlfnt}{arialbd at 18pt} % was arialb originally - now is arialbd
+%\newfont{\ttlit}{arialbi at 18pt}    % 13 Jan 00 gkmt
+%\newfont{\subttlfnt}{arial at 14pt} % was arialr originally - now is arial
+%\newfont{\subttlit}{ariali at 14pt} % 13 Jan 00 gkmt
+%\newfont{\subttlbf}{arialbd at 14pt}  % 13 Jan 00 gkmt
+%\newfont{\aufnt}{arial at 12pt} % was arialr originally - now is arial
+%\newfont{\auit}{ariali at 12pt} % 13 Jan 00 gkmt
+%\newfont{\affaddr}{arial at 10pt} % was arialr originally - now is arial
+%\newfont{\affaddrit}{ariali at 10pt} %13 Jan 00 gkmt
+%\newfont{\eaddfnt}{arial at 12pt} % was arialr originally - now is arial
+%\newfont{\ixpt}{times at 9pt} % was timenrr originally - now is times
+%\newfont{\confname}{timesi at 8pt} % was timenrri - now is timesi
+%\newfont{\crnotice}{times at 8pt} % was timenrr originally - now is times
+%\newfont{\ninept}{times at 9pt} % was timenrr originally - now is times
+
+% *********************************************
+%  -- End of block A --
+%
+%
+% -- Start of block B -- METAFONT
+% +++++++++++++++++++++++++++++++++++++++++++++
+% Next (default) block for those using Metafont
+% Gerry Murray 11/11/1999
+% *** THIS BLOCK FOR THOSE USING METAFONT *****
+% *********************************************
+\newfont{\secfnt}{ptmb at 12pt}
+\newfont{\secit}{ptmbi at 12pt}    %13 Jan 00 gkmt
+\newfont{\subsecfnt}{ptmri at 11pt}
+\newfont{\subsecit}{ptmbi at 11pt}  % 13 Jan 00 gkmt -- was ptmr changed to ptmbi gm 2/4/2000
+                         % because "normal" is italic, "italic" is Roman
+\newfont{\ttlfnt}{phvb at 18pt}
+\newfont{\ttlit}{phvbo at 18pt}    % GM 2/4/2000
+\newfont{\subttlfnt}{phvr at 14pt}
+\newfont{\subttlit}{phvro at 14pt} % GM 2/4/2000
+\newfont{\subttlbf}{phvb at 14pt}  % 13 Jan 00 gkmt
+\newfont{\aufnt}{phvr at 12pt}
+\newfont{\auit}{phvro at 12pt}     % GM 2/4/2000
+\newfont{\affaddr}{phvr at 10pt}
+\newfont{\affaddrit}{phvro at 10pt} % GM 2/4/2000
+\newfont{\eaddfnt}{phvr at 12pt}
+\newfont{\ixpt}{ptmr at 9pt}
+\newfont{\confname}{ptmri at 8pt}
+\newfont{\crnotice}{ptmr at 8pt}
+\newfont{\ninept}{ptmr at 9pt}
+% +++++++++++++++++++++++++++++++++++++++++++++
+% -- End of block B --
+
+\def\email#1{{{\eaddfnt{\vskip 4pt#1}}}}
+
+\def\addauthorsection{\ifnum\originalaucount>3
+    \section{Additional Authors}\the\addauthors
+  \fi}
+
+\newcount\savesection
+\newcount\sectioncntr
+\global\sectioncntr=1
+
+\setcounter{secnumdepth}{3}
+
+\def\appendix{\par
+\section*{APPENDIX}
+\setcounter{section}{0}
+ \setcounter{subsection}{0}
+ \def\thesection{\Alph{section}} }
+
+
+\leftmargini 22.5pt
+\leftmarginii 19.8pt    % > \labelsep + width of '(m)'
+\leftmarginiii 16.8pt   % > \labelsep + width of 'vii.'
+\leftmarginiv 15.3pt    % > \labelsep + width of 'M.'
+\leftmarginv 9pt
+\leftmarginvi 9pt
+
+\leftmargin\leftmargini
+\labelsep 4.5pt
+\labelwidth\leftmargini\advance\labelwidth-\labelsep
+
+\def\@listI{\leftmargin\leftmargini \parsep 3.6pt plus 2pt minus 1pt%
+\topsep 7.2pt plus 2pt minus 4pt%
+\itemsep 3.6pt plus 2pt minus 1pt}
+
+\let\@listi\@listI
+\@listi
+
+\def\@listii{\leftmargin\leftmarginii
+   \labelwidth\leftmarginii\advance\labelwidth-\labelsep
+   \topsep 3.6pt plus 2pt minus 1pt
+   \parsep 1.8pt plus 0.9pt minus 0.9pt
+   \itemsep \parsep}
+
+\def\@listiii{\leftmargin\leftmarginiii
+    \labelwidth\leftmarginiii\advance\labelwidth-\labelsep
+    \topsep 1.8pt plus 0.9pt minus 0.9pt
+    \parsep \z@ \partopsep 1pt plus 0pt minus 1pt
+    \itemsep \topsep}
+
+\def\@listiv{\leftmargin\leftmarginiv
+     \labelwidth\leftmarginiv\advance\labelwidth-\labelsep}
+
+\def\@listv{\leftmargin\leftmarginv
+     \labelwidth\leftmarginv\advance\labelwidth-\labelsep}
+
+\def\@listvi{\leftmargin\leftmarginvi
+     \labelwidth\leftmarginvi\advance\labelwidth-\labelsep}
+
+\def\labelenumi{\theenumi.}
+\def\theenumi{\arabic{enumi}}
+
+\def\labelenumii{(\theenumii)}
+\def\theenumii{\alph{enumii}}
+\def\p@enumii{\theenumi}
+
+\def\labelenumiii{\theenumiii.}
+\def\theenumiii{\roman{enumiii}}
+\def\p@enumiii{\theenumi(\theenumii)}
+
+\def\labelenumiv{\theenumiv.}
+\def\theenumiv{\Alph{enumiv}}
+\def\p@enumiv{\p@enumiii\theenumiii}
+
+\def\labelitemi{$\bullet$}
+\def\labelitemii{\bf --}
+\def\labelitemiii{$\ast$}
+\def\labelitemiv{$\cdot$}
+
+\def\verse{\let\\=\@centercr
+  \list{}{\itemsep\z@ \itemindent -1.5em\listparindent \itemindent
+          \rightmargin\leftmargin\advance\leftmargin 1.5em}\item[]}
+\let\endverse\endlist
+
+\def\quotation{\list{}{\listparindent 1.5em
+    \itemindent\listparindent
+    \rightmargin\leftmargin \parsep 0pt plus 1pt}\item[]}
+\let\endquotation=\endlist
+
+\def\quote{\list{}{\rightmargin\leftmargin}\item[]}
+\let\endquote=\endlist
+
+\def\descriptionlabel#1{\hspace\labelsep \bf #1}
+\def\description{\list{}{\labelwidth\z@ \itemindent-\leftmargin
+       \let\makelabel\descriptionlabel}}
+
+\let\enddescription\endlist
+
+\def\theequation{\arabic{equation}}
+
+\arraycolsep 4.5pt   % Half the space between columns in an array environment.
+\tabcolsep 5.4pt     % Half the space between columns in a tabular environment.
+\arrayrulewidth .4pt % Width of rules in array and tabular environment.
+\doublerulesep 1.8pt % Space between adjacent rules in array or tabular env.
+
+\tabbingsep \labelsep   % Space used by the \' command.  (See LaTeX manual.)
+
+\skip\@mpfootins =\skip\footins
+
+\fboxsep =2.7pt      % Space left between box and text by \fbox and \framebox.
+\fboxrule =.4pt      % Width of rules in box made by \fbox and \framebox.
+
+\def\thepart{\Roman{part}} % Roman numeral part numbers.
+\def\thesection       {\arabic{section}}
+\def\thesubsection    {\thesection.\arabic{subsection}}
+%\def\thesubsubsection {\thesubsection.\arabic{subsubsection}} % GM 7/30/2002
+%\def\theparagraph     {\thesubsubsection.\arabic{paragraph}}  % GM 7/30/2002
+\def\thesubparagraph  {\theparagraph.\arabic{subparagraph}}
+
+\def\@pnumwidth{1.55em}
+\def\@tocrmarg {2.55em}
+\def\@dotsep{4.5}
+\setcounter{tocdepth}{3}
+
+\def\tableofcontents{\@latexerr{\tableofcontents: Tables of contents are not
+  allowed in the `acmconf' document style.}\@eha}
+
+\def\l@part#1#2{\addpenalty{\@secpenalty}
+   \addvspace{2.25em plus 1pt}  % space above part line
+   \begingroup
+   \@tempdima 3em       % width of box holding part number, used by
+     \parindent \z@ \rightskip \@pnumwidth      %% \numberline
+     \parfillskip -\@pnumwidth
+     {\large \bf        % set line in \large boldface
+     \leavevmode        % TeX command to enter horizontal mode.
+     #1\hfil \hbox to\@pnumwidth{\hss #2}}\par
+     \nobreak           % Never break after part entry
+   \endgroup}
+
+\def\l@section#1#2{\addpenalty{\@secpenalty} % good place for page break
+   \addvspace{1.0em plus 1pt}   % space above toc entry
+   \@tempdima 1.5em             % width of box holding section number
+   \begingroup
+     \parindent \z@ \rightskip \@pnumwidth
+     \parfillskip -\@pnumwidth
+     \bf                        % Boldface.
+     \leavevmode                % TeX command to enter horizontal mode.
+      \advance\leftskip\@tempdima %% added 5 Feb 88 to conform to
+      \hskip -\leftskip           %% 25 Jan 88 change to \numberline
+     #1\nobreak\hfil \nobreak\hbox to\@pnumwidth{\hss #2}\par
+   \endgroup}
+
+
+\def\l@subsection{\@dottedtocline{2}{1.5em}{2.3em}}
+\def\l@subsubsection{\@dottedtocline{3}{3.8em}{3.2em}}
+\def\l@paragraph{\@dottedtocline{4}{7.0em}{4.1em}}
+\def\l@subparagraph{\@dottedtocline{5}{10em}{5em}}
+
+\def\listoffigures{\@latexerr{\listoffigures: Lists of figures are not
+  allowed in the `acmconf' document style.}\@eha}
+
+\def\l@figure{\@dottedtocline{1}{1.5em}{2.3em}}
+
+\def\listoftables{\@latexerr{\listoftables: Lists of tables are not
+  allowed in the `acmconf' document style.}\@eha}
+\let\l@table\l@figure
+
+\def\footnoterule{\kern-3\p@
+  \hrule width .4\columnwidth
+  \kern 2.6\p@}                 % The \hrule has default height of .4pt .
+% ------
+\long\def\@makefntext#1{\noindent 
+%\hbox to .5em{\hss$^{\@thefnmark}$}#1}   % original
+\hbox to .5em{\hss\textsuperscript{\@thefnmark}}#1}  % C. Clifton / GM Oct. 2nd. 2002
+% -------
+
+\long\def\@maketntext#1{\noindent
+#1}
+
+\long\def\@maketitlenotetext#1#2{\noindent
+            \hbox to 1.8em{\hss$^{#1}$}#2}
+
+\setcounter{topnumber}{2}
+\def\topfraction{.7}
+\setcounter{bottomnumber}{1}
+\def\bottomfraction{.3}
+\setcounter{totalnumber}{3}
+\def\textfraction{.2}
+\def\floatpagefraction{.5}
+\setcounter{dbltopnumber}{2}
+\def\dbltopfraction{.7}
+\def\dblfloatpagefraction{.5}
+
+\long\def\@makecaption#1#2{
+   \vskip \baselineskip
+   \setbox\@tempboxa\hbox{\textbf{#1: #2}}
+   \ifdim \wd\@tempboxa >\hsize % IF longer than one line:
+       \textbf{#1: #2}\par               %   THEN set as ordinary paragraph.
+     \else                      %   ELSE  center.
+       \hbox to\hsize{\hfil\box\@tempboxa\hfil}\par
+   \fi}
+
+\@ifundefined{figure}{\newcounter {figure}} % this is for LaTeX2e
+
+\def\fps@figure{tbp}
+\def\ftype@figure{1}
+\def\ext@figure{lof}
+\def\fnum@figure{Figure \thefigure}
+\def\figure{\@float{figure}}
+\let\endfigure\end@float
+\@namedef{figure*}{\@dblfloat{figure}}
+\@namedef{endfigure*}{\end@dblfloat}
+
+\@ifundefined{table}{\newcounter {table}} % this is for LaTeX2e
+
+\def\fps@table{tbp}
+\def\ftype@table{2}
+\def\ext@table{lot}
+\def\fnum@table{Table \thetable}
+\def\table{\@float{table}}
+\let\endtable\end@float
+\@namedef{table*}{\@dblfloat{table}}
+\@namedef{endtable*}{\end@dblfloat}
+
+\newtoks\titleboxnotes
+\newcount\titleboxnoteflag
+
+\def\maketitle{\par
+ \begingroup
+   \def\thefootnote{\fnsymbol{footnote}}
+   \def\@makefnmark{\hbox
+       to 0pt{$^{\@thefnmark}$\hss}}
+     \twocolumn[\@maketitle]
+\@thanks
+ \endgroup
+ \setcounter{footnote}{0}
+ \let\maketitle\relax
+ \let\@maketitle\relax
+ \gdef\@thanks{}\gdef\@author{}\gdef\@title{}\gdef\@subtitle{}\let\thanks\relax
+ \@copyrightspace}
+
+%% CHANGES ON NEXT LINES
+\newif\if@ll % to record which version of LaTeX is in use
+
+\expandafter\ifx\csname LaTeXe\endcsname\relax % LaTeX2.09 is used
+\else% LaTeX2e is used, so set ll to true
+\global\@lltrue
+\fi
+
+\if@ll
+  \NeedsTeXFormat{LaTeX2e}
+  \ProvidesClass{acm_proc_article-sp} [2004/15/10 - V2.7SP - based on esub2acm.sty <23 April 96>]
+  \RequirePackage{latexsym}% QUERY: are these two really needed?
+  \let\dooptions\ProcessOptions
+\else
+  \let\dooptions\@options
+\fi
+%% END CHANGES
+
+\def\@height{height}
+\def\@width{width}
+\def\@minus{minus}
+\def\@plus{plus}
+\def\hb@xt@{\hbox to}
+\newif\if@faircopy
+\@faircopyfalse
+\def\ds@faircopy{\@faircopytrue}
+
+\def\ds@preprint{\@faircopyfalse}
+
+\@twosidetrue
+\@mparswitchtrue
+\def\ds@draft{\overfullrule 5\p@}
+%% CHANGE ON NEXT LINE
+\dooptions
+
+\lineskip \p@
+\normallineskip \p@
+\def\baselinestretch{1}
+\def\@ptsize{0} %needed for amssymbols.sty
+
+%% CHANGES ON NEXT LINES
+\if@ll% allow use of old-style font change commands in LaTeX2e
+\@maxdepth\maxdepth
+%
+\DeclareOldFontCommand{\rm}{\ninept\rmfamily}{\mathrm}
+\DeclareOldFontCommand{\sf}{\normalfont\sffamily}{\mathsf}
+\DeclareOldFontCommand{\tt}{\normalfont\ttfamily}{\mathtt}
+\DeclareOldFontCommand{\bf}{\normalfont\bfseries}{\mathbf}
+\DeclareOldFontCommand{\it}{\normalfont\itshape}{\mathit}
+\DeclareOldFontCommand{\sl}{\normalfont\slshape}{\@nomath\sl}
+\DeclareOldFontCommand{\sc}{\normalfont\scshape}{\@nomath\sc}
+\DeclareRobustCommand*{\cal}{\@fontswitch{\relax}{\mathcal}}
+\DeclareRobustCommand*{\mit}{\@fontswitch{\relax}{\mathnormal}}
+\fi
+%
+\if@ll
+ \renewcommand{\rmdefault}{cmr}  % was 'ttm'
+% Note! I have also found 'mvr' to work ESPECIALLY well.
+% Gerry - October 1999
+% You may need to change your LV1times.fd file so that sc is
+% mapped to cmcsc - -for smallcaps -- that is if you decide
+% to change {cmr} to {times} above. (Not recommended)
+  \renewcommand{\@ptsize}{}
+  \renewcommand{\normalsize}{%
+    \@setfontsize\normalsize\@ixpt{10.5\p@}%\ninept%
+    \abovedisplayskip 6\p@ \@plus2\p@ \@minus\p@
+    \belowdisplayskip \abovedisplayskip
+    \abovedisplayshortskip 6\p@ \@minus 3\p@
+    \belowdisplayshortskip 6\p@ \@minus 3\p@
+    \let\@listi\@listI
+  }
+\else
+  \def\@normalsize{%changed next to 9 from 10
+    \@setsize\normalsize{9\p@}\ixpt\@ixpt
+   \abovedisplayskip 6\p@ \@plus2\p@ \@minus\p@
+    \belowdisplayskip \abovedisplayskip
+    \abovedisplayshortskip 6\p@ \@minus 3\p@
+    \belowdisplayshortskip 6\p@ \@minus 3\p@
+    \let\@listi\@listI
+  }%
+\fi
+\if@ll
+  \newcommand\scriptsize{\@setfontsize\scriptsize\@viipt{8\p@}}
+  \newcommand\tiny{\@setfontsize\tiny\@vpt{6\p@}}
+  \newcommand\large{\@setfontsize\large\@xiipt{14\p@}}
+  \newcommand\Large{\@setfontsize\Large\@xivpt{18\p@}}
+  \newcommand\LARGE{\@setfontsize\LARGE\@xviipt{20\p@}}
+  \newcommand\huge{\@setfontsize\huge\@xxpt{25\p@}}
+  \newcommand\Huge{\@setfontsize\Huge\@xxvpt{30\p@}}
+\else
+  \def\scriptsize{\@setsize\scriptsize{8\p@}\viipt\@viipt}
+  \def\tiny{\@setsize\tiny{6\p@}\vpt\@vpt}
+  \def\large{\@setsize\large{14\p@}\xiipt\@xiipt}
+  \def\Large{\@setsize\Large{18\p@}\xivpt\@xivpt}
+  \def\LARGE{\@setsize\LARGE{20\p@}\xviipt\@xviipt}
+  \def\huge{\@setsize\huge{25\p@}\xxpt\@xxpt}
+  \def\Huge{\@setsize\Huge{30\p@}\xxvpt\@xxvpt}
+\fi
+\normalsize
+
+% make aubox hsize/number of authors up to 3, less gutter
+% then showbox gutter showbox gutter showbox -- GKMT Aug 99
+\newbox\@acmtitlebox
+\def\@maketitle{\newpage
+ \null
+ \setbox\@acmtitlebox\vbox{%
+\baselineskip 20pt
+\vskip 2em                   % Vertical space above title.
+   \begin{center}
+    {\ttlfnt \@title\par}       % Title set in 18pt Helvetica (Arial) bold size.
+    \vskip 1.5em                % Vertical space after title.
+%This should be the subtitle.
+{\subttlfnt \the\subtitletext\par}\vskip 1.25em%\fi
+    {\baselineskip 16pt\aufnt   % each author set in \12 pt Arial, in a
+     \lineskip .5em             % tabular environment
+     \begin{tabular}[t]{c}\@author
+     \end{tabular}\par}
+    \vskip 1.5em               % Vertical space after author.
+   \end{center}}
+ \dimen0=\ht\@acmtitlebox
+ \advance\dimen0 by -12.75pc\relax % Increased space for title box -- KBT
+ \unvbox\@acmtitlebox
+ \ifdim\dimen0<0.0pt\relax\vskip-\dimen0\fi}
+
+
+\newcount\titlenotecount
+\global\titlenotecount=0
+\newtoks\tntoks
+\newtoks\tntokstwo
+\newtoks\tntoksthree
+\newtoks\tntoksfour
+\newtoks\tntoksfive
+
+\def\abstract{
+\ifnum\titlenotecount>0 % was =1
+    \insert\footins{%
+    \reset@font\footnotesize
+        \interlinepenalty\interfootnotelinepenalty
+        \splittopskip\footnotesep
+        \splitmaxdepth \dp\strutbox \floatingpenalty \@MM
+        \hsize\columnwidth \@parboxrestore
+        \protected@edef\@currentlabel{%
+        }%
+        \color@begingroup
+\ifnum\titlenotecount=1
+      \@maketntext{%
+         \raisebox{4pt}{$\ast$}\rule\z@\footnotesep\ignorespaces\the\tntoks\@finalstrut\strutbox}%
+\fi
+\ifnum\titlenotecount=2
+      \@maketntext{%
+      \raisebox{4pt}{$\ast$}\rule\z@\footnotesep\ignorespaces\the\tntoks\par\@finalstrut\strutbox}%
+\@maketntext{%
+         \raisebox{4pt}{$\dagger$}\rule\z@\footnotesep\ignorespaces\the\tntokstwo\@finalstrut\strutbox}%
+\fi
+\ifnum\titlenotecount=3
+      \@maketntext{%
+         \raisebox{4pt}{$\ast$}\rule\z@\footnotesep\ignorespaces\the\tntoks\par\@finalstrut\strutbox}%
+\@maketntext{%
+         \raisebox{4pt}{$\dagger$}\rule\z@\footnotesep\ignorespaces\the\tntokstwo\par\@finalstrut\strutbox}%
+\@maketntext{%
+         \raisebox{4pt}{$\ddagger$}\rule\z@\footnotesep\ignorespaces\the\tntoksthree\@finalstrut\strutbox}%
+\fi
+\ifnum\titlenotecount=4
+      \@maketntext{%
+         \raisebox{4pt}{$\ast$}\rule\z@\footnotesep\ignorespaces\the\tntoks\par\@finalstrut\strutbox}%
+\@maketntext{%
+         \raisebox{4pt}{$\dagger$}\rule\z@\footnotesep\ignorespaces\the\tntokstwo\par\@finalstrut\strutbox}%
+\@maketntext{%
+         \raisebox{4pt}{$\ddagger$}\rule\z@\footnotesep\ignorespaces\the\tntoksthree\par\@finalstrut\strutbox}%
+\@maketntext{%
+         \raisebox{4pt}{$\S$}\rule\z@\footnotesep\ignorespaces\the\tntoksfour\@finalstrut\strutbox}%
+\fi
+\ifnum\titlenotecount=5
+      \@maketntext{%
+         \raisebox{4pt}{$\ast$}\rule\z@\footnotesep\ignorespaces\the\tntoks\par\@finalstrut\strutbox}%
+\@maketntext{%
+         \raisebox{4pt}{$\dagger$}\rule\z@\footnotesep\ignorespaces\the\tntokstwo\par\@finalstrut\strutbox}%
+\@maketntext{%
+         \raisebox{4pt}{$\ddagger$}\rule\z@\footnotesep\ignorespaces\the\tntoksthree\par\@finalstrut\strutbox}%
+\@maketntext{%
+         \raisebox{4pt}{$\S$}\rule\z@\footnotesep\ignorespaces\the\tntoksfour\par\@finalstrut\strutbox}%
+\@maketntext{%
+         \raisebox{4pt}{$\P$}\rule\z@\footnotesep\ignorespaces\the\tntoksfive\@finalstrut\strutbox}%
+\fi
+   \color@endgroup} %g}
+\fi
+\setcounter{footnote}{0}
+\section*{ABSTRACT}\normalsize %\the\parskip \the\baselineskip%\ninept
+}
+
+\def\endabstract{\if@twocolumn\else\endquotation\fi}
+
+\def\keywords{\if@twocolumn
+\section*{Keywords}
+\else \small
+\quotation
+\fi}
+
+% I've pulled the check for 2 cols, since proceedings are _always_
+% two-column  11 Jan 2000 gkmt
+\def\terms{%\if@twocolumn
+\section*{General Terms}
+%\else \small
+%\quotation\the\parskip
+%\fi}
+}
+
+% -- Classification needs to be a bit smart due to optionals - Gerry/Georgia November 2nd. 1999
+\newcount\catcount
+\global\catcount=1
+
+\def\category#1#2#3{%
+\ifnum\catcount=1
+\section*{Categories and Subject Descriptors}
+\advance\catcount by 1\else{\unskip; }\fi
+    \@ifnextchar [{\@category{#1}{#2}{#3}}{\@category{#1}{#2}{#3}[]}%
+}
+
+\def\@category#1#2#3[#4]{%
+    \begingroup
+        \let\and\relax
+            #1 [\textbf{#2}]%
+            \if!#4!%
+                \if!#3!\else : #3\fi
+            \else
+                :\space
+                \if!#3!\else #3\kern\z@---\hskip\z@\fi
+                \textit{#4}%
+            \fi
+    \endgroup
+}
+%
+
+%%% This section (written by KBT) handles the 1" box in the lower left
+%%% corner of the left column of the first page by creating a picture,
+%%% and inserting the predefined string at the bottom (with a negative
+%%% displacement to offset the space allocated for a non-existent
+%%% caption).
+%%%
+\newtoks\copyrightnotice
+\def\ftype@copyrightbox{8}
+\def\@copyrightspace{
+\@float{copyrightbox}[b]
+\begin{center}
+\setlength{\unitlength}{1pc}
+\begin{picture}(20,6) %Space for copyright notice
+\put(0,-0.95){\crnotice{\@toappear}}
+\end{picture}
+\end{center}
+\end@float}
+
+\def\@toappear{} % Default setting blank - commands below change this.
+\long\def\toappear#1{\def\@toappear{\parbox[b]{20pc}{\baselineskip 9pt#1}}}
+\def\toappearbox#1{\def\@toappear{\raisebox{5pt}{\framebox[20pc]{\parbox[b]{19pc}{#1}}}}}
+
+\newtoks\conf
+\newtoks\confinfo
+\def\conferenceinfo#1#2{\global\conf={#1}\global\confinfo{#2}}
+
+
+\def\marginpar{\@latexerr{The \marginpar command is not allowed in the
+  `acmconf' document style.}\@eha}
+
+\mark{{}{}}     % Initializes TeX's marks
+
+\def\today{\ifcase\month\or
+  January\or February\or March\or April\or May\or June\or
+  July\or August\or September\or October\or November\or December\fi
+  \space\number\day, \number\year}
+
+\def\@begintheorem#1#2{%
+    \trivlist
+    \item[%
+        \hskip 10\p@
+        \hskip \labelsep
+        {{\sc #1}\hskip 5\p@\relax#2.}%
+    ]
+    \it
+}
+\def\@opargbegintheorem#1#2#3{%
+    \trivlist
+    \item[%
+        \hskip 10\p@
+        \hskip \labelsep
+        {\sc #1\ #2\             % This mod by Gerry to enumerate corollaries
+   \setbox\@tempboxa\hbox{(#3)}  % and bracket the 'corollary title'
+        \ifdim \wd\@tempboxa>\z@ % and retain the correct numbering of e.g. theorems
+            \hskip 5\p@\relax    % if they occur 'around' said corollaries.
+            \box\@tempboxa       % Gerry - Nov. 1999.
+        \fi.}%
+    ]
+    \it
+}
+\newif\if@qeded
+\global\@qededfalse
+
+% -- original
+%\def\proof{%
+%  \vspace{-\parskip} % GM July 2000 (for tighter spacing)
+%    \global\@qededfalse
+%    \@ifnextchar[{\@xproof}{\@proof}%
+%}
+% -- end of original
+
+% (JSS) Fix for vertical spacing bug - Gerry Murray July 30th. 2002
+\def\proof{%
+\vspace{-\lastskip}\vspace{-\parsep}\penalty-51%
+\global\@qededfalse
+\@ifnextchar[{\@xproof}{\@proof}%
+}
+
+\def\endproof{%
+    \if@qeded\else\qed\fi
+    \endtrivlist
+}
+\def\@proof{%
+    \trivlist
+    \item[%
+        \hskip 10\p@
+        \hskip \labelsep
+        {\sc Proof.}%
+    ]
+    \ignorespaces
+}
+\def\@xproof[#1]{%
+    \trivlist
+    \item[\hskip 10\p@\hskip \labelsep{\sc Proof #1.}]%
+    \ignorespaces
+}
+\def\qed{%
+    \unskip
+    \kern 10\p@
+    \begingroup
+        \unitlength\p@
+        \linethickness{.4\p@}%
+        \framebox(6,6){}%
+    \endgroup
+    \global\@qededtrue
+}
+
+\def\newdef#1#2{%
+    \expandafter\@ifdefinable\csname #1\endcsname
+        {\@definecounter{#1}%
+         \expandafter\xdef\csname the#1\endcsname{\@thmcounter{#1}}%
+         \global\@namedef{#1}{\@defthm{#1}{#2}}%
+         \global\@namedef{end#1}{\@endtheorem}%
+    }%
+}
+\def\@defthm#1#2{%
+    \refstepcounter{#1}%
+    \@ifnextchar[{\@ydefthm{#1}{#2}}{\@xdefthm{#1}{#2}}%
+}
+\def\@xdefthm#1#2{%
+    \@begindef{#2}{\csname the#1\endcsname}%
+    \ignorespaces
+}
+\def\@ydefthm#1#2[#3]{%
+    \trivlist
+    \item[%
+        \hskip 10\p@
+        \hskip \labelsep
+        {\it #2%
+         \savebox\@tempboxa{#3}%
+         \ifdim \wd\@tempboxa>\z@
+            \ \box\@tempboxa
+         \fi.%
+        }]%
+    \ignorespaces
+}
+\def\@begindef#1#2{%
+    \trivlist
+    \item[%
+        \hskip 10\p@
+        \hskip \labelsep
+        {\it #1\ \rm #2.}%
+    ]%
+}
+\def\theequation{\arabic{equation}}
+
+\newcounter{part}
+\newcounter{section}
+\newcounter{subsection}[section]
+\newcounter{subsubsection}[subsection]
+\newcounter{paragraph}[subsubsection]
+\def\thepart{\Roman{part}}
+\def\thesection{\arabic{section}}
+\def\thesubsection{\thesection.\arabic{subsection}}
+\def\thesubsubsection{\thesubsection.\arabic{subsubsection}} %removed \subsecfnt 29 July 2002 gkmt
+\def\theparagraph{\thesubsubsection.\arabic{paragraph}} %removed \subsecfnt 29 July 2002 gkmt
+
+\newif\if@uchead
+\@ucheadfalse
+
+%% CHANGES: NEW NOTE
+%% NOTE: OK to use old-style font commands below, since they were
+%% suitably redefined for LaTeX2e
+%% END CHANGES
+\setcounter{secnumdepth}{3}
+\def\part{%
+    \@startsection{part}{9}{\z@}{-10\p@ \@plus -4\p@ \@minus -2\p@}
+        {4\p@}{\normalsize\@ucheadtrue}%
+}
+
+% Rationale for changes made in next four definitions:
+% "Before skip" is made elastic to provide some give in setting columns (vs.
+% parskip, which is non-elastic to keep section headers "anchored" to their
+% subsequent text).
+%
+% "After skip" is minimized -- BUT setting it to 0pt resulted in run-in heads, despite
+% the documentation asserting that only an after-skip < 0pt would have that result.
+%
+% Baselineskip added to style to ensure multi-line section titles, and section heads
+% followed by another section head rather than text, are decently spaced vertically.
+% 12 Jan 2000 gkmt
+\def\section{%
+    \@startsection{section}{1}{\z@}{-10\p@ \@plus -4\p@ \@minus -2\p@}%
+    {0.5pt}{\baselineskip=14pt\secfnt\@ucheadtrue}%
+}
+
+\def\subsection{%
+    \@startsection{subsection}{2}{\z@}{-10\p@ \@plus -4\p@ \@minus -2\p@}
+    {0.5pt}{\baselineskip=14pt\secfnt}%
+}
+\def\subsubsection{%
+    \@startsection{subsubsection}{3}{\z@}{-10\p@ \@plus -4\p@ \@minus -2\p@}%
+    {0.5pt}{\baselineskip=14pt\subsecfnt}%
+}
+
+\def\paragraph{%
+    \@startsection{paragraph}{3}{\z@}{-10\p@ \@plus -4\p@ \@minus -2\p@}%
+    {0.5pt}{\baselineskip=14pt\subsecfnt}%
+}
+
+\let\@period=.
+\def\@startsection#1#2#3#4#5#6{%
+        \if@noskipsec  %gkmt, 11 aug 99
+        \global\let\@period\@empty
+        \leavevmode
+        \global\let\@period.%
+    \fi
+    \par
+    \@tempskipa #4\relax
+    \@afterindenttrue
+    \ifdim \@tempskipa <\z@
+        \@tempskipa -\@tempskipa
+        \@afterindentfalse
+    \fi
+    %\if@nobreak  11 Jan 00 gkmt
+        %\everypar{}
+    %\else
+        \addpenalty\@secpenalty
+        \addvspace\@tempskipa
+    %\fi
+    \parskip=0pt
+    \@ifstar
+        {\@ssect{#3}{#4}{#5}{#6}}
+        {\@dblarg{\@sect{#1}{#2}{#3}{#4}{#5}{#6}}}%
+}
+
+
+\def\@ssect#1#2#3#4#5{%
+  \@tempskipa #3\relax
+  \ifdim \@tempskipa>\z@
+    \begingroup
+      #4{%
+        \@hangfrom{\hskip #1}%
+          \interlinepenalty \@M #5\@@par}%
+    \endgroup
+  \else
+    \def\@svsechd{#4{\hskip #1\relax #5}}%
+  \fi
+  \vskip -10.5pt  %gkmt, 7 jan 00 -- had been -14pt, now set to parskip
+  \@xsect{#3}\parskip=10.5pt} % within the starred section, parskip = leading 12 Jan 2000 gkmt
+
+
+\def\@sect#1#2#3#4#5#6[#7]#8{%
+    \ifnum #2>\c@secnumdepth
+        \let\@svsec\@empty
+    \else
+        \refstepcounter{#1}%
+        \edef\@svsec{%
+            \begingroup
+                %\ifnum#2>2 \noexpand\rm \fi % changed to next 29 July 2002 gkmt
+            \ifnum#2>2 \noexpand#6 \fi
+                \csname the#1\endcsname
+            \endgroup
+            \ifnum #2=1\relax .\fi
+            \hskip 1em
+        }%
+    \fi
+    \@tempskipa #5\relax
+    \ifdim \@tempskipa>\z@
+        \begingroup
+            #6\relax
+            \@hangfrom{\hskip #3\relax\@svsec}%
+            \begingroup
+                \interlinepenalty \@M
+                \if@uchead
+                    \uppercase{#8}%
+                \else
+                    #8%
+                \fi
+                \par
+            \endgroup
+        \endgroup
+        \csname #1mark\endcsname{#7}%
+        \vskip -10.5pt  % -14pt gkmt, 11 aug 99 -- changed to -\parskip 11 Jan 2000
+      \addcontentsline{toc}{#1}{%
+            \ifnum #2>\c@secnumdepth \else
+                \protect\numberline{\csname the#1\endcsname}%
+            \fi
+            #7%
+        }%
+    \else
+        \def\@svsechd{%
+            #6%
+            \hskip #3\relax
+            \@svsec
+            \if@uchead
+                \uppercase{#8}%
+            \else
+                #8%
+            \fi
+            \csname #1mark\endcsname{#7}%
+            \addcontentsline{toc}{#1}{%
+                \ifnum #2>\c@secnumdepth \else
+                    \protect\numberline{\csname the#1\endcsname}%
+                \fi
+                #7%
+            }%
+        }%
+    \fi
+    \@xsect{#5}\parskip=10.5pt% within the section, parskip = leading 12 Jan 2000 gkmt
+}
+\def\@xsect#1{%
+    \@tempskipa #1\relax
+    \ifdim \@tempskipa>\z@
+        \par
+        \nobreak
+        \vskip \@tempskipa
+        \@afterheading
+    \else
+        \global\@nobreakfalse
+        \global\@noskipsectrue
+        \everypar{%
+            \if@noskipsec
+                \global\@noskipsecfalse
+                \clubpenalty\@M
+                \hskip -\parindent
+                \begingroup
+                    \@svsechd
+                    \@period
+                \endgroup
+                \unskip
+                \@tempskipa #1\relax
+                \hskip -\@tempskipa
+            \else
+                \clubpenalty \@clubpenalty
+                \everypar{}%
+            \fi
+        }%
+    \fi
+    \ignorespaces
+}
+
+\def\@trivlist{%
+    \@topsepadd\topsep
+    \if@noskipsec
+        \global\let\@period\@empty
+        \leavevmode
+        \global\let\@period.%
+    \fi
+    \ifvmode
+        \advance\@topsepadd\partopsep
+    \else
+        \unskip
+        \par
+    \fi
+    \if@inlabel
+        \@noparitemtrue
+        \@noparlisttrue
+    \else
+        \@noparlistfalse
+        \@topsep\@topsepadd
+    \fi
+    \advance\@topsep \parskip
+    \leftskip\z@skip
+    \rightskip\@rightskip
+    \parfillskip\@flushglue
+    \@setpar{\if@newlist\else{\@@par}\fi}
+    \global\@newlisttrue
+    \@outerparskip\parskip
+}
+
+%%% Actually, 'abbrev' works just fine as the default - Gerry Feb. 2000
+%%% Bibliography style.
+
+\parindent 0pt
+\typeout{Using 'Abbrev' bibliography style}
+\newcommand\bibyear[2]{%
+    \unskip\quad\ignorespaces#1\unskip
+    \if#2..\quad \else \quad#2 \fi
+}
+\newcommand{\bibemph}[1]{{\em#1}}
+\newcommand{\bibemphic}[1]{{\em#1\/}}
+\newcommand{\bibsc}[1]{{\sc#1}}
+\def\@normalcite{%
+    \def\@cite##1##2{[##1\if@tempswa , ##2\fi]}%
+}
+\def\@citeNB{%
+    \def\@cite##1##2{##1\if@tempswa , ##2\fi}%
+}
+\def\@citeRB{%
+    \def\@cite##1##2{##1\if@tempswa , ##2\fi]}%
+}
+\def\start@cite#1#2{%
+    \edef\citeauthoryear##1##2##3{%
+        ###1%
+        \ifnum#2=\z@ \else\ ###2\fi
+    }%
+    \ifnum#1=\thr@@
+        \let\@@cite\@citeyear
+    \else
+        \let\@@cite\@citenormal
+    \fi
+    \@ifstar{\@citeNB\@@cite}{\@normalcite\@@cite}%
+}
+\def\cite{\start@cite23}
+\def\citeNP{\cite*}
+\def\citeA{\start@cite10}
+\def\citeANP{\citeA*}
+\def\shortcite{\start@cite23}
+\def\shortciteNP{\shortcite*}
+\def\shortciteA{\start@cite20}
+\def\shortciteANP{\shortciteA*}
+\def\citeyear{\start@cite30}
+\def\citeyearNP{\citeyear*}
+\def\citeN{%
+    \@citeRB
+    \def\citeauthoryear##1##2##3{##1\ [##3%
+        \def\reserved@a{##1}%
+        \def\citeauthoryear####1####2####3{%
+            \def\reserved@b{####1}%
+            \ifx\reserved@a\reserved@b
+                ####3%
+            \else
+                \errmessage{Package acmart Error: author mismatch
+                         in \string\citeN^^J^^J%
+                    See the acmart package documentation for explanation}%
+            \fi
+        }%
+    }%
+    \@ifstar\@citeyear\@citeyear
+}
+\def\shortciteN{%
+    \@citeRB
+    \def\citeauthoryear##1##2##3{##2\ [##3%
+        \def\reserved@a{##2}%
+        \def\citeauthoryear####1####2####3{%
+            \def\reserved@b{####2}%
+            \ifx\reserved@a\reserved@b
+                ####3%
+            \else
+                \errmessage{Package acmart Error: author mismatch
+                         in \string\shortciteN^^J^^J%
+                    See the acmart package documentation for explanation}%
+            \fi
+        }%
+    }%
+    \@ifstar\@citeyear\@citeyear % changed from  "\@ifstart" 12 Jan 2000 gkmt
+}
+
+\def\@citenormal{%
+    \@ifnextchar [{\@tempswatrue\@citex;}
+                  {\@tempswafalse\@citex,[]}% was ; Gerry 2/24/00
+}
+\def\@citeyear{%
+    \@ifnextchar [{\@tempswatrue\@citex,}%
+                  {\@tempswafalse\@citex,[]}%
+}
+\def\@citex#1[#2]#3{%
+    \let\@citea\@empty
+    \@cite{%
+        \@for\@citeb:=#3\do{%
+            \@citea
+            \def\@citea{#1 }%
+            \edef\@citeb{\expandafter\@iden\@citeb}%
+            \if@filesw
+                \immediate\write\@auxout{\string\citation{\@citeb}}%
+            \fi
+            \@ifundefined{b@\@citeb}{%
+                {\bf ?}%
+                \@warning{%
+                    Citation `\@citeb' on page \thepage\space undefined%
+                }%
+            }%
+            {\csname b@\@citeb\endcsname}%
+        }%
+    }{#2}%
+}
+\let\@biblabel\@gobble
+\newdimen\bibindent
+\setcounter{enumi}{1}
+\bibindent=0em
+\def\thebibliography#1{%
+\ifnum\addauflag=0\addauthorsection\global\addauflag=1\fi
+    \section{%
+       {References} % was uppercased but this affects pdf bookmarks (SP/GM Oct. 2004)
+        \@mkboth{{\refname}}{{\refname}}%
+    }%
+    \list{[\arabic{enumi}]}{%
+        \settowidth\labelwidth{[#1]}%
+        \leftmargin\labelwidth
+        \advance\leftmargin\labelsep
+        \advance\leftmargin\bibindent
+        \itemindent -\bibindent
+        \listparindent \itemindent
+        \usecounter{enumi}
+    }%
+    \let\newblock\@empty
+    \raggedright  %% 7 JAN 2000 gkmt
+    \sloppy
+    \sfcode`\.=1000\relax
+}
+
+
+\gdef\balancecolumns
+{\vfill\eject
+\global\@colht=\textheight
+\global\ht\@cclv=\textheight
+}
+
+\newcount\colcntr
+\global\colcntr=0
+\newbox\savebox
+
+\gdef \@makecol {%
+\global\advance\colcntr by 1
+\ifnum\colcntr>2 \global\colcntr=1\fi
+   \ifvoid\footins
+     \setbox\@outputbox \box\@cclv
+   \else
+     \setbox\@outputbox \vbox{%
+\boxmaxdepth \@maxdepth
+       \@tempdima\dp\@cclv
+       \unvbox \@cclv
+       \vskip-\@tempdima
+       \vskip \skip\footins
+       \color@begingroup
+         \normalcolor
+         \footnoterule
+         \unvbox \footins
+       \color@endgroup
+       }%
+   \fi
+   \xdef\@freelist{\@freelist\@midlist}%
+   \global \let \@midlist \@empty
+   \@combinefloats
+   \ifvbox\@kludgeins
+     \@makespecialcolbox
+   \else
+     \setbox\@outputbox \vbox to\@colht {%
+\@texttop
+       \dimen@ \dp\@outputbox
+       \unvbox \@outputbox
+   \vskip -\dimen@
+       \@textbottom
+       }%
+   \fi
+   \global \maxdepth \@maxdepth
+}
+\def\titlenote{\@ifnextchar[\@xtitlenote{\stepcounter\@mpfn
+\global\advance\titlenotecount by 1
+\ifnum\titlenotecount=1
+    \raisebox{9pt}{$\ast$}
+\fi
+\ifnum\titlenotecount=2
+    \raisebox{9pt}{$\dagger$}
+\fi
+\ifnum\titlenotecount=3
+    \raisebox{9pt}{$\ddagger$}
+\fi
+\ifnum\titlenotecount=4
+\raisebox{9pt}{$\S$}
+\fi
+\ifnum\titlenotecount=5
+\raisebox{9pt}{$\P$}
+\fi
+         \@titlenotetext
+}}
+
+\long\def\@titlenotetext#1{\insert\footins{%
+\ifnum\titlenotecount=1\global\tntoks={#1}\fi
+\ifnum\titlenotecount=2\global\tntokstwo={#1}\fi
+\ifnum\titlenotecount=3\global\tntoksthree={#1}\fi
+\ifnum\titlenotecount=4\global\tntoksfour={#1}\fi
+\ifnum\titlenotecount=5\global\tntoksfive={#1}\fi
+    \reset@font\footnotesize
+    \interlinepenalty\interfootnotelinepenalty
+    \splittopskip\footnotesep
+    \splitmaxdepth \dp\strutbox \floatingpenalty \@MM
+    \hsize\columnwidth \@parboxrestore
+    \protected@edef\@currentlabel{%
+    }%
+    \color@begingroup
+   \color@endgroup}}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%
+\ps@plain
+\baselineskip=11pt
+\let\thepage\relax % For  NO page numbers - Gerry Nov. 30th. 1999
+\def\setpagenumber#1{\global\setcounter{page}{#1}}
+%\pagenumbering{arabic}  % Arabic page numbers but commented out for NO page numbers - Gerry Nov. 30th. 1999
+\twocolumn             % Double column.
+\flushbottom           % Even bottom -- alas, does not balance columns at end of document
+\pagestyle{plain}
+
+% Need Copyright Year and Copyright Data to be user definable (in .tex file).
+% Gerry Nov. 30th. 1999
+\newtoks\copyrtyr
+\newtoks\acmcopyr
+\newtoks\boilerplate
+\def\CopyrightYear#1{\global\copyrtyr{#1}}
+\def\crdata#1{\global\acmcopyr{#1}}
+\def\permission#1{\global\boilerplate{#1}}
+%
+\newtoks\copyrightetc
+\global\copyrightetc{\ } %  Need to have 'something' so that adequate space is left for pasting in a line if "confinfo" is supplied.
+
+\toappear{\the\boilerplate\par
+{\confname{\the\conf}} \the\confinfo\par \the\copyrightetc}
+% End of ACM_PROC_ARTICLE-SP.CLS -- V2.7SP - 10/15/2004 --
+% Gerry Murray -- October 15th. 2004

Added: pypy/extradoc/talk/icooolps2009/code/full.txt
==============================================================================
--- (empty file)
+++ pypy/extradoc/talk/icooolps2009/code/full.txt	Tue Mar 31 11:26:33 2009
@@ -0,0 +1,22 @@
+\begin{verbatim}
+loop_start(a0, regs0)
+# MOV_R_A 0
+a1 = call(Const(<* fn list_getitem>), regs0, Const(0))
+# DECR_A
+a2 = int_sub(a1, Const(1))
+# MOV_A_R 0
+call(Const(<* fn list_setitem>), regs0, Const(0), a2)
+# MOV_R_A 2
+a3 = call(Const(<* fn list_getitem>), regs0, Const(2))
+# ADD_R_TO_A 1
+i0 = call(Const(<* fn list_getitem>), regs0, Const(1))
+a4 = int_add(a3, i0)
+# MOV_A_R 2
+call(Const(<* fn list_setitem>), regs0, Const(2), a4)
+# MOV_R_A 0
+a5 = call(Const(<* fn list_getitem>), regs0, Const(0))
+# JUMP_IF_A 4
+i1 = int_is_true(a5)
+guard_true(i1)
+jump(a5, regs0)
+\end{verbatim}

Added: pypy/extradoc/talk/icooolps2009/code/no-green-folding.txt
==============================================================================
--- (empty file)
+++ pypy/extradoc/talk/icooolps2009/code/no-green-folding.txt	Tue Mar 31 11:26:33 2009
@@ -0,0 +1,62 @@
+{\small
+\begin{verbatim}
+loop_start(a0, regs0, bytecode0, pc0)
+# MOV_R_A 0
+opcode0 = strgetitem(bytecode0, pc0)
+pc1 = int_add(pc0, Const(1))
+guard_value(opcode0, Const(2))
+n1 = strgetitem(bytecode0, pc1)
+pc2 = int_add(pc1, Const(1))
+a1 = call(Const(<* fn list_getitem>), regs0, n1)
+# DECR_A
+opcode1 = strgetitem(bytecode0, pc2)
+pc3 = int_add(pc2, Const(1))
+guard_value(opcode1, Const(7))
+a2 = int_sub(a1, Const(1))
+# MOV_A_R 0
+opcode2 = strgetitem(bytecode0, pc3)
+pc4 = int_add(pc3, Const(1))
+guard_value(opcode2, Const(1))
+n2 = strgetitem(bytecode0, pc4)
+pc5 = int_add(pc4, Const(1))
+call(Const(<* fn list_setitem>), regs0, n2, a2)
+# MOV_R_A 2
+opcode3 = strgetitem(bytecode0, pc5)
+pc6 = int_add(pc5, Const(1))
+guard_value(opcode3, Const(2))
+n3 = strgetitem(bytecode0, pc6)
+pc7 = int_add(pc6, Const(1))
+a3 = call(Const(<* fn list_getitem>), regs0, n3)
+# ADD_R_TO_A 1
+opcode4 = strgetitem(bytecode0, pc7)
+pc8 = int_add(pc7, Const(1))
+guard_value(opcode4, Const(5))
+n4 = strgetitem(bytecode0, pc8)
+pc9 = int_add(pc8, Const(1))
+i0 = call(Const(<* fn list_getitem>), regs0, n4)
+a4 = int_add(a3, i0)
+# MOV_A_R 2
+opcode5 = strgetitem(bytecode0, pc9)
+pc10 = int_add(pc9, Const(1))
+guard_value(opcode5, Const(1))
+n5 = strgetitem(bytecode0, pc10)
+pc11 = int_add(pc10, Const(1))
+call(Const(<* fn list_setitem>), regs0, n5, a4)
+# MOV_R_A 0
+opcode6 = strgetitem(bytecode0, pc11)
+pc12 = int_add(pc11, Const(1))
+guard_value(opcode6, Const(2))
+n6 = strgetitem(bytecode0, pc12)
+pc13 = int_add(pc12, Const(1))
+a5 = call(Const(<* fn list_getitem>), regs0, n6)
+# JUMP_IF_A 4
+opcode7 = strgetitem(bytecode0, pc13)
+pc14 = int_add(pc13, Const(1))
+guard_value(opcode7, Const(3))
+target0 = strgetitem(bytecode0, pc14)
+pc15 = int_add(pc14, Const(1))
+i1 = int_is_true(a5)
+guard_true(i1)
+jump(a5, regs0, bytecode0, target0)
+\end{verbatim}
+}

Added: pypy/extradoc/talk/icooolps2009/code/normal-tracing.txt
==============================================================================
--- (empty file)
+++ pypy/extradoc/talk/icooolps2009/code/normal-tracing.txt	Tue Mar 31 11:26:33 2009
@@ -0,0 +1,8 @@
+\begin{verbatim}
+loop_start(a0, regs0, bytecode0, pc0)
+opcode0 = strgetitem(bytecode0, pc0)
+pc1 = int_add(pc0, Const(1))
+guard_value(opcode0, Const(7))
+a1 = int_sub(a0, Const(1))
+jump(a1, regs0, bytecode0, pc1)
+\end{verbatim}

Added: pypy/extradoc/talk/icooolps2009/code/tlr-paper-full.py
==============================================================================
--- (empty file)
+++ pypy/extradoc/talk/icooolps2009/code/tlr-paper-full.py	Tue Mar 31 11:26:33 2009
@@ -0,0 +1,32 @@
+\begin{verbatim}
+class TLRJitDriver(JitDriver):
+    greens = ['pc', 'bytecode']
+    reds   = ['a', 'regs']
+
+tlrjitdriver = TLRJitDriver()
+
+def interpret(bytecode, a):
+    regs = [0] * 256
+    pc = 0
+    while True:
+        tlrjitdriver.jit_merge_point(
+            bytecode=bytecode, pc=pc,
+            a=a, regs=regs)
+        opcode = ord(bytecode[pc])
+        pc += 1
+        if opcode == JUMP_IF_A:
+            target = ord(bytecode[pc])
+            pc += 1
+            if a:
+                if target < pc:
+                    tlrjitdriver.can_enter_jit(
+                        bytecode=bytecode, pc=target,
+                        a=a, regs=regs)
+                pc = target
+        elif opcode == MOV_A_R:
+            n = ord(bytecode[pc])
+            pc += 1
+            regs[n] = a
+        elif opcode == MOV_R_A:
+            ... # rest unmodified
+\end{verbatim}

Added: pypy/extradoc/talk/icooolps2009/code/tlr-paper.py
==============================================================================
--- (empty file)
+++ pypy/extradoc/talk/icooolps2009/code/tlr-paper.py	Tue Mar 31 11:26:33 2009
@@ -0,0 +1,29 @@
+\begin{verbatim}
+def interpret(bytecode, a):
+    regs = [0] * 256
+    pc = 0
+    while True:
+        opcode = ord(bytecode[pc])
+        pc += 1
+        if opcode == JUMP_IF_A:
+            target = ord(bytecode[pc])
+            pc += 1
+            if a:
+                pc = target
+        elif opcode == MOV_A_R:
+            n = ord(bytecode[pc])
+            pc += 1
+            regs[n] = a
+        elif opcode == MOV_R_A:
+            n = ord(bytecode[pc])
+            pc += 1
+            a = regs[n]
+        elif opcode == ADD_R_TO_A:
+            n = ord(bytecode[pc])
+            pc += 1
+            a += regs[n]
+        elif opcode == DECR_A:
+            a -= 1
+        elif opcode == RETURN_A:
+            return a
+\end{verbatim}

Added: pypy/extradoc/talk/icooolps2009/paper.bib
==============================================================================
--- (empty file)
+++ pypy/extradoc/talk/icooolps2009/paper.bib	Tue Mar 31 11:26:33 2009
@@ -0,0 +1,160 @@
+
+@techreport{miranda_context_1999,
+	title = {Context Management in {VisualWorks} 5i},
+	abstract = {Smalltalk-80 provides a reification of execution state in the form of context objects which represent procedure activation records. Smalltalk-80 also provides full closures with indefinite extent. These features pose interesting implementation challenges because a naïve implementation entails instantiating context objects on every method activation, but typical Smalltalk-80 programs obey stack discipline for the vast majority of activations. Both software and hardware implementations of Smalltalk-80 have mapped contexts and closure activations to stack frames but not without overhead when compared to traditional stack-based activation and return in “conventional” languages. We present a new design for contexts and closures that significantly reduces the overall overhead of these features and imposes overhead only in code that actually manipulates execution state in the form of contexts.},
+	institution = {{ParcPlace} Division, {CINCOM,} Inc.},
+	author = {Eliot Miranda},
+	year = {1999},
+},
+
+@inproceedings{sullivan_dynamic_2003,
+	address = {San Diego, California},
+	title = {Dynamic native optimization of interpreters},
+	isbn = {1-58113-655-2},
+	url = {http://portal.acm.org/citation.cfm?id=858570.858576},
+	doi = {10.1145/858570.858576},
+	abstract = {For domain specific languages, "scripting languages", dynamic languages, and for virtual machine-based languages, the most straightforward implementation strategy is to write an interpreter. A simple interpreter consists of a loop that fetches the next bytecode, dispatches to the routine handling that bytecode, then loops. There are many ways to improve upon this simple mechanism, but as long as the execution of the program is driven by a representation of the program other than as a stream of native instructions, there will be some "interpretive overhead". There is a long history of approaches to removing interpretive overhead from programming language implementations. In practice, what often happens is that, once an interpreted language becomes popular, pressure builds to improve performance until eventually a project is undertaken to implement a native Just In Time {(JIT)} compiler for the language. Implementing a {JIT} is usually a large effort, affects a significant part of the existing language implementation, and adds a significant amount of code and complexity to the overall code base. In this paper, we present an innovative approach that dynamically removes much of the interpreted overhead from language implementations, with minimal instrumentation of the original interpreter. While it does not give the performance improvements of hand-crafted native compilers, our system provides an appealing point on the language implementation spectrum.},
+	booktitle = {Proceedings of the 2003 workshop on Interpreters, virtual machines and emulators},
+	publisher = {{ACM}},
+	author = {Gregory T. Sullivan and Derek L. Bruening and Iris Baron and Timothy Garnett and Saman Amarasinghe},
+	year = {2003},
+	pages = {50--57},
+},
+
+@inbook{bolz_back_2008,
+	title = {Back to the Future in One Week — Implementing a Smalltalk {VM} in {PyPy}},
+	url = {http://dx.doi.org/10.1007/978-3-540-89275-5_7},
+	abstract = {We report on our experiences with the Spy project, including implementation details and benchmark results. Spy is a re-implementation of the Squeak (i.e. Smalltalk-80) {VM} using the {PyPy} toolchain. The {PyPy} project allows code written in {RPython,} a subset of Python, to be translated
+to a multitude of different backends and architectures. During the translation, many aspects of the implementation can be
+independently tuned, such as the garbage collection algorithm or threading implementation. In this way, a whole host of interpreters
+can be derived from one abstract interpreter definition. Spy aims to bring these benefits to Squeak, allowing for greater portability and, eventually, improved performance. The current
+Spy codebase is able to run a small set of benchmarks that demonstrate performance superior to many similar Smalltalk {VMs,} but
+which still run slower than in Squeak itself. Spy was built from scratch over the course of a week during a joint {Squeak-PyPy} Sprint in Bern last autumn.
+},
+	booktitle = {{Self-Sustaining} Systems},
+	author = {Carl Friedrich Bolz and Adrian Kuhn and Adrian Lienhard and Nicholas Matsakis and Oscar Nierstrasz and Lukas Renggli and Armin Rigo and Toon Verwaest},
+	year = {2008},
+	pages = {123--139}
+},
+
+@inproceedings{hlzle_optimizing_1994,
+	address = {Orlando, Florida, United States},
+	title = {Optimizing dynamically-dispatched calls with run-time type feedback},
+	isbn = {{0-89791-662-X}},
+	url = {http://portal.acm.org/citation.cfm?id=178243.178478},
+	doi = {10.1145/178243.178478},
+	abstract = {Note: {OCR} errors may be found in this Reference List extracted from the full text article. {ACM} has opted to expose the complete List rather than only correct and linked references.},
+	booktitle = {Proceedings of the {ACM} {SIGPLAN} 1994 conference on Programming language design and implementation},
+	publisher = {{ACM}},
+	author = {Urs Hölzle and David Ungar},
+	year = {1994},
+	pages = {326--336},
+},
+
+@techreport{andreas_gal_incremental_2006,
+	title = {Incremental Dynamic Code Generation with Trace Trees},
+	abstract = {The unit of compilation for traditional just-in-time compilers is the method. We have explored trace-based compilation, in which the unit of compilation is a loop, potentially spanning multiple methods and even library code. Using a new intermediate representation that is discovered and updated lazily on-demand while the program is being executed, our compiler generates code that is competitive with traditional dynamic compilers, but that uses only a fraction of the compile time and memory footprint.},
+	number = {{ICS-TR-06-16}},
+	institution = {Donald Bren School of Information and Computer Science, University of California, Irvine},
+	author = {Andreas Gal and Michael Franz},
+	month = nov,
+	year = {2006},
+	pages = {11}
+},
+
+@techreport{mason_chang_efficient_2007,
+	title = {Efficient {Just-In-Time} Execution of Dynamically Typed Languages
+Via Code Specialization Using Precise Runtime Type Inference},
+	abstract = {Dynamically typed languages such as {JavaScript} present a challenge to just-in-time compilers. In contrast to statically typed languages such as {JVML,} in which there are specific opcodes for common operations on primitive types (such as iadd for integer addition), all operations in dynamically typed language such as {JavaScript} are late-bound. Often enough, types cannot be inferred with certainty ahead of execution. As a result, just-in-time compilers for dynamically typed languages have tended to perform worse than their statically-typed counterparts. We present a new approach to compiling dynamically typed languages in which code traces observed during execution are dynamically specialized for each actually observed run-time type. For most benchmark programs, our prototype {JavaScript} virtual machine outperforms every other {JavaScript} platform known to us.},
+	number = {{ICS-TR-07-10}},
+	institution = {Donald Bren School of Information and Computer Science, University of California, Irvine},
+	author = {Mason Chang and Michael Bebenita and Alexander Yermolovich and Andreas Gal and Michael Franz},
+	year = {2007},
+},
+
+@inproceedings{andreas_gal_one_2007,
+	address = {Berlin, Germany},
+	title = {One Method At A Time Is Quite a Waste of Time},
+	abstract = {Most just-in-time compilers for object-oriented languages operate at the granularity of methods. Unfortunately, even “hot” methods often contain "cold" code paths. As a consequence, just-in-time compilers waste time compiling code that will be executed only rarely, if at all. We discuss an alternative approach in which only truly “hot” code is ever compiled.
+},
+	booktitle = {Proceedings of the Second Workshop on Implementation, Compilation, Optimization of {Object-Oriented} Languages, Programs and Systems {(ICOOOLPS'2007)}},
+	author = {Andreas Gal and Michael Bebenita and Michael Franz},
+	month = jul,
+	year = {2007},
+	pages = {11--16},
+},
+
+@inproceedings{carl_friedrich_bolz_to_2007,
+	title = {How to not write a Virtual Machine},
+	abstract = {Typical modern dynamic languages have a growing number of implementations. We explore the reasons for this situation, and the limitations it imposes on open source or academic communities that lack the resources to fine-tune and maintain them all. It is sometimes proposed that implementing dynamic languages on top of a standardized general-purpose object-oriented virtual machine (like Java or {.NET)} would help reduce this burden. We propose a complementary alternative to writing custom virtual machines {(VMs)} by hand, validated by the {PyPy} project: flexibly generating {VMs} from a high-level "specification",
+inserting features and low-level details automatically – including good just-in-time compilers tuned to the dynamic language at hand.
+We believe this to be ultimately a better investment of efforts than the development of more and more advanced general-purpose object
+oriented {VMs.} In this paper we compare these two approaches in detail.},
+	booktitle = {Proceedings of 3rd Workshop on Dynamic Languages and Applications {(DYLA} 2007)},
+	author = {Carl Friedrich Bolz and Armin Rigo},
+	year = {2007}
+},
+
+@inproceedings{rigo_pypys_2006,
+	address = {Portland, Oregon, {USA}},
+	title = {{PyPy's} approach to virtual machine construction},
+	isbn = {{1-59593-491-X}},
+	url = {http://portal.acm.org/citation.cfm?id=1176753},
+	doi = {10.1145/1176617.1176753},
+	abstract = {The {PyPy} project seeks to prove both on a research and a practical level the feasibility of constructing a virtual machine {(VM)} for a dynamic language in a dynamic language - in this case, Python. The aim is to translate (i.e. compile) the {VM} to arbitrary target environments, ranging in level from {C/Posix} to {Smalltalk/Squeak} via Java and {CLI/.NET,} while still being of reasonable efficiency within these environments. A key tool to achieve this goal is the systematic reuse of the Python language as a system programming language at various levels of our architecture and translation process. For each level, we design a corresponding type system and apply a generic type inference engine - for example, the garbage collector is written in a style that manipulates simulated pointer and address objects, and when translated to C these operations become C-level pointer and address instructions.},
+	booktitle = {Companion to the 21st {ACM} {SIGPLAN} conference on Object-oriented programming systems, languages, and applications},
+	publisher = {{ACM}},
+	author = {Armin Rigo and Samuele Pedroni},
+	year = {2006},
+	pages = {944--953}
+},
+
+@article{bala_dynamo:transparent_2000,
+	title = {Dynamo: a transparent dynamic optimization system},
+	volume = {35},
+	url = {http://citeseer.ist.psu.edu/bala00dynamo.html},
+	number = {5},
+	journal = {{ACM} {SIGPLAN} Notices},
+	author = {Vasanth Bala and Evelyn Duesterwald and Sanjeev Banerjia},
+	year = {2000},
+	pages = {1--12}
+},
+
+@inproceedings{gal_hotpathvm:effective_2006,
+	address = {Ottawa, Ontario, Canada},
+	title = {{HotpathVM:} an effective {JIT} compiler for resource-constrained devices},
+	isbn = {1-59593-332-6},
+	url = {http://portal.acm.org/citation.cfm?doid=1134760.1134780},
+	doi = {10.1145/1134760.1134780},
+	abstract = {We present a just-in-time compiler for a Java {VM} that is small enough to fit on resource-constrained devices, yet is surprisingly effective. Our system dynamically identifies traces of frequently executed bytecode instructions (which may span several basic blocks across several methods) and compiles them via Static Single Assignment {(SSA)} construction. Our novel use of {SSA} form in this context allows to hoist instructions across trace side-exits without necessitating expensive compensation code in off-trace paths. The overall memory consumption (code and data) of our system is only 150 {kBytes,} yet benchmarks show a speedup that in some cases rivals heavy-weight just-in-time compilers.},
+	booktitle = {Proceedings of the 2nd international conference on Virtual execution environments},
+	publisher = {{ACM}},
+	author = {Andreas Gal and Christian W. Probst and Michael Franz},
+	year = {2006},
+	pages = {144--153}
+},
+
+@inproceedings{hlzle_optimizing_1991,
+	title = {Optimizing {Dynamically-Typed} {Object-Oriented} Languages With Polymorphic Inline Caches},
+	isbn = {3-540-54262-0},
+	url = {http://portal.acm.org/citation.cfm?id=679193&dl=ACM&coll=portal},
+	booktitle = {Proceedings of the European Conference on {Object-Oriented} Programming},
+	publisher = {{Springer-Verlag}},
+	author = {Urs Hölzle and Craig Chambers and David Ungar},
+	year = {1991},
+	pages = {21--38}
+},
+
+@inproceedings{rigo_representation-based_2004,
+	address = {Verona, Italy},
+	title = {Representation-based just-in-time specialization and the psyco prototype for python},
+	isbn = {1-58113-835-0},
+	url = {http://portal.acm.org/citation.cfm?id=1014010},
+	doi = {10.1145/1014007.1014010},
+	abstract = {A powerful application of specialization is to remove interpretative overhead: a language can be implemented with an interpreter, whose performance is then improved by specializing it for a given program source. This approach is only moderately successful with very high level languages, where the operation of each single step can be highly dependent on run-time data and context. In the present paper, the Psyco prototype for the Python language is presented. It introduces two novel techniques. The first is just-in-time specialization, or specialization by need, which introduces the "unlifting" ability for a value to be promoted from run-time to compile-time during specialization -- the inverse of the lift operator of partial evaluation. Its presence gives an unusual and powerful perspective on the specialization process. The second technique is representations, a theory of data-oriented specialization generalizing the traditional specialization domains (i.e. the compile-time/run-time dichotomy).},
+	booktitle = {Proceedings of the 2004 {ACM} {SIGPLAN} symposium on Partial evaluation and semantics-based program manipulation},
+	publisher = {{ACM}},
+	author = {Armin Rigo},
+	year = {2004},
+	pages = {15--26}
+}

Added: pypy/extradoc/talk/icooolps2009/paper.tex
==============================================================================
--- (empty file)
+++ pypy/extradoc/talk/icooolps2009/paper.tex	Tue Mar 31 11:26:33 2009
@@ -0,0 +1,630 @@
+\documentclass{acm_proc_article-sp}
+
+\usepackage{ifthen}
+\usepackage{fancyvrb}
+\usepackage{color}
+
+\let\oldcite=\cite
+
+\renewcommand\cite[1]{\ifthenelse{\equal{#1}{XXX}}{[citation~needed]}{\oldcite{#1}}}
+
+\begin{document}
+
+\title{Tracing the Meta-Level: PyPy's JIT Compiler}
+
+\numberofauthors{2}
+\author{
+\alignauthor Carl Friedrich Bolz\\
+       \affaddr{Heinrich-Heine-Universität Düsseldorf}\\
+       \affaddr{Softwaretechnik und Programmiersprachen}\\
+       \affaddr{Institut für Informatik}\\ 
+       \affaddr{Universitätsstra{\ss}e 1}\\
+       \affaddr{D-40225 Düsseldorf}\\
+       \affaddr{Deutschland}\\
+       \email{cfbolz at gmx.de}
+\alignauthor Armin Rigo\\
+       \email{arigo at tunes.org}
+}
+\maketitle
+
+
+%\category{D.3.4}{Programming Languages}{Processors}[code generation,
+%interpreters, run-time environments]
+%\category{F.3.2}{Logics and Meanings of Programs}{Semantics of Programming
+%Languages}[program analysis]
+
+\begin{abstract}
+In this paper we describe the ongoing research in the PyPy project to write a
+JIT compiler that is automatically adapted to various languages, given an
+interpreter for that language. This is achieved with the help of a slightly
+adapted tracing JIT compiler in combination with some hints by the author of the
+interpreter.  XXX
+
+\end{abstract}
+
+XXX write somewhere that we will be predominantly occupied with bytecode
+interpreters
+XXX write somewhere that one problem of using tracing JITs for dynamic languages
+is that dynamic languages have very complex bytecodes
+
+
+\section{Introduction}
+
+Dynamic languages, rise in popularity, bla bla XXX
+
+One of the often-cited drawbacks of dynamic languages is the performance
+penalty they impose. Typically they are slower than statically typed languages
+\cite{XXX}. Even though there has been a lot of research into improving the
+performance of dynamic languages \cite{XXX}, those techniques are not as widely
+used as one would expect. Many dynamic language implementations use completely
+straightforward bytecode-interpreters without any advanced implementation
+techniques like just-in-time compilation. There are a number of reasons for
+this. Most of them boil down to the inherent complexities of using compilation.
+Interpreters are simple to understand and to implement, whereas writing a
+just-in-time compiler is an error-prone task that is made even harder by the
+dynamic features of a language.
+
+writing an interpreter has many advantages... XXX
+
+A recent approach to getting better performance for dynamic languages is that of
+tracing JIT compilers. XXX
+
+The PyPy project is trying to find approaches to generally ease the
+implementation of dynamic languages. It started as a Python implementation in
+Python, but has now extended its goals to be generally useful for implementing
+other dynamic languages as well. The general approach is to implement an
+interpreter for the language in a subset of Python. This subset is chosen in
+such a way that programs in it can be compiled into various target environments,
+such as C/Posix, the CLR or the JVM. The PyPy project is described in more
+detail in Section \ref{sect:pypy}.
+
+In this paper we discuss ongoing work in the PyPy project to improve the
+performance of interpreters written with the help of the PyPy toolchain. The
+approach is that of a tracing JIT compiler. In contrast to the tracing JITs for dynamic
+languages that exist so far, PyPy's tracing JIT operates "one level down",
+i.e. it traces the execution of the interpreter, as opposed to the execution
+of the user program. The fact that the program the tracing JIT compiles is
+in our case always an interpreter brings its own set of problems. We describe
+tracing JITs and their application to interpreters in Section
+\ref{sect:tracing}.  With this approach we hope to get a JIT compiler that can be
+applied to a variety of dynamic languages, given an interpreter for them. The
+process is not completely automatic but needs a small number of hints from the
+interpreter author, to help the tracing JIT. The details of how the process
+integrates into the rest of PyPy will be explained in Section
+\ref{sect:implementation}. This work is not finished, but already produces some
+promising results, which we will discuss in Section \ref{sect:evaluation}.
+
+
+%- dynamic languages important
+%- notoriously difficult to achieve good performance
+%- even though the techniques exist since a while, not many implementations
+%  actually use them
+%    - hard to get all corner-cases right
+%    - languages evolve
+%    - modern dynamic languages are large
+%    - open source/research communities don't have that many resources
+%
+%- PyPy project: trying find approaches to ease the implementation of dynamic
+%languages
+%- explore general ways to improve on the speed of dynamic languages with reduced
+%effort
+%- approach: write a tracing JIT that is applicable to many different languages,
+%by tracing "one level done"
+%- needs some hints by the interpreter-writer + slightly different optimizations
+%- paper will describe the problems of applying a tracing jit to an interpreter
+%- different integration needed than a typical tracing jit
+
+
+\section{The PyPy Project}
+\label{sect:pypy}
+
+The PyPy project\footnote{http://codespeak.net/pypy} was started to implement a
+new Python interpreter in Python but has now extended its goals to be an
+environment where flexible implementations of dynamic languages can be written.
+To implement a dynamic language with PyPy, an interpreter for that language has
+to be written in RPython. RPython ("Restricted Python") is a subset of Python
+chosen in such a way that type inference can be performed on it. The language
+interpreter can then be translated with the help of PyPy into various target
+environments, such as C/Posix, the CLR and the JVM. This is done by a component
+of PyPy called the \emph{translation toolchain}.
+
+The central idea of this way to implement VMs is that the interpreter
+implementation in RPython should be as free as possible of low-level
+implementation details, such as memory management strategy, threading model or
+object layout. Instead, these details are inserted into the VM during the
+translation process by the translation toolchain. This makes it possible to
+change these details later, if that becomes necessary. This is something that is
+hard to do with a traditional VM written in a low-level language such as C,
+since the low-level details need to be fixed early in the development-process.
+XXX is this paragraph really needed?
+
+In the following we will describe some details of the translation process, since
+they are needed in Section \ref{sect:implementation}. The first step is to produce
+control flow graphs of all functions of the RPython program. Next, type
+inference is performed to gain type information about all the variables in the
+flow graphs. Afterwards, the abstraction level of the operations in the graphs
+is lowered in a stepwise fashion. At the end of this process, all operations in
+the graphs correspond rather directly to a simple operation in C. The variables
+are annotated with a C-like type system containing primitive types (like
+\texttt{Signed}, \texttt{Bool}, etc.) or pointer types pointing to Structs,
+Arrays or Functions.
+
+XXX example. reuse the one of the tracing jit?
+
+The translation process usually just turns these graphs into C code so that they
+can be compiled into an executable. However, they can also be interpreted in
+various ways. This is useful for testing and debugging the translation toolchain
+because the produced error messages in case of a crash are a lot more helpful
+than what would be produced after compilation to C. These low-level graphs are
+also what the tracing JIT takes as input, as we will see later.
+
+
+%- original goal: Python interpreter in Python
+%- general way to write flexible VMs for dynamic languages
+%- interpreter written in RPython, subset of Python to allow type inference
+%- translation toolchain
+%   - naive forward-propagation type inference
+%   - lowering of abstractions
+%   - lltype system, monomorphic C-level operations
+%   - type system: primitives, pointers to structs and arrays
+%   - still assumes presence of GC
+%   - can be interpreted in various ways
+
+\section{Tracing JIT Compilers}
+\label{sect:tracing}
+
+Tracing JITs are an idea explored by the Dynamo project
+\cite{bala_dynamo:transparent_2000} in the context of dynamic optimization of
+machine code at runtime. The techniques were then successfully applied to Java
+VMs \cite{gal_hotpathvm:effective_2006}. It also turned out that they are a
+relatively simple way to implement a JIT compiler for a dynamic language
+\cite{XXX}. The technique is now used by both Mozilla's
+TraceMonkey JavaScript VM \cite{XXX} and Adobe's Tamarin ActionScript VM
+\cite{XXX}.
+
+Tracing JITs are built on the following basic assumptions:
+
+\begin{itemize}
+ \item programs spend most of their runtime in loops
+ \item several iterations of the same loop are likely to take similar code paths
+\end{itemize}
+
+The basic approach of a tracing JIT is to only generate machine code for the hot
+code paths of commonly executed loops and to interpret the rest of the program.
+The code for those common loops however should be highly optimized, including
+aggressive inlining.
+
+The generation of loops works as follows: At first, everything is interpreted.
+The interpreter does a bit of lightweight profiling to figure out which loops
+are run often. This lightweight profiling is usually done by having a counter on
+each backward jump instruction that counts how often this particular backward jump
+was executed. Since loops need a backward jump somewhere, this method finds
+loops in the user program.
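+
+The counter-based profiling described above can be sketched in a few lines of
+Python. This is only an illustration: the class name
+\texttt{BackwardJumpProfiler}, the method name and the threshold value are
+assumptions made for this sketch and are not taken from PyPy.

```python
from collections import defaultdict

# Hypothetical threshold; real systems tune this value.
HOT_THRESHOLD = 3


class BackwardJumpProfiler:
    """Keeps one counter per backward jump instruction."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counters = defaultdict(int)

    def backward_jump(self, jump_pc):
        """Count one execution of the backward jump located at jump_pc.

        Returns True exactly once, when the counter reaches the threshold,
        which is the moment at which tracing would be started.
        """
        self.counters[jump_pc] += 1
        return self.counters[jump_pc] == self.threshold


profiler = BackwardJumpProfiler()
# The same backward jump (at position 12) is executed five times.
hot = [profiler.backward_jump(12) for _ in range(5)]
```

+In this sketch only the third execution of the jump reports the loop as hot;
+later executions would instead enter the machine code produced by tracing.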
+
+When a common loop is identified, the interpreter enters a
+special mode (called tracing mode). When in tracing mode, the interpreter
+records a history (the \emph{trace}) of all the operations it executes, in addition
+to actually performing the operations. During tracing, the tracer repeatedly
+(XXX make this more precise: when does the check happen?)
+checks whether the interpreter is at a position in the program that it has seen
+earlier in the trace. If this happens, the recorded trace corresponds to a loop
+in the program that the tracing interpreter is running. At this point, this loop
+is turned into machine code by taking the trace and making machine code versions
+of all the operations in it. The machine code can then be immediately executed,
+as it represents exactly the loop that is being interpreted at the moment anyway.
+
+This process assumes that the path through the loop that was traced is a
+"typical" example of possible paths (which is statistically likely). Of course
+it is possible that later another path through the loop is taken. Therefore the
+machine code will contain \emph{guards}, which check that the path is still the same.
+If a guard fails during execution of the machine code, the machine code is left
+and execution falls back to using interpretation (there are more complex
+mechanisms in place to still produce more code for the cases of guard failures,
+but they are of no importance for this paper XXX is that true?).
+
+It is important to understand when the tracer considers a loop in the trace to
+be closed. This happens when the \emph{position key} is the same as at an earlier
+point. The position key describes the position of the execution of the program,
+and usually contains things like the function currently being executed and the
+program counter position of the tracing interpreter. The tracing interpreter
+does not need to check all the time whether the position key already occurred
+earlier, but only at instructions that are able to change the position key
+to an earlier value, e.g. a backward branch instruction. Note that this is
+already the second place where backward branches are treated specially: During
+interpretation they are the place where the profiling is performed and where
+tracing is started or already existing assembler code entered; during tracing
+they are the place where the check for a closed loop is performed.
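+
+A minimal sketch of this closed-loop check, assuming that a position key is a
+pair of function name and program counter. The \texttt{Tracer} class and its
+method names are hypothetical and only serve to illustrate the idea.

```python
class Tracer:
    """Records operations and checks for a closed loop at backward branches."""

    def __init__(self, start_key):
        self.trace = []
        # Maps each position key seen so far to the index in the trace
        # at which it was seen.
        self.seen = {start_key: 0}

    def record(self, operation):
        self.trace.append(operation)

    def check_backward_branch(self, position_key):
        """Called only at backward branches.

        Returns the index in the trace where the closed loop starts,
        or None if this position key was not seen before.
        """
        if position_key in self.seen:
            return self.seen[position_key]
        self.seen[position_key] = len(self.trace)
        return None


tracer = Tracer(start_key=("strange_sum", 4))
for operation in ["int_mod", "int_eq", "guard_false", "int_add", "int_sub"]:
    tracer.record(operation)
# The backward branch leads back to the position where tracing started.
loop_start = tracer.check_backward_branch(("strange_sum", 4))
```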
+
+Let's look at a small example. Take the following (slightly contrived) RPython
+code:
+\begin{verbatim}
+def f(a, b):
+    if b % 46 == 41:
+        return a - b
+    else:
+        return a + b
+def strange_sum(n):
+    result = 0
+    while n >= 0:
+        result = f(result, n)
+        n -= 1
+    return result
+\end{verbatim}
+
+At first those functions will be interpreted, but after a while, profiling shows
+that the \texttt{while} loop in \texttt{strange\_sum} is executed often.  The
+tracing JIT will then start to trace the execution of that loop.  The trace would
+look as follows:
+\begin{verbatim}
+loop_header(result0, n0)
+i0 = int_mod(n0, Const(46))
+i1 = int_eq(i0, Const(41))
+guard_false(i1)
+result1 = int_add(result0, n0)
+n1 = int_sub(n0, Const(1))
+i2 = int_ge(n1, Const(0))
+guard_true(i2) [result1]
+jump(result1, n1)
+\end{verbatim}
+
+XXX add a note about the SSA-ness of the trace
+
+This trace will then be turned into machine code. Note that the machine code
+loop is by itself infinite and can only be left via a guard failure. Also note
+how \texttt{f} was inlined into the loop and how the common \texttt{else} case was
+turned into machine code, while the other one is implemented via a guard
+failure. The variables in square brackets after the guards are the state that
+the interpreter will get when the guard fails.
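+
+To make the role of the guards concrete, the trace can be transcribed by hand
+into plain Python. The following is a sketch under assumptions: the
+\texttt{GuardFailed} exception, the state attached to the first guard and the
+fallback driver \texttt{strange\_sum\_with\_trace} are invented for this
+illustration and do not describe how PyPy implements guard failures.

```python
def f(a, b):
    if b % 46 == 41:
        return a - b
    else:
        return a + b


def strange_sum(n):
    result = 0
    while n >= 0:
        result = f(result, n)
        n -= 1
    return result


class GuardFailed(Exception):
    """Raised when a guard fails; carries the state handed to the interpreter."""

    def __init__(self, guard, result, n):
        super().__init__(guard)
        self.guard = guard
        self.result = result
        self.n = n


def run_trace(result0, n0):
    """Hand transcription of the trace; it can only be left via a guard."""
    while True:                        # loop_header(result0, n0)
        i0 = n0 % 46                   # i0 = int_mod(n0, Const(46))
        i1 = i0 == 41                  # i1 = int_eq(i0, Const(41))
        if i1:                         # guard_false(i1)
            raise GuardFailed("guard_false(i1)", result0, n0)
        result1 = result0 + n0         # result1 = int_add(result0, n0)
        n1 = n0 - 1                    # n1 = int_sub(n0, Const(1))
        i2 = n1 >= 0                   # i2 = int_ge(n1, Const(0))
        if not i2:                     # guard_true(i2) [result1]
            raise GuardFailed("guard_true(i2)", result1, n1)
        result0, n0 = result1, n1      # jump(result1, n1)


def strange_sum_with_trace(n):
    """Runs the trace and falls back to interpretation on guard failures."""
    result = 0
    while n >= 0:
        try:
            run_trace(result, n)
        except GuardFailed as failure:
            result, n = failure.result, failure.n
            if failure.guard == "guard_true(i2)":
                return result          # the loop condition became false
            # Rare branch of f: interpret one iteration, then re-enter.
            result = f(result, n)
            n -= 1
    return result
```

+Both versions compute the same results. The rare \texttt{a - b} branch is
+never part of the transcribed trace and is only reached through the fallback.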
+
+%- general introduction to tracing
+%- assumptions
+%- mixed-mode execution environment: interpretation, tracing, compilation,
+%  running native code
+%- write why tracing jits are particularly well suited for dynamic languages
+
+\subsection{Applying a Tracing JIT to an Interpreter}
+
+XXX \cite{sullivan_dynamic_2003} somewhere
+
+The tracing JIT of the PyPy project is atypical in that it is not applied to the
+user program, but to the interpreter running the user program. In this section
+we will explore what problems this brings, and how to solve them (at least
+partially). This means that there are two interpreters involved, and we need
+terminology to distinguish them. On the one hand, there is the interpreter that
+the tracing JIT uses to perform tracing. This we will call the \emph{tracing
+interpreter}. On the other hand, there is the interpreter that is running the
+user's programs, which we will call the \emph{language interpreter}. The program
+that the language interpreter executes we will call the \emph{user program}
+(from the point of view of a VM author, the "user" is a programmer using the
+VM).
+
+A tracing JIT compiler finds the hot loops of the program it is compiling. In
+our case, this program is the language interpreter. The hot loop of the language
+interpreter is the bytecode dispatch loop. Usually it is also the only hot loop.
+Tracing one iteration of this loop means that the execution of one bytecode was
+seen. This means that the resulting machine code will correspond to a loop that
+assumes that this particular bytecode will be executed many times in a row,
+which is clearly very unlikely.
+
+\begin{figure}
+\input{code/tlr-paper.py}
+\caption{A very simple bytecode interpreter with registers and an accumulator.}
+\label{fig:tlr-basic}
+\end{figure}
+
+\begin{figure}
+\begin{verbatim}
+    MOV_A_R     0   # i = a
+    MOV_A_R     1   # copy of 'a'
+    
+    # 4:
+    MOV_R_A     0   # i--
+    DECR_A
+    MOV_A_R     0    
+
+    MOV_R_A     2   # res += a
+    ADD_R_TO_A  1
+    MOV_A_R     2
+    
+    MOV_R_A     0   # if i!=0: goto 4
+    JUMP_IF_A   4
+
+    MOV_R_A     2   # return res
+    RETURN_A
+\end{verbatim}
+\caption{Example bytecode: Compute the square of the accumulator}
+\label{fig:square}
+\end{figure}
+
+Let's look at an example. Figure \ref{fig:tlr-basic} shows the code of a very
+simple bytecode interpreter with 256 registers and an accumulator. The
+\texttt{bytecode} argument is a string of bytes and all registers and the
+accumulator are integers. A simple program for this interpreter that computes
+the square of the accumulator is shown in Figure \ref{fig:square}. If the
+tracing interpreter traces the execution of the \texttt{DECR\_A} bytecode, the
+trace would look as follows:
+\input{code/normal-tracing.txt}
+
+To improve this situation, the tracing JIT could trace the execution of several
+bytecodes, thus effectively unrolling the bytecode dispatch loop. Ideally, the
+bytecode loop should be unrolled exactly so much that the unrolled version
+corresponds to a loop on the level of the user program. A loop in the user
+program occurs when the program counter of the language interpreter has the
+same value many times. This program counter is typically one or several
+variables in the language interpreter, for example the bytecode object of the
+currently executed function of the user program and the position of the current
+bytecode within that.
+
+Since the tracing JIT cannot know which parts of the language interpreter are
+the program counter, the author of the language interpreter needs to mark the
+relevant variables of the language interpreter with the help of a \emph{hint}.
+The tracing interpreter will then effectively add the values of these variables
+to the position key. This means that the loop will only be considered to be
+closed if the variables that make up the program counter at the language
+interpreter level are the same a second time. Such a loop is a loop of the user
+program. The program counter of the language interpreter can only be the same a
+second time after an instruction in the user program sets it to an earlier
+value. This happens only at backward jumps in the language interpreter. That
+means that the tracing interpreter needs to check for a closed loop only when it
+encounters a backward jump in the language interpreter. Again the tracing JIT
+cannot know where the backward branch is located, so it needs to be told with
+the help of a hint by the author of the language interpreter.
+
+The condition for reusing already existing machine code needs to be adapted to
+this new situation. In a classical tracing JIT there is at most one piece of
+assembler code per loop of the jitted program, which in our case is the language
+interpreter. When applying the tracing JIT to the language interpreter as
+described so far, \emph{all} pieces of assembler code correspond to the bytecode
+dispatch loop of the language interpreter. They correspond to different
+unrollings and paths of that loop though. To figure out which of them to use
+when trying to enter assembler code again, the program counter of the language
+interpreter needs to be checked. If it corresponds to the position key of one of
+the pieces of assembler code, then this assembler code can be entered. This
+check again only needs to be performed at the backward branches of the language
+interpreter.
+
+There is a similar conceptual problem about the point where tracing is started.
+Tracing starts when the tracing interpreter sees one particular loop often
+enough. This loop is always going to be the bytecode dispatch loop of the
+language interpreter, so the tracing interpreter will start tracing all the
+time. This is not sensible. It makes more sense to start tracing only if a
+particular loop in the user program is seen often enough. Thus we
+need to change the lightweight profiling to identify the loops of the user
+program. Therefore profiling is also done at the backward branches of the
+language interpreter, using one counter per seen program counter of the language
+interpreter.
+
+\begin{figure}
+\input{code/tlr-paper-full.py}
+\caption{Simple bytecode interpreter with hints applied}
+\label{fig:tlr-full}
+\end{figure}
+
+Let's look at which hints would need to be applied to the example interpreter
+from Figure \ref{fig:tlr-basic}. The basic thing needed to apply hints is a
+subclass of \texttt{JitDriver} that lists all the variables of the bytecode
+loop. The variables are classified into two groups, red variables and green
+variables. The green variables are those that the tracing JIT should consider to
+be part of the program counter of the language interpreter. In the case of the
+example, the \texttt{pc} variable is obviously part of the program counter.
+However, the \texttt{bytecode} variable is also counted as green, since the
+\texttt{pc} variable is meaningless without the knowledge of which bytecode
+string is currently being interpreted. All other variables are red.
+
+In addition to the classification of the variables, there are two methods of
+\texttt{JitDriver} that need to be called. Both of them get as arguments the
+current values of the variables listed in the definition of the driver. The
+first one is \texttt{jit\_merge\_point} which needs to be put at the beginning
+of the body of the bytecode dispatch loop. The other, more interesting one, is
+\texttt{can\_enter\_jit}. This method needs to be called at the end of any
+instruction that can set the program counter of the language interpreter to an
+earlier value. For the example this is only the \texttt{JUMP\_IF\_A}
+instruction, and only if it is actually a backward jump. The place where this
+method is called is where the language interpreter performs profiling to decide
+when to start tracing. It is also the place where the tracing JIT checks
+whether a loop is closed. This is considered to be the case when the values of
+the "green" variables are the same as at an earlier call to the
+\texttt{can\_enter\_jit} method.
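+
+The effect of the two hints can be illustrated with a toy stand-in for the
+driver. The sketch below is not PyPy's \texttt{JitDriver}: the class
+\texttt{ToyJitDriver}, its threshold and the numeric opcode values are
+assumptions made for this illustration. The stand-in only performs the
+per-program-counter profiling and the closed-loop check. It produces no trace.

```python
# The opcode numbering is an assumption made for this sketch.
MOV_A_R, MOV_R_A, DECR_A, ADD_R_TO_A, JUMP_IF_A, RETURN_A = range(6)


class ToyJitDriver:
    """Hypothetical stand-in: profiles green keys and detects closed loops."""
    greens = ["pc", "bytecode"]
    reds = ["a", "regs"]

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.counters = {}        # green key -> how often it was seen
        self.tracing_key = None   # green key at which tracing started
        self.closed_loops = []    # green keys of the loops that were closed

    def jit_merge_point(self, **variables):
        pass  # marks the start of the dispatch loop; nothing to do here

    def can_enter_jit(self, **variables):
        key = tuple(variables[name] for name in self.greens)
        if self.tracing_key is not None:
            if key == self.tracing_key:   # same green values: loop closed
                self.closed_loops.append(key)
                self.tracing_key = None
            return
        self.counters[key] = self.counters.get(key, 0) + 1
        if self.counters[key] == self.threshold:
            self.tracing_key = key        # hot user loop: start tracing


driver = ToyJitDriver()


def interpret(bytecode, a):
    regs = [0] * 256
    pc = 0
    while True:
        driver.jit_merge_point(bytecode=bytecode, pc=pc, a=a, regs=regs)
        opcode = bytecode[pc]
        pc += 1
        if opcode == JUMP_IF_A:
            target = bytecode[pc]
            pc += 1
            if a:
                if target < pc:  # only backward jumps can close a loop
                    driver.can_enter_jit(bytecode=bytecode, pc=target,
                                         a=a, regs=regs)
                pc = target
        elif opcode == MOV_A_R:
            regs[bytecode[pc]] = a
            pc += 1
        elif opcode == MOV_R_A:
            a = regs[bytecode[pc]]
            pc += 1
        elif opcode == ADD_R_TO_A:
            a += regs[bytecode[pc]]
            pc += 1
        elif opcode == DECR_A:
            a -= 1
        elif opcode == RETURN_A:
            return a


# The square program of Figure \ref{fig:square}; the loop starts at position 4.
SQUARE = bytes([
    MOV_A_R, 0, MOV_A_R, 1,
    MOV_R_A, 0, DECR_A, MOV_A_R, 0,
    MOV_R_A, 2, ADD_R_TO_A, 1, MOV_A_R, 2,
    MOV_R_A, 0, JUMP_IF_A, 4,
    MOV_R_A, 2, RETURN_A,
])

result = interpret(SQUARE, 5)
```

+In this run the backward jump is taken four times. The second time makes the
+loop hot, and the third time closes the loop with the green key
+\texttt{(4, SQUARE)}.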
+
+For the small example the hints look like a lot of work. However, the number of
+hints is essentially constant no matter how large the interpreter is, which
+makes it seem less significant for larger interpreters.
+
+When executing the Square function of Figure \ref{fig:square}, the profiling
+will identify the loop in the square function to be hot, and start tracing. It
+traces the execution of the interpreter running the loop of the square function
+for one iteration, thus unrolling the interpreter loop of the example
+interpreter eight times. The resulting trace can be seen in Figure 
+\ref{fig:trace-no-green-folding}.
+
+\begin{figure}
+\input{code/no-green-folding.txt}
+\caption{Trace when executing the Square function of Figure \ref{fig:square},
+with the corresponding bytecodes as comments.}
+\label{fig:trace-no-green-folding}
+\end{figure}
+
+XXX summarize at which points the tracing interpreter needed changing
+XXX all changes only to the position key and when to enter/leave the tracer!
+XXX tracing remains essentially the same
+
+\subsection{Improving the Result}
+
+The critical problem of tracing the execution of just one bytecode has been
+solved: the loop corresponds exactly to the loop in the square function.
+However, the resulting trace is a bit too long. Most of its operations are not
+actually doing any computation that is part of the square function. Instead,
+they manipulate the data structures of the language interpreter. While this is
+to be expected, given that the tracing interpreter looks at the execution of the
+language interpreter, it would still be nicer if some of these operations could
+be removed.
+
+The simple insight into how to greatly improve the situation is that most of the
+operations in the trace are actually concerned with manipulating the
+bytecode and the program counter. Those are stored in variables that are part of
+the position key (they are "green"), which means that the tracer checks that they
+are some fixed value at the beginning of the loop. In the example the check
+would be that the \texttt{bytecode} variable is the bytecode string
+corresponding to the square function and that the \texttt{pc} variable is
+\texttt{4}. Therefore it is possible to constant-fold computations on them away,
+as long as the operations are side-effect free. Since strings are immutable in
+Python, it is possible to constant-fold the \texttt{strgetitem} operation. The
+\texttt{int\_add} operations can be folded anyway.
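+
+The folding step can be sketched as a single pass over a trace. The trace
+representation used here (tuples of result variable, operation name and
+arguments, with variables written as strings), the operation name
+\texttt{setitem}, the opcode numbering and the function \texttt{fold\_green}
+are assumptions of this sketch. Guards on the opcode are left out for brevity.

```python
# Side-effect free operations that may be folded when all of their
# arguments are constants.
PURE_OPS = {
    "strgetitem": lambda s, i: s[i],
    "int_add": lambda x, y: x + y,
}


def fold_green(trace, green_values):
    """Remove pure operations that only depend on green variables.

    Variables are strings, constants are any other Python value.
    """
    known = dict(green_values)   # variable name -> constant value
    residual = []
    for result, opname, args in trace:
        # Replace every variable whose value is known by that constant.
        resolved = [known.get(arg, arg) if isinstance(arg, str) else arg
                    for arg in args]
        all_constant = not any(isinstance(arg, str) for arg in resolved)
        if opname in PURE_OPS and all_constant:
            known[result] = PURE_OPS[opname](*resolved)
        else:
            residual.append((result, opname, resolved))
    return residual


# "DECR_A; MOV_A_R 0" under an assumed numbering (DECR_A = 2, MOV_A_R = 0).
BYTECODE = bytes([2, 0, 0])

trace = [
    ("opcode0", "strgetitem", ["bytecode", "pc0"]),
    ("pc1", "int_add", ["pc0", 1]),
    ("a1", "int_sub", ["a0", 1]),              # the actual DECR_A
    ("opcode1", "strgetitem", ["bytecode", "pc1"]),
    ("pc2", "int_add", ["pc1", 1]),
    ("n0", "strgetitem", ["bytecode", "pc2"]),
    ("pc3", "int_add", ["pc2", 1]),
    (None, "setitem", ["regs", "n0", "a1"]),   # the actual MOV_A_R 0
]

residual = fold_green(trace, {"bytecode": BYTECODE, "pc0": 0})
```

+Of the eight operations only the two that work on the red variables
+\texttt{a} and \texttt{regs} remain in the residual trace.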
+
+With this optimization enabled, the trace looks as in Figure
+\ref{fig:trace-full}. Now a lot of the language interpreter is actually gone
+from the trace and what is left corresponds very closely to the loop of the
+square function. The only vestige of the language interpreter is the fact that
+the register list is still used to store the state of the computation. This
+could be removed by some other optimization, but is maybe not really all that
+bad anyway (in fact we have an experimental optimization that does exactly that,
+but it is not finished).
+
+\begin{figure}
+\input{code/full.txt}
+\caption{Trace when executing the Square function of Figure \ref{fig:square},
+with the corresponding bytecodes as comments. The constant-folding of operations
+on green variables is enabled.}
+\label{fig:trace-full}
+\end{figure}
+
+
+
+%- problem: typical bytecode loops don't follow the general assumption of tracing
+%- needs to unroll bytecode loop
+%    - how often to unroll
+%    - when to start tracing?
+%    - unroll exactly so that unrolled loop corresponds to loop of the user
+%      program
+%- how to improve matters: introducing merge keys
+%- constant-folding of operations on green things
+%    - similarities to BTA of partial evaluation
+
+\section{Implementation Issues}
+\label{sect:implementation}
+
+In this section we will describe some of the practical issues when implementing
+the scheme described in the last section in PyPy. In particular we will describe
+some of the problems of integrating the various parts with each other.
+
+The first integration problem is how to \emph{not} integrate the tracing JIT at
+all. It should be possible to choose when the interpreter is translated to C
+whether the JIT should be built in or not. If the JIT is not enabled, all the
+hints that are possibly in the interpreter source are just ignored by the
+translation process. In this way, the result of the translation is identical to
+what it would be if no hints were present in the interpreter at all.
+
+If the JIT is enabled, things are more interesting. A classical tracing JIT will
+interpret the program it is running until a common loop is identified, at which
+point tracing and ultimately assembler generation starts. The tracing JIT in
+PyPy is operating on the language interpreter, which is written in RPython. But
+RPython programs are translatable to C. This means that interpreting the
+language interpreter before a common loop is found is clearly not desirable,
+since the overhead of this double-interpretation would be far too large to be
+practical.
+
+What is done instead is that the language interpreter keeps running as a C
+program, until a common loop in the user program is found. To identify loops the
+C version of the language interpreter is generated in such a way that at the
+place that corresponds to the \texttt{can\_enter\_jit} hint profiling is
+performed using the program counter of the language interpreter. Apart from this
+bit of profiling, the language interpreter behaves in just the same way as
+without a JIT.
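+
+As a rough illustration of this profiling step, the following Python sketch
+counts how often each program counter value is seen at the
+\texttt{can\_enter\_jit} hint and reports when a loop becomes hot. The class
+name \texttt{LoopProfiler}, the method signature and the threshold value are
+assumptions made for this example and do not describe the actual PyPy code.
+

```python
# Hypothetical sketch of hot-loop detection. All names and the threshold
# are illustrative assumptions, not PyPy's implementation.
from collections import defaultdict

HOT_THRESHOLD = 3  # assumed value, kept small for the example


class LoopProfiler:
    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counters = defaultdict(int)  # program counter -> visit count

    def can_enter_jit(self, pc):
        """Record one visit to the loop header at `pc`.

        Returns True exactly once per `pc`: on the visit where the counter
        reaches the threshold, which is when tracing would be started.
        """
        self.counters[pc] += 1
        return self.counters[pc] == self.threshold


profiler = LoopProfiler()
# Simulate the interpreter reaching the backward jump at pc=4 five times.
results = [profiler.can_enter_jit(4) for _ in range(5)]
```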
+
+When a hot loop in the user program is identified, tracing is started. The
+tracing interpreter is invoked to start tracing the language interpreter that is
+running the user program. Of course the tracing interpreter cannot actually
+trace the execution of the C representation of the language interpreter. Instead
+it takes the state of the execution of the language interpreter and starts
+tracing using a bytecode representation of the language interpreter. That means
+there are two ``versions'' of the language interpreter embedded in the final
+executable of the VM: On the one hand it is there as executable machine code, on
+the other hand as bytecode for the tracing interpreter. It also means that
+tracing is costly as it incurs exactly a double interpretation overhead.
+
+From then on things proceed as described in Section \ref{sect:tracing}. The
+tracing interpreter tries to find a loop in the user program. If it finds one,
+it produces machine code for that loop, which is then executed immediately.
+The machine code is executed until a guard fails. Then the
+execution should fall back to normal interpretation by the language interpreter.
+This falling back is possibly a complex process, since the guard failure can
+have occurred arbitrarily deep in a helper function of the language interpreter,
+which would make it hard to rebuild the state of the language interpreter and
+let it run from that point (e.g. this would involve building a potentially deep
+C stack). Instead the falling back is achieved by a special \emph{fallback
+interpreter} which runs the language interpreter and the user program from the
+point of the guard failure. The fallback interpreter is essentially a variant of
+the tracing interpreter that does not keep a trace. The fallback interpreter
+runs until execution reaches a safe point where it is easy to let the C version
+of the language interpreter resume its operation. Usually this means that the
+fallback interpreter executes at most one bytecode operation of the language
+interpreter. After the language interpreter takes over again, the whole process
+starts again.
+
+\subsection{Various Issues}
+
+This section will hint at some other implementation issues and optimizations
+that we have done that are beyond the scope of this paper (and will be the
+subject of a later publication).
+
+\textbf{Assembler Backends:} The tracing interpreter uses a well-defined
+interface to an assembler backend for code generation. This makes it possible to
+easily port the tracing JIT to various architectures (including, we hope, to
+virtual machines such as the JVM where the backend could generate bytecode at
+runtime).
+
+\textbf{Trace Trees:} This paper ignored the problem of guards that fail in a
+large percentage of cases because there are several equally likely paths through
+a loop. Always falling back to interpretation in such cases is of course not
+practicable. Therefore we also start tracing from guards that have failed many
+times and produce assembler code for that path.
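+
+The idea of starting a new trace from a frequently failing guard can be sketched
+in Python as follows. The \texttt{Guard} class, its failure threshold and the
+\texttt{compile\_bridge} callback are hypothetical names introduced only for
+this sketch. They are not taken from the PyPy sources.
+

```python
# Hypothetical sketch: a guard that counts its failures and, once it has
# failed often enough, gets code attached for the alternative path.
class Guard:
    def __init__(self, name, threshold=2):
        self.name = name
        self.threshold = threshold  # assumed value, small for the example
        self.failures = 0
        self.bridge = None  # stands in for compiled code of the other path

    def on_failure(self, compile_bridge):
        """Decide what happens when this guard fails.

        Returns "bridge" if code for the alternative path already exists,
        "compile" if this failure triggers producing it, and "fallback"
        if execution simply returns to interpretation.
        """
        if self.bridge is not None:
            return "bridge"
        self.failures += 1
        if self.failures >= self.threshold:
            self.bridge = compile_bridge(self)
            return "compile"
        return "fallback"


guard = Guard("is_even")
outcomes = [guard.on_failure(lambda g: "code for " + g.name) for _ in range(4)]
```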
+
+\textbf{Allocation Removal:} A key optimization for making the approach
+produce good code for more complex dynamic languages is to perform escape
+analysis on the loop operations after tracing has been performed. In this way all
+objects that are allocated during the loop and don't actually escape the loop do
+not need to be allocated on the heap at all but can be exploded into their
+respective fields.  This is very helpful for dynamic languages where primitive
+types are often boxed, as the constant allocation of intermediate results is
+very costly.
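+
+The effect can be illustrated with a small hand-written Python example. The
+first function allocates a box for every intermediate result, as a naive
+interpreter for a dynamic language would. The second shows what the loop could
+look like after the boxes that do not escape have been replaced by their
+fields. The class \texttt{BoxedInt} and both functions are made up for this
+illustration, and the transformation is done by hand here, not by the JIT.
+

```python
# Illustrative only: hand-written "before" and "after" versions of a loop.
class BoxedInt:
    allocations = 0  # counts constructed boxes, for illustration only

    def __init__(self, value):
        BoxedInt.allocations += 1
        self.value = value


def sum_boxed(n):
    """Sum 0..n-1, allocating a new box for every intermediate result."""
    total = BoxedInt(0)
    i = BoxedInt(0)
    while i.value < n:
        total = BoxedInt(total.value + i.value)
        i = BoxedInt(i.value + 1)
    return total


def sum_exploded(n):
    """Same computation with the non-escaping boxes replaced by their field."""
    total_value = 0
    i_value = 0
    while i_value < n:
        total_value = total_value + i_value
        i_value = i_value + 1
    # Only the result escapes the loop, so only it needs to be allocated.
    return BoxedInt(total_value)
```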
+
+\textbf{Optimizing Frame Objects:} One problem with the removal of allocations
+is that many dynamic languages are so reflective that they allow the
+introspection of the frame object that the interpreter uses to store local
+variables (e.g. Smalltalk, Python). This means that intermediate results always
+escape because they are stored into the frame object, rendering the allocation
+removal optimization ineffective. To remedy this problem we make it possible to
+update the frame object lazily only when it is actually accessed from outside of
+the code generated by the JIT.
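+
+A minimal Python sketch of such lazy updating is shown below. The class
+\texttt{LazyFrame} and its methods are hypothetical and only meant to convey
+the idea: while JIT-generated code is running, the up-to-date values live
+outside the frame, and the frame is brought up to date only when someone
+inspects it.
+

```python
# Hypothetical sketch of a lazily updated frame object.
class LazyFrame:
    def __init__(self, local_values):
        self._locals = list(local_values)  # possibly stale copy
        self._live = None  # set while JIT code keeps the values elsewhere
        self.forced = 0    # how often the frame had to be brought up to date

    def enter_jit(self, live_values):
        """Record where the JIT-generated code keeps the current values."""
        self._live = live_values

    def get_locals(self):
        """Introspection entry point: update the frame only when asked."""
        if self._live is not None:
            self._locals = list(self._live)
            self.forced += 1
        return self._locals


registers = [0, 0]  # stands in for values held by the generated code
frame = LazyFrame(registers)
frame.enter_jit(registers)
for i in range(1000):
    registers[0] += i  # the loop never writes to the frame itself
stale_copy = list(frame._locals)
seen_from_outside = frame.get_locals()
```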
+
+\section{Evaluation}
+
+\label{sect:evaluation}
+%- benchmarks
+%    - running example
+%    - gameboy?
+
+\section{Related Work}
+
+% dynamorio stuff
+% partial evaluation
+% XXX
+
+\section{Conclusion and Next Steps}
+
+%\begin{verbatim}
+%- next steps:
+%  - Apply to other things, like smalltalk
+%- conclusions
+% - advantages + disadvantages in the meta-level approach
+% - advantages are that the complex operations that occur in dynamic languages
+%   are accessible to the tracer
+\cite{bolz_back_2008}
+\cite{Psyco}
+
+\bigskip
+
+\bibliographystyle{abbrv}
+\bibliography{paper}
+
+\end{document}


