[pypy-commit] pypy disable_pythonapi: merge default into branch

mattip noreply at buildbot.pypy.org
Mon Jun 23 21:27:07 CEST 2014


Author: mattip <matti.picus at gmail.com>
Branch: disable_pythonapi
Changeset: r72180:83ee399f97bb
Date: 2014-06-23 22:25 +0300
http://bitbucket.org/pypy/pypy/changeset/83ee399f97bb/

Log:	merge default into branch

diff too long, truncating to 2000 out of 2132 lines

diff --git a/pypy/doc/extradoc.rst b/pypy/doc/extradoc.rst
--- a/pypy/doc/extradoc.rst
+++ b/pypy/doc/extradoc.rst
@@ -8,6 +8,9 @@
 *Articles about PyPy published so far, most recent first:* (bibtex_ file)
 
 
+* `A Way Forward in Parallelising Dynamic Languages`_,
+  R. Meier, A. Rigo
+
 * `Runtime Feedback in a Meta-Tracing JIT for Efficient Dynamic Languages`_,
   C.F. Bolz, A. Cuni, M. Fijalkowski, M. Leuschel, S. Pedroni, A. Rigo
 
@@ -71,6 +74,7 @@
 
 
 .. _bibtex: https://bitbucket.org/pypy/extradoc/raw/tip/talk/bibtex.bib
+.. _`A Way Forward in Parallelising Dynamic Languages`: https://bitbucket.org/pypy/extradoc/raw/extradoc/talk/icooolps2014/position-paper.pdf
 .. _`Runtime Feedback in a Meta-Tracing JIT for Efficient Dynamic Languages`: https://bitbucket.org/pypy/extradoc/raw/extradoc/talk/icooolps2011/jit-hints.pdf
 .. _`Allocation Removal by Partial Evaluation in a Tracing JIT`: https://bitbucket.org/pypy/extradoc/raw/extradoc/talk/pepm2011/bolz-allocation-removal.pdf
 .. _`Towards a Jitting VM for Prolog Execution`: http://www.stups.uni-duesseldorf.de/mediawiki/images/a/a7/Pub-BoLeSch2010.pdf
@@ -93,6 +97,11 @@
 Talks and Presentations 
 ----------------------------------
 
+*This part is no longer updated.*  The complete list is here__ (in
+alphabetical order).
+
+.. __: https://bitbucket.org/pypy/extradoc/src/extradoc/talk/
+
 Talks in 2010
 +++++++++++++
 
diff --git a/pypy/doc/release-pypy3-2.3.1.rst b/pypy/doc/release-pypy3-2.3.1.rst
new file mode 100644
--- /dev/null
+++ b/pypy/doc/release-pypy3-2.3.1.rst
@@ -0,0 +1,69 @@
+=====================
+PyPy3 2.3.1 - Fulcrum
+=====================
+
+We're pleased to announce the first stable release of PyPy3. PyPy3
+targets Python 3 (3.2.5) compatibility.
+
+We would like to thank all of the people who donated_ to the `py3k proposal`_
+for supporting the work that went into this.
+
+You can download the PyPy3 2.3.1 release here:
+
+    http://pypy.org/download.html#pypy3-2-3-1
+
+Highlights
+==========
+
+* The first stable release of PyPy3: support for Python 3!
+
+* The stdlib has been updated to Python 3.2.5
+
+* Additional support for the u'unicode' syntax (`PEP 414`_) from Python 3.3
+
+* Updates from the default branch, such as incremental GC and various JIT
+  improvements
+
+* Resolved some notable JIT performance regressions from PyPy2:
+
+ - Re-enabled the previously disabled collection (list/dict/set) strategies
+
+ - Resolved performance of iteration over range objects
+
+ - Resolved handling of Python 3's exception __context__ unnecessarily forcing
+   frame object overhead
+
+.. _`PEP 414`: http://legacy.python.org/dev/peps/pep-0414/
+
+What is PyPy?
+==============
+
+PyPy is a very compliant Python interpreter, almost a drop-in replacement for
+CPython 2.7.6 or 3.2.5. It's fast due to its integrated tracing JIT compiler.
+
+This release supports x86 machines running Linux 32/64, Mac OS X 64, Windows,
+and OpenBSD,
+as well as newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux.
+
+While we support 32-bit Python on Windows, work on a native 64-bit
+Windows build is still stalled. We would welcome a volunteer
+to `handle that`_.
+
+.. _`handle that`: http://doc.pypy.org/en/latest/windows.html#what-is-missing-for-a-full-64-bit-translation
+
+How to use PyPy?
+=================
+
+We suggest using PyPy from a `virtualenv`_. Once you have a virtualenv
+installed, you can follow the instructions in the `pypy documentation`_
+on how to proceed. This document also covers other `installation schemes`_.
+
+.. _donated: http://morepypy.blogspot.com/2012/01/py3k-and-numpy-first-stage-thanks-to.html
+.. _`py3k proposal`: http://pypy.org/py3donate.html
+.. _`pypy documentation`: http://doc.pypy.org/en/latest/getting-started.html#installing-using-virtualenv
+.. _`virtualenv`: http://www.virtualenv.org/en/latest/
+.. _`installation schemes`: http://doc.pypy.org/en/latest/getting-started.html#installing-pypy
+
+
+Cheers,
+the PyPy team
diff --git a/pypy/doc/whatsnew-head.rst b/pypy/doc/whatsnew-head.rst
--- a/pypy/doc/whatsnew-head.rst
+++ b/pypy/doc/whatsnew-head.rst
@@ -22,3 +22,11 @@
 conditional_calls). I would expect the net result to be a slight
 slow-down on some simple benchmarks and a speed-up on bigger
 programs.
+
+.. branch: ec-threadlocal
+Change the executioncontext's lookup to be done by reading a
+thread-local variable (which is implemented in C using '__thread' if
+possible, and pthread_getspecific() otherwise). On Linux x86 and
+x86-64, the JIT backend has a special optimization that lets it emit
+a single MOV directly from a %gs- or %fs-based address. This seems
+to give a good boost in performance.
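Editorial sketch (not part of the changeset): the ec-threadlocal entry above describes each thread reading its own execution context from a thread-local slot that is filled in explicitly when the thread starts. The plain-Python analogue below uses the standard threading.local for that slot. The class names and demo() are hypothetical stand-ins chosen for illustration; this is not the RPython implementation and it says nothing about the JIT's MOV optimization.

```python
import threading


class ExecutionContext:
    """Hypothetical stand-in for a per-thread interpreter state object."""

    def __init__(self, owner_ident):
        self.owner_ident = owner_ident


class ThreadLocals:
    """Keeps one ExecutionContext per thread in a thread-local slot."""

    def __init__(self):
        self._local = threading.local()

    def enter_thread(self):
        # Called once when a thread is about to start running.
        self._local.ec = ExecutionContext(threading.get_ident())

    def get_ec(self):
        # A plain thread-local read; None if enter_thread() was not called.
        return getattr(self._local, "ec", None)

    def leave_thread(self):
        # Called when the thread is about to stop.
        self._local.ec = None


def demo():
    tl = ThreadLocals()
    tl.enter_thread()          # main thread registers itself
    seen = {}

    def worker():
        seen["before"] = tl.get_ec()   # nothing registered yet in this thread
        tl.enter_thread()
        seen["worker"] = tl.get_ec()
        tl.leave_thread()
        seen["after"] = tl.get_ec()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return tl.get_ec(), seen
```

Calling demo() returns the main thread's context together with what the worker thread observed; the worker never sees the main thread's context.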
diff --git a/pypy/doc/whatsnew-pypy3-2.3.1.rst b/pypy/doc/whatsnew-pypy3-2.3.1.rst
new file mode 100644
--- /dev/null
+++ b/pypy/doc/whatsnew-pypy3-2.3.1.rst
@@ -0,0 +1,6 @@
+=========================
+What's new in PyPy3 2.3.1
+=========================
+
+.. this is a revision shortly after pypy3-release-2.3.x
+.. startrev: 0137d8e6657d
diff --git a/pypy/goal/targetpypystandalone.py b/pypy/goal/targetpypystandalone.py
--- a/pypy/goal/targetpypystandalone.py
+++ b/pypy/goal/targetpypystandalone.py
@@ -30,8 +30,6 @@
     if w_dict is not None: # for tests
         w_entry_point = space.getitem(w_dict, space.wrap('entry_point'))
         w_run_toplevel = space.getitem(w_dict, space.wrap('run_toplevel'))
-        w_call_finish_gateway = space.wrap(gateway.interp2app(call_finish))
-        w_call_startup_gateway = space.wrap(gateway.interp2app(call_startup))
         withjit = space.config.objspace.usemodules.pypyjit
 
     def entry_point(argv):
@@ -53,7 +51,7 @@
             argv = argv[:1] + argv[3:]
         try:
             try:
-                space.call_function(w_run_toplevel, w_call_startup_gateway)
+                space.startup()
                 w_executable = space.wrap(argv[0])
                 w_argv = space.newlist([space.wrap(s) for s in argv[1:]])
                 w_exitcode = space.call_function(w_entry_point, w_executable, w_argv)
@@ -69,7 +67,7 @@
                 return 1
         finally:
             try:
-                space.call_function(w_run_toplevel, w_call_finish_gateway)
+                space.finish()
             except OperationError, e:
                 debug("OperationError:")
                 debug(" operror-type: " + e.w_type.getname(space))
@@ -184,11 +182,6 @@
                          'pypy_thread_attach': pypy_thread_attach,
                          'pypy_setup_home': pypy_setup_home}
 
-def call_finish(space):
-    space.finish()
-
-def call_startup(space):
-    space.startup()
 
 # _____ Define and setup target ___
 
diff --git a/pypy/interpreter/baseobjspace.py b/pypy/interpreter/baseobjspace.py
--- a/pypy/interpreter/baseobjspace.py
+++ b/pypy/interpreter/baseobjspace.py
@@ -395,6 +395,7 @@
 
     def startup(self):
         # To be called before using the space
+        self.threadlocals.enter_thread(self)
 
         # Initialize already imported builtin modules
         from pypy.interpreter.module import Module
@@ -639,30 +640,33 @@
         """NOT_RPYTHON: Abstract method that should put some minimal
         content into the w_builtins."""
 
-    @jit.loop_invariant
     def getexecutioncontext(self):
         "Return what we consider to be the active execution context."
         # Important: the annotator must not see a prebuilt ExecutionContext:
         # you should not see frames while you translate
         # so we make sure that the threadlocals never *have* an
         # ExecutionContext during translation.
-        if self.config.translating and not we_are_translated():
-            assert self.threadlocals.getvalue() is None, (
-                "threadlocals got an ExecutionContext during translation!")
-            try:
-                return self._ec_during_translation
-            except AttributeError:
-                ec = self.createexecutioncontext()
-                self._ec_during_translation = ec
+        if not we_are_translated():
+            if self.config.translating:
+                assert self.threadlocals.get_ec() is None, (
+                    "threadlocals got an ExecutionContext during translation!")
+                try:
+                    return self._ec_during_translation
+                except AttributeError:
+                    ec = self.createexecutioncontext()
+                    self._ec_during_translation = ec
+                    return ec
+            else:
+                ec = self.threadlocals.get_ec()
+                if ec is None:
+                    self.threadlocals.enter_thread(self)
+                    ec = self.threadlocals.get_ec()
                 return ec
-        # normal case follows.  The 'thread' module installs a real
-        # thread-local object in self.threadlocals, so this builds
-        # and caches a new ec in each thread.
-        ec = self.threadlocals.getvalue()
-        if ec is None:
-            ec = self.createexecutioncontext()
-            self.threadlocals.setvalue(ec)
-        return ec
+        else:
+            # translated case follows.  self.threadlocals is either from
+            # 'pypy.interpreter.miscutils' or 'pypy.module.thread.threadlocals'.
+            # the result is assumed to be non-null: enter_thread() was called.
+            return self.threadlocals.get_ec()
 
     def _freeze_(self):
         return True
diff --git a/pypy/interpreter/miscutils.py b/pypy/interpreter/miscutils.py
--- a/pypy/interpreter/miscutils.py
+++ b/pypy/interpreter/miscutils.py
@@ -11,11 +11,11 @@
     """
     _value = None
 
-    def getvalue(self):
+    def get_ec(self):
         return self._value
 
-    def setvalue(self, value):
-        self._value = value
+    def enter_thread(self, space):
+        self._value = space.createexecutioncontext()
 
     def signals_enabled(self):
         return True
diff --git a/pypy/module/_rawffi/interp_rawffi.py b/pypy/module/_rawffi/interp_rawffi.py
--- a/pypy/module/_rawffi/interp_rawffi.py
+++ b/pypy/module/_rawffi/interp_rawffi.py
@@ -508,7 +508,10 @@
     argshapes = unpack_argshapes(space, w_args)
     resshape = unpack_resshape(space, w_res)
     ffi_args = [shape.get_basic_ffi_type() for shape in argshapes]
-    ffi_res = resshape.get_basic_ffi_type()
+    if resshape is not None:
+        ffi_res = resshape.get_basic_ffi_type()
+    else:
+        ffi_res = ffi_type_void
     try:
         ptr = RawFuncPtr('???', ffi_args, ffi_res, rffi.cast(rffi.VOIDP, addr),
                          flags)
diff --git a/pypy/module/_rawffi/test/test__rawffi.py b/pypy/module/_rawffi/test/test__rawffi.py
--- a/pypy/module/_rawffi/test/test__rawffi.py
+++ b/pypy/module/_rawffi/test/test__rawffi.py
@@ -353,6 +353,11 @@
         assert ptr[0] == rawcall.buffer
         ptr.free()
 
+    def test_raw_callable_returning_void(self):
+        import _rawffi
+        _rawffi.FuncPtr(0, [], None)
+        # assert did not crash
+
     def test_short_addition(self):
         import _rawffi
         lib = _rawffi.CDLL(self.lib_name)
diff --git a/pypy/module/pypyjit/test_pypy_c/test_call.py b/pypy/module/pypyjit/test_pypy_c/test_call.py
--- a/pypy/module/pypyjit/test_pypy_c/test_call.py
+++ b/pypy/module/pypyjit/test_pypy_c/test_call.py
@@ -71,13 +71,13 @@
                                     "getfield_gc", "guard_value",
                                     "guard_not_invalidated"]
         ops = entry_bridge.ops_by_id('add', opcode='LOAD_GLOBAL')
-        assert log.opnames(ops) == ["guard_not_invalidated"]
+        assert log.opnames(ops) == []
         #
         ops = entry_bridge.ops_by_id('call', opcode='LOAD_GLOBAL')
         assert log.opnames(ops) == []
         #
         assert entry_bridge.match_by_id('call', """
-            p38 = call(ConstClass(getexecutioncontext), descr=<Callr . EF=1>)
+            p38 = call(ConstClass(_ll_0_threadlocalref_getter___), descr=<Callr . EF=1 OS=5>)
             p39 = getfield_gc(p38, descr=<FieldP pypy.interpreter.executioncontext.ExecutionContext.inst_topframeref .*>)
             i40 = force_token()
             p41 = getfield_gc(p38, descr=<FieldP pypy.interpreter.executioncontext.ExecutionContext.inst_w_tracefunc .*>)
@@ -435,7 +435,7 @@
             p26 = getfield_gc(p7, descr=<FieldP pypy.objspace.std.dictmultiobject.W_DictMultiObject.inst_strategy .*>)
             guard_value(p26, ConstPtr(ptr27), descr=...)
             guard_not_invalidated(descr=...)
-            p29 = call(ConstClass(getexecutioncontext), descr=<Callr . EF=1>)
+            p29 = call(ConstClass(_ll_0_threadlocalref_getter___), descr=<Callr . EF=1 OS=5>)
             p30 = getfield_gc(p29, descr=<FieldP pypy.interpreter.executioncontext.ExecutionContext.inst_topframeref .*>)
             p31 = force_token()
             p32 = getfield_gc(p29, descr=<FieldP pypy.interpreter.executioncontext.ExecutionContext.inst_w_tracefunc .*>)
@@ -448,7 +448,6 @@
             i39 = getfield_gc_pure(p37, descr=<FieldS pypy.objspace.std.intobject.W_IntObject.inst_intval .*>)
             i40 = int_add_ovf(i22, i39)
             guard_no_overflow(descr=...)
-            guard_not_invalidated(descr=...)
             --TICK--
         """)
 
diff --git a/pypy/module/pypyjit/test_pypy_c/test_string.py b/pypy/module/pypyjit/test_pypy_c/test_string.py
--- a/pypy/module/pypyjit/test_pypy_c/test_string.py
+++ b/pypy/module/pypyjit/test_pypy_c/test_string.py
@@ -101,64 +101,38 @@
         log = self.run(main, [1000])
         assert log.result == main(1000)
         loop, = log.loops_by_filename(self.filepath)
-        # NB: since the stringbuilder2-perf branch we get more operations than
-        # before, but a lot less branches that might fail randomly.
         assert loop.match("""
-            i100 = int_gt(i95, 0)
-            guard_true(i100, descr=...)
+            i79 = int_gt(i74, 0)
+            guard_true(i79, descr=...)
             guard_not_invalidated(descr=...)
-            p101 = call(ConstClass(ll_int2dec__Signed), i95, descr=<Callr . i EF=3>)
+            p80 = call(ConstClass(ll_int2dec__Signed), i74, descr=<Callr . i EF=3>)
             guard_no_exception(descr=...)
-            i102 = strlen(p101)
-            i103 = int_is_true(i102)
-            guard_true(i103, descr=...)
-            i104 = strgetitem(p101, 0)
-            i105 = int_eq(i104, 45)
-            guard_false(i105, descr=...)
-            i106 = int_neg(i102)
-            i107 = int_gt(i102, 23)
-            p108 = new(descr=<SizeDescr .+>)
-            p110 = newstr(23)
+            i85 = strlen(p80)
+            p86 = new(descr=<SizeDescr .+>)
+            p88 = newstr(23)
             setfield_gc(..., descr=<Field. stringbuilder.+>)
             setfield_gc(..., descr=<Field. stringbuilder.+>)
             setfield_gc(..., descr=<Field. stringbuilder.+>)
-            cond_call(i107, ConstClass(stringbuilder_append_overflow__stringbuilderPtr_rpy_stringPtr_Signed), p108, p101, i102, descr=<Callv 0 rri EF=4>)
+            call(ConstClass(ll_append_res0__stringbuilderPtr_rpy_stringPtr), p86, p80, descr=<Callv 0 rr EF=4>)
             guard_no_exception(descr=...)
-            i111 = getfield_gc(p108, descr=<FieldS stringbuilder.skip .+>)
-            i112 = int_sub(i102, i111)
-            i113 = getfield_gc(p108, descr=<FieldS stringbuilder.current_pos .+>)
-            p114 = getfield_gc(p108, descr=<FieldP stringbuilder.current_buf .+>)
-            copystrcontent(p101, p114, i111, i113, i112)
-            i115 = int_add(i113, i112)
-            i116 = getfield_gc(p108, descr=<FieldS stringbuilder.current_end .+>)
-            setfield_gc(p108, i115, descr=<FieldS stringbuilder.current_pos .+>)
-            i117 = int_eq(i115, i116)
-            cond_call(i117, ConstClass(stringbuilder_grow__stringbuilderPtr_Signed), p108, 1, descr=<Callv 0 ri EF=4>)
+            i89 = getfield_gc(p86, descr=<FieldS stringbuilder.current_pos .+>)
+            i90 = getfield_gc(p86, descr=<FieldS stringbuilder.current_end .+>)
+            i91 = int_eq(i89, i90)
+            cond_call(i91, ConstClass(ll_grow_by__stringbuilderPtr_Signed), p86, 1, descr=<Callv 0 ri EF=4>)
             guard_no_exception(descr=...)
-            i118 = getfield_gc(p108, descr=<FieldS stringbuilder.current_pos .+>)
-            i119 = int_add(i118, 1)
-            p120 = getfield_gc(p108, descr=<FieldP stringbuilder.current_buf .+>)
-            strsetitem(p120, i118, 32)
-            i121 = getfield_gc(p108, descr=<FieldS stringbuilder.current_end .+>)
-            i122 = int_sub(i121, i119)
-            setfield_gc(..., descr=<FieldS stringbuilder.+>)
-            setfield_gc(..., descr=<FieldS stringbuilder.+>)
-            i123 = int_gt(i102, i122)
-            cond_call(i123, ConstClass(stringbuilder_append_overflow__stringbuilderPtr_rpy_stringPtr_Signed), p108, p101, i102, descr=<Callv 0 rri EF=4>)
+            i92 = getfield_gc(p86, descr=<FieldS stringbuilder.current_pos .+>)
+            i93 = int_add(i92, 1)
+            p94 = getfield_gc(p86, descr=<FieldP stringbuilder.current_buf .+>)
+            strsetitem(p94, i92, 32)
+            setfield_gc(p86, i93, descr=<FieldS stringbuilder.current_pos .+>)
+            call(ConstClass(ll_append_res0__stringbuilderPtr_rpy_stringPtr), p86, p80, descr=<Callv 0 rr EF=4>)
             guard_no_exception(descr=...)
-            i124 = getfield_gc(p108, descr=<FieldS stringbuilder.skip .+>)
-            i125 = int_sub(i102, i124)
-            i126 = getfield_gc(p108, descr=<FieldS stringbuilder.current_pos .+>)
-            p127 = getfield_gc(p108, descr=<FieldP stringbuilder.current_buf .+>)
-            copystrcontent(p101, p127, i124, i126, i125)
-            i128 = int_add(i126, i125)
-            setfield_gc(p108, i128, descr=<FieldS stringbuilder.current_pos .+>)
-            p135 = call(..., descr=<Callr . r EF=4)     # ll_build
+            p95 = call(..., descr=<Callr . r EF=4>)     # ll_build
             guard_no_exception(descr=...)
-            i136 = strlen(p135)
-            i137 = int_add_ovf(i92, i136)
+            i96 = strlen(p95)
+            i97 = int_add_ovf(i71, i96)
             guard_no_overflow(descr=...)
-            i138 = int_sub(i95, 1)
+            i98 = int_sub(i74, 1)
             --TICK--
             jump(..., descr=...)
         """)
diff --git a/pypy/module/thread/__init__.py b/pypy/module/thread/__init__.py
--- a/pypy/module/thread/__init__.py
+++ b/pypy/module/thread/__init__.py
@@ -26,10 +26,11 @@
         "NOT_RPYTHON: patches space.threadlocals to use real threadlocals"
         from pypy.module.thread import gil
         MixedModule.__init__(self, space, *args)
-        prev = space.threadlocals.getvalue()
+        prev_ec = space.threadlocals.get_ec()
         space.threadlocals = gil.GILThreadLocals()
         space.threadlocals.initialize(space)
-        space.threadlocals.setvalue(prev)
+        if prev_ec is not None:
+            space.threadlocals._set_ec(prev_ec)
 
         from pypy.module.posix.interp_posix import add_fork_hook
         from pypy.module.thread.os_thread import reinit_threads
diff --git a/pypy/module/thread/os_thread.py b/pypy/module/thread/os_thread.py
--- a/pypy/module/thread/os_thread.py
+++ b/pypy/module/thread/os_thread.py
@@ -126,6 +126,8 @@
     release = staticmethod(release)
 
     def run(space, w_callable, args):
+        # add the ExecutionContext to space.threadlocals
+        space.threadlocals.enter_thread(space)
         try:
             space.call_args(w_callable, args)
         except OperationError, e:
diff --git a/pypy/module/thread/test/test_gil.py b/pypy/module/thread/test/test_gil.py
--- a/pypy/module/thread/test/test_gil.py
+++ b/pypy/module/thread/test/test_gil.py
@@ -64,13 +64,14 @@
             except Exception, e:
                 assert 0
             thread.gc_thread_die()
+        my_gil_threadlocals = gil.GILThreadLocals()
         def f():
             state.data = []
             state.datalen1 = 0
             state.datalen2 = 0
             state.datalen3 = 0
             state.datalen4 = 0
-            state.threadlocals = gil.GILThreadLocals()
+            state.threadlocals = my_gil_threadlocals
             state.threadlocals.setup_threads(space)
             subident = thread.start_new_thread(bootstrap, ())
             mainident = thread.get_ident()
diff --git a/pypy/module/thread/threadlocals.py b/pypy/module/thread/threadlocals.py
--- a/pypy/module/thread/threadlocals.py
+++ b/pypy/module/thread/threadlocals.py
@@ -1,4 +1,5 @@
 from rpython.rlib import rthread
+from rpython.rlib.objectmodel import we_are_translated
 from pypy.module.thread.error import wrap_thread_error
 from pypy.interpreter.executioncontext import ExecutionContext
 
@@ -13,53 +14,62 @@
     os_thread.bootstrap()."""
 
     def __init__(self):
+        "NOT_RPYTHON"
         self._valuedict = {}   # {thread_ident: ExecutionContext()}
         self._cleanup_()
+        self.raw_thread_local = rthread.ThreadLocalReference(ExecutionContext)
 
     def _cleanup_(self):
         self._valuedict.clear()
         self._mainthreadident = 0
-        self._mostrecentkey = 0        # fast minicaching for the common case
-        self._mostrecentvalue = None   # fast minicaching for the common case
 
-    def getvalue(self):
+    def enter_thread(self, space):
+        "Notification that the current thread is about to start running."
+        self._set_ec(space.createexecutioncontext())
+
+    def _set_ec(self, ec):
         ident = rthread.get_ident()
-        if ident == self._mostrecentkey:
-            result = self._mostrecentvalue
-        else:
-            value = self._valuedict.get(ident, None)
-            # slow path: update the minicache
-            self._mostrecentkey = ident
-            self._mostrecentvalue = value
-            result = value
-        return result
+        if self._mainthreadident == 0 or self._mainthreadident == ident:
+            ec._signals_enabled = 1    # the main thread is enabled
+            self._mainthreadident = ident
+        self._valuedict[ident] = ec
+        # This logic relies on hacks and _make_sure_does_not_move().
+        # It only works because we keep the 'ec' alive in '_valuedict' too.
+        self.raw_thread_local.set(ec)
 
-    def setvalue(self, value):
-        ident = rthread.get_ident()
-        if value is not None:
-            if self._mainthreadident == 0:
-                value._signals_enabled = 1    # the main thread is enabled
-                self._mainthreadident = ident
-            self._valuedict[ident] = value
-        else:
+    def leave_thread(self, space):
+        "Notification that the current thread is about to stop."
+        from pypy.module.thread.os_local import thread_is_stopping
+        ec = self.get_ec()
+        if ec is not None:
             try:
-                del self._valuedict[ident]
-            except KeyError:
-                pass
-        # update the minicache to prevent it from containing an outdated value
-        self._mostrecentkey = ident
-        self._mostrecentvalue = value
+                thread_is_stopping(ec)
+            finally:
+                self.raw_thread_local.set(None)
+                ident = rthread.get_ident()
+                try:
+                    del self._valuedict[ident]
+                except KeyError:
+                    pass
+
+    def get_ec(self):
+        ec = self.raw_thread_local.get()
+        if not we_are_translated():
+            assert ec is self._valuedict.get(rthread.get_ident(), None)
+        return ec
 
     def signals_enabled(self):
-        ec = self.getvalue()
+        ec = self.get_ec()
         return ec is not None and ec._signals_enabled
 
     def enable_signals(self, space):
-        ec = self.getvalue()
+        ec = self.get_ec()
+        assert ec is not None
         ec._signals_enabled += 1
 
     def disable_signals(self, space):
-        ec = self.getvalue()
+        ec = self.get_ec()
+        assert ec is not None
         new = ec._signals_enabled - 1
         if new < 0:
             raise wrap_thread_error(space,
@@ -69,22 +79,15 @@
     def getallvalues(self):
         return self._valuedict
 
-    def leave_thread(self, space):
-        "Notification that the current thread is about to stop."
-        from pypy.module.thread.os_local import thread_is_stopping
-        ec = self.getvalue()
-        if ec is not None:
-            try:
-                thread_is_stopping(ec)
-            finally:
-                self.setvalue(None)
-
     def reinit_threads(self, space):
         "Called in the child process after a fork()"
         ident = rthread.get_ident()
-        ec = self.getvalue()
+        ec = self.get_ec()
+        assert ec is not None
+        old_sig = ec._signals_enabled
         if ident != self._mainthreadident:
-            ec._signals_enabled += 1
+            old_sig += 1
         self._cleanup_()
         self._mainthreadident = ident
-        self.setvalue(ec)
+        self._set_ec(ec)
+        ec._signals_enabled = old_sig
diff --git a/pypy/objspace/fake/objspace.py b/pypy/objspace/fake/objspace.py
--- a/pypy/objspace/fake/objspace.py
+++ b/pypy/objspace/fake/objspace.py
@@ -314,6 +314,9 @@
         t = TranslationContext(config=config)
         self.t = t     # for debugging
         ann = t.buildannotator()
+        def _do_startup():
+            self.threadlocals.enter_thread(self)
+        ann.build_types(_do_startup, [], complete_now=False)
         if func is not None:
             ann.build_types(func, argtypes, complete_now=False)
         if seeobj_w:
diff --git a/pypy/objspace/std/formatting.py b/pypy/objspace/std/formatting.py
--- a/pypy/objspace/std/formatting.py
+++ b/pypy/objspace/std/formatting.py
@@ -379,6 +379,19 @@
         std_wp._annspecialcase_ = 'specialize:argtype(1)'
 
         def std_wp_number(self, r, prefix=''):
+            result = self.result
+            if len(prefix) == 0 and len(r) >= self.width:
+                # this is strictly a fast path: no prefix, and no padding
+                # needed.  It is more efficient code both in the non-jit
+                # case (fewer checks) and in the jit case (uses only
+                # result.append(), and no startswith() if not f_sign and
+                # not f_blank).
+                if self.f_sign and not r.startswith('-'):
+                    result.append(const('+'))
+                elif self.f_blank and not r.startswith('-'):
+                    result.append(const(' '))
+                result.append(const(r))
+                return
             # add a '+' or ' ' sign if necessary
             sign = r.startswith('-')
             if not sign:
@@ -391,7 +404,6 @@
             # do the padding requested by self.width and the flags,
             # without building yet another RPython string but directly
             # by pushing the pad character into self.result
-            result = self.result
             padding = self.width - len(r) - len(prefix)
             if padding <= 0:
                 padding = 0
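Editorial sketch (not part of the changeset): the std_wp_number() hunks above add a fast path that is taken when there is no prefix and the digits already fill the requested width, so only appends are needed. The standalone function below mirrors that shape in plain Python. The name format_number and its keyword arguments are hypothetical, and the general path is simplified to right-justification with spaces only (the real method also handles left-justification and zero padding).

```python
def format_number(digits, width=0, f_sign=False, f_blank=False, prefix=""):
    """Return digits with an optional sign, prefix and left space padding."""
    out = []
    if len(prefix) == 0 and len(digits) >= width:
        # Fast path: no prefix and no padding needed, so only append.
        if f_sign and not digits.startswith("-"):
            out.append("+")
        elif f_blank and not digits.startswith("-"):
            out.append(" ")
        out.append(digits)
        return "".join(out)

    # General path (simplified): add a '+' or ' ' sign if necessary.
    has_sign = digits.startswith("-")
    if not has_sign:
        if f_sign:
            digits = "+" + digits
            has_sign = True
        elif f_blank:
            digits = " " + digits
            has_sign = True

    # Pad on the left, then emit sign, prefix and the remaining digits.
    padding = max(width - len(digits) - len(prefix), 0)
    out.append(" " * padding)
    if has_sign:
        out.append(digits[0])
        digits = digits[1:]
    out.append(prefix)
    out.append(digits)
    return "".join(out)
```

Both paths give the same result whenever the fast-path condition holds; the fast path simply avoids the padding arithmetic and the string concatenation for the sign.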
diff --git a/rpython/config/translationoption.py b/rpython/config/translationoption.py
--- a/rpython/config/translationoption.py
+++ b/rpython/config/translationoption.py
@@ -22,6 +22,12 @@
 
 IS_64_BITS = sys.maxint > 2147483647
 
+SUPPORT__THREAD = (    # whether the particular C compiler supports __thread
+    sys.platform.startswith("linux"))     # Linux works
+    # OS/X doesn't work, because we still target 10.5/10.6 and the
+    # minimum required version is 10.7.  Windows doesn't work.  Please
+    # add other platforms here if it works on them.
+
 MAINDIR = os.path.dirname(os.path.dirname(__file__))
 CACHE_DIR = os.path.realpath(os.path.join(MAINDIR, '_cache'))
 
@@ -156,7 +162,8 @@
     # portability options
     BoolOption("no__thread",
                "don't use __thread for implementing TLS",
-               default=False, cmdline="--no__thread", negation=False),
+               default=not SUPPORT__THREAD, cmdline="--no__thread",
+               negation=False),
     IntOption("make_jobs", "Specify -j argument to make for compilation"
               " (C backend only)",
               cmdline="--make-jobs", default=detect_number_of_processors()),
diff --git a/rpython/jit/backend/llsupport/test/ztranslation_test.py b/rpython/jit/backend/llsupport/test/ztranslation_test.py
--- a/rpython/jit/backend/llsupport/test/ztranslation_test.py
+++ b/rpython/jit/backend/llsupport/test/ztranslation_test.py
@@ -4,6 +4,8 @@
 from rpython.rlib.jit import PARAMETERS, dont_look_inside
 from rpython.rlib.jit import promote
 from rpython.rlib import jit_hooks
+from rpython.rlib.objectmodel import keepalive_until_here
+from rpython.rlib.rthread import ThreadLocalReference
 from rpython.jit.backend.detect_cpu import getcpuclass
 from rpython.jit.backend.test.support import CCompiledMixin
 from rpython.jit.codewriter.policy import StopAtXPolicy
@@ -21,6 +23,7 @@
         # - profiler
         # - full optimizer
         # - floats neg and abs
+        # - threadlocalref_get
 
         class Frame(object):
             _virtualizable_ = ['i']
@@ -28,6 +31,10 @@
             def __init__(self, i):
                 self.i = i
 
+        class Foo(object):
+            pass
+        t = ThreadLocalReference(Foo)
+
         @dont_look_inside
         def myabs(x):
             return abs(x)
@@ -56,6 +63,7 @@
                 k = myabs(j)
                 if k - abs(j):  raise ValueError
                 if k - abs(-j): raise ValueError
+                if t.get().nine != 9: raise ValueError
             return chr(total % 253)
         #
         from rpython.rtyper.lltypesystem import lltype, rffi
@@ -78,8 +86,12 @@
             return res
         #
         def main(i, j):
+            foo = Foo()
+            foo.nine = -(i + j)
+            t.set(foo)
             a_char = f(i, j)
             a_float = libffi_stuff(i, j)
+            keepalive_until_here(foo)
             return ord(a_char) * 10 + int(a_float)
         expected = main(40, -49)
         res = self.meta_interp(main, [40, -49])
diff --git a/rpython/jit/backend/x86/assembler.py b/rpython/jit/backend/x86/assembler.py
--- a/rpython/jit/backend/x86/assembler.py
+++ b/rpython/jit/backend/x86/assembler.py
@@ -2351,10 +2351,29 @@
         assert isinstance(reg, RegLoc)
         self.mc.MOV_rr(reg.value, ebp.value)
 
+    def threadlocalref_get(self, op, resloc):
+        # this function is only called on Linux
+        from rpython.jit.codewriter.jitcode import ThreadLocalRefDescr
+        from rpython.jit.backend.x86 import stmtlocal
+        assert isinstance(resloc, RegLoc)
+        effectinfo = op.getdescr().get_extra_info()
+        assert effectinfo.extradescrs is not None
+        ed = effectinfo.extradescrs[0]
+        assert isinstance(ed, ThreadLocalRefDescr)
+        addr1 = rffi.cast(lltype.Signed, ed.get_tlref_addr())
+        addr0 = stmtlocal.threadlocal_base()
+        addr = addr1 - addr0
+        assert rx86.fits_in_32bits(addr)
+        mc = self.mc
+        mc.writechar(stmtlocal.SEGMENT_TL)     # prefix
+        mc.MOV_rj(resloc.value, addr)
+
+
 genop_discard_list = [Assembler386.not_implemented_op_discard] * rop._LAST
 genop_list = [Assembler386.not_implemented_op] * rop._LAST
 genop_llong_list = {}
 genop_math_list = {}
+genop_tlref_list = {}
 genop_guard_list = [Assembler386.not_implemented_op_guard] * rop._LAST
 
 for name, value in Assembler386.__dict__.iteritems():
diff --git a/rpython/jit/backend/x86/regalloc.py b/rpython/jit/backend/x86/regalloc.py
--- a/rpython/jit/backend/x86/regalloc.py
+++ b/rpython/jit/backend/x86/regalloc.py
@@ -2,7 +2,7 @@
 """ Register allocation scheme.
 """
 
-import os
+import os, sys
 from rpython.jit.backend.llsupport import symbolic
 from rpython.jit.backend.llsupport.descr import (ArrayDescr, CallDescr,
     unpack_arraydescr, unpack_fielddescr, unpack_interiorfielddescr)
@@ -692,6 +692,15 @@
         loc0 = self.xrm.force_result_in_reg(op.result, op.getarg(1))
         self.perform_math(op, [loc0], loc0)
 
+    TLREF_SUPPORT = sys.platform.startswith('linux')
+
+    def _consider_threadlocalref_get(self, op):
+        if self.TLREF_SUPPORT:
+            resloc = self.force_allocate_reg(op.result)
+            self.assembler.threadlocalref_get(op, resloc)
+        else:
+            self._consider_call(op)
+
     def _call(self, op, arglocs, force_store=[], guard_not_forced_op=None):
         # we need to save registers on the stack:
         #
@@ -769,6 +778,8 @@
                         return
             if oopspecindex == EffectInfo.OS_MATH_SQRT:
                 return self._consider_math_sqrt(op)
+            if oopspecindex == EffectInfo.OS_THREADLOCALREF_GET:
+                return self._consider_threadlocalref_get(op)
         self._consider_call(op)
 
     def consider_call_may_force(self, op, guard_op):
diff --git a/rpython/jit/backend/x86/stmtlocal.py b/rpython/jit/backend/x86/stmtlocal.py
new file mode 100644
--- /dev/null
+++ b/rpython/jit/backend/x86/stmtlocal.py
@@ -0,0 +1,32 @@
+from rpython.rtyper.lltypesystem import lltype, rffi
+from rpython.translator.tool.cbuild import ExternalCompilationInfo
+from rpython.jit.backend.x86.arch import WORD
+
+SEGMENT_FS = '\x64'
+SEGMENT_GS = '\x65'
+
+if WORD == 4:
+    SEGMENT_TL = SEGMENT_GS
+    _instruction = "movl %%gs:0, %0"
+else:
+    SEGMENT_TL = SEGMENT_FS
+    _instruction = "movq %%fs:0, %0"
+
+eci = ExternalCompilationInfo(post_include_bits=['''
+#define RPY_STM_JIT  1
+static long pypy__threadlocal_base(void)
+{
+    /* XXX ONLY LINUX WITH GCC/CLANG FOR NOW XXX */
+    long result;
+    asm("%s" : "=r"(result));
+    return result;
+}
+''' % _instruction])
+
+
+threadlocal_base = rffi.llexternal(
+    'pypy__threadlocal_base',
+    [], lltype.Signed,
+    compilation_info=eci,
+    _nowrapper=True,
+    ) #transactionsafe=True)
diff --git a/rpython/jit/codewriter/effectinfo.py b/rpython/jit/codewriter/effectinfo.py
--- a/rpython/jit/codewriter/effectinfo.py
+++ b/rpython/jit/codewriter/effectinfo.py
@@ -22,6 +22,7 @@
     OS_STR2UNICODE              = 2    # "str.str2unicode"
     OS_SHRINK_ARRAY             = 3    # rgc.ll_shrink_array
     OS_DICT_LOOKUP              = 4    # ll_dict_lookup
+    OS_THREADLOCALREF_GET       = 5    # llop.threadlocalref_get
     #
     OS_STR_CONCAT               = 22   # "stroruni.concat"
     OS_STR_SLICE                = 23   # "stroruni.slice"
diff --git a/rpython/jit/codewriter/jitcode.py b/rpython/jit/codewriter/jitcode.py
--- a/rpython/jit/codewriter/jitcode.py
+++ b/rpython/jit/codewriter/jitcode.py
@@ -117,6 +117,26 @@
         raise NotImplementedError
 
 
+class ThreadLocalRefDescr(AbstractDescr):
+    # A special descr used as the extradescr in a call to a
+    # threadlocalref_get function.  If the backend supports it,
+    # it can use this 'get_tlref_addr()' to get the address *in the
+    # current thread* of the thread-local variable.  If, on the current
+    # platform, the "__thread" variables are implemented as an offset
+    # from some base register (e.g. %fs on x86-64), then the backend will
+    # immediately subtract the current value of the base register.
+    # This gives an offset from the base register, and this can be
+    # written down in an assembler instruction to load the "__thread"
+    # variable from anywhere.
+
+    def __init__(self, opaque_id):
+        from rpython.rtyper.lltypesystem.lloperation import llop
+        from rpython.rtyper.lltypesystem import llmemory
+        def get_tlref_addr():
+            return llop.threadlocalref_getaddr(llmemory.Address, opaque_id)
+        self.get_tlref_addr = get_tlref_addr
+
+
 class LiveVarsInfo(object):
     def __init__(self, live_i, live_r, live_f):
         self.live_i = live_i
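The comment on ThreadLocalRefDescr above describes turning a per-thread address into a thread-independent offset from the segment base. The arithmetic can be sketched in plain Python as below. This is only an illustration, not the backend code: `fits_in_32bits` and `tl_offset` are hypothetical stand-ins written for this note, and the addresses are made up.

```python
def fits_in_32bits(value):
    """True if value is representable as a signed 32-bit displacement."""
    return -2**31 <= value < 2**31


def tl_offset(variable_addr, segment_base):
    """Offset of a thread-local variable from the segment base.

    Both arguments must be addresses observed in the *same* thread.
    """
    offset = variable_addr - segment_base
    if not fits_in_32bits(offset):
        raise ValueError("offset does not fit in a 32-bit displacement")
    return offset


# Made-up addresses: two threads with different segment bases but the same
# thread-local layout yield the same offset, so a single encoded instruction
# can serve every thread.
offset_a = tl_offset(0x7f0000001fc0, 0x7f0000002000)
offset_b = tl_offset(0x7f5555551fc0, 0x7f5555552000)
```

Because the offset is the same in every thread, it can be baked into the generated instruction as an immediate, which is why the assembler asserts that it fits in 32 bits.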
diff --git a/rpython/jit/codewriter/jtransform.py b/rpython/jit/codewriter/jtransform.py
--- a/rpython/jit/codewriter/jtransform.py
+++ b/rpython/jit/codewriter/jtransform.py
@@ -390,11 +390,15 @@
         lst.append(v)
 
     def handle_residual_call(self, op, extraargs=[], may_call_jitcodes=False,
-                             oopspecindex=EffectInfo.OS_NONE):
+                             oopspecindex=EffectInfo.OS_NONE,
+                             extraeffect=None,
+                             extradescr=None):
         """A direct_call turns into the operation 'residual_call_xxx' if it
         is calling a function that we don't want to JIT.  The initial args
         of 'residual_call_xxx' are the function to call, and its calldescr."""
-        calldescr = self.callcontrol.getcalldescr(op, oopspecindex=oopspecindex)
+        calldescr = self.callcontrol.getcalldescr(op, oopspecindex=oopspecindex,
+                                                  extraeffect=extraeffect,
+                                                  extradescr=extradescr)
         op1 = self.rewrite_call(op, 'residual_call',
                                 [op.args[0]] + extraargs, calldescr=calldescr)
         if may_call_jitcodes or self.callcontrol.calldescr_canraise(calldescr):
@@ -1903,6 +1907,18 @@
                              None)
         return [op0, op1]
 
+    def rewrite_op_threadlocalref_get(self, op):
+        from rpython.jit.codewriter.jitcode import ThreadLocalRefDescr
+        opaqueid = op.args[0].value
+        op1 = self.prepare_builtin_call(op, 'threadlocalref_getter', [],
+                                        extra=(opaqueid,),
+                                        extrakey=opaqueid._obj)
+        extradescr = ThreadLocalRefDescr(opaqueid)
+        return self.handle_residual_call(op1,
+            oopspecindex=EffectInfo.OS_THREADLOCALREF_GET,
+            extraeffect=EffectInfo.EF_LOOPINVARIANT,
+            extradescr=[extradescr])
+
 # ____________________________________________________________
 
 class NotSupported(Exception):
diff --git a/rpython/jit/codewriter/support.py b/rpython/jit/codewriter/support.py
--- a/rpython/jit/codewriter/support.py
+++ b/rpython/jit/codewriter/support.py
@@ -712,6 +712,11 @@
     build_ll_1_raw_free_no_track_allocation = (
         build_raw_free_builder(track_allocation=False))
 
+    def build_ll_0_threadlocalref_getter(opaqueid):
+        def _ll_0_threadlocalref_getter():
+            return llop.threadlocalref_get(rclass.OBJECTPTR, opaqueid)
+        return _ll_0_threadlocalref_getter
+
     def _ll_1_weakref_create(obj):
         return llop.weakref_create(llmemory.WeakRefPtr, obj)
 
diff --git a/rpython/jit/codewriter/test/test_jtransform.py b/rpython/jit/codewriter/test/test_jtransform.py
--- a/rpython/jit/codewriter/test/test_jtransform.py
+++ b/rpython/jit/codewriter/test/test_jtransform.py
@@ -147,6 +147,7 @@
              EI.OS_UNIEQ_LENGTHOK:       ([PUNICODE, PUNICODE], INT),
              EI.OS_RAW_MALLOC_VARSIZE_CHAR: ([INT], ARRAYPTR),
              EI.OS_RAW_FREE:             ([ARRAYPTR], lltype.Void),
+             EI.OS_THREADLOCALREF_GET:   ([], rclass.OBJECTPTR),
             }
             argtypes = argtypes[oopspecindex]
             assert argtypes[0] == [v.concretetype for v in op.args[1:]]
@@ -157,6 +158,8 @@
                 assert extraeffect == EI.EF_CAN_RAISE
             elif oopspecindex == EI.OS_RAW_FREE:
                 assert extraeffect == EI.EF_CANNOT_RAISE
+            elif oopspecindex == EI.OS_THREADLOCALREF_GET:
+                assert extraeffect == EI.EF_LOOPINVARIANT
             else:
                 assert extraeffect == EI.EF_ELIDABLE_CANNOT_RAISE
         return 'calldescr-%d' % oopspecindex
@@ -1300,6 +1303,23 @@
     assert op1.result is None
     assert op2 is None
 
+def test_threadlocalref_get():
+    from rpython.rtyper.lltypesystem import rclass
+    from rpython.rlib.rthread import ThreadLocalReference
+    OS_THREADLOCALREF_GET = effectinfo.EffectInfo.OS_THREADLOCALREF_GET
+    class Foo: pass
+    t = ThreadLocalReference(Foo)
+    v2 = varoftype(rclass.OBJECTPTR)
+    c_opaqueid = const(t.opaque_id)
+    op = SpaceOperation('threadlocalref_get', [c_opaqueid], v2)
+    tr = Transformer(FakeCPU(), FakeBuiltinCallControl())
+    op0 = tr.rewrite_operation(op)
+    assert op0.opname == 'residual_call_r_r'
+    assert op0.args[0].value == 'threadlocalref_getter' # pseudo-function as str
+    assert op0.args[1] == ListOfKind("ref", [])
+    assert op0.args[2] == 'calldescr-%d' % OS_THREADLOCALREF_GET
+    assert op0.result == v2
+
 def test_unknown_operation():
     op = SpaceOperation('foobar', [], varoftype(lltype.Void))
     tr = Transformer()
diff --git a/rpython/jit/metainterp/test/test_string.py b/rpython/jit/metainterp/test/test_string.py
--- a/rpython/jit/metainterp/test/test_string.py
+++ b/rpython/jit/metainterp/test/test_string.py
@@ -688,7 +688,9 @@
             return n
         res = self.meta_interp(f, [10], backendopt=True)
         assert res == 0
-        self.check_resops(call=2)    # (ll_shrink_array) * 2 unroll
+        self.check_resops(call=6,    # (ll_append_res0, ll_append_0_2, ll_build)
+                                     # * 2 unroll
+                          cond_call=0)
 
     def test_stringbuilder_append_len2_2(self):
         jitdriver = JitDriver(reds=['n', 'str1'], greens=[])
@@ -708,7 +710,8 @@
             return n
         res = self.meta_interp(f, [10], backendopt=True)
         assert res == 0
-        self.check_resops(call=2)    # (ll_shrink_array) * 2 unroll
+        self.check_resops(call=4,    # (ll_append_res0, ll_build) * 2 unroll
+                          cond_call=0)
 
     def test_stringbuilder_append_slice_1(self):
         jitdriver = JitDriver(reds=['n'], greens=[])
@@ -724,8 +727,8 @@
             return n
         res = self.meta_interp(f, [10], backendopt=True)
         assert res == 0
-        self.check_resops(call=2,     # (ll_shrink_array) * 2 unroll
-                          copyunicodecontent=4)
+        self.check_resops(call=6, cond_call=0,
+                          copyunicodecontent=0)
 
     def test_stringbuilder_append_slice_2(self):
         jitdriver = JitDriver(reds=['n'], greens=[])
@@ -751,12 +754,14 @@
             while n > 0:
                 jitdriver.jit_merge_point(n=n)
                 sb = UnicodeBuilder()
-                sb.append_multiple_char(u"x", 3)
+                sb.append_multiple_char(u"x", 5)
                 s = sb.build()
-                if len(s) != 3: raise ValueError
+                if len(s) != 5: raise ValueError
                 if s[0] != u"x": raise ValueError
                 if s[1] != u"x": raise ValueError
                 if s[2] != u"x": raise ValueError
+                if s[3] != u"x": raise ValueError
+                if s[4] != u"x": raise ValueError
                 n -= 1
             return n
         res = self.meta_interp(f, [10], backendopt=True)
@@ -770,19 +775,17 @@
             while n > 0:
                 jitdriver.jit_merge_point(n=n)
                 sb = UnicodeBuilder()
-                sb.append_multiple_char(u"x", 5)
+                sb.append_multiple_char(u"x", 35)
                 s = sb.build()
-                if len(s) != 5: raise ValueError
-                if s[0] != u"x": raise ValueError
-                if s[1] != u"x": raise ValueError
-                if s[2] != u"x": raise ValueError
-                if s[3] != u"x": raise ValueError
-                if s[4] != u"x": raise ValueError
+                if len(s) != 35: raise ValueError
+                for c in s:
+                    if c != u"x":
+                        raise ValueError
                 n -= 1
             return n
         res = self.meta_interp(f, [10], backendopt=True)
         assert res == 0
-        self.check_resops(call=4)    # (append, build) * 2 unroll
+        self.check_resops(call=4)    # (_ll_append_multiple_char, build) * 2
 
     def test_stringbuilder_bug1(self):
         jitdriver = JitDriver(reds=['n', 's1'], greens=[])
diff --git a/rpython/jit/metainterp/test/test_threadlocal.py b/rpython/jit/metainterp/test/test_threadlocal.py
new file mode 100644
--- /dev/null
+++ b/rpython/jit/metainterp/test/test_threadlocal.py
@@ -0,0 +1,30 @@
+import py
+from rpython.jit.metainterp.test.support import LLJitMixin
+from rpython.rlib.rthread import ThreadLocalReference
+from rpython.rlib.jit import dont_look_inside
+
+
+class ThreadLocalTest(object):
+
+    def test_threadlocalref_get(self):
+        class Foo:
+            pass
+        t = ThreadLocalReference(Foo)
+        x = Foo()
+
+        @dont_look_inside
+        def setup():
+            t.set(x)
+
+        def f():
+            setup()
+            if t.get() is x:
+                return 42
+            return -666
+
+        res = self.interp_operations(f, [])
+        assert res == 42
+
+
+class TestLLtype(ThreadLocalTest, LLJitMixin):
+    pass
diff --git a/rpython/rlib/rfile.py b/rpython/rlib/rfile.py
--- a/rpython/rlib/rfile.py
+++ b/rpython/rlib/rfile.py
@@ -35,7 +35,7 @@
 FILE = lltype.Struct('FILE')  # opaque type maybe
 
 c_open = llexternal('fopen', [rffi.CCHARP, rffi.CCHARP], lltype.Ptr(FILE))
-c_close = llexternal('fclose', [lltype.Ptr(FILE)], rffi.INT)
+c_close = llexternal('fclose', [lltype.Ptr(FILE)], rffi.INT, releasegil=False)
 c_fwrite = llexternal('fwrite', [rffi.CCHARP, rffi.SIZE_T, rffi.SIZE_T,
                                  lltype.Ptr(FILE)], rffi.SIZE_T)
 c_fread = llexternal('fread', [rffi.CCHARP, rffi.SIZE_T, rffi.SIZE_T,
@@ -57,7 +57,7 @@
                      rffi.CCHARP)
 
 c_popen = llexternal('popen', [rffi.CCHARP, rffi.CCHARP], lltype.Ptr(FILE))
-c_pclose = llexternal('pclose', [lltype.Ptr(FILE)], rffi.INT)
+c_pclose = llexternal('pclose', [lltype.Ptr(FILE)], rffi.INT, releasegil=False)
 
 BASE_BUF_SIZE = 4096
 BASE_LINE_SIZE = 100
diff --git a/rpython/rlib/rthread.py b/rpython/rlib/rthread.py
--- a/rpython/rlib/rthread.py
+++ b/rpython/rlib/rthread.py
@@ -272,3 +272,65 @@
         llop.gc_thread_after_fork(lltype.Void, result_of_fork, opaqueaddr)
     else:
         assert opaqueaddr == llmemory.NULL
+
+# ____________________________________________________________
+#
+# Thread-locals.  Only for references that change "not too often" --
+# for now, the JIT compiles get() as a loop-invariant, so basically
+# don't change them.
+# KEEP THE REFERENCE ALIVE, THE GC DOES NOT FOLLOW THEM SO FAR!
+# We use _make_sure_does_not_move() to make sure the pointer will not move.
+
+ecitl = ExternalCompilationInfo(
+    includes = ['src/threadlocal.h'],
+    separate_module_files = [translator_c_dir / 'src' / 'threadlocal.c'])
+ensure_threadlocal = rffi.llexternal_use_eci(ecitl)
+
+class ThreadLocalReference(object):
+    _COUNT = 1
+    OPAQUEID = lltype.OpaqueType("ThreadLocalRef",
+                                 hints={"threadlocalref": True,
+                                        "external": "C",
+                                        "c_name": "RPyThreadStaticTLS"})
+
+    def __init__(self, Cls):
+        "NOT_RPYTHON: must be prebuilt"
+        import thread
+        self.Cls = Cls
+        self.local = thread._local()      # <- NOT_RPYTHON
+        unique_id = ThreadLocalReference._COUNT
+        ThreadLocalReference._COUNT += 1
+        opaque_id = lltype.opaqueptr(ThreadLocalReference.OPAQUEID,
+                                     'tlref%d' % unique_id)
+        self.opaque_id = opaque_id
+
+        def get():
+            if we_are_translated():
+                from rpython.rtyper.lltypesystem import rclass
+                from rpython.rtyper.annlowlevel import cast_base_ptr_to_instance
+                ptr = llop.threadlocalref_get(rclass.OBJECTPTR, opaque_id)
+                return cast_base_ptr_to_instance(Cls, ptr)
+            else:
+                return getattr(self.local, 'value', None)
+
+        @jit.dont_look_inside
+        def set(value):
+            assert isinstance(value, Cls) or value is None
+            if we_are_translated():
+                from rpython.rtyper.annlowlevel import cast_instance_to_base_ptr
+                from rpython.rlib.rgc import _make_sure_does_not_move
+                from rpython.rlib.objectmodel import running_on_llinterp
+                ptr = cast_instance_to_base_ptr(value)
+                if not running_on_llinterp:
+                    gcref = lltype.cast_opaque_ptr(llmemory.GCREF, ptr)
+                    _make_sure_does_not_move(gcref)
+                llop.threadlocalref_set(lltype.Void, opaque_id, ptr)
+                ensure_threadlocal()
+            else:
+                self.local.value = value
+
+        self.get = get
+        self.set = set
+
+    def _freeze_(self):
+        return True
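The untranslated branch of get()/set() above simply stores the value on a per-thread object. A minimal stand-alone sketch of that behaviour, assuming only the standard `threading` module, is shown below. `LocalRef` is a hypothetical name used for this note; it does not model the translated path, the GC pinning, or the JIT's loop-invariant treatment of get().

```python
import threading


class LocalRef(object):
    """Hypothetical stand-in mirroring the untranslated get()/set() path."""

    def __init__(self, cls):
        self.cls = cls
        self.local = threading.local()

    def get(self):
        # A thread that never called set() sees None.
        return getattr(self.local, 'value', None)

    def set(self, value):
        assert value is None or isinstance(value, self.cls)
        self.local.value = value


class FooBar(object):
    pass


ref = LocalRef(FooBar)
results = []


def worker():
    x = FooBar()
    results.append(ref.get() is None)   # nothing set yet in this thread
    ref.set(x)
    results.append(ref.get() is x)      # only this thread's value is visible


threads = [threading.Thread(target=worker) for _ in range(5)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

Each worker records two checks, and the main thread, which never calls set(), still reads None afterwards. In the real class the caller must additionally keep the stored object alive, as the comment block at the top of this hunk warns.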
diff --git a/rpython/rlib/streamio.py b/rpython/rlib/streamio.py
--- a/rpython/rlib/streamio.py
+++ b/rpython/rlib/streamio.py
@@ -37,7 +37,7 @@
 import os, sys, errno
 from rpython.rlib.objectmodel import specialize, we_are_translated
 from rpython.rlib.rarithmetic import r_longlong, intmask
-from rpython.rlib import rposix
+from rpython.rlib import rposix, nonconst
 from rpython.rlib.rstring import StringBuilder
 
 from os import O_RDONLY, O_WRONLY, O_RDWR, O_CREAT, O_TRUNC, O_APPEND
@@ -159,6 +159,8 @@
             stream = TextInputFilter(stream)
     elif not binary and os.linesep == '\r\n':
         stream = TextCRLFFilter(stream)
+    if nonconst.NonConstant(False):
+        stream.flush_buffers()     # annotation workaround for untranslated tests
     return stream
 
 
diff --git a/rpython/rlib/test/test_rstring.py b/rpython/rlib/test/test_rstring.py
--- a/rpython/rlib/test/test_rstring.py
+++ b/rpython/rlib/test/test_rstring.py
@@ -239,6 +239,7 @@
             res = res and split('a//b//c//d', '//') == ['a', 'b', 'c', 'd']
             res = res and split(' a\ta\na b') == ['a', 'a', 'a', 'b']
             res = res and split('a//b//c//d', '//', 2) == ['a', 'b', 'c//d']
+            res = res and split('abcd,efghi', ',') == ['abcd', 'efghi']
             res = res and split(u'a//b//c//d', u'//') == [u'a', u'b', u'c', u'd']
             res = res and split(u'endcase test', u'test') == [u'endcase ', u'']
             res = res and rsplit('a|b|c|d', '|', 2) == ['a|b', 'c', 'd']
diff --git a/rpython/rlib/test/test_rthread.py b/rpython/rlib/test/test_rthread.py
--- a/rpython/rlib/test/test_rthread.py
+++ b/rpython/rlib/test/test_rthread.py
@@ -1,4 +1,4 @@
-import gc
+import gc, time
 from rpython.rlib.rthread import *
 from rpython.translator.c.test.test_boehm import AbstractGCTestClass
 from rpython.rtyper.lltypesystem import lltype, rffi
@@ -29,6 +29,23 @@
     else:
         py.test.fail("Did not raise")
 
+def test_tlref_untranslated():
+    class FooBar(object):
+        pass
+    t = ThreadLocalReference(FooBar)
+    results = []
+    def subthread():
+        x = FooBar()
+        results.append(t.get() is None)
+        t.set(x)
+        results.append(t.get() is x)
+        time.sleep(0.2)
+        results.append(t.get() is x)
+    for i in range(5):
+        start_new_thread(subthread, ())
+    time.sleep(0.5)
+    assert results == [True] * 15
+
 
 class AbstractThreadTests(AbstractGCTestClass):
     use_threads = True
@@ -198,6 +215,20 @@
         res = fn()
         assert res >= 0.95
 
+    def test_tlref(self):
+        class FooBar(object):
+            pass
+        t = ThreadLocalReference(FooBar)
+        def f():
+            x1 = FooBar()
+            t.set(x1)
+            import gc; gc.collect()
+            assert t.get() is x1
+            return 42
+        fn = self.getcompiled(f, [])
+        res = fn()
+        assert res == 42
+
 #class TestRunDirectly(AbstractThreadTests):
 #    def getcompiled(self, f, argtypes):
 #        return f
@@ -208,4 +239,4 @@
     gcpolicy = 'boehm'
 
 class TestUsingFramework(AbstractThreadTests):
-    gcpolicy = 'generation'
+    gcpolicy = 'minimark'
diff --git a/rpython/rtyper/llinterp.py b/rpython/rtyper/llinterp.py
--- a/rpython/rtyper/llinterp.py
+++ b/rpython/rtyper/llinterp.py
@@ -919,6 +919,20 @@
     def op_stack_current(self):
         return 0
 
+    def op_threadlocalref_set(self, key, value):
+        try:
+            d = self.llinterpreter.tlrefsdict
+        except AttributeError:
+            d = self.llinterpreter.tlrefsdict = {}
+        d[key._obj] = value
+
+    def op_threadlocalref_get(self, key):
+        d = self.llinterpreter.tlrefsdict
+        return d[key._obj]
+
+    def op_threadlocalref_getaddr(self, key):
+        raise NotImplementedError("threadlocalref_getaddr")
+
     # __________________________________________________________
     # operations on addresses
 
diff --git a/rpython/rtyper/lltypesystem/lloperation.py b/rpython/rtyper/lltypesystem/lloperation.py
--- a/rpython/rtyper/lltypesystem/lloperation.py
+++ b/rpython/rtyper/lltypesystem/lloperation.py
@@ -541,6 +541,10 @@
     'getslice':             LLOp(canraise=(Exception,)),
     'check_and_clear_exc':  LLOp(),
 
+    'threadlocalref_get':   LLOp(sideeffects=False),
+    'threadlocalref_getaddr': LLOp(sideeffects=False),
+    'threadlocalref_set':   LLOp(),
+
     # __________ debugging __________
     'debug_view':           LLOp(),
     'debug_print':          LLOp(canrun=True),
diff --git a/rpython/rtyper/lltypesystem/rbuilder.py b/rpython/rtyper/lltypesystem/rbuilder.py
--- a/rpython/rtyper/lltypesystem/rbuilder.py
+++ b/rpython/rtyper/lltypesystem/rbuilder.py
@@ -2,6 +2,7 @@
 from rpython.rlib.objectmodel import enforceargs
 from rpython.rlib.rarithmetic import ovfcheck, r_uint, intmask
 from rpython.rlib.debug import ll_assert
+from rpython.rlib.unroll import unrolling_iterable
 from rpython.rtyper.rptr import PtrRepr
 from rpython.rtyper.lltypesystem import lltype, rffi, rstr
 from rpython.rtyper.lltypesystem.lltype import staticAdtMethod, nullptr
@@ -34,62 +35,15 @@
 # ------------------------------------------------------------
 
 
+def dont_inline(func):
+    func._dont_inline_ = True
+    return func
+
 def always_inline(func):
     func._always_inline_ = True
     return func
 
 
-def new_grow_funcs(name, mallocfn):
-
-    @enforceargs(None, int)
-    def stringbuilder_grow(ll_builder, needed):
-        try:
-            needed = ovfcheck(needed + ll_builder.total_size)
-            needed = ovfcheck(needed + 63) & ~63
-            total_size = ll_builder.total_size + needed
-        except OverflowError:
-            raise MemoryError
-        #
-        new_string = mallocfn(needed)
-        #
-        PIECE = lltype.typeOf(ll_builder.extra_pieces).TO
-        old_piece = lltype.malloc(PIECE)
-        old_piece.buf = ll_builder.current_buf
-        old_piece.prev_piece = ll_builder.extra_pieces
-        ll_assert(bool(old_piece.buf), "no buf??")
-        ll_builder.current_buf = new_string
-        ll_builder.current_pos = 0
-        ll_builder.current_end = needed
-        ll_builder.total_size = total_size
-        ll_builder.extra_pieces = old_piece
-
-    def stringbuilder_append_overflow(ll_builder, ll_str, size):
-        # First, the part that still fits in the current piece
-        part1 = ll_builder.current_end - ll_builder.current_pos
-        start = ll_builder.skip
-        ll_builder.copy_string_contents(ll_str, ll_builder.current_buf,
-                                        start, ll_builder.current_pos,
-                                        part1)
-        ll_builder.skip += part1
-        stringbuilder_grow(ll_builder, size - part1)
-
-    def stringbuilder_append_overflow_2(ll_builder, char0):
-        # Overflow when writing two chars.  There are two cases depending
-        # on whether one char still fits or not.
-        if ll_builder.current_pos < ll_builder.current_end:
-            ll_builder.current_buf.chars[ll_builder.current_pos] = char0
-            ll_builder.skip = 1
-        stringbuilder_grow(ll_builder, 2)
-
-    return (func_with_new_name(stringbuilder_grow, '%s_grow' % name),
-            func_with_new_name(stringbuilder_append_overflow,
-                               '%s_append_overflow' % name),
-            func_with_new_name(stringbuilder_append_overflow_2,
-                               '%s_append_overflow_2' % name))
-
-stringbuilder_grows = new_grow_funcs('stringbuilder', rstr.mallocstr)
-unicodebuilder_grows = new_grow_funcs('unicodebuilder', rstr.mallocunicode)
-
 STRINGPIECE = lltype.GcStruct('stringpiece',
     ('buf', lltype.Ptr(STR)),
     ('prev_piece', lltype.Ptr(lltype.GcForwardReference())))
@@ -100,12 +54,8 @@
     ('current_pos', lltype.Signed),
     ('current_end', lltype.Signed),
     ('total_size', lltype.Signed),
-    ('skip', lltype.Signed),
     ('extra_pieces', lltype.Ptr(STRINGPIECE)),
     adtmeths={
-        'grow': staticAdtMethod(stringbuilder_grows[0]),
-        'append_overflow': staticAdtMethod(stringbuilder_grows[1]),
-        'append_overflow_2': staticAdtMethod(stringbuilder_grows[2]),
         'copy_string_contents': staticAdtMethod(rstr.copy_string_contents),
         'copy_raw_to_string': staticAdtMethod(rstr.copy_raw_to_string),
         'mallocfn': staticAdtMethod(rstr.mallocstr),
@@ -122,18 +72,330 @@
     ('current_pos', lltype.Signed),
     ('current_end', lltype.Signed),
     ('total_size', lltype.Signed),
-    ('skip', lltype.Signed),
     ('extra_pieces', lltype.Ptr(UNICODEPIECE)),
     adtmeths={
-        'grow': staticAdtMethod(unicodebuilder_grows[0]),
-        'append_overflow': staticAdtMethod(unicodebuilder_grows[1]),
-        'append_overflow_2': staticAdtMethod(unicodebuilder_grows[2]),
         'copy_string_contents': staticAdtMethod(rstr.copy_unicode_contents),
         'copy_raw_to_string': staticAdtMethod(rstr.copy_raw_to_unicode),
         'mallocfn': staticAdtMethod(rstr.mallocunicode),
     }
 )
 
+# ------------------------------------------------------------
+# The generic piece of code to append a string (or a slice of it)
+# to a builder; it is inlined inside various functions below
+
+@always_inline
+def _ll_append(ll_builder, ll_str, start, size):
+    pos = ll_builder.current_pos
+    end = ll_builder.current_end
+    if (end - pos) < size:
+        ll_grow_and_append(ll_builder, ll_str, start, size)
+    else:
+        ll_builder.current_pos = pos + size
+        ll_builder.copy_string_contents(ll_str, ll_builder.current_buf,
+                                        start, pos, size)
+
+# ------------------------------------------------------------
+# Logic to grow a builder (by adding a new string to it)
+
+@dont_inline
+@enforceargs(None, int)
+def ll_grow_by(ll_builder, needed):
+    try:
+        needed = ovfcheck(needed + ll_builder.total_size)
+        needed = ovfcheck(needed + 63) & ~63
+        total_size = ll_builder.total_size + needed
+    except OverflowError:
+        raise MemoryError
+    #
+    new_string = ll_builder.mallocfn(needed)
+    #
+    PIECE = lltype.typeOf(ll_builder.extra_pieces).TO
+    old_piece = lltype.malloc(PIECE)
+    old_piece.buf = ll_builder.current_buf
+    old_piece.prev_piece = ll_builder.extra_pieces
+    ll_assert(bool(old_piece.buf), "no buf??")
+    ll_builder.current_buf = new_string
+    ll_builder.current_pos = 0
+    ll_builder.current_end = needed
+    ll_builder.total_size = total_size
+    ll_builder.extra_pieces = old_piece
+
+@dont_inline
+def ll_grow_and_append(ll_builder, ll_str, start, size):
+    # First, the part that still fits in the current piece
+    part1 = ll_builder.current_end - ll_builder.current_pos
+    ll_assert(part1 < size, "part1 >= size")
+    ll_builder.copy_string_contents(ll_str, ll_builder.current_buf,
+                                    start, ll_builder.current_pos,
+                                    part1)
+    start += part1
+    size -= part1
+    # Allocate the new piece
+    ll_grow_by(ll_builder, size)
+    ll_assert(ll_builder.current_pos == 0, "current_pos must be 0 after grow()")
+    # Finally, the second part of the string
+    ll_builder.current_pos = size
+    ll_builder.copy_string_contents(ll_str, ll_builder.current_buf,
+                                    start, 0, size)
+
+# ------------------------------------------------------------
+# builder.append()
+
+@always_inline
+def ll_append(ll_builder, ll_str):
+    if jit.we_are_jitted():
+        ll_jit_append(ll_builder, ll_str)
+    else:
+        # no-jit case: inline the logic of _ll_append() in the caller
+        _ll_append(ll_builder, ll_str, 0, len(ll_str.chars))
+
+@dont_inline
+def ll_jit_append(ll_builder, ll_str):
+    # jit case: first try special cases for known small lengths
+    if ll_jit_try_append_slice(ll_builder, ll_str, 0, len(ll_str.chars)):
+        return
+    # fall-back to do a residual call to ll_append_res0
+    ll_append_res0(ll_builder, ll_str)
+
+@jit.dont_look_inside
+def ll_append_res0(ll_builder, ll_str):
+    _ll_append(ll_builder, ll_str, 0, len(ll_str.chars))
+
+# ------------------------------------------------------------
+# builder.append_char()
+
+@always_inline
+def ll_append_char(ll_builder, char):
+    jit.conditional_call(ll_builder.current_pos == ll_builder.current_end,
+                         ll_grow_by, ll_builder, 1)
+    pos = ll_builder.current_pos
+    ll_builder.current_pos = pos + 1
+    ll_builder.current_buf.chars[pos] = char
+
+# ------------------------------------------------------------
+# builder.append_slice()
+
+@always_inline
+def ll_append_slice(ll_builder, ll_str, start, end):
+    if jit.we_are_jitted():
+        ll_jit_append_slice(ll_builder, ll_str, start, end)
+    else:
+        # no-jit case: inline the logic of _ll_append() in the caller
+        _ll_append(ll_builder, ll_str, start, end - start)
+
+@dont_inline
+def ll_jit_append_slice(ll_builder, ll_str, start, end):
+    # jit case: first try special cases for known small lengths
+    if ll_jit_try_append_slice(ll_builder, ll_str, start, end - start):
+        return
+    # fall-back to do a residual call to ll_append_res_slice
+    ll_append_res_slice(ll_builder, ll_str, start, end)
+
+@jit.dont_look_inside
+def ll_append_res_slice(ll_builder, ll_str, start, end):
+    _ll_append(ll_builder, ll_str, start, end - start)
+
+# ------------------------------------------------------------
+# Special-casing for the JIT: appending strings (or slices) of
+# a known length up to MAX_N.  These functions all contain an
+# inlined copy of _ll_append(), but with a known small N, gcc
+# will compile the copy_string_contents() efficiently.
+
+MAX_N = 10
+
+def make_func_for_size(N):
+    @jit.dont_look_inside
+    def ll_append_0(ll_builder, ll_str):
+        _ll_append(ll_builder, ll_str, 0, N)
+    ll_append_0 = func_with_new_name(ll_append_0, "ll_append_0_%d" % N)
+    #
+    @jit.dont_look_inside
+    def ll_append_start(ll_builder, ll_str, start):
+        _ll_append(ll_builder, ll_str, start, N)
+    ll_append_start = func_with_new_name(ll_append_start,
+                                                  "ll_append_start_%d" % N)
+    return ll_append_0, ll_append_start, N
+
+unroll_func_for_size = unrolling_iterable([make_func_for_size(_n)
+                                           for _n in range(2, MAX_N + 1)])
+
+@jit.unroll_safe
+def ll_jit_try_append_slice(ll_builder, ll_str, start, size):
+    if jit.isconstant(size):
+        if size == 0:
+            return True
+        # a special case: if the builder's pos and end are still constants
+        # (typically if the builder is still virtual), and if 'size' fits,
+        # then we don't need any reallocation and can just set the
+        # characters in the buffer, in a way that won't force anything.
+        if (jit.isconstant(ll_builder.current_pos) and
+            jit.isconstant(ll_builder.current_end) and
+            size <= (ll_builder.current_end - ll_builder.current_pos) and
+            size <= 16):
+            pos = ll_builder.current_pos
+            buf = ll_builder.current_buf
+            stop = pos + size
+            ll_builder.current_pos = stop
+            while pos < stop:
+                buf.chars[pos] = ll_str.chars[start]
+                pos += 1
+                start += 1
+            return True
+        # turn appends of length 1 into ll_append_char().
+        if size == 1:
+            ll_append_char(ll_builder, ll_str.chars[start])
+            return True
+        # turn appends of length 2 to 10 into residual calls to
+        # specialized functions, for the lengths 2 to 10, where
+        # gcc will optimize the known-length copy_string_contents()
+        # as much as possible.
+        for func0, funcstart, for_size in unroll_func_for_size:
+            if size == for_size:
+                if jit.isconstant(start) and start == 0:
+                    func0(ll_builder, ll_str)
+                else:
+                    funcstart(ll_builder, ll_str, start)
+                return True
+    return False     # use the fall-back path
+
+# ------------------------------------------------------------
+# builder.append_multiple_char()
+
+@always_inline
+def ll_append_multiple_char(ll_builder, char, times):
+    if jit.we_are_jitted():
+        if ll_jit_try_append_multiple_char(ll_builder, char, times):
+            return
+    _ll_append_multiple_char(ll_builder, char, times)
+
+@jit.dont_look_inside
+def _ll_append_multiple_char(ll_builder, char, times):
+    part1 = ll_builder.current_end - ll_builder.current_pos
+    if times > part1:
+        times -= part1
+        buf = ll_builder.current_buf
+        for i in xrange(ll_builder.current_pos, ll_builder.current_end):
+            buf.chars[i] = char
+        ll_grow_by(ll_builder, times)
+    #
+    buf = ll_builder.current_buf
+    pos = ll_builder.current_pos
+    end = pos + times
+    ll_builder.current_pos = end
+    for i in xrange(pos, end):
+        buf.chars[i] = char
+
+@jit.unroll_safe
+def ll_jit_try_append_multiple_char(ll_builder, char, size):
+    if jit.isconstant(size):
+        if size == 0:
+            return True
+        # a special case: if the builder's pos and end are still constants
+        # (typically if the builder is still virtual), and if 'size' fits,
+        # then we don't need any reallocation and can just set the
+        # characters in the buffer, in a way that won't force anything.
+        if (jit.isconstant(ll_builder.current_pos) and
+            jit.isconstant(ll_builder.current_end) and
+            size <= (ll_builder.current_end - ll_builder.current_pos) and
+            size <= 16):
+            pos = ll_builder.current_pos
+            buf = ll_builder.current_buf
+            stop = pos + size
+            ll_builder.current_pos = stop
+            while pos < stop:
+                buf.chars[pos] = char
+                pos += 1
+            return True
+        if size == 1:
+            ll_append_char(ll_builder, char)
+            return True
+    return False     # use the fall-back path
+
+# ------------------------------------------------------------
+# builder.append_charpsize()
+
+@jit.dont_look_inside
+def ll_append_charpsize(ll_builder, charp, size):
+    part1 = ll_builder.current_end - ll_builder.current_pos
+    if size > part1:
+        # First, the part that still fits
+        ll_builder.copy_raw_to_string(charp, ll_builder.current_buf,
+                                      ll_builder.current_pos, part1)
+        charp = rffi.ptradd(charp, part1)
+        size -= part1
+        ll_grow_by(ll_builder, size)
+    #
+    pos = ll_builder.current_pos
+    ll_builder.current_pos = pos + size
+    ll_builder.copy_raw_to_string(charp, ll_builder.current_buf, pos, size)
+
+# ------------------------------------------------------------
+# builder.getlength()
+
+@always_inline
+def ll_getlength(ll_builder):
+    num_chars_missing_from_last_piece = (
+        ll_builder.current_end - ll_builder.current_pos)
+    return ll_builder.total_size - num_chars_missing_from_last_piece
+
+# ------------------------------------------------------------
+# builder.build()
+
+@jit.look_inside_iff(lambda ll_builder: jit.isvirtual(ll_builder))
+def ll_build(ll_builder):
+    # NB. usually the JIT doesn't look inside this function; it does
+    # so only in the simplest example where it could virtualize everything
+    if ll_builder.extra_pieces:
+        ll_fold_pieces(ll_builder)
+    elif ll_builder.current_pos != ll_builder.total_size:
+        ll_shrink_final(ll_builder)
+    return ll_builder.current_buf
+
+def ll_shrink_final(ll_builder):
+    final_size = ll_builder.current_pos
+    ll_assert(final_size <= ll_builder.total_size,
+              "final_size > ll_builder.total_size?")
+    buf = rgc.ll_shrink_array(ll_builder.current_buf, final_size)
+    ll_builder.current_buf = buf
+    ll_builder.current_end = final_size
+    ll_builder.total_size = final_size
+
+def ll_fold_pieces(ll_builder):
+    final_size = BaseStringBuilderRepr.ll_getlength(ll_builder)
+    ll_assert(final_size >= 0, "negative final_size")
+    extra = ll_builder.extra_pieces
+    ll_builder.extra_pieces = lltype.nullptr(lltype.typeOf(extra).TO)
+    #
+    result = ll_builder.mallocfn(final_size)
+    piece = ll_builder.current_buf
+    piece_lgt = ll_builder.current_pos
+    ll_assert(ll_builder.current_end == len(piece.chars),
+              "bogus last piece_lgt")
+    ll_builder.total_size = final_size
+    ll_builder.current_buf = result
+    ll_builder.current_pos = final_size
+    ll_builder.current_end = final_size
+
+    dst = final_size
+    while True:
+        dst -= piece_lgt
+        ll_assert(dst >= 0, "rbuilder build: overflow")
+        ll_builder.copy_string_contents(piece, result, 0, dst, piece_lgt)
+        if not extra:
+            break
+        piece = extra.buf
+        piece_lgt = len(piece.chars)
+        extra = extra.prev_piece
+    ll_assert(dst == 0, "rbuilder build: underflow")
+
+# ------------------------------------------------------------
+# bool(builder)
+
+def ll_bool(ll_builder):
+    return ll_builder != nullptr(lltype.typeOf(ll_builder).TO)
+
+# ------------------------------------------------------------
 
 class BaseStringBuilderRepr(AbstractStringBuilderRepr):
     def empty(self):
@@ -145,211 +407,24 @@
         # Negative values are mapped to 1280.
         init_size = intmask(min(r_uint(init_size), r_uint(1280)))
         ll_builder = lltype.malloc(cls.lowleveltype.TO)
-        ll_builder.current_buf = cls.mallocfn(init_size)
+        ll_builder.current_buf = ll_builder.mallocfn(init_size)
         ll_builder.current_pos = 0
         ll_builder.current_end = init_size
         ll_builder.total_size = init_size
         return ll_builder
 
-    @staticmethod
-    @always_inline
-    def ll_append(ll_builder, ll_str):
-        BaseStringBuilderRepr.ll_append_slice(ll_builder, ll_str,
-                                              0, len(ll_str.chars))
-
-    @staticmethod
-    @always_inline
-    def ll_append_char(ll_builder, char):
-        jit.conditional_call(ll_builder.current_pos == ll_builder.current_end,
-                             ll_builder.grow, ll_builder, 1)
-        pos = ll_builder.current_pos
-        ll_builder.current_pos = pos + 1
-        ll_builder.current_buf.chars[pos] = char
-
-    @staticmethod
-    def ll_append_char_2(ll_builder, char0, char1):
-        # this is only used by the JIT, when appending a small, known-length
-        # string.  Unlike two consecutive ll_append_char(), it can do that
-        # with only one conditional_call.
-        ll_builder.skip = 2
-        jit.conditional_call(
-            ll_builder.current_end - ll_builder.current_pos < 2,
-            ll_builder.append_overflow_2, ll_builder, char0)
-        pos = ll_builder.current_pos
-        buf = ll_builder.current_buf
-        buf.chars[pos] = char0
-        pos += ll_builder.skip
-        ll_builder.current_pos = pos
-        buf.chars[pos - 1] = char1
-        # NB. this usually writes into buf.chars[current_pos] and
-        # buf.chars[current_pos+1], except if we had an overflow right
-        # in the middle of the two chars.  In that case, 'skip' is set to
-        # 1 and only one char is written: the 'char1' overrides the 'char0'.
-
-    @staticmethod
-    @always_inline
-    def ll_append_slice(ll_builder, ll_str, start, end):
-        size = end - start
-        if jit.we_are_jitted():
-            if BaseStringBuilderRepr._ll_jit_try_append_slice(
-                    ll_builder, ll_str, start, size):
-                return
-        ll_builder.skip = start
-        jit.conditional_call(
-            size > ll_builder.current_end - ll_builder.current_pos,
-            ll_builder.append_overflow, ll_builder, ll_str, size)
-        start = ll_builder.skip
-        size = end - start
-        pos = ll_builder.current_pos
-        ll_builder.copy_string_contents(ll_str, ll_builder.current_buf,
-                                        start, pos, size)
-        ll_builder.current_pos = pos + size
-
-    @staticmethod
-    def _ll_jit_try_append_slice(ll_builder, ll_str, start, size):
-        if jit.isconstant(size):
-            if size == 0:
-                return True
-            if size == 1:
-                BaseStringBuilderRepr.ll_append_char(ll_builder,
-                                                     ll_str.chars[start])
-                return True
-            if size == 2:
-                BaseStringBuilderRepr.ll_append_char_2(ll_builder,
-                                                       ll_str.chars[start],
-                                                       ll_str.chars[start + 1])
-                return True
-        return False     # use the fall-back path
-
-    @staticmethod
-    @always_inline
-    def ll_append_multiple_char(ll_builder, char, times):
-        if jit.we_are_jitted():
-            if BaseStringBuilderRepr._ll_jit_try_append_multiple_char(
-                    ll_builder, char, times):
-                return
-        BaseStringBuilderRepr._ll_append_multiple_char(ll_builder, char, times)
-
-    @staticmethod
-    @jit.dont_look_inside
-    def _ll_append_multiple_char(ll_builder, char, times):
-        part1 = ll_builder.current_end - ll_builder.current_pos
-        if times > part1:
-            times -= part1
-            buf = ll_builder.current_buf
-            for i in xrange(ll_builder.current_pos, ll_builder.current_end):
-                buf.chars[i] = char
-            ll_builder.grow(ll_builder, times)
-        #
-        buf = ll_builder.current_buf
-        pos = ll_builder.current_pos
-        end = pos + times
-        ll_builder.current_pos = end
-        for i in xrange(pos, end):
-            buf.chars[i] = char
-
-    @staticmethod
-    def _ll_jit_try_append_multiple_char(ll_builder, char, size):
-        if jit.isconstant(size):
-            if size == 0:
-                return True
-            if size == 1:
-                BaseStringBuilderRepr.ll_append_char(ll_builder, char)
-                return True
-            if size == 2:
-                BaseStringBuilderRepr.ll_append_char_2(ll_builder, char, char)
-                return True
-            if size == 3:
-                BaseStringBuilderRepr.ll_append_char(ll_builder, char)
-                BaseStringBuilderRepr.ll_append_char_2(ll_builder, char, char)
-                return True
-            if size == 4:
-                BaseStringBuilderRepr.ll_append_char_2(ll_builder, char, char)
-                BaseStringBuilderRepr.ll_append_char_2(ll_builder, char, char)
-                return True
-        return False     # use the fall-back path
-
-    @staticmethod
-    @jit.dont_look_inside
-    def ll_append_charpsize(ll_builder, charp, size):
-        part1 = ll_builder.current_end - ll_builder.current_pos
-        if size > part1:
-            # First, the part that still fits
-            ll_builder.copy_raw_to_string(charp, ll_builder.current_buf,
-                                          ll_builder.current_pos, part1)
-            charp = rffi.ptradd(charp, part1)
-            size -= part1
-            ll_builder.grow(ll_builder, size)
-        #
-        pos = ll_builder.current_pos
-        ll_builder.current_pos = pos + size
-        ll_builder.copy_raw_to_string(charp, ll_builder.current_buf, pos, size)
-
-    @staticmethod
-    @always_inline
-    def ll_getlength(ll_builder):
-        num_chars_missing_from_last_piece = (
-            ll_builder.current_end - ll_builder.current_pos)
-        return ll_builder.total_size - num_chars_missing_from_last_piece
-
-    @staticmethod
-    @jit.look_inside_iff(lambda ll_builder: jit.isvirtual(ll_builder))
-    def ll_build(ll_builder):
-        # NB. usually the JIT doesn't look inside this function; it does
-        # so only in the simplest example where it could virtualize everything
-        if ll_builder.extra_pieces:
-            BaseStringBuilderRepr._ll_fold_pieces(ll_builder)
-        elif ll_builder.current_pos != ll_builder.total_size:
-            BaseStringBuilderRepr._ll_shrink_final(ll_builder)
-        return ll_builder.current_buf
-
-    @staticmethod
-    def _ll_shrink_final(ll_builder):
-        final_size = ll_builder.current_pos
-        ll_assert(final_size <= ll_builder.total_size,
-                  "final_size > ll_builder.total_size?")
-        buf = rgc.ll_shrink_array(ll_builder.current_buf, final_size)
-        ll_builder.current_buf = buf
-        ll_builder.current_end = final_size
-        ll_builder.total_size = final_size
-
-    @staticmethod
-    def _ll_fold_pieces(ll_builder):
-        final_size = BaseStringBuilderRepr.ll_getlength(ll_builder)
-        ll_assert(final_size >= 0, "negative final_size")
-        extra = ll_builder.extra_pieces
-        ll_builder.extra_pieces = lltype.nullptr(lltype.typeOf(extra).TO)
-        #
-        result = ll_builder.mallocfn(final_size)
-        piece = ll_builder.current_buf
-        piece_lgt = ll_builder.current_pos
-        ll_assert(ll_builder.current_end == len(piece.chars),
-                  "bogus last piece_lgt")
-        ll_builder.total_size = final_size
-        ll_builder.current_buf = result
-        ll_builder.current_pos = final_size
-        ll_builder.current_end = final_size
-
-        dst = final_size
-        while True:
-            dst -= piece_lgt
-            ll_assert(dst >= 0, "rbuilder build: overflow")
-            ll_builder.copy_string_contents(piece, result, 0, dst, piece_lgt)
-            if not extra:
-                break
-            piece = extra.buf
-            piece_lgt = len(piece.chars)
-            extra = extra.prev_piece
-        ll_assert(dst == 0, "rbuilder build: underflow")
-
-    @classmethod
-    def ll_bool(cls, ll_builder):
-        return ll_builder != nullptr(cls.lowleveltype.TO)
+    ll_append               = staticmethod(ll_append)
+    ll_append_char          = staticmethod(ll_append_char)
+    ll_append_slice         = staticmethod(ll_append_slice)
+    ll_append_multiple_char = staticmethod(ll_append_multiple_char)
+    ll_append_charpsize     = staticmethod(ll_append_charpsize)
+    ll_getlength            = staticmethod(ll_getlength)
+    ll_build                = staticmethod(ll_build)
+    ll_bool                 = staticmethod(ll_bool)
 
 class StringBuilderRepr(BaseStringBuilderRepr):
     lowleveltype = lltype.Ptr(STRINGBUILDER)
     basetp = STR
-    mallocfn = staticmethod(rstr.mallocstr)
     string_repr = string_repr
     char_repr = char_repr
     raw_ptr_repr = PtrRepr(
@@ -359,7 +434,6 @@
 class UnicodeBuilderRepr(BaseStringBuilderRepr):
     lowleveltype = lltype.Ptr(UNICODEBUILDER)
     basetp = UNICODE
-    mallocfn = staticmethod(rstr.mallocunicode)
     string_repr = unicode_repr
     char_repr = unichar_repr
     raw_ptr_repr = PtrRepr(
diff --git a/rpython/rtyper/module/ll_os_environ.py b/rpython/rtyper/module/ll_os_environ.py
--- a/rpython/rtyper/module/ll_os_environ.py
+++ b/rpython/rtyper/module/ll_os_environ.py
@@ -60,7 +60,7 @@
 
 # ____________________________________________________________
 # Access to the 'environ' external variable
-
+prefix = ''
 if sys.platform.startswith('darwin'):
     CCHARPPP = rffi.CArrayPtr(rffi.CCHARPP)
     _os_NSGetEnviron = rffi.llexternal(
@@ -77,6 +77,7 @@
         rffi.CCHARPP, '_environ', eci)
     get__wenviron, _set__wenviron = rffi.CExternVariable(
         CWCHARPP, '_wenviron', eci, c_type='wchar_t **')
+    prefix = '_'    
 else:
     os_get_environ, _os_set_environ = rffi.CExternVariable(
         rffi.CCHARPP, 'environ', ExternalCompilationInfo())
@@ -117,7 +118,7 @@
 
 os_getenv = rffi.llexternal('getenv', [rffi.CCHARP], rffi.CCHARP,
                             releasegil=False)
-os_putenv = rffi.llexternal('putenv', [rffi.CCHARP], rffi.INT)
+os_putenv = rffi.llexternal(prefix + 'putenv', [rffi.CCHARP], rffi.INT)
 if _WIN32:
     _wgetenv = rffi.llexternal('_wgetenv', [rffi.CWCHARP], rffi.CWCHARP,
                                compilation_info=eci, releasegil=False)
diff --git a/rpython/rtyper/test/test_rbuilder.py b/rpython/rtyper/test/test_rbuilder.py
--- a/rpython/rtyper/test/test_rbuilder.py
+++ b/rpython/rtyper/test/test_rbuilder.py
@@ -28,9 +28,13 @@
 
     def test_simple(self):
         sb = StringBuilderRepr.ll_new(3)
+        assert StringBuilderRepr.ll_getlength(sb) == 0
         StringBuilderRepr.ll_append_char(sb, 'x')
+        assert StringBuilderRepr.ll_getlength(sb) == 1
         StringBuilderRepr.ll_append(sb, llstr("abc"))
+        assert StringBuilderRepr.ll_getlength(sb) == 4
         StringBuilderRepr.ll_append_slice(sb, llstr("foobar"), 2, 5)
+        assert StringBuilderRepr.ll_getlength(sb) == 7
         StringBuilderRepr.ll_append_multiple_char(sb, 'y', 3)
         assert StringBuilderRepr.ll_getlength(sb) == 10
         s = StringBuilderRepr.ll_build(sb)
diff --git a/rpython/translator/c/node.py b/rpython/translator/c/node.py
--- a/rpython/translator/c/node.py
+++ b/rpython/translator/c/node.py
@@ -959,12 +959,30 @@
                 args.append('0')
         yield 'RPyOpaque_SETUP_%s(%s);' % (T.tag, ', '.join(args))
 
+class ThreadLocalRefOpaqueNode(ContainerNode):
+    nodekind = 'tlrefopaque'
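
Note on the rbuilder.py hunks above: the bookkeeping behind ll_getlength() and ll_fold_pieces() can be modelled in plain Python. The sketch below is illustrative only and is not the RPython code. The names BuilderModel, Piece, grow_by, append, getlength and build are hypothetical, and the growth policy in grow_by is an assumption, since ll_grow_by() does not appear in this part of the diff. It reproduces the lengths asserted in test_simple (0, 1, 4, 7, 10) for an initial size of 3.

```python
class Piece(object):
    """One already-full buffer, linked to the piece that came before it."""
    def __init__(self, buf, prev_piece):
        self.buf = buf
        self.prev_piece = prev_piece


class BuilderModel(object):
    """Hypothetical stand-in for the low-level builder structure."""
    def __init__(self, init_size):
        self.current_buf = [None] * init_size
        self.current_pos = 0
        self.current_end = init_size
        self.total_size = init_size
        self.extra_pieces = None


def grow_by(b, needed):
    # ASSUMPTION: the real ll_grow_by() is not shown in this diff.  Here the
    # full current buffer is pushed onto extra_pieces and a fresh one is made.
    new_size = max(needed, b.total_size)
    b.extra_pieces = Piece(b.current_buf, b.extra_pieces)
    b.current_buf = [None] * new_size
    b.current_pos = 0
    b.current_end = new_size
    b.total_size += new_size


def append(b, s):
    part1 = b.current_end - b.current_pos
    if len(s) > part1:
        # First, the part that still fits; then grow for the rest.
        b.current_buf[b.current_pos:b.current_end] = s[:part1]
        s = s[part1:]
        grow_by(b, len(s))
    pos = b.current_pos
    b.current_buf[pos:pos + len(s)] = s
    b.current_pos = pos + len(s)


def getlength(b):
    # Same arithmetic as ll_getlength(): allocated size minus the unused tail.
    return b.total_size - (b.current_end - b.current_pos)


def build(b):
    # Same back-to-front copy as ll_fold_pieces(): last piece first.
    final_size = getlength(b)
    result = [None] * final_size
    piece, piece_lgt = b.current_buf, b.current_pos
    extra = b.extra_pieces
    dst = final_size
    while True:
        dst -= piece_lgt
        assert dst >= 0, "overflow"
        result[dst:dst + piece_lgt] = piece[:piece_lgt]
        if extra is None:
            break
        piece = extra.buf
        piece_lgt = len(piece)   # older pieces are always completely full
        extra = extra.prev_piece
    assert dst == 0, "underflow"
    return "".join(result)


sb = BuilderModel(3)
lengths = [getlength(sb)]
for chunk in ["x", "abc", "foobar"[2:5], "y" * 3]:
    append(sb, chunk)
    lengths.append(getlength(sb))
built = build(sb)
```

Only the most recent buffer can be partially filled, which is why getlength() needs just one subtraction and why build() takes current_pos as the length of the last piece but len() for every earlier one.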

