Re: [pypy-dev] [pypy-commit] pypy default: Do sign checks directly.
Hi Stian,
Could the sign check please be put into a nice helper method on rbigint objects? I find it not so nice that longobject.py pokes around in the internals of the big integer implementation.
Cheers,
Carl Friedrich

On 11/05/2012 03:43 PM, Stian Andreassen wrote:
Author: Stian Andreassen Branch: Changeset: r58761:bc1b37bec5b3 Date: 2012-11-05 23:41 +0100 http://bitbucket.org/pypy/pypy/changeset/bc1b37bec5b3/
Log: Do sign checks directly.
diff --git a/pypy/objspace/std/longobject.py b/pypy/objspace/std/longobject.py
--- a/pypy/objspace/std/longobject.py
+++ b/pypy/objspace/std/longobject.py
@@ -251,7 +251,7 @@
 def pow__Long_Long_Long(space, w_long1, w_long2, w_long3):
     # XXX need to replicate some of the logic, to get the errors right
-    if w_long2.num.lt(rbigint.fromint(0)):
+    if w_long2.num.sign < 0:
         raise OperationError(
             space.w_TypeError,
             space.wrap(
@@ -265,7 +265,7 @@
 def pow__Long_Long_None(space, w_long1, w_long2, w_long3):
     # XXX need to replicate some of the logic, to get the errors right
-    if w_long2.num.lt(rbigint.fromint(0)):
+    if w_long2.num.sign < 0:
         raise FailedToImplementArgs(
             space.w_ValueError,
             space.wrap("long pow() too negative"))
@@ -288,7 +288,7 @@
 def lshift__Long_Long(space, w_long1, w_long2):
     # XXX need to replicate some of the logic, to get the errors right
-    if w_long2.num.lt(rbigint.fromint(0)):
+    if w_long2.num.sign < 0:
         raise OperationError(space.w_ValueError,
                              space.wrap("negative shift count"))
     try:
@@ -300,7 +300,7 @@
 def rshift__Long_Long(space, w_long1, w_long2):
     # XXX need to replicate some of the logic, to get the errors right
-    if w_long2.num.lt(rbigint.fromint(0)):
+    if w_long2.num.sign < 0:
         raise OperationError(space.w_ValueError,
                              space.wrap("negative shift count"))
     try:
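For illustration, a minimal sketch of the kind of helper being asked for above (the names here are hypothetical, not necessarily what PyPy adopted):

    # Illustration only: a hypothetical helper on rbigint that hides the
    # .sign field from callers such as longobject.py.
    class rbigint(object):
        def __init__(self, digits, sign):
            self._digits = digits
            self.sign = sign          # -1, 0 or 1, as in PyPy's rbigint

        def int_is_negative(self):
            return self.sign < 0

longobject.py could then test w_long2.num.int_is_negative() instead of poking at w_long2.num.sign directly.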
Hello,
I have some questions:
- From what I understand, right now when a Python function is called through a callback, the JIT compiler does not notice it, so it doesn't JIT the function at all. So is it possible to specify that some Python function should always be JITted?
- Is CFFI going to be integrated inside pypy?
- Will CFFI support JITted callbacks?
As you can see from the above questions, my main remaining problem with pypy is JITted callbacks.
Best regards,
lefteris.
Hi Eleytherios, On Wed, Nov 7, 2012 at 6:25 PM, Eleytherios Stamatogiannakis <estama@gmail.com> wrote:
- From what I understand, right now when a Python function is called through a callback, the JIT compiler does not notice it, so it doesn't JIT the function at all. So is it possible to specify that some Python function should always be JITted?
No, it works better than that, but still not as well as it could. Right now, the Python function used as a callback will be JITted; the problem is that the JIT only kicks in at the point of entering the Python function. What goes on before and after the actual call to the Python function is not JITted. So for example, if your callback is of type "int callback(int)", then when called, it will go via libffi's callback mechanism (1st indirection), wrap the int argument in a W_IntObject (2nd indirection), and only then call the Python function (which will invoke JITted code).
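As a rough illustration of this setup, here is a qsort-style example of our own using cffi's callback API (not code from the thread):

    # A Python callback passed to C's qsort via cffi. The Python function body
    # runs as JITted code once entered, but the libffi trampoline and argument
    # wrapping that happen before entry are not JITted.
    import cffi

    ffi = cffi.FFI()
    ffi.cdef("""
        void qsort(void *base, size_t nmemb, size_t size,
                   int (*compar)(const void *, const void *));
    """)
    libc = ffi.dlopen(None)   # dlopen(None) exposes the standard C library

    def py_compare(a, b):
        return ffi.cast("int *", a)[0] - ffi.cast("int *", b)[0]

    compare_cb = ffi.callback("int(const void *, const void *)", py_compare)

    nums = ffi.new("int[]", [3, 1, 2])
    libc.qsort(nums, 3, ffi.sizeof("int"), compare_cb)
    assert list(nums) == [1, 2, 3]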
- Is CFFI going to be integrated inside pypy?
Yes. Right now only the "_cffi_backend" module is part of a recent "pypy" binary. We will also include the pure Python part of CFFI inside the next PyPy release, 2.0.
- Will CFFI support JITted callbacks?
The situation described above will probably not be improved in time for the upcoming "2.0 beta", but some time later.
A bientôt,
Armin.
Hello,

We have been testing CFFI here for the purpose of speeding up madIS [*], and here are some preliminary results.

First of all, under pypy, CFFI is a *lot* faster than ctypes. In callback microbenchmarks (using quicksort to call the callbacks), pypy/CFFI had ~8-10 times less overhead than pypy/ctypes.

Using libsqlite3 as the C library calling the callbacks, we found that, compared to Python/APSW [**], the callback overhead of pypy/CFFI was 3-4 times larger.

To be able to run madIS, we started from pypy's _sqlite3.py (which uses ctypes) and did a rough conversion of it to CFFI. We then changed it to be APSW compatible. In the rest of the email, we'll refer to this APSW-compatible CFFI wrapper as MSPW.

The end results were very encouraging. Using madIS, for SQL queries that didn't produce a lot of columns we could get speedups of up to 2 times. For SQL queries that produce a lot of columns, we get slowdowns of 2-2.5 times.

Running a profiler on pypy/madIS, we found that the main bottleneck is not callback performance any more, but regular pypy->C call speed. Whereas Python/APSW spends most of its time on madIS' Python execution of functions, pypy/MSPW spends most of its time (~45-50%) on calling a libsqlite3 function that passes a Python string back to libsqlite3, and 10% in pypy's string.encode('utf-8') function. So for pypy/MSPW, most of the time (55-60%) is spent just passing a string back to libsqlite3. In Python/APSW, the time spent passing the string back to libsqlite3 is <1%.

The libsqlite3 function's header is:

void sqlite3_result_text(sqlite3_context*, const char*, int, void(*)(void*));

In the main query that we've used in our tests, the above libsqlite3 function is called 1.1 million times. The times of this query, running under the different options, are:

Python/APSW: 40s
pypy/MSPW: 2m 3s
pypy/APSW: 2m 21s

Best regards,
l.

[*] https://code.google.com/p/madis/
[**] https://code.google.com/p/apsw/
On Tue, Dec 11, 2012 at 7:00 PM, Eleytherios Stamatogiannakis <estama@gmail.com> wrote:
Hi,
Quick question - can you post your benchmarks somewhere so I can try them? (I'll answer the rest of your mail separately)
Cheers,
fijal
We cannot publish the benchmark that we used because it relies on non-public data. We'll prepare a benchmark that uses synthetic data; it will have similar performance characteristics to the previous one, and we'll provide it to you.
Many many thanks,
l.
On Wed, Dec 12, 2012 at 11:00 PM, Elefterios Stamatogiannakis <estama@gmail.com> wrote:
We'll prepare a benchmark that uses synthetic data; it will have similar performance characteristics to the previous one, and we'll provide it to you.
Thanks!
Hi, On Tue, Dec 11, 2012 at 6:00 PM, Eleytherios Stamatogiannakis <estama@gmail.com> wrote:
Python/APSW: 40s
pypy/MSPW: 2m 3s
pypy/APSW: 2m 21s
Not horribly bad, given that we're comparing with APSW, which is a piece of C code: PyPy makes Python only 3 times slower than hand-crafted C. That was just a general comment; there might be more precise issues that can still be addressed (passing strings around and callbacks are two known inefficiencies).
A bientôt,
Armin.
Here is the synthetic benchmark.

To run it you'll need the latest madIS. You can clone it using:

hg clone https://code.google.com/p/madis/

For running the test you can use:

CPython 2.7 + APSW: https://code.google.com/p/apsw/

Or pypy + APSW (you'll need to clone APSW for it to build on pypy):

hg clone https://code.google.com/p/apsw/

Or pypy + MSPW. You'll find attached a "mspw.py". To use it, *rename* it to "apsw.py" and put it in pypy's "site-packages" directory. For MSPW to work in pypy, you'll also need CFFI and "libsqlite3" installed.

Here I should note that MSPW is in an extremely rough state. Our main focus at this time is to find out how fast we can make it go, so right now MSPW doesn't do much error checking. Expect segmentation faults if you try a query in mterm and something is wrong with the query.

To run the test with pypy/python:

pypy mterm.py < mspw_bench.sql

or

python mterm.py < mspw_bench.sql

For some reason pypy + APSW throws an exception when it finishes running mspw_bench, but the timings should be reliable.

The timings of "mspw_bench" that we get are:

CPython 2.7 + APSW: ~1.5 sec
pypy + APSW: ~9 sec
pypy + MSPW: ~4.5 sec

There are two ways to adjust the processing load of mspw_bench. One is to change the value in "range(xxxxx)". This will in essence create a bigger/smaller "synthetic text", which puts more pressure on CPython's/pypy's side. The other way is to adjust the window size of textwindow(t, xx, xx). This puts more pressure on the wrapper (APSW/MSPW) infrastructure, because it changes the number of columns that CPython/pypy have to send to SQLite (they are sent one value at a time).

Both Ioannis Foufoulas and I are available to help with the benchmark.

Many many thanks,
lefteris.
On Fri, Dec 14, 2012 at 7:25 PM, Eleytherios Stamatogiannakis <estama@gmail.com> wrote:
Hi.
For what it's worth, roughly 1/3 of the time is spent importing all the things. This is done in the compilation step in APSW, so please try running the select a few times. Another slightly worrying thing is that a lot of time is spent doing utf8 decoding. Can you explain what in the SQL statement requires UTF8 conversion?
Cheers,
fijal
On 15/12/2012 12:00 AM, Maciej Fijalkowski wrote:
Concerning running the select many times, you are right; we'll try to pay attention to it in the future. This is easily achieved by copying the query in mspw_bench many times, or by increasing the "range" value so that the processing load dominates everything else.

When MSPW sends something to SQLite, it has to encode it to UTF-8 (SQLite's default character encoding). When it gets something back, it has to convert it back to Python's unicode.
l.
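A rough sketch of that round trip (our illustration with hypothetical names; it assumes an ffi/lib pair built from a cdef containing the sqlite3_result_text declaration quoted earlier):

    # SQLITE_TRANSIENT is the destructor value (void(*)(void*))-1, which
    # tells SQLite to make its own copy of the passed buffer.
    SQLITE_TRANSIENT = ffi.cast("void(*)(void*)", -1)

    def result_unicode(ctx, value):
        data = value.encode('utf-8')        # Python unicode -> UTF-8 bytes
        lib.sqlite3_result_text(ctx, data, len(data), SQLITE_TRANSIENT)

    def text_from_sqlite(char_ptr, nbytes):
        # UTF-8 bytes coming back from SQLite -> Python unicode
        return ffi.buffer(char_ptr, nbytes)[:].decode('utf-8')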
On Sat, Dec 15, 2012 at 2:56 AM, Elefterios Stamatogiannakis <estama@gmail.com> wrote:
And APSW does the same, right? I understand the general need for UTF8, I just didn't find it in this particular query.
Hi, On Sat, Dec 15, 2012 at 7:51 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
Fwiw, I wonder again if we couldn't have all our unicode strings internally be UTF8 instead of 2- or 4-byte strings. This would mean a W_UTF8UnicodeObject class that has both a reference to the RPython string and some optional extra data to make it faster to locate the n'th character or the total unicode length. (We discussed it on IRC some time ago.)
A bientôt,
Armin.
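A minimal sketch of the idea (our illustration; names and details are guesses, not PyPy's eventual implementation):

    # A UTF-8-backed unicode object with lazily computed length. A real
    # version would also cache byte offsets so that locating the n'th
    # character is cheap.
    class W_UTF8UnicodeObject(object):
        def __init__(self, utf8_bytes):
            self._utf8 = utf8_bytes   # the underlying RPython-level byte string
            self._length = -1         # code-point count, computed on demand

        def length(self):
            if self._length < 0:
                # count lead bytes: anything that is not 0b10xxxxxx
                self._length = sum(1 for c in self._utf8
                                   if (ord(c) & 0xC0) != 0x80)
            return self._length

        def char_at(self, n):
            # O(n) scan to the n'th code point; the "optional extra data"
            # mentioned above would make this faster.
            i = 0
            for _ in range(n):
                i += 1
                while (ord(self._utf8[i]) & 0xC0) == 0x80:
                    i += 1
            j = i + 1
            while j < len(self._utf8) and (ord(self._utf8[j]) & 0xC0) == 0x80:
                j += 1
            return self._utf8[i:j].decode('utf-8')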
On Sat, Dec 15, 2012 at 12:27 PM, Armin Rigo <arigo@tunes.org> wrote:
Some sort of string strategies? Like "we know it's ASCII" or so, as well?
Hi, On Sat, Dec 15, 2012 at 12:00 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
Something like that, but that "strategy" would be the only one needed. We don't need ASCII-only or two-byte-only strategies (nor, of course, a 4-byte strategy) if we have the general utf8 strategy, and even a latin1-only strategy is probably not worth it. This utf8 strategy is useful in any program that handles unicode strings by converting them from/to utf8, which in recent years has become the de facto standard for new programs.
A bientôt,
Armin.
On 12/15/2012 11:27 AM Armin Rigo wrote:
Since
>>> for i in range(256): assert chr(i).decode('latin1') == unichr(i)
I wonder whether something could be gained by having an alternative internal unicode representation in the form of latin1 8-bit byte strings. ISTM a lot of English-speaking and western European locales would hardly ever need anything else, and generating code to tag and use/transform alternative representations would be an internal optimization matter. I suppose some apps could well result in 8-, 16-, and 32-bit unicodes and utf8 all coexisting under the hood, but only when actually needed.
Regards,
Bengt Richter
Hi Bengt, On Sat, Dec 15, 2012 at 12:14 PM, Bengt Richter <bokr@oz.net> wrote:
I wonder whether something could be gained by having an alternative internal unicode representation in the form of latin1 8-bit byte strings.
This has already been implemented in CPython 3.x, based on earlier experiments in PyPy. What has not been tried is to have utf-8.
A bientôt,
Armin.
We had a "bug" in our previous benchmark ("mspw_bench.sql"). The way it was written allowed SQLite to short-circuit column data retrieval, ending up with minimal exercising of the CFFI layer. The attached query exercises CFFI as it should. We also checked its profiling characteristics, and it is a lot closer to what we are seen with our working query loads. Below are timings when running it from inside mterm (so there is no import overhead): CPython + APSW: 0 sec 956 msec pypy + APSW: 8 sec 673 msec pypy + MSPW: 2 sec 550 msec Best regards. lefteris.
On Mon, Dec 17, 2012 at 3:46 PM, Eleytherios Stamatogiannakis <estama@gmail.com> wrote:
We had a "bug" in our previous benchmark ("mspw_bench.sql"). The way it was written allowed SQLite to short-circuit column data retrieval, ending up with minimal exercising of the CFFI layer.
I thought it was a feature :)
On Mon, Dec 17, 2012 at 10:42 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
Hi Elefterios.
We've been working towards improving the situation of this benchmark. There is a branch done by Antonio Cuni, and there are a few smaller things that will be improved in some time. Stay tuned.
Cheers,
fijal
On 26/12/2012 12:48 PM, Maciej Fijalkowski wrote:
Thank you for looking into this part of pypy's performance. Whenever something reaches a testable state, we would be glad to test/benchmark it.

On another front: we tried using SQLite's UTF-16 API to avoid doing the conversion to UTF-8 whenever we return a string from pypy to SQLite. We used the function "sqlite3_result_text16":

http://www.sqlite.org/c3ref/result_blob.html

defining it in CFFI as:

void sqlite3_result_text16(sqlite3_context*, wchar_t*, int, void(*)(void*));

The problem was that when passing a pypy string directly to the above function (through the wchar_t type), only the first character of the string reached SQLite. The only way to successfully pass a string to SQLite was to explicitly convert/encode it to UTF-16.

So the question I have is: does pypy keep its strings internally in UTF-16?

Thanks again (and a happy new year) to all.
l.
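For illustration, a sketch of that explicit-encoding workaround (hypothetical names; assumes an ffi/lib pair carrying the cdef above):

    SQLITE_TRANSIENT = ffi.cast("void(*)(void*)", -1)  # SQLite copies the buffer

    def result_unicode16(ctx, value):
        data = value.encode('utf-16-le')   # explicit UTF-16, not 32-bit wchar_t
        buf = ffi.new("char[]", data)      # keep a reference alive for the call
        lib.sqlite3_result_text16(ctx, ffi.cast("wchar_t *", buf),
                                  len(data), SQLITE_TRANSIENT)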
On Wed, Dec 26, 2012 at 5:24 PM, Elefterios Stamatogiannakis <estama@gmail.com> wrote:
Are you talking about strings or unicodes? That is, type str or unicode?
On 26/12/2012 6:19 PM, Maciej Fijalkowski wrote:
For unicode only. For regular type == str we use SQLite's sqlite3_result_text without doing any conversion/encoding at all. l.
Hi Elefterios, On Wed, Dec 26, 2012 at 4:24 PM, Elefterios Stamatogiannakis <estama@gmail.com> wrote:
The problem was that giving directly a pypy string to above function (using wchar_t type), only the first character of the string was passed to SQLite.
Do you mean you unexpectedly got only the first character? That would be a bug. Can you tell me what you did? Normally, you should be able to pass a unicode object directly to a "wchar_t *" argument and get a complete wchar_t-encoded string. (Note that wchar_t is 16-bit on Windows but 32-bit on Linux, for example.) A bientôt, Armin.
On 27/12/12 11:44, Armin Rigo wrote:
Normally, you should be able to pass a unicode object directly to a "wchar_t *" argument and get a complete wchar_t-encoded string. (Note that wchar_t is 16-bit on Windows but 32-bit on Linux, for example.)
Armin, thanks for the clarification. From your note, I see what went wrong: we were directly passing pypy's UTF-32 (we are testing on Linux) to SQLite's UTF-16 API. So this is why we only got the first character on the SQLite side: for ordinary (BMP) text on a little-endian machine, the upper half of each 32-bit code unit is zero, which UTF-16 reads as a terminating NUL right after the first character.

Also, Linux's 32-bit wchars are unfortunate in our case, because SQLite only supports UTF-8 and UTF-16 in its API. So there is no way on Linux to directly pass pypy's unicode strings to SQLite.
l.
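A quick demonstration of the effect (our example, Python 2, little-endian):

    s = u"hi"
    utf32 = s.encode('utf-32-le')           # bytes: 68 00 00 00 69 00 00 00
    print repr(utf32.decode('utf-16-le'))   # u'h\x00i\x00': a NUL code unit
                                            # after each character, which C
                                            # string handling treats as a
                                            # terminator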
I might be wrong, but CPython uses UCS internally for unicode, so even if SQLite had UTF-32 it would not work.
-- Leonardo Santagada
Hi, On Thu, Dec 27, 2012 at 2:33 PM, Elefterios Stamatogiannakis <estama@gmail.com> wrote:
Also, Linux's 32-bit wchars are unfortunate in our case, because SQLite only supports UTF-8 and UTF-16 in its API. So there is no way on Linux to directly pass pypy's unicode strings to SQLite.
Yes, given that pypy strings use UTF-32 on Linux, they are not compatible. Note that CPython normally also uses UTF-32 on Linux (at least in most major distributions), but that can be configured at compile time. Leonardo: UTF-32 and UCS-4 are the same thing; the odd one out is UCS-2, which is a subset of UTF-16.
A bientôt,
Armin.
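As a side note, a quick way to check which width a given Python 2 build uses:

    import sys
    # 1114111 (0x10FFFF) on "wide"/UTF-32 builds,
    # 65535 (0xFFFF) on "narrow"/UTF-16 builds.
    print sys.maxunicode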
On 15/12/2012 8:51 AM, Maciej Fijalkowski wrote:
Yes, regarding UTF8, APSW does the same.
l.
On 15/12/2012 12:00 AM, Maciej Fijalkowski wrote:
Also, I should point out that most of the time we run our tests from inside the mterm console (by copy-pasting), so the timings are not affected by the module import. And concerning the utf8 conversions: our data are multilingual, so we need to support utf8 throughout madIS.
Regards,
l.
participants (7)
- Armin Rigo
- Bengt Richter
- Carl Friedrich Bolz
- Elefterios Stamatogiannakis
- Eleytherios Stamatogiannakis
- Leonardo Santagada
- Maciej Fijalkowski