How to make this unpickling/sorting demo faster?
hdante
hdante at gmail.com
Thu Apr 17 23:33:48 EDT 2008
On Apr 17, 7:33 pm, Steve Bergman <sbergma... at gmail.com> wrote:
> to demonstrate how Python can combine simplicity, readability, *and*
> speed when the standard library is leveraged properly. So, in this
But that's not true. You're just creating an artificial example to
prove your point.
Consider this C code that does what you say. The code is very basic
and readable. In my machine, it takes a little longer than one second
to execute.
#include <stdio.h>
#include <stdlib.h>
#define size 1000000
struct float_pair {
double a;
double b;
};
static struct float_pair s[size];
int compare(const void *a, const void *b)
{
const struct float_pair *sa;
const struct float_pair *sb;
sa = a;
sb = b;
if (sa->b > sb->b) return 1;
if (sa->b == sb->b) return 0;
return -1;
}
int main(void)
{
FILE *f;
f = fopen("floats", "rb");
puts("Reading pairs...");
fread(s, sizeof(s), 1, f);
fclose(f);
qsort(s, size, sizeof(struct float_pair), compare);
puts("Done!");
return 0;
}
Is the code too big ? Yes. Can you make it faster in python ?
Probably not in the 10 minutes required to write this. The code that
generates the files is the following:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
struct float_pair {
double a;
double b;
};
static struct float_pair s[1000000];
int main(void)
{
FILE *f;
unsigned i;
f = fopen("floats", "wb");
srand(time(NULL));
for (i = 0; i < 1000000; i++) {
s[i].a = (double)rand()/RAND_MAX;
s[i].b = (double)rand()/RAND_MAX;
}
fwrite(s, sizeof(s), 1, f);
fclose(f);
return 0;
}
The binary file used is not portable, because the "double" type is
not standardized in C. However:
- if you restrict this code to run only on machines that follow the
IEEE standard, this code will probably be more portable than the
python one
- in python, there may be small rounding errors when exchanging the
data between machines, since "float" is not standardized either.
> case, calling C or assembler code would defeat the purpose, as would
> partitioning into smaller arrays. And my processor is single core.
>
> I started at about 7 seconds to just read in and sort, and at 20
> seconds to read, sort, and write. I can now read in, sort, and write
> back out in almost exactly 5 seconds, thanks to marshal and your
> suggestions.
>
> I'll look into struct and ctypes. Although I know *of* them, I'm not
> very familiar *with* them.
>
> Thank you, again, for your help.
More information about the Python-list
mailing list