How to make this unpickling/sorting demo faster?

Thu Apr 17 23:33:48 EDT 2008

On Apr 17, 7:33 pm, Steve Bergman <sbergma... at gmail.com> wrote:
> to demonstrate how Python can combine simplicity, readability, *and*
> speed when the standard library is leveraged properly.  So, in this

 But that's not true. You're just creating an artificial example to
prove your point.

 Consider this C code, which does what you describe. The code is very
basic and readable. On my machine, it takes a little longer than one
second to execute.

#include <stdio.h>
#include <stdlib.h>

#define size 1000000

struct float_pair {
	double a;
	double b;
};
static struct float_pair s[size];

/* qsort callback: orders pairs by their second member, b. */
int compare(const void *a, const void *b)
{
	const struct float_pair *sa;
	const struct float_pair *sb;
	sa = a;
	sb = b;
	if (sa->b > sb->b) return 1;
	if (sa->b == sb->b) return 0;
	return -1;
}

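int main(void)
{
	FILE *f;
	f = fopen("floats", "rb");
	if (f == NULL) {
		perror("floats");
		return EXIT_FAILURE;
	}
	puts("Reading pairs...");
	/* Read the whole array as a single item; fread returns items read. */
	if (fread(s, sizeof(s), 1, f) != 1) {
		fputs("floats: short read\n", stderr);
		fclose(f);
		return EXIT_FAILURE;
	}
	fclose(f);
	qsort(s, size, sizeof(struct float_pair), compare);
	puts("Done!");
	return 0;
}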

 Is the code too big? Yes. Can you make it faster in Python?
Probably not in the 10 minutes it took to write this. The code that
generates the file is the following:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

struct float_pair {
	double a;
	double b;
};

static struct float_pair s[1000000];

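int main(void)
{
	FILE *f;
	unsigned i;
	f = fopen("floats", "wb");
	if (f == NULL) {
		perror("floats");
		return EXIT_FAILURE;
	}
	srand(time(NULL));
	/* Fill both members of every pair with values in [0, 1]. */
	for (i = 0; i < 1000000; i++) {
		s[i].a = (double)rand()/RAND_MAX;
		s[i].b = (double)rand()/RAND_MAX;
	}
	if (fwrite(s, sizeof(s), 1, f) != 1) {
		perror("floats");
		fclose(f);
		return EXIT_FAILURE;
	}
	fclose(f);
	return 0;
}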
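
 For comparison, the read-and-sort step could be sketched in Python
roughly as below. This is only an illustration, not a benchmark, and no
timing is claimed for it. It assumes the "floats" file holds
native-endian doubles with no padding between the two members of each
pair; read_pairs and sort_by_second are hypothetical helper names.

```python
import array
import os


def read_pairs(path):
    """Read native-endian C doubles and group them as (a, b) tuples."""
    values = array.array("d")
    with open(path, "rb") as f:
        # Number of doubles in the file (assumption: no padding, no header).
        count = os.path.getsize(path) // values.itemsize
        values.fromfile(f, count)
    # Even indices hold a, odd indices hold b, as in struct float_pair.
    return list(zip(values[0::2], values[1::2]))


def sort_by_second(pairs):
    """Return the pairs ordered by b, like compare() passed to qsort."""
    return sorted(pairs, key=lambda pair: pair[1])
```

 Calling sort_by_second(read_pairs("floats")) would mirror what the C
main() does with fread and qsort.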

 The binary file used is not portable, because the representation of
the "double" type is not standardized in C. However:
 - if you restrict this code to run only on machines that follow the
IEEE standard and share the same byte order, this code will probably
be more portable than the Python one
 - in Python, there may be small rounding errors when exchanging the
data between machines, since "float" is not standardized either (it
is normally a C double underneath).

> case, calling C or assembler code would defeat the purpose, as would
> partitioning into smaller arrays.  And my processor is single core.
>
> I started at about 7 seconds to just read in and sort, and at 20
> seconds to read, sort, and write.  I can now read in, sort, and write
> back out in almost exactly 5 seconds, thanks to marshal and your
> suggestions.
>
> I'll look into struct and ctypes.  Although I know *of* them, I'm not
> very familiar *with* them.
>
> Thank you, again, for your help.