[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
Neal Becker
ndbecker2 at gmail.com
Sat Mar 22 21:47:02 EDT 2008
Thomas Grill wrote:
> Hi,
> here are my results:
>
> Intel Core 2 Duo, 2.16GHz, 667MHz bus, 4MB Cache
> running under OSX 10.5.2
>
> please note that the auto-vectorizer of gcc-4.3 is doing really well....
>
> gr~~~
>
> ---------------------
>
> gcc version 4.0.1 (Apple Inc. build 5465)
>
> xbook-2:temp thomas$ gcc -msse -O2 vec_bench.c -o vec_bench
> xbook-2:temp thomas$ ./vec_bench
> Testing methods...
> All OK
>
> Problem size  Simple              Intrin              Inline
>          100   0.0002ms (100.0%)   0.0001ms ( 83.2%)   0.0001ms ( 85.1%)
>         1000   0.0014ms (100.0%)   0.0014ms ( 99.5%)   0.0014ms ( 97.6%)
>        10000   0.0180ms (100.0%)   0.0137ms ( 76.1%)   0.0103ms ( 56.9%)
>       100000   0.1307ms (100.0%)   0.1153ms ( 88.2%)   0.0952ms ( 72.8%)
>      1000000   4.0309ms (100.0%)   4.1641ms (103.3%)   4.0129ms ( 99.6%)
>     10000000  43.2557ms (100.0%)  43.5919ms (100.8%)  42.6391ms ( 98.6%)
>
>
>
> gcc version 4.3.0 20080125 (experimental) (GCC)
>
> xbook-2:temp thomas$ gcc-4.3 -msse -O2 vec_bench.c -o vec_bench
> xbook-2:temp thomas$ ./vec_bench
> Testing methods...
> All OK
>
> Problem size  Simple              Intrin              Inline
>          100   0.0002ms (100.0%)   0.0001ms ( 77.4%)   0.0001ms ( 72.0%)
>         1000   0.0017ms (100.0%)   0.0014ms ( 84.4%)   0.0014ms ( 79.4%)
>        10000   0.0173ms (100.0%)   0.0148ms ( 85.4%)   0.0104ms ( 59.9%)
>       100000   0.1276ms (100.0%)   0.1243ms ( 97.4%)   0.0952ms ( 74.6%)
>      1000000   4.0466ms (100.0%)   4.1168ms (101.7%)   4.0348ms ( 99.7%)
>     10000000  43.1842ms (100.0%)  43.2989ms (100.3%)  44.2171ms (102.4%)
>
> xbook-2:temp thomas$ gcc-4.3 -msse -O2 -ftree-vectorize vec_bench.c -o vec_bench
> xbook-2:temp thomas$ ./vec_bench
> Testing methods...
> All OK
>
> Problem size  Simple              Intrin              Inline
>          100   0.0001ms (100.0%)   0.0001ms (126.6%)   0.0001ms (120.3%)
>         1000   0.0011ms (100.0%)   0.0014ms (136.3%)   0.0014ms (127.9%)
>        10000   0.0144ms (100.0%)   0.0153ms (106.3%)   0.0103ms ( 72.0%)
>       100000   0.1027ms (100.0%)   0.1243ms (121.0%)   0.0953ms ( 92.8%)
>      1000000   3.9691ms (100.0%)   4.1197ms (103.8%)   4.0252ms (101.4%)
>     10000000  42.1922ms (100.0%)  43.6721ms (103.5%)  43.4035ms (102.9%)
gcc version 4.3.0 20080307 (Red Hat 4.3.0-2) (GCC)
gcc -msse -O2 -ftree-vectorize vec_bench.c -o vec_bench
mock-chroot> ./vec_bench
Testing methods...
All OK
Problem size  Simple              Intrin              Inline
         100   0.0001ms (100.0%)   0.0001ms (141.6%)   0.0001ms (108.0%)
        1000   0.0008ms (100.0%)   0.0011ms (149.9%)   0.0008ms (100.4%)
       10000   0.0135ms (100.0%)   0.0197ms (145.8%)   0.0133ms ( 98.8%)
      100000   0.6415ms (100.0%)   0.4918ms ( 76.7%)   0.5052ms ( 78.8%)
     1000000   7.5364ms (100.0%)   7.9987ms (106.1%)   7.4832ms ( 99.3%)
    10000000  76.3927ms (100.0%)  76.8933ms (100.7%)  75.1002ms ( 98.3%)
model name : AMD Athlon(tm) 64 Processor 3200+
stepping : 10
cpu MHz : 2000.068
cache size : 1024 KB
Now the same, but with the compiler reported by gcc --version:
gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
Testing methods...
All OK
Problem size  Simple              Intrin              Inline
         100   0.0002ms (100.0%)   0.0001ms ( 77.2%)   0.0001ms ( 58.7%)
        1000   0.0015ms (100.0%)   0.0011ms ( 73.5%)   0.0008ms ( 52.6%)
       10000   0.0214ms (100.0%)   0.0195ms ( 90.9%)   0.0363ms (169.3%)
      100000   0.6620ms (100.0%)   0.5614ms ( 84.8%)   0.5527ms ( 83.5%)
     1000000   7.5975ms (100.0%)   7.3826ms ( 97.2%)   7.3380ms ( 96.6%)
    10000000  75.8361ms (100.0%)  84.0476ms (110.8%)  77.2884ms (101.9%)