Optimized version (-O2), long sincos

[TRC][../src/core/dsp.cpp: 315] init: Initializing DSP
[TRC][../src/core/dsp/x86.cpp:  38] dsp_init: Optimizing system with some assembly code
[TRC][../src/core/dsp/sse.cpp:  51] dsp_init: Optimizing DSP for SSE instruction set
[TRC][../src/core/dsp/sse3.cpp:  27] dsp_init: Optimizing DSP for SSE3 instruction set
Testing FFT of size 256 (rank = 8)...
Time = 30.0 s, iterations = 10995000, performance = 366480.5 [i/s], average time = 0.00273 [ms/i]
Testing FFT of size 512 (rank = 9)...
Time = 30.0 s, iterations = 5311000, performance = 177029.5 [i/s], average time = 0.00565 [ms/i]
Testing FFT of size 1024 (rank = 10)...
Time = 30.0 s, iterations = 2494000, performance = 83100.6 [i/s], average time = 0.01203 [ms/i]
Testing FFT of size 2048 (rank = 11)...
Time = 30.0 s, iterations = 1157000, performance = 38563.2 [i/s], average time = 0.02593 [ms/i]
Testing FFT of size 4096 (rank = 12)...
Time = 30.0 s, iterations = 535000, performance = 17831.2 [i/s], average time = 0.05608 [ms/i]
Testing FFT of size 8192 (rank = 13)...
Time = 30.1 s, iterations = 216000, performance = 7184.2 [i/s], average time = 0.13919 [ms/i]
Testing FFT of size 16384 (rank = 14)...
Time = 30.1 s, iterations = 86000, performance = 2861.0 [i/s], average time = 0.34953 [ms/i]
Testing FFT of size 32768 (rank = 15)...
Time = 30.6 s, iterations = 38000, performance = 1240.5 [i/s], average time = 0.80616 [ms/i]
Testing FFT of size 65536 (rank = 16)...
Time = 31.5 s, iterations = 16000, performance = 508.7 [i/s], average time = 1.96573 [ms/i]

Optimized version (-O2), shortened sincos

[TRC][../src/core/dsp.cpp: 315] init: Initializing DSP
[TRC][../src/core/dsp/x86.cpp:  38] dsp_init: Optimizing system with some assembly code
[TRC][../src/core/dsp/sse.cpp:  51] dsp_init: Optimizing DSP for SSE instruction set
[TRC][../src/core/dsp/sse3.cpp:  27] dsp_init: Optimizing DSP for SSE3 instruction set
Testing FFT of size 256 (rank = 8)...
Time = 30.0 s, iterations = 10845000, performance = 361483.2 [i/s], average time = 0.00277 [ms/i]
Testing FFT of size 512 (rank = 9)...
Time = 30.0 s, iterations = 5246000, performance = 174843.8 [i/s], average time = 0.00572 [ms/i]
Testing FFT of size 1024 (rank = 10)...
Time = 30.0 s, iterations = 2584000, performance = 86116.9 [i/s], average time = 0.01161 [ms/i]
Testing FFT of size 2048 (rank = 11)...
Time = 30.0 s, iterations = 1190000, performance = 39647.9 [i/s], average time = 0.02522 [ms/i]
Testing FFT of size 4096 (rank = 12)...
Time = 30.0 s, iterations = 540000, performance = 17978.9 [i/s], average time = 0.05562 [ms/i]
Testing FFT of size 8192 (rank = 13)...
Time = 30.0 s, iterations = 210000, performance = 6990.6 [i/s], average time = 0.14305 [ms/i]
Testing FFT of size 16384 (rank = 14)...
Time = 30.2 s, iterations = 85000, performance = 2818.8 [i/s], average time = 0.35476 [ms/i]
Testing FFT of size 32768 (rank = 15)...
Time = 30.1 s, iterations = 37000, performance = 1228.6 [i/s], average time = 0.81397 [ms/i]
Testing FFT of size 65536 (rank = 16)...
Time = 31.3 s, iterations = 16000, performance = 511.9 [i/s], average time = 1.95369 [ms/i]

Optimized version (-O2), no prefetch

[TRC][../src/core/dsp.cpp: 315] init: Initializing DSP
[TRC][../src/core/dsp/x86.cpp:  38] dsp_init: Optimizing system with some assembly code
[TRC][../src/core/dsp/sse.cpp:  51] dsp_init: Optimizing DSP for SSE instruction set
[TRC][../src/core/dsp/sse3.cpp:  27] dsp_init: Optimizing DSP for SSE3 instruction set
Testing FFT of size 256 (rank = 8)...
Time = 30.0 s, iterations = 11225000, performance = 374149.2 [i/s], average time = 0.00267 [ms/i]
Testing FFT of size 512 (rank = 9)...
Time = 30.0 s, iterations = 5386000, performance = 179501.8 [i/s], average time = 0.00557 [ms/i]
Testing FFT of size 1024 (rank = 10)...
Time = 30.0 s, iterations = 2601000, performance = 86668.8 [i/s], average time = 0.01154 [ms/i]
Testing FFT of size 2048 (rank = 11)...
Time = 30.0 s, iterations = 1201000, performance = 40016.1 [i/s], average time = 0.02499 [ms/i]
Testing FFT of size 4096 (rank = 12)...
Time = 30.0 s, iterations = 525000, performance = 17498.8 [i/s], average time = 0.05715 [ms/i]
Testing FFT of size 8192 (rank = 13)...
Time = 30.1 s, iterations = 213000, performance = 7072.2 [i/s], average time = 0.14140 [ms/i]
Testing FFT of size 16384 (rank = 14)...
Time = 30.2 s, iterations = 87000, performance = 2881.2 [i/s], average time = 0.34708 [ms/i]
Testing FFT of size 32768 (rank = 15)...
Time = 30.1 s, iterations = 38000, performance = 1261.4 [i/s], average time = 0.79276 [ms/i]
Testing FFT of size 65536 (rank = 16)...
Time = 30.2 s, iterations = 16000, performance = 530.1 [i/s], average time = 1.88647 [ms/i]

Optimized version (-O2), with rotation instead of sines/cosines

[TRC][../src/core/dsp.cpp: 315] init: Initializing DSP
[TRC][../src/core/dsp/x86.cpp:  38] dsp_init: Optimizing system with some assembly code
[TRC][../src/core/dsp/sse.cpp:  51] dsp_init: Optimizing DSP for SSE instruction set
[TRC][../src/core/dsp/sse3.cpp:  27] dsp_init: Optimizing DSP for SSE3 instruction set
Testing FFT of size 256 (rank = 8)...
Time = 30.0 s, iterations = 24548000, performance = 818260.1 [i/s], average time = 0.00122 [ms/i]
Testing FFT of size 512 (rank = 9)...
Time = 30.0 s, iterations = 11391000, performance = 379670.6 [i/s], average time = 0.00263 [ms/i]
Testing FFT of size 1024 (rank = 10)...
Time = 30.0 s, iterations = 5506000, performance = 183525.1 [i/s], average time = 0.00545 [ms/i]
Testing FFT of size 2048 (rank = 11)...
Time = 30.0 s, iterations = 2264000, performance = 75465.8 [i/s], average time = 0.01325 [ms/i]
Testing FFT of size 4096 (rank = 12)...
Time = 30.0 s, iterations = 982000, performance = 32689.6 [i/s], average time = 0.03059 [ms/i]
Testing FFT of size 8192 (rank = 13)...
Time = 30.0 s, iterations = 336000, performance = 11197.9 [i/s], average time = 0.08930 [ms/i]
Testing FFT of size 16384 (rank = 14)...
Time = 30.1 s, iterations = 117000, performance = 3889.0 [i/s], average time = 0.25714 [ms/i]
Testing FFT of size 32768 (rank = 15)...
Time = 30.2 s, iterations = 49000, performance = 1622.8 [i/s], average time = 0.61622 [ms/i]
Testing FFT of size 65536 (rank = 16)...
Time = 30.4 s, iterations = 20000, performance = 656.9 [i/s], average time = 1.52219 [ms/i]
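
The log above does not show how the "rotation instead of sines/cosines" variant is implemented. As a rough illustration of the general technique only, the sketch below computes FFT twiddle factors by repeatedly rotating a unit vector by a fixed angle, so that sin/cos are evaluated once per table rather than once per element. The function name twiddles_by_rotation, the negative-exponent sign convention, and the double-precision accumulator are all assumptions made for this example and are not taken from the benchmarked code.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the benchmarked sources): returns
// w[k] = exp(-2*pi*i*k/n) for k in [0, n/2), built by rotation.
std::vector<std::complex<float>> twiddles_by_rotation(std::size_t n)
{
    const std::size_t half = n / 2;
    std::vector<std::complex<float>> w(half);

    const double pi   = 3.14159265358979323846;
    const double step = -2.0 * pi / static_cast<double>(n);

    // The only trigonometric calls: the rotation by one step.
    const double d_re = std::cos(step);
    const double d_im = std::sin(step);

    // Start at angle 0, i.e. the complex number 1 + 0i.
    double re = 1.0, im = 0.0;
    for (std::size_t k = 0; k < half; ++k)
    {
        w[k] = std::complex<float>(static_cast<float>(re),
                                   static_cast<float>(im));

        // Complex multiply (re + i*im) * (d_re + i*d_im): advance one step.
        const double next_re = re * d_re - im * d_im;
        im = re * d_im + im * d_re;
        re = next_re;
    }
    return w;
}
```

Rounding error accumulates with each rotation step, which is why this sketch keeps the running value in double precision and only narrows to float on store. An implementation that accumulates in single precision would likely need to re-seed the recurrence periodically.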

[TRC][../src/core/dsp.cpp: 315] init: Initializing DSP
[TRC][../src/core/dsp/x86.cpp:  38] dsp_init: Optimizing system with some assembly code
[TRC][../src/core/dsp/sse.cpp:  51] dsp_init: Optimizing DSP for SSE instruction set
[TRC][../src/core/dsp/sse3.cpp:  27] dsp_init: Optimizing DSP for SSE3 instruction set
Testing FFT of size 256 (rank = 8)...
Time = 30.0 s, iterations = 27686000, performance = 922837.2 [i/s], average time = 0.00108 [ms/i]
Testing FFT of size 512 (rank = 9)...
Time = 30.0 s, iterations = 13042000, performance = 434721.9 [i/s], average time = 0.00230 [ms/i]
Testing FFT of size 1024 (rank = 10)...
Time = 30.0 s, iterations = 3465000, performance = 115466.6 [i/s], average time = 0.00866 [ms/i]
Testing FFT of size 2048 (rank = 11)...
Time = 30.0 s, iterations = 927000, performance = 30880.9 [i/s], average time = 0.03238 [ms/i]
Testing FFT of size 4096 (rank = 12)...
Time = 30.0 s, iterations = 341000, performance = 11349.7 [i/s], average time = 0.08811 [ms/i]
Testing FFT of size 8192 (rank = 13)...
Time = 30.2 s, iterations = 113000, performance = 3736.7 [i/s], average time = 0.26761 [ms/i]
Testing FFT of size 16384 (rank = 14)...
Time = 30.5 s, iterations = 44000, performance = 1441.4 [i/s], average time = 0.69376 [ms/i]
Testing FFT of size 32768 (rank = 15)...
Time = 31.2 s, iterations = 18000, performance = 577.0 [i/s], average time = 1.73307 [ms/i]
Testing FFT of size 65536 (rank = 16)...
Time = 30.2 s, iterations = 7000, performance = 231.6 [i/s], average time = 4.31749 [ms/i]