So there are a bunch of compilers and toolchains for the P2, from Basic to C, but how does each of them fare in a pin toggling speed test?
I suspect PASM will destroy them all, but will it?
Does anyone have a high-speed frequency counter who could test pin toggling with each tool and record the results here?
Comments
Is using smartpins cheating? If not, then even the slowest P2 language can toggle a pin at the theoretical maximum rate of sysclk/2. The hard part of fast I/O on the P2 tends to be more a matter of getting the cog to dance in step with the smartpins rather than of just manually twiddling pins as fast as possible, which is nowhere near as fast.
I am thinking more of the compiled code than the direct ability of the P2.
That challenge is too easy for Flexspin. Flexspin compiles all its supported languages to the same sequence of instructions, which runs at the same speed as hand-crafted Pasm. That's because each language has primitives for the DRVx/FLTL instructions, e.g. Spin uses PINH/PINL/PINT/PINF.
The only question mark is how to trigger the REP instruction compiler optimisation.
Here's the Pasm for bit-bashing at sysclock/4 pin frequency - Assuming it's running in cogRAM.
        rep     @.rend, #0      ' loop forever
        drvnot  #pinnum
.rend
An equivalent (same speed) would be:
        rep     @.rend, #0      ' loop forever
        drvh    #pinnum
        drvl    #pinnum
.rend
Yep, here's an example:
void main(void)
{
    while(1) {
        _pinh(56);
        _pinl(56);
    }
}
And the resulting compiled .p2asm:
_main
' {
'     while(1)
        callpa  #(@LR__0003-@LR__0001)>>2,fcache_load_ptr_
LR__0001
        rep     @LR__0004, #0
LR__0002
        drvh    #56
        drvl    #56
LR__0003
LR__0004
So that CALLPA even automatically drops the loop into cogRAM for max speed.
The really interesting question is: how can you split a task into 8 parts that run on 8 cogs in parallel? I have not yet seen a compiler that can do this.
You haven't seen a compiler that can do this because, in the general case, it's mathematically impossible...
Nine women can't make a baby in one month.
Catalina can.
Here is a trivial implementation of the Sieve of Eratosthenes, which Catalina speeds up by a factor of 4 by spreading the algorithm across multiple cogs (in general, the actual speed-up factor depends on both the algorithm and the number of available cogs):
/******************************************************************************
 *                                                                            *
 *                         The Sieve of Eratosthenes                          *
 *                                                                            *
 *  This is a "classic" version of the sieve program, augmented               *
 *  with the new Catalina Multi-processing pragmas, which will                *
 *  enable multi-processing if this program is compiled using the             *
 *  Catalina compiler, but ignored if it is compiled with another             *
 *  compiler. Multi-processing typically improves the program                 *
 *  performance by 3 or 4 times (depending on the number of cogs              *
 *  available).                                                               *
 *                                                                            *
 *  Commands to compile this program as a serial program on a P2 might be:    *
 *                                                                            *
 *      catalina -p2 sieve.c -lci -O5                                         *
 *                                                                            *
 *  Commands to compile this program as a parallel program on a P2 might be:  *
 *                                                                            *
 *      catalina -p2 -Z sieve.c -lthreads -lci -O5                            *
 *                                                                            *
 ******************************************************************************/

#include <stdio.h>
#include <stdlib.h>

// define the size of the sieve (if not already defined):
#ifndef SIEVE_SIZE
#if defined(__P2__)||defined(__CATALINA_P2)
#define SIEVE_SIZE 400000
#else
#define SIEVE_SIZE 12000
#endif
#endif

unsigned char *primes = NULL;

#pragma propeller worker(unsigned long i) local(unsigned long j) stack(60)

// main : allocate and initialize the sieve, then eliminate all multiples
//        of primes, then print the time taken, and all the resulting primes.
void main(void){
   unsigned long i, j;
   unsigned long k = 1;
   unsigned long count;

   // allocate a byte array of suitable size
   primes = malloc(SIEVE_SIZE);
   if (primes == NULL) {
      // cannot allocate array
      exit(1);
   }

   // initialize sieve array to zero
   for (i = 0; i < SIEVE_SIZE; i++) {
      primes[i] = 0;
   }

   t_printf("starting ...\n");

   #pragma propeller start

   // remember starting time
   count = _cnt();

   // eliminate multiples of primes
   for (i = 2; i < SIEVE_SIZE/2; i++) {
      if (primes[i] == 0) {
         #pragma propeller begin
         for (j = 2; i*j < SIEVE_SIZE; j++) {
            primes[i*j] = 1;
         }
         #pragma propeller end
      }
   }

   #pragma propeller wait

   // calculate time taken
   count = _cnt() - count;

   t_printf("... done - %ld clocks\n", count);
   t_printf("\npress a key to see results\n");
   k_wait();

   // print the resulting primes, starting from 2
   for (i = 2; i < SIEVE_SIZE; i++) {
      if (primes[i] == 0) {
         t_printf("prime(%d)= %d, ", k++, i);
      }
   }

   while(1);
}
In the Catalina 'Parallelizer' documentation, you will find another example - speeding up a Fast Fourier Transform algorithm. In that case, the speed improvement is about 2.5 times.
Ross.
Ross,
Just been looking at Catalina. You've got something strange going on with the .TXT doc files. Sometimes when you've used the double quote character, your editor seems to be inserting something that isn't ASCII. And it doesn't simply convert to ASCII in my editor.
Eg:
xcopy /e /i %LCCDIR%\demos %HOMEPATH%\demos\
EDIT: Ah, reply to this message to see it. Its code is 0x94.
EDIT2: Actually, might be just README_P2.TXT alone. I just happened to look at this one first.
Yes, I see what you mean. Some spurious binary characters have ended up in that file, with value 0x94. Only that file, it seems - probably from cutting and pasting characters from a terminal window.
Attached is the file with those characters removed.
Thanks,
Ross.
Oops: Forgot to add back in the double quotes - file updated again!
Thanks for the information!
Christof