Seriously, is this the speed?
AmbientPower
Posts: 16
in Propeller 1
while(1)
{
high(23);
low(23);
}
This gives me a 31.6kHz output. How can this be so? Even with a rubbish compiler, I'd expect a bit better, and the default clock speed is supposed to be 80MHz. I'm missing the point of something here!
Comments
Try Eric's Flexspin C - https://github.com/totalspectrum/spin2cpp/releases
or FlexProp as the IDE - https://github.com/totalspectrum/flexprop/releases
@AmbientPower, it's not the compiler, it's the functions being used. Both the "high()" and "low()" functions from the Simple library are very slow: they do a lot of work that isn't strictly necessary but makes them extra foolproof.
You might try this instead:
or even faster:
I don't remember if these get compiled into a single PASM instruction or not. If not, you can force it like so:
All of the above was taken from https://github.com/parallaxinc/PropWare/blob/develop/PropWare/gpio/port.h
Thanks. That makes sense, although it's still not stellar.
The first example was 104kHz, the second 83.3kHz (slower because of the loop, I'm sure), and the last 167kHz.
The -1 is incorrect in the bit position algorithm by the way!
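To put those measurements in perspective, it helps to turn each pin frequency into system clocks per loop pass. This is only a rough host-side sketch: the function name is made up for illustration, and it assumes the 80 MHz clock mentioned in the first post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of any Propeller library):
   system clocks spent on one full high+low period of the pin.
   Integer division, so the result is rounded down. */
static uint32_t clocks_per_period(uint32_t sysclk_hz, uint32_t pin_hz)
{
    assert(pin_hz > 0);   /* a stopped pin has no period */
    return sysclk_hz / pin_hz;
}
```

At 80 MHz, 31.6 kHz works out to roughly 2,500 clocks per pass and 167 kHz to roughly 480, so most of the time goes to call and interpreter overhead rather than to the pin write itself.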
Thanks for that. I'm still testing, but I think I'm going to do better with an ARM micro or ESP32. The parallel cores were what attracted me, but I think even with the overhead of timebase processing, I can still run faster in a traditional device. With any peripherals having to be bit bashed, it means that I2C or SPI is going to be much slower than a device with on-chip peripherals.
That is sometimes true, for sure. With an 80 MHz clock, and needing to bit bash all of the serial protocols, you do have a limited transfer speed.
But, it's surprising what you can get with some clever inline assembly. We can reach up to 4 MHz SPI clocks and... I don't remember how high for I2C, but I'm pretty sure 1 MHz was easily within reach.
I ran some simple benchmarks for SD card read-write, focused entirely on comparing PropWare against the Simple libraries. I didn't record the raw read and write speeds, but in 3.978 seconds PropWare was able to mount the FAT filesystem, open two files (one for read, one for write) and then copy a 25.9 KB file over character-by-character. That amounts to an average copy speed of 52 Kbps, or an average communication speed of 104 Kbps. And all of that runs in just one cog, leaving another 7 cogs to do whatever else is needed, uninterrupted by the SPI/SD card comms.
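For anyone who wants to check the arithmetic behind that figure, here is a small sketch. It assumes "25.9 KB" means 25,900 bytes (decimal kilobytes) and that the 3.978 s covers the whole run, mount and file opens included, so the result is a lower bound on the raw copy speed. The function name is just for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average throughput in bits per second,
   rounded down, from a byte count and an elapsed time in ms. */
static uint32_t average_bps(uint32_t bytes, uint32_t elapsed_ms)
{
    assert(elapsed_ms > 0);
    /* 64-bit intermediate so bytes * 8 * 1000 cannot overflow */
    return (uint32_t)(((uint64_t)bytes * 8u * 1000u) / elapsed_ms);
}
```

Calling average_bps(25900, 3978) gives about 52,000 bits per second for the copy. Every byte crosses the bus twice (one read, one write), which is where the doubled communication figure comes from.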
UART performance testing showed that PropWare can handle up to 2.680 Mbps sustained throughput. Again, this runs entirely in one cog, allowing the other 7 to do whatever else you want.
You might want to look at PropWare's "serial" folder, which holds efficient bit bashing routines for UART, SPI, and I2C: https://github.com/parallaxinc/PropWare/tree/develop/PropWare/serial
By the way, PropWare's PropWare::Runnable might help you with concurrent programming.
And some shortcut links:
SPI routine for sending an arbitrary block of 8-bit words at a 4 MHz clock with a sustained write of 3.33 Mbps
UART routine for sending an arbitrary block of 8-bit words at 4.444 Mbaud with sustained throughput of 2.680 Mbps
20 MHz (minus small pause every 32 bits) is the max that's theoretically possible. Not sure if viable in inline ASM though (does GCC do fcached inline ASM?).
Which is why I'm a bit confused about my toggling the I/O bit using inline assembler only giving 167kHz. Doing SPI, you have the clock to do and stuff in-between!
Yea I'd swear I had an example of 20 MHz SPI somewhere, but couldn't find it for the life of me when I was searching through PropWare's code lol. GCC can do fcache + inline ASM, and that's exactly how the majority of PropWare's serial routines are written.
Looks like you'll need fcache + inline assembly to get any faster. Are you running CMM or LMM? Switching to LMM would probably give you a huge speed boost, but for something like this, CMM + fcache + inline assembly is a fantastic combination (majority of the code is not speed critical, but small tight functions need to be highly optimized).
In Tachyon forth, I tested a 1,000,000 loop that raises and lowers pin 10:-
: TEST LAP 1000000 0 DO 10 HIGH 10 LOW LOOP LAP .LAP ;
This executed at an 80 MHz clock in 5.4 s, so the period of the pin 10 signal was about 5.4 µs, i.e. 185.2 kHz.
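The same conversion works for any timed loop where each pass produces one full pin period. A minimal sketch, assuming the elapsed time is available as a count of system clocks (5.4 s at 80 MHz is 432,000,000 clocks); the name is made up for illustration:

```c
#include <stdint.h>

/* Hypothetical helper: pin frequency in Hz for a loop that produces
   one full high+low period per pass. Returns 0 if no time elapsed. */
static uint32_t pin_hz_from_loop(uint32_t passes,
                                 uint64_t elapsed_clocks,
                                 uint32_t sysclk_hz)
{
    if (elapsed_clocks == 0) {
        return 0;
    }
    /* passes per second = passes * (clocks per second) / clocks used */
    return (uint32_t)(((uint64_t)passes * sysclk_hz) / elapsed_clocks);
}
```

If a loop only toggles the pin once per pass, halve the result, since a full period then needs two passes.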
Are you really running at 80 MHz, or are you maybe running in RCfast/RCslow without a crystal?
Your results are in kHz and should be in MHz, so something in your setup is wrong.
Did you set the clock frequency to 80 MHz?
Can you maybe create a listing file of the generated assembly?
Mike
There are some notes in https://forums.parallax.com/discussion/173495/putty-prop-plug-what-is-the-highest-baud-rate-feasible
Peak speeds of 20 MHz are possible with peripheral HW assist, as said in #6.
The next steps down are an added NOP for SysCLK/8, or SysCLK/10, which allows WAIT to be used for granular control and can also support fractional baud.
For more sustained burst slave UART use, COG local, I think something like 12M.8.M.2 allows 1 byte per microsecond, and gives more stop bit time to store the byte, and re-sync on the next start edge. Both ends will need to agree on how many bytes per burst.
I'm not sure what that drops to if the buffer is moved into HUB, but it will slow down from COG buffer speeds.
Code snippet for P1 20MHz SPI Master read of SD card is here, which uses a spare pin for CTR adder gate.
https://forums.parallax.com/discussion/comment/1466234/#Comment_1466234 and also post #7 describes how that works.
https://forums.parallax.com/discussion/comment/1482752/#Comment_1482752 has both read and write code, 20MHz SPI master
The documentation says that the default is 80MHz, and I'm using the Quickstart profile to upload.
It is in CMM at the moment. I'll re-test in LMM, thanks.
Hi @AmbientPower
Welcome to the forums!
Sorry if you already covered this.... but would you like to post the entire code file you are testing with? Just in case there's a config / clock option (or such like) out of place, one of us could quickly sanity check the code to make sure you're not missing out on something important.
Looks good. That's the .c file. Could you post the .side file too, as that includes the compiler options.
ps. I've added the code display tags in your above post. Three backticks on the lines above and below the code.
IOtest.c
There are a couple of problems with the line "OUTA &= !PIN_21;". In C, the "!" operator is a logical complement. The operator that you really should use is "~", which is the bitwise complement. Also, the symbol PIN_21 is defined as "#define PIN_21 1<<21". The pre-processor will convert "OUTA &= ~PIN_21;" to "OUTA &= ~1<<21;". The "~" operator has precedence over the "<<" operator, so this will result in an unintended value. When doing a #define like this it is best to put parentheses around the value, such as "#define PIN_21 (1<<21)". The pre-processor will then generate "OUTA &= ~(1<<21);".
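Here is a small sketch that shows the three cases side by side on a PC. It is not the original IOtest.c: a plain function argument stands in for OUTA, the function and macro names are made up, and the literals are unsigned (an assumption made here so the shifts stay well defined in standard C, with a 32-bit unsigned int).

```c
#include <stdint.h>

/* Stand-ins for the macro discussed above (names are hypothetical). */
#define PIN_21_BARE   1u << 21      /* no parentheses, as in the post */
#define PIN_21_PAREN  (1u << 21)    /* recommended form */

/* "!" is logical NOT: any non-zero mask becomes 0,
   so the AND clears every bit, not just bit 21. */
static uint32_t clear_with_logical_not(uint32_t outa)
{
    uint32_t mask = PIN_21_PAREN;
    return outa & (uint32_t)!mask;
}

/* Expands to ~1u << 21, which parses as (~1u) << 21.
   The resulting mask has bits 0..21 clear, so 22 pins go low. */
static uint32_t clear_with_bare_macro(uint32_t outa)
{
    return outa & (uint32_t)(~PIN_21_BARE);
}

/* Expands to ~(1u << 21): only bit 21 is cleared. */
static uint32_t clear_with_paren_macro(uint32_t outa)
{
    return outa & ~PIN_21_PAREN;
}
```

Starting from all ones (0xFFFFFFFF), the three functions leave 0x00000000, 0xFFC00000 and 0xFFDFFFFF respectively, so only the last one does what was intended.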
Doh! Good catch. Been a while since I've written any embedded
1 MASK 1000000 0 LAP DO T LOOP LAP .LAP --> 80,000,272 cycles = 1.000sec ok
= 500kHz
by using pre assigned PINMASK
and using the single character
H - high
L - low
T - toggle
F - float
P - pulse (HL)
words
1 million pulses HL:
1 MASK 1000000 0 LAP DO P LOOP LAP .LAP --> 96,000,272 cycles = 1.200sec ok
= about 833 kHz (1,000,000 pulses in 1.2 s) but with different pulse / pause widths
don't have the scope attached
ADDtoIT:
with special SPI words sending 32 bits at 3.7 - almost 4MHz
SPIWR32 ( long -- long ) send 32-bits 8.6us
If you want to know how to write something in Forth, all you gotta do is write it in C and complain that it isn't ___ enough
Well it's just a bit of fun to compare execution rates, programming in forth isn't compulsory - yet
The Forth numbers are interesting, but they don't really apply to the original post since it refers to the P1. Forth numbers would be useful if they were for the P1. It might also be useful to know the toggle speed in PASM, which I think would be 5 MHz. You could even toggle the pin using a counter. What's the highest frequency using a counter on the P1?
40 MHz in NCO mode, 128 MHz in PLL mode
my numbers are for Tachyon 5.7 on the P1
One correction about PASM. The minimal toggle loop would take 12 cycles per loop. With an 80 MHz system clock the pin frequency would be 6.667 MHz.
I think you meant toggles at 6.667 MHz (for a 3.333 MHz frequency).
It also depends on what sort of loop you create.
The manual shows a XOR+JMP which is 8 cycles per toggle, but has no exit means.
If you add a XOR + DJNZ for counted toggles this applies, still 8 cycles.
DJNZ requires a different amount of clock cycles depending on whether or not it has to jump. If it must jump it takes 4 clock cycles, if no jump occurs it takes 8 clock cycles. Since loops utilizing DJNZ need to be fast, it is optimized in this way for speed.
If you need to exit on a pin state from another COG, a 3rd test line is needed, for 12 sysclks per toggle.
Those are compile-time locked speeds. If you add WAITCNT for run-time control of the loop delay, that is then 4+(6+)+4, and the pin frequency can be any value 40 MHz/N, where N >= 14.
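As a rough illustration of that 40 MHz/N rule, here is a sketch where N is the number of system clocks between toggles. The minimum of 14 is taken straight from the 4+(6+)+4 count above; the function name and the choice to return 0 for unreachable delays are assumptions for illustration.

```c
#include <stdint.h>

/* Minimum clocks per toggle for a XOR + WAITCNT + JMP style loop,
   per the 4+(6+)+4 count quoted in the post. */
enum { MIN_CLOCKS_PER_TOGGLE = 14 };

/* Hypothetical helper: pin frequency in Hz when the pin is toggled
   every n system clocks. Two toggles make one period, so the result
   is sysclk / (2 * n). Returns 0 if n is below the loop's minimum. */
static uint32_t waitcnt_pin_hz(uint32_t sysclk_hz, uint32_t n)
{
    if (n < MIN_CLOCKS_PER_TOGGLE) {
        return 0;
    }
    return sysclk_hz / (2u * n);
}
```

With an 80 MHz clock, n = 40 gives exactly 1 MHz, and the fastest setting n = 14 gives a little under 2.86 MHz.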
No, I meant what I said. The pin frequency is 6.667 MHz. The toggle rate would be 13.333 MHz. This is based on the 3-instruction loop:
loop
xor outa, bitmask
xor outa, bitmask
jmp #loop
Of course with this loop the duty cycle is not 50%. A 50% duty cycle would require an instruction between the XORs, which would make it a 16-cycle loop. In that case the pin frequency would be 5 MHz, and the toggle rate would be 10 MHz.
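The numbers for both loop variants follow from the instruction count. A minimal host-side sketch, assuming every instruction in the loop is a plain 4-clock cog instruction (XOR, NOP, a taken JMP) and that the loop contains exactly two toggles; the type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: no hub or WAITxxx instructions in the loop. */
enum { CLOCKS_PER_INSTRUCTION = 4 };

typedef struct {
    uint32_t loop_clocks;  /* system clocks per loop pass */
    uint32_t pin_hz;       /* one full high+low period per pass */
    uint32_t toggle_hz;    /* two edges per pass */
} loop_timing;

/* Hypothetical helper: timing of a loop with two XOR toggles. */
static loop_timing two_toggle_loop(uint32_t sysclk_hz, uint32_t instructions)
{
    loop_timing t;
    assert(instructions > 0);
    t.loop_clocks = instructions * CLOCKS_PER_INSTRUCTION;
    t.pin_hz = sysclk_hz / t.loop_clocks;
    t.toggle_hz = (2u * sysclk_hz) / t.loop_clocks;
    return t;
}
```

Calling two_toggle_loop(80000000, 3) describes the 12-clock loop above, and passing 4 instead describes the 16-clock version with the extra instruction for a 50% duty cycle.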
This program:
compiles to the following loop in FlexC:
So it should run at 6.67 MHz on a P1.