If you give the COGC program a chunk of hub memory to use as a stack you don't have to worry about this. It is only necessary if you want to operate without a stack which is kind of unnatural for C anyway.
The general premise/expectation behind COG mode is to run COG local, with PASM like speed.
ie to use C as a high level assembler, and that will of course be somewhat constrained.
The link mentions 2 calls deep ?
What is the speed impact, when it does flip to use a Hub Stack ?
What CLK speeds does that use ?
I see 1MHz is widespread, and 3.4MHz is mentioned on some parts (eg Cypress FM24V10) not cheap, but could test 3.4MHz code.
1MHz parts can likely be over-clocked with tuning of pullup resistors and code.
Not sure how fast it is trying to go. Jazzed wrote the EEPROM external memory driver. I've attached it here in case you want to look over the code. It doesn't use waitcnt for timing but instead uses djnz loops.
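For a rough feel of the timing budget behind the clock-speed question, here is a minimal sketch that converts a target I2C clock into a half-period delay in system clock ticks. The function name and the 80 MHz system clock are assumptions for illustration, not values taken from the attached driver.

```c
#include <stdint.h>

/* Ticks of the system clock in half of one I2C clock period.
 * clk_hz: system clock frequency (80 MHz below is only an assumed example)
 * scl_hz: target I2C bus clock, e.g. 1 MHz or 3.4 MHz
 * The result is truncated, so the real bus clock is never slower than
 * clk_hz / (2 * result) would suggest on paper. */
static uint32_t half_cycle_ticks(uint32_t clk_hz, uint32_t scl_hz)
{
    return clk_hz / (2u * scl_hz);
}
```

With an 80 MHz clock this gives 40 ticks per half cycle at 1 MHz and only 11 ticks at 3.4 MHz. Since most Propeller 1 instructions take 4 clocks, that is about ten instructions per half cycle at 1 MHz and fewer than three at 3.4 MHz, which is why the faster rate looks tight for any bit-banged loop.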
Remember, PropGCC is built on GCC. The compiler itself doesn't have any Propeller-specific constructs. It just has a Propeller code generator. The GCC compiler and C itself pretty much assume a stack so any code that can get away without one is exceptional. You *can* generate that sort of code by being a bit careful about how you use nested functions but it isn't really that straightforward. It would, of course, be possible to write an entirely new compiler that is tuned to generate code for the COG but that would be a much bigger effort than writing a code generator for GCC. We didn't do that. Even Catalina didn't do that.
I just tried the fibo program in LMM and COG modes, and COG is about twice as fast as LMM. Of course, the fibo program uses recursive calling, so has to use the stack. I would expect a COG program to be at least 4 times faster than LMM as long as it doesn't use the stack or hub variables.
It's impressive that PropGCC can generate COG-mode code.
Does the use of the HUB-stack appear in reports, or can it generate a warning ?
Sounds tolerable, what about the PropBASIC Frequency counter I linked in #27 as a more real-world example ?
Can that fit into COG mode, (sans-stacks?), and produce similar size / speed code ?
COG mode is thanks to Eric. I didn't think it could be done!
No, no warning or message of any kind is generated for stack usage. Stack usage is normal and expected in C.
Yes, I was meaning more as a user option ?
When they want COG mode and no stacks, (which is a special case) being able to avoid the need to manually double-check ASM listing, makes code a lot more maintainable.
I suppose it would be possible to do that by setting a flag in the code generator every time a stack operation is generated. That is, if it is possible at that level to distinguish between stack operations and other hub accesses. I'm not sure how useful it would be though. It would basically just tell you that a stack was needed but not why.
Any warning is going to need additional investigation, it is the change in warning level that matters.
More useful than a simple flag, could be a stack counter, that INCs on every code generator stack case.
Reporting that would give a reference point for code maintainers ( & retains usefulness in >0 cases ).
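Until the code generator reports something like this itself, the counter idea can be approximated from outside by scanning the generated listing. The sketch below is only that: it assumes the stack pointer shows up in PropGCC listings as a register named sp and that a single quote starts a comment, and the function names are made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Characters that can be part of a label or register name. */
static int is_ident_char(int c)
{
    return isalnum(c) || c == '_' || c == '.';
}

/* Count the lines of an assembly listing that use "sp" as a whole token.
 * ASSUMPTION: the stack pointer is spelled "sp" in the listing and
 * a single quote begins a comment that runs to the end of the line. */
static size_t count_stack_refs(const char *listing)
{
    size_t count = 0;
    const char *line = listing;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        int hit = 0;

        for (size_t i = 0; i + 1 < len && !hit; i++) {
            if (line[i] == '\'')
                break;                      /* ignore comment text */
            if (line[i] == 's' && line[i + 1] == 'p') {
                int before = (i == 0) ||
                             !is_ident_char((unsigned char)line[i - 1]);
                int after = (i + 2 >= len) ||
                            !is_ident_char((unsigned char)line[i + 2]);
                hit = before && after;
            }
        }
        count += (size_t)hit;

        line += len;
        if (*line == '\n')
            line++;
    }
    return count;
}
```

A count of zero would be the "no stack" reference point for maintainers; any other value says how many lines to go and look at, which keeps the number useful in the >0 cases too.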
WOW, a lot of good information here. I only expected a few quick answers.
So as I had thought, the use of stack space is still a concern (and a big one with only 496 longs of memory to work with). And much of the code I have seen uses some macros (if I remember correctly the header file is something like prop.h), macros that look a lot like spin commands, though have different parameters.
Just as a thought:
Why doesn't someone concentrate on a subset of C and tune it for COG execution? Perhaps start with something very simple like CUCU, add a little to eliminate the redundant code it generates (to save space), and have it generate native Propeller code directly. Then if you add a stack limit, most things should be fairly simple.
I think a simple subset C compiler like that would be a perfect fit on the Propeller. That is just my view.
Things are pretty close to that already - the discussion is around how to make that 'simple subset C' a little more controllable and maintainable.
The Stack use is code-optional, but currently any use is a little hidden from users, hence my suggestion of some reporting (eg counter?)
I think a COG-Mode working example, that is real-world in nature (not fibo), would help demonstrate the issues here, and the PropBASIC freqCounter I linked seems a good reference.
That has some maths, some control flow, some string work, & comms, and gives a useful result, and seems to fit easily entirely in a COG ( ~152 Longs) with the small number-string in HUB.
It is small enough, that Prop GCC can be 2x the size and still fit to demonstrate COG-mode.
Here is a COG mode example that works and uses no stack space. In fact, this is the driver I was working on when I wrote those notes about using COG mode and eliminating stack usage.
Can you include the ASM listing from that too ?
Is it easy to then make it use a stack, and include the .C/ASM for that one too ?
My question on BASIC still remains open. Compiled structured BASIC is a great language, and I see no reason not to use it. PropBASIC is great, though I would like to know of other options before I commit to a particular version of BASIC.
I don't know of any other Basic for the Propeller that produces PASM or LMM code. The others are either straight interpreters like FemtoBasic or byte code compilers like my xbasic or ebasic.
Thank you for that information. It is too bad we do not have more BASIC compilers for the Propeller.
PropBASIC is being added to the Propeller IDE, (after being somewhat ignored) so that should boost the usage and support of PropBASIC.
Not sure why you'd want more than one ?
I'd rather see a PropPASCAL, before a second PropBASIC.
Just that having more than one helps push the use and development of what there is.
Sounds tolerable, what about the PropBASIC Frequency counter I linked in #27 as a more real-world example ?
Can that fit into COG mode, (sans-stacks?), and produce similar size / speed code ?
I converted the PropBASIC frequency counter program to C, and it's in the attached zip file. FreqCounter3.c doesn't use any of the special cog attributes, and it uses the stack. FreqCounter3cog.c uses the _NAKED, _NATIVE and _COGMEM attributes, and does not use the stack. It does use hub memory for strings and constants.
I use CTRB to generate a signal, which is measured by CTRA. You can change the frequency by changing the #define for FREQ. The signal pin number is defined by Signal.
I also generated the assembly output, which is included in the zip file.
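As a small illustration of what the attributes named above look like in source, here is a sketch that is not taken from FreqCounter3cog.c. The header name, the __PROPELLER__ check, and the identifiers edge_total and add_sample are assumptions; the empty fallback defines are only there so the same file also builds on a host compiler.

```c
#include <stdint.h>

#if defined(__PROPELLER__)      /* ASSUMPTION: defined by PropGCC */
#include <propeller.h>          /* ASSUMPTION: supplies the macros below */
#endif

/* Fallbacks so the sketch compiles as plain C anywhere else. */
#ifndef _NATIVE
#define _NATIVE
#endif
#ifndef _COGMEM
#define _COGMEM
#endif

/* Kept in cog memory rather than hub memory when built for the Propeller. */
static _COGMEM uint32_t edge_total;

/* A native function is entered with a plain call/ret pair, so calling it
 * does not need a hub stack. It sticks to add operations because multiply
 * and divide would pull in library routines. */
_NATIVE void add_sample(uint32_t delta)
{
    edge_total += delta;
}
```

The intent is that a main loop marked _NAKED can call helpers like add_sample() while every variable it touches is _COGMEM, leaving hub memory for strings and constants only.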
In your code, I see you disabled the standard serial driver. I would assume you did this to reduce program overhead. If so, what kind of reduction in size is there?
Just that having more than one helps push the use and development of what there is.
Or, the opposite can occur as you can disperse the effort, and just confuse new users.
( rather like PropIDE and SimpleIDE are doing right now... )
That said, I can see a place for ByteCode BASIC and compiled BASIC, though ideally they are compatible...
Comments
eeprom_xmem.spin
i2c_driver.c
        .text
        .balign 4
_i2cStart
        mov     r7, DIRA
        andn    r7, _scl_mask
        mov     DIRA, r7
        mov     r7, DIRA
        andn    r7, _sda_mask
        mov     DIRA, r7
        mov     r7, CNT
        add     r7, _half_cycle
        waitcnt r7,#0
        or      DIRA,_sda_mask
        mov     r7, CNT
        add     r7, _half_cycle
        waitcnt r7,#0
        or      DIRA,_scl_mask
        'native return
_i2cStart_ret
        ret
        .balign 4
_i2cSendByte
        mov     r7, #9
        '' loop_start register r7 level #1
        jmp     #.L3
.L6
        mov     r6, DIRA
        test    r0,#0x80 wz
        IF_NE andn    r6, _sda_mask
        IF_E  or      r6, _sda_mask
        mov     DIRA, r6
        mov     r6, CNT
        add     r6, _half_cycle
        waitcnt r6,#0
        and     DIRA,r5
        mov     r6, CNT
        add     r6, _half_cycle
        waitcnt r6,#0
        mov     r6, DIRA
        or      r6, _scl_mask
        shl     r0, #1
        mov     DIRA, r6
        and     r0,#255
.L3
        mov     r5, _scl_mask
        xor     r5,__MASK_FFFFFFFF
        djnz    r7,#.L6
        mov     r7, DIRA
        andn    r7, _sda_mask
        mov     DIRA, r7
        mov     r7, CNT
        add     r7, _half_cycle
        waitcnt r7,#0
        and     DIRA,r5
        mov     r7, INA
        test    r7,_sda_mask wz
        mov     r0, #0
        mov     r7, CNT
        muxnz   r0,#1
        add     r7, _half_cycle
        waitcnt r7,#0
        or      DIRA,_scl_mask
        or      DIRA,_sda_mask
        'native return
_i2cSendByte_ret
        ret
        .balign 4
_i2cReceiveByte
        mov     r3, _sda_mask
        mov     r7, DIRA
        xor     r3,__MASK_FFFFFFFF
        and     r7, r3
        mov     DIRA, r7
        mov     r6, #9
        mov     r7, #0
        '' loop_start register r6 level #1
        jmp     #.L9
.L10
        mov     r5, CNT
        add     r5, _half_cycle
        waitcnt r5,#0
        and     DIRA,r4
        mov     r5, INA
        test    r5,_sda_mask wz
        shl     r7, #1
        mov     r5, #0
        muxnz   r5,#1
        and     r7, #254
        or      r7, r5
        mov     r5, CNT
        add     r5, _half_cycle
        waitcnt r5,#0
        or      DIRA,_scl_mask
.L9
        mov     r4, _scl_mask
        xor     r4,__MASK_FFFFFFFF
        djnz    r6,#.L10
        mov     r6, DIRA
        cmps    r0, #0 wz,wc
        IF_NE or      r6, _sda_mask
        IF_E  and     r6, r3
        mov     DIRA, r6
        mov     r6, CNT
        add     r6, _half_cycle
        waitcnt r6,#0
        and     DIRA,r4
        mov     r6, CNT
        add     r6, _half_cycle
        waitcnt r6,#0
        or      DIRA,_scl_mask
        or      DIRA,_sda_mask
        mov     r0, r7
        'native return
_i2cReceiveByte_ret
        ret
        .balign 4
_i2cStop
        mov     r7, CNT
        add     r7, _half_cycle
        waitcnt r7,#0
        mov     r7, DIRA
        andn    r7, _scl_mask
        mov     DIRA, r7
        mov     r7, DIRA
        andn    r7, _sda_mask
        mov     DIRA, r7
        'native return
_i2cStop_ret
        ret
        .balign 4
        .global _main
_main
        mov     r6, PAR
        mov     r4, #1
        mov     r3, r4
        rdlong  r7, r6
        shl     r3, r7
        mov     r7, r6
        add     r7, #4
        mov     _scl_mask, r3
        rdlong  r7, r7
        shl     r4, r7
        mov     r7, r6
        add     r7, #8
        add     r6, #12
        mov     _sda_mask, r4
        rdlong  r7, r7
        shr     r7, #1
        cmp     r7, #32 wz,wc
        mov     _half_cycle, r7
        IF_A  sub     r7, #32
        IF_A  mov     _half_cycle, r7
        mov     r7, #0
        rdlong  r6, r6
        mov     _mailbox, r6
        wrlong  r7, r6
        mov     r6, r3
        mov     r5, DIRA
        xor     r6,__MASK_FFFFFFFF
        and     r5, r6
        mov     DIRA, r5
        mov     r7, r4
        mov     r5, DIRA
        xor     r7,__MASK_FFFFFFFF
        and     r5, r7
        mov     DIRA, r5
        and     OUTA,r6
        and     OUTA,r7
.L30
        mov     r6, _mailbox
.L17
        rdlong  r7, r6
        cmps    r7, #0 wz,wc
        IF_E  jmp     #.L17
        cmp     r7, #2 wz,wc
        IF_B  jmp     #.L31
        cmp     r7, #3 wz,wc
        IF_BE jmp     #.L19
        cmp     r7, #5 wz,wc
        IF_A  jmp     #.L31
        jmp     #.L41
.L19
        mov     r5, r6
        add     r5, #12
        add     r6, #16
        cmps    r7, #2 wz,wc
        rdlong  lr, r5
        rdlong  r14, r6
        IF_NE jmp     #.L38
        call    #_i2cStart
        mov     r7, _mailbox
        add     r7, #8
        rdlong  r0, r7
        and     r0,#255
        call    #_i2cSendByte
        cmps    r0, #0 wz,wc
        IF_NE mov     lr, #2
        IF_NE jmp     #.L18
        jmp     #.L38
.L25
        rdbyte  r0, lr
        call    #_i2cSendByte
        cmps    r0, #0 wz,wc
        IF_NE jmp     #.L33
        add     lr, #1
        sub     r14, #1
.L38
        cmps    r14, #0 wz,wc
        IF_NE jmp     #.L25
        mov     lr, #0
        jmp     #.L24
.L33
        mov     lr, #3
.L24
        mov     r7, _mailbox
        add     r7, #20
        rdlong  r7, r7
        cmps    r7, #0 wz,wc
        IF_E  jmp     #.L18
        jmp     #.L40
.L41
        mov     r5, r6
        add     r5, #12
        add     r6, #16
        cmps    r7, #4 wz,wc
        rdlong  r14, r5
        rdlong  lr, r6
        IF_NE jmp     #.L39
        call    #_i2cStart
        mov     r7, _mailbox
        add     r7, #8
        rdlong  r0, r7
        and     r0,#255
        call    #_i2cSendByte
        cmps    r0, #0 wz,wc
        IF_NE mov     lr, #4
        IF_NE jmp     #.L18
        jmp     #.L39
.L29
        cmps    lr, #1 wz,wc
        mov     r0, #0
        muxnz   r0,#1
        sub     lr, #1
        call    #_i2cReceiveByte
        wrbyte  r0, r14
        add     r14, #1
.L39
        cmps    lr, #0 wz,wc
        IF_NE jmp     #.L29
        mov     r7, _mailbox
        add     r7, #20
        rdlong  r7, r7
        cmps    r7, #0 wz,wc
        IF_E  mov     lr, #0
        IF_E  jmp     #.L18
.L40
        call    #_i2cStop
        jmp     #.L18
.L31
        mov     lr, #1
.L18
        mov     r7, _mailbox
        mov     r6, r7
        add     r6, #4
        wrlong  lr, r6
        mov     r6, #0
        wrlong  r6, r7
        jmp     #.L30
_scl_mask
        long    0
_sda_mask
        long    0
_half_cycle
        long    0
_mailbox
        long    0
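Reading the _main part of the listing back into C gives a rough picture of the mailbox the driver polls and of how it derives its half-cycle delay. This is a reconstruction from the assembly only: the byte offsets and the arithmetic come from the listing, while every struct, field, and function name below is hypothetical and may not match i2c_driver.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Mailbox layout as _main appears to access it.
 * Only the offsets are taken from the listing; the names are guesses. */
typedef struct {
    uint32_t cmd;     /* +0  polled until non-zero, written to 0 when done  */
    uint32_t status;  /* +4  result code stored from lr at .L18             */
    uint32_t hdr;     /* +8  low byte is sent right after the start         */
    uint32_t buffer;  /* +12 hub address of the data buffer                 */
    uint32_t count;   /* +16 number of bytes to send or receive             */
    uint32_t stop;    /* +20 non-zero means finish with _i2cStop            */
} i2c_mailbox_t;

/* The delay setup at the top of _main:
 *     shr r7, #1 / cmp r7, #32 / IF_A sub r7, #32
 * Halve the cycle value read from PAR+8, then subtract 32 ticks when the
 * result is larger than 32 (presumably to allow for instruction overhead). */
static uint32_t half_cycle_from(uint32_t cycle_ticks)
{
    uint32_t half = cycle_ticks >> 1;
    if (half > 32u)
        half -= 32u;
    return half;
}
```

In the same reading, commands 2 and 3 take the write path and 4 and 5 the read path, with only 2 and 4 issuing a start and sending the header byte first. The status values written back appear to be 0 for success, 1 for an unknown command, 2 or 4 when the header byte is not acknowledged, and 3 when a data byte is not acknowledged.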
Comments :
byte <<= 1;
if (INA & sda_mask) byte++;
should compile to smaller code than
byte <<= 1;
byte |= (INA & sda_mask) ? 1 : 0;
and I notice that using byte-sized vars can give larger code than using the native Prop size (32-bit longs), as the compiler adds a masking step.
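To make the two variants easy to compare, here they are as stand-alone functions with INA passed in as a plain value. This is a host-side sketch with invented names; it shows that the results match and says nothing about which one PropGCC turns into less code.

```c
#include <stdint.h>

/* Variant 1: shift, then increment when the SDA bit is set. */
static uint32_t shift_in_inc(uint32_t acc, uint32_t ina, uint32_t sda_mask)
{
    acc <<= 1;
    if (ina & sda_mask)
        acc++;
    return acc;
}

/* Variant 2: shift, then OR in a 0 or 1. */
static uint32_t shift_in_or(uint32_t acc, uint32_t ina, uint32_t sda_mask)
{
    acc <<= 1;
    acc |= (ina & sda_mask) ? 1u : 0u;
    return acc;
}

/* Collect 8 samples in a 32-bit accumulator and mask once at the end,
 * instead of keeping an 8-bit variable that is re-masked on every pass. */
static uint32_t collect_byte(const uint32_t samples[8], uint32_t sda_mask)
{
    uint32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc = shift_in_inc(acc, samples[i], sda_mask);
    return acc & 0xFFu;
}
```

The per-pass masking is visible in the listing as the and r0,#255 inside the send loop and the and r7, #254 inside the receive loop; with a 32-bit accumulator the single mask at the end is all the source asks for.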
Addendum: the compiler knows about andn, so it nicely does this (open drain):
        mov     r6, DIRA            ; if (byte & 0x80)
        test    r0,#0x80 wz         ; i2c_set_sda_high();
        IF_NE andn    r6, _sda_mask ; else
        IF_E  or      r6, _sda_mask ; i2c_set_sda_low();
        mov     DIRA, r6            ;
but in other places, it seems to forget that, and instead uses another register to load the mask, then flips the bits and uses and, instead of andn ?!
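The two sequences do the same thing to DIRA, so the difference is only in instruction and register count. A small host-side sketch of both forms, with DIRA modelled as an ordinary value and all function names made up for illustration:

```c
#include <stdint.h>

/* Release a line (open drain "high"): what a single andn does. */
static uint32_t release_andn(uint32_t dira, uint32_t mask)
{
    return dira & ~mask;
}

/* The longer form seen elsewhere in the listing:
 *     mov rN, mask / xor rN, __MASK_FFFFFFFF / and DIRA, rN */
static uint32_t release_xor_and(uint32_t dira, uint32_t mask)
{
    uint32_t inverted = mask ^ 0xFFFFFFFFu;
    return dira & inverted;
}

/* Drive a line low: make the pin an output (its OUTA bit is kept at 0). */
static uint32_t drive_low(uint32_t dira, uint32_t mask)
{
    return dira | mask;
}
```

Both release functions return the same value for any input, so the longer form costs an extra register and two extra instructions without changing behaviour.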