What are the limits in COG with C or BASIC?

jmg · 2015-04-09 14:30

David Betz wrote: »

If you give the COGC program a chunk of hub memory to use as a stack you don't have to worry about this. It is only necessary if you want to operate without a stack which is kind of unnatural for C anyway.

The general premise/expectation behind COG mode is to run COG-local, with PASM-like speed,
i.e. to use C as a high-level assembler, and that will of course be somewhat constrained.
The link mentions 2 calls deep ?
What is the speed impact, when it does flip to use a Hub Stack ?

David Betz · 2015-04-09 14:34

jmg wrote: »

What CLK speeds does that use ?
I see 1MHz is widespread, and 3.4MHz is mentioned on some parts (e.g. Cypress FM24V10); those are not cheap, but could be used to test 3.4MHz code.
1MHz parts can likely be overclocked by tuning the pull-up resistors and the code.

Not sure how fast it is trying to go. Jazzed wrote the EEPROM external memory driver. I've attached it here in case you want to look over the code. It doesn't use waitcnt for timing but instead uses djnz loops.

eeprom_xmem.spin

David Betz · 2015-04-09 14:37

jmg wrote: »

The general premise/expectation behind COG mode is to run COG-local, with PASM-like speed,
i.e. to use C as a high-level assembler, and that will of course be somewhat constrained.
The link mentions 2 calls deep ?
What is the speed impact, when it does flip to use a Hub Stack ?

Remember, PropGCC is built on GCC. The compiler itself doesn't have any Propeller-specific constructs. It just has a Propeller code generator. The GCC compiler and C itself pretty much assume a stack, so any code that can get away without one is exceptional. You *can* generate that sort of code by being a bit careful about how you use nested functions, but it isn't really that straightforward. It would, of course, be possible to write an entirely new compiler that is tuned to generate code for the COG, but that would be a much bigger effort than writing a code generator for GCC. We didn't do that. Even Catalina didn't do that.

Dave Hein · 2015-04-09 14:48

jmg wrote: »

What is the speed impact, when it does flip to use a Hub Stack ?

I just tried the fibo program in LMM and COG modes, and COG is about twice as fast as LMM. Of course, the fibo program uses recursive calling, so has to use the stack. I would expect a COG program to be at least 4 times faster than LMM as long as it doesn't use the stack or hub variables.
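For readers who have not seen it, the kind of program meant here can be sketched as follows. This is not the actual fibo benchmark source, only a minimal recursive version written for illustration; the function name and the overflow guard are assumptions. Each call makes two nested calls to itself, so return addresses and the saved argument have to live somewhere, and that somewhere is the stack.

```c
#include <assert.h>

/* Recursive Fibonacci. The nested self-calls are what force the
   compiler to keep return addresses and arguments on a stack. */
static unsigned int fibo(unsigned int n)
{
    assert(n <= 47);    /* fibo(48) no longer fits in 32 bits */
    if (n < 2)
        return n;
    return fibo(n - 1) + fibo(n - 2);
}
```

Calling `fibo(10)` from `main` gives 55; timing a larger argument in both LMM and COG builds is presumably how the comparison above was made.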

jmg · 2015-04-09 14:59

David Betz wrote: »

Remember, PropGCC is built on GCC. The compiler itself doesn't have any Propeller-specific constructs. It just has a Propeller code generator. The GCC compiler and C itself pretty much assume a stack, so any code that can get away without one is exceptional. You *can* generate that sort of code by being a bit careful about how you use nested functions, but it isn't really that straightforward. It would, of course, be possible to write an entirely new compiler that is tuned to generate code for the COG, but that would be a much bigger effort than writing a code generator for GCC. We didn't do that. Even Catalina didn't do that.

It's impressive that PropGCC can generate COG-mode code.
Does the use of the HUB-stack appear in reports, or can it generate a warning ?

David Betz · 2015-04-09 15:02

jmg wrote: »

It's impressive that PropGCC can generate COG-mode code.
Does the use of the HUB-stack appear in reports, or can it generate a warning ?

COG mode is thanks to Eric. I didn't think it could be done!

No, no warning or message of any kind is generated for stack usage. Stack usage is normal and expected in C.

jmg · 2015-04-09 15:03

Dave Hein wrote: »

I just tried the fibo program in LMM and COG modes, and COG is about twice as fast as LMM. Of course, the fibo program uses recursive calling, so has to use the stack. I would expect a COG program to be at least 4 times faster than LMM as long as it doesn't use the stack or hub variables.

Sounds tolerable, what about the PropBASIC Frequency counter I linked in #27 as a more real-world example ?
Can that fit into COG mode, (sans-stacks?), and produce similar size / speed code ?

jmg · 2015-04-09 15:05

David Betz wrote: »

COG mode is thanks to Eric. I didn't think it could be done!

No, no warning or message of any kind is generated for stack usage. Stack usage is normal and expected in C.

Yes, I was meaning more as a user option ?
When they want COG mode and no stack (which is a special case), being able to avoid manually double-checking the ASM listing makes the code a lot more maintainable.

David Betz · 2015-04-09 15:34

jmg wrote: »

Yes, I was meaning more as a user option ?
When they want COG mode and no stack (which is a special case), being able to avoid manually double-checking the ASM listing makes the code a lot more maintainable.

I suppose it would be possible to do that by setting a flag in the code generator every time a stack operation is generated. That is, if it is possible at that level to distinguish between stack operations and other hub accesses. I'm not sure how useful it would be though. It would basically just tell you that a stack was needed but not why.

jmg · 2015-04-09 15:42

David Betz wrote: »

I suppose it would be possible to do that by setting a flag in the code generator every time a stack operation is generated. That is, if it is possible at that level to distinguish between stack operations and other hub accesses. I'm not sure how useful it would be though. It would basically just tell you that a stack was needed but not why.

Any warning is going to need additional investigation, it is the change in warning level that matters.

More useful than a simple flag, could be a stack counter, that INCs on every code generator stack case.
Reporting that would give a reference point for code maintainers ( & retains usefulness in >0 cases ).
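Until something like that exists in the code generator, a similar reference number could be had by post-processing the generated listing. The sketch below is only an illustration of that idea, not anything PropGCC provides: it counts the lines of a listing that mention a given register name as a whole word, ignoring text after a `'` comment marker. That the stack pointer shows up under the name `sp` in PropGCC listings is an assumption here, and the function names are made up.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True if c can be part of an assembler identifier. */
static int is_ident_char(int c)
{
    return isalnum(c) || c == '_' || c == '.';
}

/* Count the lines of an assembly listing that mention the register
   name `reg` as a whole word. Text after a ' comment marker is ignored. */
static size_t count_reg_lines(const char *listing, const char *reg)
{
    size_t count = 0;
    size_t reglen = strlen(reg);
    const char *line = listing;

    assert(reglen > 0);
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        const char *p;

        if (end == NULL)
            end = line + strlen(line);
        for (p = line; p < end && *p != '\''; p++) {
            int starts_word = (p == line) ||
                              !is_ident_char((unsigned char)p[-1]);
            if (starts_word && (size_t)(end - p) >= reglen &&
                strncmp(p, reg, reglen) == 0 &&
                !is_ident_char((unsigned char)p[reglen])) {
                count++;        /* one hit per line is enough */
                break;
            }
        }
        line = (*end == '\n') ? end + 1 : end;
    }
    return count;
}
```

Running `count_reg_lines(listing, "sp")` over each build and logging the result would give maintainers the "did this number change?" check suggested above, without telling them why a stack was needed.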

davidsaunders · 2015-04-09 16:54

WOW, a lot of good information here. I only expected a few quick answers.

So as I had thought, the use of stack space is still a concern (and a big one with only 496 longs of memory to work with). And much of the code I have seen uses some macros (if I remember correctly the header file is something like prop.h), macros that look a lot like spin commands, though have different parameters.

Just as a thought:
Why doesn't someone concentrate on a subset of C and tune it for COG execution? Perhaps start with something very simple like CUCU, add a little to remove redundant generated code (to save space), and have it generate native Propeller code directly. Then if you add a stack limit, most things should be fairly simple.

I think a simple subset C compiler like that would be a perfect fit on the Propeller. That is just my view.

David Betz · 2015-04-09 17:06

davidsaunders wrote: »

So as I had thought, the use of stack space is still a concern (and a big one with only 496 longs of memory to work with).

The stack, if one is needed, goes in hub memory not COG memory.

jmg · 2015-04-09 17:23

davidsaunders wrote: »

Just as a thought:
Why doesn't someone concentrate on a subset of C and tune it for COG execution? Perhaps start with something very simple like CUCU, add a little to remove redundant generated code (to save space), and have it generate native Propeller code directly. Then if you add a stack limit, most things should be fairly simple.

I think a simple subset C compiler like that would be a perfect fit on the Propeller. That is just my view.

Things are pretty close to that already - the discussion is around how to make that 'simple subset C' a little more controllable and maintainable.

The Stack use is code-optional, but currently any use is a little hidden from users, hence my suggestion of some reporting (eg counter?)

I think a working COG-mode example that is real-world in nature (not fibo) would help demonstrate the issues here, and the PropBASIC freqCounter I linked seems a good reference.
That has some maths, some control flow, some string work and comms, gives a useful result, and seems to fit easily in a COG (~152 longs) with the small number-string in HUB.
It is small enough that PropGCC can be 2x the size and still fit to demonstrate COG mode.

David Betz · 2015-04-09 17:26

Here is a COG mode example that works and uses no stack space. In fact, this is the driver I was working on when I wrote those notes about using COG mode and eliminating stack usage.

i2c_driver.c

jmg · 2015-04-09 17:43

David Betz wrote: »

Here is a COG mode example that works and uses no stack space. In fact, this is the driver I was working on when I wrote those notes about using COG mode and eliminating stack usage.

Can you include the ASM listing from that too ?

Is it easy to then make it use a stack, and include the .C/ASM for that one too ?

David Betz · 2015-04-09 17:44

jmg wrote: »

Can you include the ASM listing from that too ?

Is it easy to then make it use a stack, and include the .C/ASM for that one too ?

It will use a stack automatically if you call nested procedures that aren't declared _NAKED or _NATIVE or whatever.
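As a rough illustration of the difference, here is a minimal sketch. The attribute macros `_NATIVE` and `_COGMEM` are the ones named in this thread; that `<propeller.h>` supplies them is an assumption, and the empty fallbacks exist only so the sketch also compiles with a host compiler. The function and variable names are hypothetical and not taken from i2c_driver.c.

```c
#ifdef __PROPELLER__
#include <propeller.h>  /* assumed to supply _NATIVE and _COGMEM */
#endif
#ifndef _NATIVE
#define _NATIVE         /* host fallback: attribute does nothing */
#endif
#ifndef _COGMEM
#define _COGMEM         /* host fallback: attribute does nothing */
#endif

/* On the Propeller this variable is meant to live in COG memory. */
static _COGMEM unsigned int sda_mask;

/* Marked _NATIVE: intended to be called with the COG's own call/ret
   (the "native return" seen in listings), so no stack is involved. */
static _NATIVE void set_sda_pin(unsigned int pin)
{
    sda_mask = 1u << (pin & 31);
}

/* No attribute: an ordinary C function. Calling it from another
   function is the case that makes the compiler reach for the stack. */
static unsigned int get_sda_mask(void)
{
    return sda_mask;
}
```

Building with -mcog -Os and reading the listing is still the way to confirm what actually happened, since the optimizer may inline small static functions regardless of attributes.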

David Betz · 2015-04-09 17:49

jmg wrote: »

Can you include the ASM listing from that too ?

Is it easy to then make it use a stack, and include the .C/ASM for that one too ?

Here is the assembly generated using -mcog -Os.

.text
	.balign	4
_i2cStart
	mov	r7, DIRA
	andn	r7, _scl_mask
	mov	DIRA, r7
	mov	r7, DIRA
	andn	r7, _sda_mask
	mov	DIRA, r7
	mov	r7, CNT
	add	r7, _half_cycle
	waitcnt	r7,#0
	or	DIRA,_sda_mask
	mov	r7, CNT
	add	r7, _half_cycle
	waitcnt	r7,#0
	or	DIRA,_scl_mask
	'native return
_i2cStart_ret
	ret
	.balign	4
_i2cSendByte
	mov	r7, #9
	'' loop_start register r7 level #1
	jmp	#.L3
.L6
	mov	r6, DIRA
	test	r0,#0x80 wz
	IF_NE andn	r6, _sda_mask
	IF_E  or	r6, _sda_mask
	mov	DIRA, r6
	mov	r6, CNT
	add	r6, _half_cycle
	waitcnt	r6,#0
	and	DIRA,r5
	mov	r6, CNT
	add	r6, _half_cycle
	waitcnt	r6,#0
	mov	r6, DIRA
	or	r6, _scl_mask
	shl	r0, #1
	mov	DIRA, r6
	and	r0,#255
.L3
	mov	r5, _scl_mask
	xor	r5,__MASK_FFFFFFFF
	djnz	r7,#.L6
	mov	r7, DIRA
	andn	r7, _sda_mask
	mov	DIRA, r7
	mov	r7, CNT
	add	r7, _half_cycle
	waitcnt	r7,#0
	and	DIRA,r5
	mov	r7, INA
	test	r7,_sda_mask wz
	mov	r0, #0
	mov	r7, CNT
	muxnz	r0,#1
	add	r7, _half_cycle
	waitcnt	r7,#0
	or	DIRA,_scl_mask
	or	DIRA,_sda_mask
	'native return
_i2cSendByte_ret
	ret
	.balign	4
_i2cReceiveByte
	mov	r3, _sda_mask
	mov	r7, DIRA
	xor	r3,__MASK_FFFFFFFF
	and	r7, r3
	mov	DIRA, r7
	mov	r6, #9
	mov	r7, #0
	'' loop_start register r6 level #1
	jmp	#.L9
.L10
	mov	r5, CNT
	add	r5, _half_cycle
	waitcnt	r5,#0
	and	DIRA,r4
	mov	r5, INA
	test	r5,_sda_mask wz
	shl	r7, #1
	mov	r5, #0
	muxnz	r5,#1
	and	r7, #254
	or	r7, r5
	mov	r5, CNT
	add	r5, _half_cycle
	waitcnt	r5,#0
	or	DIRA,_scl_mask
.L9
	mov	r4, _scl_mask
	xor	r4,__MASK_FFFFFFFF
	djnz	r6,#.L10
	mov	r6, DIRA
	cmps	r0, #0 wz,wc
	IF_NE or	r6, _sda_mask
	IF_E  and	r6, r3
	mov	DIRA, r6
	mov	r6, CNT
	add	r6, _half_cycle
	waitcnt	r6,#0
	and	DIRA,r4
	mov	r6, CNT
	add	r6, _half_cycle
	waitcnt	r6,#0
	or	DIRA,_scl_mask
	or	DIRA,_sda_mask
	mov	r0, r7
	'native return
_i2cReceiveByte_ret
	ret
	.balign	4
_i2cStop
	mov	r7, CNT
	add	r7, _half_cycle
	waitcnt	r7,#0
	mov	r7, DIRA
	andn	r7, _scl_mask
	mov	DIRA, r7
	mov	r7, DIRA
	andn	r7, _sda_mask
	mov	DIRA, r7
	'native return
_i2cStop_ret
	ret
	.balign	4
	.global	_main
_main
	mov	r6, PAR
	mov	r4, #1
	mov	r3, r4
	rdlong	r7, r6
	shl	r3, r7
	mov	r7, r6
	add	r7, #4
	mov	_scl_mask, r3
	rdlong	r7, r7
	shl	r4, r7
	mov	r7, r6
	add	r7, #8
	add	r6, #12
	mov	_sda_mask, r4
	rdlong	r7, r7
	shr	r7, #1
	cmp	r7, #32 wz,wc
	mov	_half_cycle, r7
	IF_A  sub	r7, #32
	IF_A  mov	_half_cycle, r7
	mov	r7, #0
	rdlong	r6, r6
	mov	_mailbox, r6
	wrlong	r7, r6
	mov	r6, r3
	mov	r5, DIRA
	xor	r6,__MASK_FFFFFFFF
	and	r5, r6
	mov	DIRA, r5
	mov	r7, r4
	mov	r5, DIRA
	xor	r7,__MASK_FFFFFFFF
	and	r5, r7
	mov	DIRA, r5
	and	OUTA,r6
	and	OUTA,r7
.L30
	mov	r6, _mailbox
.L17
	rdlong	r7, r6
	cmps	r7, #0 wz,wc
	IF_E 	jmp	#.L17
	cmp	r7, #2 wz,wc
	IF_B 	jmp	#.L31
	cmp	r7, #3 wz,wc
	IF_BE	jmp	#.L19
	cmp	r7, #5 wz,wc
	IF_A 	jmp	#.L31
	jmp	#.L41
.L19
	mov	r5, r6
	add	r5, #12
	add	r6, #16
	cmps	r7, #2 wz,wc
	rdlong	lr, r5
	rdlong	r14, r6
	IF_NE	jmp	#.L38
	call	#_i2cStart
	mov	r7, _mailbox
	add	r7, #8
	rdlong	r0, r7
	and	r0,#255
	call	#_i2cSendByte
	cmps	r0, #0 wz,wc
	IF_NE mov	lr, #2
	IF_NE	jmp	#.L18
	jmp	#.L38
.L25
	rdbyte	r0, lr
	call	#_i2cSendByte
	cmps	r0, #0 wz,wc
	IF_NE	jmp	#.L33
	add	lr, #1
	sub	r14, #1
.L38
	cmps	r14, #0 wz,wc
	IF_NE	jmp	#.L25
	mov	lr, #0
	jmp	#.L24
.L33
	mov	lr, #3
.L24
	mov	r7, _mailbox
	add	r7, #20
	rdlong	r7, r7
	cmps	r7, #0 wz,wc
	IF_E 	jmp	#.L18
	jmp	#.L40
.L41
	mov	r5, r6
	add	r5, #12
	add	r6, #16
	cmps	r7, #4 wz,wc
	rdlong	r14, r5
	rdlong	lr, r6
	IF_NE	jmp	#.L39
	call	#_i2cStart
	mov	r7, _mailbox
	add	r7, #8
	rdlong	r0, r7
	and	r0,#255
	call	#_i2cSendByte
	cmps	r0, #0 wz,wc
	IF_NE mov	lr, #4
	IF_NE	jmp	#.L18
	jmp	#.L39
.L29
	cmps	lr, #1 wz,wc
	mov	r0, #0
	muxnz	r0,#1
	sub	lr, #1
	call	#_i2cReceiveByte
	wrbyte	r0, r14
	add	r14, #1
.L39
	cmps	lr, #0 wz,wc
	IF_NE	jmp	#.L29
	mov	r7, _mailbox
	add	r7, #20
	rdlong	r7, r7
	cmps	r7, #0 wz,wc
	IF_E  mov	lr, #0
	IF_E 	jmp	#.L18
.L40
	call	#_i2cStop
	jmp	#.L18
.L31
	mov	lr, #1
.L18
	mov	r7, _mailbox
	mov	r6, r7
	add	r6, #4
	wrlong	lr, r6
	mov	r6, #0
	wrlong	r6, r7
	jmp	#.L30
_scl_mask
	long	0
_sda_mask
	long	0
_half_cycle
	long	0
_mailbox
	long	0
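Reading the start of `_main` in the listing back into C gives roughly the start-up logic below. This is a reconstruction from the assembly, not the attached source: the struct and field names are hypothetical, the hub reads through PAR are replaced by a plain pointer so the sketch runs on a host, and the mailbox pointer (the fourth long read through PAR) is left out. The subtraction of 32 presumably compensates for instruction overhead around each `waitcnt`, but that is a guess.

```c
#include <stdint.h>

/* Hypothetical names for the first three longs read through PAR. */
struct i2c_init {
    uint32_t scl_pin;          /* PAR + 0 */
    uint32_t sda_pin;          /* PAR + 4 */
    uint32_t ticks_per_cycle;  /* PAR + 8 */
};

/* Mirrors the COG variables _scl_mask, _sda_mask and _half_cycle. */
struct i2c_state {
    uint32_t scl_mask;
    uint32_t sda_mask;
    uint32_t half_cycle;
};

static void i2c_setup(const struct i2c_init *init, struct i2c_state *st)
{
    /* shl on the Propeller uses only the low 5 bits of the count. */
    st->scl_mask = (uint32_t)1 << (init->scl_pin & 31);
    st->sda_mask = (uint32_t)1 << (init->sda_pin & 31);

    /* shr r7, #1 followed by the conditional "IF_A sub r7, #32". */
    st->half_cycle = init->ticks_per_cycle >> 1;
    if (st->half_cycle > 32)
        st->half_cycle -= 32;
}
```

With pins 28 and 29 and 200 ticks per cycle, this yields masks 0x10000000 and 0x20000000 and a half cycle of 68 ticks.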

jmg · 2015-04-09 18:22

David Betz wrote: »

Here is the assembly generated using -mcog -Os.

Thanks, interesting code generation,
Comments :

byte <<= 1;
if (INA & sda_mask) byte++;

should compile to smaller code than
byte <<= 1;
byte |= (INA & sda_mask) ? 1 : 0;

and I notice that using byte-sized variables can give larger code than using the native Prop word size, as the compiler adds a masking step.
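The two forms are interchangeable in behaviour, which a small host-side sketch can confirm. Everything here is illustrative: the pin register is passed in as an ordinary `ina` parameter instead of reading INA, the function names are made up, and 32-bit variables are used throughout to sidestep the byte-masking step just mentioned. Which form yields smaller PASM is a property of the code generator and is not shown by this sketch.

```c
#include <stdint.h>

/* Variant A: shift, then increment if the data pin reads high. */
static uint32_t shift_in_inc(uint32_t byte, uint32_t ina, uint32_t sda_mask)
{
    byte <<= 1;
    if (ina & sda_mask)
        byte++;
    return byte;
}

/* Variant B: shift, then OR in a 0/1 value. */
static uint32_t shift_in_or(uint32_t byte, uint32_t ina, uint32_t sda_mask)
{
    byte <<= 1;
    byte |= (ina & sda_mask) ? 1 : 0;
    return byte;
}

/* Assemble eight sampled pin states, most significant bit first. */
static uint32_t receive_byte(const uint32_t samples[8], uint32_t sda_mask)
{
    uint32_t byte = 0;
    int i;

    for (i = 0; i < 8; i++)
        byte = shift_in_inc(byte, samples[i], sda_mask);
    return byte;
}
```

Because bit 0 is always clear right after the shift, the increment in variant A can never carry, so both variants return the same value for any input.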

Addendum: the compiler knows about andn, so it nicely does this (open drain)

	mov	r6, DIRA                           ;         if (byte & 0x80)
	test	r0,#0x80 wz                        ;             i2c_set_sda_high();
	IF_NE andn	r6, _sda_mask              ;         else
	IF_E  or	r6, _sda_mask              ;             i2c_set_sda_low();
	mov	DIRA, r6                           ;

but in other places, it seems to forget that, and instead uses another register to load the mask, then flips the bits and uses and, instead of andn ?!

davidsaunders · 2015-04-09 19:17

Well a lot of good C examples.

My question on BASIC still remains open. Compiled structured BASIC is a great language, and I see no reason not to use it. PropBASIC is great, though I would like to know of other options before I commit to a particular version of BASIC.

David Betz · 2015-04-09 19:24

davidsaunders wrote: »

Well a lot of good C examples.

My question on BASIC still remains open. Compiled structured BASIC is a great language, and I see no reason not to use it. PropBASIC is great, though I would like to know of other options before I commit to a particular version of BASIC.

I don't know of any other Basic for the Propeller that produces PASM or LMM code. The others are either straight interpreters like FemtoBasic or byte code compilers like my xbasic or ebasic.

davidsaunders · 2015-04-09 19:53

David Betz wrote: »

I don't know of any other Basic for the Propeller that produces PASM or LMM code. The others are either straight interpreters like FemtoBasic or byte code compilers like my xbasic or ebasic.

Thank you for that information. It is too bad we do not have more BASIC compilers for the Propeller.

jmg · 2015-04-09 20:43

davidsaunders wrote: »

Thank you for that information. It is too bad we do not have more BASIC compilers for the Propeller.

PropBASIC is being added to the Propeller IDE, (after being somewhat ignored) so that should boost the usage and support of PropBASIC.
Not sure why you'd want more than one ?
I'd rather see a PropPASCAL, before a second PropBASIC.

potatohead · 2015-04-09 20:59

Seconded jmg.

davidsaunders · 2015-04-10 05:00

jmg wrote: »

PropBASIC is being added to the Propeller IDE, (after being somewhat ignored) so that should boost the usage and support of PropBASIC.
Not sure why you'd want more than one ?
I'd rather see a PropPASCAL, before a second PropBASIC.

Just that having more than one helps push the use and development of what there is.

davidsaunders · 2015-04-10 05:31

Though I do agree that it would be nice to see a PropPascal. And having PropBASIC with PropellerIDE would likely increase its usage.

Dave Hein · 2015-04-10 10:39

jmg wrote: »

Sounds tolerable, what about the PropBASIC Frequency counter I linked in #27 as a more real-world example ?
Can that fit into COG mode, (sans-stacks?), and produce similar size / speed code ?

I converted the PropBASIC frequency counter program to C, and it's in the attached zip file. FreqCounter3.c doesn't use any of the special cog attributes, and it uses the stack. FreqCounter3cog.c uses the _NAKED, _NATIVE and _COGMEM attributes, and does not use the stack. It does use hub memory for strings and constants.

I use CTRB to generate a signal, which is measured by CTRA. You can change the frequency by changing the #define for FREQ. The signal pin number is defined by Signal.

I also generated the assembly output, which is included in the zip file.

idbruce · 2015-04-10 11:31

Dave

In your code, I see you disabled the standard serial driver. I would assume you did this to reduce program overhead. If so, what kind of reduction in size is there?

Dave Hein · 2015-04-10 11:33

It didn't fit in the cog with the standard driver. It was something like 70 bytes too large.

idbruce · 2015-04-10 11:46

I see... I should have paid closer attention to the discussion.

But cool anyhow, nice to see how it is done. Thanks.

jmg · 2015-04-10 13:26

davidsaunders wrote: »

Just that having more than one helps push the use and development of what there is.

Or the opposite can occur: you can disperse the effort and just confuse new users
(rather like PropIDE and SimpleIDE are doing right now...).
That said, I can see a place for both bytecode BASIC and compiled BASIC, though ideally they would be compatible...