P2 COG and HUB exec now ~100% Binary Compatible! — Parallax Forums

jmg Posts: 15,148
edited 2015-10-03 23:44 in Propeller 2
I think this milestone post deserves its own thread:
cgracey wrote: »
I got the new memory and branching model working. I also got REP working in hub exec.

There's full binary compatibility now between cog/lut code and hub code that use relative addressing.

In cog code now, we are back to the good old 1:1 addressing - no more 4x'd register addresses. What a relief!

Here's the new map for code execution :

00000..001FF = cog
00200..003FF = lut
00400..FFFFF = hub

Downloaded programs start at $400.

When in the cog, all registers are long, with their addresses being contiguous integers. The PC steps by 1.

When in the hub, instructions take 4 bytes. The PC steps by 4.
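
The map and PC stepping described above can be modeled roughly as follows. This is only a Python sketch for illustration: the names `exec_region` and `next_pc` are made up here, and treating lut like cog (PC steps by 1) is an assumption based on the lut range being long-addressed. What happens when the PC runs off the end of a region is not modeled.

```python
COG_END = 0x001FF   # last cog register address (from the map above)
LUT_END = 0x003FF   # last lut address
HUB_END = 0xFFFFF   # last hub byte address

def exec_region(pc):
    """Return which memory the PC is executing from: 'cog', 'lut' or 'hub'."""
    if not 0 <= pc <= HUB_END:
        raise ValueError("PC outside the 20-bit address range")
    if pc <= COG_END:
        return "cog"
    if pc <= LUT_END:
        return "lut"
    return "hub"

def next_pc(pc):
    """Advance the PC by one instruction: +1 in cog/lut (assumed), +4 in hub."""
    return pc + (4 if exec_region(pc) == "hub" else 1)
```

For example, `next_pc(0x010)` gives `0x011`, while `next_pc(0x400)` gives `0x404`.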

To bridge the two contexts, there are two simple things done:

The 9-bit-constant relative branches DJNZ/DJZ/TJZ/... encode the -256..+255 instruction range into their S field. When in cog exec, that value is sign-extended and added to the PC. When in hub exec, it is shifted left two bits and used the same way. This way, both cog and hub contexts get the max use out of these instructions and maintain binary compatibility.

The 20-bit-constant relative branches JMP/CALL/CALLA/... are encoded for hub exec as you imagine they would be, where they track byte offset. When the cog uses these branches, it shifts them right two bits to get cog-relative values. They are assembled pre-4x'd in cog code that way. So, these instructions are now binary compatible between cog/lut and hub code.
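
The two bridging rules can be sketched in Python like this. The function names are hypothetical, and treating the 20-bit field as a signed value is an assumption. Only the PC delta is computed, since the text does not say which PC value the delta is added to.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def branch9_delta(s_field, hub_exec):
    """PC delta for the 9-bit relative branches (DJNZ/DJZ/TJZ/...).

    The S field holds an instruction count in -256..+255. In cog exec it is
    used as-is; in hub exec it is shifted left two bits to become bytes.
    """
    offset = sign_extend(s_field, 9)
    return offset << 2 if hub_exec else offset

def branch20_delta(field, hub_exec):
    """PC delta for the 20-bit relative branches (JMP/CALL/CALLA/...).

    The field holds a byte offset (assumed signed here). In hub exec it is
    used as-is; in cog exec it is shifted right two bits to become registers.
    """
    offset = sign_extend(field, 20)
    return offset if hub_exec else offset >> 2
```

The same encoded field therefore means "one instruction back" in either context: `branch9_delta(0x1FF, False)` is `-1` and `branch9_delta(0x1FF, True)` is `-4`.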

REP now works in hub exec by forcing a jump during the last instruction in the repeat block. It didn't take much logic to implement and it works just as you'd expect. Even though it's slow in hub exec, because of the branching on each iteration, it is a convenient instruction to have for doing simple loops.

The assembler generates the same code for relative branches and REP in both cog/lut exec and hub exec contexts.

I will have updated FPGA files done tomorrow. I just finished the Prop123-A7 compile and now I need to make the DE2-115 version.

Here's what the all_cogs_blink program looks like now. Note the ORGH and the REP:
dat
	orgh	$400

' launch 15 cogs (cog 0 falls through and runs 'blink', too)
' any cogs missing from the FPGA won't blink

	loc	x,@blink

	rep	@repend,#15
	coginit	#16,x
repend

blink	cogid	x		'which cog am I?
	setb	dirb,x		'make that pin an output
	notb	outb,x		'flip its output state
	add	x,#16		'add to my id
	shl	x,#18		'shift up to make it big
	waitx	x		'wait that many clocks
	jmp	@blink		'do it again

	org
x	res	1		'variable at cog register 8

Comments

  • jmg Posts: 15,148
    and this new feature is related
    cgracey wrote: »
    Here's an example of the JMPREL instruction. It works in both cog and hub. There needed to be some mechanism like this that automatically scales the branch offset (<<2 for hub), so that variable-relative branches could be realized in binary-compatible code.
    dat
    	orgh	$400
    
    
    pgm	mov	dira,#$FF	'entry
    	mov	x,#0
    
    loop	shl	x,#1		'loop
    	call	@spread
    	shr	x,#1
    	incmod	x,#7
    	jmp	@loop
    
    
    spread	jmprel	x
    
    	notb	outa,#0
    	ret
    
    	notb	outa,#1
    	ret
    
    	notb	outa,#2
    	ret
    
    	notb	outa,#3
    	ret
    
    	notb	outa,#4
    	ret
    
    	notb	outa,#5
    	ret
    
    	notb	outa,#6
    	ret
    
    	notb	outa,#7
    	ret
    
    
    	org
    x	res	1
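
A rough Python model of the scaling in the JMPREL example above. The names `jmprel_delta` and `spread_target` are invented for this sketch. It assumes the offset is counted from the instruction that follows JMPREL, which matches the example (x = 0 lands on the first `notb`).

```python
def jmprel_delta(d, hub_exec):
    """Address units skipped by JMPREL for an offset of d instructions.

    In cog exec one instruction is one address; in hub exec the offset is
    scaled by 4 (<<2) because instructions are 4 bytes apart.
    """
    return d << 2 if hub_exec else d

def spread_target(x, table_base, hub_exec):
    """Address reached by the 'spread' routine for table index x.

    table_base is the address of the instruction right after JMPREL.
    Each table entry is two instructions (notb + ret), which is why the
    example shifts x left by one before the call.
    """
    return table_base + jmprel_delta(x << 1, hub_exec)
```

With a hypothetical table base of `0x020` in cog, index 3 lands at `0x026`. With a base of `0x420` in hub, the same index lands at `0x438`.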
    

  • jmg Posts: 15,148
    edited 2015-10-03 22:08
    Why does this matter?
    It saves a shipload of explaining and admin, and means discussion can focus on features, not on caveats and gotchas.
    It also allows one mindset in code development, and late-in-design-flow choices on what code will run where.

    HUB Exec, including in Assembler, now has all the features of COG exec, and users can craft a design to use COG exec where it really matters.

    P2 has now gained some things in common with PC-level, higher-end MPUs. In those, you have a local cache that is limited but very fast, and a much larger SDRAM that is less deterministic
    (and usually an OS as well, to make things even less deterministic).

    Add cache-lock thinking, where a design can lock small pieces of code into that faster memory area, and the P2 tracks that mindset, but adds the feature that such fast, deterministic code also gets its own core.

    It is easy to explain to new users, and the potential for hard real-time use is obvious.
  • Well, they're not entirely compatible. You still can't have byte-aligned instructions in COG memory which means that you can't use the trick that Chip mentioned about putting inline byte data in your code if you intend to run it in COG memory. Requiring instructions to be long-aligned even in hub memory would improve that a bit but you still wouldn't be able to run Chip's code because COG memory isn't byte addressable. So, while it's true that any code that will run in COG memory will also run in hub memory, the reverse is not necessarily true unless you follow some restrictions.
  • jmg Posts: 15,148
    edited 2015-10-03 22:27
    David Betz wrote: »
    Well, they're not entirely compatible. You still can't have byte-aligned instructions in COG memory which means that you can't use the trick that Chip mentioned about putting inline byte data in your code if you intend to run it in COG memory. Requiring instructions to be long-aligned even in hub memory would improve that a bit but you still wouldn't be able to run Chip's code because COG memory isn't byte addressable. So, while it's true that any code that will run in COG memory will also run in hub memory, the reverse is not necessarily true unless you follow some restrictions.

    The bigger limit here seems to be that COG (and LUT?) memory isn't byte-addressable.
    Alignment for portability is more of a tool setting issue.

    I'll paste Chip's code snippet here, as it is small and shows the points
    cgracey wrote: »
    Here is an example of why unaligned hub code is important:
    	call	@send_string
    	byte	13,13,"The time is ",0
    	mov	val,hours
    	call	@send_decimal2
    	call	@send_string
    	byte	':',0
    	mov	val,minutes
    	call	@send_decimal2
    	call	@send_string
    	byte	" and the date is ",0
    	...
    
    You can do things like that, which is way better than having to get pointers to data located elsewhere.
    Edit: changed 'db' to 'byte'

    One could argue that placing casual string constants into COG is not a good idea, and send_string is not likely to be the sort of hard-real-time code that is cog-destined.
    Because COG can call HUB anytime, users have a choice here: place this code in HUB, or place the strings in HUB with a prefix call.
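
To make the alignment issue concrete, here is a small Python model of the inline-string trick. The thread does not show how `send_string` is implemented, so this is an assumption: the routine reads bytes starting at the call's return address until it hits the zero terminator, then resumes at the byte after it. Hub memory is modeled as a `bytes` object and the addresses are made up.

```python
def send_string(hub, return_addr):
    """Read the zero-terminated string stored right after the call.

    Returns the decoded text and the address at which execution resumes,
    that is, the byte just past the terminator.
    """
    end = hub.index(0, return_addr)              # position of the 0 terminator
    text = hub[return_addr:end].decode("ascii")
    return text, end + 1

# Inline data as in the first call of the example: 13,13,"The time is ",0
# followed by four placeholder bytes standing in for the next instruction.
hub = bytes([13, 13]) + b"The time is \x00" + bytes(4)
text, resume = send_string(hub, 0)
```

Here the string occupies 15 bytes, so `resume` is 15, which is not a multiple of 4. The next instruction starts at an unaligned byte address, which hub exec can handle but long-addressed cog memory cannot.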
  • David beat me to it. It's not 100% binary compatible, but at least all the instructions work the same no matter where they are run from.

    Also, it's fairly trivial to make your code work in both by avoiding things like Chip's string example. Especially in generated code like from gcc.

    I assume Chip's assembler already complains if you try to compile his example in COG space.

    Anyway, I think we are in a good place on all this stuff.
  • jmg Posts: 15,148
    edited 2015-10-04 00:01
    Binary compatible usually means 'all the instructions work the same no matter where they are run from'.
    But yes, there are memory-area and opcode-reach caveats that mean mixed code and data can have issues. Those relate less to opcode execution and more to data mapping. (I've changed the title to ~100% Binary Compatible)

    As you say, the tools should report data mapping errors.
  • Cluso99 Posts: 18,069
    Just one more baby step required...

    Make hubexec long-aligned.
    Then the last simplification will be possible....

    All instructions can be "longs" everywhere. PC will always be +1.
    Only as the instruction is fetched from hub will 2 LSBs of %00 be appended.

    Then we will ultimately have a simple programming model equally applied to hub/cog/lut.
  • I don't want it aligned. Being able to inline data is too sweet.
  • Besides, when we make SPIN 2, it will have inline PASM, and that with a byte code and native PASM options will make nice, dense programs.
  • cgracey Posts: 14,133
    edited 2015-10-05 03:46
    Cluso99 wrote: »
    Just one more baby step required...

    Make hubexec long-aligned.
    Then the last simplification will be possible....

    All instructions can be "longs" everywhere. PC will always be +1.
    Only as the instruction is fetched from hub will 2 LSBs of %00 be appended.

    Then we will ultimately have a simple programming model equally applied to hub/cog/lut.

    That would mean:

    cog exec $00000..$007FF
    lut exec $00800..$00FFF
    hub exec $01000..$FFFFF

    A jump to $01000 would read a long from $04000.

    Say you had a code label for instruction $01000. How would you get an address from that label, in order to do a RDBYTE? What would that look like?
  • Cluso99 Posts: 18,069
    cgracey wrote: »
    Cluso99 wrote: »
    Just one more baby step required...

    Make hubexec long-aligned.
    Then the last simplification will be possible....

    All instructions can be "longs" everywhere. PC will always be +1.
    Only as the instruction is fetched from hub will 2 LSBs of %00 be appended.

    Then we will ultimately have a simple programming model equally applied to hub/cog/lut.

    That would mean:

    cog exec $00000..$007FF
    lut exec $00800..$00FFF
    hub exec $01000..$FFFFF

    A jump to $01000 would read a long from $04000.

    Say you had a code label for instruction $01000. How would you get an address from that label, in order to do a RDBYTE? What would that look like?
    No!

    cog exec $00000..$001FF
    lut exec $00200..$003FF
    hub exec $00400..$3FFFF (it is in longs; <<2 and append %00 for byte addresses for data)

    For hub addresses, they will be addresses as bytes for all data accesses. However, for DJxx/TJxx/JMP/CALLx/RETx operands, hub addresses will always be shifted >>2 and masked to 18-bits by the compiler. The PC will always treat instructions as being longs and long-aligned, so the PC will always increment by +1 and be 18 bits. When fetching instructions from hub, the PC will append %00 which makes a long-aligned byte address.

    Now it can be explained simply to the user that all instruction addresses are long-aligned long addresses, for hub/cog/lut.
    And we have reduced the instruction address requirements to 18-bits. As a side benefit, it simplifies and frees up some opcode space in the process.
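
The proposal above can be sketched in Python as follows. It models only the long-addressed scheme suggested in this post, not the hardware as it stands, and the function names are hypothetical.

```python
PC_MASK = 0x3FFFF  # 18-bit instruction (long) address

def encode_branch_target(byte_addr):
    """What the assembler would store for a branch to a hub byte address."""
    if byte_addr & 3:
        raise ValueError("instruction address must be long-aligned")
    return (byte_addr >> 2) & PC_MASK

def fetch_address(pc):
    """Hub byte address fetched for an 18-bit PC: append %00 as the two LSBs."""
    return (pc & PC_MASK) << 2
```

For instance, `encode_branch_target(0x4000)` gives `0x1000`, and `fetch_address(0x1000)` gives back `0x4000`. Data accesses such as RDBYTE would keep using the unshifted byte address.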
  • jmg Posts: 15,148
    Cluso99 wrote: »
    Just one more baby step required...

    Make hubexec long-aligned.
    Then the last simplification will be possible....

    All instructions can be "longs" everywhere. PC will always be +1.
    Only as the instruction is fetched from hub will 2 LSBs of %00 be appended.

    Then we will ultimately have a simple programming model equally applied to hub/cog/lut.

    I'm not following. There already is "a simple programming model equally applied to hub/cog/lut".

    Chip now has hardware manage the differences, so the binary code is identical, and can be copied in blocks from HUB to COG or LUT, (RJMPs assumed) with only data access caveats.
    Those caveats are more related to BYTE pointers than to alignment.

    The user model is not improved by forcing Hubexec long aligned, but you do remove some more compact code options.

    If some MHz gain were achieved by forcing align, that becomes a different question.
  • Cluso99 Posts: 18,069
    from Chip
    Say you had a code label for instruction $01000. How would you get an address from that label, in order to do a RDBYTE? What would that look like?

    Here is an example
    dat
                  orgh              $04000                  'hub byte addresses
                  mov               y,x                     '<-- compiler forces long-alignment (or errors out)
                  loc               hubptr,datab            ' set hub byte address for "datab" label 
                  rdbyte            y,hubptr
                  loc               hubptr,dataw            ' set hub byte address for "dataw" label 
                  rdword            y,hubptr
                  loc               hubptr,datal            ' set hub byte address for "datal" label 
                  rdlong            y,hubptr
    
                    orgh            $08000                  'hub byte addresses
    
    datab           byte            "A"
                    byte            0                       'filler
    dataw           word            $1234
    datal           long            $1234_4321
    
    
                  org               $0100                   'cog long addresses
    x             long              $10
    y             res               1
    hubptr        res               1
    
  • jmg Posts: 15,148
    cgracey wrote: »
    Say you had a code label for instruction $01000. How would you get an address from that label, in order to do a RDBYTE? What would that look like?

    If you want to intermix BYTE and code, I think you are forced to use packers, if code is forced long-aligned.

    That's not drop-dead, but it is wasteful.

    I'm not seeing a compelling case for long alignment.
    IIRC you said there was minimal speed impact? And what is there now works.
  • cgracey Posts: 14,133
    Cluso99 wrote: »
    from Chip
    Say you had a code label for instruction $01000. How would you get an address from that label, in order to do a RDBYTE? What would that look like?

    Here is an example
    dat
                  orgh              $04000                  'hub byte addresses
                  mov               y,x                     '<-- compiler forces long-alignment (or errors out)
                  loc               hubptr,datab            ' set hub byte address for "datab" label 
                  rdbyte            y,hubptr
                  loc               hubptr,dataw            ' set hub byte address for "dataw" label 
                  rdword            y,hubptr
                  loc               hubptr,datal            ' set hub byte address for "datal" label 
                  rdlong            y,hubptr
    
                    orgh            $08000                  'hub byte addresses
    
    datab           byte            "A"
                    byte            0                       'filler
    dataw           word            $1234
    datal           long            $1234_4321
    
    
                  org               $0100                   'cog long addresses
    x             long              $10
    y             res               1
    hubptr        res               1
    

    What I was getting at was: how do you reconcile code and data addresses via labels? How do you take a code label and get a long address out of it that you can use with RDLONG, in order to, say, load code into lut?
  • Cluso99 Posts: 18,069
    Here is how I think it should work...
    DAT
                    orgh    $04000
    
    entry
                    loc     hptr,#code                      ' hub byte address
                    setq    #(x-begin)                      ' count in longs
                    rdlong  begin,hptr                      ' "begin" is cog (long) reg/addr; hptr points to hub (byte) addr
    
                    jmp     #begin
    
    code            .....                                   'code to be loaded from hub to cog
                    .....
    
                    org     $008
    
    begin           ......
    
    data            long    0
    hptr            res     1
    x               res     1
    
    
  • cgracey Posts: 14,133
    edited 2015-10-05 13:17
    Cluso99 wrote: »
    Here is how I think it should work...
    DAT
                    orgh    $04000
    
    entry
                    loc     hptr,#code                      ' hub byte address
                    setq    #(x-begin)                      ' count in longs
                    rdlong  begin,hptr                      ' "begin" is cog (long) reg/addr; hptr points to hub (byte) addr
    
                    jmp     #begin
    
    code            .....                                   'code to be loaded from hub to cog
                    .....
    
                    org     $008
    
    begin           ......
    
    data            long    0
    hptr            res     1
    x               res     1
    
    

    Okay, but what about using a code label as both a JMP address and a RDLONG address?

    I'm wondering how you handle the 4x difference from PC to hub address. How do you bridge the two address schemes?
  • 9 and 20 bit embedded addresses have implied '00' extension, managed by the assembler when not prefixed by ALTDS

    LOC of an immediate 20 bit address would shift it left two bits when not prefixed by ALTDS

    long addr_label

    would explicitly get the two 0 lsb's


  • Cluso99 Posts: 18,069
    Chip,
    Does this answer your question?
                    orgh    $04000
    
    entry
                    loc     hptr,#code                      ' hub byte address
                    setq    #(x-begin)                      ' count in longs
                    rdlong  begin,hptr                      ' "begin" is cog (long) reg/addr; hptr points to hub (byte) addr
    
    'some silly code to show jmp #abs in hubexec
                    .....
                    calla   #hub_sub                        ' assembler inserts hub long addr (hub_sub>>2) into call instr
                    .....
                    jmp     #now_begin                      ' assembler inserts hub long addr (now_begin>>2) into jmp instr 
    
    
    now_begin       .....
                    jmp     #begin
    
    'some subroutine
    hub_sub         .....
                    .....
                    reta
    
    code            .....                                   'code to be loaded from hub to cog
                    .....
    
                    org     $008
    
    begin           ......
    
    data            long    0
    hptr            res     1
    x               res     1
    
    

  • Cluso99 Posts: 18,069
    Is there any case where LOC is used to get an instruction address?
    I am not really sure of all of LOC's uses.