HUB EXEC Update Here

cgracey · 2014-01-26 06:03

--- UPDATED AGAIN February 6, 2014 ---
--- UPDATED January 28, 2014 ---

Okay. I finally got it done. Here is the file:

Terasic_Prop2_Emulation_2014_02_06.zip

Be sure to see the HUB EXECUTION section in Prop2_Docs.txt. It explains everything and has a simple example.

Here's that section with the example at the end:

HUB EXECUTION
-------------

When a cog is started, registers $000..$1F3 are loaded sequentially from hub memory and then
execution commences at register $000. Executing code in this initial mode, from within the
cog, is fastest and deterministic, though cog space is limited, with some of the registers
invariably serving as data and variables, possibly limiting your code size.

Large programs, or programs which don't need to be deterministic and would like to free up the
cog register space for data, may be executed from hub memory, instead. These programs address
the 256K byte hub memory as 64k longs, ranging from $0000..$FFFF. To accommodate this, all cog
program counters are 16-bit, and there are 16-bit-constant 'jump', 'call', and 'return'
instructions.

To execute from the hub, simply branch outside of the cog address space of $000..$1FF to the
executable hub address space of $0200..$FFFF. You can jump, call, and return to and from
any address. If an instruction's address is $000..$1FF, it is fetched from cog memory. If an
instruction's address is $0200..$FFFF, it is fetched from hub memory.
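
The address split described above can be sketched in a few lines of Python. This is only a model of the stated rule, not the hardware: the function names are made up for illustration, and the long-to-byte conversion (shift left by 2) is inferred from the example's note that $380 corresponds to 18-bit load address $E00.

```python
COG_LIMIT = 0x200  # addresses $000..$1FF are cog memory (from the text)

def fetch_source(pc):
    """Say where an instruction at 16-bit long address pc is fetched from."""
    pc &= 0xFFFF
    return "cog" if pc < COG_LIMIT else "hub"

def hub_byte_address(pc):
    """Convert a 16-bit long (instruction) address to an 18-bit byte address.

    Assumption: one long is four bytes, so the byte address is pc << 2.
    """
    return (pc & 0xFFFF) << 2

if __name__ == "__main__":
    print(fetch_source(0x1FF), fetch_source(0x200))
    print(hex(hub_byte_address(0x380)))
```

Under this model, $FFFF maps to byte address $3FFFC, the last long of the 256K byte hub.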

Each cog has four instruction cache lines of eight longs, each, which serve as intermediaries
between the hub memory and instruction pipeline. Whenever an instruction is needed from the
hub that is not currently cached, a cache line is loaded on the next hub cycle, temporarily
stalling the pipeline. Cache lines are reloaded on a least-recently-used basis. A prefetch
mode, enabled on cog start, allows straight-line code without hub instructions to execute at
full speed, as if it were running in cog memory. Prefetch may be turned off to speed up
programs which have multiple tasks executing from the hub, and would be hindered by irrelevant
prefetches. It may also be turned off to allow a single-task program to cache four lines that
can be looped within, without cache disruption.
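
To get a feel for the "four lines that can be looped within" remark, here is a rough Python model of a four-line, eight-long, least-recently-used cache with prefetch off. It is a sketch under assumptions: the line tag is taken to be address >> 3, timing is ignored, and `count_line_loads` is a hypothetical helper, not anything in the emulation files.

```python
from collections import OrderedDict

LINES = 4           # cache lines per cog (from the text)
LONGS_PER_LINE = 8  # longs per cache line (from the text)

def count_line_loads(addresses):
    """Count cache-line loads for a sequence of hub instruction addresses.

    Assumptions (not from the docs): a line is identified by
    address // 8, and prefetch is off, so a line loads only on a miss.
    """
    cache = OrderedDict()  # keys are line tags, least recently used first
    loads = 0
    for addr in addresses:
        tag = addr // LONGS_PER_LINE
        if tag in cache:
            cache.move_to_end(tag)         # mark as most recently used
        else:
            loads += 1
            if len(cache) == LINES:
                cache.popitem(last=False)  # evict least recently used line
            cache[tag] = True
    return loads

def loop_trace(start, length, passes):
    """Addresses visited by a straight-line loop of `length` longs."""
    return [start + i for _ in range(passes) for i in range(length)]

if __name__ == "__main__":
    print(count_line_loads(loop_trace(0x1000, 32, 10)))
    print(count_line_loads(loop_trace(0x1000, 40, 10)))
```

In this model a 32-long loop aligned to a line boundary loads its four lines once and then runs entirely from cache, while a 40-long loop spans five lines and reloads a line on every line crossing.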

Here are the instructions which govern the instruction cache:

        ICACHEX         'invalidate instruction cache, forces reloads on next hub instructions
        ICACHEP         'enable prefetch (this mode is enabled on cog start)
        ICACHEN         'disable prefetch


To help make hub execution practical, there are two instructions, AUGS and AUGD, which each
provide 23 bits of data to extend 9-bit constants in subsequent instructions to 32 bits:

        AUGS    #longvalue >> 9
        MOV     reg,#longvalue & $1FF

        AUGD    #longvalue >> 9
        SETXCH  #longvalue & $1FF

        AUGS    #frq32a >> 9
        AUGD    #frq32b >> 9
        SETFRQS #frq32b & $1FF,#frq32a & $1FF


For simplicity, these can be coded as such:

        MOV     reg,##longvalue

        SETXCH  ##longvalue

        SETFRQS ##frq32b,##frq32a


AUGS is cancelled when a subsequent instruction expresses a constant S. AUGD is cancelled when
a subsequent instruction expresses a constant D. There are separate AUGS/AUGD circuits for each
of the four tasks within a cog.

Remember that for every ##, you are generating an AUGS/AUGD instruction.
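
The split that ## performs is plain bit arithmetic, shown here as a small Python check. The function names are illustrative only; the shift and mask come straight from the `#longvalue >> 9` and `#longvalue & $1FF` expressions above.

```python
def split_long(value):
    """Split a 32-bit constant into (upper 23 bits, lower 9 bits).

    The upper part is what AUGS/AUGD would carry; the lower part is the
    9-bit constant left in the following instruction.
    """
    value &= 0xFFFFFFFF
    return value >> 9, value & 0x1FF

def join_long(aug23, low9):
    """Rebuild the 32-bit constant from the two parts."""
    return ((aug23 << 9) | low9) & 0xFFFFFFFF

if __name__ == "__main__":
    aug, low = split_long(0x12345678)
    print(hex(aug), hex(low), hex(join_long(aug, low)))
```

For $12345678 this gives an AUGS value of $91A2B and a 9-bit constant of $078.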


All 'jump' and 'call' instructions have 16-bit-constant and D-register variants:

        (delayed '-D' versions omitted for brevity)

        JMP     #absolute16     'jump to 16-bit absolute address
        JMP     @relative16     'jump to 16-bit relative address
        JMP     D               'jump to D[15:0]

        CALL    #absolute16     'call to 16-bit absolute address, push {Z,C,PC+1} into task's 4-level stack
        CALL    @relative16     'call to 16-bit relative address, push {Z,C,PC+1} into task's 4-level stack
        CALL    D               'call to D[15:0], push {Z,C,PC+1} into task's 4-level stack

        CALLA   #absolute16     'call to 16-bit absolute address, WRLONG {Z,C,PC+1},PTRA++
        CALLA   @relative16     'call to 16-bit relative address, WRLONG {Z,C,PC+1},PTRA++
        CALLA   D               'call to D[15:0], WRLONG {Z,C,PC+1},PTRA++

        CALLB   #absolute16     'call to 16-bit absolute address, WRLONG {Z,C,PC+1},PTRB++
        CALLB   @relative16     'call to 16-bit relative address, WRLONG {Z,C,PC+1},PTRB++
        CALLB   D               'call to D[15:0], WRLONG {Z,C,PC+1},PTRB++

        CALLX   #absolute16     'call to 16-bit absolute address, WRAUX {Z,C,PC+1},PTRX++
        CALLX   @relative16     'call to 16-bit relative address, WRAUX {Z,C,PC+1},PTRX++
        CALLX   D               'call to D[15:0], WRAUX {Z,C,PC+1},PTRX++

        CALLY   #absolute16     'call to 16-bit absolute address, WRAUXR {Z,C,PC+1},PTRY++
        CALLY   @relative16     'call to 16-bit relative address, WRAUXR {Z,C,PC+1},PTRY++
        CALLY   D               'call to D[15:0], WRAUXR {Z,C,PC+1},PTRY++


The non-delayed 'calls' shown above all push PC+1, or the next address. The delayed 'calls' push PC+1+n,
where n is the number of instructions in the pipeline which belong to the same task executing the 'call'.


The 'return' instructions can use WZ/WC to restore Z/C to the caller's states:

        RET                     'return, pop {Z,C,PC} from task's 4-level stack
        RETA                    'return, RDLONG {Z,C,PC},--PTRA
        RETB                    'return, RDLONG {Z,C,PC},--PTRB
        RETX                    'return, RDAUX  {Z,C,PC},--PTRX
        RETY                    'return, RDAUXR {Z,C,PC},--PTRY


The 'push' and 'pop' instructions:

        PUSH    D/#             'push D/# into task's 4-level stack
        PUSHA   D/#             'WRLONG D/#,PTRA++
        PUSHB   D/#             'WRLONG D/#,PTRB++
        PUSHX   D/#             'WRAUX  D/#,PTRX++
        PUSHY   D/#             'WRAUXR D/#,PTRY++

        POP     D               'pop D from task's 4-level stack
        POPA    D               'RDLONG D,--PTRA
        POPB    D               'RDLONG D,--PTRB
        POPX    D               'RDAUX  D,--PTRX
        POPY    D               'RDAUXR D,--PTRY


The conditional jumps, which specify a register or a 9-bit constant for their branch address,
all sign-extend their 9-bit constants for use as a relative address - unless AUGS is used to
express a full 16-bit relative address:

        IJZ     D,@relative9        'increment D and jump to 9-bit relative address if zero
        IJZ     D,@@relative16      'increment D and jump to 16-bit relative address if zero
        IJZ     D,S                 'increment D and jump to S[15:0] if zero

        IJNZ    D,@relative9        'increment D and jump to 9-bit relative address if not zero
        IJNZ    D,@@relative16      'increment D and jump to 16-bit relative address if not zero
        IJNZ    D,S                 'increment D and jump to S[15:0] if not zero

        DJZ     D,@relative9        'decrement D and jump to 9-bit relative address if zero
        DJZ     D,@@relative16      'decrement D and jump to 16-bit relative address if zero
        DJZ     D,S                 'decrement D and jump to S[15:0] if zero

        DJNZ    D,@relative9        'decrement D and jump to 9-bit relative address if not zero
        DJNZ    D,@@relative16      'decrement D and jump to 16-bit relative address if not zero
        DJNZ    D,S                 'decrement D and jump to S[15:0] if not zero

        JZ      D,@relative9        'test D and jump to 9-bit relative address if zero
        JZ      D,@@relative16      'test D and jump to 16-bit relative address if zero
        JZ      D,S                 'test D and jump to S[15:0] if zero

        JNZ     D,@relative9        'test D and jump to 9-bit relative address if not zero
        JNZ     D,@@relative16      'test D and jump to 16-bit relative address if not zero
        JNZ     D,S                 'test D and jump to S[15:0] if not zero

        JP      D/#,@relative9      'jump to 9-bit relative address if pin D/# reads high
        JP      D/#,@@relative16    'jump to 16-bit relative address if pin D/# reads high
        JP      D/#,S               'jump to S[15:0] if pin D/# reads high

        JNP     D/#,@relative9      'jump to 9-bit relative address if pin D/# reads low
        JNP     D/#,@@relative16    'jump to 16-bit relative address if pin D/# reads low
        JNP     D/#,S               'jump to S[15:0] if pin D/# reads low
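
As a reading aid, the sign extension of a 9-bit relative constant can be modelled as below. This is a sketch: the text does not say which address the offset is measured from, so `base` is a hypothetical parameter standing in for whatever the hardware uses, and both function names are invented for the example.

```python
def sign_extend9(field):
    """Interpret a 9-bit field as a signed value in -256..255."""
    field &= 0x1FF
    return field - 0x200 if field & 0x100 else field

def relative_target(base, field):
    """Apply a 9-bit relative offset to a 16-bit address.

    `base` is an assumption: the docs do not state the reference point.
    The result wraps to 16 bits, matching the 16-bit program counter.
    """
    return (base + sign_extend9(field)) & 0xFFFF

if __name__ == "__main__":
    print(sign_extend9(0x1FF), sign_extend9(0x0FF))
    print(hex(relative_target(0x1000, 0x1FF)))
```

A 9-bit offset therefore reaches at most 256 longs backward or 255 forward, which is why the @@ form with AUGS exists for longer branches.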


JMPSW jumps according to the S field and stores {Z,C,PC} into D. WZ and WC can be used to load
{Z,C} from S[17:16]:

        JMPSW   D,@relative9        'jump to 9-bit relative address, store {Z,C,PC} into D
        JMPSW   D,@@relative16      'jump to 16-bit relative address, store {Z,C,PC} into D
        JMPSW   D,S                 'jump to S[15:0], store {Z,C,PC} into D
        JMPSW   D,S    WZ,WC        'jump to S[15:0], store {Z,C,PC} into D, Z=S[17], C=S[16]

        SWITCH                      'alias for 'JMPSW INDB,++INDB WZ,WC'
                                    'For round-robin switching among threads
                                    'Use FIXINDB to set up a loop of {Z,C,PC} registers for threads
                                    'Can be used with register remapping for multiple program instances
                                    'Instructions trailing SWITCHD are contextually in the next thread
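
The {Z,C,PC} word used by JMPSW can be pictured with a short Python sketch. The load side (Z=S[17], C=S[16], PC=S[15:0]) is from the text; that the stored word in D uses the same layout is an assumption, and the function names are illustrative.

```python
def pack_state(z, c, pc):
    """Pack flags and program counter into one word.

    Assumption: the stored {Z,C,PC} has Z in bit 17, C in bit 16 and PC
    in bits 15..0, mirroring how WZ/WC load Z and C from S[17:16].
    """
    return (int(bool(z)) << 17) | (int(bool(c)) << 16) | (pc & 0xFFFF)

def unpack_state(word):
    """Return (z, c, pc) from a packed state word."""
    return bool((word >> 17) & 1), bool((word >> 16) & 1), word & 0xFFFF

if __name__ == "__main__":
    saved = pack_state(True, False, 0x1234)
    print(hex(saved), unpack_state(saved))
```

Seen this way, a round-robin SWITCH amounts to saving one such word for the current thread and loading the next thread's word from the following register.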


JMPLIST jumps to a base address (S/@/@@) plus index (D).

        JMPLIST D,@relative9        'jump to D plus 9-bit relative address
        JMPLIST D,@@relative16      'jump to D plus 16-bit relative address
        JMPLIST D,S                 'jump to D plus S


LOCBASE converts a 16-bit hub instruction address into a normal 18-bit hub address for use
with RDxxxx/WRxxxx instructions:

        LOCBASE D,@relative9        'get 18-bit hub address from 9-bit relative address into D
        LOCBASE D,@@relative16      'get 18-bit hub address from 16-bit relative address into D
        LOCBASE D,S                 'get 18-bit hub address from S[15:0] into D


LOCBYTE/LOCWORD/LOCLONG are like LOCBASE, but use the initial D value as an index which gets
scaled and added to the normal 18-bit hub address:

        LOCBYTE D,@relative9        'get 18-bit byte-indexed hub address from 9-bit relative address into D
        LOCBYTE D,@@relative16      'get 18-bit byte-indexed hub address from 16-bit relative address into D
        LOCBYTE D,S                 'get 18-bit byte-indexed hub address from S[15:0] into D

        LOCWORD D,@relative9        'get 18-bit word-indexed hub address from 9-bit relative address into D
        LOCWORD D,@@relative16      'get 18-bit word-indexed hub address from 16-bit relative address into D
        LOCWORD D,S                 'get 18-bit word-indexed hub address from S[15:0] into D

        LOCLONG D,@relative9        'get 18-bit long-indexed hub address from 9-bit relative address into D
        LOCLONG D,@@relative16      'get 18-bit long-indexed hub address from 16-bit relative address into D
        LOCLONG D,S                 'get 18-bit long-indexed hub address from S[15:0] into D


Remember that @@ is going to generate an AUGS instruction.
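
The address arithmetic behind LOCBASE and LOCBYTE/LOCWORD/LOCLONG might be modelled as follows. This is a sketch, not the hardware definition: it assumes the index is scaled by the element size in bytes (1, 2 or 4) and that results wrap to 18 bits; `loc_base` and `loc_indexed` are made-up names.

```python
HUB_MASK = 0x3FFFF  # 18-bit byte address space (256K bytes)

def loc_base(addr16):
    """16-bit hub instruction (long) address -> 18-bit byte address."""
    return ((addr16 & 0xFFFF) << 2) & HUB_MASK

def loc_indexed(addr16, index, size):
    """Byte address of element `index` in a table starting at addr16.

    Assumption: size is 1 for LOCBYTE, 2 for LOCWORD, 4 for LOCLONG.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4")
    return (loc_base(addr16) + index * size) & HUB_MASK

if __name__ == "__main__":
    print(hex(loc_base(0x1400)))
    print(hex(loc_indexed(0x1400, 3, 4)))
```

For a table at instruction address $1400, the base is byte address $5000 and long element 3 sits at $500C under these assumptions.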


LOCPTRA/LOCPTRB convert 16-bit constant hub instruction addresses into normal 18-bit hub addresses and then store
them into PTRA/PTRB:

        LOCPTRA #absolute16         'get 18-bit hub address into PTRA from 16-bit absolute instruction address
        LOCPTRA @relative16         'get 18-bit hub address into PTRA from 16-bit relative instruction address

        LOCPTRB #absolute16         'get 18-bit hub address into PTRB from 16-bit absolute instruction address
        LOCPTRB @relative16         'get 18-bit hub address into PTRB from 16-bit relative instruction address


There are five assembler directives which are used to position instructions and set cog vs hub assembly modes:

        ORGH    absolute16          'set 16-bit-address hub mode, advances to absolute16 and sets origin
        ORGH                        'set 16-bit-address hub mode, initial state in DAT block

        ORG     absolute9           'set 9-bit-address cog mode, sets origin to absolute9
        ORG                         'set 9-bit-address cog mode, sets origin to 0

        ORGF    absolute9           'advances to absolute9, must be in cog mode

        RES     regcount            'reserves regcount locations, must be in cog mode
        RES                         'reserves 0 locations, must be in cog mode

        FIT     address             'errors out if address exceeded, works in both modes
        FIT                         'if cog mode, error if origin > $1F2; if hub mode, error if origin > $10000


Here is an example PASM application (use F11 to download) which demonstrates hub execution:


        orgh    $380            '$380 = 18-bit load address $E00

        org                     'internal cog code

        jmp     @go             'jump to hub memory

x       long    3               'cog register variable

        orgh    $1000           'some hub code at $1000

go      incmod  x,#3
        jmplist x,@@list

        orgh    $1400           'some hub code at $1400

list    jmp     @z0
        jmp     @z1
        jmp     @z2
        jmp     @z3

        orgh    $1800           'some hub code at $1800

z0      notp    #0
        jmp     @go

z1      notp    #1
        jmp     @go

z2      notp    #2
        jmp     @go

z3      notp    #3
        jmp     @go
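
For readers without hardware, the control flow of the example can be traced with a small Python model. It rests on assumptions that the text does not spell out: INCMOD is taken to wrap to 0 once the register equals the limit and otherwise add 1, each entry in `list` is taken to occupy one long so that x selects entry x, and NOTP is taken to toggle the named pin. `incmod` and `run_demo` are illustrative names.

```python
def incmod(value, limit):
    """Assumed INCMOD behaviour: wrap to 0 at limit, else increment."""
    return 0 if value == limit else value + 1

def run_demo(passes, pins=(0, 1, 2, 3)):
    """Return the pins toggled, in order, over `passes` trips through go."""
    x = 3                        # initial value of cog register x
    toggled = []
    for _ in range(passes):
        x = incmod(x, 3)         # go:   incmod  x,#3
        toggled.append(pins[x])  # jmplist x,@@list -> jmp @zN -> notp #pin
    return toggled

if __name__ == "__main__":
    print(run_demo(6))
```

Under these assumptions the program toggles pins 0, 1, 2, 3 in rotation, starting with pin 0 because x begins at 3 and wraps on the first pass.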

Bill Henning · 2014-01-26 06:14

WOHOOO!!!!!

Excellent news Chip, now we all get to play

Thanks!!!!

mindrobots · 2014-01-26 06:34

WooHoo!!! It loaded to my Nano. At least Quartus says it loaded the new image and I have a monitor to talk to!

Time to go do some reading and playing!

Thanks Chip!!!!

You may want to retitle this thread - I thought it was just an update about how HubExec was implemented, not an actual Chipmas present attached!!

Cluso99 · 2014-01-26 06:39

Chip: you certainly have been busy. There are lots of hub mode support instructions there.

time to play

Baggers · 2014-01-26 07:30

Update installed, balls demo working, time to play indeed Ray!

Thanks Chip, amazing work as always

mindrobots · 2014-01-26 07:44

I wanted to SEE HUBEXEC working but I didn't have a 'scope handy. I changed the code from Chip's demo a bit and hooked up some BRIGHT WHITE LEDs.

Not quite a balls demo but it IS running in HUBEXEC mode!

DAT
        orgh    $380            '$380 = 18-bit load address $E00

        org                     'internal cog code

        jmp     @go             'jump to hub memory

x       long    3               'cog register variable
tick0   long    80_000_000
delay0  long    10_000_000

        orgh    $1000           'some hub code at $1000

go      waitcnt tick0,delay0
        incmod  x,#3
        jmplist x,@@list

        orgh    $1400           'some hub code at $1400

list    jmp     @z0
        jmp     @z1
        jmp     @z2
        jmp     @z3

        orgh    $1800           'some hub code at $1800

z0      notp    #0
        jmp     @go

z1      notp    #2
        jmp     @go

z2      notp    #4
        jmp     @go

z3      notp    #6
        jmp     @go

Bill Henning · 2014-01-26 07:49

Chip,

Monitor area $800-$DFF ($200-$37F hubexec) is a great place for monitor / loader / crypto routines as they could be called from cogs/hubexec (but I am guessing you already take advantage of this)

cgracey wrote: »

Large programs, or programs which don't need to be deterministic and would like to free up the
cog register space for data, may be executed from hub memory, instead. These programs address
the 256K byte hub memory as 64k longs, ranging from $0000..$FFFF. To accommodate this, all cog
program counters are 16-bit, and there are 16-bit-constant 'jump', 'call', and 'return'
instructions.

To execute from the hub, simply branch outside of the cog address space of $000..$1FF to the
executable hub address space of $0200..$FFFF. You can jump, call, and return to and from
any address. If an instruction's address is $000..$1FF, it is fetched from cog memory. If an
instruction's address is $0200..$FFFF, it is fetched from hub memory.

ctwardell · 2014-01-26 08:34

Most excellent, thanks Chip!

I had to rebuild my dev box since last using the Quartus tools so I'm starting from the ground up.

C.W.

rjo__ · 2014-01-26 09:03

All samples work as advertised.

I started back at the beginning of the Prop2_docs.txt. Seems a little dense… or maybe it is me.

Maybe it is the English part of it… seems like it would read better in Mandarin… or maybe Australian:)

Brian?

Leon · 2014-01-26 09:48

The file seemed to load OK into my DE0-Nano, but I don't have the expansion board. All that happened was that one of the LEDs was lit, is that OK?

ctwardell · 2014-01-26 09:56

Up and running!

The Balls demo is pretty hypnotic!

Leon, I seem to recall one of the previous releases mentioning that the Parallax board for the Nano was now required to run the emulation.

C.W.

Leon · 2014-01-26 10:05

I thought that was the case. I was wondering if anyone else got the same thing without the Parallax board.

Sapieha · 2014-01-26 10:09

Hi Leon.

That is OK.

For running the monitor you do not need the add-on PCB. You only need to find the RX, TX and Res pins and connect serial to them.

Leon wrote: »

The file seemed to load OK into my DE0-Nano, but I don't have the expansion board. All that happened was that one of the LEDs was lit, is that OK?

Leon · 2014-01-26 10:31

Thanks, Sapieha. I think I did that with the old version, I'll try it if I can find my USB to RS-232 adapter.

nutson · 2014-01-26 11:21

Running ok on DE0-Nano and DE2-115. Great work!

Heater. · 2014-01-26 11:22

I do like the way you can slide in and out of HUBEX mode at will without any mode setting contortions.

Brilliant!

ctwardell · 2014-01-26 12:10

Here are the connections for using a Prop Plug without the add-on board.

C.W.

cgracey · 2014-01-26 12:41

Heater. wrote: »

I do like the way you can slide in and out of HUBEX mode at will without any mode setting contortions.

Brilliant!

These ideas all came from you guys. I just implemented them. It's way better than what I would have thought to make, myself.

cgracey · 2014-01-26 12:42

Leon wrote: »

Thanks, Sapieha. I think I did that with the old version, I'll try it if I can find my USB to RS-232 adapter.

Leon, I believe we are making some more DE0-Nano adapter boards. If so, we'll send you one.

cgracey · 2014-01-26 12:51

rjo__ wrote: »

All samples work as advertised.

I started back at the beginning of the Prop2_docs.txt. Seems a little dense… or maybe it is me.

Maybe it is the English part of it… seems like it would read better in Mandarin… or maybe Australian:)

Brian?

Writing good manuals is an art that I'm not so good at, yet. But, I'm finding it easier to do than at first. To be able to convey some concepts clearly in a brief amount of text is a skill I hope to develop.

There's really no reason the data sheet for this chip needs to be more than 50 pages. The original Microchip PIC16C5x data sheet was a gem. When I first read it, I thought "How do you get anything done with this simple of a chip?", but by experimenting, I learned. Everything I needed to know was eventually gleaned from that datasheet. It was only maybe 20 pages. I love stuff like that. A lot of things today seem so haphazard and disjointed that I'm not even inspired to learn how to use them. If I thought something was done right, I'd be all over it.

Leon · 2014-01-26 13:14

cgracey wrote: »

Leon, I believe we are making some more DE0-Nano adapter boards. If so, we'll send you one.

Thanks, Chip.

Sapieha · 2014-01-26 13:16

Hi Chip.

JMPLIST jumps to a base address (S/@/@@) plus index (D).

Why You not named this instruction ---> JMPRELS ---
I think that name are more logical

jmg · 2014-01-26 13:23

Sapieha wrote: »

Hi Chip.

JMPLIST jumps to a base address (S/@/@@) plus index (D).

Why You not named this instruction ---> JMPRELS ---
I think that name are more logical

Or even more logical, to follow what others have done in the past - if the Instruction is doing this

JMP @S+D

then simply call it that.

Makes code more readable, and the opcode is pretty much self documenting.
Assemblers have been parsing that sort of code for decades.

David Betz · 2014-01-26 13:50

cgracey wrote: »

Okay. I finally got it done. Here is the file:

Be sure to see the HUB EXECUTION section in Prop2_Docs.txt. It explains everything and has a simple example.

Hi Chip,

Thanks for the update! I don't see the CALL instruction that puts its return address in a register. Did I miss something? That instruction will be very useful for PropGCC and probably other compiled languages.

Thanks,
David

ozpropdev · 2014-01-26 15:35

Thanks Chip, Nice work!
HUB EXEC looks good, TRACE extensions look good too.

Cheers
Brian

cgracey · 2014-01-26 16:11

Sapieha wrote: »

Hi Chip.

JMPLIST jumps to a base address (S/@/@@) plus index (D).

Why You not named this instruction ---> JMPRELS ---
I think that name are more logical

I think its only purpose will be to jump into a list of jumps to realize a jump table. Relative addresses are already part of the scheme. This represents a compound relative address whose run-time term (D) is not known at compile time, so is likely only useful for jump tables, or lists of jumps.

A simple name like 'JMP @S+D' won't work in the case where @S is just S, because 'S+D' would be something that the assembler would want to resolve at compile time. So, it needs to have some name other than just JMP. The name JMPREL, to me, implies 'relative to where we are', not relative to some other place, like where a list is.

cgracey · 2014-01-26 16:20

David Betz wrote: »

Hi Chip,

Thanks for the update! I don't see the CALL instruction that puts its return address in a register. Did I miss something? That instruction will be very useful for PropGCC and probably other compiled languages.

Thanks,
David

That's the next thing I'm going to address. I haven't forgotten. I had so much on my plate with hub exec that I couldn't deal with anything extra.

David Betz · 2014-01-26 17:24

cgracey wrote: »

That's the next thing I'm going to address. I haven't forgotten. I had so much on my plate with hub exec that I couldn't deal with anything extra.

Thanks! Sorry for bringing it up again. Nice work on the docs. They seem very clear and your implementation of hub execution mode is very easy to understand.

jmg · 2014-01-26 17:25

cgracey wrote: »

I think its only purpose will be to jump into a list of jumps to realize a jump table. Relative addresses are already part of the scheme. This represents a compound relative address whose run-time term (D) is not known at compile time, so is likely only useful for jump tables, or lists of jumps.

A simple name like 'JMP @S+D' won't work in the case where @S is just S, because 'S+D' would be something that the assembler would want to resolve at compile time. So, it needs to have some name other than just JMP. The name JMPREL, to me, implies 'relative to where we are', not relative to some other place, like where a list is.

It just needs a slightly more context aware assembler, and you can support this

        JMP     #absolute16     'jump to 16-bit absolute address
        JMP     @relative16     'jump to 16-bit relative address
        JMP     D               'jump to D[15:0]
        JMP     D+@relative9    'jump to D plus 9-bit relative address
        JMP     D+@@relative16  'jump to D plus 16-bit relative address
        JMP     D+S             'jump to D plus S

Cluso99 · 2014-01-26 18:10

Docs: I have made a couple of fixes (hope they are correct) regarding the quads becoming octets and the 16..0 hub address bits becoming 17..0.
Prop2_Docs(rr).txt

As an alternative to JMPLIST perhaps something like JTABLE could be used? Otherwise I prefer JMPLIST over anything else so far.

cgracey · 2014-01-26 19:06

Cluso99 wrote: »

Docs: I have made a couple of fixes (hope they are correct) regarding the quads becoming octets and the 16..0 hub address bits becoming 17..0.
Prop2_Docs(rr).txt

As an alternative to JMPLIST perhaps something like JTABLE could be used? Otherwise I prefer JMPLIST over anything else so far.

Thanks for making those corrections, Cluso99.
