HUB EXEC Update Here
cgracey
--- UPDATED AGAIN February 6, 2014 ---
--- UPDATED January 28, 2014 ---
Okay. I finally got it done. Here is the file:
Terasic_Prop2_Emulation_2014_02_06.zip
Be sure to see the HUB EXECUTION section in Prop2_Docs.txt. It explains everything and has a simple example.
Here's that section with the example at the end:
HUB EXECUTION
-------------
When a cog is started, registers $000..$1F3 are loaded sequentially from hub memory and then
execution commences at register $000. Executing code in this initial mode, from within the
cog, is fastest and deterministic, though cog space is limited, with some of the registers
invariably serving as data and variables, possibly limiting your code size.
Large programs, or programs which don't need to be deterministic and would like to free up the
cog register space for data, may be executed from hub memory, instead. These programs address
the 256K-byte hub memory as 64K longs, ranging from $0000..$FFFF. To accommodate this, all cog
program counters are 16-bit, and there are 16-bit-constant 'jump', 'call', and 'return'
instructions.
To execute from the hub, simply branch outside of the cog address space of $000..$1FF to the
executable hub address space of $0200..$FFFF. You can jump, call, and return to and from
any address. If an instruction's address is $000..$1FF, it is fetched from cog memory. If an
instruction's address is $0200..$FFFF, it is fetched from hub memory.
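As a rough illustration of the fetch rule above, here is a small Python model (not PASM). The function name and constants are hypothetical and exist only for this sketch; the address ranges are the ones given in the text.

```python
COG_LAST = 0x1FF    # last cog instruction address, per the text
HUB_LAST = 0xFFFF   # last hub instruction address (64K longs)

def fetch_source(pc):
    """Return 'cog' or 'hub' for a 16-bit instruction address (hypothetical helper)."""
    if not 0 <= pc <= HUB_LAST:
        raise ValueError("program counter must fit in 16 bits")
    # $000..$1FF is fetched from cog memory, $0200..$FFFF from hub memory
    return "cog" if pc <= COG_LAST else "hub"

if __name__ == "__main__":
    for pc in (0x000, 0x1FF, 0x200, 0x1000):
        print(f"${pc:04X} -> {fetch_source(pc)}")
```

The boundary is purely an address comparison, which is why a plain jump, call, or return is enough to cross between cog and hub execution.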
Each cog has four instruction cache lines of eight longs, each, which serve as intermediaries
between the hub memory and instruction pipeline. Whenever an instruction is needed from the
hub that is not currently cached, a cache line is loaded on the next hub cycle, temporarily
stalling the pipeline. Cache lines are reloaded on a least-recently-used basis. A prefetch
mode, enabled on cog start, allows straight-line code without hub instructions to execute at
full speed, as if it were running in cog memory. Prefetch may be turned off to speed up
programs which have multiple tasks executing from the hub, and would be hindered by irrelevant
prefetches. It may also be turned off to allow a single-task program to cache four lines that
can be looped within, without cache disruption.
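To see why a loop that fits in four cache lines runs without cache disruption, here is a simplified Python model of a four-line, eight-long, least-recently-used cache. It is only a sketch: it assumes lines are aligned on 8-long boundaries, it ignores prefetch entirely, and the class and function names are made up for illustration.

```python
from collections import OrderedDict

LINE_LONGS = 8   # longs per cache line, per the text
NUM_LINES = 4    # cache lines per cog, per the text

class ICacheModel:
    """Hypothetical LRU model of the instruction cache, prefetch disabled."""

    def __init__(self):
        self.lines = OrderedDict()   # line tag -> None, least recently used first
        self.misses = 0

    def fetch(self, pc):
        """Fetch one instruction; return True on a hit, False on a miss."""
        tag = pc // LINE_LONGS       # assumes 8-long alignment of cache lines
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        self.misses += 1
        if len(self.lines) >= NUM_LINES:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = None
        return False

def count_misses(addresses):
    """Count line reloads for a sequence of instruction addresses."""
    cache = ICacheModel()
    for pc in addresses:
        cache.fetch(pc)
    return cache.misses

if __name__ == "__main__":
    fits = [0x1000 + i for _ in range(10) for i in range(32)]      # 4 lines
    too_big = [0x1000 + i for _ in range(10) for i in range(40)]   # 5 lines
    print(count_misses(fits), count_misses(too_big))
```

In this model a 32-long loop only reloads its four lines once, while a 40-long loop reloads a line on every line crossing, because least-recently-used replacement always evicts the line that is needed next.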
Here are the instructions which govern the instruction cache:
ICACHEX 'invalidate instruction cache, forces reloads on next hub instructions
ICACHEP 'enable prefetch (this mode is enabled on cog start)
ICACHEN 'disable prefetch
To help make hub execution practical, there are two instructions, AUGS and AUGD, which each
provide 23 bits of data to extend 9-bit constants in subsequent instructions to 32 bits:
AUGS #longvalue >> 9
MOV reg,#longvalue & $1FF
AUGD #longvalue >> 9
SETXCH #longvalue & $1FF
AUGS #frq32a >> 9
AUGD #frq32b >> 9
SETFRQS #frq32b & $1FF,#frq32a & $1FF
For simplicity, these can be coded as such:
MOV reg,##longvalue
SETXCH ##longvalue
SETFRQS ##frq32b,##frq32a
AUGS is cancelled when a subsequent instruction expresses a constant S. AUGD is cancelled when
a subsequent instruction expresses a constant D. There are separate AUGS/AUGD circuits for each
of the four tasks within a cog.
Remember that for every ##, you are generating an AUGS/AUGD instruction.
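The split that ## performs can be checked with a few lines of Python. This is just arithmetic on the bit fields described above (23 bits from AUGS/AUGD, 9 bits from the instruction itself); the function names are hypothetical.

```python
def split_constant(value):
    """Split a 32-bit constant into (23-bit AUGS/AUGD part, 9-bit instruction part)."""
    value &= 0xFFFFFFFF
    return value >> 9, value & 0x1FF

def join_constant(aug23, low9):
    """Rebuild the 32-bit constant from the two fields."""
    return ((aug23 & 0x7FFFFF) << 9) | (low9 & 0x1FF)

if __name__ == "__main__":
    aug, low = split_constant(0x12345678)
    print(f"AUGS #${aug:06X}   MOV reg,#${low:03X}")
    print(f"rebuilt: ${join_constant(aug, low):08X}")
```

Since each ## costs one extra AUGS or AUGD long, an instruction such as SETFRQS ##frq32b,##frq32a occupies three longs in total.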
All 'jump' and 'call' instructions have 16-bit-constant and D-register variants:
(delayed '-D' versions omitted for brevity)
JMP #absolute16 'jump to 16-bit absolute address
JMP @relative16 'jump to 16-bit relative address
JMP D 'jump to D[15:0]
CALL #absolute16 'call to 16-bit absolute address, push {Z,C,PC+1} into task's 4-level stack
CALL @relative16 'call to 16-bit relative address, push {Z,C,PC+1} into task's 4-level stack
CALL D 'call to D[15:0], push {Z,C,PC+1} into task's 4-level stack
CALLA #absolute16 'call to 16-bit absolute address, WRLONG {Z,C,PC+1},PTRA++
CALLA @relative16 'call to 16-bit relative address, WRLONG {Z,C,PC+1},PTRA++
CALLA D 'call to D[15:0], WRLONG {Z,C,PC+1},PTRA++
CALLB #absolute16 'call to 16-bit absolute address, WRLONG {Z,C,PC+1},PTRB++
CALLB @relative16 'call to 16-bit relative address, WRLONG {Z,C,PC+1},PTRB++
CALLB D 'call to D[15:0], WRLONG {Z,C,PC+1},PTRB++
CALLX #absolute16 'call to 16-bit absolute address, WRAUX {Z,C,PC+1},PTRX++
CALLX @relative16 'call to 16-bit relative address, WRAUX {Z,C,PC+1},PTRX++
CALLX D 'call to D[15:0], WRAUX {Z,C,PC+1},PTRX++
CALLY #absolute16 'call to 16-bit absolute address, WRAUXR {Z,C,PC+1},PTRY++
CALLY @relative16 'call to 16-bit relative address, WRAUXR {Z,C,PC+1},PTRY++
CALLY D 'call to D[15:0], WRAUXR {Z,C,PC+1},PTRY++
The non-delayed 'calls' shown above all push PC+1, or the next address. The delayed 'calls' push PC+1+n,
where n is the number of instructions in the pipeline which belong to the same task executing the 'call'.
The 'return' instructions can use WZ/WC to restore Z/C to the caller's states:
RET 'return, pop {Z,C,PC} from task's 4-level stack
RETA 'return, RDLONG {Z,C,PC},--PTRA
RETB 'return, RDLONG {Z,C,PC},--PTRB
RETX 'return, RDAUX {Z,C,PC},--PTRX
RETY 'return, RDAUXR {Z,C,PC},--PTRY
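The hub-stack calls and returns pair a post-increment write with a pre-decrement read. The Python sketch below models CALLA/RETA only. It assumes the saved long is laid out as Z in bit 17, C in bit 16 and PC in bits 15..0 (matching the S[17:16] usage described for JMPSW), and that PTRA steps by 4 bytes per long; both are assumptions, and all names are hypothetical.

```python
def pack_state(z, c, pc):
    """Pack {Z,C,PC} into one long (assumed layout: Z=bit 17, C=bit 16, PC=bits 15..0)."""
    return (int(bool(z)) << 17) | (int(bool(c)) << 16) | (pc & 0xFFFF)

def unpack_state(word):
    """Return (z, c, pc) from a packed long."""
    return (word >> 17) & 1, (word >> 16) & 1, word & 0xFFFF

class HubStackModel:
    """Hypothetical model of CALLA/RETA using PTRA as a hub stack pointer."""

    def __init__(self, ptra):
        self.hub = {}        # byte address -> long, stands in for hub RAM
        self.ptra = ptra

    def calla(self, pc, target, z=0, c=0):
        """WRLONG {Z,C,PC+1},PTRA++ then branch; returns the new PC."""
        self.hub[self.ptra] = pack_state(z, c, pc + 1)
        self.ptra += 4       # assumed step of one long (4 bytes)
        return target

    def reta(self):
        """RDLONG {Z,C,PC},--PTRA; returns (z, c, pc) of the caller."""
        self.ptra -= 4
        return unpack_state(self.hub[self.ptra])

if __name__ == "__main__":
    stack = HubStackModel(ptra=0x8000)
    pc = stack.calla(pc=0x1000, target=0x2000, z=1, c=0)
    pc = stack.calla(pc=pc, target=0x3000, z=0, c=1)
    print(stack.reta())   # state saved by the inner call
    print(stack.reta())   # state saved by the outer call
```

Because the stack lives in hub memory, its depth is limited only by the memory set aside for it, unlike the task's 4-level stack used by plain CALL/RET.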
The 'push' and 'pop' instructions:
PUSH D/# 'push D/# into task's 4-level stack
PUSHA D/# 'WRLONG D/#,PTRA++
PUSHB D/# 'WRLONG D/#,PTRB++
PUSHX D/# 'WRAUX D/#,PTRX++
PUSHY D/# 'WRAUXR D/#,PTRY++
POP D 'pop D from task's 4-level stack
POPA D 'RDLONG D,--PTRA
POPB D 'RDLONG D,--PTRB
POPX D 'RDAUX D,--PTRX
POPY D 'RDAUXR D,--PTRY
The conditional jumps, which specify a register or a 9-bit constant for their branch address,
all sign-extend their 9-bit constants for use as a relative address - unless AUGS is used to
express a full 16-bit relative address:
IJZ D,@relative9 'increment D and jump to 9-bit relative address if zero
IJZ D,@@relative16 'increment D and jump to 16-bit relative address if zero
IJZ D,S 'increment D and jump to S[15:0] if zero
IJNZ D,@relative9 'increment D and jump to 9-bit relative address if not zero
IJNZ D,@@relative16 'increment D and jump to 16-bit relative address if not zero
IJNZ D,S 'increment D and jump to S[15:0] if not zero
DJZ D,@relative9 'decrement D and jump to 9-bit relative address if zero
DJZ D,@@relative16 'decrement D and jump to 16-bit relative address if zero
DJZ D,S 'decrement D and jump to S[15:0] if zero
DJNZ D,@relative9 'decrement D and jump to 9-bit relative address if not zero
DJNZ D,@@relative16 'decrement D and jump to 16-bit relative address if not zero
DJNZ D,S 'decrement D and jump to S[15:0] if not zero
JZ D,@relative9 'test D and jump to 9-bit relative address if zero
JZ D,@@relative16 'test D and jump to 16-bit relative address if zero
JZ D,S 'test D and jump to S[15:0] if zero
JNZ D,@relative9 'test D and jump to 9-bit relative address if not zero
JNZ D,@@relative16 'test D and jump to 16-bit relative address if not zero
JNZ D,S 'test D and jump to S[15:0] if not zero
JP D/#,@relative9 'jump to 9-bit relative address if pin D/# reads high
JP D/#,@@relative16 'jump to 16-bit relative address if pin D/# reads high
JP D/#,S 'jump to S[15:0] if pin D/# reads high
JNP D/#,@relative9 'jump to 9-bit relative address if pin D/# reads low
JNP D/#,@@relative16 'jump to 16-bit relative address if pin D/# reads low
JNP D/#,S 'jump to S[15:0] if pin D/# reads low
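Sign-extending a 9-bit field gives a reach of -256..+255 instructions, which is why the @@ form is needed for anything farther away. The Python sketch below shows the arithmetic only. The text does not say which address the offset is added to, so `base` is left as a parameter; that parameter and the function names are assumptions.

```python
def sign_extend9(field):
    """Interpret a 9-bit field as a signed value in -256..+255."""
    field &= 0x1FF
    return field - 0x200 if field & 0x100 else field

def fits_in_9_bits(offset):
    """True if a relative offset can be encoded without AUGS."""
    return -256 <= offset <= 255

def relative_target(base, field):
    """Add the sign-extended field to a base address, wrapping to 16 bits.

    'base' is whatever address the hardware measures from (not stated in the text).
    """
    return (base + sign_extend9(field)) & 0xFFFF

if __name__ == "__main__":
    print(sign_extend9(0x1FF))                    # all ones is -1
    print(hex(relative_target(0x1000, 0x1FE)))    # two longs back from the base
    print(fits_in_9_bits(0x1800 - 0x1000))        # too far, needs @@
```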
JMPSW jumps according to the S field and stores {Z,C,PC} into D. WZ and WC can be used to load
{Z,C} from S[17:16]:
JMPSW D,@relative9 'jump to 9-bit relative address, store {Z,C,PC} into D
JMPSW D,@@relative16 'jump to 16-bit relative address, store {Z,C,PC} into D
JMPSW D,S 'jump to S[15:0], store {Z,C,PC} into D
JMPSW D,S WZ,WC 'jump to S[15:0], store {Z,C,PC} into D, Z=S[17], C=S[16]
SWITCH 'alias for 'JMPSW INDB,++INDB WZ,WC'
'For round-robin switching among threads
'Use FIXINDB to set up a loop of {Z,C,PC} registers for threads
'Can be used with register remapping for multiple program instances
'Instructions trailing SWITCHD are contextually in the next thread
JMPLIST jumps to a base address (S/@/@@) plus index (D).
JMPLIST D,@relative9 'jump to D plus 9-bit relative address
JMPLIST D,@@relative16 'jump to D plus 16-bit relative address
JMPLIST D,S 'jump to D plus S
LOCBASE converts a 16-bit hub instruction address into a normal 18-bit hub address for use
with RDxxxx/WRxxxx instructions:
LOCBASE D,@relative9 'get 18-bit hub address from 9-bit relative address into D
LOCBASE D,@@relative16 'get 18-bit hub address from 16-bit relative address into D
LOCBASE D,S 'get 18-bit hub address from S[15:0] into D
LOCBYTE/LOCWORD/LOCLONG are like LOCBASE, but use the initial D value as an index which gets
scaled and added to the normal 18-bit hub address:
LOCBYTE D,@relative9 'get 18-bit byte-indexed hub address from 9-bit relative address into D
LOCBYTE D,@@relative16 'get 18-bit byte-indexed hub address from 16-bit relative address into D
LOCBYTE D,S 'get 18-bit byte-indexed hub address from S[15:0] into D
LOCWORD D,@relative9 'get 18-bit word-indexed hub address from 9-bit relative address into D
LOCWORD D,@@relative16 'get 18-bit word-indexed hub address from 16-bit relative address into D
LOCWORD D,S 'get 18-bit word-indexed hub address from S[15:0] into D
LOCLONG D,@relative9 'get 18-bit long-indexed hub address from 9-bit relative address into D
LOCLONG D,@@relative16 'get 18-bit long-indexed hub address from 16-bit relative address into D
LOCLONG D,S 'get 18-bit long-indexed hub address from S[15:0] into D
Remember that @@ is going to generate an AUGS instruction.
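The address conversion done by the LOCxxxx instructions can be sketched in Python. A 16-bit instruction address counts longs, so the 18-bit byte address is four times larger (the example's "$380 = 18-bit load address $E00" follows the same rule). The scale factors of 1, 2 and 4 bytes for byte, word and long indexes are an assumption based on the element sizes, and the function names are hypothetical.

```python
SCALE = {"byte": 1, "word": 2, "long": 4}   # assumed index scaling, in bytes

def locbase(addr16):
    """Convert a 16-bit hub instruction address into an 18-bit hub byte address."""
    return (addr16 & 0xFFFF) << 2

def loc_indexed(addr16, index, unit):
    """Model LOCBYTE/LOCWORD/LOCLONG: scale the index and add it to the byte address."""
    return (locbase(addr16) + index * SCALE[unit]) & 0x3FFFF

if __name__ == "__main__":
    print(hex(locbase(0x380)))                  # load address from the example
    for unit in ("byte", "word", "long"):
        print(unit, hex(loc_indexed(0x1400, 3, unit)))
```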
LOCPTRA/LOCPTRB convert 16-bit constant hub instruction addresses into normal 18-bit hub addresses and then store
them into PTRA/PTRB:
LOCPTRA #absolute16 'get 18-bit hub address into PTRA from 16-bit absolute instruction address
LOCPTRA @relative16 'get 18-bit hub address into PTRA from 16-bit relative instruction address
LOCPTRB #absolute16 'get 18-bit hub address into PTRB from 16-bit absolute instruction address
LOCPTRB @relative16 'get 18-bit hub address into PTRB from 16-bit relative instruction address
There are five assembler directives which are used to position instructions and set cog vs hub assembly modes:
ORGH absolute16 'set 16-bit-address hub mode, advances to absolute16 and sets origin
ORGH 'set 16-bit-address hub mode, initial state in DAT block
ORG absolute9 'set 9-bit-address cog mode, sets origin to absolute9
ORG 'set 9-bit-address cog mode, sets origin to 0
ORGF absolute9 'advances to absolute9, must be in cog mode
RES regcount 'reserves regcount locations, must be in cog mode
RES 'reserves 0 locations, must be in cog mode
FIT address 'errors out if address exceeded, works in both modes
FIT 'if cog mode, error if origin > $1F2; if hub mode, error if origin > $10000
Here is an example PASM application (use F11 to download) which demonstrates hub execution:
orgh $380 '$380 = 18-bit load address $E00
org 'internal cog code
jmp @go 'jump to hub memory
x long 3 'cog register variable
orgh $1000 'some hub code at $1000
go incmod x,#3
jmplist x,@@list
orgh $1400 'some hub code at $1400
list jmp @z0
jmp @z1
jmp @z2
jmp @z3
orgh $1800 'some hub code at $1800
z0 notp #0
jmp @go
z1 notp #1
jmp @go
z2 notp #2
jmp @go
z3 notp #3
jmp @go
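To trace what the example does without hardware, here is a small Python walk-through of its control flow. It assumes INCMOD wraps D to 0 when D equals S and otherwise increments it, and that NOTP toggles the named pin; the handler addresses follow from each zN routine being two longs starting at $1800. All Python names are hypothetical.

```python
LIST = 0x1400   # address of the jump list in the example

# zN handler address -> pin toggled by its NOTP (each handler is two longs)
HANDLERS = {0x1800: 0, 0x1802: 1, 0x1804: 2, 0x1806: 3}

# list entry address -> handler address reached by the JMP stored there
JUMP_TABLE = {LIST + i: addr for i, addr in enumerate(sorted(HANDLERS))}

def incmod(value, limit):
    """Assumed INCMOD behavior: wrap to 0 at the limit, otherwise increment."""
    return 0 if value == limit else value + 1

def run_example(iterations, x=3):
    """Return the sequence of pins toggled over a number of passes through 'go'."""
    pins = []
    for _ in range(iterations):
        x = incmod(x, 3)               # incmod x,#3
        entry = LIST + x               # jmplist x,@@list
        handler = JUMP_TABLE[entry]    # jmp @zN stored in that list entry
        pins.append(HANDLERS[handler]) # notp #N, then jmp @go
    return pins

if __name__ == "__main__":
    print(run_example(6))
```

With x starting at 3, the first INCMOD wraps it to 0, so under these assumptions the pins are toggled in the order 0, 1, 2, 3 and then the cycle repeats.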


Comments
Excellent news Chip, now we all get to play
Time to go do some reading and playing!
Thanks Chip!!!!
You may want to retitle this thread - I thought it was just an update about how HubExec was implemented, not an actual Chipmas present attached!!
time to play
Thanks Chip, amazing work as always
Not quite a balls demo but it IS running in HUBEXEC mode!
DAT
orgh $380 '$380 = 18-bit load address $E00
org 'internal cog code
jmp @go 'jump to hub memory
x long 3 'cog register variable
tick0 long 80_000_000
delay0 long 10_000_000
orgh $1000 'some hub code at $1000
go waitcnt tick0, delay0
incmod x,#3
jmplist x,@@list
orgh $1400 'some hub code at $1400
list jmp @z0
jmp @z1
jmp @z2
jmp @z3
orgh $1800 'some hub code at $1800
z0 notp #0
jmp @go
z1 notp #2
jmp @go
z2 notp #4
jmp @go
z3 notp #6
jmp @go
Monitor area $800-$DFF ($200-$37F hubexec) is a great place for monitor / loader / crypto routines as they could be called from cogs/hubexec (but I am guessing you already take advantage of this)
I had to rebuild my dev box since last using the Quartus tools so I'm starting from the ground up.
C.W.
I started back at the beginning of the Prop2_docs.txt. Seems a little dense or maybe it is me.
Maybe it is the English; part of it seems like it would read better in Mandarin, or maybe Australian :)
Brian?
The Balls demo is pretty hypnotic!
Leon, I seem to recall one of the previous releases mentioning that the Parallax board for the Nano was now required to run the emulation.
C.W.
That is OK.
To run the monitor, you don't need the add-on PCB. You only need to find the RX, TX and Res pins and connect serial to them.
Brilliant!
C.W.
These ideas all came from you guys. I just implemented them. It's way better than what I would have thought to make, myself.
Leon, I believe we are making some more DE0-Nano adapter boards. If so, we'll send you one.
Writing good manuals is an art that I'm not so good at, yet. But, I'm finding it easier to do than at first. To be able to convey some concepts clearly in a brief amount of text is a skill I hope to develop.
There's really no reason the data sheet for this chip needs to be more than 50 pages. The original Microchip PIC16C5x data sheet was a gem. When I first read it, I thought "How do you get anything done with this simple of a chip?", but by experimenting, I learned. Everything I needed to know was eventually gleaned from that datasheet. It was only maybe 20 pages. I love stuff like that. A lot of things today seem so haphazard and disjointed that I'm not even inspired to learn how to use them. If I thought something was done right, I'd be all over it.
Thanks, Chip.
JMPLIST jumps to a base address (S/@/@@) plus index (D).
Why didn't you name this instruction JMPRELS?
I think that name is more logical.
Or even more logical, to follow what others have done in the past - if the Instruction is doing this
JMP @S+D
then simply call it that.
Makes code more readable, and the opcode is pretty much self documenting.
Assemblers have been parsing that sort of code for decades.
Thanks for the update! I don't see the CALL instruction that puts its return address in a register. Did I miss something? That instruction will be very useful for PropGCC and probably other compiled languages.
Thanks,
David
HUB EXEC looks good, TRACE extensions look good too.
Cheers
Brian
I think its only purpose will be to jump into a list of jumps to realize a jump table. Relative addresses are already part of the scheme. This represents a compound relative address whose run-time term (D) is not known at compile time, so is likely only useful for jump tables, or lists of jumps.
A simple name like 'JMP @S+D' won't work in the case where @S is just S, because 'S+D' would be something that the assembler would want to resolve at compile time. So, it needs to have some name other than just JMP. The name JMPREL, to me, implies 'relative to where we are', not relative to some other place, like where a list is.
That's the next thing I'm going to address. I haven't forgotten. I had so much on my plate with hub exec that I couldn't deal with anything extra.
It just needs a slightly more context aware assembler, and you can support this
JMP #absolute16 'jump to 16-bit absolute address
JMP @relative16 'jump to 16-bit relative address
JMP D 'jump to D[15:0]
JMP D+@relative9 'jump to D plus 9-bit relative address
JMP D+@@relative16 'jump to D plus 16-bit relative address
JMP D+S 'jump to D plus S
Prop2_Docs(rr).txt
As an alternative to JMPLIST perhaps something like JTABLE could be used? Otherwise I prefer JMPLIST over anything else so far.
Thanks for making those corrections, Cluso99.