HUB EXEC Update Here
cgracey
Posts: 14,151
--- UPDATED AGAIN February 6, 2014 ---
--- UPDATED January 28, 2014 ---
Okay. I finally got it done. Here is the file:
Terasic_Prop2_Emulation_2014_02_06.zip
Be sure to see the HUB EXECUTION section in Prop2_Docs.txt. It explains everything and has a simple example.
Here's that section with the example at the end:
HUB EXECUTION
-------------

When a cog is started, registers $000..$1F3 are loaded sequentially from hub memory and then execution commences at register $000. Executing code in this initial mode, from within the cog, is fastest and deterministic, though cog space is limited, with some of the registers invariably serving as data and variables, possibly limiting your code size.

Large programs, or programs which don't need to be deterministic and would like to free up the cog register space for data, may be executed from hub memory, instead. These programs address the 256K byte hub memory as 64k longs, ranging from $0000..$FFFF. To accommodate this, all cog program counters are 16-bit, and there are 16-bit-constant 'jump', 'call', and 'return' instructions.

To execute from the hub, simply branch outside of the cog address space of $000..$1FF to the executable hub address space of $0200..$FFFF. You can jump, call, and return to and from any address. If an instruction's address is $000..$1FF, it is fetched from cog memory. If an instruction's address is $0200..$FFFF, it is fetched from hub memory.

Each cog has four instruction cache lines of eight longs, each, which serve as intermediaries between the hub memory and instruction pipeline. Whenever an instruction is needed from the hub that is not currently cached, a cache line is loaded on the next hub cycle, temporarily stalling the pipeline. Cache lines are reloaded on a least-recently-used basis.

A prefetch mode, enabled on cog start, allows straight-line code without hub instructions to execute at full-speed, as if it was running in the cog memory. Prefetch may be turned off to speed up programs which have multiple tasks executing from the hub, and would be hindered by irrelevant prefetches. It may also be turned off to allow a single-task program to cache four lines that can be looped within, without cache disruption.
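To get a feel for the fetch rules and the four-line cache described above, here is a rough Python model. It is only a sketch, not how the hardware is built: it assumes cache lines are aligned to 8-long boundaries, it ignores prefetch and all timing, and the names `fetch_source` and `ICacheModel` are made up for illustration.

```python
from collections import OrderedDict

LINE_LONGS = 8          # longs per cache line (from the text)
NUM_LINES = 4           # cache lines per cog (from the text)
HUB_EXEC_START = 0x0200 # first executable hub address (from the text)


def fetch_source(addr):
    """Return 'cog' or 'hub' for a 16-bit instruction address."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("instruction addresses are 16-bit")
    return "cog" if addr < HUB_EXEC_START else "hub"


class ICacheModel:
    """Four lines, least-recently-used replacement, no prefetch (assumption)."""

    def __init__(self):
        self.lines = OrderedDict()  # line tag -> None, least recently used first
        self.misses = 0

    def fetch(self, addr):
        """Return 'cog', 'hit', or 'miss' for one instruction fetch."""
        if fetch_source(addr) == "cog":
            return "cog"
        tag = addr // LINE_LONGS            # assumed 8-long line alignment
        if tag in self.lines:
            self.lines.move_to_end(tag)     # mark as most recently used
            return "hit"
        self.misses += 1
        if len(self.lines) == NUM_LINES:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = None
        return "miss"


def count_misses(start, length, passes):
    """Run a loop of `length` longs `passes` times and count line loads."""
    cache = ICacheModel()
    for _ in range(passes):
        for addr in range(start, start + length):
            cache.fetch(addr)
    return cache.misses
```

In this model, a loop that fits in four lines (32 longs) only misses while the lines are first loaded, so `count_misses(0x1000, 32, 3)` gives 4. A loop spanning five lines defeats the LRU policy and reloads a line on every line boundary, so `count_misses(0x1000, 40, 3)` gives 15. That matches the remark about looping within four cached lines.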
Here are the instructions which govern the instruction cache:

    ICACHEX                 'invalidate instruction cache, forces reloads on next hub instructions
    ICACHEP                 'enable prefetch (this mode is enabled on cog start)
    ICACHEN                 'disable prefetch

To help make hub execution practical, there are two instructions, AUGS and AUGD, which each provide 23 bits of data to extend 9-bit constants in subsequent instructions to 32 bits:

    AUGS    #longvalue >> 9
    MOV     reg,#longvalue & $1FF

    AUGD    #longvalue >> 9
    SETXCH  #longvalue & $1FF

    AUGS    #frq32a >> 9
    AUGD    #frq32b >> 9
    SETFRQS #frq32b & $1FF,#frq32a & $1FF

For simplicity, these can be coded as such:

    MOV     reg,##longvalue
    SETXCH  ##longvalue
    SETFRQS ##frq32b,##frq32a

AUGS is cancelled when a subsequent instruction expresses a constant S. AUGD is cancelled when a subsequent instruction expresses a constant D. There are separate AUGS/AUGD circuits for each of the four tasks within a cog.

Remember that for every ##, you are generating an AUGS/AUGD instruction.

All 'jump' and 'call' instructions have 16-bit-constant and D-register variants (delayed '-D' versions omitted for brevity):

    JMP     #absolute16     'jump to 16-bit absolute address
    JMP     @relative16     'jump to 16-bit relative address
    JMP     D               'jump to D[15:0]

    CALL    #absolute16     'call to 16-bit absolute address, push {Z,C,PC+1} into task's 4-level stack
    CALL    @relative16     'call to 16-bit relative address, push {Z,C,PC+1} into task's 4-level stack
    CALL    D               'call to D[15:0], push {Z,C,PC+1} into task's 4-level stack

    CALLA   #absolute16     'call to 16-bit absolute address, WRLONG {Z,C,PC+1},PTRA++
    CALLA   @relative16     'call to 16-bit relative address, WRLONG {Z,C,PC+1},PTRA++
    CALLA   D               'call to D[15:0], WRLONG {Z,C,PC+1},PTRA++

    CALLB   #absolute16     'call to 16-bit absolute address, WRLONG {Z,C,PC+1},PTRB++
    CALLB   @relative16     'call to 16-bit relative address, WRLONG {Z,C,PC+1},PTRB++
    CALLB   D               'call to D[15:0], WRLONG {Z,C,PC+1},PTRB++

    CALLX   #absolute16     'call to 16-bit absolute address, WRAUX {Z,C,PC+1},PTRX++
    CALLX   @relative16     'call to 16-bit relative address, WRAUX {Z,C,PC+1},PTRX++
    CALLX   D               'call to D[15:0], WRAUX {Z,C,PC+1},PTRX++

    CALLY   #absolute16     'call to 16-bit absolute address, WRAUXR {Z,C,PC+1},PTRY++
    CALLY   @relative16     'call to 16-bit relative address, WRAUXR {Z,C,PC+1},PTRY++
    CALLY   D               'call to D[15:0], WRAUXR {Z,C,PC+1},PTRY++

The non-delayed 'calls' shown above all push PC+1, or the next address. The delayed 'calls' push PC+1+n, where n is the number of instructions in the pipeline which belong to the same task executing the 'call'.

The 'return' instructions can use WZ/WC to restore Z/C to the caller's states:

    RET                     'return, pop {Z,C,PC} from task's 4-level stack
    RETA                    'return, RDLONG {Z,C,PC},--PTRA
    RETB                    'return, RDLONG {Z,C,PC},--PTRB
    RETX                    'return, RDAUX {Z,C,PC},--PTRX
    RETY                    'return, RDAUXR {Z,C,PC},--PTRY

The 'push' and 'pop' instructions:

    PUSH    D/#             'push D/# into task's 4-level stack
    PUSHA   D/#             'WRLONG D/#,PTRA++
    PUSHB   D/#             'WRLONG D/#,PTRB++
    PUSHX   D/#             'WRAUX D/#,PTRX++
    PUSHY   D/#             'WRAUXR D/#,PTRY++

    POP     D               'pop D from task's 4-level stack
    POPA    D               'RDLONG D,--PTRA
    POPB    D               'RDLONG D,--PTRB
    POPX    D               'RDAUX D,--PTRX
    POPY    D               'RDAUXR D,--PTRY

The conditional jumps, which specify a register or a 9-bit constant for their branch address, all sign-extend their 9-bit constants for use as a relative address, unless AUGS is used to express a full 16-bit relative address:

    IJZ     D,@relative9    'increment D and jump to 9-bit relative address if zero
    IJZ     D,@@relative16  'increment D and jump to 16-bit relative address if zero
    IJZ     D,S             'increment D and jump to S[15:0] if zero

    IJNZ    D,@relative9    'increment D and jump to 9-bit relative address if not zero
    IJNZ    D,@@relative16  'increment D and jump to 16-bit relative address if not zero
    IJNZ    D,S             'increment D and jump to S[15:0] if not zero

    DJZ     D,@relative9    'decrement D and jump to 9-bit relative address if zero
    DJZ     D,@@relative16  'decrement D and jump to 16-bit relative address if zero
    DJZ     D,S             'decrement D and jump to S[15:0] if zero

    DJNZ    D,@relative9    'decrement D and jump to 9-bit relative address if not zero
    DJNZ    D,@@relative16  'decrement D and jump to 16-bit relative address if not zero
    DJNZ    D,S             'decrement D and jump to S[15:0] if not zero

    JZ      D,@relative9    'test D and jump to 9-bit relative address if zero
    JZ      D,@@relative16  'test D and jump to 16-bit relative address if zero
    JZ      D,S             'test D and jump to S[15:0] if zero

    JNZ     D,@relative9    'test D and jump to 9-bit relative address if not zero
    JNZ     D,@@relative16  'test D and jump to 16-bit relative address if not zero
    JNZ     D,S             'test D and jump to S[15:0] if not zero

    JP      D/#,@relative9    'jump to 9-bit relative address if pin D/# reads high
    JP      D/#,@@relative16  'jump to 16-bit relative address if pin D/# reads high
    JP      D/#,S             'jump to S[15:0] if pin D/# reads high

    JNP     D/#,@relative9    'jump to 9-bit relative address if pin D/# reads low
    JNP     D/#,@@relative16  'jump to 16-bit relative address if pin D/# reads low
    JNP     D/#,S             'jump to S[15:0] if pin D/# reads low

JMPSW jumps according to the S field and stores {Z,C,PC} into D. WZ and WC can be used to load {Z,C} from S[17:16]:

    JMPSW   D,@relative9    'jump to 9-bit relative address, store {Z,C,PC} into D
    JMPSW   D,@@relative16  'jump to 16-bit relative address, store {Z,C,PC} into D
    JMPSW   D,S             'jump to S[15:0], store {Z,C,PC} into D
    JMPSW   D,S WZ,WC       'jump to S[15:0], store {Z,C,PC} into D, Z=S[17], C=S[16]

    SWITCH                  'alias for 'JMPSW INDB,++INDB WZ,WC'
                            'For round-robin switching among threads
                            'Use FIXINDB to set up a loop of {Z,C,PC} registers for threads
                            'Can be used with register remapping for multiple program instances
                            'Instructions trailing SWITCHD are contextually in the next thread

JMPLIST jumps to a base address (S/@/@@) plus index (D).

    JMPLIST D,@relative9    'jump to D plus 9-bit relative address
    JMPLIST D,@@relative16  'jump to D plus 16-bit relative address
    JMPLIST D,S             'jump to D plus S

LOCBASE converts a 16-bit hub instruction address into a normal 18-bit hub address for use with RDxxxx/WRxxxx instructions:

    LOCBASE D,@relative9    'get 18-bit hub address from 9-bit relative address into D
    LOCBASE D,@@relative16  'get 18-bit hub address from 16-bit relative address into D
    LOCBASE D,S             'get 18-bit hub address from S[15:0] into D

LOCBYTE/LOCWORD/LOCLONG are like LOCBASE, but use the initial D value as an index which gets scaled and added to the normal 18-bit hub address:

    LOCBYTE D,@relative9    'get 18-bit byte-indexed hub address from 9-bit relative address into D
    LOCBYTE D,@@relative16  'get 18-bit byte-indexed hub address from 16-bit relative address into D
    LOCBYTE D,S             'get 18-bit byte-indexed hub address from S[15:0] into D

    LOCWORD D,@relative9    'get 18-bit word-indexed hub address from 9-bit relative address into D
    LOCWORD D,@@relative16  'get 18-bit word-indexed hub address from 16-bit relative address into D
    LOCWORD D,S             'get 18-bit word-indexed hub address from S[15:0] into D

    LOCLONG D,@relative9    'get 18-bit long-indexed hub address from 9-bit relative address into D
    LOCLONG D,@@relative16  'get 18-bit long-indexed hub address from 16-bit relative address into D
    LOCLONG D,S             'get 18-bit long-indexed hub address from S[15:0] into D

Remember that @@ is going to generate an AUGS instruction.
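Two bits of arithmetic from this section are easy to check by hand, and the Python sketch below does that. First, how a 32-bit constant splits into the 23-bit AUGS/AUGD part and the 9-bit immediate (the `>> 9` and `& $1FF` expressions come straight from the text). Second, how a 16-bit long address becomes an 18-bit byte address. The function names are made up for illustration. The index scale factors of 1, 2 and 4 bytes for LOCBYTE, LOCWORD and LOCLONG, and the wrap to 18 bits, are assumptions; the text only says the index "gets scaled and added".

```python
def split_constant(value):
    """Split a 32-bit constant into (23-bit AUG part, 9-bit immediate)."""
    value &= 0xFFFFFFFF
    return value >> 9, value & 0x1FF


def join_constant(aug23, imm9):
    """Rebuild the 32-bit constant from the AUG part and the immediate."""
    return ((aug23 & 0x7FFFFF) << 9) | (imm9 & 0x1FF)


def locbase(addr16):
    """16-bit long address -> 18-bit byte address (64K longs = 256K bytes)."""
    return (addr16 & 0xFFFF) << 2


def loc_indexed(addr16, index, unit_bytes):
    """Base byte address plus a scaled index, wrapped to 18 bits (assumption)."""
    return (locbase(addr16) + index * unit_bytes) & 0x3FFFF


def locbyte(addr16, index):
    return loc_indexed(addr16, index, 1)   # assumed scale: 1 byte


def locword(addr16, index):
    return loc_indexed(addr16, index, 2)   # assumed scale: 2 bytes


def loclong(addr16, index):
    return loc_indexed(addr16, index, 4)   # assumed scale: 4 bytes
```

For example, `split_constant(0x12345678)` gives `(0x91A2B, 0x078)`, and `join_constant(0x91A2B, 0x078)` gives back `0x12345678`. `locbase(0x380)` gives `0xE00`, which agrees with the load address quoted in the example program's first comment.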

LOCPTRA/LOCPTRB convert 16-bit constant hub instruction addresses into normal 18-bit hub addresses and then store them into PTRA/PTRB:

    LOCPTRA #absolute16     'get 18-bit hub address into PTRA from 16-bit absolute instruction address
    LOCPTRA @relative16     'get 18-bit hub address into PTRA from 16-bit relative instruction address

    LOCPTRB #absolute16     'get 18-bit hub address into PTRB from 16-bit absolute instruction address
    LOCPTRB @relative16     'get 18-bit hub address into PTRB from 16-bit relative instruction address

There are five assembler directives which are used to position instructions and set cog vs hub assembly modes:

    ORGH    absolute16      'set 16-bit-address hub mode, advances to absolute16 and sets origin
    ORGH                    'set 16-bit-address hub mode, initial state in DAT block

    ORG     absolute9       'set 9-bit-address cog mode, sets origin to absolute9
    ORG                     'set 9-bit-address cog mode, sets origin to 0

    ORGF    absolute9       'advances to absolute9, must be in cog mode

    RES     regcount        'reserves regcount locations, must be in cog mode
    RES                     'reserves 0 locations, must be in cog mode

    FIT     address         'errors out if address exceeded, works in both modes
    FIT                     'if cog mode, error if origin > $1F2; if hub mode, error if origin > $10000

Here is an example PASM application (use F11 to download) which demonstrates hub execution:

            orgh    $380            '$380 = 18-bit load address $E00
            org                     'internal cog code

            jmp     @go             'jump to hub memory

    x       long    3               'cog register variable

            orgh    $1000           'some hub code at $1000

    go      incmod  x,#3
            jmplist x,@@list

            orgh    $1400           'some hub code at $1400

    list    jmp     @z0
            jmp     @z1
            jmp     @z2
            jmp     @z3

            orgh    $1800           'some hub code at $1800

    z0      notp    #0
            jmp     @go

    z1      notp    #1
            jmp     @go

    z2      notp    #2
            jmp     @go

    z3      notp    #3
            jmp     @go
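For anyone who wants to follow the control flow of the example without an FPGA board, here is a small Python trace of it. It is only a model and rests on two assumptions that this section does not spell out: that INCMOD wraps the register to 0 when it equals the limit and otherwise increments it, and that NOTP inverts the given pin. The function names are made up for illustration.

```python
def incmod(value, limit):
    """Assumed INCMOD behavior: wrap to 0 at the limit, else increment."""
    return 0 if value == limit else value + 1


def run_example(iterations):
    """Trace the go -> list -> zN -> go loop for a number of passes."""
    x = 3                     # 'x long 3' in cog memory
    pins = [0, 0, 0, 0]       # modeled output state of pins 0..3
    toggled = []              # order in which pins were inverted
    for _ in range(iterations):
        x = incmod(x, 3)      # go:   incmod x,#3
        entry = x             #       jmplist x,@@list  -> jumps to list + x
        pin = entry           # list: jmp @z<entry>     -> lands on z<entry>
        pins[pin] ^= 1        # zN:   notp #N (assumed to invert the pin)
        toggled.append(pin)   #       jmp @go
    return toggled, pins
```

Under those assumptions, `run_example(6)` returns `([0, 1, 2, 3, 0, 1], [0, 0, 1, 1])`: the first pass wraps x from 3 to 0, and each pass after that inverts the next pin in turn.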
Comments
Excellent news, Chip. Now we all get to play. Thanks!!!!
Time to go do some reading and playing!
Thanks Chip!!!!
You may want to retitle this thread - I thought it was just an update about how HubExec was implemented, not an actual Chipmas present attached!!
time to play
Thanks Chip, amazing work as always
Not quite a balls demo but it IS running in HUBEXEC mode!
Monitor area $800-$DFF ($200-$37F hubexec) is a great place for monitor / loader / crypto routines as they could be called from cogs/hubexec (but I am guessing you already take advantage of this)
I had to rebuild my dev box since last using the Quartus tools so I'm starting from the ground up.
C.W.
I started back at the beginning of Prop2_Docs.txt. It seems a little dense, or maybe it is me.
Maybe it is the English part of it. It seems like it would read better in Mandarin or maybe Australian :)
Brian?
The Balls demo is pretty hypnotic!
Leon, I seem to recall one of the previous releases mentioning that the Parallax board for the Nano was now required to run the emulation.
C.W.
That is OK.
To run the monitor you do not need the add-on PCB. You only need to find the RX, TX and Res pins and connect serial to them.
Brilliant!
C.W.
These ideas all came from you guys. I just implemented them. It's way better than what I would have thought to make, myself.
Leon, I believe we are making some more DE0-Nano adapter boards. If so, we'll send you one.
Writing good manuals is an art that I'm not so good at, yet. But, I'm finding it easier to do than at first. To be able to convey some concepts clearly in a brief amount of text is a skill I hope to develop.
There's really no reason the data sheet for this chip needs to be more than 50 pages. The original Microchip PIC16C5x data sheet was a gem. When I first read it, I thought "How do you get anything done with this simple of a chip?", but by experimenting, I learned. Everything I needed to know was eventually gleaned from that datasheet. It was only maybe 20 pages. I love stuff like that. A lot of things today seem so haphazard and disjointed that I'm not even inspired to learn how to use them. If I thought something was done right, I'd be all over it.
Thanks, Chip.
JMPLIST jumps to a base address (S/@/@@) plus index (D).
Why didn't you name this instruction JMPRELS?
I think that name is more logical.
Or even more logical, to follow what others have done in the past - if the Instruction is doing this
JMP @S+D
then simply call it that.
Makes code more readable, and the opcode is pretty much self documenting.
Assemblers have been parsing that sort of code for decades.
Thanks for the update! I don't see the CALL instruction that puts its return address in a register. Did I miss something? That instruction will be very useful for PropGCC and probably other compiled languages.
Thanks,
David
HUB EXEC looks good, TRACE extensions look good too.
Cheers
Brian
I think its only purpose will be to jump into a list of jumps to realize a jump table. Relative addresses are already part of the scheme. This represents a compound relative address whose run-time term (D) is not known at compile time, so is likely only useful for jump tables, or lists of jumps.
A simple name like 'JMP @S+D' won't work in the case where @S is just S, because 'S+D' would be something that the assembler would want to resolve at compile time. So, it needs to have some name other than just JMP. The name JMPREL, to me, implies 'relative to where we are', not relative to some other place, like where a list is.
That's the next thing I'm going to address. I haven't forgotten. I had so much on my plate with hub exec that I couldn't deal with anything extra.
It just needs a slightly more context aware assembler, and you can support this
Prop2_Docs(rr).txt
As an alternative to JMPLIST perhaps something like JTABLE could be used? Otherwise I prefer JMPLIST over anything else so far.
Thanks for making those corrections, Cluso99.