Now I have looked at most instructions, and JMPLIST is good for JUMPing through function tables, BUT I am still missing its counterpart:
> CALLIST, for simply CALLing through those tables
Calls are a lot more expensive than jumps, in terms of op-code requirements. A call must express not only the address, but also which stack to use. So, it's best to call to a JMPLIST instruction.
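To picture Chip's suggestion, here is a toy Python model (not Prop2 code) of a machine where CALL pushes a return address, a JMPLIST-style dispatcher jumps through a table without touching the stack, and RET pops. Because the dispatcher only jumps, the selected routine's RET goes straight back to the original caller, so one CALL plus one JMPLIST behaves like the requested CALLIST. The op names, the `selector` register standing in for the JMPLIST index, and the program layout are all assumptions for illustration.

```python
# Toy model: CALL pushes a return address, JMPLIST jumps through a table
# without touching the stack, and RET pops. Not Prop2 code.

def run(program, selector, max_steps=100):
    """Run `program` (dict: address -> (op, *args)) starting at address 0.

    `selector` stands in for the register JMPLIST indexes with (an assumption).
    Returns (trace of executed op names, final accumulator).
    """
    pc, acc, stack, trace = 0, 0, [], []
    for _ in range(max_steps):
        op, *args = program[pc]
        trace.append(op)
        if op == "CALL":        # push return address, then jump to target
            stack.append(pc + 1)
            pc = args[0]
        elif op == "JMPLIST":   # plain jump to table[selector]; no stack push
            pc = args[0][selector]
        elif op == "ADD":
            acc += args[0]
            pc += 1
        elif op == "RET":       # returns straight to the original CALL site
            pc = stack.pop()
        elif op == "HALT":
            return trace, acc
        else:
            raise ValueError(f"unknown op {op!r}")
    raise RuntimeError("step limit reached")


PROGRAM = {
    0: ("CALL", 10),            # one CALL into the dispatcher...
    1: ("HALT",),
    10: ("JMPLIST", (20, 30)),  # ...which JMPLISTs to the selected routine
    20: ("ADD", 1),
    21: ("RET",),
    30: ("ADD", 100),
    31: ("RET",),
}
```

With `selector=1`, `run(PROGRAM, 1)` executes CALL, JMPLIST, ADD, RET, HALT and ends with an accumulator of 100; only the CALL touches the stack.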
I took a break from writing documentation, and read the new p2 docs.
I found two potential errors:
Line 1850, DACs
I thought we now had more restrictive mappings on video outputs?
Line 2693, Pin transfers
Should the QUADs not be changed to WIDEs? (8 longs)
Thanks for pointing that out. Cluso had noticed some errors mentioning QUADs, and I changed those this morning to WIDEs. There were some other QUAD mentions that needed changing to WIDE, too. I didn't think about this DAC stuff. I'll get that a little later. Right now I'm implementing the LOCPTRA/LOCPTRB instructions that you thought of. I was able to forge some op-code space by making LOCINST always have an @S, since the S case was just a MOV. So, it's not too disruptive, thankfully.
Chip decided to add LOCPTRA and LOCPTRB - so before y'all ask what they are...
JMP/CALL have an embedded 16 bit hub (long) address, and can reference an instruction anywhere in the hub.
LOCPTRA / LOCPTRB add to Chip's LOCxxxx instructions by allowing PTRA and PTRB to be set to any long-aligned hub address in a single instruction by embedding a 16 bit hub address (absolute or relative).
This will help greatly with accessing tables and arrays, including relative addressing of longs & arrays.
LOCPTRA #hubaddress - extends 16 bit constant to 18 bits by appending two zeros and loads PTRA with that value
LOCPTRA @hubaddress - computes relative offset from PC, extends to 18 bits, and loads PTRA with that value
LOCPTRB #/@ behave the same way.
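This allows addressing any element in a 64 entry (byte/word/long) table with a LOCPTRA @table followed by a single PTRA-indexed read such as RDLONG element,PTRA[(-32..31) * scale].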
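The address arithmetic above can be sanity-checked with a small Python sketch: a 16-bit long address with two zero bits appended becomes an 18-bit, long-aligned byte address (covering a 256 KB hub). How the assembler actually encodes the @ form, and whether the offset is measured from the current or the next instruction, are assumptions here; the sketch measures from the current instruction's long address and wraps at 16 bits. The function names are hypothetical.

```python
# Sketch of the LOCPTRA/LOCPTRB address arithmetic described above.
# Assumption: the @ form stores a 16-bit offset measured from the current
# instruction's long address; the real encoding may differ.

LONG_ADDR_MASK = 0xFFFF  # 16-bit hub long address carried in the instruction


def locptr_abs(const16: int) -> int:
    """#hubaddress: append two zero bits -> 18-bit, long-aligned byte address."""
    return (const16 & LONG_ADDR_MASK) << 2


def encode_rel(pc_long: int, target_long: int) -> int:
    """Assumed assembler step for @hubaddress: 16-bit two's-complement offset."""
    return (target_long - pc_long) & LONG_ADDR_MASK


def locptr_rel(pc_long: int, offset16: int) -> int:
    """@hubaddress: add the offset to PC (wrapping at 16 bits), then append two zeros."""
    return ((pc_long + offset16) & LONG_ADDR_MASK) << 2
```

For example, `locptr_abs(0x0200)` gives `0x800`, and a target 16 longs behind PC $100 encodes as offset `0xFFF0`, which `locptr_rel` turns back into byte address `0x3C0`, the same result as the absolute form for long $0F0.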
I continued to read the docs and found a few other things that I am unsure of...
(I will use Lnnn/mm for Line nnn Column mm)
L537/1: INDA/INDB are at $1F2/3 (presume this is correct)
L404/1: Cog loads $0-$1F3 in 1017 clocks. (does it load INDA/INDB at $1F2-3 ???)
L1415/1: Cog loads $0-$1F3 (same question)
L2606/1+: Pin transfers... refers to quads - should these be wides??? (half asleep so didn't follow properly)
Attached is an updated doc with a couple of typos fixed. Prop2_Docs(rr2).txt
After I worked in your prior edits, I found all the other QUAD references and fixed them, too.
It's true that 0..$1F3 get loaded. Before you do a SETINDx/FIXINDx, those are just normal RAM registers, since they point to themselves.
I'm hoping that I have time tonight to get a new update out which has the new LOCPTRx instructions and the fixed Prop2_Docs.txt file.
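Chip's point about $1F2/$1F3 can be pictured with a simplified model: an access to the INDA address is redirected to whatever register INDA points at, and if INDA starts out pointing at $1F2 itself, the location behaves like ordinary RAM until SETINDA moves the pointer. This Python sketch models INDA only; INDB, FIXINDx and any auto-increment behaviour are left out, and the reset state is taken from Chip's description rather than a hardware spec. The class and method names are hypothetical.

```python
INDA_ADDR = 0x1F2


class CogRam:
    """Toy model of 512-long cog RAM with one indirect register (INDA) at $1F2."""

    def __init__(self):
        self.ram = [0] * 512
        self.inda = INDA_ADDR  # assumed reset state: INDA points at itself

    def _resolve(self, addr: int) -> int:
        # Accesses to $1F2 are redirected to the register INDA points at.
        return self.inda if addr == INDA_ADDR else addr

    def write(self, addr: int, value: int) -> None:
        self.ram[self._resolve(addr)] = value & 0xFFFFFFFF

    def read(self, addr: int) -> int:
        return self.ram[self._resolve(addr)]

    def setinda(self, target: int) -> None:
        self.inda = target & 0x1FF
```

Before `setinda`, writing 42 to $1F2 and reading it back returns 42, stored in $1F2 itself; after `setinda(0x010)`, the same address reaches register $010 instead.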
I'm sure someone has suggested this before and it's probably too late in the game to even consider it but wouldn't it be nice to have a variant of coginit/cognew that starts a COG running in hub execution mode. This would let you start a new COG almost instantly without having to wait for almost 2K of data to be loaded.
I've thought about that, too. I looked into it the other day and for some reason thought I wouldn't deal with that yet. I'll look again.
Or, you could just have COGINIT/COGNEW work this way altogether. If you still wanted to load instructions to cog memory, it'd take only a few lines of code to do a bulk-copy-then-jump routine, but with much more control (like not loading all of $0..$1F3, or "starting" at a non-zero address).
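To make the trade-off concrete: a classic COGINIT copies registers $000..$1F3, i.e. $1F4 = 500 longs (2000 bytes, the "almost 2K" above), while a hubexec start followed by a user copy loop only moves what is needed and leaves the rest of cog RAM alone. The Python sketch below models that copy-then-jump step on plain lists of longs; `copy_then_jump` and its arguments are hypothetical, and it counts longs moved rather than clocks, since the thread gives no per-long timing for a software loop.

```python
FULL_LOAD_LONGS = 0x1F4  # registers $000..$1F3 loaded by a classic COGINIT


def copy_then_jump(hub, cog, hub_start, cog_dest, count, entry):
    """Hypothetical bootstrap: copy `count` longs from hub long index `hub_start`
    into cog RAM at `cog_dest`, then return the cog address to jump to.
    Registers outside the copied range keep whatever they held before."""
    if cog_dest + count > FULL_LOAD_LONGS:
        raise ValueError("copy would overrun $1F3")
    if hub_start + count > len(hub):
        raise ValueError("copy would read past the end of hub")
    cog[cog_dest:cog_dest + count] = hub[hub_start:hub_start + count]
    return entry
```

For example, `copy_then_jump(hub, cog, 100, 0x20, 16, 0x20)` moves 16 longs instead of 500 and starts at $020, leaving $000..$01F untouched.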
This could get very useful. With such a capability, I can envisage a model where COG(s) could be quickly directed by another COG to go compute a bunch of data and maintain up to $1F4 longs worth of state internally (plus potentially 256 extra in AUX ram). In this way it could be used dynamically to pass a reasonably large amount of information from one hub exec procedure to another. I can see it might come in rather useful for parallel processing, signal processing, rendering line buffers etc. where you don't want to necessarily incur the delay of reading/writing lots of hub RAM data each time you enter your procedure as you spawn each COG's workload on the fly. It would probably be less useful for static applications where each COG is only started once, but for more dynamic applications it could be very useful.
It would be great to not have to clobber the COG RAM and be able to quickly do a COGINIT in hub mode.
Another FPGA code. Good grief! I have not had time yet to load the first one. Fantastic work, Chip!
COG START...
Yes, it might make more sense to start in hubexec mode.
We could have a small piece of boot code in hub ROM (at or above hub $00800, i.e. long $200) that would load the cog with a variable amount of hub code.
Basically, we would run this with the address of a hub block holding the hub start address, the length, and the hub parameter address (or the first hub parameter is the length). This would be a simple mechanism.
As has been said, cog ram would no longer need to be loaded, so we could hold code there between coginits, or some other info. I could see this opening up a whole lot of other opportunities for fast code tricks.
There would be some delay for cog start to clear the appropriate registers etc., but way less than 1017 clocks. Worth more discussion, methinks.
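Cluso's boot-ROM idea can be sketched as reading a small descriptor from hub: one long for the hub start address, one for the length, one for the parameter address. The field order, the use of hub long indices rather than byte addresses, and all names below are assumptions (the post itself leaves the format open, even suggesting the length could be the first parameter). How the new code would receive the parameter address is also left open; the sketch simply returns it.

```python
from dataclasses import dataclass

LOADABLE_LONGS = 0x1F4  # cog registers $000..$1F3


@dataclass
class BootDescriptor:
    code_start: int  # hub long index of the code to load (assumed long 0)
    length: int      # number of longs to copy (assumed long 1)
    param_addr: int  # hub address handed to the new code (assumed long 2)


def read_descriptor(hub, addr):
    """Read the three-long descriptor at hub long index `addr`."""
    return BootDescriptor(hub[addr], hub[addr + 1], hub[addr + 2])


def boot_cog(hub, cog, addr):
    """Copy only `length` longs into cog RAM starting at $000 and return the
    parameter address. Registers past `length` are left as they were."""
    d = read_descriptor(hub, addr)
    if d.length > LOADABLE_LONGS:
        raise ValueError("length exceeds loadable cog RAM")
    if d.code_start + d.length > len(hub):
        raise ValueError("code block runs past the end of hub")
    cog[:d.length] = hub[d.code_start:d.code_start + d.length]
    return d.param_addr
```

A descriptor of `[10, 4, 40]` loads four longs from hub long 10 and hands back 40 as the parameter address, leaving the rest of the cog image as it was.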
Here is the latest instruction set summary (reformatted).
(Hopefully this is OK, as I am using a spreadsheet with formulae developed from the previous instruction sets.) Tip: reduce the scale in IE10 with Ctrl+ScrollWheel.
LOCPTRA/LOCPTRB were added and the Prop2_Docs.txt was updated. Note that you now must precede any cog-resident code with ORG, as ORGH is the default mode at the beginning of every DAT block.
Those have changed, and have been updated in the latest .zip file. Thanks for finding these things. The newest Prop2_Docs.txt has all the fixes you found, plus some others.
I have no fears. The nice thing about all the advanced features is that you only have to use them when you need them, and when you need them, they make sense. The rest of the time, the Prop2 is just a Prop1 on steroids :)
Comments
You need to launch it with some pre-code that establishes RX/TX pins. I'll post an example shortly.
I have been anxiously awaiting a new batch of DE0-Nano adapter boards since last summer, please sign me up for one, too!
You need to put this code at the very start of the Rom_Monitor.spin file:
That will launch it and reset the org, with the Rom_Monitor code following.
Is this similar to the XMM modes of PropGCC and Catalina?
Never mind, that was a dumb question...
I guess it's similar in that you can write Assembler code bigger than cog space.
XMM lets you write C code bigger than HUB space...
LMM used to be needed to write code larger than what would fit in a cog.
Hubexec makes LMM obsolete, and is MUCH faster.
At the moment, there are no compilers for hubexec mode, but that is certain to change as it is simpler and faster than LMM.
XMM and CMM will still be around.
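For readers who haven't met LMM: on the Prop1 the kernel is a tiny cog loop that fetches each instruction from hub and executes it in place, classically `rdlong instr,pc` / `add pc,#4` / `instr nop` / `jmp #loop`. The Python model below counts cog instructions per hub instruction for that simple, non-unrolled loop versus hubexec, where each hub instruction is executed directly. It counts instructions only: hub-window stalls and the unrolled LMM variants that cut the jump overhead are deliberately left out, and "stop when PC leaves the program" is a toy termination rule, so the ratio is illustrative rather than a timing claim.

```python
def run_lmm(hub_longs, start_pc=0):
    """Simple LMM loop model. hub_longs maps byte address -> op (a function of acc).
    Returns (acc, cog_instructions_executed)."""
    pc, acc, cog_ops = start_pc, 0, 0
    while pc in hub_longs:
        instr = hub_longs[pc]  # rdlong instr, pc
        pc += 4                # add pc, #4
        acc = instr(acc)       # instr: nop  (slot overwritten by the fetched long)
        cog_ops += 4           # the three above plus jmp #loop
    return acc, cog_ops


def run_hubexec(hub_longs, start_pc=0):
    """Hubexec model: each hub instruction is simply executed."""
    pc, acc, ops = start_pc, 0, 0
    while pc in hub_longs:
        acc = hub_longs[pc](acc)
        pc += 4
        ops += 1
    return acc, ops
```

A three-instruction program gives the same result both ways, but the simple LMM loop spends 12 cog instructions to hubexec's 3.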
Thanks Chip. LOCPTRA/LOCPTRB will be very useful to have.
As described above, they allow addressing any element in a 64 entry (byte/word/long) table with:
LOCPTRA @table
RDLONG element,PTRA[(-32..31) * scale]
With optional pre/post decrement/increment, it can also walk arrays.
LOCPTR{A|B} lower memory usage by saving one long on every array reference, and also support a "frame pointer" for small/medium size stack frames.
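The index range in `RDLONG element,PTRA[(-32..31) * scale]` suggests a signed 6-bit index scaled by the element size (1, 2 or 4 bytes). One consequence, under that assumption: with PTRA at the first element only entries 0..31 are reachable, so reaching all 64 entries with one PTRA value means pointing it at the table's midpoint. That midpoint (table + 32, 64 or 128 bytes) is long-aligned whenever the table is, so LOCPTRA can still reach it. This Python sketch computes effective byte addresses only; the pre/post increment forms are not modelled, and the function names are hypothetical.

```python
SCALES = {"byte": 1, "word": 2, "long": 4}


def effective_address(ptra: int, index: int, size: str) -> int:
    """Byte address for PTRA[index], with the index scaled by the element size."""
    if not -32 <= index <= 31:
        raise ValueError("index must fit a signed 6-bit field (-32..31)")
    return ptra + index * SCALES[size]


def entry_address(table: int, entry: int, size: str) -> int:
    """Reach entry 0..63 with one PTRA value: PTRA = table midpoint, index = entry - 32."""
    ptra = table + 32 * SCALES[size]  # long-aligned whenever `table` is
    return effective_address(ptra, entry - 32, size)
```

With a long table at $1000, `entry_address(0x1000, 63, "long")` gives `0x10FC`, while `effective_address(0x1000, 32, "long")` is rejected as out of range.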
Chip, to do this, just copy 8 longs from HUB into the COG at $0, then start executing at $0 -- these 8 instructions would have the bootstrap code.
InstructionSet_20140126.spin
There are two errors in this file: "ff" should be "00" for REPS and for FIXINDx/SETINDx (shown in red in the original post).
The matter has been broached in the past - http://forums.parallax.com/showthread.php/152079-Hub-Execution-Model-Thread-(split-from-blog)?p=1226871&viewfull=1#post1226871
I can't wait to get the RoboPi docs done so I can play... heck, I'll probably play once the text is done, before I take the build pics