IGNORE -Simplified method for accessing Extended COG RAM & HUB RAM including HUBEXEC
Cluso99
Simplified method for accessing Extended COG RAM & HUB RAM including HUBEXEC
Assumptions
1. Cog ram, extended cog ram and hub ram totals a maximum address of 1MB
2. A flat address space is used, with 20bits representing the max 1MB
3. Cog ram is $00000-001FF, extended cog ram is $00200-003FF (presume +2KB), hub ram follows on at $00400-xxxxx
4. A standard instruction is 4 clocks (as per the P1 ie IdSDeR)
5. All ram is single port
6. Extended cog ram is only accessible as longs (where address bits1:0=00)
Instructions
7. JMPRET (JMP/CALL/RET) remains unchanged, except that S & D refer to the addresses within the current 2KB cog page
8. LJMP/LRET [#]D:S = a new instruction to jump to an 18bit address +"00" (i.e. any long-aligned address in cog, extended cog, or hub)
9. LCALL [#]D:S = a new "paired" instruction which saves PC+8 (i.e. the address of the instruction following the next instruction), with D:S representing the address
at which to save the 18bit "long" return address. The next instruction will be the LJMP instruction which executes the call.
This means that an LCALL/LJMP pair of instructions will take 8 clocks, but an LJMP or LRET on its own will only take 4 clocks.
10. AUGDS [#]D,[#]S = a new instruction that will extend the following instruction's D & S addresses to become 18bit+"00" addresses.
This means that all instructions such as MOV, AND, etc can operate on cog, extended cog, and hub ram. When operating on hub, an instruction may take 2 hub
round-robin cycles, depending on the type of instruction. Only "longs" can be accessed this way.
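As a rough illustration of the flat map in assumption 3, here is a minimal Python sketch that classifies an address into cog, extended cog, or hub ram. It assumes the ranges in assumption 3 are long indices (which matches the "+2KB" figure for 512 extended longs); the post does not state the unit explicitly. The names region, COG_END, EXT_END and MAX_LONG are made up for this example.

```python
# Sketch only: boundaries taken from assumption 3, read as long indices.
COG_END = 0x001FF            # last long of cog ram
EXT_END = 0x003FF            # last long of extended cog ram
MAX_LONG = (1 << 18) - 1     # 18-bit long index, i.e. 1MB of bytes

def region(long_index):
    """Return which ram a long index falls in under the proposed flat map."""
    if not 0 <= long_index <= MAX_LONG:
        raise ValueError("address outside the 18-bit long address space")
    if long_index <= COG_END:
        return "cog"
    if long_index <= EXT_END:
        return "extended cog"
    return "hub"

# Size of the extended cog ram in bytes under this reading.
EXT_BYTES = (EXT_END - COG_END) * 4
```

With this reading, region(0x200) is the first extended cog long and region(0x400) is the first hub long.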
Can anyone see anything wrong with this logic???
Comments
Even in the Prop1 the Cogs seem fine to me. The load and store operations, to and from Hub, serve it well. No need for direct bit-bashing of HubRAM after all. The Prop2 will take that further, with HubExec being considered the default programming model.
The big question is how Chip is going to keep HubExec pipeline stalls to a minimum.
Not immediately.
If I were programming on this version, I would:
1. Immediately LJMP to $200.
2. Use the lower $1FF registers as working memory
3. Start the real code at $200.
To make that work, I'd also expect that COGNEW would load $400 longs.
Isn't it easier just to move to a bigger opcode/word (64 bit)?
2. 64bit is too much of a change.
General
What have I been thinking!!!
A long time ago I realised the simplest solution was to make JMPRET (JMP/CALL/RET) and TJ/DJ all relative. This gives +/- 256 longs.
Then perhaps a LJMP addition may be helpful.
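To make the "+/- 256 longs" figure concrete, here is a small Python sketch of a relative jump using the 9-bit S field as a signed offset. It assumes two's complement encoding (so the actual range is -256 to +255 longs) and that the offset is added to the current PC; neither detail is specified in the post, and the function names are hypothetical.

```python
def sign_extend_9(field):
    """Interpret a 9-bit field as a two's complement offset in longs."""
    field &= 0x1FF                       # keep only the 9 field bits
    return field - 0x200 if field & 0x100 else field

def relative_target(pc, field):
    """Target long address of a relative jump (assumed base: current PC)."""
    return pc + sign_extend_9(field)
```

For example, a field of $1FF would jump back one long, and $0FF would jump forward 255 longs.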
With a flat memory model and the addition of a LJMP, plus...
1. PC - program counter of 18bits+"00"
2. permitting instruction fetching from hub (with wait for hub-slot)
...we would have an extremely simple full hub speed hubexec.
Off to start a new thread
So you think that 1 MB ought to be enough for anybody? ;-D
The problem is that you CANNOT add "00". I must be missing something. I feel that I should not need to explain this to you. Let's look at cog.v:
xxxxxx ZCRI CCCC DDDDDDDDD SSSSSSSSS
x -> Instruction opcode. Max. # of opcodes = 64 (2^6)
D -> Destination. Max COG RAM size = 512 words (2^9)
S -> Source (or Immediate). 9 bits, or 512 words (32 bit size).
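The field layout above can be sketched as a small Python encoder/decoder for a 32-bit P1 instruction word: 6 opcode bits, the 4 ZCRI bits, 4 condition bits, then 9 bits each for D and S. The function names and the dictionary keys are only for illustration.

```python
def encode(opcode, zcri, cond, d, s):
    """Pack the five fields into one 32-bit instruction word."""
    return ((opcode & 0x3F) << 26) | ((zcri & 0xF) << 22) | \
           ((cond & 0xF) << 18) | ((d & 0x1FF) << 9) | (s & 0x1FF)

def decode(instr):
    """Split a 32-bit instruction word back into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # 6 bits, 64 possible opcodes
        "zcri":   (instr >> 22) & 0xF,    # Z, C, R, I bits
        "cond":   (instr >> 18) & 0xF,    # condition field
        "d":      (instr >> 9) & 0x1FF,   # 9-bit destination
        "s":      instr & 0x1FF,          # 9-bit source or immediate
    }
```

The widths add up to 6 + 4 + 4 + 9 + 9 = 32, which is why there is no room left in a single long for a wider D or S.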
The P1 defines 60 of the 64 possible opcodes.
There are 4 possible opcodes left unassigned. But in fact, those are pre-assigned to MUL, MULS, ENC, ONES:
Now let's take a look at the "HOT" P2 opcode summary (taken from Prop2_Instructions_12_17_13.txt):
Please note the not_so_fine_irony added between the lines into the above code.
2. 64bit is too much of a change.
There is no shortcut to insufficient opcode bits, without pain and regret. So please, everyone, forget about paging (or segments).
Maybe I used the wrong words; maybe append "00" would have been better.
Surely you didn't think I meant add (as in +)!
This is a P1V being done in an FPGA, so yes, 1MB seems fine for a P1V. And it just fits nicely with D, S and "00" concatenated to give 20 bits.
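A quick Python sketch of that concatenation, assuming D supplies the upper 9 bits and S the lower 9 bits of the 18-bit long address (the order implied by the D:S notation); long_target is a hypothetical helper name.

```python
def long_target(d, s):
    """Concatenate D, S and "00" into a 20-bit byte address."""
    return ((d & 0x1FF) << 11) | ((s & 0x1FF) << 2)

ADDRESS_SPACE = 1 << 20   # 20 address bits, i.e. 1MB
```

The largest value, with D and S both $1FF, is $FFFFC, the last long in the 1MB space, and the low two bits are always zero.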
Of the 4 spare P1 opcodes, two have been basically agreed to be MUL and either MULS or some form of DIV. That leaves 2 opcodes without changing existing code.
I have proposed one be an AUGDS and the other to support HUBEXEC (LJMPRET or similar, with some spare subsets available).
A 64bit design is not a P1V, nor a P2. Maybe a P3 ???