IGNORE -Simplified method for accessing Extended COG RAM & HUB RAM including HUBEXEC

Cluso99 · 2015-04-03 14:41

Simplified method for accessing Extended COG RAM & HUB RAM including HUBEXEC

Assumptions
1. Cog ram, extended cog ram and hub ram together total a maximum address space of 1MB
2. A flat address space is used, with 20bits representing the max 1MB
3. Cog ram is $00000-001FF, extended cog ram is $00200-003FF (presume +2KB), hub ram follows on at $00400-xxxxx
4. A standard instruction is 4 clocks (as per the P1 ie IdSDeR)
5. All ram is single port
6. Extended cog ram is only accessible as longs (where address bits1:0=00)
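
As a rough illustration of the flat map in assumption 3, the sketch below classifies an address into its region. It assumes the ranges are long indexes (512 longs = 2KB per cog page); the function name `region` is hypothetical, and the upper bound of hub ram is left open, as in the post.

```python
# Region boundaries from assumption 3, read here as long indexes (an assumption).
COG_END = 0x001FF       # last long of cog ram
EXT_COG_END = 0x003FF   # last long of extended cog ram

def region(long_addr):
    """Return which RAM a flat long address falls in (hypothetical helper)."""
    if long_addr < 0:
        raise ValueError("address must be non-negative")
    if long_addr <= COG_END:
        return "cog"
    if long_addr <= EXT_COG_END:
        return "extended cog"
    # Hub ram follows on from $00400; its upper limit is not given in the post.
    return "hub"
```

For example, `region(0x200)` gives `"extended cog"` and `region(0x400)` gives `"hub"`.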

Instructions

7. JMPRET (JMP/CALL/RET) remains unchanged, except that S & D refer to the addresses within the current 2KB cog page
8. LJMP/LRET [#]D:S = a new instruction to jump to an 18bit address +"00" (ie any address on a long boundary in cog, extended cog, or hub)
9. LCALL [#]D:S = a new "paired" instruction which saves PC+8 (ie the address of the instruction following the next instruction), with D:S representing the address
at which to save the 18bit "long" return address. The next instruction will be the LJMP instruction which executes the LCALL.
This means that a LCALL/LJMP pair of instructions will take 8 clocks, but a LJMP or LRET will only take 4 clocks.
10. AUGDS [#]D,[#]S = a new instruction that will extend the following instruction's D & S addresses to become an 18bit+"00" address.
This means that all instructions such as MOV, AND, etc can operate on cog, extended cog, and hub ram. When operating on hub, it may take 2 hub
round-robin cycles, depending on the type of instruction. Only "longs" can be accessed this way.
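
The address arithmetic behind items 8 to 10 can be sketched as follows. This is only an illustration: it assumes D forms the upper 9 bits and S the lower 9 bits of the 18-bit long address (as the "D:S" notation suggests), and that PC is a byte address. All function names are hypothetical.

```python
def long_address(d, s):
    """Concatenate the 9-bit D and S fields into an 18-bit long address.
    Assumes D is the upper half, as 'D:S' suggests."""
    if not (0 <= d < 512 and 0 <= s < 512):
        raise ValueError("D and S are 9-bit fields")
    return (d << 9) | s

def byte_address(long_addr):
    """Append "00" to the long address, giving a 20-bit byte address."""
    return long_addr << 2

def lcall_return_address(pc):
    """PC+8: skip the LCALL itself and the paired LJMP (4 bytes each)."""
    return pc + 8
```

The highest reachable long is `byte_address(long_address(0x1FF, 0x1FF))`, which is $FFFFC, the last long below 1MB.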

Can anyone see anything wrong with this logic???

evanh · 2015-04-03 15:56

In effect, with HubExec, there is only 32 bit linear addressing. I see the RAM limitations of Hub as the biggest limitation of the Propeller. The Prop2's 512kB won't be enough for many.

Even in the Prop1 the Cogs seem fine to me. The load and store, to and from Hub, operations serve it well. No need for direct bit-bashing of HubRAM after all. The Prop2 will take that further with HubExec being considered the default programming model.

The big question is how Chip is going to keep HubExec pipeline stalls to a minimum.

Seairth · 2015-04-04 05:54

Cluso99 wrote: »

Can anyone see anything wrong with this logic???

Not immediately.

If I were programming on this version, I would:

1. Immediately LJMP to $200.
2. Use the lower $1FF registers as working memory
3. Start the real code at $200.

To make that work, I'd also expect that COGNEW would load $400 longs.

Ramon · 2015-04-04 05:56

Cluso, I cannot see anything wrong with all that logic. But certainly I can see something wrong with the requirements: MAX 1MB.

Isn't it easier just to move to a bigger opcode/word (64 bit)?

Cluso99 · 2015-04-04 16:41

Ramon wrote: »

Cluso, I cannot see anything wrong with all that logic. But certainly I can see something wrong with the requirements: MAX 1MB.

Isn't it easier just to move to a bigger opcode/word (64 bit)?

1. Since there are 18 bits available and the address is for a long with "00" bits added, 20 bits of addressing is available making 1MB total. Anything less than this is fine so consider it a bit towards future-proofing.
2. 64bit is too much of a change.

General
What have I been thinking!!!

A long time ago I realised the simplest solution was to make JMPRET (JMP/CALL/RET) and TJ/DJ all relative. This gives +/- 256 longs.
Then perhaps a LJMP addition may be helpful.
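
To see where the +/- 256 longs comes from, here is a small sketch that treats the 9-bit S field as a two's complement offset. That encoding is an assumption (the post does not specify one); with it the exact range is -256 to +255 longs. Whether the offset is applied to PC or PC+1 is also unspecified, so `relative_target` simply adds it to whatever base is passed in.

```python
def sign_extend9(field):
    """Interpret a 9-bit field as a two's complement offset (assumed encoding)."""
    field &= 0x1FF
    return field - 0x200 if field & 0x100 else field

def relative_target(base_long, s_field):
    """Hypothetical relative jump: base long address plus the signed 9-bit offset."""
    return base_long + sign_extend9(s_field)
```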

With a flat memory model and the addition of a LJMP, plus...
1. PC - program counter of 18bits+"00"
2. permitting instruction fetching from hub (with wait for hub-slot)
...we would have an extremely simple full hub speed hubexec.
Off to start a new thread

Ramon · 2015-04-04 20:33

1. Since there are 18 bits available and the address is for a long with "00" bits added, 20 bits of addressing is available making 1MB total. Anything less than this is fine so consider it a bit towards future-proofing.

So you think that 1 MB ought to be enough for anybody? ;-D

The problem is that you CANNOT add "00". I must be missing something. I feel that I should not need to explain this to you. Let's look at cog.v:

instruction								mnem	oper	R  C  Z   +- C  D  S
----------------------------------------------------------------------------
000000 ZC0ICCCC DDDDDDDDD SSSSSSSSS		WRBYTE	D,S		_______   __________

xxxxxx ZCRICCCC DDDDDDDDD SSSSSSSSS

x -> Instruction opcode. Max. # of Opcodes = 64 (2^6)
D -> Destination. Max COG RAM size = 512 words (2^9)
S -> Source (or Immediate). 9 bits or 512 words (32 bit size).
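
The field layout above (6-bit opcode, ZCRI, 4-bit condition, 9-bit D, 9-bit S) can be pulled apart with a few shifts and masks. A minimal sketch; `decode_p1` is a hypothetical helper name, not something from cog.v.

```python
def decode_p1(instr):
    """Split a 32-bit P1 instruction word into the fields listed above."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # xxxxxx, 64 possible opcodes
        "zcri":   (instr >> 22) & 0xF,    # Z, C, R, I effect/immediate bits
        "cond":   (instr >> 18) & 0xF,    # CCCC condition field
        "d":      (instr >> 9) & 0x1FF,   # destination, 9 bits
        "s":      instr & 0x1FF,          # source or immediate, 9 bits
    }
```

Since D and S are each only 9 bits wide, a single instruction can name at most 512 locations per field, which is the limit being discussed.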

The P1 defines 60 of a possible 64 opcodes.
There are 4 possible opcodes left unassigned. But in fact, those are pre-assigned to MUL,MULS,ENC,ONES:

000100 ZCRICCCC DDDDDDDDD SSSSSSSSS	*	<MUL>	D,S		M__M__Z   __________
000101 ZCRICCCC DDDDDDDDD SSSSSSSSS	*	<MULS>	D,S		M__M__Z   __________
000110 ZCRICCCC DDDDDDDDD SSSSSSSSS	*	<ENC>	D,S		E_____Z   __________
000111 ZCRICCCC DDDDDDDDD SSSSSSSSS	*	<ONES>	D,S		E_____Z   __________

Now let's take a look at the "HOT" P2 opcode summary (taken from Prop2_Instructions_12_17_13.txt):

ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate)
---------------------------------------------------------------------------------------------------------------------
ZCWS		0000000 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDBYTE	D,S/PTRA/PTRB		(waits for hub)
ZCWS		0000001 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDBYTEC	D,S/PTRA/PTRB		(waits for hub if 
...
ZCWS		1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECOD4	D,S/#
ZCWS		1000011 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECOD5	D,S/#

(Things getting complicated ...)

--WS		1010010 11 I CCCC DDDDDDDDD SSSSSSSSS		UNPKRGB	D,S/#			(S 5:5:5 -> D 8:8:8)
--MS		1010011 00 I CCCC DDDDDDDDD SSSSSSSSS		ADDPIX	D,S/#			(waits one clock)
--MS		1010011 01 I CCCC DDDDDDDDD SSSSSSSSS		MULPIX	D,S/#			(waits one clock)
--MS		1010011 10 I CCCC DDDDDDDDD SSSSSSSSS		BLNPIX	D,S/#			(waits one clock)
--MS		1010011 11 I CCCC DDDDDDDDD SSSSSSSSS		MIXPIX	D,S/#			(waits one clock)

( AUGI ... HOLY GRAIL instruction. When the opcode is not enough for a big immediate we can use this !!
And people said that was wonderful and asked: "Can we use this for S or D too?" )

----		1111101 00 n nnnn nnnnnnnnn nnnnnnnnn		AUGI	#23bits			(appends n to upper bits of next S or D immediate)
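
As a sketch of what the AUGI line describes: the 23 n bits supply the upper bits and the following instruction's 9-bit immediate supplies the lower bits, giving a full 32-bit value. The function name is hypothetical and the exact hardware behaviour is assumed from the comment in the listing.

```python
def augmented_immediate(n23, imm9):
    """Combine AUGI's 23 bits (upper) with the next 9-bit immediate (lower)."""
    if not (0 <= n23 < (1 << 23) and 0 <= imm9 < (1 << 9)):
        raise ValueError("expected a 23-bit and a 9-bit field")
    return (n23 << 9) | imm9
```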

( ... OMG ... last opcode used as a universal black hole for more opcode instructions.
   Using Source bits we have 512 more instructions available (to reach 5 Watts if needed !!! ) )


ZCW-		1111111 ZC 0 CCCC DDDDDDDDD 000000000		COGID	D			(waits for hub) (doesn't write D if WC)
ZCW-		1111111 ZC 0 CCCC DDDDDDDDD 000000001		LOCKNEW	D			(waits for hub)
ZCW-		1111111 ZC 0 CCCC DDDDDDDDD 000000010		GETPC	D
...
----		1111111 00 x CCCC xxxxxxxxx 100011000		SETPIXW

Please note the not_so_fine_irony added between the lines into the above code.

2. 64bit is too much of a change.

There is no shortcut to insufficient opcode bits, without pain and regret. So please, everyone, forget about paging (or segments).

Cluso99 · 2015-04-04 23:07

Ramon,
Maybe I used the wrong words, maybe append "00" would have been better.

Surely you didn't think I meant add (as in +)!

This is a P1V being done in an FPGA, so yes, 1MB seems fine for a P1V. And it just fits nicely with D, S and "00" concatenated to give 20 bits.

Of the 4 spare P1 opcodes, two have been basically agreed to be MUL and either MULS or some form of DIV. That leaves 2 opcodes without changing existing code.
I have proposed one be an AUGDS and the other to support HUBEXEC (LJMPRET or similar, with some spare subsets available).

A 64bit design is not a P1V, nor a P2. Maybe a P3 ???
