I have been playing with some FPGAs recently. So I thought I would try my hand again with the P1V. This concept should work with smaller FPGAs, just with less RAM to spread around. Here is the concept, picture first.
This is how it works...
There are 9 Blocks of 64KB (16Kx32) RAM, each 32 bits wide and accessible as bytes/words/longs on long boundaries (same as HUB RAM in P1). If there is less RAM available in the FPGA, then use 9 blocks of a smaller size. If there are fewer than 8 COGs, then the number of RAM Blocks will be the number of COGs + 1.
The Central RAM Block forms the HUB RAM. This is accessible by each COG in turn, with each COG getting a turn (time slot) every 8 clocks (was 1:16 for P1).
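To make the hub timing concrete, here is a minimal Python sketch of the round-robin slot scheme. It assumes one clock per slot and that COG-0 owns clock 0; both are my assumptions for illustration, and the function names are hypothetical rather than anything from P1V.

```python
NUM_COGS = 8  # one hub time slot per COG, as described above


def hub_slot_owner(clock, num_cogs=NUM_COGS):
    """Return the COG that owns the hub RAM on the given clock.

    Assumes one clock per slot with COG-0 owning clock 0.
    """
    return clock % num_cogs


def clocks_until_slot(cog, clock, num_cogs=NUM_COGS):
    """Clocks a COG waits from `clock` until its next hub slot.

    Returns 0 when the COG already owns the current slot.
    """
    return (cog - clock) % num_cogs
```

Under these assumptions a COG never waits more than 7 clocks for the hub.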
The 8 remaining RAM Blocks have a Data & Address Bus running around the outside. Each of the 8 COGs has its Data & Address Bus connected to this Bus Ring.
This outer ring of Data & Address Bus has gates/switches to isolate each RAM Block. Each RAM Block is connected to this Ring Bus, and to its respective COG. If all isolators were active, then each RAM Block would only be connected to its respective COG.
On Power Up or Reset, COG-0 is started (becomes active). No other COG is active (running).
When a COG becomes active, it automatically turns its Bus Isolator switch ON (i.e. the bus is isolated at this respective point).
The effect of these isolation switches is such that when only a single COG is active, that COG will own ALL 8 RAM Blocks (i.e. 8*64KB=512KB).
For example, COG-0 when active cuts the Bus between COG-0 & COG-1. COG-0 will have its own 64KB RAM Block at $0_0000-$0_FFFF, plus COG-7's RAM at $1_0000-$1_FFFF, plus COG-6, 5, 4, 3, 2 in turn, and finally COG-1's RAM at $7_0000-$7_FFFF.
Now when COG-1 becomes active, it isolates the Bus between it and COG-2, so it will only get 1 RAM Block at $0_0000-$0_FFFF, and COG-0 will lose access to COG-1's RAM at $7_0000-$7_FFFF.
By starting specific COGs, while leaving other COGs inactive, it is possible to dish out various sizes of COG RAM to the respective COGs.
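The isolation rule can be worked through in a few lines of Python. This is only a sketch of my reading of the scheme, not P1V code: it assumes an active COG n cuts the ring between itself and COG n+1 (wrapping from 7 to 0), and that each COG sees its reachable blocks mapped contiguously from $0_0000 in 64KB steps, own block first. The name cog_ram_map is hypothetical.

```python
BLOCK_SIZE = 0x1_0000  # 64KB per RAM Block
NUM_COGS = 8


def cog_ram_map(active, num_cogs=NUM_COGS, block_size=BLOCK_SIZE):
    """Map each active COG to the RAM Blocks it can reach.

    Returns {cog: [(block_owner, start_addr, end_addr), ...]}.
    Assumes an active COG n isolates the ring between n and n+1.
    """
    active = set(active)
    result = {}
    for cog in sorted(active):
        blocks = []
        block = cog  # a COG always sees its own block first
        for slot in range(num_cogs):
            start = slot * block_size
            blocks.append((block, start, start + block_size - 1))
            prev = (block - 1) % num_cogs
            # If `prev` is active, it has cut the ring between itself
            # and `block`, so this COG cannot reach any further.
            if prev in active:
                break
            block = prev
        result[cog] = blocks
    return result
```

With only COG-0 active, cog_ram_map({0}) gives COG-0 all 8 blocks in the order 0, 7, 6, 5, 4, 3, 2, 1. With COG-0 and COG-1 active, COG-0 keeps 7 blocks and COG-1 gets just its own.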
Note that each COG now has large COG RAM, with the Program Counter extended to 20 bits (to address 1MB max). Since these RAM Blocks are straight COG Memory, code can execute from the large COG RAM without penalty. It will be necessary to introduce a method (instruction variants) for accessing the larger COG Memory. For example...
A new JMPRET instruction where the jump/return registers (S & D) contain the 20-bit memory addresses. i.e. an INDIRECT JUMP/CALL instruction
A new MOV instruction where the S and/or D registers contain the 20-bit memory addresses. i.e. an INDIRECT MOV instruction
Other methods are possible using an AUGDS instruction.
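Whichever instruction variant is used, a 20-bit address held in a register has to be turned into a block select and a location within that block. One possible split is sketched below in Python. It assumes byte addressing, with the top 4 bits selecting the 64KB block slot, the next 14 bits selecting a long within the 16Kx32 block, and the low 2 bits selecting the byte lane. This layout is my assumption from the sizes given above, and split_address is a hypothetical name.

```python
ADDR_BITS = 20   # extended Program Counter / address width
BLOCK_BITS = 16  # 64KB block -> 16-bit byte offset within a block


def split_address(addr):
    """Split a 20-bit byte address into (block_slot, long_index, byte_lane).

    Assumed layout: [19:16] block slot, [15:2] long index, [1:0] byte lane.
    """
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address does not fit in 20 bits")
    slot = addr >> BLOCK_BITS
    offset = addr & ((1 << BLOCK_BITS) - 1)
    return slot, offset >> 2, offset & 3
```

For example, split_address(0x1_0004) selects the second block slot, long 1, byte lane 0. Only slots 0 to 7 would be backed by RAM in the 8-block arrangement.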