Dual Prop P1V Design
Cluso99
Posts: 18,069
I have been pondering a dual prop p1v design as follows...
Prop1:
Cogs 4, 4KB cog ram
Hub 64+KB, 1:4 access
Video on 2 cogs only
Port A accesses P0..P31 (PortB not implemented)
Prop2:
Cogs 8, 2 with 4KB and 6 with 2KB
Hub 10+ KB, 1:8 access (2KB at $0000, 8KB at $E000)
No video
Simple UART/serialiser/deserialiser per cog using vcfg/vs like registers
Port A accesses P32..P63 (Port B not implemented)
2KB of each prop's hub would be true dual port, allowing hub exchanges between props at full hub speed.
This 2KB would be mapped into each cog's hub space.
Initially, Prop1 and Prop2 would both have EPROMs, and both would boot as 2 distinct props.
Only the top 4KB ($F000..$FFFF) in each prop would be initialised as ROM with the booter/runner/spin (my faster spin works here too).
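As a rough illustration of the second prop's hub map described above, here is a minimal Python sketch of an address decoder. The function name and the region labels are made up for this example, and it assumes a 16-bit hub address with everything outside the listed ranges left unmapped. The post does not say which 2KB block is the dual-port one, so the sketch does not mark it.

```python
# Hypothetical decoder for the second prop's hub map as listed in the post:
#   2KB RAM at $0000, 8KB at $E000, with the top 4KB ($F000..$FFFF) as ROM.
# Region names are illustrative only, not taken from any P1V source.

def prop2_hub_region(addr):
    """Classify a 16-bit hub byte address into a region name."""
    addr &= 0xFFFF                 # assumption: 64KB hub address space
    if addr <= 0x07FF:
        return "ram_low"           # 2KB at $0000
    if addr >= 0xF000:
        return "rom"               # booter/runner/spin image
    if addr >= 0xE000:
        return "ram_high"          # remaining 4KB of the 8KB block
    return "unmapped"              # assumption: nothing decoded here


if __name__ == "__main__":
    for a in (0x0000, 0x07FF, 0x0800, 0xE000, 0xF000, 0xFFFF):
        print(f"${a:04X} -> {prop2_hub_region(a)}")
```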
Comments
Certainly in an FPGA target, it makes sense to deploy DualPort memory as it comes almost for free.
I'd avoid using Prop1 and Prop2 as those names are taken. Maybe PropF and PropS, or PropS & PropL?
Is the idea behind Dual Prop to allow one to run faster?
Would a slot-table not achieve almost the same thing, without the split issues?
If you are doing a Serial Cell, a more Prop-useful option may be a video cell with serial overlaid (ie it can do both with bits flipped), then the Build tools Ozpropdev has could include as many of those as you needed.
I've been measuring a UART that appears to use DDS for Baud generation, which could suit a P1V.
A counter already does DDS so that is one option, but for duplex you need to reset the receiver sampling, separate from TX, which would need 2 counters.
That could be expensive, so another approach would be a combination of DDS and binary, where the Baud is set with bitfields in a 32b Baud register as follows:
4 bits = bit sampling divider, range 4..16 clocks per bit. Default /16; use /4 for highest Baud, eg 20 MBd.
25 bits of adder value to define the Baud. This is now shared by Tx & Rx, and is smaller than a 32b Counter cell.
Leaves 3 bits for Enable & other controls. Maybe an Alt-reg bit to allow Length and Stop bits to be set, or pack the 25+4 a little, so the Length fits into one 32b field?
Addit: I think with a simple Mux to the 25b adder, for a choice of 2 alignments, and limiting the Divider choice to 4:8:16, the DDS baud field can compress into 21 bits (2+19), which will leave enough for Modes and Length bits in a single 32b value.
Covers from below 300bd to 20MBd from an 80MHz SysCLK.
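To put numbers on the 25b adder plus divider scheme, here is a small Python sketch. It assumes one particular model that the post does not spell out: the 25-bit accumulator adds the adder value every SysCLK, each carry-out is one sample tick, and one bit lasts `div` sample ticks, so Baud = SysCLK * adder / 2^25 / div. The function names are hypothetical.

```python
# Sketch of the DDS baud arithmetic, under the assumed model:
#   sample tick = carry out of a 25-bit accumulator clocked at SysCLK
#   one bit     = 'div' sample ticks (div in 4..16)

ACC_BITS = 25            # adder width proposed in the post
DIV_MIN, DIV_MAX = 4, 16 # bit sampling divider range from the post


def dds_adder(sysclk_hz, baud, div=16):
    """Return the adder value closest to the requested baud."""
    if not DIV_MIN <= div <= DIV_MAX:
        raise ValueError("divider must be in 4..16")
    adder = round(baud * div * (1 << ACC_BITS) / sysclk_hz)
    if not 0 < adder < (1 << ACC_BITS):
        raise ValueError("baud not reachable with this divider")
    return adder


def actual_baud(sysclk_hz, adder, div=16):
    """Return the baud actually produced by a given adder value."""
    return sysclk_hz * adder / (1 << ACC_BITS) / div


if __name__ == "__main__":
    sysclk = 80_000_000
    for baud, div in ((300, 16), (115_200, 16), (10_000_000, 4)):
        adder = dds_adder(sysclk, baud, div)
        got = actual_baud(sysclk, adder, div)
        print(f"{baud} Bd, /{div}: adder={adder}, actual={got:.3f} Bd")
```

Under this model an exact 20 MBd at /4 would need an adder of 2^25, one more than a 25-bit field can hold, so the sketch rejects it; the practical ceiling sits just below 20 MBd unless the hardware treats that case specially.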
Prop2 (or PropB) could run hubexec at half speed, but that is probably not required. This prop would be devoted to simple I/O processing, but not video, due to lack of hub ram. The 2 cogs with extra cog ram could run my spin interpreter faster by having the vector table (1KB) and the stack (256 slots = 1KB) in extended cog ram instead of hub ram.
KISS principle.
Again, KISS principle.
A simple serialiser/deserialiser using CTRA for the TX and CTRB for the RX baud generator, as they normally would not be used with a UART.
Configuration would select 8/16/32 bits, with/without start/stop bits added. In 8 bit mode, there would be an extra tx mode to send up to 4 bytes from the data register, provided the byte is not $00 (this does not apply to the first byte, which will be sent regardless). I have used this concept in software in my P2 debugger. It's a simple concept and saves code.
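The multi-byte tx mode can be modelled in a few lines of Python. This is only a sketch of the rule as described: the first byte always goes out, and each following byte goes out until a $00 byte is met. The low-byte-first order is an assumption (it matches the Propeller's little-endian longs), and the function name is made up.

```python
# Model of the "send up to 4 bytes, stop at $00" tx mode.
# Assumption: bytes leave the 32-bit data register low byte first.

def tx_bytes(data):
    """Return the list of bytes that would be sent for a 32-bit value."""
    data &= 0xFFFFFFFF
    out = [data & 0xFF]            # first byte is sent regardless
    for shift in (8, 16, 24):
        b = (data >> shift) & 0xFF
        if b == 0:                 # a $00 byte ends the transfer
            break
        out.append(b)
    return out


if __name__ == "__main__":
    for value in (0x00000000, 0x00004B4F, 0x0A0D4B4F):
        print(f"${value:08X} ->", [f"${b:02X}" for b in tx_bytes(value)])
```

So a single MOV of a packed long can emit a short string such as "OK" followed by CR/LF without a software loop, which is where the code saving comes from.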
I have some simple UART code partly done. It's not really that difficult.
The hardest part is trapping the MOV TXDATA, [#]source [WC] so that if WC is NOT specified it waits for TXDATA to become free. If WC IS specified and TXDATA is free, it writes the new value and returns C=0; otherwise it does not write and returns C=1. Likewise, the MOV destination, [#]RXDATA [WC] works similarly for receive data available.
Alternatively, we could read a UART register, testing for an RX bit or TX bit, such as AND TXSTATUS, [#]value [WC],[WZ].
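The two behaviours of the trapped MOV TXDATA can be sketched as a tiny Python model. This is a behavioural illustration only, not Verilog and not anything from the P1V source: the class, its method names, and the fixed `clocks_per_word` busy time are all invented for the example.

```python
# Behavioural model of MOV TXDATA with and without WC.
# Assumption: TXDATA stays busy for a fixed number of clocks per word.

class TxData:
    def __init__(self, clocks_per_word=4):
        self.clocks_per_word = clocks_per_word
        self.remaining = 0         # clocks until TXDATA is free again
        self.value = None          # last value accepted

    @property
    def free(self):
        return self.remaining == 0

    def clock(self):
        """Advance the model by one clock."""
        if self.remaining:
            self.remaining -= 1

    def _load(self, value):
        self.value = value
        self.remaining = self.clocks_per_word

    def mov_wc(self, value):
        """MOV TXDATA, value WC: never stalls, returns the C flag."""
        if not self.free:
            return 1               # C=1, nothing written
        self._load(value)
        return 0                   # C=0, value written

    def mov(self, value):
        """MOV TXDATA, value: stalls until free, returns clocks stalled."""
        stalls = 0
        while not self.free:
            self.clock()
            stalls += 1
        self._load(value)
        return stalls


if __name__ == "__main__":
    tx = TxData()
    print("C =", tx.mov_wc(0x41))      # TXDATA free, so the write succeeds
    print("C =", tx.mov_wc(0x42))      # still busy, so the write is refused
    print("stalled", tx.mov(0x43), "clocks")
```

The WC form suits a cog that has other work to do between characters, while the plain form keeps simple send loops short at the cost of stalling the cog.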
I need to see what Chip did in the P2.
With a helper instruction or a little Verilog code, it should be possible to do USB FS.
BTW Autocorrect changed my post's text... s/be "Simple UART/serialiser/deserialiser per cog using vcfg/vscl registers"
FWIW it's possible to test both designs separately as standalone props. Once verified, it should be easy enough to combine the two designs.