Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

Cluso99 · 2019-03-16 13:40

@AJL,
I understood what you are saying, but that is technically an incorrect understanding of the physical logic involved in the comparison for equality. In the physical logic, an inverse adder is used to determine equality. The reason is that an adder is already present in the hardware, and to do subtraction one operand is inverted and then added. If the result is zero, equality is signalled and a flip-flop is set, usually called the zero flag internally on the schematic diagram. It's just the way computers were built originally.
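
The "invert one operand and add" subtraction can be sketched in a few lines. This is a minimal model, assuming 32-bit operands and a borrow-style C flag (C=1 when D < S, unsigned); the function name is hypothetical and it is not a description of any specific chip's ALU.

```python
MASK32 = 0xFFFF_FFFF


def sub_via_adder(d: int, s: int) -> tuple[int, bool, bool]:
    """Compute d - s the way a simple ALU does: invert s, add with carry-in 1.

    Returns (result, z, c): z is set when the 32-bit result is zero,
    c is the borrow (assumed convention: set when s > d, unsigned).
    """
    d &= MASK32
    s &= MASK32
    total = d + (~s & MASK32) + 1   # two's-complement subtraction through the adder
    result = total & MASK32
    carry_out = total >> 32         # 1 when no borrow occurred
    z = result == 0                 # the zero flag: "the result was zero"
    c = carry_out == 0              # borrow = inverted carry-out
    return result, z, c
```

For example, `sub_via_adder(5, 5)` sets Z because the adder output is zero, which is exactly the "equal" case of a compare.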

But this is way off the point that I have been making. When you copy a logic bit into a flag, confusion reigns if it's the zero flag. The carry flag, or any other flag for that matter except the zero flag, will be set when the bit is 1 and reset when the bit is 0, so there is no confusion there.
When a 1 gets copied to the zero flag, the flag is called set. The confusion results because you copied the bit instead of using it as a test. YES, IT SHOULD BE A TEST AND SET. Then the Z flag reports precisely what it is meant to in ALL situations: a ZERO RESULT. This is the fundamental meaning of the zero flag, a zero result. This is why the CMP instruction states that if the subtraction gives a zero result, then ZERO will be set. Then you test for this using IF_Z or, if you wish, BZ, JZ, TJZ or whatever.
If it's done this way, not only does the Z flag work the same way with all instructions, but there is no confusion whether you are testing multiple bits or a single bit, comparing, adding, moving, or using any other of the many instructions that set Z if the result is zero.
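
The two readings of Z being argued about can be contrasted directly. In this sketch `z_copy` models copying a bit straight into Z (Z=1 when the bit is 1, treating Z just like C), while `z_test` models the "zero result" reading, where Z=1 when the masked value is zero. Both function names are hypothetical; this does not model any particular P2 instruction encoding.

```python
def z_copy(value: int, bit: int) -> bool:
    """'Copy' semantics: Z takes the raw bit, so Z=1 when the bit is 1."""
    return bool((value >> bit) & 1)


def z_test(value: int, mask: int) -> bool:
    """'Zero result' semantics: Z=1 when (value AND mask) is zero.

    The same rule covers a single-bit mask and a multi-bit mask.
    """
    return (value & mask) == 0
```

For a single bit the two conventions always give opposite answers, which is the source of the confusion described above; only the test form extends unchanged to multi-bit masks.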

evanh · 2019-03-16 14:02

C and Z are equals for this. You've got a real fixation on Z being somehow distinct there Cluso. But then I knew that already.

AJL · 2019-03-16 23:06

I find it interesting the way you are prepared to continue arguing from a point of ignorance. This mainframe used a set of comparator circuits to determine Equality or not; it did not perform a subtraction (inverted addition). Maybe that also explains why the MIPS architecture doesn't even have a zero flag.
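
A comparator bank and a subtract-then-check-zero circuit reach the same equality answer; the comparator simply skips the carry chain. A minimal sketch, assuming 32-bit words (the mainframe's actual word width is not stated here) and hypothetical function names:

```python
MASK32 = 0xFFFF_FFFF


def eq_comparator(a: int, b: int) -> bool:
    """Per-bit comparators: XOR each bit pair, then NOR the results.

    Any differing bit makes the XOR word non-zero, i.e. 'not equal'.
    """
    return ((a ^ b) & MASK32) == 0


def eq_subtract(a: int, b: int) -> bool:
    """Adder-based equality: a + ~b + 1, then check the result for zero."""
    return ((a + (~b & MASK32) + 1) & MASK32) == 0
```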

As every processor that I have ever worked with stores the Z bit in its raw form when the Status Register (whatever it is named) is pushed to, or retrieved from, the stack, I find your logic at odds with processor design.

I find that I've reached the point where I have to accept that you are unreachable in your tower of self-confirming logic. I've built my bridge, and now I'm over it. You'll hear nothing more on this from me.

Cluso99 · 2019-03-17 00:03

@AJL
It's a shame I threw out the schematics for the 1969 mini design I worked on or I could show you what I mean.
It doesn't resolve the different interpretations in different instructions tho' within the same P2.
Anyway, it is what it is, so I'll give up trying to explain my reasoning.

Rayman · 2019-03-17 16:58

I'm using testp right now... I don't like the nomenclature; it is confusing...

Rayman · 2019-03-17 20:45

Is the use of the S field for xcont on horizontal sync in the VGA driver documented anywhere?

Looks like #1 makes it do sync, but I can't find anything in the docs about this...

Also, is there any way to input the actual hsync pin's state?
I'm thinking you can't because it's in a smartpin mode...

evanh · 2019-03-18 02:12

Just looking at source right now ...

blank       call    #hsync          'blank lines
            xcont   m_vi,#0
    _ret_   djnz    x,#blank

hsync       xcont   m_bs,#0         'horizontal sync
            xcont   m_sn,#1
    _ret_   xcont   m_bv,#0

m_bs        long    $CF000000+16        'before sync
m_sn        long    $CF000000+96        'sync
m_bv        long    $CF000000+48        'before visible
m_vi        long    $CF000000+640       'visible

m_rf        long    $7F000000+640       'visible rlong 8bpp lut

My first reading of streamer docs tells me $C command means 32-bit immediate mode, which says S is the output data.

Immediate mode
S/# provides 32 bits of data which are directly output for the duration of the command.

The rest of the config says output to all four DAC channels. So a byte value of #1 is fed to DAC0 (the sync pin, I presume) for the duration of 96 pixels. I don't know why only #1 and not #200 or something.
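
The reading above can be sketched as a decoder for the command longs such as `m_sn`. The field positions are assumptions taken from that reading and not confirmed against the streamer docs: mode in bits 31..28 ($C = 32-bit immediate), DAC channel enables in bits 27..24 ($F = all four), and the duration (pixels here) in bits 15..0. The byte-to-channel mapping (byte 0 feeds DAC0) is also assumed; the function names are hypothetical.

```python
def decode_stream_cmd(cmd: int) -> dict:
    """Split a streamer command long into fields (assumed layout, see above)."""
    return {
        "mode": (cmd >> 28) & 0xF,         # $C: 32-bit immediate mode
        "dac_enables": (cmd >> 24) & 0xF,  # $F: output to all four DAC channels
        "count": cmd & 0xFFFF,             # duration, e.g. 96 pixels for sync
    }


def immediate_to_dac_bytes(s: int) -> list[int]:
    """Split the 32-bit immediate S into one byte per DAC channel.

    Assumes byte 0 feeds DAC0, byte 1 feeds DAC1, and so on.
    """
    return [(s >> (8 * ch)) & 0xFF for ch in range(4)]
```

With these assumptions, `xcont m_sn,#1` outputs byte 1 on DAC0 and 0 on the other three channels for 96 pixels.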

evanh · 2019-03-18 02:59

Solved. The reason is that the SETCMOD instruction configures special video encoding hardware on the data going to the four DACx channels. For VGA output, bit 0 of DAC0 is duplicated to all 8 bits of its physical DAC, with an optional polarity inversion.

Oddly, I don't think cmod is doing anything else for RGB, other than adding some lag. It could be disabled and #255 used for the sync instead.
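
The sync handling described above is easy to model: only bit 0 of the DAC0 byte matters, and it is replicated to all 8 DAC bits, optionally inverted. This also answers the "#1 and not #200" question, since 200 has bit 0 clear. A minimal sketch with a hypothetical function name; the inversion flag stands in for whatever SETCMOD polarity bit selects it.

```python
def vga_sync_dac0(dac0_byte: int, invert: bool = False) -> int:
    """Model of the VGA treatment of DAC0: bit 0 is duplicated to all
    8 DAC bits, with optional polarity inversion."""
    level = 0xFF if (dac0_byte & 1) else 0x00
    return (level ^ 0xFF) if invert else level
```

So `#1` and `#255` both give a full-scale sync level, while `#200` (bit 0 = 0) gives none.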

Rayman · 2019-03-23 20:06

Maybe I'm remembering wrong, but isn't something like:

setq      #8
dirh      #Pin_BusStart

Supposed to change dir for several pins at a time?

evanh · 2019-03-23 21:43

That's only in the v33 files for the respin - https://forums.parallax.com/discussion/169695/new-fpga-files-for-next-silicon-version-5th-final-release-contains-new-rom/p1

Rayman · 2019-03-24 14:32

Does the streamer block COGATN?

I tried using COGATN inside the VGA driver field loop and it didn't seem to work...
Not until we added this "rdfast ##1<<31,..." anyway...

ozpropdev · 2019-03-24 23:32

In the P2 silicon respin (v33 fpga image) a range of pins can be changed like this

	dirh	#vga_basepin + 3 << 6

which replaces the current method of

	dirh	#vga_basepin
	dirh	#vga_basepin+1
	dirh	#vga_basepin+2
	dirh	#vga_basepin+3
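
The range operand packs the first pin into bits 5..0 and the number of additional pins into the bits above, so `vga_basepin + 3 << 6` selects 4 pins. That expression presumably relies on the assembler evaluating `<<` before `+`; Python does the opposite, so the sketch uses explicit parentheses. The function names are hypothetical, and the 9-bit immediate limit is the usual `#` range.

```python
def pin_range_operand(base_pin: int, num_pins: int) -> int:
    """Build a 'first pin + (extra pins << 6)' operand for DIRx-style ops."""
    if not 0 <= base_pin <= 63:
        raise ValueError("pin must be 0..63")
    if num_pins < 1:
        raise ValueError("need at least one pin")
    # Parentheses matter in Python: + binds tighter than <<.
    return base_pin | ((num_pins - 1) << 6)


def fits_9bit_immediate(operand: int) -> bool:
    """True if the operand can be written as a plain #immediate."""
    return 0 <= operand <= 0x1FF
```

Up to 8 pins fit in a single immediate; a 9-pin range no longer does.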

evanh · 2019-03-25 00:55

Rayman wrote: »

Does the streamer block COGATN?

I tried using COGATN inside the VGA driver field loop and it didn't seem to work...
Not until we added this "rdfast ##1<<31,..." anyway...

The streamer instructions block cog execution. They only return once that streamer command is running. There is no double buffering with them.

But that just means that any subsequent instructions, like COGATN, are occurring in time with the same video actions. For example, the following issues a COGATN at the leading edge of the hsync pulse:

                xcont   m_bs,#0         'horizontal sync
                xzero   m_sn,#1
                cogatn  #1
        _ret_   xcont   m_bv,#0

cgracey · 2019-03-25 05:42

ozpropdev wrote: »
In the P2 silicon respin (v33 fpga image) a range of pins can be changed like this
	dirh	#vga_basepin + 3 << 6
which replaces the current method of
	dirh	#vga_basepin
	dirh	#vga_basepin+1
	dirh	#vga_basepin+2
	dirh	#vga_basepin+3

Right. You can also use SETQ to override the bits above the pin/bit number:

SETQ #7
DIRH #8

That would make pins 8..15 high outputs.
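
The SETQ override can be modelled as "use Q as the extra-pin count instead of the operand's upper bits". A minimal sketch with a hypothetical function name; it does not model what happens if a range runs past pin 63.

```python
from typing import Optional


def pins_affected(operand: int, q: Optional[int] = None) -> list[int]:
    """List the pins a DIRH-style instruction would touch.

    Bits 5..0 of the operand select the first pin. The bits above give the
    number of additional pins, unless a preceding SETQ supplied Q, in which
    case Q overrides them.
    """
    base = operand & 0x3F
    extra = (operand >> 6) if q is None else q
    return [base + i for i in range(extra + 1)]
```

So `SETQ #7` followed by `DIRH #8` corresponds to `pins_affected(8, q=7)`, i.e. pins 8..15.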

jmg · 2019-03-25 05:45

...all of which is quite cryptic to a new user.

Could the tools support (for example) something like

DIRH [15..8]

as a single line, with much clearer user intent ?

Cluso99 · 2019-03-25 08:04

Good suggestion, but that can come later.

DIRx #[n+w..n]
would resolve to a single instruction for widths up to 8.

DIRx ##[n+w..n]
would resolve to two instructions using SETQ. This way we explicitly define the use of SETQ with DIRx.

DIRx/DRVx/FLTx etc are in the same category.

Should we use .. or : ???
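
A sketch of how a tool might expand the proposed range syntax. Everything here is hypothetical (no current assembler supports `#[hi..lo]`); it assumes 8 pins fit in one 9-bit immediate (pin in bits 5..0, extra count in bits 8..6) and that wider ranges use SETQ for the count. Unlike the `#` versus `##` proposal, where the programmer chooses explicitly, this sketch picks automatically, which hides the extra SETQ and may not be what you want.

```python
def expand_pin_range(mnemonic: str, hi: int, lo: int) -> list[str]:
    """Expand a hypothetical 'DIRH #[hi..lo]' range into real instructions."""
    if hi < lo:
        hi, lo = lo, hi               # accept [8..15] as well as [15..8]
    if not 0 <= lo <= 63:
        raise ValueError("pin must be 0..63")
    extra = hi - lo                   # number of additional pins
    if extra <= 7:
        # Fits in a single 9-bit immediate: first pin | extra << 6.
        return [f"{mnemonic} #{lo | (extra << 6)}"]
    # Wider ranges: SETQ supplies the extra-pin count.
    return [f"SETQ #{extra}", f"{mnemonic} #{lo}"]
```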

cgracey · 2019-03-25 10:59

Also able to affect multiple pins are the WRPIN/WXPIN/WYPIN instructions.

For bit operations, BITL/BITH/BITC/BITNC/BITZ/BITNZ/BITRND/BITNOT use bits 9..5 to specify additional bits. So, up to 15 additional bits can be specified within a 9-bit immediate value.
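
The BITx operand layout can be sketched as a mask calculation: bits 4..0 pick the first bit and bits 9..5 the number of additional bits. In a 9-bit immediate only bits 8..5 are available, hence the maximum of 15 extra bits. The function name is hypothetical, and because the behaviour for ranges running past bit 31 isn't described here, the sketch rejects them rather than guessing.

```python
def bitx_mask(s: int) -> int:
    """Mask of D bits touched by a BITx instruction with operand S."""
    first = s & 0x1F           # bits 4..0: first bit number
    extra = (s >> 5) & 0x1F    # bits 9..5: number of additional bits
    if first + extra > 31:
        # Wrap-around behaviour is not described above, so it is not modeled.
        raise ValueError("range runs past bit 31")
    return ((1 << (extra + 1)) - 1) << first
```

For example, an operand of `3 | 4 << 5` touches bits 3..7.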

R Baggett · 2019-03-25 11:00

I, for one, could use .. without pulling out the manual. It's very clear what will happen.

Cluso99 · 2019-03-25 14:22

I knew there were others. The BITx ops are a great addition too.

What do the C and Z do with multiple bits in these instructions?

This is really going to speed up and minimise code, including the spin interpreter!

evanh · 2019-07-17 11:17

Chip,
I've had another look at my attempt to match up my understanding of the instruction pipeline. Last time I presented it, it didn't work for you, so I've rotated my view by one clock and added some more detail. I think this will fit reality this time. The top diagram is yours; the bottom diagram is mine, lined up to show what I feel clarifies the same stages. I'm keen to get some feedback again.

-------------------------------
  Instruction Pipeline Timing
-------------------------------

        |                   |                   |                   |                   |                   |                   |
rdRAM Ib|------+            |           rdRAM Ic|------+            |           rdRAM Id|------+            |           rdRAM Ie|
        |      |            |                   |      |            |                   |      |            |                   |
latch Da|--+   +--> rdRAM Db|---------> latch Db|--+   +--> rdRAM Dc|---------> latch Dc|--+   +--> rdRAM Dd|---------> latch Dd|
latch Sa|--+   +--> rdRAM Sb|---------> latch Sb|--+   +--> rdRAM Sc|---------> latch Sc|--+   +--> rdRAM Sd|---------> latch Sd|
latch Ia|--+   +--> latch Ib|---------> latch Ib|--+   +--> latch Ic|---------> latch Ic|--+   +--> latch Id|---------> latch Id|
        |  |                |                   |  |                |                   |  |                |                   |
        |  +---------------ALU--------> wrRAM Ra|  +---------------ALU--------> wrRAM Rb|  +---------------ALU--------> wrRAM Rc|
        |                   |                   |                   |                   |                   |                   |
        |                   |stall/done = 'gox' |                   |stall/done = 'gox' |                   |stall/done = 'gox' |
        |       'get'       |      done = 'go'  |       'get'       |      done = 'go'  |       'get'       |      done = 'go'  |

 clk     _________           _________           _________           _________           _________           _________           
________|    1    |_________|    2    |_________|    3    |_________|    4    |_________|    5    |_________|    6    |_________|

 PCflux |............==c==..|...................|............==d==..|...................|............==e==..|...................|
 PClatc |...................|=c=................|...................|=d=................|...................|=e=................|
 Ifetch |...................|...======c=======..|...................|...======d=======..|...................|...======e=======..|
 Ilatch |=b=................|...................|=c=................|...................|=d=................|...................|
 Odecod |...=b=.............|...................|...=c=.............|...................|...=d=.............|...................|
 SDfetc |......=====b=====..|...................|......=====c=====..|...................|......=====d=====..|...................|
 SDlatc |...................|=b=................|...................|=c=................|...................|=d=................|
 ALUs1  |...................|...=======b======..|...................|...=======c======..|...................|...=======d======..|
 ALUs2  |========a========..|...................|========b========..|...................|========c========..|...................|
 Rwrite |...................|====a====..........|...................|====b====..........|...................|====c====..........|


 PCflux - New value for Program Counter.
 Ifetch - Instruction fetching from CogRAM, LUT or FIFO.
 Odecod - Opcode decode for S/D operands.
 SDfetc - S and D parallel fetches, from CogRAM, if instruction requires.
  ALU   - Execute, two stages, s1, s2.
 Rwrite - Result write back to D destination, to CogRAM, if required.

EDIT: Added PClatc, improved the description of PCflux, and merged the ALU descriptions.

cgracey · 2019-07-18 10:16

Evanh, that drawing hurts my head. It might be right. I will look at it more later. I need some sleep before I can figure it out.

evanh · 2019-07-18 10:35

Okay, no problem.

evanh · 2019-07-18 10:47

Oh, I just realised I'm maybe not treating the latches correctly. I have been assuming they are synchronous flip-flops rather than possibly being actual asynchronous latches.

Similar question for CogRAM too. I've mostly treated it as asynchronous SRAM, I think, and maybe that needs flipping as well.

cgracey · 2019-07-18 19:08

Every bit storage element is a register, not a latch. So, visualize flip-flops.
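
The difference between the two storage elements can be shown with a tiny behavioural model: an edge-triggered flip-flop only samples D on the rising clock edge, while a level-sensitive latch is transparent for as long as its enable is high. This is a generic illustration, not a model of the P2's actual cells.

```python
class DFlipFlop:
    """Edge-triggered register: Q changes only on a rising clock edge."""

    def __init__(self) -> None:
        self.q = 0
        self._last_clk = 0

    def step(self, d: int, clk: int) -> int:
        if clk == 1 and self._last_clk == 0:  # rising edge: sample D
            self.q = d
        self._last_clk = clk
        return self.q


class DLatch:
    """Level-sensitive latch: Q follows D while enable is high."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, d: int, en: int) -> int:
        if en:
            self.q = d
        return self.q
```

If D changes while the clock is still high, the flip-flop ignores it and the latch passes it straight through, which is why it matters which of the two the pipeline uses.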

evanh · 2019-07-19 07:32

Another attempt. I've paralleled the prior stage's latching with the associated fetches. I've never seen a pipeline schematic, so it's not something I've considered before. Presumably that's called forwarding. So it occurs at all three registered stages.

I've included both get/go phase variants to choose from:

-------------------------------
  Instruction Pipeline Timing
-------------------------------

        |                   |                   |                   |                   |                   |                   |
rdRAM Ib|------+            |           rdRAM Ic|------+            |           rdRAM Id|------+            |           rdRAM Ie|
        |      |            |                   |      |            |                   |      |            |                   |
latch Da|--+   +--> rdRAM Db|---------> latch Db|--+   +--> rdRAM Dc|---------> latch Dc|--+   +--> rdRAM Dd|---------> latch Dd|
latch Sa|--+   +--> rdRAM Sb|---------> latch Sb|--+   +--> rdRAM Sc|---------> latch Sc|--+   +--> rdRAM Sd|---------> latch Sd|
latch Ia|--+   +--> latch Ib|---------> latch Ib|--+   +--> latch Ic|---------> latch Ic|--+   +--> latch Id|---------> latch Id|
        |  |                |                   |  |                |                   |  |                |                   |
        |  +---------------ALU--------> wrRAM Ra|  +---------------ALU--------> wrRAM Rb|  +---------------ALU--------> wrRAM Rc|
        |                   |                   |                   |                   |                   |                   |
        |                   |stall/done = 'gox' |                   |stall/done = 'gox' |                   |stall/done = 'gox' |
        |       'get'       |      done = 'go'  |       'get'       |      done = 'go'  |       'get'       |      done = 'go'  |

 clk     _________           _________           _________           _________           _________           _________           
________|    1    |_________|    2    |_________|    3    |_________|    4    |_________|    5    |_________|    6    |_________|

 PCflux |...................|...======c======...|...................|...======d======...|...................|...======e======...|
 PClatc |=b=................|...................|=c=................|...................|=d=................|...................|
 Ifetch |====b====..........|...................|====c====..........|...................|====d====..........|...................|
 Odecod |.........===b===...|...................|.........===c===...|...................|.........===d===...|...................|
 Ilatch |...................|=b=................|...................|=c=................|...................|=d=................|
 SDfetc |...................|====b====..........|...................|====c====..........|...................|====d====..........|
 Fdecod |...................|.........===b===...|...................|.........===c===...|...................|.........===d===...|
 SDlatc |=a=................|...................|=b=................|...................|=c=................|...................|
 ALUs1  |========a=======...|...................|========b=======...|...................|========c=======...|...................|
 ALUs2  |...................|========a=======...|...................|========b=======...|...................|========c=======...|
 Rwrite |===................|...................|=a=................|...................|=b=................|...................|


 PCflux |...======c======...|...................|...======d======...|...................|...======e======...|...................|
 PClatc |...................|=c=................|...................|=d=................|...................|=e=................|
 Ifetch |...................|====c====..........|...................|====d====..........|...................|====e====..........|
 Odecod |...................|.........===c===...|...................|.........===d===...|...................|.........===e===...|
 Ilatch |=b=................|...................|=c=................|...................|=d=................|...................|
 SDfetc |====b====..........|...................|====c====..........|...................|====d====..........|...................|
 Fdecod |.........===b===...|...................|.........===c===...|...................|.........===d===...|...................|
 SDlatc |...................|=b=................|...................|=c=................|...................|=d=................|
 ALUs1  |...................|========b=======...|...................|========c=======...|...................|========d=======...|
 ALUs2  |========a=======...|...................|========b=======...|...................|========c=======...|...................|
 Rwrite |...................|=a=................|...................|=b=................|...................|=c=................|


 PCflux - Setting up of new value for Program Counter.
 Ifetch - Instruction fetching from CogRAM, LUT or FIFO.
 Odecod - Opcode decode for S/D operands.
 Fdecod - Fully decode the instruction.  Probably mostly fan-out to feed selected logic block.
 SDfetc - S and D parallel fetches, from CogRAM, if instruction requires.
  ALU   - Execute, two stages, s1, s2.
 Rwrite - Result write back to D destination, to CogRAM, if required.

cgracey · 2019-07-19 20:48

Evanh, I think what's throwing me off is the variably-placed width-varying ===x=== markers. These paths take different times, of course, but they all must resolve before the clock edge. So, the edge is the only event that matters. It looks like you would like to know when the PC updates. I could add that to my drawing.
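
The "only the edge matters" view can be sketched as a chain of stage registers that all load their predecessor's value on each clock edge. This is deliberately simplified: the stage names are hypothetical, there is one clock per stage, and it ignores the two-clock rhythm in the diagrams above, stalls, and forwarding.

```python
STAGES = ["fetch", "operand", "alu1", "alu2", "write"]


def run_pipeline(program: list[str], clocks: int) -> list[dict[str, str]]:
    """Return a snapshot of every stage register after each clock edge."""
    regs = {st: "-" for st in STAGES}
    pending = list(program)
    history = []
    for _ in range(clocks):
        # On the edge, each register loads its predecessor's pre-edge value.
        # Updating back to front keeps the pre-edge values available.
        for i in range(len(STAGES) - 1, 0, -1):
            regs[STAGES[i]] = regs[STAGES[i - 1]]
        regs[STAGES[0]] = pending.pop(0) if pending else "-"
        history.append(dict(regs))
    return history
```

Whatever happens between edges, as long as every path settles in time, the snapshot after each edge is all that defines the machine's state.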

evanh · 2019-07-19 21:51

Those propagation widths, while not necessarily accurate sizes, are quite important to my attempt. For example, I've made inline register clocking have a three character, =x=, propagation time. Also, note the gap at the end of each clock interval, before the new clock edge. That indicates spare overclocking overhead leeway.

evanh · 2019-07-19 22:18

I've made the propagation for cogRAM fetching a lot longer than the latches as a guess to account for the level of mux'ing.

As for my "forwarding", I did that to handle the fact that to use synchronous registers for cogRAM requires setup before the clock rise. It can't wait for the propagation of an addressing latch.

cgracey · 2019-07-20 01:14

Evanh, here's the spoiler: the ASIC tools optimize the logic cell placement and routing only to meet the timing goal, so that signals with plenty of slack get routed around the hot spots, losing their slack, while the signals needing speed get the prime placement and shortest routes. In the end, hundreds of thousands of paths are stacked against the timing wall, forming a cliff, where the chip fails systemically if the clock period becomes too short. So, while in theory some things take less time than others, the implementation is a blob of nearly identically-timed paths that affords no possibility of speed-up via selective clock cycle shortening. When you hit the speed limit, everything fails at once.
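
The "timing wall" can be illustrated with slack arithmetic: slack = clock period - path delay, and any path with negative slack fails. The path names and delays below are made-up illustrative numbers, not P2 data; they are clustered close together to mimic what the tools produce after optimization.

```python
def failing_paths(delays_ns: dict[str, float], period_ns: float) -> list[str]:
    """Paths whose delay exceeds the clock period (negative slack)."""
    return sorted(name for name, d in delays_ns.items() if period_ns - d < 0)


def fmax_mhz(delays_ns: dict[str, float]) -> float:
    """Highest clock frequency at which every path still meets timing."""
    return 1000.0 / max(delays_ns.values())


# Hypothetical post-optimization delays, all bunched near the limit.
clustered = {"p1": 3.10, "p2": 3.08, "p3": 3.09, "p4": 3.05}
```

With these numbers, a 3.2 ns period passes everything, but shortening it to 3.07 ns makes three of the four paths fail at once rather than one at a time.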

Roy Eltham · 2019-07-20 01:35

@cgracey aren't the new chips due soon?
