
# Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

• Posts: 15,405
@AJL,
I understood what you are saying, but that is technically an incorrect understanding of the physical logic involved in the comparison for equality. In the physical logic, an inverse adder is used to determine equality. The reason for this is that an adder is present in the hardware logic, and in order to do subtraction, one operand is inverted and then added. If the result is zero then equality is signalled and a flip-flop is set, usually called the zero flag internally on the schematic diagram. It's just the way computers were built originally.
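
The invert-and-add comparison described here can be modelled in a few lines of Python. This is only a sketch of the idea, assuming a 32-bit word; the name `cmp_flags` and its return value are made up for illustration and are not taken from any P2 documentation.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit word size


def cmp_flags(a: int, b: int) -> tuple:
    """Model of compare-by-subtraction: a - b is computed as a + ~b + 1.

    Returns (result, zero_flag). Hypothetical helper for illustration.
    """
    inverted_b = ~b & MASK32                 # invert one operand
    result = (a + inverted_b + 1) & MASK32   # add, with a carry-in of 1
    return result, result == 0               # Z is set when the result is zero


print(cmp_flags(5, 5))  # (0, True): equal operands give a zero result
print(cmp_flags(5, 3))  # (2, False)
```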

But this is way off the point that I have been making. When you copy a logic bit into a flag, confusion reigns if it's the zero flag. The carry flag, or any other for that matter except the zero flag, will be set when the bit is 1 and reset when the bit is 0, so there is no confusion here.
When a 1 gets copied to the zero flag it is called set. The confusion results because you copied the bit instead of using it as a test. YES, IT SHOULD BE A TEST AND SET. Then the Z flag reports precisely what it is meant to be in ALL situations, a ZERO RESULT. This is the fundamental meaning of the zero flag, a zero result. This is why the CMP instruction states if the result of the subtraction gives a zero result, then ZERO will be set. Then you test for this using the IF_Z or if you wish BZ, JZ, TJZ or whatever.
If it's done this way, not only does the Z flag work the same way with all instructions, but there is no confusion whether you are testing multiple bits or a single bit, comparing, adding, moving, or any other of the many instructions that set Z if the result is zero.
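
A small Python model may make the two conventions in this argument concrete. Both function names are hypothetical, and neither is claimed to describe what any particular P2 instruction actually does.

```python
def z_copy_bit(value: int, bit: int) -> bool:
    """'Copy' convention: the flag simply takes the state of the tested bit."""
    return bool((value >> bit) & 1)


def z_test_result(value: int, mask: int) -> bool:
    """'Test' convention: Z is set when the masked result is zero."""
    return (value & mask) == 0


value = 0b0100
# Bit 2 is 1: copying it sets the flag, testing it leaves Z clear.
print(z_copy_bit(value, 2), z_test_result(value, 1 << 2))  # True False
# Bit 0 is 0: copying it clears the flag, testing it sets Z.
print(z_copy_bit(value, 0), z_test_result(value, 1 << 0))  # False True
```

The second form gives the same kind of answer for single-bit and multi-bit masks, which is the consistency being argued for.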
My Prop boards:
Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
Website: www.clusos.com
• Posts: 7,904
C and Z are equals for this. You've got a real fixation on Z being somehow distinct there Cluso. But then I knew that already.
"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."
• Posts: 198
Cluso99 wrote: »
@AJL,
I understood what you are saying, but that is technically an incorrect understanding of the physical logic involved in the comparison for equality. In the physical logic, an inverse adder is used to determine equality. The reason for this is that an adder is present in the hardware logic, and in order to do subtraction, one operand is inverted and then added. If the result is zero then equality is signalled and a flip-flop is set, usually called the zero flag internally on the schematic diagram. It's just the way computers were built originally.

But this is way off the point that I have been making. When you copy a logic bit into a flag, confusion reigns if it's the zero flag. The carry flag, or any other for that matter except the zero flag, will be set when the bit is 1 and reset when the bit is 0, so there is no confusion here.
When a 1 gets copied to the zero flag it is called set. The confusion results because you copied the bit instead of using it as a test. YES, IT SHOULD BE A TEST AND SET. Then the Z flag reports precisely what it is meant to be in ALL situations, a ZERO RESULT. This is the fundamental meaning of the zero flag, a zero result. This is why the CMP instruction states if the result of the subtraction gives a zero result, then ZERO will be set. Then you test for this using the IF_Z or if you wish BZ, JZ, TJZ or whatever.
If it's done this way, not only does the Z flag work the same way with all instructions, but there is no confusion whether you are testing multiple bits or a single bit, comparing, adding, moving, or any other of the many instructions that set Z if the result is zero.

I find it interesting the way you are prepared to continue arguing from a point of ignorance. This mainframe used a set of comparator circuits to determine Equality or not, it did not perform a subtraction (inverted addition). Maybe that also explains why the MIPS architecture doesn't even have a zero flag.
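
For contrast with the adder approach, equality can also be decided without any subtraction. The sketch below models a bitwise comparator (XOR each bit pair, then check that no pair differs). It illustrates the general technique only, not the circuitry of the mainframe mentioned here, and `equal_by_comparator` is a made-up name.

```python
def equal_by_comparator(a: int, b: int, width: int = 32) -> bool:
    """Equality via per-bit XOR: operands are equal when no bit pair differs."""
    mask = (1 << width) - 1
    differing_bits = (a ^ b) & mask  # 1 wherever the operands disagree
    return differing_bits == 0


print(equal_by_comparator(0x1234, 0x1234))  # True
print(equal_by_comparator(0x1234, 0x1235))  # False
```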

As every processor that I have ever worked with stores the Z bit in its raw form when the Status Register (whatever it is named) is pushed to, or retrieved from, the stack, I find your logic at odds with processor design.

I find that I've reached the point where I have to accept that you are unreachable in your tower of self-confirming logic. I've built my bridge, and now I'm over it. You'll hear nothing more on this from me.
• Posts: 15,405
@AJL
It's a shame I threw out the schematics for the 1969 mini design I worked on or I could show you what I mean.
It doesn't resolve the different interpretations in different instructions tho' within the same P2.
Anyway, it is what it is, so I'll give up trying to explain my reasoning.
• Posts: 9,716
I'm using testp right now... I don't like the nomenclature, it is confusing...
Prop Info and Apps: http://www.rayslogic.com/
• Posts: 9,716
edited 2019-03-17 - 20:47:24
Is the use of the S field for xcont on horizontal sync in the VGA driver documented anywhere?

Looks like #1 makes it do sync, but I can't find anything in the docs about this...

Also, is there any way to input the actual hsync pin's state?
I'm thinking you can't because it's in a smartpin mode...
• Posts: 7,904
Just looking at source right now ...
```
blank       call    #hsync          'blank lines
            xcont   m_vi,#0
    _ret_   djnz    x,#blank

hsync       xcont   m_bs,#0         'horizontal sync
            xcont   m_sn,#1
    _ret_   xcont   m_bv,#0

m_bs        long    $CF000000+16        'before sync
m_sn        long    $CF000000+96        'sync
m_bv        long    $CF000000+48        'before visible
m_vi        long    $CF000000+640       'visible

m_rf        long    $7F000000+640       'visible rlong 8bpp lut
```

My first reading of the streamer docs tells me the $C command means 32-bit immediate mode, which says S is the output data:
Immediate mode
S/# provides 32 bits of data which are directly output for the duration of the command.

The rest of the config says output to all four DAC channels. So a byte value of #1 is fed to DAC0 (Sync pin I presume) for the duration of 96 pixels. I don't know why only #1 and not #200 or something.
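
To make the command longs in the listing above easier to read, here is a rough Python sketch that pulls them apart. The field boundaries are an assumption for illustration (top nibble as the mode, low 16 bits as the pixel count); `split_command` is a hypothetical helper and the exact layout should be checked against the streamer documentation.

```python
# Streamer command longs copied from the VGA driver source quoted above.
M_BS = 0xCF000000 + 16    # before sync
M_SN = 0xCF000000 + 96    # sync
M_BV = 0xCF000000 + 48    # before visible
M_VI = 0xCF000000 + 640   # visible


def split_command(word: int) -> dict:
    """Split a command long into illustrative fields.

    ASSUMPTION: the top nibble selects the mode ($C in the thread) and the
    low 16 bits hold the duration in pixels.
    """
    return {
        "mode": (word >> 28) & 0xF,
        "middle": (word >> 16) & 0xFFF,
        "count": word & 0xFFFF,
    }


line_total = sum(split_command(w)["count"] for w in (M_BS, M_SN, M_BV, M_VI))
print(hex(split_command(M_SN)["mode"]), split_command(M_SN)["count"])  # 0xc 96
print(line_total)  # 800
```

With these four commands one scan line adds up to 800 pixel periods, of which 640 are visible.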
• Posts: 7,904
Solved. The reason is that the SETCMOD instruction configures special video encoding hardware on the data going to the four DACx channels. For VGA output, bit 0 of DAC0 is duplicated to all 8 bits of its physical DAC, with an optional polarity inversion.

Oddly, I don't think cmod is doing anything else, other than adding some lag, when it comes to RGB. It could be disabled, with #255 used for the sync instead.
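
The sync behaviour described in this post can be modelled as follows. This is a sketch of the description only, not of the hardware, and `vga_sync_dac` is a hypothetical name.

```python
def vga_sync_dac(dac0_byte: int, invert: bool = False) -> int:
    """Model of the behaviour described above: bit 0 of the DAC0 byte is
    copied to all 8 DAC bits, optionally inverted. Hypothetical helper."""
    bit0 = dac0_byte & 1
    if invert:
        bit0 ^= 1
    return 0xFF if bit0 else 0x00


print(hex(vga_sync_dac(1)))        # 0xff
print(hex(vga_sync_dac(0)))        # 0x0
print(hex(vga_sync_dac(1, True)))  # 0x0
```

Under this model a sync value of #1 produces the same DAC level that #255 would produce with the encoding hardware disabled.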
• Posts: 9,716
edited 2019-03-23 - 20:06:42
Maybe I'm remembering wrong, but isn't something like:
```
setq      #8
dirh      #Pin_BusStart
```
supposed to change dir for several pins at a time?
• Posts: 9,716
edited 2019-03-24 - 14:33:51
Does the streamer block COGATN?

I tried using COGATN inside the VGA driver field loop and it didn't seem to work...
Not until we added this "rdfast ##1<<31,..." anyway...
• Posts: 2,566
In the P2 silicon respin (v33 FPGA image) a range of pins can be changed like this:
```
	dirh	#vga_basepin + 3 << 6
```
which replaces the current method of
```
	dirh	#vga_basepin
	dirh	#vga_basepin+1
	dirh	#vga_basepin+2
	dirh	#vga_basepin+3
```

Melbourne, Australia
• Posts: 7,904
Rayman wrote: »
Does the streamer block COGATN?

I tried using COGATN inside the VGA driver field loop and it didn't seem to work...
Not until we added this "rdfast ##1<<31,..." anyway...
The streamer instructions block cog execution. They only return once that streamer command is running. There is no double buffering with them.

But that just means that any subsequent instructions, like COGATN, are occurring in time with the same video actions. For example, the following issues a COGATN at the leading edge of the hsync pulse:
```
                xcont   m_bs,#0         'horizontal sync
                xzero   m_sn,#1
                cogatn  #1
        _ret_   xcont   m_bv,#0
```
• Posts: 11,711
ozpropdev wrote: »
In the P2 silicon respin (v33 FPGA image) a range of pins can be changed like this:
```
	dirh	#vga_basepin + 3 << 6
```
which replaces the current method of
```
	dirh	#vga_basepin
	dirh	#vga_basepin+1
	dirh	#vga_basepin+2
	dirh	#vga_basepin+3
```

Right. You can also use SETQ to override the bits above the pin/bit number:

SETQ #7
DIRH #8

That would make pins 8..15 high outputs.
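
The two forms shown above (a count packed into the operand, or a SETQ prefix) can be sketched as a small Python model. It assumes the pin number sits in bits 5..0 and the additional-pin count in the bits above, as the `<< 6` in the example suggests; both helper names are hypothetical and wrap-around past the last pin is not modelled.

```python
PIN_BITS = 6  # pin numbers 0..63 occupy bits 5..0 (assumed from the "<< 6" above)


def pin_range_operand(base_pin: int, extra_pins: int) -> int:
    """Pack a base pin and a count of additional pins into one operand."""
    # Python's + binds tighter than <<, so the shift is parenthesised explicitly.
    return base_pin | (extra_pins << PIN_BITS)


def pins_affected(operand: int, setq=None) -> list:
    """List the pins an operand selects. If setq is given it overrides the
    additional-pin count, as in the SETQ #7 / DIRH #8 example."""
    base = operand & ((1 << PIN_BITS) - 1)
    extra = setq if setq is not None else operand >> PIN_BITS
    # ASSUMPTION: no wrap-around handling; the range simply counts upward.
    return [base + i for i in range(extra + 1)]


vga_basepin = 8  # hypothetical value for illustration
print(pins_affected(pin_range_operand(vga_basepin, 3)))  # [8, 9, 10, 11]
print(pins_affected(8, setq=7))  # [8, 9, 10, 11, 12, 13, 14, 15]
```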
• Posts: 13,923
cgracey wrote: »
ozpropdev wrote: »
In the P2 silicon respin (v33 FPGA image) a range of pins can be changed like this:
```
	dirh	#vga_basepin + 3 << 6
```
which replaces the current method of
```
	dirh	#vga_basepin
	dirh	#vga_basepin+1
	dirh	#vga_basepin+2
	dirh	#vga_basepin+3
```

Right. You can also use SETQ to override the bits above the pin/bit number:

SETQ #7
DIRH #8

That would make pins 8..15 high outputs.

.. all of which is quite cryptic to a new user...

Could the tools support (for example) something like

DIRH [15..8]

as a single line, with much clearer user intent ?
• Posts: 15,405
jmg wrote: »
cgracey wrote: »
ozpropdev wrote: »
In the P2 silicon respin (v33 FPGA image) a range of pins can be changed like this:
```
	dirh	#vga_basepin + 3 << 6
```
which replaces the current method of
```
	dirh	#vga_basepin
	dirh	#vga_basepin+1
	dirh	#vga_basepin+2
	dirh	#vga_basepin+3
```

Right. You can also use SETQ to override the bits above the pin/bit number:

SETQ #7
DIRH #8

That would make pins 8..15 high outputs.

.. all of which is quite cryptic to a new user...

Could the tools support (for example) something like

DIRH [15..8]

as a single line, with much clearer user intent ?
Good suggestion but that can come later.

DIRx #[n+w..n]
would resolve to a single instruction for widths up to 8.

DIRx ##[n+w..n]
would resolve to two instructions using SETQ. This way we explicitly define the use of SETQ with DIRx.

DIRx/DRVx/FLTx etc are in the same category.

Should we use .. or : ???
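
As a rough illustration of how a tool might lower the proposed range syntax, here is a Python sketch. Everything in it is hypothetical: the `[hi..lo]` syntax is only a suggestion in this thread, and the sketch picks the one- or two-instruction form automatically from the width rather than from a `#` or `##` prefix, which is a simplification of the proposal.

```python
import re


def lower_pin_range(mnemonic: str, spec: str) -> list:
    """Expand a hypothetical 'MNEMONIC [hi..lo]' range into instructions.

    Ranges up to 8 pins wide fit in one 9-bit immediate; wider ranges
    fall back to a SETQ prefix, following the proposal above.
    """
    match = re.fullmatch(r"\[(\d+)\.\.(\d+)\]", spec.strip())
    if match is None:
        raise ValueError(f"not a pin range: {spec!r}")
    hi, lo = int(match.group(1)), int(match.group(2))
    if hi < lo:
        raise ValueError("expected [hi..lo] with hi >= lo")
    extra = hi - lo                  # additional pins beyond the base pin
    if extra <= 7:                   # width up to 8: single instruction
        return [f"{mnemonic} #{lo} + {extra} << 6"]
    return [f"SETQ #{extra}", f"{mnemonic} #{lo}"]


print(lower_pin_range("DIRH", "[15..8]"))  # ['DIRH #8 + 7 << 6']
print(lower_pin_range("DIRH", "[23..8]"))  # ['SETQ #15', 'DIRH #8']
```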
• Posts: 11,711
Cluso99 wrote: »
jmg wrote: »
cgracey wrote: »
ozpropdev wrote: »
In the P2 silicon respin (v33 FPGA image) a range of pins can be changed like this:
```
	dirh	#vga_basepin + 3 << 6
```
which replaces the current method of
```
	dirh	#vga_basepin
	dirh	#vga_basepin+1
	dirh	#vga_basepin+2
	dirh	#vga_basepin+3
```

Right. You can also use SETQ to override the bits above the pin/bit number:

SETQ #7
DIRH #8

That would make pins 8..15 high outputs.

.. all of which is quite cryptic to a new user...

Could the tools support (for example) something like

DIRH [15..8]

as a single line, with much clearer user intent ?
Good suggestion but that can come later.

DIRx #[n+w..n]
would resolve to a single instruction for widths up to 8.

DIRx ##[n+w..n]
would resolve to two instructions using SETQ. This way we explicitly define the use of SETQ with DIRx.

DIRx/DRVx/FLTx etc are in the same category.

Should we use .. or : ???

Also able to affect multiple pins are the WRPIN/WXPIN/WYPIN instructions.

For bit operations, BITL/BITH/BITC/BITNC/BITZ/BITNZ/BITRND/BITNOT use bits 9..5 to specify additional bits. So, up to 15 additional bits can be specified within a 9-bit immediate value.
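
The bit-range encoding described here can be sketched in Python. The field positions follow the description above (start bit in bits 4..0, additional-bit count in bits 9..5); the helper names are made up, and what happens when a range runs past bit 31 is an assumption in this model.

```python
def bit_range_mask(operand: int) -> int:
    """Mask of bits selected by a BITx-style operand.

    Bits 4..0 give the starting bit; bits 9..5 give the number of
    additional bits, as described above.
    ASSUMPTION: bits that would run past bit 31 are simply dropped here.
    """
    start = operand & 0x1F
    extra = (operand >> 5) & 0x1F
    return (((1 << (extra + 1)) - 1) << start) & 0xFFFFFFFF


def bith(value: int, operand: int) -> int:
    """Model of BITH: set every selected bit."""
    return value | bit_range_mask(operand)


print(hex(bith(0, 4 | (3 << 5))))  # bits 7..4 set: 0xf0
print(0x1FF >> 5)                  # 15 additional bits fit in a 9-bit immediate
```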
• Posts: 23
I, for one, could use .. without pulling out the manual. Very clear what will happen.
• Posts: 15,405
cgracey wrote: »
Cluso99 wrote: »
jmg wrote: »
cgracey wrote: »
ozpropdev wrote: »
In the P2 silicon respin (v33 FPGA image) a range of pins can be changed like this:
```
	dirh	#vga_basepin + 3 << 6
```
which replaces the current method of
```
	dirh	#vga_basepin
	dirh	#vga_basepin+1
	dirh	#vga_basepin+2
	dirh	#vga_basepin+3
```

Right. You can also use SETQ to override the bits above the pin/bit number:

SETQ #7
DIRH #8

That would make pins 8..15 high outputs.

.. all of which is quite cryptic to a new user...

Could the tools support (for example) something like

DIRH [15..8]

as a single line, with much clearer user intent ?
Good suggestion but that can come later.

DIRx #[n+w..n]
would resolve to a single instruction for widths up to 8.

DIRx ##[n+w..n]
would resolve to two instructions using SETQ. This way we explicitly define the use of SETQ with DIRx.

DIRx/DRVx/FLTx etc are in the same category.

Should we use .. or : ???

Also able to affect multiple pins are the WRPIN/WXPIN/WYPIN instructions.

For bit operations, BITL/BITH/BITC/BITNC/BITZ/BITNZ/BITRND/BITNOT use bits 9..5 to specify additional bits. So, up to 15 additional bits can be specified within a 9-bit immediate value.

I knew there were others. The BITx ops are a great addition too.

What do the C and Z do with multiple bits in these instructions?

This is really going to speed up and minimise code, including the spin interpreter!
• Posts: 7,904
edited 2019-07-17 - 14:08:52
Chip,
I've had another look at my attempt to match up my understanding of the instruction pipeline. Last time I presented it, it didn't work for you, so I've rotated my view by one clock and added some more detail. I think this will fit reality this time. The top diagram is yours; the bottom diagram is mine, lined up to show what I think are the same stages in more detail. I'm keen to get some feedback again.
```
-------------------------------
Instruction Pipeline Timing
-------------------------------

        |                   |                   |                   |                   |                   |                   |
rdRAM Ib|------+            |           rdRAM Ic|------+            |           rdRAM Id|------+            |           rdRAM Ie|
        |      |            |                   |      |            |                   |      |            |                   |
latch Da|--+   +--> rdRAM Db|---------> latch Db|--+   +--> rdRAM Dc|---------> latch Dc|--+   +--> rdRAM Dd|---------> latch Dd|
latch Sa|--+   +--> rdRAM Sb|---------> latch Sb|--+   +--> rdRAM Sc|---------> latch Sc|--+   +--> rdRAM Sd|---------> latch Sd|
latch Ia|--+   +--> latch Ib|---------> latch Ib|--+   +--> latch Ic|---------> latch Ic|--+   +--> latch Id|---------> latch Id|
        |  |                |                   |  |                |                   |  |                |                   |
        |  +---------------ALU--------> wrRAM Ra|  +---------------ALU--------> wrRAM Rb|  +---------------ALU--------> wrRAM Rc|
        |                   |                   |                   |                   |                   |                   |
        |                   |stall/done = 'gox' |                   |stall/done = 'gox' |                   |stall/done = 'gox' |
        |       'get'       |      done = 'go'  |       'get'       |      done = 'go'  |       'get'       |      done = 'go'  |

clk     _________           _________           _________           _________           _________           _________
________|    1    |_________|    2    |_________|    3    |_________|    4    |_________|    5    |_________|    6    |_________|

PCflux |............==c==..|...................|............==d==..|...................|............==e==..|...................|
PClatc |...................|=c=................|...................|=d=................|...................|=e=................|
Ifetch |...................|...======c=======..|...................|...======d=======..|...................|...======e=======..|
Ilatch |=b=................|...................|=c=................|...................|=d=................|...................|
Odecod |...=b=.............|...................|...=c=.............|...................|...=d=.............|...................|
SDfetc |......=====b=====..|...................|......=====c=====..|...................|......=====d=====..|...................|
SDlatc |...................|=b=................|...................|=c=................|...................|=d=................|
ALUs1  |...................|...=======b======..|...................|...=======c======..|...................|...=======d======..|
ALUs2  |========a========..|...................|========b========..|...................|========c========..|...................|
Rwrite |...................|====a====..........|...................|====b====..........|...................|====c====..........|

PCflux - New value for Program Counter.
Ifetch - Instruction fetching from CogRAM, LUT or FIFO.
Odecod - Opcode decode for S/D operands.
SDfetc - S and D parallel fetches, from CogRAM, if instruction requires.
ALU   - Execute, two stages, s1, s2.
Rwrite - Result write back to D destination, to CogRAM, if required.

```
EDIT: Added PClatc, improved the description of PCflux, and merged the ALU descriptions.
• Posts: 11,711
Evanh, that drawing hurts my head. It might be right. I will look at it more later. I need some sleep before I can figure it out.
• Posts: 7,904
Okay, no problem.
• Posts: 7,904
Oh, I just realised I'm maybe not treating the latches correctly. I have been assuming they are synchronous flip-flops rather than possibly being actual asynchronous latches.

Similar question for CogRAM then too. I've mostly treated it as asynchronous SRAM I think, and maybe that needs flipped as well.
• Posts: 11,711
evanh wrote: »
Oh, I just realised I'm maybe not treating the latches correctly. I have been assuming they are synchronous flip-flops rather than possibly being actual asynchronous latches.

Similar question for CogRAM then too. I've mostly treated it as asynchronous SRAM I think, and maybe that needs flipped as well.

Every bit storage element is a register, not a latch. So, visualize flip-flops.
• Posts: 7,904
Another attempt. I've paralleled the prior stage latching with the associated fetches. I've never seen a pipeline schematic so it's not something I've considered before. Presumably that's called forwarding. So it occurs at all three registered stages.

I've included both get/go phase variants to choose from:
```
-------------------------------
Instruction Pipeline Timing
-------------------------------

        |                   |                   |                   |                   |                   |                   |
rdRAM Ib|------+            |           rdRAM Ic|------+            |           rdRAM Id|------+            |           rdRAM Ie|
        |      |            |                   |      |            |                   |      |            |                   |
latch Da|--+   +--> rdRAM Db|---------> latch Db|--+   +--> rdRAM Dc|---------> latch Dc|--+   +--> rdRAM Dd|---------> latch Dd|
latch Sa|--+   +--> rdRAM Sb|---------> latch Sb|--+   +--> rdRAM Sc|---------> latch Sc|--+   +--> rdRAM Sd|---------> latch Sd|
latch Ia|--+   +--> latch Ib|---------> latch Ib|--+   +--> latch Ic|---------> latch Ic|--+   +--> latch Id|---------> latch Id|
        |  |                |                   |  |                |                   |  |                |                   |
        |  +---------------ALU--------> wrRAM Ra|  +---------------ALU--------> wrRAM Rb|  +---------------ALU--------> wrRAM Rc|
        |                   |                   |                   |                   |                   |                   |
        |                   |stall/done = 'gox' |                   |stall/done = 'gox' |                   |stall/done = 'gox' |
        |       'get'       |      done = 'go'  |       'get'       |      done = 'go'  |       'get'       |      done = 'go'  |


clk     _________           _________           _________           _________           _________           _________
________|    1    |_________|    2    |_________|    3    |_________|    4    |_________|    5    |_________|    6    |_________|

PCflux |...................|...======c======...|...................|...======d======...|...................|...======e======...|
PClatc |=b=................|...................|=c=................|...................|=d=................|...................|
Ifetch |====b====..........|...................|====c====..........|...................|====d====..........|...................|
Odecod |.........===b===...|...................|.........===c===...|...................|.........===d===...|...................|
Ilatch |...................|=b=................|...................|=c=................|...................|=d=................|
SDfetc |...................|====b====..........|...................|====c====..........|...................|====d====..........|
Fdecod |...................|.........===b===...|...................|.........===c===...|...................|.........===d===...|
SDlatc |=a=................|...................|=b=................|...................|=c=................|...................|
ALUs1  |========a=======...|...................|========b=======...|...................|========c=======...|...................|
ALUs2  |...................|========a=======...|...................|========b=======...|...................|========c=======...|
Rwrite |===................|...................|=a=................|...................|=b=................|...................|

PCflux |...======c======...|...................|...======d======...|...................|...======e======...|...................|
PClatc |...................|=c=................|...................|=d=................|...................|=e=................|
Ifetch |...................|====c====..........|...................|====d====..........|...................|====e====..........|
Odecod |...................|.........===c===...|...................|.........===d===...|...................|.........===e===...|
Ilatch |=b=................|...................|=c=................|...................|=d=................|...................|
SDfetc |====b====..........|...................|====c====..........|...................|====d====..........|...................|
Fdecod |.........===b===...|...................|.........===c===...|...................|.........===d===...|...................|
SDlatc |...................|=b=................|...................|=c=................|...................|=d=................|
ALUs1  |...................|========b=======...|...................|========c=======...|...................|========d=======...|
ALUs2  |========a=======...|...................|========b=======...|...................|========c=======...|...................|
Rwrite |...................|=a=................|...................|=b=................|...................|=c=................|

PCflux - Setting up of new value for Program Counter.
Ifetch - Instruction fetching from CogRAM, LUT or FIFO.
Odecod - Opcode decode for S/D operands.
Fdecod - Fully decode the instruction.  Probably mostly fan-out to feed selected logic block.
SDfetc - S and D parallel fetches, from CogRAM, if instruction requires.
ALU   - Execute, two stages, s1, s2.
Rwrite - Result write back to D destination, to CogRAM, if required.

```
• Posts: 11,711
Evanh, I think what's throwing me off is the variably-placed width-varying ===x=== markers. These paths take different times, of course, but they all must resolve before the clock edge. So, the edge is the only event that matters. It looks like you would like to know when the PC updates. I could add that to my drawing.
• Posts: 7,904
edited 2019-07-19 - 23:11:29
Those propagation widths, while not necessarily accurate sizes, are quite important to my attempt. For example, I've made inline register clocking have a three character, =x=, propagation time. Also, note the gap at the end of each clock interval, before the new clock edge. That indicates spare timing margin, i.e. overclocking headroom.
• Posts: 7,904
I've made the propagation for cogRAM fetching a lot longer than the latches as a guess to account for the level of mux'ing.

As for my "forwarding", I did that to handle the fact that using synchronous registers for cogRAM requires setup before the clock rise. It can't wait for the propagation of an addressing latch.
• Posts: 11,711
edited 2019-07-20 - 01:16:12
Evanh, here's the spoiler: the ASIC tools optimize the logic cell placement and routing only to meet the timing goal, so that signals with plenty of slack get routed around the hot spots, losing their slack, while the signals needing speed get the prime placement and shortest routes. In the end, hundreds of thousands of paths are stacked against the timing wall, forming a cliff, where the chip fails systemically if the clock period becomes too short. So, while in theory some things take less time than others, the implementation is a blob of nearly identically-timed paths that affords no possibility of speed-up via selective clock cycle shortening. When you hit the speed limit, everything fails at once.
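
The "timing wall" point can be illustrated with a toy calculation in Python. The path delays below are invented numbers purely for illustration, not measurements of the P2, and `min_clock_period` is a hypothetical helper.

```python
def min_clock_period(path_delays_ns: list) -> float:
    """The clock period is bounded by the slowest register-to-register path."""
    return max(path_delays_ns)


# Hypothetical numbers, for illustration only.
theoretical = [1.2, 2.0, 3.1, 5.5]    # delays if every path were routed ideally
after_layout = [5.3, 5.4, 5.4, 5.5]   # slack consumed by routing around hot spots

for name, delays in (("theoretical", theoretical), ("after layout", after_layout)):
    period = min_clock_period(delays)
    slack = [round(period - d, 1) for d in delays]
    print(name, period, slack)
```

Both sets share the same limiting period; what changes is that after layout almost no path has slack left to exploit.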
• Posts: 2,655
@cgracey aren't the new chips due soon?