Looking at this: "To facilitate handshaking between cogs sharing lookup RAM, the SETRDL/SETWRL instructions can be used to set up lookup RAM read and write events."
Wondering how that is going to work...
I think those instructions work with the special registers at the end of RAM.
Are they changed to work with LUT too?
I guess I should wait until it's done to ask questions about it...
Same old names, but totally new meanings. The old hub RAM events are gone, replaced by LUT events.
I have a question: on the FPGA images, are the non-smart pins still there, or are they completely disabled? That is, do you only get 8 pins, all of them smart pins, period?
My understanding is the Smart-Pin cell clips onto the standard pin, so the fallback would be a standard pin cell, not no pin at all.
Besides, for testing it is best not to drop pins as well; keep what is removed simple.
Certainly Streamer testing is going to need many pins available.
I think I got the v9 docs finished, for now. I covered the new 1/2/4-bit streamer modes and smart pin modes today. I'm looking at the PIX instructions to document them, but I can't make sense of them at the moment. I thought I had made some minor change to the USB smart pin mode, but I don't see it. It was something to help Rayman.
How do you envision the right way to set the time between USB outgoing packets?
Looks like I need a spacing of 2 to 6 USB clocks of J between packets.
I haven't had time, yet, to make an actual spacer instruction. Each time you write out a state command like J (IDLE), that takes one bit clock to execute. For low-speed, you could do 2 to 6 of those commands. For full-speed, I need to make an instruction.
Actually, looking back, I see that I was doing it wrong by waiting for idle, which meant 7 or 8 bits of idle.
What I need to do is wait for EOP and then send a couple idles.
Then, I'll have the spacing I want.
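Not speaking for Chip's actual implementation, but here's a minimal C sketch of the gap logic as described: wait for EOP, then emit 2 to 6 bit times of J. The bit rates are the standard USB ones (1.5 Mb/s low speed, 12 Mb/s full speed); wait_for_eop and emit_j_state are hypothetical stand-ins for whatever the smart pin mode actually provides.

```c
#include <stdio.h>

/* Hypothetical sketch of the inter-packet gap described above:
   after EOP is seen, emit 2..6 bit times of J (idle) before the
   next packet. The helpers are stand-ins, not real smart pin calls. */

#define LOW_SPEED_BPS   1500000.0   /* 1.5 Mb/s */
#define FULL_SPEED_BPS 12000000.0   /* 12 Mb/s  */

static void wait_for_eop(void) { /* poll the smart pin for EOP here */ }
static void emit_j_state(void) { /* write one J (IDLE) state command */ }

static void inter_packet_gap(double bps, int j_bits)
{
    wait_for_eop();                   /* don't wait for full idle (7-8 bits) */
    for (int i = 0; i < j_bits; i++)  /* 2..6 J states, one bit time each */
        emit_j_state();
    printf("gap = %d bits = %.1f ns\n", j_bits, j_bits * 1e9 / bps);
}

int main(void)
{
    inter_packet_gap(LOW_SPEED_BPS, 2);   /* low speed: 2 bits ~ 1333 ns */
    inter_packet_gap(FULL_SPEED_BPS, 6);  /* full speed: 6 bits = 500 ns */
    return 0;
}
```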
Right now, the ports can be updated/read on every clock, but there is no way to get the system clock out on a pin.
So, you can output a signal of clk/2 and you can have the streamer transact on every clock (DDR) or every other clock (edge synchronous), or any lower frequency.
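To put rough numbers on that, a quick back-of-envelope in C, assuming an 80 MHz system clock purely for illustration:

```c
#include <stdio.h>

/* Worked numbers for the rates described above, assuming an 80 MHz
   system clock (illustrative only). Max pin clock out is clk/2; the
   streamer can transact every clock (DDR) or every other clock. */
int main(void)
{
    double sysclk = 80e6;
    printf("pin clock out (clk/2): %.0f MHz\n", sysclk / 2 / 1e6);
    printf("DDR: %.0f transfers/s per pin (every clock)\n", sysclk);
    printf("edge-synchronous: %.0f transfers/s per pin (every other clock)\n",
           sysclk / 2);
    return 0;
}
```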
This sounds close - can you give some code examples of how you start the Streamer, then the CLK out, to maintain clock phase and counts?
I think a 74AUP1G57 (XNOR+RC) can regenerate a 2x clock from a DDR signal, if a system-clock out proves impossible.
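For what it's worth, the doubler idea is easy to sanity-check in software: XORing a square wave with a delayed copy of itself pulses on every edge, which is twice the frequency (the XNOR variant is just the inverted output). A toy C model, with a one-step delay standing in for the RC:

```c
#include <stdio.h>

/* Quick model of the XNOR+RC doubler idea above: gating a square wave
   against an RC-delayed copy of itself produces a pulse on every edge,
   i.e. twice the frequency. One sample per time step; 'delayed' plays
   the role of the RC. */
int main(void)
{
    int delayed = 0;
    for (int t = 0; t < 16; t++) {
        int ddr_clk = (t / 4) & 1;        /* input square wave, period 8 */
        int doubled = ddr_clk ^ delayed;  /* pulses at both edges */
        printf("t=%2d in=%d out=%d\n", t, ddr_clk, doubled);
        delayed = ddr_clk;                /* one-step (RC-like) delay */
    }
    return 0;
}
```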
Yes, DDR makes a big difference here and reduces my claim.
If a clock enable (a pin that's high for a full cycle when the streamer has fresh data) were possible, that would still future-proof things somewhat, because we could externally AND it with a system clock. But we can do a lot with what we have, using DDR.
Thinking some more about Start/Phase issues, can the Streamer use the same START mechanism you have for multiple Smart-Pins?
That way, you could configure a smart pin as SysCLK/2, complete with preset phase (IIRC the P2 NCO now has phase?), then configure the Streamer and then tell both to GO! on the same SysCLK.
Chip selects can precede the CLK & Data, so only CLK/Data need the careful sync/phase.
If someone wanted CLK, !CLK, Data, I think that is possible with 2 smart pins?
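If the NCO phase preset works the usual way, the CLK/!CLK pair falls out naturally: two NCOs with the same frequency word, the second preloaded half a period ahead, give complementary outputs. A toy C model of that idea (illustrative only, not P2 register-level code):

```c
#include <stdio.h>
#include <stdint.h>

/* Toy NCO model for the CLK/!CLK idea above: two accumulators with the
   same frequency word, the second preset half a period on, yield
   complementary MSB outputs. A word of 0x80000000 toggles the MSB
   every clock, i.e. SysCLK/2. */
int main(void)
{
    uint32_t freq = 0x80000000u;           /* SysCLK/2 frequency word */
    uint32_t clk_acc = 0, nclk_acc = freq; /* second NCO preset half a period */
    for (int t = 0; t < 8; t++) {
        printf("t=%d  CLK=%u  !CLK=%u\n", t,
               (unsigned)(clk_acc >> 31), (unsigned)(nclk_acc >> 31));
        clk_acc  += freq;
        nclk_acc += freq;
    }
    return 0;
}
```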
You can feed the streamer an instruction to effectively stall some number of clocks, then feed it another to take care of business when the first instruction finishes. Then, you are free to orchestrate some smart pin activity that will coincide with the streamer business.
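Here's a toy C model of that sequencing, just to show the clock accounting; the command names are made up, not real P2 mnemonics. The point is that a stall of known length makes the follow-up command's start clock deterministic, so smart pin activity can be lined up against it.

```c
#include <stdio.h>

/* Toy model of the sequence described above: the streamer runs one
   command while holding one queued command, so issuing a known-length
   stall first makes the follow-up command start at a deterministic
   clock. All names here are illustrative. */

static long clock_now = 0;

static long issue(const char *name, long duration)
{
    long start = clock_now;   /* command begins when the previous one ends */
    clock_now += duration;
    printf("clk %6ld..%6ld  %s\n", start, clock_now - 1, name);
    return start;
}

int main(void)
{
    issue("stall (delay) command", 100);          /* burn exactly 100 clocks */
    long t = issue("real transfer command", 429792);
    printf("start the smart pin clock so it toggles from clk %ld\n", t);
    return 0;
}
```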
OK, sounds like it needs some one-off tuning, then it could work.
Code would issue a Delay_Streamer, then a Queue_Start_Streamer, then a Start_Clock.
One LCD spec I looked at, with the simplest single-use HyperRAM, would need 429792 clocks per frame.
The read command is the leading 48 bits / 6 bytes; after that, the HyperRAM simply runs for 429792-6 clocks.
I think you only need to repeat the read pass inside 64 ms to meet refresh.
Read code looks to be very compact here?
Also, during the read time, the COG is free to collect the next write blocks, ready for fastest delivery in the write window.
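A quick sanity check on those numbers in C (the 80 MHz figure is an assumed example clock, not from the thread):

```c
#include <stdio.h>

/* Sanity check of the numbers above: one frame is 429792 clocks, and
   the whole read pass must repeat within the 64 ms refresh window. */
int main(void)
{
    const double frame_clocks = 429792.0;
    const double refresh_s    = 0.064;            /* 64 ms */
    printf("min clock to refresh in time: %.2f MHz\n",
           frame_clocks / refresh_s / 1e6);       /* ~6.72 MHz */

    const double clk = 80e6;                      /* assumed example clock */
    double frame_s = frame_clocks / clk;
    printf("at 80 MHz: frame = %.2f ms (%.0f frames per 64 ms window)\n",
           frame_s * 1e3, refresh_s / frame_s);
    return 0;
}
```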
I'm sorry it's been taking me so long to get this update out.
The problem has been long compile times on the A9 and bad Fmax results.
I noticed the LUT write sharing was creating a critical path because I was doing a magnitude comparison on what was potentially the late-arriving LUT exec address. I got rid of the possibility of the LUT instruction fetch triggering a 'read LUT' event, so Fmax should go back up.
It's still taking over 1.5 hours to compile, though, which makes things very tedious.
At this point, I must break and meet with Treehouse, OnSemi, and pjv, and I'll be unable to work on this again until Friday. So, no update for a few more days. I'm sorry about this. I was really hoping I could get this done before this meeting.
Hey Chip, what kind of machine do you use to compile? I think Quartus would only benefit from a 1-core 200 GHz processor; parallel compilation seems to be a very hard nut.
Maybe you could have a farm of 8 or so machines with SSDs and loads of RAM and launch them in parallel.
Any chance of some documentation on the pixel instructions?
I don't think I'd miss the old hub RAM events...
At the very end of RAM are the debug interrupt hooks. Right below them used to be those read/write event longs - they are gone now.
Similarly with the REP instructions, Chip. It'd be good to add them to the list.
Nothing urgent, but they're not there at the moment.
Maybe the MSB of the address can be used to indicate a debug hook?
Fine like it is, just contemplating...
It's a fitful, herky-jerky process that probably wouldn't inspire any confidence, if witnessed.
So, maybe it's OK as is.
Sorry, I have not followed a lot of the discussion due to a lack of time.
Can the streamer modes (1/2/4/8/16/32 bit) to/from pins output a strobe signal for output, and be externally clocked for input?
If so, what is the maximum output clock rate that will work, and the maximum strobed input data rate?
Thanks,
Bill
Thanks for all your efforts, Chip.
If that doesn't fix Fmax, maybe disable LUT exec when LUT sharing is enabled?
Or shelve the whole LUT sharing thing. I'd rather have faster chips than LUT sharing.