Big update for DE2-115 and DE0-Nano users w/add-on boards

jmg · 2013-10-05 00:19

Cluso99 wrote: »

But we only have so much silicon.
And in your example, that is precisely what cogs are for:
* One cog does the Period/Frequency/Phase measurement, plus optionally conversion/smoothing/feedback
* One cog does the transmission.

This example reinforces my point.

Yes, silicon is finite, and yet your removal of the baud-rate option has now forced the user to consume a whole extra COG. (?!)
Chip already has Baud rate control in what he has designed. The Timers are also designed.

I reiterate: by all means give users a choice of Timers as BRG (if there is spare control-register room), but do not impose that on them.

You have proven how wasteful that decision is.
Smarter silicon choices allow users to choose threads, instead of wasting a whole COG.
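To make the trade-off concrete: a hardware baud-rate generator is, at its simplest, a counter reloaded from a divisor register, so bit timing costs no COG or thread time at all. Below is a minimal sketch of the divisor arithmetic, assuming a plain integer divider clocked from the system clock. The 80 MHz figure and the function name are illustrative assumptions, not taken from the P2 design.

```python
def baud_divisor(sysclk_hz: int, baud: int) -> tuple[int, float, float]:
    """Return (divisor, actual baud, error in percent) for an integer divider.

    Hypothetical model: the baud-rate generator counts `divisor` system
    clocks per bit, so the achieved rate is sysclk_hz / divisor.
    """
    divisor = round(sysclk_hz / baud)          # nearest integer divisor
    actual = sysclk_hz / divisor               # rate the counter really produces
    error_pct = (actual - baud) / baud * 100   # deviation from the target rate
    return divisor, actual, error_pct


if __name__ == "__main__":
    for rate in (9_600, 115_200, 3_000_000):
        div, actual, err = baud_divisor(80_000_000, rate)
        print(f"{rate:>9} baud: divisor {div}, actual {actual:.1f}, error {err:+.3f}%")
```

The error column grows as the divisor gets small, which is one reason a dedicated generator per UART is worth its handful of flip-flops.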

jmg · 2013-10-05 00:28

evanh wrote: »

What's interesting is that this makes the UART's transmit register/buffer and shifter available for a second SPI port. I guess that makes the extra config worth it.
So, for every UART there are actually two SPIs.

It is not quite a free lunch: 32 D-FFs do not quite make another SPI.
You need queue registers for Tx and Rx, plus some added memory pathway to access them, plus more registers to configure the 'bonus' SPI channel.
The 32 D-FF is a small part of the mix, and a very low silicon area at that.

I've not seen any small micros morph a UART into two SPI, but quite a few offer SPI or UART configs.

evanh · 2013-10-05 00:46

jmg wrote: »

It is not quite a free lunch: 32 D-FFs do not quite make another SPI.
You need queue registers for Tx and Rx, plus some added memory pathway to access them, plus more registers to configure the 'bonus' SPI channel.
The 32 D-FF is a small part of the mix, and a very low silicon area at that.

I've not seen any small micros morph a UART into two SPI, but quite a few offer SPI or UART configs.

That's pretty much what I was saying: there are a number of changes and extra config. But it wasn't hard to see a distinct level of separation occurring between the two shifters because of all the config options.

As for the rx and tx buffering, I'm thinking a little bit of synchronous trickery could even use the same buffer for both. When the two shifters are together, they have one buffer for tx and another for rx. That is important for async clocking, and it can also be used this way for high-performance SPIs and other synchronous serial ports. But when config'd for simple SPI, the incoming and outgoing data are in lock-step, so only a single buffer register is needed to take both the tx and rx data. That means no extra memory mapping/bussing at all. I think it would be easy to split the two shifters into two independent ports.

cgracey · 2013-10-05 01:04

Wow! You guys have really delved into what can/ought to be done for synchronous serial.

I've been stuck trying to get reliable SDRAM timing on the Cyclone FPGA. There are hold-time problems that are very difficult to overcome. Altera's documentation seems quite sparse (despite its hugeness) on the matter of outputting clocks for synchronous use with internal logic. I have to clear the SDRAM problems out of the way so that I can move forward with SDRAM as part of my test bed. I got the new RDAUX and WRAUX instructions done, but want to get the SDRAM working reliably before I debug them. I decided to rename STACK RAM to AUXILIARY RAM, or AUX for short. The term "stack" is too limiting and it's misleading.

Anyway, I realize the way to do P3 might be to build a generic big-FPGA board with an FPGA that is supported by Quartus Web Edition (no $3k/year cost for the tools), so that everybody can buy one and compile on it. We keep the base processor code open and current, so that you guys could compile it yourselves, tweaking it and adding anything like synchronous serial on your own, to prove things. You'd all become versed in Verilog and hardware development, and we could get things refined in parallel. I guess I'd be the final arbiter of what goes in, but I think what belongs in would be readily apparent to most all involved, without much contention. The best solutions, in my experience, are usually the simplest (like Tubular's leading-%00 idea to determine task loop size), and when they pop up, they're obvious to everyone and the arguments are over. Then, the chip we fab will be quite refined and well-rounded. That would be a first in the chip-design world.

ozpropdev · 2013-10-05 01:23

cgracey wrote: »

Anyway, I realize the way to do P3 might be to build a generic big-FPGA board with an FPGA that is supported by Quartus Web Edition (no $3k/year cost for the tools), so that everybody can buy one and compile on it. We keep the base processor code open and current, so that you guys could compile it yourselves, tweaking it and adding anything like synchronous serial on your own, to prove things. You'd all become versed in Verilog and hardware development, and we could get things refined in parallel. I guess I'd be the final arbiter of what goes in, but I think what belongs in would be readily apparent to most all involved, without much contention. The best solutions, in my experience, are usually the simplest (like Tubular's leading-%00 idea to determine task loop size), and when they pop up, they're obvious to everyone and the arguments are over. Then, the chip we fab will be quite refined and well-rounded. That would be a first in the chip-design world.

A pioneering concept!

Heater. · 2013-10-05 01:29

An open source P3 would be a first in the chip design world. Would it be a money making, or at least self sustainable proposition?

Chip designs are normally closed so that competitors don't start mass producing your idea and eating your lunch. I wonder if that would really happen though.

Ideally copyright law would prevent that happening. After all, books are published, and they are readable by anyone.

On the other hand an open design that is copied by others can actually grow the market.

It would certainly be a hit with all those who clamour for open hardware and buy Arduinos because they think that it gives them that.

Can we really get my 48 or 64 bit COG with megabytes of COG registers into an FPGA?:)

Cluso99 · 2013-10-05 01:42

Chip said...
Wow! You guys have really delved into what can/ought to be done for synchronous serial.

Certainly have!

I've been stuck trying to get reliable SDRAM timing on the Cyclone FPGA. There are hold-time problems that are very difficult to overcome. Altera's documentation seems quite sparse (despite its hugeness) on the matter of outputting clocks for synchronous use with internal logic. I have to clear the SDRAM problems out of the way so that I can move forward with SDRAM as part of my test bed. I got the new RDAUX and WRAUX instructions done, but want to get the SDRAM working reliably before I debug them. I decided to rename STACK RAM to AUXILIARY RAM, or AUX for short. The term "stack" is too limiting and it's misleading.

You have been busy!

Anyway, I realize the way to do P3 might be to build a generic big-FPGA board with an FPGA that is supported by Quartus Web Edition (no $3k/year cost for the tools), so that everybody can buy one and compile on it. We keep the base processor code open and current, so that you guys could compile it yourselves, tweaking it and adding anything like synchronous serial on your own, to prove things. You'd all become versed in Verilog and hardware development, and we could get things refined in parallel. I guess I'd be the final arbiter of what goes in, but I think what belongs in would be readily apparent to most all involved, without much contention. The best solutions, in my experience, are usually the simplest (like Tubular's leading-%00 idea to determine task loop size), and when they pop up, they're obvious to everyone and the arguments are over. Then, the chip we fab will be quite refined and well-rounded. That would be a first in the chip-design world.

This would be fantastic but you would need to keep hold of the cpu section - you need some IP to protect Parallax!

There is a 300K LE Cyclone V that is about $200. What a project

Chip,
Are P92-P95 internally connected like P96-P127?
Do you have time to scribble a quick-and-dirty block diagram of the counters?

Heater. · 2013-10-05 02:10

Cluso,

...you need some IP to protect Parallax!

That is the traditional view of things in the chip world but is it necessarily true?

It would seem to be impossible to make money by giving away your IP but this is Parallaxia where the impossible seems to happen quite regularly:)

Consider:

1) An open CPU design would be massively popular just by virtue of being open. You may have noticed the surge in demand for open hardware in recent years. This is what gave the Arduino its popularity, even though it's not the chip that is open, only the board design and software.

2) That demand grows the market. I could imagine a thousand times more Propellers in use if it were open than will ever be shipped in its traditional closed form.

3) That market will attract copycats for sure, but so what? Parallax might end up shipping only 10% of the supply, but that's 10% of a supply into a market that is a thousand times bigger than it would have been otherwise. They would do 100 times more business!

4) And here is the kicker: what about all that I/O and analog circuitry? As it stands now we can play with the COG design on FPGA, but we don't get all that peripheral stuff. One could imagine the source for the FPGA configuration of the COGs being open, and that still not being enough for a competitor to build a Propeller without a lot of effort and expense.

Does any of that make sense? No idea.

Is Parallax brave enough to try it?

jmg · 2013-10-05 02:19

evanh wrote: »

As for the rx and tx buffering, I'm thinking a little bit of synchronous trickery could even use the same buffer for both. When the two shifters are together, they have one buffer for tx and another for rx. That is important for async clocking, and it can also be used this way for high-performance SPIs and other synchronous serial ports. But when config'd for simple SPI, the incoming and outgoing data are in lock-step, so only a single buffer register is needed to take both the tx and rx data. That means no extra memory mapping/bussing at all. I think it would be easy to split the two shifters into two independent ports.

So you actually mean three SPIs?
- One that is UART OR high performance SPI, and then a choice to split into two 'simple SPIs'.
The simple SPIs would lose gapless-send/receive, but it would still need two memory locations for Data and two for Config, as you have spawned two com ports.
A Mux bit to select could be a pain across threads, best avoided.
Technically feasible? Yes, I guess, but it starts to get confusing for the average user to grasp TWO distinct SPIs and their caveats, so it could prove a marketing mistake.

Better to focus on doing Async and one Sync option properly in one block first, and then explore just how much silicon it would actually cost to do the split SPI properly (i.e. removing the caveats).

Fewer variables mean easier testing and less user confusion.

Cluso99 · 2013-10-05 04:17

heater: I currently think that releasing the cog CPU cores and hub interface as IP would be the safest way for Parallax to start.

Certainly, with the current discussion, the Verilog code for the UART and perhaps the counters would be great to see, so we could then work out what changes could be made. Unfortunately, the timescale is against us digging deep enough.

Just waiting for some more comments on the serial/shift register ideas. If I was more proficient in Verilog I would knock up some code to try on my DE0 (or my Spartan3A).

Meanwhile, I am just trying to get USB FS reading working in 1 cog on the DE0.

Heater. · 2013-10-05 04:42

Cluso, yes, just now is not a good time to release source for the PII.

I was just suggesting it might be something to think about when kicking off the PII, if that ever comes to pass.

evanh · 2013-10-05 04:48

jmg wrote: »

So you actually mean three SPIs?
- One that is UART OR high performance SPI, and then a choice to split into two 'simple SPIs'.

Just the two shift registers. Either they are used together or apart.

The simple SPIs would lose gapless-send/receive, but it would still need two memory locations for Data and two for Config, as you have spawned two com ports.

They would keep that too. The single buffer per shifter can be read for rx data and then written with the next tx data between frame strobes (or frame-length triggers). On the strobe/trigger, the buffer content is swapped with the shift register content.
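A minimal sketch of that single-buffer scheme, written as a Python behavioral model rather than Verilog. The class, method names and the 8-bit width are hypothetical. It assumes the buffer is swapped with the shift register on each frame strobe, so between strobes software must read the received word before overwriting the buffer with the next word to send.

```python
class SimpleSpiPort:
    """Hypothetical model of one shifter with a single shared buffer register."""

    def __init__(self, width: int = 8):
        self.width = width
        self.mask = (1 << width) - 1
        self.shift = 0    # shift register: drives MOSI, collects MISO
        self.buffer = 0   # holds rx data after a strobe, then the next tx word

    def read_rx(self) -> int:
        return self.buffer

    def write_tx(self, value: int) -> None:
        self.buffer = value & self.mask

    def clock(self, miso_bit: int) -> int:
        """One SPI clock: shift out the MSB, shift miso_bit in at the LSB."""
        mosi_bit = (self.shift >> (self.width - 1)) & 1
        self.shift = ((self.shift << 1) | (miso_bit & 1)) & self.mask
        return mosi_bit

    def strobe(self) -> None:
        """Frame strobe (or frame-length trigger): swap buffer and shifter."""
        self.shift, self.buffer = self.buffer, self.shift


def run_frame(port: SimpleSpiPort, miso_word: int) -> int:
    """Clock one full frame MSB first; return the word that appeared on MOSI."""
    sent = 0
    for i in range(port.width - 1, -1, -1):
        sent = (sent << 1) | port.clock((miso_word >> i) & 1)
    return sent


if __name__ == "__main__":
    port = SimpleSpiPort()
    port.write_tx(0xA5)
    port.strobe()                  # prime: 0xA5 is now in the shifter
    port.write_tx(0x5A)            # queue the next word while the frame runs
    sent = run_frame(port, 0x3C)   # peer sends 0x3C while we send 0xA5
    port.strobe()                  # 0x3C lands in the buffer, 0x5A is loaded
    print(hex(sent), hex(port.read_rx()))
```

Because tx and rx move in lock-step, one register per direction is never needed at the same moment; the cost is the read-before-write ordering that software has to respect between strobes.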

SPI slave mode frame strobing itself is another issue which is true with or without the dual mechanism.

Fewer variables mean easier testing and less user confusion.

I just extrapolated the whole idea from where Cluso was already leading. The nature of automated hardware is that it requires lots of config registers, or it becomes very rigid.

evanh · 2013-10-05 05:02

In a UART situation, the sending and receiving clocks and framing are not locked to each other at all. Each end has its own clock generator and its own initiation of sending a frame. So having two shifters, each with its own shift register and buffer, is needed to manage this.
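Why the async case needs independent shifters shows up in a small behavioral model (Python, names hypothetical): the receiver never sees the transmitter's clock, only the start-bit edge, and it re-times every bit from its own clock. The sketch assumes 8N1 framing and a single sample at each bit's nominal centre, and it ignores oversampling and noise filtering.

```python
def uart_frame(byte: int) -> list[int]:
    """8N1 frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def line_level(frame: list[int], tx_bit_time: float, t: float) -> int:
    """Level on the line at time t, timed by the transmitter's own clock."""
    index = int(t // tx_bit_time)
    return frame[index] if 0 <= index < len(frame) else 1   # idle high


def uart_receive(frame: list[int], tx_bit_time: float, rx_bit_time: float):
    """Receiver with its own bit clock, assuming the start edge was seen at t=0.

    Samples each of the 10 bit slots at its centre as measured by the
    receiver's clock. Returns the byte, or None on a framing error.
    """
    samples = [line_level(frame, tx_bit_time, (i + 0.5) * rx_bit_time)
               for i in range(10)]
    if samples[0] != 0 or samples[9] != 1:
        return None
    return sum(bit << i for i, bit in enumerate(samples[1:9]))


if __name__ == "__main__":
    for mismatch in (1.00, 1.02, 1.06):
        got = uart_receive(uart_frame(0x55), 1.0, mismatch)
        print(f"rx clock x{mismatch}: {got!r}")
```

With a 2% clock mismatch the byte still decodes; at 6% the last data bit is sampled in the stop-bit slot and the byte comes back wrong, which is why each end of an async link needs its own accurately timed shifter.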

With there already being two shift registers in each UART, and since we want to provide SPI support, including slave mode and the related chaining (and chaining needs to have its receiving shift register piped back to the tx line), this provided the reasoning for my line of thought: with one shifter doing all the work now, what to do with the second one? Turn it into another SPI, of course.

evanh · 2013-10-05 05:37

Actually, the gapless part might not be applicable. And the related buffer swap could be problematic if an SPI read command is expecting a result from the Prop with a single strobe.

I guess that's something to iron out. Do we want to conform to a particular arbitration mechanism in slave mode or make our own arbitration?

Sapieha · 2013-10-05 06:15

Hi Chip.

I'm lazy, so I'll just reintroduce my old idea of how I would like the SER ports to look...

In this post I attached a PDF file with the SER idea:

http://forums.parallax.com/showthread.php/126928-Propeller-II-(Chat-with-CHIP)-__-Some-extra-questions-will-come?p=951899&viewfull=1#post951899

And in this thread, some Byte MOVE functions for convenient handling of bytes in longs:
http://forums.parallax.com/showthread.php/117940-MOV-Reverse-Ideas-to-Chip-Gracey-%28Parallax%29?highlight=Sapieha

cgracey wrote: »

Wow! You guys have really delved into what can/ought to be done for synchronous serial.

I've been stuck trying to get reliable SDRAM timing on the Cyclone FPGA. There are hold-time problems that are very difficult to overcome. Altera's documentation seems quite sparse (despite its hugeness) on the matter of outputting clocks for synchronous use with internal logic. I have to clear the SDRAM problems out of the way so that I can move forward with SDRAM as part of my test bed. I got the new RDAUX and WRAUX instructions done, but want to get the SDRAM working reliably before I debug them. I decided to rename STACK RAM to AUXILIARY RAM, or AUX for short. The term "stack" is too limiting and it's misleading.

Anyway, I realize the way to do P3 might be to build a generic big-FPGA board with an FPGA that is supported by Quartus Web Edition (no $3k/year cost for the tools), so that everybody can buy one and compile on it. We keep the base processor code open and current, so that you guys could compile it yourselves, tweaking it and adding anything like synchronous serial on your own, to prove things. You'd all become versed in Verilog and hardware development, and we could get things refined in parallel. I guess I'd be the final arbiter of what goes in, but I think what belongs in would be readily apparent to most all involved, without much contention. The best solutions, in my experience, are usually the simplest (like Tubular's leading-%00 idea to determine task loop size), and when they pop up, they're obvious to everyone and the arguments are over. Then, the chip we fab will be quite refined and well-rounded. That would be a first in the chip-design world.

Seairth · 2013-10-05 07:02

Heater. wrote: »

An open source P3 would be a first in the chip design world. Would it be a money making, or at least self sustainable proposition?

Chip designs are normally closed so that competitors don't start mass producing your idea and eating your lunch. I wonder if that would really happen though.

Ideally copyright law would prevent that happening. After all, books are published, and they are readable by anyone.

On the other hand an open design that is copied by others can actually grow the market.

It would certainly be a hit with all those who clamour for open hardware and buy Arduinos because they think that it gives them that.

Can we really get my 48 or 64 bit COG with megabytes of COG registers into an FPGA?:)

Don't forget the importance of the trademark. Everyone will know Parallax and Propeller. And no competitor can copy that!

jmg · 2013-10-05 13:19

evanh wrote: »

In a UART
With there already being two shift registers in each UART, and since we want to provide SPI support, including slave mode and the related chaining (and chaining needs to have its receiving shift register piped back to the tx line), this provided the reasoning for my line of thought: with one shifter doing all the work now, what to do with the second one? Turn it into another SPI, of course.

I can see the idea has some merit, but you originally mentioned two SPI models - I think that is just too confusing for users.
Your post: When the two shifters are together they have one buffer for tx and another one for rx. Important for async clocking. And can be used this way for high performance SPIs and other synchronous serial ports. But when config'd for simple SPI:

I think if you do split, you should just do it properly and avoid the confusion of 2 variants (i.e. your 2 simple SPIs become high performance).

I would explore this as two designs in parallel, to see just how much real cost the split SPI imposes.
That is easily done with conditional code switches, and easy to remove late if the impact bites.

Important to avoid would be the possible compromise of just one UART per COG, as that kills many bridge applications.
Single COG encapsulation is important for OBEX deployment. (and marketing)

There are some mappable (shareable) resources: the two baud generators used for Async can become one each on SPI, and the separate, totally free pin allocation on Tx and Rx (if there) can become the 'pin leader' map for each SPI.
The other 2 (3?) SPI pins would become 'pin followers', as was mentioned above.
The bit-frame counters for Rx and Tx become one each for SPI, and read-back follows the status of each.

A minor added complication is that some SPI designs I've checked use the SS pin as a hardware feature in slave mode.
This SS pin (IP) drives OE on the Slave Data Out pin, and gates the Clock - minimal HW but it is HW that needs mapping.
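A behavioral sketch of that slave-side SS logic (Python, class name hypothetical), assuming an active-low SS and a single-edge simplification that ignores the CPOL/CPHA mode variants. While SS is high, clock edges are ignored and MISO is reported as high-Z (`None`).

```python
from typing import Optional


class GatedSpiSlave:
    """Hypothetical SPI slave shifter whose clock and MISO driver are gated by SS."""

    def __init__(self, tx_word: int, width: int = 8):
        self.width = width
        self.mask = (1 << width) - 1
        self.shift = tx_word & self.mask

    def miso(self, ss_n: int) -> Optional[int]:
        """MISO output: driven with the MSB only while SS is low (OE active)."""
        if ss_n:
            return None            # output enable off: pin is high-Z
        return (self.shift >> (self.width - 1)) & 1

    def clock_edge(self, ss_n: int, mosi_bit: int) -> None:
        """Shift in one MOSI bit, but only while SS is low (clock gated)."""
        if ss_n:
            return                 # clock ignored while deselected
        self.shift = ((self.shift << 1) | (mosi_bit & 1)) & self.mask


if __name__ == "__main__":
    slave = GatedSpiSlave(tx_word=0x96)
    for _ in range(3):
        slave.clock_edge(ss_n=1, mosi_bit=1)   # stray clocks while deselected
    received = 0
    for i in range(7, -1, -1):
        received = (received << 1) | slave.miso(ss_n=0)
        slave.clock_edge(ss_n=0, mosi_bit=(0x3C >> i) & 1)
    print(hex(received), hex(slave.shift))
```

In hardware this amounts to gating the shift enable and the output enable with SS; the cost jmg points to is mapping that SS input to a pin.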

Checking on I2S, I see that it uses a frame signal that alternates to switch L/R, so with some small care on that SS control, I2S could be steered into 2 L/R SPI channels.

SS just needs a polarity control, and the dual SPI needs to overlay pins legally (which I think is automatic?).

I2S can hit >12MHz, so this is best done in hardware. (again following the HW does bit level stuff, SW does word level stuff rule)
This simple HW drops the SW service rate down to frame speeds (48..192 kHz).
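The arithmetic behind that claim, assuming 32 bit clocks per channel (64fs) in a stereo stream (Python, function name hypothetical):

```python
def i2s_rates(fs_hz: int, bits_per_channel: int = 32, channels: int = 2):
    """Return (bit clock in Hz, words per second) for an I2S-style stream."""
    bclk_hz = fs_hz * bits_per_channel * channels   # one data bit per bit clock
    words_per_s = fs_hz * channels                  # one word per channel per frame
    return bclk_hz, words_per_s


if __name__ == "__main__":
    for fs in (48_000, 96_000, 192_000):
        bclk, words = i2s_rates(fs)
        print(f"fs={fs} Hz: BCLK={bclk / 1e6:.3f} MHz, "
              f"software handles {words} words/s instead of {bclk} bits/s")
```

At 192 kHz the bit clock is 12.288 MHz, while software only touches 384,000 words per second, a 32x reduction once the hardware does the bit-level shifting.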

Checking the AK4388A data, there is a 40 ns window for LRCK relative to the master-generated clock at top speed, so the SS pin (master case) could benefit from a single added flip-flop to sync it with TXSR loads (HW does bit level, SW does word level).

I cannot think of any SPI/I2S case where SS changes are not bit-frame aligned, and a variable bit frame allows the edge to go anywhere.

evanh · 2013-10-05 15:02

jmg wrote: »

I can see the idea has some merit, but you originally mentioned two SPI models - I think that is just too confusing for users.
Your post: When the two shifters are together they have one buffer for tx and another one for rx. Important for async clocking. And can be used this way for high performance SPIs and other synchronous serial ports. But when config'd for simple SPI:

It wasn't specific models of SPI but rather stating that SPI/I2S peripherals can hang off either configuration of the shift registers. There is no rule that says SPI products have to be connected via a singular chained shifter. In fact I think you'll find most uCs do have separate shift registers for tx and rx respectively.

A minor added complication is that some SPI designs I've checked use the SS pin as a hardware feature in slave mode.
This SS pin (IP) drives OE on the Slave Data Out pin, and gates the Clock - minimal HW but it is HW that needs mapping.

That's the strobe I referred to. Chip (Slave) Select is another name for the same thing. This and the external clock are both added extras that any SPI implementation demands.

Cluso99 · 2013-10-05 15:20

jmg: I know you mean I2C for I2S. I have those senior moments too

I agree, whether 2 or 4 shifters, they should be identical. I would prefer 4, which would normally be 2 transmitters and 2 receivers. Perhaps they could not only be daisy-chained but also paralleled (usable for quad SPI), or 2 pairs shadowed to be used as a single-level buffered tx and rx pair.
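One way to picture four shifters "paralleled" for quad SPI: each shifter carries one bit lane, so a byte moves in two clocks instead of eight. This is a sketch assuming the common quad-SPI convention of sending the high nibble first, with IO3 carrying the most significant bit of each nibble (Python, function names hypothetical).

```python
def split_to_lanes(data: bytes) -> list[list[int]]:
    """Spread bytes over 4 one-bit shifters (IO0..IO3), one entry per clock."""
    lanes: list[list[int]] = [[] for _ in range(4)]
    for byte in data:
        for nibble in ((byte >> 4) & 0xF, byte & 0xF):   # high nibble first
            for k in range(4):
                lanes[k].append((nibble >> k) & 1)       # IO3 gets the nibble MSB
    return lanes


def merge_lanes(lanes: list[list[int]]) -> bytes:
    """Receive side: recombine the four lanes, two clocks per byte."""
    out = bytearray()
    for c in range(0, len(lanes[0]), 2):
        hi = sum(lanes[k][c] << k for k in range(4))
        lo = sum(lanes[k][c + 1] << k for k in range(4))
        out.append((hi << 4) | lo)
    return bytes(out)


if __name__ == "__main__":
    lanes = split_to_lanes(b"\x9f\x01")
    print(len(lanes[0]), "clocks for 2 bytes;", merge_lanes(lanes).hex())
```

Each lane is an ordinary 1-bit shifter; only the pin mapping and the nibble interleave differ from single-lane SPI, which fits the "identical, generic shifters set by config" idea.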

Remember, each pin needs to be set (tx, rx, clk), not a grouping as then sometimes we won't be able to use the pin groups of the differential pairing. (We don't always follow the 4 pin set for SD cards)

The silicon for the configuration and read/write most likely takes far more silicon than the actual shift registers. Let's try and make the shift registers identical and as generic as possible - ie many simple features set by config. This way we get the most powerful hw for future ideas.

evanh · 2013-10-05 16:05

I2S is a specific little-endian variant of SPI (Nominally big-endian) that was, I believe, introduced as part of the AC97 spec by the PC industry.

Cluso99 · 2013-10-05 16:09

Thanks Evan, I have never heard of it.

jmg · 2013-10-05 18:53

Cluso99 wrote: »

jmg: I know you mean I2C for I2S. I have those senior moments too

Actually no, I2S is an Audio variant of SPI, google AK4388A for one example of cheap Audio DACs using I2S.
Basically it is CLK + DATA, plus a variant of SS called LRCK that toggles on alternate words to steer Left/Right audio info.

So SPI hw can come very close to I2S, but some details need care.

The AKM device has these modes; 24 bits are sent with 32 clocks:

Mode  DIF1  DIF0  SDTI format                BICK            Figure
0     L     L     16-bit LSB justified       ≥32fs           Figure 4
1     L     H     24-bit LSB justified       ≥48fs           Figure 5
2     H     L     24-bit MSB justified       ≥48fs           Figure 6
3     H     H     16/24-bit I2S compatible   ≥48fs or 32fs   Figure 7
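To show how those justification options differ when a 24-bit sample sits in a 32-clock channel slot, here is a sketch of the generic LSB-justified, MSB-justified and I2S placements (Python, names hypothetical). Unused bit positions are shown as 0; the AK4388A datasheet figures remain the authority on exact timing.

```python
def slot_bits(sample: int, fmt: str, sample_bits: int = 24, slot_len: int = 32) -> list[int]:
    """Return one channel slot on SDTI as a list of bits, one per BICK.

    Position 0 is the first BICK after the LRCK edge; data is MSB first.
    """
    pad = slot_len - sample_bits
    if pad < 0 or (fmt == "i2s" and pad < 1):
        raise ValueError("slot too short for this sample width and format")
    value = sample & ((1 << sample_bits) - 1)   # two's complement, truncated
    word = [(value >> i) & 1 for i in range(sample_bits - 1, -1, -1)]
    if fmt == "lsb_justified":   # LSB on the last BICK before the next LRCK edge
        return [0] * pad + word
    if fmt == "msb_justified":   # MSB on the first BICK after the LRCK edge
        return word + [0] * pad
    if fmt == "i2s":             # MSB delayed by one BICK after the LRCK edge
        return [0] + word + [0] * (pad - 1)
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    for fmt in ("lsb_justified", "msb_justified", "i2s"):
        print(f"{fmt:>14}: {''.join(map(str, slot_bits(0x800001, fmt)))}")
```

In the I2S case the word lags LRCK by one clock, which is the kind of detail an SPI shifter using SS as LRCK would have to accommodate.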

evanh · 2013-10-05 19:42

AC97 is audio.

Just been doing some reading on slave mode... It's hard! Typical implementations have the read data-out appearing on the very next clock after the final bit of the command is clocked in, with no framing at all. The command is interpreted live, it would seem. No way the Prop would conform to that. Luckily it doesn't have to; SPI is pretty ad hoc.

One detail that does seem consistent is the shift clock is ignored while the SS line is high.

evanh · 2013-10-05 19:59

I stand corrected: Philips is the author, it would seem. It's much older than AC97 and was originally used for CD players, although it looks like there was a modification done for AC97 inclusion.

Cluso99 · 2013-10-05 22:32

I started another thread to discuss the shift registers. This will keep the discussion separate from the other discussions. Here is the link
http://forums.parallax.com/showthread.php/150685-P2-Serial-Shift-Register-discussion?p=1211737#post1211737

Phil Pilgrim (PhiPi) · 2013-10-05 23:32

Cluso,

Thanks, but it might be too late for that, since we've already developed such a thriving discussion in this thread. And Chip doesn't seem to mind the "hijacking." Also, the new thread is premised on your own vision for the hardware, which may or may not be what others have in mind. I suggest we continue the discussion here, given this thread's already-rich context.

'Hope you understand ...
-Phil

P.S. I can merge your thread into this one, if that's okay with you.

Cluso99 · 2013-10-06 01:04

Phil,
I really don't mind Phil. If you think it's better here then please move it back. I will then update the text a little to reflect this.

Ramon · 2013-10-06 01:32

cgracey wrote: »

... supported by Quartus Web Edition (no $3k/year cost for the tools), so that everybody can buy one and compile on it. ...

Yes, I suddenly remember how Xilinx WebPACK didn't allow programming of its biggest Spartan-6. Eh! Wait a moment, does Altera do the same for its biggest Cyclone V? ... The answer is yes.

Cluso99 wrote: »

There is a 300K LE Cyclone V that is about $200. What a project

"Quartus Web Edition supports: Cyclone IV, and Cyclone V FPGAs (excluding 5CEA9, 5CGXC9, and 5CGTD9)."
Sorry, I didn't know that when I posted the comparison of prices. So no Cyclone IV or V bigger than 150K for free Quartus Web Edition.

Price for 150K Cyclone V FPGA alone is around $160 for -8 speed grade, $220 for -7, and $300 for -6 speed grade.

Ramon · 2013-10-06 01:49

cgracey wrote: »

Anyway, I realize the way to do P3 might be to build a generic big-FPGA board with an FPGA that is supported by Quartus Web Edition (no $3k/year cost for the tools), so that everybody can buy one and compile on it.

I am thinking of two ways to do this: we make the layout (*), or we ask Terasic to make one.

Advantages and disadvantages:

We make the layout ourselves (can we?), so we have a custom layout designed for general use and also for P3 requirements. Disadvantages: high cost, and maybe not enough people (critical mass).

We ask Terasic to make a low-cost Cyclone V board (are they willing to do that?). Advantages: lowest possible cost, broader audience (people outside these forums could just try it and join development). Disadvantages: no flexibility to customize the board or create new versions.

(*) In case we try to do it ourselves, I have Diptrace version 2.3 (unlimited pins) as PCB software. But others might have Pulsonix, Protel, Altium...

jmg · 2013-10-06 02:27

Ramon wrote: »

I am thinking of two ways to do this: we make the layout (*), or we ask Terasic to make one.

There is a 3rd obvious alternative: Parallax makes the PCB.
They are in the business of supplying modules already; this is just a bump in complexity.
