HoloShield : A soft-peripheral application case study

VBB · 2017-06-02 09:06

I am creating a new thread to discuss a new Propeller-based application, which I have built around the Propeller to make use of the chip's unique soft-peripheral capabilities.

By discussing the design decisions that led me to use the Propeller, perhaps it will help others to consider using the Propeller in their applications.

The HoloShield was first mentioned in June 2014, when I first started learning the capabilities of the Propeller. Back then it was called the VirtualShield-PRO: http://forums.parallax.com/discussion/156000/fastest-possible-fifo-buffer-to-infinity-shield-kickstarter-and-beyond

Why the propeller?

The task for the HoloShield is to be able to virtualize the signals of almost any circuit. This means I need to be able to put any peripheral combination on any pin. And not just standard SPI/I2C peripherals but also custom peripherals like a WS2812 neopixel decoder. This is not something that standard microcontrollers can do!

So the choices then are FPGA or soft-peripheral.

The other design requirement is that I be able to update the peripheral configuration almost instantly. The designs are taken from Virtual Breadboard layouts: which pins are connected to which modules, and hence which peripherals are needed, all comes from the Virtual Breadboard design, and the peripheral layout is then generated and uploaded instantly. This is not easily achieved with an FPGA unless it already has all the peripherals installed and you just send pin mappings. However, this is an extensible platform, and not all peripherals are known yet; more will be added in the future.
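To make the "generate a peripheral layout from the design" step concrete, here is a minimal sketch in Python. It is not the HoloShield generator: the `Peripheral` class, the `build_layout` function, the peripheral labels, and the one-cog-per-peripheral rule are all assumptions made for illustration. The only hardware facts used are that the P1 has eight cogs and I/O pins P0 to P31.

```python
from dataclasses import dataclass

MAX_COGS = 8   # the P1 has eight cogs
MAX_PIN = 31   # P1 I/O pins are numbered P0..P31


@dataclass
class Peripheral:
    kind: str   # hypothetical label, e.g. "spi_in" or "ws2812_in"
    pins: dict  # signal name -> P1 pin number


def build_layout(peripherals, reserved_cogs=1):
    """Assign one cog per peripheral (an assumption) and reject pin conflicts."""
    if len(peripherals) > MAX_COGS - reserved_cogs:
        raise ValueError("not enough free cogs for this design")
    used = {}    # pin number -> "kind.signal" that already claimed it
    layout = []
    for cog, periph in enumerate(peripherals, start=reserved_cogs):
        for signal, pin in periph.pins.items():
            if not 0 <= pin <= MAX_PIN:
                raise ValueError(f"pin {pin} is outside P0..P{MAX_PIN}")
            if pin in used:
                raise ValueError(f"pin {pin} already used by {used[pin]}")
            used[pin] = f"{periph.kind}.{signal}"
        layout.append({"cog": cog, "kind": periph.kind, "pins": dict(periph.pins)})
    return layout


design = [
    Peripheral("spi_in", {"cs": 0, "sclk": 1, "mosi": 2}),
    Peripheral("ws2812_in", {"din": 5}),
]
layout = build_layout(design)
```

Because the layout is plain data, regenerating it after a breadboard change is cheap, which is the property the paragraph above is after.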

This leaves soft-peripherals and this design requirement is what led me to the propeller.

Soft peripheral super chip!

The propeller is without doubt the best chip on the market for creating soft peripherals. It is fast and real time deterministic and has a unique multicore architecture that can be reconfigured and synchronized to handle a wide variety of scenarios.

So far I have been able to get it to implement I2C, SPI, LCD logic, WS2812, FIFOs, compressors, frame buffers and more.

The P1 is perfectly satisfactory for the first version of the HoloShield, and this already leads to a huge amount of potential Holographic Hardware.

Future Soft peripheral super-er chip!

Of course it's also useful to plan out a future direction for the product path to remove limitations you find as you go along. For example, the current limit for SPI is 4 MHz and there are applications that use 8 MHz. So I have been on the lookout for ways to improve the performance of the P1.

Two options present themselves.
* P1V - A stripped-down Verilog version to run in a modern FPGA, which has the advantage of keeping the code base but the disadvantage of re-engineering the FPGA
* P2 - Using the P2, which has the advantage of aligning with the future Parallax product path but the disadvantage of reworking and maintaining dual code bases

I discussed some of these issues in the post on who will buy the P2

http://forums.parallax.com/discussion/166717/who-do-you-think-will-buy-the-prop2#latest

I will continue some points from this discussion on this thread.

VBB · 2017-06-02 09:25

jmg wrote: »

This is probably best in a separate thread - how many IO does your HoloShield need ? (I guess 5V tolerance is not important ?)

There are different form-factors for the HoloShield.
* Arduino Shield
* Arduino MEGA Shield
* Raspberry PI Hat
* Stamp

So the number of IO can vary. For the MEGA Shield I am planning to use two P1s to get the pins I need, but obviously a P2 with a larger pin count would be useful here.

The pins are protected by a resistor, an idea borrowed from the Propeller ASC+ Arduino form factor board, so they are 5V tolerant in this way.

The pins are not intended to be current source/sinks to any significant level.

jmg · 2017-06-02 09:48

VBB wrote: »

Of course it's also useful to plan out a future direction for the product path to remove limitations you find as you go along. For example, the current limit for SPI is 4 MHz and there are applications that use 8 MHz. So I have been on the lookout for ways to improve the performance of the P1.

Two options present themselves.
* P1V - A stripped down Verilog version to run in a modern FPGA which has the advantage of keeping the code base but disadvantage of re-engineering the FPGA
* P2 - Using the P2 which has the advantage of aligning with future parallax product path but disadvantage of reworking and maintaining dual code bases

How is that 8MHz SPI framed ?

What are the other chips doing on the board ?

I can see that 4MHz SPI is pushing things, and 8MHz is a bigger ask...

I think that is just 20 I/O you need so you could use a quite small FPGA/CPLD to pack in stuff the P1 cannot manage.
(like faster SPI and perhaps PWM)

Working backwards by package, if I search Digikey for QFN48 as a manageable size and pitch, there are many choices.
Covering 24,28,49 io, are part families from Lattice & Microsemi
Cheapest is
1100LE ICE5LP1K-SG48ITR50 @ 3.28250/100
next is
'10k gates' A3PN010-QNG48 @ 3.89500 /100
then
'30k gates' A3P030-QNG48 @ 4.99500

The part I like, ICE40UP5K-SG48, at 1+ $7.380, is tagged "Due to restrictions, this part is unavailable." at Digikey, whatever that means ??

VBB · 2017-06-02 10:13

>How is that 8MHz SPI framed ?

Using Chip Select with 8-bit data.

>What are the other chips doing on the board ?

* USB HID Controller with custom firmware for managing bootloader updates to the PIC and Propeller and Arduino if attached to the shield
* I2C EEPROM for propeller
* 4 Channel DAC POT for generating analog outputs to model basic analog sensors/joysticks etc
* PICMicro main application controller and supplemental hard peripherals/ADC/driven 5V IO
* P1 soft peripheral controller

>I think that is just 20 I/O you need so you could use a quite small FPGA/CPLD to pack in stuff the P1 cannot manage

I have configured the P1 as an I2C slave using cogs to pipeline and filter captured data, and I plan to support multiple P1s on the I2C bus for the MEGA. So it could be possible to integrate, say, a 4-cog P1V version with reduced instructions/features (no video) as a reconfigurable high-speed capture element in the pipeline on a separate chip, and daisy chain the data into the pipeline, making it transparent to the main controller. Could be an interesting approach actually.

jmg · 2017-06-02 10:26

VBB wrote: »

I have configured the P1 as an I2C slave using COG's to pipeline and filter captured data and plan to support multiple P1's on the I2C bus for the MEGA.

What speeds can you manage as i2c ?

VBB wrote: »

So it could be possible to integrate, say, a 4-cog P1V version with reduced instructions/features (no video) as a reconfigurable high-speed capture element in the pipeline on a separate chip, and daisy chain the data into the pipeline, making it transparent to the main controller. Could be an interesting approach actually.

I was thinking from the simpler end of the scale, to (initially) have the small FPGA have peripherals only, no core.
ie solve the problems P1 cannot, then see if there is room for COG.
That means you need to craft a fast link, FPGA <-> P1, maybe something like Framed QuadSPI ?

VBB · 2017-06-02 11:36

jmg wrote: »

What speeds can you manage as i2c ?

I haven't spec'd this yet. I run a traditional hardware interrupt I2C peripheral on the PIC and consuming the data is the bottleneck. It's not the main problem because by then it's reading from a FIFO and is compressed data.

But I think you mean what will the requirement of a dedicated hardware I2C be

jmg wrote: »

I was thinking from the simpler end of the scale, to (initially) have the small FPGA have peripherals only, no core.

I see. This is what I was considering the PIC32 for, since it has bundles of hardware peripherals that are pin mappable. You mentioned the PIC32 is limited with its pin mappings, and I need to look further at just how mappable it is. The PIC32MZ seemed to have quite some mapping ability at first glance, and I have a dev kit on my desk waiting for me to (re)learn MPLAB-X and decipher the 1000+ page datasheet to use it!

Anyhow, either the PIC32 or a dedicated FPGA would be a very good way to handle high-speed peripherals, but then suddenly you have moved away from the elegance of soft peripherals and are hacking together custom bits with custom peripherals and dealing with multiple toolchains and multiple code bases.

In my dreams I drop in a super-high-performance Prop and do all peripherals as software. It's just elegant and easier to maintain. I think the software aspect is so often ignored. While I understand that for large companies and high volumes hardware costs dominate, for small guys like me the non-recurring engineering costs and time to market dominate.

Actually, when I first started this project I thought the P2 would be here by now, and when I first read its spec I thought it was going to be like 10 times the performance per core. That was going to be my target device, and all this discussion would be moot.

So if I can possibly get away with one elegant code base that just scales as the processor speed scales, it's hard to beat from a software perspective.

KeithE · 2017-06-02 15:15

Weren't some people able to achieve SPI faster than 4 MHz with the Prop1? Maybe someone else will be able to find a good thread, but I quickly found this one:

http://forums.parallax.com/discussion/113722/edited-fast-spi-out-1-bit-per-instruction/p1

I remember some clever code in the globe POV project for apa102 LEDs that was able to achieve high data rates, and given that you're looking at ws2812's you might want to look at some alternate protocols for people needing higher rates. I think that the maximum possible is 10 or 20 MHz on the Prop1.

VBB · 2017-06-02 15:24

KeithE wrote: »

Weren't some people able to achieve SPI faster than 4 MHz with the Prop1?

It's harder to do the SPI In than the SPI Out.. The In direction has to sync to the clock pulse and also consider variable time between bytes. In the outward direction you have complete control of the timing.
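To illustrate what the receive side has to track, here is a small Python model that decodes already-captured samples. It is only a sketch: it assumes SPI mode 0 (data sampled on the rising clock edge), MSB-first bytes and an active-low chip select, and the names `decode_spi` and `encode_spi` are made up for this post. A real P1 receiver has to do the same bookkeeping in PASM within the time of a single clock period, which is where it gets tight.

```python
def decode_spi(samples):
    """Decode (cs, sclk, mosi) samples into bytes.

    Assumptions: mode 0, MSB first, chip select active low.
    A partial byte is dropped when chip select is released.
    """
    received = []
    shift = 0
    nbits = 0
    prev_clk = 0
    for cs, sclk, mosi in samples:
        if cs:                           # deselected: reset the framing
            shift = 0
            nbits = 0
        elif sclk and not prev_clk:      # rising edge while selected
            shift = (shift << 1) | mosi
            nbits += 1
            if nbits == 8:
                received.append(shift)
                shift = 0
                nbits = 0
        prev_clk = sclk
    return received


def encode_spi(data, gap=0):
    """Build test samples; gap inserts idle samples before each byte."""
    samples = [(1, 0, 0)]                # bus idle, chip select high
    for byte in data:
        samples.extend([(0, 0, 0)] * gap)
        for bit in range(7, -1, -1):
            level = (byte >> bit) & 1
            samples.append((0, 0, level))  # data set up while clock is low
            samples.append((0, 1, level))  # clock high: receiver samples here
    samples.append((0, 0, 0))
    samples.append((1, 0, 0))            # chip select released
    return samples


decoded = decode_spi(encode_spi([0xA5, 0x3C], gap=3))
```

The `gap` argument stands in for the variable time between bytes: the decoder only advances on clock edges, so idle time costs it nothing, but a real-time receiver still has to be ready for the next edge whenever it arrives.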

KeithE wrote: »

and given that you're looking at ws2812's you might want to look at some alternate protocols

So far I have implemented the 800 kHz version. Again, this is an inward-coming signal, and again it is quite tight on the P1. I would really love a faster P1! I will also support outward streaming of pixels, but this is easier to do than inward capture. Do you have any particular alternate protocols in mind? I want to support as much as I can!
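For anyone unfamiliar with the WS2812 format, here is a minimal Python sketch of the decode step, working on measured high-pulse widths rather than live pins. It assumes the commonly published timings (a 0 bit is high for roughly 350 to 400 ns, a 1 bit for roughly 700 to 800 ns) and 24 bits per pixel sent green, red, blue, MSB first. The 550 ns threshold and the function name `decode_ws2812` are assumptions for illustration, not the HoloShield code.

```python
THRESHOLD_NS = 550  # assumed midpoint between the nominal 0-bit and 1-bit high times


def decode_ws2812(high_widths_ns):
    """Turn a list of high-pulse widths (ns) into (r, g, b) tuples.

    Each pixel is 24 bits in G, R, B order, MSB first.
    Trailing bits that do not fill a whole pixel are ignored.
    """
    bits = [1 if width > THRESHOLD_NS else 0 for width in high_widths_ns]
    pixels = []
    for start in range(0, len(bits) - 23, 24):
        word = 0
        for bit in bits[start:start + 24]:
            word = (word << 1) | bit
        green = (word >> 16) & 0xFF
        red = (word >> 8) & 0xFF
        blue = word & 0xFF
        pixels.append((red, green, blue))
    return pixels


# One pixel with r=0x12, g=0x34, b=0x56 is sent as the 24-bit word 0x341256.
word = 0x341256
widths = [800 if (word >> bit) & 1 else 400 for bit in range(23, -1, -1)]
pixels = decode_ws2812(widths)
```

At 800 kHz each bit lasts 1.25 µs, so a capture cog has only about 25 instructions per bit at 80 MHz to measure the pulse and shift the bit in.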

KeithE · 2017-06-02 16:10

VBB wrote: »

So far I have implemented the 800 kHz version. Again, this is an inward-coming signal, and again it is quite tight on the P1. I would really love a faster P1! I will also support outward streaming of pixels, but this is easier to do than inward capture. Do you have any particular alternate protocols in mind? I want to support as much as I can!

Here's the post about the globe which uses APA102 LEDs which are output only, and many people drive them with SPI code or hardware:

http://forums.parallax.com/discussion/165844/led-pov-globe-with-prop

Note that the code references the SPI thread that I posted above. The comment in that code says this about single byte SPI reads:

"High speed read ONLY for SINGLE byte. This method clocks out $00 and reads the data
coming in on MISO. Clock is about 9.6Mhz. Writes the input data back to hub memory
address that was passed to this method. Using lonesock's method from FSRW."

KeithE · 2017-06-02 16:15

Edited to add: supporting APA102 at high data rates is something that the Prop1 does well. Not many chips seem to be able to do it. The Teensy boards are an alternate choice. (See the FastLED community https://plus.google.com/communities/109127054924227823508 for non-Prop LED projects for inspiration.)

VBB · 2017-06-02 16:25

KeithE wrote: »

the globe which uses APA102 LEDs..

Yes, so these sorts of cool projects are exactly why I crave a faster P1. While you might be able to output SPI at 10 MHz from a P1, it's just not going to work out trying to read 10 MHz SPI. I mean, you could maybe do it for a specialised case where you capture the IO ports into cog RAM on every clock cycle and then post-process the samples, but in the generic sense, where the data can come at any time and in any bursts, this is not really practical.

So that one is in the HoloShield 2 category :-)

KeithE · 2017-06-02 19:15

If I remember correctly the APA102s can go up to 20 MHz, and the Propeller can burst out SPI data at this rate. (Watch out for slow voltage converter chips though.)

This is another thread of interest:

http://forums.parallax.com/discussion/159685/c-super-fast-spi-output

Anyways I would study that code and see if it helps you out. There are quotes such as the following in the thread: "Theoretically for a 80MHz clock, you can now drive SD cards at the specified max data rate (20MHz clock) for both input and output"

There is also some discussion about using the video generator for the output, and for getting fast bidirectional SPI.

http://forums.parallax.com/discussion/141112/using-the-video-generator-for-spi-i2s-etc#latest

Edited to add: and here http://forums.parallax.com/discussion/143679

VBB · 2017-06-02 20:34

KeithE wrote: »

>If I remember correctly the APA102s can go up to 20 MHz, and the Propeller can burst out SPI data at this rate. (Watch out for slow voltage converter chips though.)

Thanks for the tips but keep in mind the HoloShield is trying to capture this data not transmit it so not all of these tips apply directly but they do provide examples to test the capture against.

KeithE · 2017-06-02 20:44

I haven't studied everything in detail but there is discussion about fast input in that thread - e.g. see lonesock's quote.

Regardless of that, I'm not sure why fast output doesn't interest you. E.g. could you do what Fadecandy does? Perhaps using the video output would make a cog more available to handle dithering. And perhaps it could use APA102 LEDs so that it could handle more LEDs in a chain (20 MHz versus 800 kHz, so 25X as many?). That might catch someone's attention. (Micah Dowty, who did Fadecandy, was working on Prop USB and was very capable, so maybe there's a reason that Fadecandy didn't use a Prop though.)

jmg · 2017-06-02 20:56

VBB wrote: »

...
Anyhow, either the PIC32 or a dedicated FPGA would be a very good way to handle high-speed peripherals, but then suddenly you have moved away from the elegance of soft peripherals and are hacking together custom bits with custom peripherals and dealing with multiple toolchains and multiple code bases.

Yes and no..
The approach to take to the FPGA is not so much to make it soft, as make it flexible.
The FPGA binary code itself should be stable, and change rarely.
What the FPGA ultimately does is varied by how the P1 configures it.
ie you are doing the PIC32 Peripheral + remapping in the FPGA, but in a more focused manner.

VBB wrote: »

...
So if I can possibly get away with one elegant code base that just scales as the processor speed scales, it's hard to beat from a software perspective.

You already have 3 MCU's on that board, so that aspiration went away some time ago...

(Tho I think your USB MCU code will be quite stable, and illustrates what I mean in the stable-FPGA design case)

VBB wrote: »

..the globe which uses APA102 LEDs..
Yes so these sort of cool projects are exactly why I crave the faster P1. While you might be able to output SPI at 10Mhz from a P1 it's just not going to work out trying to read the 10Mhz SPI.

Another option could be to have a variable clock source ?
(of course, that adds more admin pain, so you have to decide which is more important..)

Oscillators are now low cost, and with impressive stability (see the FLiP thread here).
Some allow the choice of 2 frequencies, which is a simple alternative.
-or- Parts like the Si5351A allow any clock generation, but they are not so trivial to reconfigure.
Lowest local overhead on Si5351a, would be something like an i2c link on your USB MCU, that can output a ~100 entry table of address/data pairs, sent from PC host.

VBB · 2017-06-02 21:28

jmg wrote: »

You already have 3 MCU's on that board, so that aspiration went away some time ago...

Not exactly. My final aspiration, as I outlined a while ago, would be to merge the P1 core with an AE18 core:

http://forums.parallax.com/discussion/156936/can-a-crowdsourced-chip-be-developed-from-the-p1-verilog-code/p2

Thence I can program in unified Java, as I have extended my Java compiler to target the P1 so I can write soft peripherals in Java.

So that is a single code base for at least 2 of the 3 chips, but I get your point, and there may need to be some interim step between where I am and where I want to get to.

KeithE · 2017-06-02 21:51

If you want to capture data on the Propeller and post process on a host, then perhaps you should look at the Quicksample Propeller code for Viewport that Hanno wrote.

Yes, it takes 4 cycles for one instruction. So, one cog running at 80MHz can execute 80/4=20M instructions/second. By waiting a specific amount for each cog, you can interleave them. This allows 4 cogs to execute 20M * 4= 80M instructions/second. Quicksample uses dynamic programming to "write" an assembly program that moves 384 samples of INA values to cog memory. This allows ViewPort to take samples at 80Msamples/second- good for events as short as 12.5ns! What makes Quicksample a bit more complicated is its ability to trigger on any type of event and to scale the sampling rate to slower intervals. Please try this for yourself to convince yourself: Tutorial #4 uses a counter to create a high speed signal that is then sampled at 80Msps. Quicksample is able to detect a signal up to 40MHz- you need 2 samples/Hz to detect the high and low side of the signal.
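The arithmetic in that description can be checked with a few lines of Python. This is only a worked calculation using the numbers quoted above (80 MHz clock, 4 clocks per instruction, 4 interleaved cogs); the variable names are mine.

```python
CLOCK_HZ = 80_000_000        # system clock quoted above
CLOCKS_PER_INSTRUCTION = 4   # one instruction takes 4 clocks
INTERLEAVED_COGS = 4         # cogs started one clock apart

# One cog: 80 MHz / 4 = 20 M instructions (samples) per second.
per_cog_rate = CLOCK_HZ // CLOCKS_PER_INSTRUCTION

# Four cogs staggered by one clock each: 4 * 20 M = 80 M samples per second.
combined_rate = per_cog_rate * INTERLEAVED_COGS

# Time between consecutive samples, in nanoseconds.
sample_period_ns = 1e9 / combined_rate

# Two samples per signal period are needed to see both the high and low half.
max_signal_hz = combined_rate // 2

print(per_cog_rate, combined_rate, sample_period_ns, max_signal_hz)
```

Note that the combined rate equals the system clock, which is why four cogs is the useful limit for this trick: a fifth interleaved cog would have no free clock slot to sample in.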

You can find QuickSample on the Parallax site at https://www.parallax.com/product/32316, but I attached it for convenience.

jmg · 2017-06-02 23:40

VBB wrote: »

Thence I can program in unified java as I have extended my java compiler to target the P1 so I can write soft peripherals in java ( which I also use to develop the PIC18 application controller ) as below but the performance is not there yet for all types of peripherals so I am stuck somewhat with PASM.

Now that looks nifty.
What does that compile to, and can you do in-line PASM ?

VBB · 2017-06-03 09:18

KeithE wrote: »

If you want to capture data on the Propeller and post process on a host, then perhaps you should look at the Quicksample Propeller code for Viewport that Hanno wrote.

This sort of thing will be very useful for certain types of virtualizations but not for others. It can be useful, for example, if I want to add a virtual logic analyser with triggers to capture short bursts of data, which is a natural thing to do because I already have those for virtual signals. For general-purpose streamed data, for example capturing continuous LED data, my USB connection is not fast enough, as it's designed as HID for driverless cross-platform usage, so the Propeller needs to capture and delta compress the data to enable continuous streaming. The cogs can sure be used in interesting ways.
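As a rough illustration of why delta compression helps on a slow link, here is a Python sketch of one simple scheme: send only the entries that changed since the previous frame. This is an assumed scheme for illustration, not necessarily the encoding used on the HoloShield, and `delta_encode` and `delta_apply` are hypothetical names.

```python
def delta_encode(previous, current):
    """Return (index, value) pairs for entries of current that differ from previous."""
    if len(previous) != len(current):
        raise ValueError("frames must have the same length")
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]


def delta_apply(previous, delta):
    """Rebuild the current frame on the host from the previous frame and a delta."""
    frame = list(previous)
    for index, value in delta:
        frame[index] = value
    return frame


# Two frames of a hypothetical 8-pixel strip, one packed 0xRRGGBB value per pixel.
frame_a = [0x000000] * 8
frame_b = list(frame_a)
frame_b[2] = 0xFF0000
frame_b[5] = 0x00FF00

delta = delta_encode(frame_a, frame_b)
rebuilt = delta_apply(frame_a, delta)
```

When most pixels are unchanged between frames the delta is much smaller than the frame; when every pixel changes it is larger, so a real implementation would likely need a fallback to sending the full frame.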

jmg wrote: »

What does that compile to, and can you do in-line PASM ?

It compiles to a PASM framework. You will notice that the class extends the Cog base package; this is the framework, which contains support methods like waitForCounter and enqueue. Some of these are PASM helpers and some access the underlying framework. It's not completely generic, as it's designed to be part of a soft-peripheral architecture streaming peripheral data in and out from a main processor.
