Fastest Possible FIFO Buffer => To Infinity Shield Kickstarter and beyond
VBB
Hi,
I have just started evaluating the Propeller as a programmable peripheral co-processor for the 'next generation' version of one of my products.
On an edge condition of one of the pins I want to capture a port of 8 bits. The 8 bits can be assigned to any of the pins, in no specific order, so it's probably easiest to just capture them all.
How fast this can be done determines how many ways I will be able to use the propeller.
Naturally this will all be assembler. I am an expert in PicMicro assembly but this is my first 2 hours with the Propeller, so bear with the pseudo-code.
Method 1:
capture_loop:
    Wait(forPinEdgeCondition)
    MOV (InputRegister) to Main Memory at HUB_FIFO_WRITE_POINTER
    Increment HUB_FIFO_WRITE_POINTER
    If HUB_FIFO_WRITE_POINTER = HUB_FIFO_WRITE_POINTER_END then HUB_FIFO_WRITE_POINTER = HUB_FIFO_WRITE_POINTER_START
    GOTO capture_loop
So I am guessing about 30 instructions to achieve that and I will implement that as a first pass.
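To make the pointer handling in Method 1 concrete, here is a minimal Python model of the loop body. It is only a sketch of the circular-buffer logic, not Propeller code: the class name, the dictionary standing in for hub RAM, the addresses, and the 4-byte step (one long per sample) are all assumptions made for illustration.

```python
# Model of the Method 1 capture loop: one pin snapshot per edge, written to a
# circular buffer. All names and addresses are hypothetical, not Propeller APIs.

class HubFifo:
    def __init__(self, start, end):
        # [start, end) is an assumed hub address range reserved for the buffer
        self.start = start
        self.end = end
        self.write_ptr = start
        self.mem = {}  # stands in for hub RAM (address -> 32-bit value)

    def capture(self, input_register):
        """Store one 32-bit pin snapshot and advance the write pointer."""
        self.mem[self.write_ptr] = input_register & 0xFFFFFFFF
        self.write_ptr += 4  # assume one long (4 bytes) per sample
        if self.write_ptr == self.end:
            self.write_ptr = self.start  # wrap back to the start of the buffer


fifo = HubFifo(start=0x1000, end=0x1000 + 4 * 4)  # room for 4 longs
for sample in (0x11, 0x22, 0x33, 0x44, 0x55):     # pretend each is one edge
    fifo.capture(sample)
```

After the fifth capture the pointer has wrapped, so the oldest entry at 0x1000 has been overwritten by the newest sample. A real implementation would also need a read pointer so the consumer can tell how far behind it is.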
While I am doing that what I am interested in is suggestions in doing this even faster. Has anyone done this already?
The bottleneck is the HUB access, so I can imagine a scheme where the COGs take it in turns to capture a value, or even push to a local cache before passing control to another COG while pushing their captured data to main memory. It's not clear yet how to synchronize so delicately, or even if that's possible.
Sorry for not researching more beforehand, but I thought I would see if others have done this before or can point to an example of something like this.
Thanks
-- James
Comments
It writes the states of all the pins to the hub on an edge, given by condition and pin_mask. If you're sure that the edge-defining pulse is very short, you can omit the waitpne statement.
-Phil
http://onerobot.org/products/viewport/
Virtual logic analyzer: capture state of all 32 pins at up to 80Msps with trigger
and Propalyzer
http://forums.parallax.com/showthread.php/110762-Propalyzer-Distribution-New-Update-1.0.1.4-Available
Both run on the Propeller, and both are fast.
This was the start I needed, thanks! While I need to study it further, it seems to be using synchronized cogs to capture at full speed, so that seems possible, although I need to crunch through the details.
Also, I misunderstood the clock speed: I assumed a divide-by-4 instruction clock (as on the PicMicro), so I thought the 23 clocks for hub access was 4 times slower than it really is. In other words, the Propeller is faster than I thought!
What I am building is a hardware virtualizer, and while it's not exactly a logic analyser it has many similarities. 'Version 1' already works for many hardware configurations using a regular PicMicro's peripherals, but I am evaluating the potential of the Propeller to expand the library of hardware that can be virtualized.
For example, consider a T6963 graphic LCD screen. Host micros often keep an image in memory, which they draw to internally, and then dump the image using a tight loop. If it's a ChipKit PIC32 dumping the image, that can be very fast! In addition, there is extra logic such as a status read-back, so you have to switch between read and write modes within the specification of the T6963 (50ns) or risk driving pins at the wrong time. This is not really possible with a regular micro, but synchronized multi-core techniques down to 12.5ns look like they could do the trick, at least for writes. Even the Propeller might struggle to read back from RAM within a 50ns window, but this feature is not often used. You never know, though: the more I learn about the Propeller, the more might become possible!
Thanks for the pointers.
Cheers,
James
www.virtualbreadboard.com
Looking at the datasheet of the T6963 right now, I do NOT see any tight timing.
The 50ns hold time on write might limit the speed, but nothing critical.
And the operating frequency of 2.75 MHz max is also quite moderate.
If I am missing something, please show me ... ;-)
MJB
Well, I was looking mainly at tACC in the datasheet, which I attached. You're right - it's 150ns MAX, not 50ns.
I think the difference, though, is that I am trying to *BE* the T6963, not drive the T6963. So I have to handle the tightest possible cases to work with all host T6963 drivers, although being slower will work for many. BTW this is not just about the T6963, it's also for others like the KS0108 and many other types of hardware.
So at this stage I am investigating what techniques can be used with the Propeller to handle what is normally done in integrated logic.
Looks good so far :-)
Simple PLD might have worked as well, but Prop gave sooo much more.
Tightest was to release the bus fast enough, when the bus master strobed his priority request.
I wrote an analyser (cannot recall its name just at the moment - it's in the obex tools section).
I synchronised 4 cogs to interleave to achieve 12.5ns. BTW it's possible to reliably overclock to 100MHz (10ns) or better. I typically overclock my boards to 104MHz (6.5MHz xtal).
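The arithmetic behind the interleaving can be sketched in a few lines of Python. This is only a timing model under stated assumptions: each cog samples once every 4 clocks (one instruction), cog k starts k clocks after cog 0, and the 104MHz figure comes from a 16x PLL on the 6.5MHz crystal. The function names are made up for illustration.

```python
def clock_period_ns(clock_hz):
    """Length of one system clock in nanoseconds."""
    return 1e9 / clock_hz


def interleaved_sample_times_ns(clock_hz, cogs=4, clocks_per_sample=4, samples_per_cog=2):
    """Sample instants when cog k is started k clocks later than cog 0.

    Spacing is only even when cogs == clocks_per_sample (assumed here).
    """
    period = clock_period_ns(clock_hz)
    times = []
    for n in range(samples_per_cog):
        for cog in range(cogs):
            clock_index = n * clocks_per_sample + cog  # cog k lags by k clocks
            times.append(clock_index * period)
    return sorted(times)


times_80mhz = interleaved_sample_times_ns(80_000_000)
overclocked_hz = 6_500_000 * 16  # assumed 16x PLL on a 6.5MHz crystal
```

At 80MHz the four cogs together produce one sample every 12.5ns, even though each individual cog only samples every 50ns. The same model gives 10ns at 100MHz and roughly 9.6ns at 104MHz.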
FWIW each cog runs at full speed (typically 80MHz), so most instructions execute in 4 clocks = 50ns. Jumps often have conditionals (e.g. if_z) and still take 4 clocks. A jump only takes 8 clocks if it has no conditional and is not taken (e.g. a DJNZ that falls through).
There is actually a 6 stage pipeline where each instruction overlaps the previous instruction, giving an effective 4 clock execution.
Each cog is truly a 32 bit CPU. The only time a cog gets slowed is when it accesses hub memory, and here each cog gets 1 access in turn, in a fixed 16 clock cycle. This is why RD/WR BYTE/WORD/LONG take a variable number of clocks, depending on the cog's current position in the hub window. Therefore, we can execute exactly 2 instructions between 2 successive hub read/writes without missing a window.
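The hub timing can be modelled roughly as follows. This is a simplified sketch, not a cycle-accurate simulator: it assumes a hub instruction takes 8 clocks when it lands exactly on the cog's window and otherwise waits for the next window (giving the 8 to 23 clock range), and that ordinary instructions take 4 clocks. The function names and the `window_phase` parameter are made up for illustration.

```python
HUB_PERIOD = 16  # clocks between one cog's hub access windows
HUB_MIN = 8      # assumed clocks for a hub instruction that hits its window
INSTR = 4        # clocks for an ordinary instruction


def hub_op_clocks(issue_clock, window_phase=0):
    """Clocks a hub instruction takes when issued at issue_clock."""
    wait = (window_phase - issue_clock) % HUB_PERIOD  # stall until the window
    return HUB_MIN + wait


def loop_period(n_instructions):
    """Steady-state clocks per pass of: one hub write + n ordinary instructions."""
    next_issue = HUB_MIN + n_instructions * INSTR  # when the next hub op is issued
    stall = hub_op_clocks(next_issue) - HUB_MIN    # extra wait for the window
    return next_issue + stall


writes_per_second = 80_000_000 // loop_period(2)  # assumes an 80MHz clock
```

In this model a loop with up to 2 ordinary instructions between hub writes repeats every 16 clocks, while a third instruction misses the window and the loop drops to 32 clocks. At 80MHz that is 5 million hub writes per second per cog in the best case.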
We have a lot of tricks that we can use to increase performance.
Hope this helps. And don't hesitate to ask. There are many very knowledgeable people here just waiting to help, particularly when it gets very technical.
So I am making progress with the Propeller. For the external communications 'object' I have gone for an I2C slave FIFO reading a shared buffer, using one of the I2C objects as a starting point. I have it working with my master I2C controller board, so I am able to output values from the Propeller. I am now having trouble passing parameters, though, and it's not clear what I am doing wrong.
I want to pass a buffer length value, and when I do that with a byte and use rdbyte from the entry list it all works fine. However, the buffer length can be longer than 255, so I am trying to pass a long, but I can't seem to get the value to pass over. With the same parameter of 100 the output becomes 66 when using a long. I have been trying various combinations but it doesn't seem to make sense. Is there some trick/trap that I am not aware of? Byte alignment or byte order in the long, or some other declaration I am supposed to make?
Thanks!
The idea behind the VirtualShield-PRO is to virtualize a wide variety of hardware shields, taking advantage of the de facto Arduino 'BUS' standard to enable a range of host boards to access a mega catalog of virtual hardware shields at the signal level. Just like a 'mini matrix' for the Arduino form-factor host controller board.
With Version 1 of the VirtualShield I was already able to create many shields such as virtual LCDs, 7-segment displays, matrix LEDs, even a TFT screen, but there are limitations such as speed and fixed pin configurations, and also specialised logic interfaces that just couldn't be done with the regular master controller.
Enter the propeller! I have used the case study of the T6963 which is a popular graphics LCD. I used the propeller as an I2C slave 'programmable peripheral co-processor' to do the work of capturing the T6963 RD/WR signals and buffer it and then send the data through to the master controller on demand. I then drove the virtual hardware with an Arduino open source graphics library to test it.
Once captured, the data is then virtualized into a T6963 in the VBB software, and I am pleased to say it all works very nicely. Furthermore, I now have the pattern to apply to create additional virtual hardware: other LCDs, specialised lighting, motor drivers etc. The Propeller has opened up a huge number of possibilities and it really is an ideal micro for this application. It's been a bit of a learning curve but I think I have become a bit of a fan.
I have attached an animated gif recording of the virtualized T6963 running on a prototype VirtualShield-PRO. In the picture you can see the prototype, which uses an existing VirtualShield 1 (formerly called the ICEShield) attached to a Propeller-based ASC+ board, communicating via I2C. An Arduino UNO running the graphics driver program is the host controller for the 'stack'.
Next I need to design a new PCB with the Propeller and supporting chips integrated and figure out a few extra details like runtime firmware updates but the proof of concept is done.
The plan is to take the VirtualShield-PRO to Kickstarter to fund a production run. I hope you can give me some feedback and help me with that challenge. To help visualise the application concept I have also attached a block diagram of how things fit together to create a virtual:real interface. I also want to consider other de facto bus standards like the PICTAIL. Perhaps there is a de facto connectivity standard for Parallax products I should support. Suggestions on that would be appreciated.
Thanks!
-- James
www.virtualbreadboard.com
I wonder if there would be interest in a propeller module in VBB. I did a similar thing with an open source C# AVR implementation and that worked out quite well.
I do this full-time so I need to charge something for VBB modules but I try to keep it reasonable. Probably I will post a poll or something down the track to check on interest.
So having it in VBB (which I don't know, besides the shots on the web site) might complement this.
Being able to interface to virtual instruments for interactive IO is a big plus.
I know PROTEUS, which is a great tool for analog and digital design and simulation. It also has a great AVR module, which allows real model-based development with real simulated peripherals and analog/digital periphery.
Attaching a virtual display, a virtual potentiometer connected to a pin, or even virtual RS232, USB, Ethernet ... is a big boost in productivity.
p.s. I saw the costs $29 for different add on modules - but not the price for the base system ...
I agree GEAR standing alone is much less useful than it would be in the VBB universe and that's the interface I would charge for.
So with VBB you would be able to do all those things with a virtual Propeller, i.e. connect to virtual displays, potentiometers, Ethernet, virtual RS232 and more, and also to real devices via the VirtualShield-PRO.
The business model I use is to provide the base software free and charge for integrations with various microcontrollers. I am also rolling out a quick turn PCB service shortly so designs are also makeable which will also help fund things. That's one big difference with PROTEUS.
Actually, the idea behind the Virtual Shield project that started this thread is that you can use any microcontroller with the virtual hardware through the VirtualShield interface, so this is another type of microcontroller integration. This removes the need to have a microcontroller model for every single micro out there.
Kickstarter is coming soon so I am counting on your support!
This demo shows the Propeller-powered Infinity Shield (actually it's offscreen) doing real-time SPI capture and decoding of an SSD1306 display at 4MHz SPI. Pretty sweet for a software peripheral, and just enough grunt to reach 4MHz, which is the default Arduino SPI speed.
The demo shows a virtualization of the Arduboy, just for fun. Still working on the best conditions for HoloLens recordings, so the videos will get better, but it's time to start sharing.
The use of a texture is a bit fuzzier than a 3D model, but it's an experiment to allow users to apply their own. For example, this will work for the Propeller Badge (same OLED) just with a different texture and using the Propeller Badge code. So I will do that as an example also, i.e. virtualise the Propeller Badge. Anyone got any games for the Propeller Badge??
Also planning to virtualize the Lamestation, neopixels and more. The next few weeks will be fun as I roll out the demos and then the Kickstarter.
P1 is awesome for this application as a generic soft peripheral co-processor. Could do with more grunt though. It currently tops out at 4MHz SPI, which is the default Arduino setting, but for example the ArduBoy actually uses 8MHz SPI, so it doesn't work 'out of the box', which is a primary goal. So there is room for future acceleration.. P2?
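A rough budget calculation shows why 8MHz SPI is so much harder than 4MHz for a software peripheral. The Python sketch below assumes an 80MHz system clock and 4 clocks per ordinary instruction, as discussed earlier in the thread. The function names are made up, and it ignores tricks such as counters or multiple interleaved cogs.

```python
def clocks_per_spi_bit(sys_clock_hz, spi_hz):
    """System clocks available during one SPI bit time."""
    return sys_clock_hz / spi_hz


def instructions_per_spi_bit(sys_clock_hz, spi_hz, clocks_per_instr=4):
    """Ordinary instructions one cog can execute per SPI bit."""
    return clocks_per_spi_bit(sys_clock_hz, spi_hz) / clocks_per_instr


SYS_CLOCK_HZ = 80_000_000  # assumed; the actual shield clock is not stated
budget_4mhz = instructions_per_spi_bit(SYS_CLOCK_HZ, 4_000_000)
budget_8mhz = instructions_per_spi_bit(SYS_CLOCK_HZ, 8_000_000)
```

Under these assumptions a single cog has 5 instructions per bit at 4MHz but only 2.5 at 8MHz, which leaves very little room to sample the data pin, shift it in, and loop.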
Current Infinity Shield prototype - needs a new revision!
Not to brag, but I have done some really fun work to make this possible. Some of this will make its way into VBB over time.
* Visual Studio C# P1 instruction set emulator (PASM only) framework with full multi-core, multi-thread unit testing framework. Couldn't live without full featured debugging.
* Dynamic peripheral code generation based on VBB circuit configuration with open-spin compiling and just-in-time programming of the propeller on the shield on project launch
* Java AOT compilation of Java-defined hard peripherals and a framework for seamlessly working with the soft peripherals as Java threads.
More soon!
The Infinity Shield is intended to work pretty much exclusively with the Virtual Breadboard software - primarily the Windows Store UWP version for Windows 10, Windows Mobile and Windows IoT Core for Raspberry Pi. (A version for Linux/Mac will also be made available.) Any extra LEDs you might need can be virtualized in the VBB software and rendered over the physical board or any prototype you might be making - augmented/mixed reality style :-)
I chose to use a custom USB HID chip because HID USB is the only solution I have found that works with all the platforms without needing any drivers to be installed.
Apparently FTDI can be made to work with Windows 10 UWP, but it requires special procedures for a custom USB driver install. Basically a PITA! Instead I developed custom USB HID firmware, and also a Propeller firmware uploader that works with the USB HID chip, which can upload new firmware to either the Propeller or the embedded Java micro running the main firmware application.
You're better off just posting the bare link without any tags.
Here's the video which doesn't show up in the above post.