Fastest Possible FIFO Buffer => To Infinity Shield Kickstarter and beyond — Parallax Forums

VBB Posts: 52
edited 2017-03-18 09:59 in Propeller 1
Hi,

I have just started evaluating the propeller to become a programmable peripheral co-processor for the 'next generation' version of one of my products.

So on an edge condition of one of the pins I want to capture a port of 8 bits which can be assigned to any pins, i.e. 8 from any of the pins in no specific order, so it's probably easiest to just capture them all.

How fast this can be done determines how many ways I will be able to use the propeller.

Naturally this will all be assembler. I am an expert in PicMicro assembly but this is my first 2 hours with the propeller, so bear with the pseudo-code.

Method 1:

capture_loop:
Wait(forPinEdgeCondition)
MOV (InputRegister) to Main Memory at HUB_FIFO_WRITE_POINTER
Increment HUB_FIFO_WRITE_POINTER
If HUB_FIFO_WRITE_POINTER = HUB_FIFO_WRITE_POINTER_END then HUB_FIFO_WRITE_POINTER = HUB_FIFO_WRITE_POINTER_START
GOTO capture_loop:

So I am guessing about 30 instructions to achieve that and I will implement that as a first pass.

While I am doing that, what I am interested in is suggestions for doing this even faster. Has anyone done this already?

The bottleneck is the HUB access, so I can imagine a scheme where the COGS take turns to capture a value, or even push to a local cache before passing control to another COG while pushing their captured data to the main memory, but it's not clear yet how to synchronize so delicately or even if that's possible.

Sorry for not researching more beforehand, but I thought I would see if others have done this before or can point me to an example of something like this.

Thanks

-- James

Comments

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2014-06-06 13:43
    Welcome to the forum, James! Here's an example:
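    :loop   waitpne condition,pin_mask      ' wait until the masked pins differ from condition
            waitpeq condition,pin_mask      ' then wait for them to match it: this is the edge
            mov     temp,ina                ' snapshot all 32 input pins
            wrlong  temp,hub_ptr            ' store the snapshot in the hub FIFO
            cmp     hub_ptr,end_addr wz     ' Z is set if that was the last slot
      if_nz add     hub_ptr,#4              ' no: advance to the next long
      if_z  mov     hub_ptr,begin_addr      ' yes: wrap back to the first slot
            jmp     #:loop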
    

    It writes the states of all the pins to the hub on an edge, given by condition and pin_mask. If you're sure that the edge-defining pulse is very short, you can omit the waitpne statement.

    -Phil
  • MJB Posts: 1,235
    edited 2014-06-06 14:51
    Have a look at Viewport
    http://onerobot.org/products/viewport/
    Virtual logic analyzer: capture state of all 32 pins at up to 80Msps with trigger

    and Propalyzer
    http://forums.parallax.com/showthread.php/110762-Propalyzer-Distribution-New-Update-1.0.1.4-Available

    both with propeller - and fast
  • VBB Posts: 52
    edited 2014-06-07 02:58
    MJB wrote: »

    This was the start I needed... thanks! While I need to study it further, it seems to be using synchronized cogs to capture at full speed, so that seems possible, although I need to crunch through the details.

    Also, I misunderstood the clock speed and assumed a divide-by-4 instruction clock (as on the PicMicro), so I thought the 23 clocks for hub access were 4 times slower than they are, i.e. the propeller is faster than I thought!

    What I am building is a hardware virtualizer, and while it's not exactly a logic analyser it has many similarities. 'Version 1' already works for many hardware configurations using a regular PicMicro's peripherals, but I am evaluating the potential of the propeller to expand the library of hardware that can be virtualized.

    For example, consider a T6963 graphic LCD screen. Host micros often keep an image in memory which they draw to internally and then dump the image using a tight loop. If it's a ChipKit PIC32 dumping the image, that can be very fast! In addition there is additional logic like a status read back, so you have to switch between read and write modes within the specification of the T6963 (50ns) or risk driving pins at the wrong time. This is not really possible with a regular micro, but synchronized multi-core techniques down to 12.5ns look like they could do the trick, at least for write. Even the propeller might struggle to read back from RAM in a 50ns window, but this feature is not often used. You never know, though: the more I learn about the propeller, the more that might become possible!

    Thanks for the pointers.

    Cheers,
    James

    www.virtualbreadboard.com
  • MJB Posts: 1,235
    edited 2014-06-07 06:55
    VBB wrote: »
    In addition there is additional logic like a status read back, so you have to switch between read and write modes within the specification of the T6963 (50ns) or risk driving pins at the wrong time. This is not really possible with a regular micro, but synchronized multi-core techniques down to 12.5ns look like they could do the trick, at least for write. Even the propeller might struggle to read back from RAM in a 50ns window, but this feature is not often used. You never know, though: the more I learn about the propeller, the more that might become possible!
    I was using the T6963 some years ago and I could not remember such tight timing constraints.
    Looking at the datasheet of the T6963 right now, I do NOT see any tight timing.
    The 50ns hold time on write might limit the speed, but nothing critical.
    And the operating frequency of 2.75 MHz max is also quite moderate.
    If I am missing something, please show me ... ;-)

    MJB
  • VBB Posts: 52
    edited 2014-06-07 07:54
    Hi, thanks for the feedback.

    Well, I was looking mainly at tACC in the datasheet, which I attached. You're right - it's 150ns MAX, not 50ns.

    I think though the difference is I am trying to *BE* the T6963, not drive the T6963. So I have to handle the tightest possible cases to work for all host T6963 drivers, although being slower will work for many. BTW this is not just about the T6963, it's also for others like KS108 or many other types of hardware.

    So at this stage I am investigating what techniques can be used with the propeller to handle what is normally done in integrated logic.

    Looks good so far :-)
    [Attachment: image, 1024 x 854, 82K]
  • MJB Posts: 1,235
    edited 2014-06-07 09:12
    VBB wrote: »
    I think though the difference is I am trying to *BE* the T6963, not drive the T6963. So I have to handle the tightest possible cases to work for all host T6963 drivers, although being slower will work for many. BTW this is not just about the T6963, it's also for others like KS108 or many other types of hardware.

    So at this stage I am investigating what techniques can be used with the propeller to handle what is normally done in integrated logic.

    Looks good so far :-)
    I recently used the Prop to replace a HW interface for a legacy 8-bit bus.
    Simple PLD might have worked as well, but Prop gave sooo much more.
    Tightest was to release the bus fast enough, when the bus master strobed his priority request.
  • Cluso99 Posts: 18,069
    edited 2014-06-07 20:31
    First, welcome to the forum.

    I wrote an analyser (cannot recall its name just at the moment - it's in the obex tools section).
    I synchronised 4 cogs to interleave to achieve 12.5ns. BTW it's possible to reliably overclock to 100MHz (10ns) or better. I typically overclock my boards to 104MHz (6.5MHz xtal).
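
    The interleaving described above can be sketched roughly as follows. This is only an illustration and not the code from the OBEX analyser: the names launch, sampler, startAt, start and s0..s3 are made up here, and the 100 ms lead time is an assumption chosen to be far longer than it takes to load four cogs. Each cog waits for a system-counter value one clock later than its neighbour and then samples INA with back-to-back MOVs, so the four cogs together see the pins once per clock (12.5ns at 80MHz).

    ```spin
    VAR
      long  startAt[4]                      ' one start time per sampling cog (hypothetical name)

    PUB launch | i, t
      t := cnt + clkfreq / 10               ' assumed lead time: ~100 ms, ample for loading four cogs
      repeat i from 0 to 3
        startAt[i] := t + i                 ' cog i starts i clocks after cog 0
        cognew(@sampler, @startAt[i])

    DAT             org     0
    sampler         rdlong  start, par              ' fetch this cog's start time from the hub
                    waitcnt start, #0               ' resume when CNT equals start
                    mov     s0, ina                 ' back-to-back MOVs sample the pins
                    mov     s1, ina                 ' every 4 clocks within this cog
                    mov     s2, ina
                    mov     s3, ina
    :park           jmp     #:park                  ' sketch only: a real analyser would now store the samples

    start           res     1
    s0              res     1
    s1              res     1
    s2              res     1
    s3              res     1
    ```

    A real capture would unroll many more MOVs per cog and then merge the four sample sets in time order; that part is left out here.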

    FWIW each cog runs at full speed (typically 80MHz), so instructions mostly execute in 4 clocks = 50ns. Instructions with a condition (e.g. if_z) still take 4 clocks when the condition fails. Jumps also take 4 clocks; only the test-and-jump instructions (e.g. DJNZ) take 8 clocks, and only when the jump is not taken.

    There is actually a 6 stage pipeline where each instruction overlaps the previous instruction, giving an effective 4 clock execution.

    Each cog is truly a 32 bit cpu. The only time cogs get slowed is when they access hub memory: each cog gets one access in turn, in a fixed 16 clock cycle. This is why RD/WR BYTE/WORD/LONG take a variable number of clocks (8 to 23), depending on where the cog currently is relative to its hub window. A hub instruction that is already in step with the window takes 8 clocks, which leaves 8 clocks, so we can execute exactly 2 instructions between 2 successive hub read/writes without missing a window.
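
    To make the "two instructions between hub accesses" rule concrete, here is a minimal sketch of a loop that writes one long to the hub every 16 clocks. The names fill, value, hub_ptr and count, the 128-long length, the test pattern and the use of PAR for the buffer address are all assumptions for illustration.

    ```spin
    DAT             org     0
    fill            mov     hub_ptr, par            ' assumption: PAR holds a long-aligned hub buffer address
                    mov     count, #128             ' assumption: write 128 longs
    :loop           wrlong  value, hub_ptr          ' 8 clocks once in step with this cog's hub window
                    add     hub_ptr, #4             ' 4 clocks
                    djnz    count, #:loop           ' 4 clocks when taken: 16 clocks per pass
    :done           jmp     #:done                  ' park the cog

    value           long    $55AA55AA               ' arbitrary test pattern
    hub_ptr         res     1
    count           res     1
    ```

    Adding a third ordinary instruction to the loop, such as a MOV from INA to sample the pins, makes the next WRLONG miss its window, so each pass then takes 32 clocks rather than 16.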

    We have a lot of tricks that we can use to increase performance.

    Hope this helps. And don't hesitate to ask. There are many very knowledgeable people here just waiting to help, particularly when it gets very technical.
  • VBB Posts: 52
    edited 2014-06-16 00:55
    Hi again,

    So I am making progress with the propeller. For the external communications 'object' I have gone for an I2C slave FIFO reading a shared buffer, using one of the I2C objects as a starting point. I have it working with my master I2C controller board, so I am able to output values from the propeller. I am having trouble now, though, passing parameters, and it's not clear what I am doing wrong.

    I want to pass a buffer length value and when I do that with a byte and use rdbyte from the entry list it all works fine. However the buffer length can be longer than 255 so I am trying to pass a long but I can't seem to get the value to pass over. With the same parameter of 100 the output becomes 66 when using a long. I have been trying various combinations but it doesn't seem to make sense. Is there some trick/trap that I am not aware of? Byte alignment or byte order in the long or some other declaration I am supposed to make?

    Thanks!
    VAR
      
      byte  _slave_address
      byte  SCL_pin
      byte  SDA_pin
     { byte _bufferLength  {works value is passed value of 100}}
     { long _bufferLength  {doesnt work value = 66?}}
      long _writePos   
      long _bufferStartAddress
          
      byte  cog
    
    
    PUB start(clk_pin, data_pin, slave_address, writePos, buffer, bufferLen) : okay
      stop
      
      _slave_address := slave_address
      SCL_pin := clk_pin
      SDA_pin := data_pin
              
      _writePos := writePos
      _bufferStartAddress := buffer
      _bufferLength :=  bufferLen
           
      okay := cog := cognew(@entry, @_slave_address) + 1
    
    
    PUB stop
      if cog
        cogstop(cog~ - 1)
        
    DAT                     org
    entry
                            mov       t1,par                                        '  pin assignments from the VAR block,  
                            rdbyte    device_address,t1                             '  and create bit masks
                            shl       device_address,#1
                            add       t1,#1
                            rdbyte    t2,t1
                            mov       SCL_mask,#1                                 
                            shl       SCL_mask,t2
                            add       t1,#1
                            rdbyte    t2,t1
                            mov       SDA_mask,#1
                            shl       SDA_mask,t2
                            add       t1,#1
                           { rdlong    buffer_len,t1   }
                            {mov      buffer_len, #101  { This works so its not the format of buffer_len when being sent} }
                            rdbyte    buffer_len,t1   {works when the definition is a byte } 
    
  • kuroneko Posts: 3,623
    edited 2014-06-16 01:01
    Variables for an object are sorted by size: longs first, followed by words and then bytes. So just rearrange your parameter area and go from there, i.e. change the cognew parameter to the first long and adjust the read order in PASM.
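
    As a sketch of that rearrangement, using the variable names from the post above: the VAR block is written in the order Spin actually stores it, cognew passes the address of the first long, and the PASM reads the longs before the bytes. write_pos and buffer_start are hypothetical cog register names, and the I2C slave loop itself is not shown, so the cog simply parks after reading its parameters.

    ```spin
    VAR
      long  _bufferLength                   ' longs first, matching Spin's own ordering
      long  _writePos
      long  _bufferStartAddress
      byte  _slave_address                  ' bytes follow the longs in memory
      byte  SCL_pin
      byte  SDA_pin
      byte  cog

    PUB start(clk_pin, data_pin, slave_address, writePos, buffer, bufferLen) : okay
      stop
      _slave_address := slave_address
      SCL_pin := clk_pin
      SDA_pin := data_pin
      _writePos := writePos
      _bufferStartAddress := buffer
      _bufferLength := bufferLen
      okay := cog := cognew(@entry, @_bufferLength) + 1   ' pass the address of the first long

    PUB stop
      if cog
        cogstop(cog~ - 1)

    DAT             org     0
    entry           mov     t1, par
                    rdlong  buffer_len, t1          ' _bufferLength
                    add     t1, #4
                    rdlong  write_pos, t1           ' _writePos
                    add     t1, #4
                    rdlong  buffer_start, t1        ' _bufferStartAddress
                    add     t1, #4
                    rdbyte  device_address, t1      ' _slave_address
                    shl     device_address, #1
                    add     t1, #1
                    rdbyte  t2, t1                  ' SCL_pin
                    mov     SCL_mask, #1
                    shl     SCL_mask, t2
                    add     t1, #1
                    rdbyte  t2, t1                  ' SDA_pin
                    mov     SDA_mask, #1
                    shl     SDA_mask, t2
    :park           jmp     #:park                  ' placeholder: the I2C slave loop would go here

    t1              res     1
    t2              res     1
    buffer_len      res     1
    write_pos       res     1
    buffer_start    res     1
    device_address  res     1
    SCL_mask        res     1
    SDA_mask        res     1
    ```

    Declaring the VAR block in its stored order is not required by the compiler, but it keeps the source and the PASM read sequence visibly in step.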
  • VBB Posts: 52
    edited 2014-06-16 01:13
    Yep - that was it! Thanks.. not something I would have guessed.
  • VBB Posts: 52
    edited 2014-06-21 04:43
    Success! Thanks to the feedback from the forum I have been able to add the propeller as a signal capture 'co-processor' and am well on the way now to creating the 'VirtualShield-PRO'

    The idea behind the VirtualShield-PRO is to virtualize a wide variety of hardware shields, taking advantage of the de facto Arduino 'BUS' standard to enable a range of host boards to access a mega catalog of virtual hardware shields at the signal level. Just like a 'mini matrix' for the Arduino form-factor host controller board.

    With Version 1 of the VirtualShield I was already able to create many shields such as virtual LCDs, 7 segment displays, matrix LEDs and even a TFT screen, but there are limitations such as speed and fixed pin configurations, and also specialised logic interfaces that just couldn't be done with the regular master controller.

    Enter the propeller! I have used the case study of the T6963, which is a popular graphics LCD. I used the propeller as an I2C slave 'programmable peripheral co-processor' to do the work of capturing the T6963 RD/WR signals, buffering them and then sending the data through to the master controller on demand. I then drove the virtual hardware with an Arduino open source graphics library to test it.

    Once captured, the data is then virtualized into a T6963 in the VBB software, and I am pleased to say it all works very nicely. Furthermore I now have the pattern to apply to create additional virtual hardware: other LCDs, specialised lighting, motor drivers etc. The propeller has opened up a huge number of possibilities and it really is an ideal micro for this application. It's been a bit of a learning curve but I think I have become a bit of a fan.

    I have attached an animated gif recording of the virtualized T6963 running on a prototype VirtualShield-PRO. The picture of the prototype shows an existing VirtualShield 1 (formerly called the ICEShield) attached to a propeller based ASC+ board, communicating via I2C. An Arduino UNO running the graphics driver program is the host controller for the 'stack'.


    Next I need to design a new PCB with the Propeller and supporting chips integrated and figure out a few extra details like runtime firmware updates but the proof of concept is done.

    The plan is to take the VirtualShield-PRO to Kickstarter to fund a production run. I hope you can give me some feedback and help me with that challenge. To help visualise the application concept I have also attached a block diagram of how things fit together to create a virtual:real interface. I also want to consider other de facto bus standards like the PICTAIL. Perhaps there is a de facto connectivity standard for Parallax products I should support. Suggestions on that would be appreciated.

    Thanks!

    -- James

    www.virtualbreadboard.com
    [Attachments: three images (480 x 440, 179K; 661 x 225, 123K; 647 x 628, 30K)]
  • MJB Posts: 1,235
    edited 2014-06-24 05:45
    just had a look VBB looks great
  • VBB Posts: 52
    edited 2014-06-24 11:39
    Thanks! Having made a start with the propeller I am contemplating integrating the open source C# 'Gear' propeller emulator as a VBB micro module. VBB tends to grow with what I use!

    I wonder if there would be interest in a propeller module in VBB. I did a similar thing with an open source C# AVR implementation and that worked out quite well.

    I do this full-time so I need to charge something for VBB modules but I try to keep it reasonable. Probably I will post a poll or something down the track to check on interest.
  • MJB Posts: 1,235
    edited 2014-06-25 08:02
    VBB wrote: »
    Thanks! Having made a start with the propeller I am contemplating integrating the open source C# 'Gear' propeller emulator as a VBB micro module. VBB tends to grow with what I use!

    I wonder if there would be interest in a propeller module in VBB. I did a similar thing with an open source C# AVR implementation and that worked out quite well.

    I do this full-time so I need to charge something for VBB modules but I try to keep it reasonable. Probably I will post a poll or something down the track to check on interest.
    One thing that I did not like about GEAR is that it provides only very limited IO functionality.
    So having it in VBB (which I don't know .. besides the shots on the web site) might complement this.
    Being able to interface to virtual instruments for interactive IO is a big plus.
    I know PROTEUS which is a great tool for analog and digital design and simulation - it also has a great AVR module, which allows real
    model based development with real simulated peripherals and analog / digital periphery.
    Attaching a virtual Display, a virtual poti connected to a pin, or even virtual RS232, USB Ethernet ...
    is a big boost in productivity.

    p.s. I saw the costs $29 for different add on modules - but not the price for the base system ...
  • VBB Posts: 52
    edited 2014-06-25 09:54
    Yes, there are many similarities between VBB and PROTEUS. VBB is like PROTEUS for everyone, arguably VBB is easier to use and targets MAKER and self-education types. The Arduino features have been the most popular so that gives some idea of the type of users.

    I agree GEAR standing alone is much less useful than it would be in the VBB universe and that's the interface I would charge for.

    So with VBB you would be able to do all those things with a virtual propeller, i.e. connect to virtual displays, potentiometers, Ethernet, virtual rs232 and more, and also to real devices via the VirtualShield-PRO.

    The business model I use is to provide the base software free and charge for integrations with various microcontrollers. I am also rolling out a quick turn PCB service shortly so designs are also makeable which will also help fund things. That's one big difference with PROTEUS.

    Actually, the idea behind the Virtual Shield, the project that started this thread, is that you can use any microcontroller with the virtual hardware through the VirtualShield interface, so this is another type of microcontroller integration. This removes the need to have a microcontroller model for every single micro out there.
  • Getting there.. the P1 powered 'Virtual Shield Pro' has been renamed to 'Infinity Shield' and the product line has shifted focus into Mixed Reality with support for HoloLens and smart phone viewers.

    Kickstarter is coming soon so I am counting on your support!



    This demo shows the propeller powered infinity shield (actually it's offscreen) doing real time SPI capture and decoding of an SSD1306 display at 4MHz SPI. Pretty sweet for a software peripheral, and just enough grunt to reach the 4MHz which is the default Arduino SPI speed.

    The demo shows a Virtualization of the Arduboy just for fun. Still working on the best conditions for HoloLens recordings, so videos will get better, but time to start sharing.
    The use of a texture is a bit fuzzier than a 3d model but it's an experiment to allow users to apply their own. For example this will work for the Propeller Badge ( Same OLED ) just with a different texture and using the propeller badge code. So I will do that as an example also ie virtualise the Propeller Badge. Anyone got any games for the Propeller Badge??

    Also planning to virtualize the Lamestation, neopixels and more.. Next few weeks will be fun as I roll out the demos and then the kickstarter.

    P1 is awesome for this application as a generic soft peripheral co-processor. Could do with more grunt though. Currently tops out at 4MHz SPI, which is the default Arduino setting, but for example ArduBoy actually uses 8MHz SPI, so it doesn't work 'out of the box', which is a primary goal. So room for future acceleration.. P2?
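
    As a rough illustration of why 8MHz is a stretch, here is the per-bit clock budget written as Spin constants. This assumes the P1 runs at the typical 80MHz quoted earlier in the thread; the constant names are made up for this sketch and the actual capture code is not shown in the thread.

    ```spin
    CON
      CLK_HZ          = 80_000_000                  ' assumed system clock
      SPI_4MHZ_CLOCKS = CLK_HZ / 4_000_000          ' 20 clocks per SPI bit = five 4-clock instructions
      SPI_8MHZ_CLOCKS = CLK_HZ / 8_000_000          ' 10 clocks per SPI bit = two and a half instructions
    ```

    At 8MHz a single cog has time for only two full instructions per bit, which is why some form of multi-cog interleaving or faster hardware would be needed.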

    InfinityShieldProto.png

    Current Infinity Shield prototype - needs a new revision!

    Not to brag, but I have done some really fun work to make this possible. Some of this will make its way into VBB over time.
    * Visual Studio C# P1 instruction set emulator (PASM only) framework with full multi-core, multi-thread unit testing framework. Couldn't live without full featured debugging.
    * Dynamic peripheral code generation based on VBB circuit configuration with open-spin compiling and just-in-time programming of the propeller on the shield on project launch
    * Java AOT compilation of java defined hard peripherals and framework for seamlessly working with the soft peripherals as java threads.

    More soon!
  • Nice work. Are there components on the rear? I don't see 4 bypass caps close to the prop power pins.
  • No, components are only on the top. Nice to get such a design tip - probably I should publish the final schematic for tips like this before committing to a final production run.
  • T Chap Posts: 4,223
    edited 2017-03-18 13:55
    If you have any free pins you could add some leds to use for various reasons. If this is for general public I suggest not using ftdi with driver requirements. I have lost much time trying to help customers upgrade firmware and they can't get ftdi to program mysteriously so they go borrow other computers to get it done. Tons of hours lost over the years. I suggest silabs cp2110 USB uart. I have a prop loader for it and the app I built can be modified for your own cosmetics in Xojo.
  • There are a couple of basic LEDS, power, rx/tx, pin 13 but this is not a general purpose development board ( that's a future product line though )

    The Infinity Shield is intended to work pretty much exclusively with the Virtual Breadboard software - primarily the Windows Store UWP version for Windows 10, Windows Mobile and Windows IoT Core for Raspberry Pi. (Also a version for Linux/mac will be made available). Any extra LED's you might need can be virtualized in the VBB software and rendered over the physical board or any prototype you might be making - augmented/mixed reality style :-)

    I chose to use a custom USB HID Chip because HID USB is the only solution I have found that works with all the platforms without needing any drivers to be installed..

    Apparently FTDI can be made to work with Windows 10 UWP but requires special procedures for a custom USB driver install. Basically a PITA! Instead I developed custom USB HID firmware and also the propeller firmware uploader to work with the USB HID chip, which can upload new firmware to either the propeller or the embedded java micro running the main firmware application.
  • There's a glitch in the way the forum handles the "YouTube" embed option.

    You're better off just posting the bare link without any tags.

    Here's the video which doesn't show up in the above post.


