Tachyon V4 "DAWN" - exploring new worlds

Comments

  • Peter Jakacki Posts: 6,087
    edited January 18
    I would probably just put in enough cog code to load multiple sequential blocks. Even if the files were fragmented, the cluster size is still 32kB, so there is no need to even follow the cluster chain.

    Also SD cards are formatted with a 2MB partition before the FAT32 partition to optimize how the SD card controller handles wear leveling and prioritizing FAT tables etc. A quick scan of an 8GB card shows it is all blank in the first 4194304 bytes except for sector 0.

    So we could even just stick an image or images somewhere in there so the bootloader wouldn't even need to worry about the FAT, it could just init the card and load in from a starting sector straight into RAM.
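
    The raw-sector idea above comes down to simple arithmetic. A minimal Python sketch, assuming standard 512-byte sectors and an arbitrary start sector; the `image_sectors` helper and the start sector of 64 are hypothetical and not part of Tachyon:

```python
SECTOR_BYTES = 512          # assumed standard SD sector size
GAP_BYTES = 4194304         # blank region reported by the card scan above
IMAGE_BYTES = 32 * 1024     # one full 32kB hub image

def image_sectors(start_sector, image_bytes=IMAGE_BYTES):
    """Return the range of raw sectors an image would occupy in the gap."""
    count = -(-image_bytes // SECTOR_BYTES)      # ceiling division
    last = start_sector + count - 1
    # Sector 0 is in use, and the image must end before the gap does.
    if start_sector < 1 or (last + 1) * SECTOR_BYTES > GAP_BYTES:
        raise ValueError("image would not fit in the blank gap after sector 0")
    return range(start_sector, start_sector + count)

sectors = image_sectors(64)  # 64 is a hypothetical start sector
print(len(sectors), sectors[0], sectors[-1])
```

    A bootloader following this scheme would only need the start sector and a sector count, with no FAT parsing at all.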

    Anyway I'm doing the FTP style load over the ping-pong network so it doesn't matter as much but either way I could also save the image to the SD card that way. The main thing is that I want to treat the lower 32k of EEPROM as boot ROM.
  • Peter Jakacki Posts: 6,087
    edited January 29
    PING PONG+ update
    As many of you know, I have a multidrop communications network built into Tachyon called ping-pong, named for the one-for-one transactions between a master and a slave Prop. This network mainly supports RS-485 for long distances but is also usable with simple single-I/O networks, usually with a small-value resistor in series with the pin. The ping-pong nature allows me to effect virtual full-duplex one-to-one connections over long distances that operate as if I am plugged directly into the serial console port, so that I can interact with Tachyon the same way.

    Over the last week or so I have added 9-bit mode so that I can transmit 8-bit binary data transparently if needed, and some of the $100-$1FF address codes are reserved as network command codes. Using these codes one Prop can access the hub memory of another Prop in the background and use this to clone an image onto the whole network of matching devices. At present at 2Mbps I can handle transfers at around 100kB/sec, so a full hub write takes around 300ms. Because groups of devices can be selected, this means I can have the basic networking and perhaps Tachyon preloaded in EEPROM on each device, yet have the master download the final image at startup to all similar devices in parallel. Since this is async there are no special tricks needed such as matching clocks etc., plus even standard MCUs with 9-bit mode UARTs can be used, and indeed this is the plan too.
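
    The figures above can be checked with a few lines of Python. This is only a back-of-envelope sketch: the 100kB/sec throughput is the number quoted in the post, and the 40-device count is borrowed from the project mentioned later in it; `transfer_ms` is an illustrative helper, not a Tachyon word.

```python
HUB_BYTES = 32 * 1024        # Propeller 1 hub RAM
THROUGHPUT = 100_000         # bytes/sec, as quoted for 2Mbps

def transfer_ms(nbytes, bytes_per_sec=THROUGHPUT):
    """Time in milliseconds to move nbytes at the given throughput."""
    return 1000.0 * nbytes / bytes_per_sec

one_image = transfer_ms(HUB_BYTES)
print(f"one full hub image: {one_image:.0f} ms")

# With a group selected in listen-only mode, every device receives the
# same write, so the time does not grow with the number of devices.
# Loading 40 devices one at a time would instead take:
print(f"40 devices sequentially: {40 * one_image / 1000:.1f} s")
```

    The parallel case is what makes downloading the final image at startup practical on a large network.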

    The network speed has been tested up to 4Mbps but I have some tweaking to do to get it to run reliably at that rate. This includes the HSUART communications ROM too as this is used for the master.

    Anyway, this mode might be beyond most simple LED blinker uses but I'm stoked that I can now develop on one board and immediately upgrade the software on a whole network in the blink of an eye. My current project uses about 40+ Props in such a network. Now to polish it up and I may add some modes.

    PING PONG COMMANDS:
    SELECT n - select a point to point connection with a device, 0 deselects all
    GLOBAL - select all devices in listen only mode
    GROUP n - select a group of devices that match in listen only mode
    INDUCT - add another device as a member to the current selection
    WRITE addr cnt <data> - write to hub memory
    READ addr cnt <data> - read from hub memory
    HALT - halt all other cogs in the selected devices
    INIT addr par cog - load a cog in the selected devices
    RESET - reset all devices
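
    As a rough illustration of how 9-bit mode keeps binary data transparent, here is a minimal Python sketch of a receiver separating plain data bytes from the $100-$1FF range. Which codes in that range are device addresses and which are command codes is not stated in the post, so the sketch only distinguishes the two ranges; `classify` is a hypothetical helper.

```python
def classify(word9):
    """Split a received 9-bit word into (kind, low 8 bits)."""
    if not 0 <= word9 <= 0x1FF:
        raise ValueError("not a 9-bit value")
    if word9 & 0x100:
        # 9th bit set: an address or network command code ($100-$1FF)
        return ("address_or_command", word9 & 0xFF)
    # 9th bit clear: an ordinary 8-bit data byte, any value $00-$FF
    return ("data", word9)

print(classify(0x041))   # plain data byte
print(classify(0x105))   # falls in the $100-$1FF range
```

    Because every 8-bit value is valid data when the 9th bit is clear, no escaping is needed when hub memory is streamed with WRITE or READ.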


  • Peter-
    I personally find it really cool when users such as you and JonnyMac use the Propeller in industrial settings. As a maintenance technician, I see PLCs all over the place, and talking to coworkers you would think that nothing else could exist in an industrial environment. Then I come home and see it proved otherwise. Awesome job!
  • Peter Jakacki Posts: 6,087
    edited January 30
    Shawn Lowe wrote: »
    Peter-
    I personally find it really cool when users such as you and JonnyMac use the Propeller in industrial settings. As a maintenance technician, I see PLCs all over the place, and talking to coworkers you would think that nothing else could exist in an industrial environment. Then I come home and see it proved otherwise. Awesome job!

    Thanks Shawn, I enjoy using the Prop. Even without P2 I can certainly squeeze a lot out of it, and more each day it seems. It's a shame that real P1 use is languishing because so many are holding off for P2 (for years), and it seems it's mostly because of language choices. However, I'm having a lot of fun at the moment cloning from one Prop to many with the new ping-pong loader commands: I type "CLONE" and the selected units immediately start up as clones :) Of course I can also specify a file image to use instead. I'm not just "cloning" around either; as you know, this is real serious industrial stuff.

  • Peter,
    I like what I see with your new Tachyon. I have two questions: what is the best way to start learning Forth, and have you ever tried running your ping-pong net wirelessly, for example via XBee?
    Jim
  • RS_Jim wrote: »
    Peter,
    I like what I see with your new Tachyon. I have two questions: what is the best way to start learning Forth, and have you ever tried running your ping-pong net wirelessly, for example via XBee?
    Jim

    Hi Jim, I think the best way to learn Forth on an MCU vs PC is to play with the hardware. So if you had a ping sensor, or LEDs, or motors etc, then try simple things and build up from there. This way you are learning the language by using it to do something, and see the results. Once you start seeing how quick and interactive it is you will get hooked. Avoid "learning Forth" on a PC and avoid trying to convert code from one language to another. Normally there are different and better ways to implement functions in Forth.

    However, as to your question regarding using ping-pong over the likes of XBee, I would think that this wouldn't be suitable, as ping-pong relies on fast transmit/receive switching, which is fine for I/O pins and RS-485, but wireless loves its preambles. You might think that the XBee's UART "transparent data mode" should work with ping-pong, but it is still RF and it would have to packetize even a single character, much like Ethernet. So you could do it, but I don't think that the effective speed would be great. However, don't let me stop you, although I might hook up some simple RF modules and try this myself.

  • I was thinking about Ping-Pong over wireless/RF, or maybe even ESP8266 as well, to extend the network without cables.
    But I did not spend enough time to understand Ping-Pong.
    Is it possible to ping-pong on a command/response or 'line' level instead of the character level you mention above?
    For low speed that would be OK for my applications.


  • Peter Jakacki Posts: 6,087
    edited February 1
    MJB wrote: »
    I was thinking about Ping-Pong over wireless/RF, or maybe even ESP8266 as well, to extend the network without cables.
    But I did not spend enough time to understand Ping-Pong.
    Is it possible to ping-pong on a command/response or 'line' level instead of the character level you mention above?
    For low speed that would be OK for my applications.

    Well, even though ppnet (might be easier to refer to it that way) does ping-pong at the byte level, it also works with blocks of data when it reads and writes hub memory, so I guess this is possible. I will look into it as it might be useful for me to bridge the hardwired network in some cases. I haven't really had time to play with the ESP chips yet, but since we can use these in telnet mode they should work well too.
  • Peter Jakacki Posts: 6,087
    edited February 12
    Even though V4 is still in the testing folder I have been using it in new designs in place of V3. This has helped me fill in the holes so to speak and I have included various optimizations as well. Take for instance the humble high level toggle.

    V4 250kHz toggle
    ( 0011 $18C0  ok )   LAP 1000000 FOR 29 HIGH 29 LOW NEXT LAP .LAP
    384000224 cycles at 96000000Hz  or 4000.002ms   
    

    V3 186kHz toggle
    LAP 1000000 FOR 29 HIGH 29 LOW NEXT LAP .LAP 5.333secs ok
    

    Contrast this with pure Spin, which comes in at 9.71kHz (@96MHz clock), over 25 times slower than V4:
    pub toggle
      dira[29] := 1
      repeat
        outa[29] := 1
        outa[29] := 0
    
    The simpler looking Spin that uses !outa[29] is much slower again at 7.19kHz

    So I may be moving V4 across to the main folder and put V3 and V4 into sub-folders.

    btw, Using the fast pin method V4 can toggle at 1MHz
    ( 0019 $18C0  ok )   29 MASK MPIN 1000000 LAP FOR H L NEXT LAP .LAP
    96000176 cycles at 96000000Hz  or 1000.001ms 
    

    The fast pin allows high-level code to pulse lines faster than 1us, or precisely 1us with "L L L H"
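
    For anyone wanting to check the quoted rates, the .LAP cycle counts convert to toggle frequencies as in this Python sketch. The cycle counts, loop count and clock are taken from the listings above; `toggle_hz` is just an illustrative helper.

```python
CLK_HZ = 96_000_000          # system clock reported by .LAP

def toggle_hz(loops, cycles, clk_hz=CLK_HZ):
    """Loop iterations per second, given the cycle count from .LAP."""
    return loops * clk_hz / cycles

v4 = toggle_hz(1_000_000, 384_000_224)     # 29 HIGH 29 LOW loop
fast = toggle_hz(1_000_000, 96_000_176)    # H L loop on the fast pin
spin_hz = 9710.0                           # Spin figure quoted above

print(f"V4 HIGH/LOW: {v4 / 1e3:.1f} kHz ({384_000_224 // 1_000_000} cycles per loop)")
print(f"V4 fast pin: {fast / 1e3:.1f} kHz ({96_000_176 // 1_000_000} cycles per loop)")
print(f"V4 vs Spin:  {v4 / spin_hz:.1f}x")
```

    Each iteration of the HIGH/LOW loop therefore costs about 384 clocks, against about 96 for the fast pin version.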
  • Hi Peter,
    I just downloaded V4, thanks for making it public.
    It's indeed much faster. With V3 I had to set a char delay of 1ms to be able to load any file (with a 6MHz xtal). This now works without any char delay, nice.
    But I have the problem that after loading extend-v4.fth it hangs: no boot-up message, nothing, not even the serial TX is active. There is nothing special on my board, just a 64K EEPROM and some SPI devices.
    BTW there is an error in EXTEND-V4.FTH at line 163. It complains that id is unknown.

    Cheers from Taiwan
    Reinhardt, putting Propellers into real helicopters.

    --
    Reinhardt
  • Peter Jakacki Posts: 6,087
    edited February 12
    rbehm wrote: »
    Hi Peter,
    I just downloaded V4, thanks for making it public.
    It's indeed much faster. With V3 I had to set a char delay of 1ms to be able to load any file (with a 6MHz xtal). This now works without any char delay, nice.
    But I have the problem that after loading extend-v4.fth it hangs: no boot-up message, nothing, not even the serial TX is active. There is nothing special on my board, just a 64K EEPROM and some SPI devices.
    BTW there is an error in EXTEND-V4.FTH at line 163. It complains that id is unknown.

    Cheers from Taiwan
    Reinhardt, putting Propellers into real helicopters.

    Hi Reinhardt,
    4.1 is the current version to use, as ID! replaces the " ok" in the prompt with a 3-character ID, which is very useful for me when I am talking to other Props over PINGNET.
    Be aware of the defaults that I have been testing with, as I use 6MHz and 921600 baud.
    Propeller .:.:--TACHYON--:.:. Forth V4.1 DAWN 410170212.1230
    
    MODULES LOADED: 
    18C0: EXTEND.fth          Primary extensions to TACHYON+ kernel  - 170201-0430
    
    AUTORUN BOOT
    Loading cog 3 E4E2 F32     
    *** ROMS ***
    0,848 VGA32x15  
    0,352 HSUART    
    1,900 F32       
    CODE:$2D06 =11014 bytes   NAME:$5DB2 =5710 bytes   DATA:$76D4 =196 bytes    =12460 bytes free    Data Stack (0)
    --------------------------------------------------------------------------------
    ( 0001 $2D06  ok )   " PBJ" ID!
    ( 0002 $2D06 PBJ )   
    

    Cog usage
    • COG 0 Tachyon
    • COG 1 Console and PINGNET
    • COG 2 Timers
    • COG 3 Loader or F32
  • Thanks Peter,
    I will try it.
    Am I correct that V4 is case sensitive?
    --
    Reinhardt
  • Yes, V4 and V3 are case sensitive, although I did include a feature in V3 so that if it didn't find the word in the dictionary it would convert it to uppercase and try again. I have considered making V4 case insensitive though, or at least including the option.
  • Ok tested V4.1. After adapting to my board it works as it should.
    How about making every word lower case? Then we would not be forced to wear out the shift key.
    --
    Reinhardt
  • To me the best would be: support lower case, but don't allow ambiguities. That is, once a word is defined as WrdDef (word definition), no differently cased variant such as WrDDef (write double defined) is allowed anymore.
  • ErNa wrote: »
    To me the best would be: support lower case, but don't allow ambiguities. That is, once a word is defined as WrdDef (word definition), no differently cased variant such as WrDDef (write double defined) is allowed anymore.

    I'm looking at how I want to handle it now, but I prefer not enforcing anything at all if I can help it. Words such as wrddef WRDDEF WrDDef should really be stored as they are, but that complicates the search routines, which need to be fast. So the only real way around it is to do the fast search and, if it fails, do another slower search. That might be OK in interactive mode, but block mode ( TACHYON ... END ) should be exact match, I think.
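
    The two-stage search described above could look something like this Python sketch. It only illustrates the idea and is not how Tachyon's dictionary search is implemented; `find_word` and the sample dictionary are hypothetical.

```python
def find_word(dictionary, name, interactive=True):
    """Exact-case lookup first, then an optional slower caseless retry."""
    if name in dictionary:                       # fast path: exact match
        return dictionary[name]
    if not interactive:                          # block mode: exact match only
        return None
    folded = name.lower()
    for stored, entry in dictionary.items():     # slow path: linear scan
        if stored.lower() == folded:
            return entry
    return None

words = {"WrdDef": 1, "DUP": 2}                  # names stored as defined
print(find_word(words, "dup"))                   # found on the slow retry
print(find_word(words, "dup", interactive=False))  # block mode: not found
```

    Names are stored exactly as typed, so the fast path costs nothing extra and only a failed interactive lookup pays for the slower scan.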

  • Peter Jakacki Posts: 6,087
    edited February 13
    Here are some low level timings comparing V4 with V3. What's interesting is that while V3 had a few fast constants in code, V4 can handle 15-bit constants easily.

    Timings are for 96MHz on both (all times in ns)
    Operation       V4              V3
    Push literal    833             833..1333..1833..2000
    DROP            833             666..1000 (>4 items)
    32-bit push     1166            3000
    DO LOOP         500             500
    FOR NEXT        333             333
    DUP             666             833
    SWAP            500             333
    NOP             333             333
    1+              333             333
    8<<             333             333
    

    So operations that do not push or pop take the same time in both versions, which really reflects the time it takes to fetch the instruction, but pushing literals is much faster in V4. V3 reserved some opcodes for fast constants, but these varied in execution time, with the fastest being 0 at 833ns if there were no more than 3 items on the stack, and the slowest 2us. V4 handles all literals up to 15 bits in 833ns and uses an internal stack rather than V3's assigned external stack. Both versions still have 4 fixed locations in the cog for the top items of the stack. SWAP being a fraction slower on V4 has me a little confused since the code is identical!

    The faster data stack pushes and pops in V4 make a big difference in overall speed too. I/O operations are faster as well, since HIGH and LOW are opcodes, and then there is PIN to specify a fast pin, with H and L to set it high and low with 333ns execution times. So H L H will generate a 1us high pulse with a 333ns notch in the middle of it.

    I think the reason for the slightly longer SWAP in V4 is due to the one extra instruction in the doNEXT loop before it executes SWAP causing it to wait longer for the hub on the next doNEXT.
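
    The hub-wait explanation fits the numbers: on the Propeller 1 each cog's hub access window comes round every 16 clocks, and the quoted timings convert to whole multiples of that window. A small Python cross-check, treating the rounded nanosecond figures from the table as approximate (`to_cycles` is an illustrative helper):

```python
CLK_HZ = 96_000_000
HUB_WINDOW = 16      # clocks between hub access windows for a P1 cog

def to_cycles(ns, clk_hz=CLK_HZ):
    """Convert a rounded time in ns to the nearest whole clock count."""
    return round(ns * clk_hz / 1e9)

v4_timings_ns = {"NOP": 333, "SWAP": 500, "DUP": 666,
                 "Push literal": 833, "32-bit push": 1166}

for name, ns in v4_timings_ns.items():
    cycles = to_cycles(ns)
    print(f"{name:13s} {ns:5d} ns = {cycles:3d} clocks = "
          f"{cycles // HUB_WINDOW} hub windows")
```

    Seen this way, one extra instruction that just misses a window costs a full 16 clocks (about 167ns), which matches SWAP going from 333ns to 500ns.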


  • Peter Jakacki Posts: 6,087
    edited February 14
    The WS2812 RGB LED timing is normally implemented in PASM due to the speed required. Just for a lark I thought I'd try it totally in high level code. This routine outputs to a single LED but could just as easily output a whole array of course. No PASM required!
    pub XLED ( ggrrbb pin -- )	DUP PIN L MASK SWAP 8 REV 24 FOR H SHROUT L NEXT 2DROP ;
    

  • Congrats, that's impressive speed. So I assume V3 will stay where it is and we should be moving to V4 going forward?

  • V4 will get all the bells and whistles, some of which it may be missing at present, but it is the version that runs faster and takes up less memory overall. That being the case, why do we need to keep maintaining or developing V3, or V2 or V1 for that matter?
    V4 has internal data stacks, no hub stacks required, still has room for more cog instructions, and has faster constants and literals. Pingnet is fully supported. The list goes on.
  • EXTEND has been patched to work correctly with 32kB EEPROMs as part of the SAVEROM setup. Previously this ended up wiping all the cog images for the kernel back in the first 32k, since access higher than 32k simply mirrors back to the first 32k.
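
    The mirroring effect is easy to picture with a little address arithmetic. This Python sketch models it as a simple wrap at the device size, which is an assumption based on the description above; `physical_addr` is a hypothetical helper and $8000 is just an example address in the upper 32k.

```python
EEPROM_32K = 32 * 1024
EEPROM_64K = 64 * 1024

def physical_addr(addr, eeprom_bytes):
    """Address actually hit on the device, assuming it wraps at its size."""
    return addr % eeprom_bytes

upper = 0x8000   # example address just above the first 32k
print(hex(physical_addr(upper, EEPROM_64K)))   # stays in the upper half
print(hex(physical_addr(upper, EEPROM_32K)))   # lands back at address 0
```

    On a 32kB part, anything meant for the upper half overwrites the boot image at the same offset in the lower half.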