help understanding spin and assembler memory access

Cncjerry · 2012-06-07 18:18

I am having a problem understanding how sharing memory between spin and ASM works, maybe someone can point me in the right direction.

The reason I am trying the Prop chip is that LCD writes, encoder reads and switch reads (with debounce) are inherently slow. The UI consists of a set of switches, encoders and joysticks, and I don't care if the LCD lags behind it, as long as the LCD catches up. This is for a radio frequency generator that ran on a large PIC chip, was converted to Arduino and is now being moved to the Prop. The chip, a DDS, is really only a special-function DAC.

So if I have this application architected correctly, I'll have a main routine, and I assume it has to be written in Spin. I would like the main routine to do initialization and housekeeping and kick off the other cogs. I would put multiple assembler programs in DAT sections and then use cognew or coginit to get them running, with most of them running all the time. This I understand.

But is there a way to share more memory other than through the PAR register? So if I have a number of variables as defined in the main spin program, or just a user defined block of memory, can I easily modify them from assembler (running in different cogs) on the fly? Ideally, the PAR register would just point to the beginning of a shared memory space and the cogs would go from there. Is that possible?

I have about 10 variables that need to be modified across 6 or more cogs.
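One way to picture "PAR points to the start of a shared block" is a fixed layout of longs whose byte offsets every cog agrees on. The sketch below is written in C purely for illustration, and the field names and layout are hypothetical, not from the thread. It assumes each value is stored in whole longs, with the 48-bit frequency and tuning word each split across two longs, and assumes Spin would pass the address of the first long as the cognew parameter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one shared hub-RAM block. In Spin these would be
   consecutive longs in a VAR section; the address of the first one is what
   each cognew call would pass, so every PASM cog sees it in PAR. */
typedef struct {
    uint32_t freq_lo;    /* +0  low 32 bits of the 48-bit frequency      */
    uint32_t freq_hi;    /* +4  high 16 bits (upper 16 bits unused)      */
    uint32_t tune_rate;  /* +8  step size applied per encoder detent     */
    uint32_t ftw_lo;     /* +12 low 32 bits of the 48-bit tuning word    */
    uint32_t ftw_hi;     /* +16 high 16 bits of the tuning word          */
    uint32_t ftw_seq;    /* +20 bumped each time the tuning word changes */
    uint32_t buttons;    /* +24 debounced button bits                    */
    uint32_t joy_x;      /* +28 joystick L/R reading                     */
    uint32_t joy_y;      /* +32 joystick U/D reading                     */
    uint32_t phase;      /* +36 phase setting chosen from the menu       */
} shared_block;

/* Byte offset a PASM cog would add to PAR to reach a given field. */
#define PAR_OFFSET(field) offsetof(shared_block, field)

/* Reassemble the 48-bit frequency from its two longs. */
static inline uint64_t freq48(const shared_block *s) {
    return ((uint64_t)(s->freq_hi & 0xFFFFu) << 32) | s->freq_lo;
}
```

Keeping every field a full long means each cog only needs PAR plus a multiple of 4, and no cog has to care how the others pack bytes or words.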

The main loop example:

0) The first cog is the main initialization cog. It will initialize the LCD, DDS chip, switches, etc. then kick-off the other cogs.

1) Cog one is running an optical rotary encoder for fine tuning. I want it to add/subtract (modify) the base frequency depending on a tuning rate from memory. So this cog needs access to one 48-bit word for the frequency and one byte for the frequency change increment.

2) As the base frequency changes, a 64-bit tuning word is generated, with only the low-order 48 bits being modified. So I was thinking about a cog that watches the frequency set in (1) and converts the 48-bit frequency word to a 64-bit tuning word. It needs access to the 48-bit word from (1) and a 64-bit word for output; the output can be 48 bits.

3) Another cog is looking at the tuning word and writing it to the DDS chip over 5 wires not unlike SPI. I am considering locking the tuning word, if possible, when it is read. If the tuning word changes, this cog writes the new tuning word to the DDS chip. The DDS chip can be clocked very fast so that the frequency changes can be made at more than 100,000 changes per second. So this cog just needs access to the tuning word calc'ed in (2).

4) another cog will look at buttons for the menu system. It will signal the main cog to save/restore memory, vary the phase and amplitude of the output, etc.

5) cog to ADC the joystick L/R functions.

6) cog to ADC the joystick U/D functions.
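The conversion in step (2) is, for a DDS with a 48-bit phase accumulator, usually FTW = f_out × 2^48 / f_clk. The DDS part and its clock are not named in the post, so the sketch below assumes a frequency in whole Hz below the DDS clock, and a clock under 2^32 Hz. Because the Propeller 1 has no hardware divide, it uses binary long division (shift, compare, subtract), which maps directly onto a short PASM loop. The function name is hypothetical, and the code is C for illustration.

```c
#include <stdint.h>

/* Binary long division: returns floor(freq_hz * 2^48 / clk_hz).
   Assumes freq_hz < clk_hz and clk_hz < 2^32, so the running remainder
   never needs more than 33 bits. Uses only shift, compare and subtract. */
static uint64_t tuning_word48(uint32_t freq_hz, uint32_t clk_hz) {
    uint64_t rem = freq_hz;   /* remainder, always < clk_hz between steps */
    uint64_t ftw = 0;         /* quotient bits collected MSB first        */
    for (int bit = 0; bit < 48; bit++) {
        rem <<= 1;            /* bring down the next (zero) dividend bit  */
        ftw <<= 1;
        if (rem >= clk_hz) {  /* divisor fits: emit a 1 and subtract      */
            rem -= clk_hz;
            ftw |= 1;
        }
    }
    return ftw;               /* result occupies the low 48 bits          */
}
```

With a hypothetical 300 MHz DDS clock, an output of 150 MHz gives exactly 2^47, and 10 MHz gives 9382499223688 (floor of 2^48 / 30). The loop runs a fixed 48 iterations, so its timing is the same for every frequency.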

If I write it all in Spin, then I am pretty sure this all will work across the cogs and I have tested most of the shared memory functions. But I think the interpreter running from memory is a little weak and I really want this to be responsive.

So if you see problems with sharing as above, or can point me at models, I would appreciate it.

Sorry for the long post. I would hate to go down this road if I can't get the performance boost I want. The alternative is to just take all the old PIC code, rework it for the 32-bit PICs, crank the PIC clock up and go from there. I love the concept of the Prop chip, and this app lends itself to parallel processing, since the 48-bit math based on the frequency word and the writes to the DDS chip can all be somewhat asynchronous. I'll gladly trade some CNC work if someone wants to jump in here and help me architect this code.

Thanks.

Jerry

average joe · 2012-06-07 18:28

Not much time to post, but you should be able to do it. The best way to share memory is to point to it and pass the address in the par register. Something like this:

PUB start_ram : err_
' Initialise the Drac Ram driver. No actual changes to ram as the read/write routines handle this
  command := "I"
  cog := 1 + cognew(@tbp2_start, @command)
  if cog == 0
    err_ := $FF                 ' error = no cog
  else
    repeat while command        ' driver cog sets =0 when done
    err_ := errx                ' driver cog sets =0 if no error, else xx = error code

CON
'' Modified code from Cluso's triblade
'' commands to move blocks of data to the ILI9325 touchscreen display
' DoCmd(command_, hub_address, ram_address, block_length)
' I - initialise     
' S - Move data from hub to ram
' T - Move data from ram to hub
' U - Move data from ram to display
' V - Hub to display
' W - not working - writecom in pasm
' E - convert from .raw RGB to two byte ILI format RRRRRGGG_GGG_BBBBB
' F - convert from .bmp BGR format to two byte ILI format
' X - merge icon and background based on a mask
' Y - Change 137 output Returns P0-P20 and P22 in HiZ. Pass hubaddrs
' Z - Set 161 pins. Returns in group 1
VAR

' communication params(5) between cog driver code - only "command" and "errx" are modified by the driver
   long  command, hubaddrs, ramaddrs, blocklen, errx, cog ' rendezvous between spin and assembly (can be used cog to cog)
'        command  = A to Z etc =0 when operation completed by cog
'        hubaddrs = hub address for data buffer
'        ramaddrs = ram address for data
'        blocklen = ram buffer length for data transfer
'        errx     = returns =0 (false=good), else <>0 (true & error code)
'        cog      = cog no of driver (set by spin start routine)
   

DAT
'' +-----------------------------------------------------------------------------------------------+
'' | Touchblade 161 Ram Driver (with grateful acknowledgements to Cluso and Average Joe)           |
'' +-----------------------------------------------------------------------------------------------+
                        org     0
tbp2_start    ' setup the pointers to the hub command interface (saves execution time later)
                                      '  +-- These instructions are overwritten as variables after start
comptr                  mov     comptr, par     ' -|  hub pointer to command                
hubptr                  mov     hubptr, par     '  |  hub pointer to hub address            
ramptr                  add     hubptr, #4      '  |  hub pointer to ram address            
lenptr                  mov     ramptr, par     '  |  hub pointer to length                 
errptr                  add     ramptr, #8      '  |  hub pointer to error status           
cmd                     mov     lenptr, par     '  |  command  I/R/W/G/P/Q                  
hubaddr                 add     lenptr, #12     '  |  hub address                           
ramaddr                 mov     errptr, par     '  |  ram address                           
len                     add     errptr, #16     '  |  length                                
err                     nop                     ' -+  error status returned (=0=false=good) 

                        rdlong  cmd, comptr     wz      ' command ?
              if_z      jmp     #pause                  ' not yet
' decode command
                        cmp     cmd, #"S"       wz      ' hub to ram
              if_z      jmp     #pasmhubtoram           
                        cmp     cmd, #"T"       wz      ' ram to hub
              if_z      jmp     #pasmramtohub
                        cmp     cmd, #"U"       wz      ' ram to display
              if_z      jmp     #pasmramtodisplay
                        cmp     cmd, #"V"       wz      ' hub to display
              if_z      jmp     #pasmhubtodisplay           
                        cmp     cmd, #"E"       wz      ' convert 3 byte .raw format to 2 byte .ili format - hub to hub
              if_z      jmp     #rawtoiliformat
                        cmp     cmd, #"F"       wz      ' convert 3 byte .bmp format BGR to 2 byte ili format (same as E but order reversed)
              if_z      jmp     #bmptoiliformat              
 '                       cmp     cmd, #"W"       wz      ' lcdwritecom in pasm, not working
 '             if_z      jmp     #pasmlcdwritecom
                        cmp     cmd, #"X"       wz      ' merge icon and background based on a mask
              if_z      jmp     #mergeicons
                        cmp     cmd, #"Y"       wz      ' change the 137 output
              if_z      jmp     #changegroup
                        cmp     cmd, #"Z"       wz      ' set the 161 counters
              if_z      jmp     #set161          
                        cmp     cmd, #"I"       wz      ' init
              if_z      jmp     #init     
                        mov     err, cmd                ' error = cmd (unknown command)
                        jmp     #done
' ----------------- common routines -------------------------------------

get_values              rdlong  hubaddr, hubptr         ' get hub address
                        rdlong  ramaddr, ramptr         ' get ram address
                        rdlong  len, lenptr             ' get length
                        mov     err, #5                 ' err=5

Pay attention to the VAR section and the get_values routine in the DAT section.
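The protocol in the code above is a mailbox: the caller fills in hubaddrs, ramaddrs and blocklen, writes a nonzero command, and waits for the driver to set command back to 0, then reads errx. Since the Spin start routine reads errx only after command returns to 0, the driver has to write errx before clearing command. The C model below mirrors that ordering. It is a single-threaded sketch: driver_poll and do_cmd are hypothetical names, and the "other cog" is simulated by calling driver_poll inside the wait loop, whereas on a Propeller the driver cog runs in parallel.

```c
#include <stdint.h>

/* Same order as the VAR longs in the post: command, hubaddrs, ramaddrs,
   blocklen, errx. PAR would hold the hub address of `command`. */
typedef struct {
    volatile uint32_t command;   /* nonzero = request; driver clears to 0 */
    volatile uint32_t hubaddrs;
    volatile uint32_t ramaddrs;
    volatile uint32_t blocklen;
    volatile uint32_t errx;      /* 0 = ok, otherwise an error code       */
} mailbox;

/* One pass of the driver loop. Returns 1 if a command was handled. */
static int driver_poll(mailbox *mb) {
    uint32_t cmd = mb->command;
    if (cmd == 0)
        return 0;                /* nothing posted yet */
    uint32_t err = 0;
    switch (cmd) {
    case 'I':                    /* initialise */
        break;
    case 'S':                    /* hub -> ram: would move mb->blocklen bytes */
        break;
    default:
        err = cmd;               /* unknown command is echoed, as the PASM does */
    }
    mb->errx = err;              /* publish the result first ...  */
    mb->command = 0;             /* ... then signal completion    */
    return 1;
}

/* Caller side, mirroring the Spin: parameters first, then the command. */
static uint32_t do_cmd(mailbox *mb, uint32_t cmd,
                       uint32_t hub, uint32_t ram, uint32_t len) {
    mb->hubaddrs = hub;
    mb->ramaddrs = ram;
    mb->blocklen = len;
    mb->command = cmd;           /* writing the command starts the request */
    while (mb->command != 0)
        driver_poll(mb);         /* stand-in for the driver cog running in parallel */
    return mb->errx;
}
```

The same pattern works cog to cog, not just Spin to PASM, as the comment in the VAR section notes.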

Duane Degn · 2012-06-07 18:43

The Prop is perfect for this type of project.

Most of the serial objects use rx and tx buffers in hub RAM while the bit-bashing driver runs in PASM in its own cog. It's very common to have PASM cogs reading and writing hub RAM. There are lots of examples of this (Average Joe's code being one).

I think you're being very conservative with your cog allocations. You'll be able to have a single cog handle multiple tasks that you presently assign to their own cogs.
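As an illustration of one cog handling several tasks, the sketch below processes one raw pin sample per pass: it decodes a quadrature encoder with a 16-entry transition table and debounces a button by requiring 8 identical samples in a row. It is C for illustration only; the pin assignment (bit 0 = A, bit 1 = B, bit 2 = button) and all names are hypothetical. In a real cog the loop would read INA and pace itself with waitcnt between passes.

```c
#include <stdint.h>

/* Quadrature transition table indexed by (previous AB << 2) | current AB.
   +1 for one rotation direction, -1 for the other (which is which depends
   on wiring), 0 for no change or an invalid jump of two states. */
static const int8_t QUAD[16] = {
     0, +1, -1,  0,
    -1,  0,  0, +1,
    +1,  0,  0, -1,
     0, -1, +1,  0
};

typedef struct {
    uint8_t enc_prev;     /* last AB sample                              */
    int32_t enc_count;    /* accumulated encoder steps                   */
    uint8_t btn_history;  /* last 8 raw button samples, newest in bit 0  */
    uint8_t btn_stable;   /* debounced button level                      */
} ui_state;

/* One pass of a single-cog polling loop over one sample of the pins. */
static void ui_step(ui_state *s, uint32_t pins) {
    uint8_t ab = (uint8_t)(pins & 3u);
    s->enc_count += QUAD[(s->enc_prev << 2) | ab];
    s->enc_prev = ab;

    s->btn_history = (uint8_t)((s->btn_history << 1) | ((pins >> 2) & 1u));
    if (s->btn_history == 0xFF)
        s->btn_stable = 1;            /* 8 high samples in a row */
    else if (s->btn_history == 0x00)
        s->btn_stable = 0;            /* 8 low samples in a row  */
}
```

The encoder count could then be multiplied by the tuning rate and added to the shared frequency, while the debounced button bits go to the menu logic, all from the same loop.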

Cluso99 · 2012-06-07 21:49

The prop will be a perfect solution for you...

Cncjerry wrote: »

I am having a problem understanding how sharing memory between spin and ASM works, maybe someone can point me in the right direction.

The reason I am trying the prop chip is that LCD writes, encoder reads and switch reads (with debounce) are inherently slow. I don't care if my LCD lags the UI that consists of a set of switches, encoders and joysticks as long as the LCD catches up, I am fine. This is for a radio frequency generator that ran on a large PIC chip, was converted to arduino and now to the prop. The chip, a DDS, is really only a special function DAC.

So if I have this application architected correctly, I'll have a main routine and I assume it has to be written in Spin. I would like the main routine to do initialization, housekeeping and kick off the other cogs. I would put multiple assembler programs in DAT sections and then cognew or coginit to get them running with most of them running all the time. This I understand.

Exactly. Normally the main routine is in Spin because it is set up to be not time-sensitive, but it can be in PASM (or C, etc.). It is best to keep each cog's DAT code in a separate file, called an object.

But is there a way to share more memory other than through the PAR register? So if I have a number of variables as defined in the main spin program, or just a user defined block of memory, can I easily modify them from assembler (running in different cogs) on the fly? Ideally, the PAR register would just point to the beginning of a shared memory space and the cogs would go from there. Is that possible?

This is the main purpose of the PAR register: to point to the start of a block of hub RAM. The best way is to save this pointer into a register in cog RAM (just a cog memory location). If you are doing a lot of work on a few hub locations, then precompute a pointer to each of them, starting from PAR and adding 4 for each long, 2 for each word and 1 for each byte that precedes it. Remember that whatever you read into the cog always ends up in a 32-bit long. Cog memory is addressed as longs, while hub memory is addressed as bytes!
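To make the byte-versus-long addressing concrete, the C model below treats hub RAM as a little-endian byte array and mimics rdbyte, rdword and rdlong returning a zero-extended 32-bit value, as a cog register would hold. It is a toy model for illustration, not Propeller code; it assumes, as on the Propeller 1, that word and long reads ignore the low address bits (so they are always aligned), and that PAR itself only carries a long-aligned address, which is why a shared block should start on a long boundary.

```c
#include <stdint.h>

/* Toy model of Propeller 1 hub RAM: byte addressed, little-endian. */
static uint8_t hub[64];

/* Each read returns a 32-bit value, zero-extended like a cog register. */
static uint32_t rdbyte(uint32_t addr) {
    return hub[addr];
}

static uint32_t rdword(uint32_t addr) {
    addr &= ~1u;                          /* word reads are word-aligned */
    return (uint32_t)hub[addr] | ((uint32_t)hub[addr + 1] << 8);
}

static uint32_t rdlong(uint32_t addr) {
    addr &= ~3u;                          /* long reads are long-aligned */
    return (uint32_t)hub[addr]
         | ((uint32_t)hub[addr + 1] << 8)
         | ((uint32_t)hub[addr + 2] << 16)
         | ((uint32_t)hub[addr + 3] << 24);
}
```

For example, with PAR = 8, the first long of the block is at hub address 8 and the second at 12 (PAR + 4), whereas in cog RAM consecutive longs are at consecutive addresses.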

I have about 10 variables that need to be modified across 6 or more cogs.

The main loop example:

0) The first cog is the main initialization cog. It will initialize the LCD, DDS chip, switches, etc. then kick-off the other cogs.

If it has no more work to do, you can even shut this cog down, or keep it running to send out some debugging info.

1) cog one is running an optical rotary encoder for fine tuning. I want it to add/subtract (modify) the base frequency depending on a tuning rate from memory. So this cog needs access to one 48 bit word for the frequency and one byte for the freq change increments.

As you can only read a long (32 bits) at a time, you will need to be careful that one cog cannot change one part of the value in between another cog's accesses. Ask when you get to this point and post your code.

2) As the base frequency is changing, a tuning word of 64 bits is generated with only the low order 48bits being modified. So I was thinking about a cog that looks at the frequency set in (1) and does the calculation from a 48 bit frequency word to a 64 bit tuning word. So it needs access to the 48 bit word from (1) above and a 64 bit word for output. The output can be 48 bits.

3) Another cog is looking at the tuning word and writing it to the DDS chip over 5 wires not unlike SPI. I am considering locking the tuning word, if possible, when it is read. If the tuning word changes, this cog writes the new tuning word to the DDS chip. The DDS chip can be clocked very fast so that the frequency changes can be made at more than 100,000 changes per second. So this cog just needs access to the tuning word calc'ed in (2).

4) another cog will look at buttons for the menu system. It will signal the main cog to save/restore memory, vary the phase and amplitude of the output, etc.

5) cog to ADC the joystick L/R functions.

6) cog to ADC the joystick U/D functions.

At least cogs 5 & 6 can be one combined cog. It may even be possible for cog 4 to be included as well.
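On the earlier point about the 48-bit frequency spanning two longs: if the encoder cog carries from the low long into the high long while another cog is between its two reads, that reader can see a value off by 2^32. The Propeller 1 has hardware locks (locknew, lockset, lockclr) for this. The C sketch below models one lock with a C11 atomic_flag, whose test-and-set returns the previous state much as lockset does; the variable and function names are hypothetical, and in PASM the lock number from locknew would be passed to each cog through the shared block.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for one Propeller hardware lock: test_and_set returns the
   previous state (like LOCKSET), clear releases it (like LOCKCLR). */
static atomic_flag freq_lock = ATOMIC_FLAG_INIT;

/* The 48-bit frequency lives in two hub longs, so no single access covers it. */
static volatile uint32_t freq_lo, freq_hi;

#define MASK48 0xFFFFFFFFFFFFull

static void lock_acquire(void) {
    while (atomic_flag_test_and_set(&freq_lock))
        ;                                    /* spin until the lock was free */
}

static void lock_release(void) {
    atomic_flag_clear(&freq_lock);
}

static void freq_write(uint64_t f) {
    lock_acquire();
    freq_lo = (uint32_t)f;
    freq_hi = (uint32_t)(f >> 32) & 0xFFFFu;
    lock_release();
}

static uint64_t freq_read(void) {
    lock_acquire();
    uint64_t f = ((uint64_t)freq_hi << 32) | freq_lo;
    lock_release();
    return f;
}

/* Encoder cog: the whole read-modify-write happens under one lock hold,
   so a reader never sees half an update. Wraps modulo 2^48. */
static void freq_add(int32_t steps, uint32_t rate) {
    lock_acquire();
    uint64_t f = ((uint64_t)freq_hi << 32) | freq_lo;
    int64_t delta = (int64_t)steps * (int64_t)rate;
    f = (f + (uint64_t)delta) & MASK48;
    freq_lo = (uint32_t)f;
    freq_hi = (uint32_t)(f >> 32);
    lock_release();
}
```

Holding the lock only for the few instructions that touch the two longs keeps the other cogs from waiting long; the 48-bit math and the DDS write can happen on a private copy after the lock is released.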

If I write it all in Spin, then I am pretty sure this all will work across the cogs and I have tested most of the shared memory functions. But I think the interpreter running from memory is a little weak and I really want this to be responsive.

Spin is maybe 100 times slower than PASM, but it really depends on what you are doing. Best to try and get it running in Spin first, and then replace the Spin cogs with PASM cogs for the ones needing faster execution.

So if you see problems with sharing as above, or can point me at models, I would appreciate it.

There is an example posted above. IIRC there is already Wii joystick controller code posted, which is likely to do what you require for cogs 4, 5 & 6.

Sorry for the long post. I would hate to go down this road if I can't get the performance boost I want. The alternative is to just take all the old PIC code, rework it for the 32bit pics, crank the PIC clock up and go from there. I love the concept of the prop chip and this app lends itself to parallel processing since the 48 bit math based on the frequency word and write to the DDS chip can all be somewhat asynchronous. I'll gladly trade-off some CNC work if someone wants to jump in here to help me architect this code.

Just remember, you have 8 * 80MHz 32bit RISC cores running and no interrupts to worry about. In most cases, if it's running on an old PIC you should have no trouble on a Prop.

Thanks.

Jerry
