"Direct Inward Dial": Calling asm routines from Spin without a jump table - Parallax Forums

Dennis Ferron Posts: 480
edited 2008-10-05 11:35 in Propeller 1
(Disclaimer: I'm using this technique for code in my Propeller Design Contest entry (which I haven't sent off yet) - but I don't mind if anyone borrows this idea and uses it in their projects.)

I've noticed that a lot of objects (e.g. Graphics.spin, Float32.spin) pass commands from Spin to the asm executing in the cog using command codes (1, 2, 3, etc.), which are indexes into a jump table in asm. There are several disadvantages to this method of communicating between Spin and asm. First, it's verbose: the programmer has to list each command in three different places, once in a CON section, once in the jump table, and once as the label on the section of asm code. Second, the jump table itself takes up scarce cog RAM. If you could eliminate the jump table from some of these library objects, you could fit more features into cogs that are currently "full", both because removing the jump table would free up space and because the new commands wouldn't need jump table entries. Finally, the jump table method imposes a small speed penalty, which goes away if you don't have to do a table lookup.

I call my method for eliminating the need for jump tables "Direct Inward Dial" because instead of passing command codes from Spin to asm, the Spin code simply directly passes the address to execute. Think about how the jump table works: the Spin code passes a command code, let's say it's 3. The cog running asm picks up this code and looks up entry 3 in the jump table. Let's say that the address in slot 3 is $123. The asm code then stuffs that $123 into a jmp command and executes a jmp #$123. What if the Spin code just passed in $123 instead of 3? Then the asm code would just take what was passed in ($123) and jump to it - why go to any more trouble than that?
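
To make the comparison concrete, here is a toy model of the two dispatch styles. It is written in Python rather than Spin/PASM only so that it can be run on a PC; the routine names and every address except the $123 from the example above are made up for illustration.

```python
# Toy cog RAM: maps a cog address to the routine that lives there.
# Only $123 comes from the example in the post; the rest is hypothetical.
COG_ROUTINES = {0x120: "cmd1", 0x121: "cmd2", 0x123: "cmd3"}

# Jump table indexed by command code; code 0 conventionally means "no command".
JUMP_TABLE = [None, 0x120, 0x121, 0x123]

def dispatch_via_table(code):
    """Jump table style: the command code is looked up to find the address."""
    address = JUMP_TABLE[code]
    return COG_ROUTINES[address]

def dispatch_direct(address):
    """Direct Inward Dial style: the caller already passed the address."""
    return COG_ROUTINES[address]

print(dispatch_via_table(3), dispatch_direct(0x123))   # cmd3 cmd3
```

Both calls land on the same routine; the direct version simply has no table to store and no lookup to perform.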

The only difficult thing about this process is knowing what address to pass in. Luckily you can take the address of a DAT label in Spin just as easily as you can use that label in assembly. The label will resolve to a different address in Spin than in asm, because in asm the label refers to cog RAM starting at 0, while in Spin it refers to the copy of that asm code as it exists in hub RAM, from which the cog was loaded. But - here's the important thing - even though the base addresses of the two copies of the DAT data differ, the distance between two labels remains the same in both copies! So if, in Spin, you take the address of the asm label you want to call and subtract the address of the origin of the cog data (what you passed to cognew), you get the byte offset of the label into the asm. Finally, shift that result right by 2 to convert the byte offset to a long offset, since cog RAM is addressed in longs. Now you have a cog RAM location that can be jmp'ed to directly. Like this:

command := (cmd_addr - base_addr) >> 2



The really neat thing is that you don't even have to worry about where your DAT section sits in hub RAM: any offset in the base address is common to both labels and cancels in the subtraction, and the shift right by 2 drops the two low bits of the result anyway. So it doesn't matter whether the DAT section is long-aligned or not!
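
For anyone who wants to check the arithmetic off the chip, here is a small sketch of the calculation. It is a Python model, not Spin, and the hub addresses and the function name are invented for the example; only the resulting cog addresses 7 and 9 match what the demo reports for cmd1_ and cmd2_.

```python
def cog_address(label_hub_addr, base_hub_addr):
    """Turn the hub address of a DAT label into a cog RAM (long) address."""
    return (label_hub_addr - base_hub_addr) >> 2

# Hypothetical hub address of the first asm long (what gets passed to cognew).
# It is deliberately odd to show that an offset common to both labels cancels out.
ASM_START = 0x0123
CMD1 = ASM_START + 7 * 4    # cmd1_ sits 7 longs into the cog image
CMD2 = ASM_START + 9 * 4    # cmd2_ sits 9 longs into the cog image

print(cog_address(CMD1, ASM_START), cog_address(CMD2, ASM_START))   # 7 9
```

Moving ASM_START to any other hub address gives the same two results, because only the distance between the labels enters the calculation.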

The Spin code I've attached demonstrates the technique and should run on your demo board. It alternates between executing the assembly routines cmd1_, which returns 42, and cmd2_, which returns 23. It shows "Cmd: 07", which is the address the Spin code calculated for cmd1_, and "09", which is the address it calculated for cmd2_.

The attached code also illustrates the speed advantage I promised I can give you over the jump table method: when the jump table method runs, it compares the command code to 0 to see if a command is there, but I don't even do any branching on the command at all! I simply execute the command every time - when there is not a command to execute, instead of setting the command to 0, I set it to the label of the poll loop! That way, the command jumps to the top of the loop again. This ensures that the desired command gets executed nearly immediately, because it's jumped to the very instant it is set with no instructions in between. This saves only a few clock cycles, but in some commands it could mean the difference between hitting the next hubop cycle, and having to wait 17 cycles.
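
The idle trick can be sketched as a simulation. This is a Python model under stated assumptions, not the attached Spin file: the address chosen for the poll loop is hypothetical, and it assumes each routine re-arms the mailbox with the address of poll when it finishes. The return values 42 and 23 and the addresses 7 and 9 are the ones from the demo.

```python
# Cog addresses: 7 and 9 are from the demo, the address of poll is hypothetical.
POLL, CMD1, CMD2 = 3, 7, 9

def run_cog(hub, passes):
    """Run `passes` iterations of the poll loop against a dict standing in for hub RAM."""
    trace = []
    for _ in range(passes):
        target = hub["command"]      # rdlong: fetch whatever address Spin left in the mailbox
        trace.append(target)         # jmp: go there unconditionally, no test against 0
        # The branches below only model where execution lands; they are not compares in the cog.
        if target == CMD1:
            hub["result"] = 42
            hub["command"] = POLL    # re-arm: the idle "command" is the poll loop itself
        elif target == CMD2:
            hub["result"] = 23
            hub["command"] = POLL
    return trace

hub = {"command": POLL, "result": None}
idle_trace = run_cog(hub, 3)         # nothing requested: every pass jumps back to poll
hub["command"] = CMD1                # Spin posts the address of cmd1_
busy_trace = run_cog(hub, 2)         # first pass runs cmd1_, second pass is idle again
print(idle_trace, busy_trace, hub["result"])   # [3, 3, 3] [7, 3] 42
```

The trace shows that an idle pass and a command pass go through exactly the same fetch-and-jump; the only difference is the address found in the mailbox.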

Edit: Due to formatting issues I took the code out of the post and attached a Spin file instead.

Post Edited (Dennis Ferron) : 10/4/2008 10:13:31 PM GMT

Comments

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-10-04 21:07
    Dennis,

    Could you please remove the trailing whitespace from your code? Your post is having to scroll horizontally, which makes it difficult to read.
    (The problem is in the line "TERMS OF USE: MIT LICENSE")

    Thanks,
    Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!

    Post Edited (Phil Pilgrim (PhiPi)) : 10/4/2008 9:14:41 PM GMT
  • hippy Posts: 1,981
    edited 2008-10-04 21:31
                            rdlong  jmploc, cmdSrc          ' Get command from hub ram
                            movs    :jmp_cmd, jmploc        ' Modify jmp command to go to this location
    :jmp_cmd                jmp     #0                      ' Jmp to command address.

    Shouldn't there be a NOP between the MOVS and the JMP?

    It probably works, but only by jumping to $000 the first time through and then going round again, by which time ':jmp_cmd' has been set as it should have been the first time.

    I'd have expected it to execute the previously requested command on each later pass, but I haven't studied it in depth.
  • Dennis Ferron Posts: 480
    edited 2008-10-04 22:12
    Oh, you might be right, hippy. I must confess I don't fully understand why the nop is necessary, although I do dimly recall reading about that. Does it have to do with instruction prefetch?
  • tpw_man Posts: 276
    edited 2008-10-04 22:41
    Yes it does have to do with the instruction prefetch. The cycle is 3 steps overlapped I think, so by the time the first command is executed, the processor logic has already loaded the instruction, source, and destination contents.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    I am 1011, so be surprised!


  • Timmoore Posts: 1,031
    edited 2008-10-04 22:49
    I like the idea, but a couple of comments. You can simplify the jmp like this:
    poll                    rdlong  jmploc, cmdSrc          ' Get command from hub ram
                            nop
                            jmp     jmploc                  ' Jmp to command address.

    i.e. change the jmp to an indirect jmp via the jmploc variable, rather than patching the jmp instruction after the read.

    Also, I am pretty sure your init_poll code doesn't work:
    init_poll
                            wrlong  cmdSrc, poll    ' Set jmp location to "poll" so that last command doesn't get re-executed

    wrlong has its parameters the other way round (value first, then hub address), so I think this writes cmdSrc to the hub address held in the cog register poll. I think you meant to write the address #poll to the hub memory location that cmdSrc points to, i.e. command. I believe you need the following:
    init_poll               mov     jmploc, #poll           ' Set jmp location to "poll" so that last command doesn't get re-executed
                            wrlong  jmploc, cmdSrc

    Once you do this, the Spin code doesn't need to waitcnt(1000 + cnt) but can do a repeat loop:
        repeat until command == (@poll - @asm_start) >> 2

    There is also a less likely but possible case when the cog is first started: Spin carries on running, and if it sets the first command before the cog has reached the poll loop, that first command will not be run. You need the above repeat until command == (@poll - @asm_start) >> 2 after calling cognew to be sure that this doesn't happen.
    Fixed a couple of bugs in the code.
    Playing with this style a bit, the thing that's not very nice is that you can't give the commands friendly names using CON, so you end up with lots of (@cmdX - @asm_start) >> 2 and (@poll - @asm_start) >> 2 expressions spread through your code rather than putting all that mess in one place.


    Post Edited (Timmoore) : 10/5/2008 12:39:23 AM GMT
  • Dennis Ferron Posts: 480
    edited 2008-10-05 00:13
    Thanks guys.

    I doubt I'm the first one to think of this technique, but I do wonder why the library objects don't do this? Is my method an improvement or does it have its own problems?
  • Lord Steve Posts: 206
    edited 2008-10-05 00:24
    Dennis said...
    command := (cmd_addr - base_addr) >> 2
    I wouldn't be so concerned about speed and counting cycles with regard to this technique. Needing to do the line I quoted in Spin will eat any imagined savings gained by not doing the jmp table in PASM. It could be the same for doing the quoted calculation in PASM: the overhead of doing the subtract, then the shift right by two, and then jumping, versus that of the actual jmp table, may be a wash. (Not sure.)

    Your method does save the space of the jmp table, and that can be a significant portion of the COG RAM. But I don't see this as being at all faster.

    Correct me if I'm mistaken. Maybe I'm missing something.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-10-05 04:18
    Lord Steve,

    The extra cost you cite can easily be saved by passing @asm_start to the cog via PAR (or just plugging it into a DAT variable before the COGNEW) and letting the ASM routine do the address computation using @cmd1, etc.

    -Phil

  • Dennis Ferron Posts: 480
    edited 2008-10-05 05:15
    Yes, I'm aware that it takes as long or longer to do the pointer math in Spin than it does to do the table lookup in asm. Good catch, Lord Steve, but that's actually the point of what I'm doing: I'm moving the work out of the asm program and into the Spin program. Note that the cog doing the math in Spin is not the same cog as the one running the asm program, so it's not zero sum; the same or more work is being done in total, but a smaller portion of it is carried by the asm cog. In my particular use case, I'm doing a lot of other time-sensitive things in the asm cog, so I have an interest in dispatching the command as simply as possible with minimal latency so that I can get on with the other things. (Due to the complexity of the circuit I'm controlling, it's not easy to divide the asm program among multiple cogs, and command execution has to be synchronized with the rest of the work.) OTOH the Spin side of my program is driven by the user interface, and it can take as long as it needs.

    Edit: In theory all the math could be done at compile time, but I couldn't get CON expressions to work with DAT label addresses - I don't think the Spin compiler supports using the language that way. If you knew the locations would never change, you could just work out the addresses by hand and hard-code them as literal CONstants. You wouldn't actually want to do that because it would break if you changed any code, but it illustrates that there isn't an essential need for run-time computation, only an accidental one because there isn't a practical way to specify it in the Spin language without resorting to some math. So it would be technically possible for Spin to be extended to support this directly in the compiler without any runtime overhead, though I don't know what the language feature to support it would be called nor do I really expect to see it. Just saying, though: It could be done.

    Post Edited (Dennis Ferron) : 10/5/2008 5:24:30 AM GMT
  • hippy Posts: 1,981
    edited 2008-10-05 11:35
    Dennis Ferron said...
    I doubt I'm the first one to think of this technique, but I do wonder why the library objects don't do this? Is my method an improvement or does it have its own problems?

    There's a problem in having to do "(@x-@y)>>2"; if you don't have scope access to the 'x' and
    'y' labels you are onto a hiding to nowhere. If the PASM is in a sub-object then the labels are
    out of scope to the program using the object but can be exported as CON constants. As it's a
    good idea to export 'command values' this way it's not really a problem but it does mean that
    the values can change which makes debugging a bit more difficult at times.

    Does PropTool allow such exports though ? In the wrong OS to try but I did have a case where
    an exported CON didn't have the value it should have but I failed to note what that situation was.
    I'm sure it was a case of using a forward reference which couldn't have been calculated until
    after compilation was complete.

    There is a particular case where this idea doesn't work, and that's where the code to execute
    is not within the Cog but executed by the Cog as LMM code.