Passing multiple variables to asm

MagIO2 · 2009-04-25 19:34

There's still a lot of potential to do further optimizations for the driver!
Currently I expect it to be just a little bit faster than the SPIN lookdown, because it does not have the SPIN interpreter overhead: the interpreter has to read each command from HUB-RAM, work out which opcode it is, jump to the right place and only then do the job. On the other hand, the synchronization in the SPIN code needs some time as well. I will test that.

Further improvements depend on how the driver is used. If the cuelist does not change, we could think of loading the cuelist into COG-RAM as well; the loop would then be a lot faster. Given the size of the code, there is no problem fitting 128 words into COG-RAM too. That would definitely be faster than the SPIN lookdown, which has no way of reading the cuelist from COG-RAM. If the cuelist only changes occasionally and is then used many times, we can implement that as well.
Maybe the list can also be prepared somehow to allow a faster search than a sequential loop (a binary search, for example).

James Long · 2009-04-25 19:43

Mag,

The list is a set of numbers loaded from an SD card, from a CSV file. Once they are read, they never change during that run. They may change on the next power-up.

The values are the heart of what I'm doing. Because of what the values are, I do not see a way to make it anything other than a sequential loop. Their position within the array is extremely important. I do not know of any other method to do an operation like this.

James L

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
James L
Partner/Designer

Lil Brother SMT Assembly Services

MagIO2 · 2009-04-25 20:36

I have no idea what kind of values we are talking about. "Note" seems to be something to do with music?
The values are of course stored sequentially. The question is how we search this list. For example, if the list is naturally ordered, instead of looping from beginning to end we can do a binary search, which on average is much faster.

Binary search works like that:

Let's say we search for $18 in the following cue_list (of course a short form ;o) 
$01 $10 $18 $33 $48 $77 $a0
 
Binary search starts in the middle of the list, which leaves 3 elements to the left and 3 elements to the right. We know that the list is sorted, so we simply compare $18 with $33 and find that the value we search for is smaller than the current element. So our value has to be somewhere on the left side. Again we take the element in the middle of the left part of the list; this time it is $10. After that compare we know that the value must be to the right of $10, and with the next compare we find it.
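For readers who want to follow the walk step by step, here is a minimal Python sketch (an illustration only, not the PASM driver; the helper name `binary_search_trace` is made up). It records every element the search compares against.

```python
def binary_search_trace(values, key):
    """Classic binary search over a sorted list.

    Returns (index, probes): the index of key (or -1 if it is absent)
    and the elements that were compared against, in order.
    """
    lo, hi = 0, len(values) - 1
    probes = []
    while lo <= hi:
        mid = (lo + hi) // 2          # middle of the remaining range
        probes.append(values[mid])
        if key == values[mid]:
            return mid, probes         # found it
        if key < values[mid]:
            hi = mid - 1               # continue in the left part
        else:
            lo = mid + 1               # continue in the right part
    return -1, probes


cue_list = [0x01, 0x10, 0x18, 0x33, 0x48, 0x77, 0xA0]
index, probes = binary_search_trace(cue_list, 0x18)
print("compared with:", " ".join(f"${p:02X}" for p in probes))
print("found at index", index)
```

Searching for $18 compares against $33, then $10, then $18, exactly as in the description.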
 
It's called binary search because at each step you have two options to choose from: either the search goes on in the left part or in the right part of the list. For your 128-element list you need at most 8 compares (7 compares cover 2^7 - 1 = 127 elements, and one more covers the 128th), and on average only about 6. With sequential search the average would be about 64.
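The compare counts can be checked by brute force. This Python sketch (hypothetical helper names, not part of the driver) runs a classic binary search for every position in a sorted list and reports the worst case and the average, next to the average of a sequential scan from the front.

```python
def binary_compares(n, index):
    """Compares a classic binary search over n sorted elements needs
    before it reaches the element stored at position `index`."""
    lo, hi, count = 0, n - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1
        if mid == index:
            return count
        if index < mid:          # sorted list: smaller index means smaller value
            hi = mid - 1
        else:
            lo = mid + 1
    raise ValueError("index out of range")


def search_stats(n):
    """Worst case and average compares for binary search, and the average
    for a sequential scan from the front, over all n possible hits."""
    counts = [binary_compares(n, i) for i in range(n)]
    sequential_avg = sum(i + 1 for i in range(n)) / n
    return max(counts), sum(counts) / n, sequential_avg


for n in (127, 128):
    worst, avg, seq = search_stats(n)
    print(f"{n} elements: binary worst={worst}, binary avg={avg:.2f}, "
          f"sequential avg={seq:.1f}")
```

For 127 elements the worst case is 7 compares and the average about 6.06 (769/127); a 128th element adds one more level, so the worst case becomes 8.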

Even if the list is not naturally sorted, you can sort it and store each value together with the index it had in the original list. Then you can use binary search again.
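A minimal sketch of that idea in Python, assuming the cue list is sorted once after it is loaded (for example right after reading the CSV file). The function names and the sample values are hypothetical; `bisect_left` from Python's standard library does the binary search. Positions are zero-based, like the index `lookdownz` reports for a match, and -1 stands for "not found" (an arbitrary choice for this sketch).

```python
from bisect import bisect_left


def build_index(cue_list):
    """Sort the values once, remembering each value's original position.
    Returns two parallel lists: sorted values and their original positions."""
    pairs = sorted((value, pos) for pos, value in enumerate(cue_list))
    keys = [value for value, _ in pairs]
    positions = [pos for _, pos in pairs]
    return keys, positions


def lookup(index, note):
    """Binary-search the sorted values; return the note's original
    (zero-based) position, or -1 if the note is not in the list."""
    keys, positions = index
    i = bisect_left(keys, note)
    if i < len(keys) and keys[i] == note:
        return positions[i]
    return -1


# Hypothetical cue list: position = output number - 1, value = MIDI note.
cue_list = [53, 74, 61, 48, 70]
index = build_index(cue_list)    # done once, after loading the list
print(lookup(index, 61))         # original position of note 61
```

If a note appears more than once, the pair with the lowest original position sorts first, so the first occurrence is returned.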

MagIO2 · 2009-04-25 21:11

Outcome of the performance test with 10 values:
PASM search for the first element in the list: 1504 clock cycles
PASM search for the last element in the list: 880 clock cycles

SPIN search for the first element in the list: 1136 clock cycles
SPIN search for the last element in the list: 4896 clock cycles

Really amazing. That tells us a little bit about how the SPIN lookdown works. At first I thought it simply stores the values as a list; I had not yet seen a lookdown example that did not use a list of constants. But the long runtime for finding the last element told me that there's more going on. SPIN lookdown is more powerful, because each element in the list can be an expression. Maybe a function call as well?! That's cool: one could use it to call a function (or several functions) n times depending on the value you search for.
....
Sorry ... lost track a bit ;o)
You have to pay for this power with runtime.

One more number I figured out: the synchronization with SPIN and the assignment of a value to note eat up 1088 clock cycles of the measured runtime. So the PASM itself only needs ~800 and ~200 cycles.

Here's the code I used for testing:

  ' this is for checking how long storing the start and end time takes;
  ' of course we can subtract that from the time we measure for the different tests
  strt:=cnt
  end:=cnt
  debug.dec( end-strt )
  debug.str( string(13) )

  ' I did the same twice, because I wanted to see if reading the bytecode might cause
  ' different runtimes. It's just an indication and not a proof.
  strt:=cnt
  end:=cnt
  debug.dec( end-strt )
  debug.str( string(13) )

  ' Here the runtime of the plain SPIN code is tested. No wait for the PASM result.
  strt:=cnt
  note:=1
  repeat while false
  end:=cnt
  debug.dec( end-strt )
  debug.str( string(13) )
  
  ' Search for the first element in the list. Of course this needs more time than
  ' searching for the last value, because we search the list from end to beginning
  strt:=cnt
  note:=1
  repeat while note
  end:=cnt
  debug.dec( end-strt )
  debug.str( string(" "))
  debug.dec( output )
  debug.str( string(13) )
  
  ' Search for the last element in the list
  strt:=cnt
  note:=26
  repeat while note
  end:=cnt
  debug.dec( end-strt )
  debug.str( string(" "))
  debug.dec( output )
  debug.str( string(13) )

  ' here I tested if you really can have variables in the lookdown-list and it did compile.
  ' this is not the version used for the above given results, because using a variable instead
  ' of a constant would have increased runtime.
  strt:=cnt
  output:=lookdownz( 1: 1,20,50,note,77,12,11,100,230,26)
  end:=cnt
  debug.dec( end-strt )
  debug.str( string(" "))
  debug.dec( output )
  debug.str( string(13) )

  ' constants-only lookdownz, searching for the last element in the list
  strt:=cnt
  output:=lookdownz( 26: 1,20,50,88,77,12,11,100,230,26)
  end:=cnt
  debug.dec( end-strt )
  debug.str( string(" "))
  debug.dec( output )
  debug.str( string(13) )

Post Edited (MagIO2) : 4/25/2009 9:25:37 PM GMT

James Long · 2009-04-25 22:32

Mag,

The values are MIDI note numbers and where on the output bus they occur. The list is ordered starting from bus output #1.

So like the following:

output 1, midi note 53
output 2, midi note 74
output 3, midi note 61

etc.

There are many ways to improve the format in which the information is saved. Because I'm dealing with a CSV file, it is easiest to save the information sequentially.

The program is actually playing a MIDI file and turning on outputs depending on the MIDI music. MIDI streams in at 38.4K, and one output chip takes three bytes to change its outputs. With 8 chips, that is 24 bytes. That doesn't sound complicated, but the system outputs should run 8x faster than the music streams in, and I do not want to take any chances on the output bus being too slow. With PASM doing the lookdown and the I2C (already done), I think the output bus will definitely run at 300k.
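A back-of-the-envelope check of those numbers as a small Python calculation. The assumptions are not from the post: 8N1 framing (10 bits per serial byte) for the 38.4K stream, a three-byte note message, and a 300 kHz I2C clock with 9 clocks per byte (8 data bits plus ACK), ignoring start/stop conditions and software overhead.

```python
def serial_byte_us(baud, bits_per_byte=10):
    """Time for one asynchronous serial byte; 10 bits assumes 8N1 framing
    (start bit + 8 data bits + stop bit)."""
    return bits_per_byte * 1_000_000 / baud


def i2c_transfer_us(n_bytes, clock_hz, clocks_per_byte=9):
    """Rough I2C transfer time: 8 data clocks plus 1 ACK clock per byte.
    Start/stop conditions and software overhead are ignored."""
    return n_bytes * clocks_per_byte * 1_000_000 / clock_hz


midi_byte = serial_byte_us(38_400)          # one incoming byte at 38.4K
midi_message = 3 * midi_byte                 # a three-byte note message
refresh = i2c_transfer_us(8 * 3, 300_000)    # 8 chips x 3 bytes each at 300k
print(f"incoming byte:       {midi_byte:7.1f} us")
print(f"three-byte message:  {midi_message:7.1f} us")
print(f"full 8-chip refresh: {refresh:7.1f} us")
```

Under these assumptions one complete 24-byte refresh of all 8 chips (about 720 microseconds) fits within the time it takes one three-byte message to arrive (about 781 microseconds).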

We can change it any way you think would be better. I was keeping it simple in the beginning for myself. The program is pretty large and bloated, and it will require a lot of work to be final.

James L

James Long · 2009-04-26 06:48

Now that we got past the issues of the "Par" register, and everything is working (although not optimized), I have uploaded a video of the test board.

www.youtube.com/watch?v=OyBgHSvVgUY

This is a video (not a good one) of what I've been working on. This is only a programming version of what is to come.

The next version will be fully self-contained and expandable to 160 outputs (no LEDs).

Tell me what you think.

James L

MagIO2 · 2009-04-26 10:58

Hi James,

Did you use the PASM lookdown in the code used for the video? To me it looks like you're planning a light-show device driven by MIDI files.

How many COGs are free? I ask because a nice idea came to mind when I saw the video. My understanding is that you currently switch lights on and off by assigning a note to an output. Is that right? How about triggering subroutines with a note, or maybe with a sequence of notes? So if the device reads "da da da dum" it switches or fades to a predefined lighting ambience. Or if it reads "da di da da" a laser show starts ;o), whatever.

My idea would be to have PASM code that does some complex light control. This PASM code can be loaded dynamically (like your cuelist), so you can build up a library of light effects. In the setup of your light show you then have the cuelist and a list of effects. I have to search for the thread where I already posted a .SPIN file which allows you to test such PASM programs and store them as *.COG files on SD. The demo also shows how you can load the *.COG files directly into a COG. With the driver concept we used here in the thread, they can wait until the LOOKDOWN (which would then be much more than only a lookdown) tells them to do something.

Nevertheless... you stated that the lookdown needs to be as fast as possible and that it is part of some kind of timing. So, for further optimization I would define 2 goals:
1. the lookdown should be as fast as possible
2. but the runtime should be equal for each lookdown
I would like to follow the idea of doing a binary search. As already described, a list of 127 elements needs at most 7 lookups (one more covers a 128th element). And you can, of course, code it so that it does exactly that many lookups every time (even if most of them are not needed), so you have a constant decode time.

What do you think?

James Long · 2009-04-26 15:27

Well, really you are not close at all. The system is actually going to control the valves of a mechanical organ (google "street organ"). The lights are a way for me to ensure the outputs are actually working with the music; they directly represent the lower 16 notes of a 26-note organ scale.

I'm not sure whether the lookdowns need to take equal time, but a binary search would probably cut the time a lot.

James L

MagIO2 · 2009-04-26 18:04

Oops ... the video has sound.

I usually have the speakers of my laptop switched off.

How should we proceed? Do you want to give the binary search a first try yourself? You seem willing to try things by yourself.

James Long · 2009-04-26 18:28

I'm not even sure I know how to begin on a binary search, mainly because I'm not sure of all the commands in PASM. I didn't even know there was a cmp command until you said there was, and I have a manual. I actually looked for a cmp command and missed it.

I'll have to look at the commands and see if I can figure out how to tell whether a value is higher or lower than another. If I can do that, I can probably figure out the search.

I like trying things for myself, but do not think I wrote the object myself. It is mostly a hack of Mike Green's work (got to love Mike's objects), some of Andy's (I don't know Ariba's last name) and some of Tom Dimock's. Oh heck, there is a bunch of other people's stuff in there, all hacked together. I would say the LED lookdown is about 2% mine, and about another 0.5% of the code is mine. The rest is really "borrowed". So I have 2.5% of the total investment, but 100% of the frustration of getting all of that to work together.

Let me look over the PASM commands and see if I can figure out a way to search binarily (is that a word??), and I'll get back to you.

James L

Just as a side note, I've been playing with the Propeller since it came out, and it took someone who just recently picked it up to teach me how to use the PASM part (not that the others couldn't have, I just wasn't ready till now)

James Long · 2009-04-26 19:08

Mag,

It looks as if I could start in the middle of the array and use cmp with wz, wc.

Then I would need to divide the remaining (upper or lower) bytes in half, go to the center of those and do it again.

Let's see, we actually have 128 bytes, so 64 would be half, 32 a quarter, 16 an eighth, 8 a sixteenth, 4 a thirty-second, 2 a sixty-fourth, and one.....we know what that is.

Moving those amounts in the array would be a challenge, because it could be an add or a sub. The great thing is that we could start off with 127, then divide by 2, then by 2 again, and so on until we are left with one. That would only work with an even amount, but we are using an even amount.

Very interesting so far. I'll have to give it a go.

James L

MagIO2 · 2009-04-26 20:17

Yep, WZ and WC together are your friends for finding out whether a value is bigger than, smaller than or equal to another value. Internally cmp is nothing but a subtraction. I think on the Propeller those two instructions even have the same opcode, only with different default flag settings.
cmp x, y
simply subtracts y from x. If WC is set, then it had to borrow a bit, which means that y is bigger than x. If WC is cleared x is bigger OR equal to y.

For the binary search:
First we do some initialization. The list_ptr should point directly to the middle of the array. Binary search is perfect for lists of 2^x - 1 elements. Of course it also works with other list sizes in general, but this size allows the simplest possible integer arithmetic, because then we really have a middle of the array. If we don't find the value among the 127 elements, we can simply do one additional compare. (Or is it certain that the note will be found in the list? Then we can simply return 127.)

Let's have a look at a list with 15 elements:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

7 is the middle; our step width would be 4.
(7-4) 3 is the middle between 0 and 6, (7+4) 11 is the middle between 8 and 14; the next step is 2.
(3-2) 1 is the middle between 0 and 2 .... next step 1.
Do you know how to divide by two in PASM? Shifting, maybe?

1/2 is zero in integer arithmetic, so when the step width becomes zero, we can stop searching.
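The stepping scheme is easy to simulate. This Python sketch (a hypothetical helper, not PASM) starts in the middle of a list of 2^k - 1 elements, moves by the step width and halves the step with a right shift, which is what `shr step, #1` would do in PASM. To keep it short it steers by the target's position instead of comparing values; in a sorted list both give the same direction.

```python
def probe_sequence(size, target):
    """Positions visited while searching for position `target` in a sorted
    list of size = 2**k - 1 elements (k >= 2). Start in the middle, move by
    the step width, and halve the step with a right shift until it is zero."""
    pos = size // 2              # 7 for 15 elements
    step = (size + 1) >> 2       # 4 for 15 elements
    visited = [pos]
    while pos != target and step != 0:
        # in a sorted list "smaller value" means "smaller position"
        pos = pos - step if target < pos else pos + step
        step >>= 1               # 1 >> 1 == 0, which ends the search
        visited.append(pos)
    return visited


print(probe_sequence(15, 1))     # the walk 7, 3, 1 from the example
```

Every position of the 15-element list is reached within 4 probes, and every position of a 127-element list within 7.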

James Long · 2009-04-27 03:13

I've been thinking of all this, and I still need to index the items in the array. I'm having trouble getting my head around that right now.

Also, while you are contemplating that (which I'm sure you already have an idea that you didn't share), I have another question that I haven't asked.

Is it possible to pass a result from PASM to another cog running PASM? I figure it is, just by using its parameter address, but I thought I would ask.

James L

kuroneko · 2009-04-27 03:50

MagIO2 said...
Yep, WZ and WC together are your friends for finding out whether a value is bigger than, smaller than or equal to another value. Internally cmp is nothing but a subtraction. I think on the Propeller those two instructions even have the same opcode, only with different default flag settings.
cmp x, y
simply subtracts y from x. If WC is set, then it had to borrow a bit, which means that y is bigger than x. If WC is cleared x is bigger OR equal to y.

Oh dear ;) Yes, cmp and sub are the same thing, the difference being that the former implies NR and the latter WR.

As for your WC interpretation ... if you don't specify anything (neither WC nor WZ) then the flags are not affected by this instruction. Specifying WC will cause the carry flag to be updated according to the result of the operation. I'm sure you meant to express the right thing. i.e. "If the carry flag is set, then it had to borrow a bit, which means that y is bigger than x. If the carry flag is cleared x is bigger OR equal to y."

cmp x, y        ' effectively a NOP
cmp x, y wc     ' do the comparison and update the flags
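A tiny Python model of the flag behaviour described here, assuming both WZ and WC are specified (without them the real instruction leaves the flags untouched, which the model does not represent). It covers the unsigned cmp only; signed comparisons use cmps, which is not modeled. The function names are made up.

```python
MASK32 = 0xFFFF_FFFF


def cmp_flags(x, y):
    """Model of an unsigned 32-bit compare such as `cmp x, y wz wc`.

    The subtraction x - y is performed but not written back (NR).
    Returns (z, c): z is set when the result is zero (x == y),
    c is set when the subtraction needs a borrow (x < y, unsigned).
    """
    x &= MASK32
    y &= MASK32
    result = (x - y) & MASK32   # what sub would write back; cmp discards it
    z = result == 0
    c = x < y                   # borrow out of the top bit
    return z, c


def relation(x, y):
    """Turn the two flags into "<", "==" or ">"."""
    z, c = cmp_flags(x, y)
    if c:
        return "<"
    return "==" if z else ">"
```

Note that the comparison is unsigned: -1 is $FFFF_FFFF, so it counts as bigger than 0.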

MagIO2 · 2009-04-27 06:27

Ok ... first iteration of help:

1. Initialize list_loc with list_ptr + 64
2. Initialize a step variable with 64
3. Initialize a return variable with list_ptr + $ff ($ff then means that we did not find the value in the list)
now comes the loop
4. read note_val from list_loc
5. compare the note we search for with note_val, setting both flags (WZ and WC)
6. if equal, copy list_loc to the return variable
7. divide step by 2, setting the zero flag
8. depending on the carry flag, add step to or subtract step from list_loc
9. if the zero flag is not set, jump back to the loop (4.)
10. as a last step, compare the note we search for with element 0 of the list; only WZ is needed
11. if equal, copy list_ptr to the return variable
12. subtract list_ptr from the return variable
13. write the return variable to output_ptr

This description is close to code; in most cases you only have to find the correct instruction.
Of course you could leave the loop as soon as you find the value, but that means you would have different runtimes depending on the position of the value in the list. Here we have a constant runtime, which might make timing in the code using this driver easier.
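To check that these steps really give a constant runtime, here is a Python model that follows them one to one. It is not PASM; `fixed_time_lookdown`, `NOT_FOUND` and the test data are made up for the sketch. It assumes the 128 values are sorted ascending (for example after sorting them together with their original positions), works with indices instead of hub addresses so step 12 disappears, and counts the compares.

```python
NOT_FOUND = 0xFF


def fixed_time_lookdown(cue_list, note):
    """Constant-time binary search following the 13 steps above.

    cue_list must be sorted ascending and hold exactly 128 values.
    Positions 1..127 are covered by a loop that always makes 7 passes
    (no early exit); position 0 gets one extra compare at the end.
    Returns (index, compares); index is NOT_FOUND ($FF) if absent.
    """
    if len(cue_list) != 128:
        raise ValueError("expected 128 entries")
    list_loc = 64                       # step 1: start in the middle
    step = 64                           # step 2
    result = NOT_FOUND                  # step 3
    compares = 0
    while True:
        note_val = cue_list[list_loc]   # step 4
        compares += 1
        z = note == note_val            # step 5: zero flag (equal)
        c = note < note_val             #         carry flag (borrow)
        if z:
            result = list_loc           # step 6
        step >>= 1                      # step 7: zero flag once step is 0
        list_loc += -step if c else step    # step 8
        if step == 0:                   # step 9: loop while step != 0
            break
    compares += 1
    if cue_list[0] == note:             # steps 10 and 11
        result = 0
    return result, compares             # step 13: hand the result back


cues = [3 * i + 1 for i in range(128)]  # hypothetical sorted cue list
print(fixed_time_lookdown(cues, cues[37]))   # -> (37, 8)
```

Every lookup costs 8 compares (7 passes through the loop plus the extra compare for element 0), no matter where the note is in the list or whether it is there at all.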

James Long · 2009-04-27 07:06

Well, that is an interesting version.

I'm not sure of 11 and 12, I'll have to write that out for it to make sense.

James L

Do I ever sleep???!!!

MagIO2 · 2009-04-27 07:33

About the parameter passing question you had:
read this thread http://forums.parallax.com/showthread.php?p=803097

Why should you sleep when things are so interesting? I had 4 hours last night ;o) Working on a nice little RAM interface which only needs 3 port pins.
