Problems with wrlong
Tectu
Hello folks!
I have a little issue in an assembler section....
I want to copy the value from INA to the main RAM using wrlong. This is what my code looks like:
    makescan    wrlong  INA, address
                add     address, #4
                djnz    samples, #makescan
When I execute that, the main RAM is $0000_0000. Then I tried this one:
    makescan    mov     temp, INA
                wrlong  temp, address
                add     address, #4
                djnz    samples, #makescan
That one is working pretty fine... but WHY???
And how can I fix it? It's very important that the subroutine (makescan) takes exactly 16 cycles...
Can anyone help me?
Greetings Tectu
Comments
On Edit: There's no way to fix this if your requirement is to really record all 32 bits of INA every 16 clocks, but if your real need is more limited it might be possible to do some kind of partial caching into Cog RAM. If you tell us more about the application there might be a workaround.
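The 16-clock constraint can be checked with a small timing model. This is a rough Python sketch rather than PASM, and it assumes Propeller 1 timing: a cog gets a hub access window every 16 clocks, a hub instruction such as wrlong needs at least 8 clocks once the cog is in step with its window, and an ordinary instruction takes 4 clocks. The function name `loop_period` is made up for illustration.

```python
HUB_WINDOW = 16   # assumed: clocks between hub access windows for one cog
HUB_OP_MIN = 8    # assumed: minimum clocks for wrlong/rdlong once in sync
PLAIN_OP = 4      # assumed: clocks for an ordinary instruction (mov, add, djnz)


def loop_period(plain_instructions):
    """Clocks per iteration of a loop with one hub write plus plain instructions."""
    busy = HUB_OP_MIN + PLAIN_OP * plain_instructions
    # The next wrlong has to wait for the next hub window, so round up.
    windows = -(-busy // HUB_WINDOW)
    return windows * HUB_WINDOW


print(loop_period(2))  # wrlong + add + djnz
print(loop_period(3))  # wrlong + mov + add + djnz
```

Under these assumptions the three-instruction loop fits one hub window, while adding the extra mov makes the loop miss the next window and wait for the one after it.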
Thank you for your very fast response.
I would like to build a Logic Analyzer. I know that there is the Parallax Digital Storage Logic Analyzer and also the Propalyzer, but I'd like to do one completely on my own, for learning... Another reason is that I want more than a 4.44MHz sample rate.
Now, should I talk more about my ideas, or will you now tell me something like "Don't reinvent the wheel"?
Thanks for your help!
It's possible to use more than one cog to sample I/O pins so that one sample is made each system clock cycle. You have to synchronize the cogs so that their 4 clock instruction cycles are offset, each by one. This is done with a WAITCNT instruction with each cog waiting for a different system clock value to pick up execution. One cog waits for TIME+0. Another cog waits for TIME+1, etc. The cogs would store the samples in a buffer in their own memory, then copy the saved values to a common buffer in hub memory for processing. That's how the existing Propeller logic analyzers work.
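The offset scheme can be illustrated with a small schedule model. This is only a Python sketch of the timing idea, not cog code; it assumes four cogs that each sample once per 4-clock instruction and were released by WAITCNT at TIME+0 to TIME+3. The function names are hypothetical.

```python
def sample_clocks(start, cog, samples, cogs=4):
    """System clock values at which one cog samples the pins."""
    return [start + cog + i * cogs for i in range(samples)]


def merged_schedule(start, samples, cogs=4):
    """All sample times of all cogs, in chronological order."""
    clocks = []
    for cog in range(cogs):
        clocks.extend(sample_clocks(start, cog, samples, cogs))
    return sorted(clocks)


print(sample_clocks(1000, 1, 3))
print(merged_schedule(1000, 3))
```

Each cog on its own only sees every fourth clock, but the merged schedule has one sample on every system clock.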
You write your cog program to do, say, 4x interleaved writes. Each instance doesn't care about the other instances, it just reads, then waitcnts until the right time to do the next 4x interleaved read. When you start the cogs, you give them each a starting CNT to wait for, chosen a judicious way into the future considering cogstart overhead, and offset by the appropriate number of clocks for each of the four cogs. When your four cogs start they each wait for their individual offset start points and go about their individual merry ways stuffing data into Hub RAM at 4 long / 16 byte intervals. What your top spin app sees is a smooth stream of data.
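One way to read the "4 long / 16 byte intervals" is the address layout below. It is a Python sketch under the assumption that cog k writes its i-th sample to long number i*4 + k of a shared buffer; `hub_address` and the base address are invented for the example.

```python
LONG_BYTES = 4


def hub_address(base, cog, sample, cogs=4):
    """Hub byte address where a cog stores its sample (assumed layout)."""
    return base + (sample * cogs + cog) * LONG_BYTES


base = 0x4000  # hypothetical buffer start, long aligned
for cog in range(4):
    print(cog, [hex(hub_address(base, cog, s)) for s in range(2)])
```

With this layout every cog advances its own pointer by 16 bytes per sample, and the buffer as a whole ends up filled without gaps.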
Too bad that I am so new to assembler and the Propeller that I cannot understand it.
I don't want to use things when I don't know how they work...
So some people (Hi Phil!) came up with the sub #7/djnz approach. This exploits the fact that rdlong/wrlong ignores the lowest two address bits. Say you have a 4n address. You set the bottom two bits thereby making it 4n+3. Doing a rdlong will read data from 4n (lower two bits ignored). Then you subtract #7 and we end up with 4n-4 (the long address before that). Finally the djnz subtracts #1 and we end up with 4n-5 (or 4(n-2)+3). In the end we transferred two longs and adjusted the address by 8 (7+1) which is what we would have done anyway (4+4). It's a bit tricky the first time you see it but take the time to go through an example (on paper) and follow the steps.
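For anyone who would rather check the paper exercise by machine, here is a small Python model of the address arithmetic. It only tracks the address register, not the data, and it runs a fixed number of passes instead of modelling how the real loop ends; `walk` and `hub_long_address` are names made up for this sketch.

```python
def hub_long_address(addr):
    """rdlong/wrlong ignore the two lowest address bits."""
    return addr & ~3


def walk(start_4n, pairs):
    """Hub addresses touched by `pairs` passes through the loop."""
    addr = start_4n | 3                          # set bottom two bits: 4n + 3
    touched = []
    for _ in range(pairs):
        touched.append(hub_long_address(addr))   # first transfer, long at 4n
        addr -= 7                                # sub addr, #7  -> 4n - 4
        touched.append(hub_long_address(addr))   # second transfer, long at 4n - 4
        addr -= 1                                # djnz          -> 4n - 5
    return touched


print(walk(44, 2))
```

Starting from 44, the register runs 47, 40, 39, 32, 31 while the longs actually accessed are 44, 40, 36 and 32, so every pass moves two longs and the address by 8.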
Something is still missing. The loop size. For a read loop (hub to cog) that was initially handled in that the code being loaded overwrote the final djnz of the transfer loop and the code simply continued with what we just loaded.
This overwrite-drop-through didn't work too well with the opposite direction though (cog to hub). Because there isn't anything to overwrite. But don't despair. This is where shadow registers come in. Basically some of the special registers ($1F0-$1FF) have special behaviour depending on whether you just read from or write to them (or both). One of them is the counter phase accumulator (phsx).
- If you read from it you get the counter value (value := counter[phsx]).
- If you write to it its shadow location and the counter are written to. (shadow[phsx] := counter[phsx] := value)
- Finally, read-modify-write performs the operation based on shadow[phsx] but updates both shadow and counter.
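The three rules can be written down as a toy model. This is a Python sketch of the behaviour as described in the list, not a cycle-accurate simulation; the class name `PhsModel` and the `tick` method (one accumulation step of a running counter) are assumptions added for illustration.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide


class PhsModel:
    """Toy model of phsx with its shadow location."""

    def __init__(self):
        self.shadow = 0
        self.counter = 0

    def read(self):
        # Plain read: you get the counter value.
        return self.counter

    def write(self, value):
        # Plain write: shadow and counter are both written.
        self.shadow = self.counter = value & MASK

    def read_modify_write(self, op):
        # The operation works on the shadow, the result goes to both.
        result = op(self.shadow) & MASK
        self.shadow = self.counter = result
        return result

    def tick(self, frq):
        # Assumed: a running counter adds frq to the counter only.
        self.counter = (self.counter + frq) & MASK


p = PhsModel()
p.write(39)
p.tick(512)
p.tick(512)
print(p.read(), p.shadow)
p.read_modify_write(lambda v: v - 7)
print(p.read(), p.shadow)
```

The point to take away is that the counter can drift away from the shadow between instructions, and a read-modify-write pulls both back to a value derived from the shadow.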
So what I've done is to exploit this disconnectedness of shadow and counter register. I use the shadow as loop counter (sub phsa, #7 wz/djnz phsa, #:copy7). A loop count of e.g. 10 longs on its own doesn't give me proper addresses though (the counter register is used as wrlong target). So what we do now is to enable the counter and let it add the base address. Let's look at these two lines:
Using the 10 longs as an example we feed phsa with 40-1 (one less than the byte count). Up to the point when the wrlong has collected all its operands, frqa has been added twice to phsa (that's simply something you have to know or figure out). Which means we divide our base address by 2 (it's 4n so no issues here) and place it into frqa. So the first write goes to base/2 + base/2 + 39 = base + 36 + 3. 36 is the offset of element 9, which is what we want (followed by 8 down to 0).
I'm not a great explainer but I hope that gives you an idea why this works. Feel free to ask more questions (preferably in the Propeller sub forum).
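The 10-long example can be replayed with a few lines of Python. This is a sketch of the arithmetic only, under the assumptions stated in the post (frqa = base/2 is added twice before each wrlong picks up its address) plus one of my own: that the loop ends when the sub #7 leaves zero in the shadow, which requires an even number of longs. `write_addresses` is a made-up name.

```python
def write_addresses(base, longs):
    """Hub long addresses hit by the wrlongs, in order."""
    if base % 4 != 0:
        raise ValueError("base must be long aligned (4n)")
    if longs < 2 or longs % 2 != 0:
        raise ValueError("this model needs an even number of longs")
    frqa = base // 2
    shadow = longs * 4 - 1                        # e.g. 40 - 1 for 10 longs
    addrs = []
    while True:
        addrs.append((shadow + 2 * frqa) & ~3)    # wrlong sees shadow + 2*frqa
        shadow -= 7                               # sub phsa, #7 wz
        addrs.append((shadow + 2 * frqa) & ~3)    # second wrlong of the pair
        if shadow == 0:                           # assumed exit condition (Z set)
            break
        shadow -= 1                               # djnz phsa, #:copy7
    return addrs


print([a - 1024 for a in write_addresses(1024, 10)])
```

For a base of 1024 the first write lands at 1024 + 36, which is element 9, and the following writes step down one long at a time to element 0.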
I have just two more questions about your explanation:
Edit: Someone should move this thread to where it belongs; I could not figure out the right board for it - sorry
4n simply stands for a number divisible by 4 without remainder and is used when the actual value isn't important, e.g. it could be 1024 or 44. A long variable is usually stored at long aligned addresses which - given its size of 4 bytes - is 4n.
As for size, this is the amount of longs you want to transfer in bytes -1. Just follow the link to the POC thread and have a look at the code listing (64 longs transferred amounts to 64*4-1 = 255). Or have a look at the cog storage code (post 978929) which uses - IIRC - 484 longs (size = 1935). HTH
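Both definitions fit in two one-line helpers. This is just a Python restatement of the text above; the function names are invented for the example.

```python
def is_long_aligned(address):
    """True for a "4n" address, i.e. one divisible by 4 without remainder."""
    return address % 4 == 0


def transfer_size(longs):
    """Size value for the copy loop: byte count minus one."""
    return longs * 4 - 1


print(is_long_aligned(1024), is_long_aligned(44), is_long_aligned(46))
print(transfer_size(64), transfer_size(484))
```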
I just get H(+??????????? from the serial terminal, without any newlines.
The Spin code (in the normal case) just reads the main RAM and sends it to the serial terminal, with a newline after every long.
Sorry for whatever I am doing wrong. It's my first project with a Propeller, so please tell me what I should do better ;-)
~ Tectu
To try the routines, I took my old wrlong scan method; I would replace it with your 16-cycle routine to be faster.
This is what I wrote:
and this is what I get on serial terminal:
no idea why the first cog is not working properly...
But anyway, this is not why this thread exists.
~ Tectu