Problems with wrlong
Hello folks!
I have a little issue in an assembler section....
I want to copy the value from INA to the main RAM using wrlong. This is what my code looks like:
makescan      wrlong  INA, address
              add     address, #4
              djnz    samples, #makescan
When I execute that, the main RAM contains $0000_0000. Then I tried this one:
makescan      mov     temp, INA
              wrlong  temp, address
              add     address, #4
              djnz    samples, #makescan
That one works fine... but WHY???
And how can I fix it? It's very important that the subroutine (makescan) takes exactly 16 cycles...
Can anyone help me?
Greetings Tectu

Comments
On Edit: There's no way to fix this if your requirement is to really record all 32 bits of INA every 16 clocks, but if your real need is more limited it might be possible to do some kind of partial caching into Cog RAM. If you tell us more about the application there might be a workaround.
Thank you for your very fast response.
I would like to build a logic analyzer. I know that there is the Parallax Digital Storage Logic Analyzer and also the Propalyzer, but I'd like to do one completely on my own, for learning... Another reason is that I want more than a 4.44MHz sample rate.
Now, should I talk more about my ideas, or will you tell me something like "Don't reinvent the wheel"?
Thanks for your help!
It's possible to use more than one cog to sample I/O pins so that one sample is made each system clock cycle. You have to synchronize the cogs so that their 4 clock instruction cycles are offset, each by one. This is done with a WAITCNT instruction with each cog waiting for a different system clock value to pick up execution. One cog waits for TIME+0. Another cog waits for TIME+1, etc. The cogs would store the samples in a buffer in their own memory, then copy the saved values to a common buffer in hub memory for processing. That's how the existing Propeller logic analyzers work.
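To see why the one-clock offsets fill every system clock, here is a small Python model (not Propeller code). It assumes each cog samples once per 4-clock instruction and that cog k was released by WAITCNT at start_clock + k; the function names and the start value are made up for illustration.

```python
# Model only: one sample per 4-clock instruction, cog k starts at start_clock + k.
INSTRUCTION_CLOCKS = 4


def sample_clocks(cog_index, start_clock, samples):
    """System-clock ticks at which one cog reads the pins."""
    first = start_clock + cog_index
    return [first + INSTRUCTION_CLOCKS * n for n in range(samples)]


def merged_clocks(cogs, start_clock, samples):
    """All sample ticks of all cogs, in time order."""
    ticks = []
    for k in range(cogs):
        ticks.extend(sample_clocks(k, start_clock, samples))
    return sorted(ticks)


if __name__ == "__main__":
    # Four cogs, three samples each: twelve consecutive clock ticks.
    print(merged_clocks(4, 1000, 3))
```

With fewer than four cogs the merged list has gaps, which is why the scheme needs one cog per clock of the instruction cycle.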
You write your cog program to do, say, 4x interleaved writes. Each instance doesn't care about the other instances, it just reads, then waitcnts until the right time to do the next 4x interleaved read. When you start the cogs, you give them each a starting CNT to wait for, chosen a judicious way into the future considering cogstart overhead, and offset by the appropriate number of clocks for each of the four cogs. When your four cogs start they each wait for their individual offset start points and go about their individual merry ways stuffing data into Hub RAM at 4 long / 16 byte intervals. What your top spin app sees is a smooth stream of data.
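The addressing side of that interleave can be sketched the same way. This is a Python model under the assumption that cog k starts one long (4 bytes) after cog k-1 and then steps by 4 longs / 16 bytes per sample; the base address and names are hypothetical.

```python
# Model only: cog k writes long k, k+4, k+8, ... of a shared hub buffer.
LONG_BYTES = 4


def hub_addresses(base, cog_index, cogs, samples):
    """Hub byte addresses one cog writes to."""
    stride = LONG_BYTES * cogs            # 16 bytes for 4 cogs
    first = base + LONG_BYTES * cog_index
    return [first + stride * n for n in range(samples)]


def merged_addresses(base, cogs, samples):
    """Every address written by any cog, sorted."""
    out = []
    for k in range(cogs):
        out.extend(hub_addresses(base, k, cogs, samples))
    return sorted(out)


if __name__ == "__main__":
    print([hex(a) for a in merged_addresses(0x1000, 4, 2)])
```

Sorted, the addresses form one gap-free run of longs, which is the "smooth stream of data" the Spin side sees.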
              mov     tmp, ina              ' +8
              mov     phsa, size            ' -4    hub byte count (8n + 7)
:copy7        wrlong  tmp, phsa             ' +0 =  transfer long between cog and hub
              mov     tmp, ina              ' +8
              sub     phsa, #7 wz           ' -4
:copy1        wrlong  tmp, phsa             ' +0 =  transfer long between cog and hub
        if_nz mov     tmp, ina              ' +8
        if_nz djnz    phsa, #:copy7         ' -4
Too bad that I am so new to assembler and the Propeller that I cannot understand it.
I don't want to use things when I don't know how they work...
So some people (Hi Phil!) came up with the sub #7/djnz approach. This exploits the fact that rdlong/wrlong ignores the lowest two address bits. Say you have a 4n address. You set the bottom two bits thereby making it 4n+3. Doing a rdlong will read data from 4n (lower two bits ignored). Then you subtract #7 and we end up with 4n-4 (the long address before that). Finally the djnz subtracts #1 and we end up with 4n-5 (or 4(n-2)+3). In the end we transferred two longs and adjusted the address by 8 (7+1) which is what we would have done anyway (4+4). It's a bit tricky the first time you see it but take the time to go through an example (on paper) and follow the steps.
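If you'd rather not do the paper exercise by hand, the steps above can be traced with a few lines of Python. This is only a model of the address arithmetic (the helper names are made up); it assumes nothing beyond what is described above: the low two address bits are ignored, then sub #7, then djnz subtracts 1.

```python
def long_address(addr):
    """rdlong/wrlong ignore the two lowest address bits."""
    return addr & ~3


def trace(start, passes):
    """Hub long addresses touched by `passes` rounds of the sub #7 / djnz loop."""
    addr = start | 3                          # 4n -> 4n + 3
    accessed = []
    for _ in range(passes):
        accessed.append(long_address(addr))   # first transfer: 4n
        addr -= 7                             # sub addr, #7  -> 4n - 4
        accessed.append(long_address(addr))   # second transfer: 4n - 4
        addr -= 1                             # djnz          -> 4(n-2) + 3
    return accessed


if __name__ == "__main__":
    print(trace(40, 2))                       # 4n with n = 10
```

Each pass moves two longs and lowers the address by 8 in total, exactly as two plain "subtract 4" steps would.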
Something is still missing: the loop size. For a read loop (hub to cog) this was initially handled by letting the code being loaded overwrite the final djnz of the transfer loop, so that execution simply continued with what we had just loaded.
This overwrite-and-drop-through trick doesn't work for the opposite direction (cog to hub), though, because there isn't anything to overwrite. But don't despair. This is where shadow registers come in. Basically, some of the special registers ($1F0-$1FF) behave differently depending on whether you just read from them, write to them, or both. One of them is the counter phase accumulator (phsx).
- If you read from it you get the counter value (value := counter[phsx]).
- If you write to it its shadow location and the counter are written to. (shadow[phsx] := counter[phsx] := value)
- Finally, read-modify-write performs the operation based on shadow[phsx] but updates both shadow and counter.
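The three rules above can be written down as a tiny Python model. It is a sketch for reasoning only, not a cycle-accurate description of the hardware: the class and method names are invented, and accumulate() stands in for the counter adding frqx to the counter side while leaving the shadow alone.

```python
MASK32 = 0xFFFFFFFF


class PhaseRegister:
    """Toy model of phsx: a counter value plus a shadow copy in cog RAM."""

    def __init__(self):
        self.shadow = 0
        self.counter = 0

    def read(self):
        # Rule 1: a plain read returns the counter value.
        return self.counter

    def write(self, value):
        # Rule 2: a write updates shadow and counter together.
        self.shadow = self.counter = value & MASK32

    def modify(self, op):
        # Rule 3: read-modify-write computes from the shadow, then updates both.
        self.write(op(self.shadow))

    def accumulate(self, frq):
        # Assumption of this model: only the counter side moves.
        self.counter = (self.counter + frq) & MASK32


if __name__ == "__main__":
    p = PhaseRegister()
    p.write(100)
    p.accumulate(20)             # counter = 120, shadow still 100
    p.modify(lambda v: v >> 1)   # works on the shadow: 100 >> 1
    print(p.read())
```

The point of the demo is that the accumulated 20 is lost: the shift operated on the stale shadow value, not on what a read would have returned.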
So what I've done is to exploit this disconnectedness of the shadow and counter registers. I use the shadow as the loop counter (sub phsa, #7 wz/djnz phsa, #:copy7). A loop counter for e.g. 10 longs doesn't give me proper addresses, though (the counter register is used as the wrlong target). So what we do now is to enable the counter and let it add the base address. Let's look at these two lines: Using the 10 longs as an example, we feed phsa with 40-1 (one less than the byte count). Up to the point when the wrlong has collected all its operands, frqa has been added twice to phsa (that's simply something you have to know or figure out). This means we divide our base address by 2 (it's 4n, so no issues here) and place it into frqa. So the first write goes to base/2 + base/2 + 39 = base + 36 + 3. 36 is the offset of element 9, which is what we want (followed by 8 down to 0).
' phsx read-modify-write issue
              mov     temp, phsx            ' temp := counter[phsx]
              shr     temp, #1              ' temp >>= 1
              mov     phsx, temp            ' update shadow and counter
' is equivalent to
              mov     phsx, phsx            ' shadow[phsx] := counter[phsx]
              shr     phsx, #1              ' r: operate on shadow[phsx]
                                            ' m: shadow[phsx] >>= 1
                                            ' w: update shadow and counter
I'm not a great explainer but I hope that gives you an idea why this works. Feel free to ask more questions (preferably in the Propeller sub forum).
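To make the 10-long example concrete, here is a Python model of the address sequence. It rests on the statement above that frqa (= base/2) has been added twice by the time each wrlong collects its operands, and it assumes this holds for every wrlong in the loop because each one directly follows a write to phsa. The function name is hypothetical.

```python
def write_addresses(base, longs):
    """Hub addresses written by the :copy7/:copy1 loop, in order."""
    if longs <= 0 or longs % 2:
        raise ValueError("the loop moves two longs per pass, so longs must be even")
    if base % 4:
        raise ValueError("base must be long aligned (4n)")
    frqa = base // 2                  # mov frqa, par / shr frqa, #1
    shadow = 4 * longs - 1            # mov phsa, size
    addresses = []
    while True:
        addresses.append((shadow + 2 * frqa) & ~3)   # :copy7 wrlong
        shadow -= 7                                  # sub phsa, #7 wz
        addresses.append((shadow + 2 * frqa) & ~3)   # :copy1 wrlong
        if shadow == 0:                              # Z set: if_nz lines are skipped
            break
        shadow -= 1                                  # djnz phsa, #:copy7
    return addresses


if __name__ == "__main__":
    print([a - 0x4000 for a in write_addresses(0x4000, 10)])
```

The shadow value only lands exactly on zero for an even number of longs, which is why the model rejects odd counts instead of looping forever.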
I have just two more questions about your explanation:
Edit: Someone should move this thread to where it belongs; I could not figure out the right board for it - sorry
4n simply stands for a number divisible by 4 without remainder and is used when the actual value isn't important, e.g. it could be 1024 or 44. A long variable is usually stored at long aligned addresses which - given its size of 4 bytes - is 4n.
As for size, this is the number of bytes you want to transfer, minus 1. Just follow the link to the POC thread and have a look at the code listing (64 longs transferred amounts to 64*4-1 = 255). Or have a look at the cog storage code which uses - IIRC - 484 longs (size = 1935). HTH
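As a quick check of those numbers, a two-line Python helper (the names are made up for this illustration) reproduces both values and the 4n alignment test:

```python
def transfer_size(longs):
    """Byte count minus one for a transfer of `longs` longs."""
    return longs * 4 - 1


def is_long_aligned(address):
    """True for 4n addresses, i.e. divisible by 4 without remainder."""
    return address % 4 == 0


if __name__ == "__main__":
    print(transfer_size(64), transfer_size(484))
    print(is_long_aligned(1024), is_long_aligned(44), is_long_aligned(46))
```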
I just get H(+??????????? from the serial terminal, without any newlines.
The Spin code (in the normal case) just reads the main RAM and sends it to the serial terminal, with a newline after every long.
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

VAR
  long data[100]
  byte i

OBJ
  Display : "VGA_text"
  Key     : "Keyboard"
  Debug   : "Parallax Serial Terminal"

PUB main
  Debug.start(115200)
  cognew(@entry, @data{0})
  waitcnt(clkfreq + cnt)
  Debug.char(13)
  Debug.str(string("Begin now: "))          'begin output now
  Debug.char(13)
  repeat i from 0 to 99
    Debug.bin(data[i], 32)
    Debug.char(13)
  Debug.str(string("Finished!!"))           'output finished

DAT
              org
entry         mov     samples, #100         'make 100 samples
              mov     size, #3              'we want to send 1 long (4 bytes) - 1 = 3 to main RAM
makescan      mov     tmp, ina              ' +8
              mov     phsa, size            ' -4    hub byte count (8n + 7)
:copy7        wrlong  tmp, phsa             ' +0 =  transfer long between cog and hub
              mov     tmp, ina              ' +8
              sub     phsa, #7 wz           ' -4
:copy1        wrlong  tmp, phsa             ' +0 =  transfer long between cog and hub
        if_nz mov     tmp, ina              ' +8
        if_nz djnz    phsa, #:copy7         ' -4
              djnz    samples, #makescan
:here         jmp     #:here                'never-ever-lands

tmp           RES     1
size          RES     1                     'need to send a long to main RAM
samples       RES     1                     'amount of samples
Sorry for whatever I am doing wrong. It's my first project with a Propeller, so please tell me what I should do better ;-)
~ Tectu
To try the routines, I took my old wrlong scan method; I would later replace that with your 16-cycle routine to be faster.
This is what I wrote:
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

VAR
  long waitcog                              'how long the cog has to wait
  long samples_amount                       'how many samples should be made
  long coghptr                              'to switch to the right pointer
  long data[100]                            'space on main RAM
  byte i

OBJ
  Display : "VGA_text"
  Key     : "Keyboard"
  Debug   : "Parallax Serial Terminal"

PUB main
  Debug.start(115200)
  samples_amount := 5                       'samples that will be done *4
  coghptr := 0
  waitcog := 100_000 + cnt
  '----------------------------------
  coghptr += 0
  waitcog += 0
  waitcnt(10_000+cnt)
  cognew(@entry, @waitcog)
  coghptr += 4
  waitcog += 32
  waitcnt(10_000+cnt)
  cognew(@entry, @waitcog)
  coghptr += 4
  waitcog += 32
  waitcnt(10_000+cnt)
  cognew(@entry, @waitcog)
  coghptr += 4
  waitcog += 32
  waitcnt(10_000+cnt)
  cognew(@entry, @waitcog)
  '----------------------------------
  waitcnt(clkfreq + cnt)
  Debug.char(13)
  Debug.str(string("Begin now: "))          'begin output now
  Debug.char(13)
  repeat i from 0 to samples_amount*4-1
    Debug.bin(data[i], 32)
    Debug.char(13)
  Debug.str(string("Finished!!"))           'output finished

DAT
              org     0
entry         mov     tmp, par
              rdlong  wait, tmp
              add     tmp, #4
              rdlong  samples, tmp
              add     tmp, #4
              rdlong  addhptr, tmp
              add     tmp, #4
              mov     hptr, tmp             'copy pointer to begin of main RAM
              add     hptr, addhptr
              mov     dira, 0               'make all pins input
              waitcnt wait, #0              'wait for sync
              nop
makescan      mov     tmp, ina              'write INA to tmp
              wrlong  tmp, hptr             'write the sample to main RAM
              add     hptr, #16             'point to the fourth-next long
              nop
              nop
              nop
              djnz    samples, #makescan    'do for amount of samples
here          jmp     #here

wait          RES     1                     'how many cycles should you wait
hptr          RES     1                     'pointer to main RAM
samples       RES     1                     'amount of samples that will be done
tmp           RES     1
addhptr       RES     1
and this is what I get on the serial terminal:
no idea why the first cog is not working properly...
But anyway, this is not why this thread exists.
~ Tectu
CON
  _clkmode = XTAL1|PLL16X
  _xinfreq = 5_000_000

CON
  lcnt = 32                                       ' must be an even number of longs

VAR
  long  data[lcnt]

OBJ
  debug: "Parallax Serial Terminal"

PUB main | i

  debug.start(115200)
  waitcnt(clkfreq*3 + cnt)
  debug.char(0)

  ' generate data

  dira[16]~~
  ctra := constant(%0_00100_000 << 23 | 16)
  frqa := constant(%00001_0000 << 23)             ' change every 16 cycles

  ' start sampler

  data[0] := constant(lcnt*4 -1)                  ' transfer array length
  cognew(@entry, @data{0})
  waitcnt(clkfreq + cnt)

  ' display data

  debug.str(string(13, "Begin now: ", 13))        ' begin output now
  repeat i from 0 to constant(lcnt -1)
    debug.bin(data[i], 32)
    debug.char(13)
  debug.str(string("Finished!!"))                 ' output finished

DAT             org     0

entry           movi    ctra, #%0_11111_000     ' -4    LOGIC always
                rdlong  size, par               ' +0 =  read byte count -1
                mov     frqa, par               ' +8    data buffer (base address)
                shr     frqa, #1                ' -4    base/2
                long    0[2] {2 x nop}          ' +0 =
                mov     temp, ina               ' +8
                mov     phsa, size              ' -4    hub byte count (8n - 1)
:copy7          wrlong  temp, phsa              ' +0 =  transfer long between cog and hub
                mov     temp, ina               ' +8
                sub     phsa, #7 wz             ' -4
:copy1          wrlong  temp, phsa              ' +0 =  transfer long between cog and hub
        if_nz   mov     temp, ina               ' +8
        if_nz   djnz    phsa, #:copy7           ' -4

                cogid   cnt                     '
                cogstop cnt                     ' sayonara ...

' initialised data and/or presets

' uninitialised data and/or temporaries

size            res     1
temp            res     1

                fit

DAT             org     0                       ' array validation

                res     lcnt & 1                ' lcnt must be 2n (even)
                fit     0

DAT
Note that when you go multi-cog, success or failure depends on how you do it. Using the 16-cycle loop imposes certain conditions if it is to work interleaved. For example, each of the 4 cogs would sample 4 cycles apart, which also means that their hub accesses are 4 cycles apart. That's where the problem lies: cog N and cog N+1 have their respective hub window slots 2 cycles apart, so you can't use them together however much you sync them with waitcnt. This means that for a 4-cog sampler using the above sample loop you have to use either all even-numbered (2n) or all odd-numbered (2n+1) cogs.
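The even/odd rule can be checked with a small Python model. It assumes the hub visits the 8 cogs round-robin with windows 2 clocks apart (so one full rotation takes 16 clocks) and asks whether a set of cogs has its windows evenly spaced at the sample interval; the function names are invented for this sketch.

```python
HUB_ROTATION = 16   # clocks for the hub to visit all 8 cogs once
SLOT_SPACING = 2    # clocks between the windows of cog N and cog N+1


def hub_slot(cog_id):
    """Clock offset of a cog's hub window within one rotation."""
    return (cog_id * SLOT_SPACING) % HUB_ROTATION


def can_interleave(cog_ids, sample_spacing=4):
    """True if the cogs' hub windows are exactly sample_spacing clocks apart."""
    slots = sorted(hub_slot(c) for c in cog_ids)
    # Include the wrap-around gap from the last window back to the first.
    gaps = [b - a for a, b in zip(slots, slots[1:])]
    gaps.append(slots[0] + HUB_ROTATION - slots[-1])
    return all(g == sample_spacing for g in gaps)


if __name__ == "__main__":
    for cogs in ([0, 2, 4, 6], [1, 3, 5, 7], [0, 1, 2, 3]):
        print(cogs, can_interleave(cogs))
```

Four neighbouring cogs fail the check because their windows are only 2 clocks apart, while the all-even and all-odd sets land on a clean 4-clock grid.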