PHSA: A question for Kuroneko & an idea for storing data in unused cog ram (from OBC)

Cluso99 · 2011-02-18 00:36

OBC raised some interesting ideas and it seems RossH is using a similar idea in Catalina for temporary storage. Along those lines, here are a few ideas...

I have a fast overlay program which would work (ideas from others put together, as usual on this great forum). It unravels backwards, which is fine for transferring to the cog ram. However, this will not work for moving back to hub ram.

So I got to thinking (and not sure if anyone has thought about it or done it)... And yes, thinking is dangerous at my age

So, we need a loop like this....

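loop      rdlong    *-*, PHSA       ' *-* is patched at run time; becomes wrlong for storing back to hub (easy, just set the NR bit appropriately)
          add       loop, H0200     ' inc to next cog address (H0200 holds $200, i.e. +1 in the destination field)
          djnz      count, #loop    ' repeat until count reaches zero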

My assumptions are...
1. I believe we should be able to have PHSA increment by 4 every 16 clocks.
2. I believe we should be able to have PHSA incremented after each "loop" instruction has executed.
3. I believe we can use the PHSA this way (get the value of PHSA and not the shadow ram)

Are these correct assumptions ??? Has anyone done it ???

Now, if this works, I can probably get this to run in a zero footprint mode, and possibly grab another 4 longs in shadow space as well. This would give a cool 2,000 bytes exactly. Nice for a part of a screen buffer !!!
Currently no time to check this out. If I don't write it somewhere I will most likely forget.

kuroneko · 2011-02-18 00:52

It's all been done already in your [thread=104167]overlay thread[/thread] (auto increment). So, yes to all 3 assumptions. Problem is incrementing once every 16 cycles, i.e. it eats a pin. I posted a split-interleave-writer somewhere (lonesock thread) which basically does 4 groups for 16n, 16n+4, 16n+8 and 16n+12.

Here is the thread in question [thread=118012]Quick Cog-to-Hub transfer[/thread].
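
To make the four-group idea concrete, here is a minimal Python sketch (a model of the address pattern, not PASM). It assumes a counter in an always-accumulate mode with FRQA = 1, so PHSA advances by 16 bytes per 16-clock hub window and each pass can only touch every fourth long. The name `interleaved_schedule` and its parameters are made up for illustration.

```python
CLOCKS_PER_HUB_WINDOW = 16   # P1: each cog gets one hub slot every 16 system clocks
BYTES_PER_LONG = 4

def interleaved_schedule(total_longs, frqa=1):
    """Return the hub byte offsets visited in each pass.

    With PHSA += frqa on every clock, the hub address moves by
    frqa * 16 bytes between consecutive hub windows, so several
    passes with staggered start offsets are needed to cover every long.
    """
    stride = frqa * CLOCKS_PER_HUB_WINDOW        # bytes skipped per hub window
    passes = stride // BYTES_PER_LONG            # 4 passes when frqa == 1
    end = total_longs * BYTES_PER_LONG
    return [list(range(p * BYTES_PER_LONG, end, stride)) for p in range(passes)]

if __name__ == "__main__":
    for number, offsets in enumerate(interleaved_schedule(8)):
        print("pass", number, offsets)
```

The four passes correspond to the 16n, 16n+4, 16n+8 and 16n+12 groups; taken together they visit every long exactly once.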

Cluso99 · 2011-02-18 02:23

Thanks kuroneko. I presume you have not found a way to divide the clock by 4 without using a pin ?

Just re-looking at the counters to see how the PLL and the clocks are tied together. Any simple counter explanations anywhere (while I look)?

I have just been looking at the shadow ram and the use of the registers, etc. I am convinced I can get this side to work.

kuroneko · 2011-02-18 02:32

Cluso99 wrote: »

I presume you have not found a way to divide the clock by 4 without using a pin ?

Not many rainy days yet this year but I'm working on it

Cluso99 wrote: »

Just re-looking at the counters to see how the PLL and the clocks are tied together. Any simple counter explanations anywhere (while I look)?

Define simple. What you see is what you get. I always found the data sheet and the counter app note sufficient. Sometimes a counter is just a counter

Cluso99 · 2011-02-18 03:11

Here is my counter understanding...
So the only way we can run the counter without outputting to pin(s) is PLL internal (video mode) %00001 and logic always %11111.
Both these add FRQA to PHSA on each clock cycle.
The PLL is fed from bit 31 of PHSA, so it is no help to us.
We will not have a problem of counter jitter because we ignore the bottom 2 bits when accessing hub by longs.

So the only method found so far is the staging of every 4th long, then returning 3 more times, each time offsetting 1 more long.

Is this correct?

Wish those PortB pins were inside!

kuroneko · 2011-02-18 03:25

That about sums it up so far unless someone keeps quiet about a possible solution. And you always have the slow mode using every nth hub window. Which for some use cases might just be enough (size vs speed).

As for not outputting stuff, just clear dira if you don't want to be seen

Cluso99 · 2011-02-18 03:33

kuroneko wrote: »

As for not outputting stuff, just clear dira if you don't want to be seen

What are you saying here? Does this mean it will work but not output to the outside world. i.e. provided I am not using the eeprom, I could use this pin with DIRA as inputs??? Maybe a solution

kuroneko · 2011-02-18 03:39

Cluso99 wrote: »

What are you saying here? Does this mean it will work but not output to the outside world. i.e. provided I am not using the eeprom, I could use this pin with DIRA as inputs??? Maybe a solution

No, just an observation as you excluded NCO and DUTY as address counters. They do normally generate output but if that's not enabled you don't get any (you still get the increments though). As for using them as inputs, no way. Those really have to come from outside.

Cluso99 · 2011-02-18 03:52

Perhaps I did not word that correctly.
If I have an input pin (such as the eeprom SDA when not using the eeprom) and I output the counterA in /16 mode (but disable the DIRA pin so no effective output goes out the pin) and use that to enable the accumulation of FRQB into PHSB in CTRB, does this still work? If so, what do I get if I happen to read this pin with INA - do I see the outside world's pin value or the CTRA pins value???

kuroneko · 2011-02-18 03:59

I did understand what you meant but you can't enable/control a counter like this. They see what ina sees. And in order to affect that (ina) bit pattern you have to set the relevant bits in dira as an output unfortunately.

Cluso99 · 2011-02-18 04:06

Oh well, it was nice while the fun lasted

Oldbitcollector (Jeff) · 2011-02-18 09:53

Ah, so I've provoked the wizards... <smirk> I like where this is going...

Yes, it pays to always keep someone around who doesn't know what can't be done.
They ask for things that people know can't be done, so they are never tried. (In this case, I'm that guy.)

...continues to watch the thread...

Sapieha · 2011-02-18 10:10

Hi Oldbitcollector.

I can only add to that: read my FOOTER

Oldbitcollector wrote: »

Ah, so I've provoked the wizards... <smirk> I like where this is going...

Yes, it pays to always keep someone around who doesn't know what can't be done.
They ask for things that people know can't be done, so they are never tried. (In this case, I'm that guy.)

...continues to watch the thread...

Phil Pilgrim (PhiPi) · 2011-02-18 10:36

Why not put the transfer loop in the SFRs. Then do the transfer in two batches: all the even longs, then all the odd longs.

phsa:      rdlong     0-0,#0-0
phsb:      djnz       count,#phsa
vcfg:      jmp        #done

For every eight clocks, you want phsa to increment by $408. So set frqa to $81. Then it's just a bunch of fiddling with addresses, clock counts, and pipeline stuff to figure out what to initialize phsa to before jmping to it.

-Phil
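
The $408 and $81 figures can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the standard P1 instruction layout (source field in bits 8..0, destination field in bits 17..9), and the helper names `field_step` and `frq_for` are invented for illustration.

```python
DST_SHIFT = 9   # destination field starts at bit 9 of a P1 instruction
SRC_SHIFT = 0   # source field starts at bit 0

def field_step(dst_longs, src_bytes):
    """Value to add to an instruction to advance both of its address fields."""
    return (dst_longs << DST_SHIFT) + (src_bytes << SRC_SHIFT)

def frq_for(dst_longs, src_bytes, clocks):
    """Per-clock increment (FRQA) that yields the desired step every `clocks` clocks."""
    step = field_step(dst_longs, src_bytes)
    if step % clocks:
        raise ValueError("step is not a whole multiple of the loop time")
    return step // clocks

if __name__ == "__main__":
    print(hex(field_step(2, 8)))    # two cog longs and eight hub bytes per loop
    print(hex(frq_for(2, 8, 8)))    # spread over an 8-clock loop
```

Stepping two cog longs and eight hub bytes gives $408, and dividing by 8 clocks gives $81. A loop that steps four longs (16 hub bytes) every 16 clocks works out to the same per-clock value.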

lonesock · 2011-02-18 10:41

Phil Pilgrim (PhiPi) wrote: »
Why not put the transfer loop in the SFRs. Then do the transfer in two batches: all the even longs, then all the odd longs.
phsa:      rdlong     0-0,#0-0
phsb:      djnz       count,#phsa
vcfg:      jmp        #done
For every eight clocks, you want phsa to increment by $408. So set frqa to $81. Then it's just a bunch of fiddling with addresses, clock counts, and pipeline stuff to figure out what to initialize phsa to before jmping to it.

-Phil

I think with the rdlong embedded in there, you end up with 16 clocks per loop. And you will need to do a dummy hub op to sync up before setting PHSA and doing your jump. So 4 passes, total.

Jonathan

Phil Pilgrim (PhiPi) · 2011-02-18 10:51

Oh, right! I didn't figure in the hub delay. So, yes: four groups. Also, the hub delay renders the advantage of putting the code in the registers moot, since there's no penalty for the extra add.

-Phil

Cluso99 · 2011-02-18 12:11

Phil:
Yes, the transfer loop will be in the SFRs. Just not sure which ones yet. If at all possible, I want to reserve the first 4 to make 500 longs total (=2000 bytes). I will do a mix of LMM style here just like I did with my zero footprint debugger, but the copy loop will be SFR resident when run. I have been looking at the SFRs to see what I can use.

It's a real shame we didn't have the internal silicon for the PortB pins. I keep drooling at what we could do! (Never happy are we). Or even be able to feed the PLL output back into the second counter without using a pin - just a single gate enabled by an unused config bit. Isn't hindsight wonderful.
You know, I have never made such comments about any other micro I have used. Why? Because there were so many things it wasn't worth thinking about!!! It never ceases to amaze me just how much we can do with this prop. And it's first silicon. Just gotta say Chip... Congratulations!!!

There will still be 16 clocks between each rdlong/wrlong executed, as lonesock pointed out.

kuroneko · 2011-02-18 19:37

Phil Pilgrim (PhiPi) wrote: »

Oh, right! I didn't figure in the hub delay. So, yes: four groups. Also, the hub delay renders the advantage of putting the code in the registers moot, since there's no penalty for the extra add.

Also, while counter[phsa] is in fact incremented, instructions are always fetched from (static) shadow[phsa].

Phil Pilgrim (PhiPi) · 2011-02-18 20:16

kuroneko wrote:

Also, while counter[phsa] is in fact incremented, instructions are always fetched from (static) shadow[phsa].

How inconvenient!

-Phil

Cluso99 · 2011-02-18 21:09

But it also has its advantages too Phil. For instance, what kuroneko has done in the new thread. I use some shadow registers for code, constants and variables. But not all shadow registers are equal, and you also have to be careful you don't inadvertently start something off. But there is another 16 longs to play with :-)

Phil Pilgrim (PhiPi) · 2011-02-18 21:25

It does seem a bit strange, though. I can understand why phsx can't be the destination register in read-modify-writes while the counter is running. But instruction fetches are read-only. I should think they would've been granted source-register-access status.

-Phil

kuroneko · 2011-02-18 21:39

True, then again what seems more natural, fetching from RAM (shadow) or from a special h/w register (counter)?

Phil Pilgrim (PhiPi) · 2011-02-18 23:20

Interesting point, although it's hard to know what documented purpose the shadow RAM even exists for, other than to flesh out the address space (and to give us the building blocks for PASM trickery -- or fits of frustration). I would really like to see a hardware schematic for this bit of Propeller arcana.

-Phil

Cluso99 · 2011-02-19 01:43

IMHO, here is what I think (could be completely wrong of course)... I think it comes from the fact that we have 512 longs of memory, and rather than disable the memory which required more gates, it was left active. Then the registers were added and only the gating necessary for accessing them in the ways thought of were done.

Now, of course, none of this was supposed to be discovered. But we are a nosey bunch and we try to squeeze every last ounce out of this great chip, so we are always looking for other things to do. I think that is the elegance of Chip's design. It just does so much. If we had say an XYZ chip, we would be spending all our time using different XYZ variations, so we would never try the oddball things.

Here is a suggestion... We want some extra wires in this chip. Parallax has this great machine that can cut and add wires etc to a die. So, we should work out a modification and get Parallax to build us a special die, just for us fanatics !!! ..... OUCH.. I can hear Ken from here in Oz - sorry Ken

I have said before, I started doing the instruction functions of a cog on a Xilinx Spartan 3A FPGA. What I found when doing the emulation is that Chip is/was so smart. He has re-used lots of gates to make this regular instruction set do amazing things with a humongous saving in gates. Like a number of people here, I would like to see the schematics of the counters.

I think I have a grasp of the shadow ram. The I phase fetches the instruction, and that comes from the shadow ram. The S & D phases depend on the specific registers. You cannot use the D if you are fetching some registers, as in read/modify/write, as the fetch phase of D will return the shadow ram. The R phase varies with the register as some are read only, but it always writes to the shadow ram as well.
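
As a way of pinning down the behaviour described in this thread, here is a toy Python model of a PHSx register. It only encodes what the posts above state (instruction fetch and destination reads see the shadow RAM, source reads see the live counter, writes update both); it is not derived from a hardware schematic, and the class and method names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

class PhsRegister:
    """Toy model of PHSA/PHSB as described in the thread, not real silicon."""

    def __init__(self):
        self.shadow = 0    # the cog RAM long behind the register address
        self.counter = 0   # the live hardware accumulator

    def write(self, value):
        # R phase: the result lands in both the counter and the shadow RAM.
        self.shadow = value & MASK32
        self.counter = value & MASK32

    def tick(self, frq, clocks=1):
        # Always-accumulate mode: only the counter moves, the shadow stays static.
        self.counter = (self.counter + frq * clocks) & MASK32

    def read_as_source(self):
        return self.counter   # S phase sees the live counter

    def read_as_dest(self):
        return self.shadow    # D phase of a read-modify-write sees the shadow

    def fetch_instruction(self):
        return self.shadow    # I phase always fetches from the shadow

if __name__ == "__main__":
    phsa = PhsRegister()
    phsa.write(0x100)
    phsa.tick(frq=1, clocks=16)
    print(hex(phsa.read_as_source()), hex(phsa.fetch_instruction()))
```

In this model an instruction stored at the PHSA address never changes as the counter runs, which is why a loop placed there cannot auto-increment its own address fields.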

kuroneko · 2011-02-20 05:00

Just to add something to the existing mix. It's not the fastest solution^A but it fits completely into the SPR area ($1F0 and above). Which means you end up with 496 longs of extra storage. It always transfers the whole block from (or to) cog memory to (or from) a hub location of your choice (4n).

The archive contains a demo, the PASM section can be used from any calling language.

^A you have to pick speed or size

Cluso99 · 2011-02-20 05:32

nice :-) Did you check the modes of the counters CTRA & CTRB? Both will be operating, one with the djnz and the other wrlong?

Even tho' it missed the hub so takes 32 clocks per long, you do get an extra 2K in hub

And n x 2KB if you have n free cogs

It is a pity we do not have a waitlock command as we could use that to cut power when not required.

kuroneko · 2011-02-20 05:38

Cluso99 wrote: »

Did you check the modes of the counters CTRA & CTRB? Both will be operating, one with the djnz and the other wrlong?

Both are covered by adds ($1F8/$1F9) which means they are off. Even if they were switched on it wouldn't matter because dira is cleared (one could argue that an out-of-spec PLL setup may be bad).

Cluso99 · 2011-02-20 08:55

kuroneko: Of course they are both adds, and adds have the bit pattern to disable the counters (I misread the locations).
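
A quick way to see why an add sitting in CTRA/CTRB is harmless: the counter reads its mode from bits 30..26 of whatever long is stored there, and those bits overlap the instruction opcode field (bits 31..26). The Python sketch below assumes that layout and uses opcode values quoted from memory of the P1 instruction table, so check them against the data sheet; the function names are illustrative.

```python
# 6-bit opcodes (instruction bits 31..26); values assumed from the P1 instruction table.
OPCODES = {
    "add":    0b100000,
    "djnz":   0b111001,
    "wrlong": 0b000010,   # rdlong/wrlong share this opcode
}

def opcode_word(name):
    """Instruction word with only the opcode field filled in."""
    return OPCODES[name] << 26

def ctr_mode_of(word):
    """CTRMODE as the counter would see it: bits 30..26 of the stored long."""
    return (word >> 26) & 0x1F

if __name__ == "__main__":
    for name in OPCODES:
        print(name, format(ctr_mode_of(opcode_word(name)), "05b"))
```

Under these assumptions add decodes to mode %00000 (counter off), whereas a djnz or wrlong at the same address would select a live mode, which is the concern behind the question about which instructions land on $1F8/$1F9.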

kuroneko · 2011-02-20 19:34

Here is an update including the [thread=129719]full speed version[/thread]. Same test setup, select the storage driver by uncommenting the appropriate object line (SPR is the slow version). No trickery with shadow RAM this time although I use some locations for storage. The cost is only 12 longs (484 longs available).

Update: Latest version (unreleased, counter based) offers full speed (clkfreq/4 B/s) and 486 longs available.
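
For a feel of the numbers, the sketch below turns clocks-per-long into a transfer rate. The 80 MHz clock is an assumed typical setting, not something stated in the thread, and the function names are made up for illustration.

```python
BYTES_PER_LONG = 4

def bytes_per_second(clkfreq_hz, clocks_per_long):
    """Sustained transfer rate when one long moves every `clocks_per_long` clocks."""
    return clkfreq_hz * BYTES_PER_LONG // clocks_per_long

def block_clocks(longs, clocks_per_long):
    """System clocks needed to move a whole block."""
    return longs * clocks_per_long

if __name__ == "__main__":
    clkfreq = 80_000_000                      # assumed system clock
    print(bytes_per_second(clkfreq, 16))      # one long per hub window
    print(bytes_per_second(clkfreq, 32))      # one long per two hub windows
    print(block_clocks(486, 16))              # clocks for a 486-long block
```

One long per 16-clock hub window is the clkfreq/4 B/s figure quoted above; a version that misses every other window, at 32 clocks per long, runs at half that rate.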

kuroneko · 2012-07-05 18:51

Attached is a generic driver (no counter usage) for the demo in the [post=978929]previous post[/post]. Command recovery time is slightly slower than with a counter based approach but you get 486 longs and max transfer speed.

Cluso99 · 2012-07-05 22:06

Nice work kuroneko. Just wondering what other uses this could have???
