PHSA: A question for Kuroneko & an idea for storing data in unused cog ram (from OBC) — Parallax Forums

Cluso99 Posts: 18,069
edited 2012-07-05 22:06 in Propeller 1
OBC raised some interesting ideas and it seems RossH is using a similar idea in Catalina for temporary storage. Along those lines, here are a few ideas...

I have a fast overlay program which would work (ideas from others put together, as usual on this great forum). It unravels backwards, which is fine for transferring to the cog ram. However, this will not work for moving back to hub ram.

So I got to thinking (and not sure if anyone has thought about it or done it)... And yes, thinking is dangerous at my age :)

So, we need a loop like this....
loop            rdlong  *-*, PHSA       ' becomes wrlong for storing back to hub (easy, just set the NR bit appropriately)
                add     loop, H0200     ' inc to next cog address
                djnz    count, #loop

My assumptions are...
1. I believe we should be able to have PHSA increment by 4 every 16 clocks.
2. I believe we should be able to have PHSA incremented after each "loop" instruction has executed.
3. I believe we can use the PHSA this way (get the value of PHSA and not the shadow ram)

Are these correct assumptions ??? Has anyone done it ???
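
As a rough illustration of what the loop above is meant to do, here is a small Python model (not PASM). It assumes the three assumptions hold, i.e. PHSA supplies a hub byte address that advances by 4 on every pass. The names `copy_hub_to_cog`, `hub` and `cog` are made up for the sketch, and it says nothing about whether the counter can actually be made to step this way.

```python
H0200 = 0x200  # adding this to an instruction word bumps its 9-bit destination field by 1

def copy_hub_to_cog(hub, cog, first_reg, phsa, count):
    """Idealised model of the rdlong / add / djnz loop."""
    instr_dest = first_reg << 9                  # destination field lives in bits 17..9
    while count:
        dest = (instr_dest >> 9) & 0x1FF
        cog[dest] = hub[(phsa & 0xFFFF) >> 2]    # rdlong: long-aligned hub read
        instr_dest += H0200                      # add loop, H0200
        phsa += 4                                # ASSUMED: counter advances 4 bytes per pass
        count -= 1                               # djnz count, #loop
    return cog

hub = list(range(100, 108))                      # eight longs of pretend hub RAM
cog = copy_hub_to_cog(hub, [0] * 512, first_reg=16, phsa=0, count=8)
print(cog[16:24])
```

Storing back to hub would be the same walk with the assignment reversed.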

Now, if this works, I can probably get this to run in a zero footprint mode, and possibly grab another 4 longs in shadow space as well. This would give a cool 2,000 bytes exactly. Nice for a part of a screen buffer !!!
Currently no time to check this out. If I don't write it somewhere I will most likely forget.

Comments

  • kuroneko Posts: 3,623
    edited 2011-02-18 00:52
    It's all been done already in your [thread=104167]overlay thread[/thread] (auto increment). So, yes to all 3 assumptions. Problem is incrementing once every 16 cycles, i.e. it eats a pin. I posted a split-interleave-writer somewhere (lonesock thread) which basically does 4 groups for 16n, 16n+4, 16n+8 and 16n+12.

    Here is the thread in question [thread=118012]Quick Cog-to-Hub transfer[/thread].
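
The 16n, 16n+4, 16n+8, 16n+12 grouping can be checked with a short Python model. It assumes a free-running counter adds 16 to the hub byte address per hub window, so a single pass only touches every fourth long. The function name and parameters are illustrative and are not taken from the linked thread.

```python
def interleaved_addresses(total_longs, stride_bytes=16, long_bytes=4):
    """Return the hub byte addresses visited by each pass, in visiting order."""
    passes = []
    end = total_longs * long_bytes
    for offset in range(0, stride_bytes, long_bytes):   # offsets 0, 4, 8, 12
        passes.append(list(range(offset, end, stride_bytes)))
    return passes

passes = interleaved_addresses(total_longs=16)
for p in passes:
    print(p)

# Together the four passes touch every long exactly once.
visited = sorted(a for p in passes for a in p)
```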
  • Cluso99 Posts: 18,069
    edited 2011-02-18 02:23
    Thanks kuroneko. I presume you have not found a way to divide the clock by 4 without using a pin ?

    Just re-looking at the counters to see how the PLL and the clocks are tied together. Any simple counter explanations anywhere (while I look)?

    I have just been looking at the shadow ram and the use of the registers, etc. I am convinced I can get this side to work.
  • kuroneko Posts: 3,623
    edited 2011-02-18 02:32
    Cluso99 wrote: »
    I presume you have not found a way to divide the clock by 4 without using a pin ?
    Not many rainy days yet this year but I'm working on it :)
    Cluso99 wrote: »
    Just re-looking at the counters to see how the PLL and the clocks are tied together. Any simple counter explanations anywhere (while I look)?
    Define simple. What you see is what you get. I always found the data sheet and the counter app note sufficient. Sometimes a counter is just a counter :)
  • Cluso99 Posts: 18,069
    edited 2011-02-18 03:11
    Here is my counter understanding...
    So the only way we can run the counter without outputting to pin(s) is PLL internal (video mode) %00001 and logic always %11111.
    Both these add FRQA to PHSA on each clock cycle.
    The PLL is on the output of PHSA pin 31, so it is no help to us.
    We will not have a problem of counter jitter because we ignore the bottom 2 bits when accessing hub by longs.

    So the only method found so far is the staging of every 4th long, then returning 3 more times, each time offsetting 1 more long.

    Is this correct?

    Wish those PortB pins were inside!
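
The jitter remark can be made concrete with a tiny Python check. It relies on the point made above that long accesses ignore the bottom two address bits; the helper name is made up for illustration.

```python
def long_address(byte_address):
    """Hub address actually used by a long access: bits 1..0 are ignored."""
    return byte_address & 0xFFFC

base = 0x1230                                   # some long-aligned hub address
seen = {long_address(base + jitter) for jitter in range(4)}
print(seen)
```

A counter value that is up to three counts late still lands on the same long.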
  • kuroneko Posts: 3,623
    edited 2011-02-18 03:25
    That about sums it up so far unless someone keeps quiet about a possible solution. And you always have the slow mode using every nth hub window. Which for some use cases might just be enough (size vs speed).

    As for not outputting stuff, just clear dira if you don't want to be seen :)
  • Cluso99 Posts: 18,069
    edited 2011-02-18 03:33
    kuroneko wrote: »
    As for not outputting stuff, just clear dira if you don't want to be seen :)

    What are you saying here? Does this mean it will work but not output to the outside world. i.e. provided I am not using the eeprom, I could use this pin with DIRA as inputs??? Maybe a solution :)
  • kuroneko Posts: 3,623
    edited 2011-02-18 03:39
    Cluso99 wrote: »
    What are you saying here? Does this mean it will work but not output to the outside world. i.e. provided I am not using the eeprom, I could use this pin with DIRA as inputs??? Maybe a solution :)
    No, just an observation as you excluded NCO and DUTY as address counters. They do normally generate output but if that's not enabled you don't get any (you still get the increments though). As for using them as inputs, no way. Those really have to come from outside.
  • Cluso99 Posts: 18,069
    edited 2011-02-18 03:52
    Perhaps I did not word that correctly.
    If I have an input pin (such as the eeprom SDA when not using the eeprom) and I output counter A in /16 mode (but disable the DIRA pin so no effective output goes out the pin) and use that to enable the accumulation of FRQB into PHSB in CTRB, does this still work? If so, what do I get if I happen to read this pin with INA - do I see the outside world's pin value or the CTRA pin's value???
  • kuroneko Posts: 3,623
    edited 2011-02-18 03:59
    I did understand what you meant but you can't enable/control a counter like this. They see what ina sees. And in order to affect that (ina) bit pattern you have to set the relevant bits in dira as an output unfortunately.
  • Cluso99 Posts: 18,069
    edited 2011-02-18 04:06
    Oh well, it was nice while the fun lasted :)
  • Oldbitcollector (Jeff) Posts: 8,091
    edited 2011-02-18 09:53
    Ah, so I've provoked the wizards... <smirk> I like where this is going...

    Yes, it pays to always keep someone around who doesn't know what can't be done.
    They ask for things that people know can't be done, so they are never tried. (In this case, I'm that guy.)

    ...continues to watch the thread...
  • Sapieha Posts: 2,964
    edited 2011-02-18 10:10
    Hi Oldbitcollector.

    I can only add to that --- Read my FOOTER
    Oldbitcollector (Jeff) wrote: »
    Ah, so I've provoked the wizards... <smirk> I like where this is going...

    Yes, it pays to always keep someone around who doesn't know what can't be done.
    They ask for things that people know can't be done, so they are never tried. (In this case, I'm that guy.)

    ...continues to watch the thread...
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-02-18 10:36
    Why not put the transfer loop in the SFRs. Then do the transfer in two batches: all the even longs, then all the odd longs.
    phsa:      rdlong     0-0,#0-0
    phsb:      djnz       count,#phsa
    vcfg:      jmp        #done
    

    For every eight clocks, you want phsa to increment by $408. So set frqa to $81. Then it's just a bunch of fiddling with addresses, clock counts, and pipeline stuff to figure out what to initialize phsa to before jmping to it.

    -Phil
  • lonesock Posts: 917
    edited 2011-02-18 10:41
    Phil Pilgrim (PhiPi) wrote: »
    Why not put the transfer loop in the SFRs. Then do the transfer in two batches: all the even longs, then all the odd longs.
    phsa:      rdlong     0-0,#0-0
    phsb:      djnz       count,#phsa
    vcfg:      jmp        #done
    

    For every eight clocks, you want phsa to increment by $408. So set frqa to $81. Then it's just a bunch of fiddling with addresses, clock counts, and pipeline stuff to figure out what to initialize phsa to before jmping to it.

    -Phil
    I think with the rdlong embedded in there, you end up with 16 clocks per loop. And you will need to do a dummy hub op to sync up before setting PHSA and doing your jump. So 4 passes, total.

    Jonathan
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-02-18 10:51
    Oh, right! I didn't figure in the hub delay. So, yes: four groups. Also, the hub delay renders the advantage of putting the code in the registers moot, since there's no penalty for the extra add.

    -Phil
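
The arithmetic in this exchange is easy to check in Python. The sketch assumes the counter adds FRQA to PHSA once per clock and that the value sitting in PHSA is read as an instruction word, with the destination field in bits 17..9 and the source field in bits 8..0. `step_per_loop` is a hypothetical helper; it only checks the numbers and makes no claim that the scheme runs as intended.

```python
def step_per_loop(frqa, clocks_per_loop):
    """Split the per-loop PHSA increment into (total, cog register step, hub byte step)."""
    delta = frqa * clocks_per_loop
    dest_step = (delta >> 9) & 0x1FF   # destination field, bits 17..9
    src_step = delta & 0x1FF           # source field, bits 8..0
    return delta, dest_step, src_step

print([hex(v) for v in step_per_loop(0x81, 8)])    # Phil's original 8-clock figure
print([hex(v) for v in step_per_loop(0x81, 16)])   # lonesock's 16-clock loop
```

With 8 clocks per loop the step is 2 registers and 8 hub bytes (two batches); with 16 clocks it becomes 4 registers and 16 hub bytes, which is why four groups are needed.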
  • Cluso99 Posts: 18,069
    edited 2011-02-18 12:11
    Phil:
    Yes, the transfer loop will be in the SFRs. Just not sure which ones yet. If at all possible, I want to reserve the first 4 to make 500 longs total (=2000 bytes). I will do a mix of LMM style here just like I did with my zero footprint debugger, but the copy loop will be SFR resident when run. I have been looking at the SFRs to see what I can use.

    It's a real shame we didn't have the internal silicon for the PortB pins. I keep drooling at what we could do! (Never happy are we). Or even be able to feed the PLL output back into the second counter without using a pin - just a single gate enabled by an unused config bit. Isn't hindsight wonderful.
    You know, I have never made such comments about any other micro I have used. Why? Because there were so many things it wasn't worth thinking about!!! It never ceases to amaze me just how much we can do with this prop. And it's first silicon. Just gotta say Chip... Congratulations!!!

    There will still be 16 clocks between each rdlong/wrlong executed, as lonesock pointed out.
  • kuroneko Posts: 3,623
    edited 2011-02-18 19:37
    Phil Pilgrim (PhiPi) wrote: »
    Oh, right! I didn't figure in the hub delay. So, yes: four groups. Also, the hub delay renders the advantage of putting the code in the registers moot, since there's no penalty for the extra add.
    Also, while counter[phsa] is in fact incremented, instructions are always fetched from (static) shadow[phsa].
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-02-18 20:16
    kuroneko wrote:
    Also, while counter[phsa] is in fact incremented, instructions are always fetched from (static) shadow[phsa].
    How inconvenient!

    -Phil
  • Cluso99 Posts: 18,069
    edited 2011-02-18 21:09
    But it also has its advantages too Phil. For instance, what kuroneko has done in the new thread. I use some shadow registers for code, constants and variables. But not all shadow registers are equal, and you also have to be careful you don't inadvertently start something off. But there is another 16 longs to play with :-)
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-02-18 21:25
    It does seem a bit strange, though. I can understand why phsx can't be the destination register in read-modify-writes while the counter is running. But instruction fetches are read-only. I should think they would've been granted source-register-access status.

    -Phil
  • kuroneko Posts: 3,623
    edited 2011-02-18 21:39
    True, then again what seems more natural, fetching from RAM (shadow) or from a special h/w register (counter)?
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-02-18 23:20
    Interesting point, although it's hard to know what documented purpose the shadow RAM even exists for, other than to flesh out the address space (and to give us the building blocks for PASM trickery -- or fits of frustration). I would really like to see a hardware schematic for this bit of Propeller arcana.

    -Phil
  • Cluso99 Posts: 18,069
    edited 2011-02-19 01:43
    IMHO, here is what I think (could be completely wrong of course)... I think it comes from the fact that we have 512 longs of memory, and rather than disable the memory which required more gates, it was left active. Then the registers were added and only the gating necessary for accessing them in the ways thought of were done.

    Now, of course, none of this was supposed to be discovered. But we are a nosey bunch and we try to squeeze every last ounce out of this great chip, so we are always looking for other things to do. I think that is the elegance of Chip's design. It just does so much. If we had say an XYZ chip, we would be spending all our time using different XYZ variations, so we would never try the oddball things.

    Here is a suggestion... We want some extra wires in this chip. Parallax has this great machine that can cut and add wires etc to a die. So, we should work out a modification and get Parallax to build us a special die, just for us fanatics !!! ..... OUCH.. I can hear Ken from here in Oz - sorry Ken :(

    I have said before, I started doing the instruction functions of a cog on a Xilinx Spartan 3A FPGA. What I found when doing the emulation is that Chip is/was so smart. He has re-used lots of gates to make this regular instruction set do amazing things with a humongous saving in gates. Like a number of others here, I would like to see the schematics of the counters.

    I think I have a grasp of the shadow ram. The I phase fetches the instruction, and that comes from the shadow ram. The S & D phases depend on the specific registers. You cannot use the D if you are fetching some registers, as in read/modify/write, as the fetch phase of D will return the shadow ram. The R phase varies with the register as some are read only, but it always writes to the shadow ram as well.
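
To make that description easier to follow, here is a toy Python model of a single counter register such as PHSA. It encodes only the behaviour described in this thread (instruction fetches and destination reads see the shadow RAM, source reads see the live counter, writes go to both). The class and method names are invented, and the real silicon may differ in details.

```python
class PhaseRegister:
    """Toy model of PHSx: a live counter sitting in front of a shadow RAM long."""

    def __init__(self):
        self.shadow = 0
        self.counter = 0

    def write(self, value):            # R phase: assumed to update both copies
        self.shadow = value & 0xFFFFFFFF
        self.counter = value & 0xFFFFFFFF

    def tick(self, frq, clocks=1):     # counter accumulates FRQx each clock
        self.counter = (self.counter + frq * clocks) & 0xFFFFFFFF

    def read_as_source(self):          # S phase: live counter value
        return self.counter

    def read_as_dest(self):            # D phase of read-modify-write: shadow copy
        return self.shadow

    def fetch_instruction(self):       # I phase: always the static shadow copy
        return self.shadow

phsa = PhaseRegister()
phsa.write(0x1000)
phsa.tick(frq=1, clocks=16)
print(hex(phsa.read_as_source()), hex(phsa.fetch_instruction()))
```

In this model the source read has moved on by 16 counts while the fetched instruction word is unchanged, which matches kuroneko's remark about static shadow[phsa].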
  • kuroneko Posts: 3,623
    edited 2011-02-20 05:00
    Just to add something to the existing mix. It's not the fastest solution [A] but it fits completely into the SPR area (above $1F0). Which means you end up with 496 longs of extra storage. It always transfers the whole block from (or to) cog memory to (or from) a hub location of your choice (4n).

    The archive contains a demo, the PASM section can be used from any calling language.

    [A] You have to pick speed or size.
  • Cluso99 Posts: 18,069
    edited 2011-02-20 05:32
    nice :-) Did you check the modes of the counters CTRA & CTRB? Both will be operating, one with the djnz and the other wrlong?

    Even tho' it missed the hub so takes 32 clocks per long, you do get an extra 2K in hub :) And n x 2KB if you have n free cogs :)

    It is a pity we do not have a waitlock command as we could use that to cut power when not required.
  • kuroneko Posts: 3,623
    edited 2011-02-20 05:38
    Cluso99 wrote: »
    Did you check the modes of the counters CTRA & CTRB? Both will be operating, one with the djnz and the other wrlong?
    Both are covered by adds ($1F8/$1F9) which means they are off. Even if they were switched on it wouldn't matter because dira is cleared (one could argue that an out-of-spec PLL setup may be bad).
  • Cluso99 Posts: 18,069
    edited 2011-02-20 08:55
    kuroneko: Of course they are both adds, and adds have the bit pattern to disable the counters (I misread the locations).
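
The point about add instructions can be checked with a few lines of Python. This sketch assumes the P1 layout in which CTRMODE occupies bits 30..26 of CTRA/CTRB and the instruction opcode occupies bits 31..26. The opcode values are the ones I believe are standard for `add`, `djnz` and `wrlong`, so verify them against the data sheet before relying on this; the function name is hypothetical.

```python
OPCODES = {"add": 0b100000, "djnz": 0b111001, "wrlong": 0b000010}

def ctrmode_if_stored_in_ctr(opcode):
    """CTRMODE (bits 30..26) seen by a counter whose CTRx holds this instruction."""
    word = opcode << 26            # other instruction fields do not affect CTRMODE
    return (word >> 26) & 0x1F

for name, opcode in OPCODES.items():
    print(f"{name:7s} -> CTRMODE %{ctrmode_if_stored_in_ctr(opcode):05b}")
```

Only a mode of %00000 leaves the counter off, which is why having add instructions at $1F8/$1F9 is harmless while a djnz or wrlong there would select an active mode.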
  • kuroneko Posts: 3,623
    edited 2011-02-20 19:34
    Here is an update including the [thread=129719]full speed version[/thread]. Same test setup, select the storage driver by uncommenting the appropriate object line (SPR is the slow version). No trickery with shadow RAM this time although I use some locations for storage. The cost is only 12 longs (484 longs available).

    Update: Latest version (unreleased, counter based) offers full speed (clkfreq/4 B/s) and 486 longs available.
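
For a sense of scale, the figures quoted in the thread translate as follows. The 80 MHz clock is an assumption (a common Propeller 1 setup), and the function names are illustrative.

```python
CLKFREQ = 80_000_000   # ASSUMED system clock in Hz

def bytes_per_second(clocks_per_long, clkfreq=CLKFREQ):
    """Transfer rate when one 4-byte long moves every clocks_per_long clocks."""
    return 4 * clkfreq // clocks_per_long

def block_time_us(longs, clocks_per_long, clkfreq=CLKFREQ):
    """Time in microseconds to move a whole block of longs."""
    return longs * clocks_per_long * 1_000_000 / clkfreq

full_speed = bytes_per_second(16)   # one long per hub window, i.e. clkfreq/4 B/s
slow_spr = bytes_per_second(32)     # SPR version: every other hub window
print(full_speed, slow_spr, block_time_us(486, 16))
```

At the assumed clock, moving the full 486-long block at full speed takes a little under 100 microseconds.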
  • kuroneko Posts: 3,623
    edited 2012-07-05 18:51
    Attached is a generic driver (no counter usage) for the demo in the [post=978929]previous post[/post]. Command recovery time is slightly slower than with a counter based approach but you get 486 longs and max transfer speed.
  • Cluso99 Posts: 18,069
    edited 2012-07-05 22:06
    Nice work kuroneko. Just wondering what other uses this could have???