Double the Cog RAM options

tonyp12 · 2014-04-06 09:48

Starting a new thread on something I have mentioned in other ones.
Going from 2 KB of cog RAM to 3.8 KB, to double the available space for instructions/variables.

Increasing the bits from 32 to 36 is a possibility, but probably best left for P3.

So the best solution I can think of for now is Bank Switching.
Prop-Tool will have two windows, so you can see the code in each bank at the same time.

The wc flag on a jmpret/jmp instruction is what sets the bank that is going to be active, so no extra registers, no wasted clocks or cog space.

jmp# can be done between banks without any care, as the compiler handles when to insert the wc.

With call (jmpret), the wc for the return cannot be determined at compile time, so it would default to no wc and always jump back to bank1.
So make bank2 hold only subroutines that are called from the main routine in bank1. I like that for the two-window PropTool anyway; it is a nice structured view.
The bank switch should happen between the 3rd and 4th clock, just before a jmpret writes its return address to the ret location.

The Special Registers are shared, so there is only one set of those memory locations.
It is probably best to also share location 495 (and give it a name), so the programs in each bank have one long in common for when main calls a subroutine in bank2 and needs to pass along a variable.
Shadow registers will also be shared, but since they are reachable only through D-side operations, passing variables that way is limited to 'add to this variable' situations.
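
A minimal Python sketch of the address decode described above, for illustration only. The class and method names are hypothetical, the shared region is assumed to be location 495 plus the 16 special registers (496 to 511), and shadow registers and instruction timing are not modelled.

```python
SHARED_START = 495   # assumed: location 495 and the special registers 496-511 are shared
COG_SIZE = 512       # 9-bit address space


class BankedCogRam:
    """Toy model of a cog with banked RAM below SHARED_START (hypothetical)."""

    def __init__(self, banks=2):
        # One private block of 495 longs per bank, plus one shared block.
        self.banks = [[0] * SHARED_START for _ in range(banks)]
        self.shared = [0] * (COG_SIZE - SHARED_START)
        self.active = 0  # index 0 stands for "bank1" in the thread's naming

    def _cell(self, addr):
        if not 0 <= addr < COG_SIZE:
            raise IndexError("cog address out of range")
        if addr >= SHARED_START:
            return self.shared, addr - SHARED_START
        return self.banks[self.active], addr

    def read(self, addr):
        block, offset = self._cell(addr)
        return block[offset]

    def write(self, addr, value):
        block, offset = self._cell(addr)
        block[offset] = value

    def jump(self, wc=False):
        # Proposal in the post: wc on jmp/jmpret selects bank2, no wc selects bank1.
        self.active = 1 if wc else 0


ram = BankedCogRam()
ram.write(10, 111)     # private to bank1
ram.write(495, 7)      # shared long, visible from both banks
ram.jump(wc=True)      # now executing in bank2
print(ram.read(10), ram.read(495))
```

In this model a value written to address 10 in bank1 is invisible from bank2, while the long at 495 is seen by both, which is the parameter-passing path the post describes.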

Fitting the RAM needed for a second set of 495 longs into a new die is not a problem; with a 16-cog P1+ it would add 30.9 KB to find space for.
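
The 30.9 KB figure can be checked with a few lines of Python. This is only a sanity check of the arithmetic, assuming 4 bytes per long and 1 KB = 1024 bytes; the function name is made up for the example.

```python
BYTES_PER_LONG = 4   # a cog long is 32 bits
BANK_LONGS = 495     # size of one extra bank, as proposed in the post


def extra_ram_bytes(cogs, extra_banks=1):
    """Extra RAM in bytes needed to give every cog the given number of extra banks."""
    return cogs * extra_banks * BANK_LONGS * BYTES_PER_LONG


total = extra_ram_bytes(16)
print(total, "bytes =", round(total / 1024, 1), "KB")
```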

Heater. · 2014-04-06 09:57

"Bank switching" Ahhggg, nooooo....

OK. How does code in one bank access data in the other bank?

tonyp12 · 2014-04-06 10:11

>access data in the other bank.
It cannot. There has to be some kind of compromise, or we will be stuck with 496 longs for 10 more years.
Only one long (location 495) is shared between banks, and the shadow registers could also be used.

But both banks read and write the same shared Special Registers, and is that not what cog code mostly does anyway?
If for some reason you need to pass along larger arrays, use hub as a buffer.

If you also use the wz flag, 4 banks are possible, with 1980 longs available for code, but that needs more die space of course.
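
As a rough illustration of the four-bank variant, the two flag bits can be read as a 2-bit bank index. The post does not say which flag combination selects which bank, so the encoding below (wz as the high bit, wc as the low bit) is purely an assumption, and the function name is hypothetical.

```python
BANK_LONGS = 495  # private longs per bank, as in the two-bank proposal


def select_bank(wc=False, wz=False):
    """Map the wc/wz flags of a jmp/jmpret to a bank index 0..3 (assumed encoding)."""
    return (int(wz) << 1) | int(wc)


# No flags keeps the default bank (index 0, "bank1" in the thread's naming).
for wz in (False, True):
    for wc in (False, True):
        print(f"wc={wc!s:5} wz={wz!s:5} -> bank index {select_bank(wc, wz)}")

print("code longs over 4 banks:", 4 * BANK_LONGS)
```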

Heater. · 2014-04-06 10:55

Who knows what COG code does. Could be a floating point library, may be an FFT, may never touch any I/O.

Not being able to pass parameters and get results when calling subroutines between banks seems unworkable. Passing data through HUB would do it but is slow.

If you really need more PASM than fits in a COG, add an LMM loop to your code and put the excess PASM in HUB. That's what I did for a lot of the lesser-used opcode handling in the Z80 emulator.

I'm not sure we are "stuck with 496 longs". With all the RAM now available C compilers can generate nice fast LMM code that is executed from HUB and have 496 registers to play with in the processors. Most compiler writers would jump for joy at having so many registers!

tonyp12 · 2014-04-06 11:03

>Could be a floating point library, may be an FFT, may never touch any I/O. Not being able to pass parameters.

You have one r/w long that you can pass back and forth, plus 6 more r/w longs using the shared FRQA to VSCL locations if you don't plan to use counters and video in this cog.

You simply have to structure your program right, or keep it to one bank and ignore the second one, as the extra silicon does not cost money.

Or use this feature for debugging: a beta version of the code could sit in bank2, for example.

jmg · 2014-04-06 14:25

tonyp12 wrote: »

Starting a new thread on something I have mentioned in other ones.
Going from 2 KB of cog RAM to 3.8 KB, to double the available space for instructions/variables.

This broad idea has merit, especially for a P2 COG where the Logic is much larger than the RAM.
It makes rather less sense on a P1 COG, where the RAM is already larger than the Logic.

On a P2, giving each Task more 'elbow room' would certainly help.
To keep the 9-bit opcode reach, and focus on Task code, one way to manage this extra memory would be to have the lower 50% as shared data that all tasks can RMW.
The upper 50% is then Task-selected from four 50% blocks. Available code area is increased by 5/2, over the 4 tasks.
Indirect opcodes can read other tasks' code areas, allowing that memory to serve as arrays/buffers when fewer than 4 tasks are in use.
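
The split described above can be sketched as a simple address mapping. This is only an illustration: the physical layout (shared half first, then one block per task) and the function name are assumptions, and the special registers at the top of the 9-bit space are ignored.

```python
COG_ADDRS = 512          # 9-bit address reach
SPLIT = COG_ADDRS // 2   # assumed 50% split point
TASKS = 4


def physical_address(task, addr):
    """Map a task's 9-bit address to a location in the enlarged RAM (assumed layout)."""
    if not 0 <= task < TASKS:
        raise ValueError("task must be 0..3")
    if not 0 <= addr < COG_ADDRS:
        raise ValueError("address must fit in 9 bits")
    if addr < SPLIT:
        return addr                              # lower half: shared by all tasks
    return SPLIT + task * SPLIT + (addr - SPLIT)  # upper half: this task's own block


TOTAL_LONGS = SPLIT + TASKS * SPLIT
print(TOTAL_LONGS, "longs, i.e.", TOTAL_LONGS / COG_ADDRS, "times the original 512")
```

With a 50% split the total comes to 1280 longs, which is the 5/2 factor mentioned in the post; moving `SPLIT` changes the balance between shared data and per-task code.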

Someone wanting a 'single task' programming model with more code, could place subroutines in the other task-code-memory and then enable those routines as needed, either via task-slot mapping, or via semaphores.

The P2 Logic is quite a bit larger than the COG ram, so once mathops are removed into a shared pool, there will be room to almost swap-in the extra 3 x 50% blocks this needs.

tonyp12 · 2014-04-06 14:36

On a 4-cog P2, an extra 1 or 3 banks per cog is nothing in die space.
And I am pretty sure you can use the AUX to pass large arrays of variables between banks if needed.

hubexec is not the same as 100% pure PASM, as it cannot self-modify its own code, I take it; you have to treat the code as if it were in flash and not in RAM, though it is.
Your code needs a few jmp/call instructions anyway, and if they can switch between banks without extra registers or a single extra clock, what do you have to lose by including this?

Yes, Bank Switching is frowned upon, but no MCU that had it would have been better off without it, as what is the alternative?

jmg · 2014-04-06 14:49

tonyp12 wrote: »

On a 4-cog P2, an extra 1 or 3 banks per cog is nothing in die space.
And I am pretty sure you can use the AUX to pass large arrays of variables between banks if needed.

You still need to watch total areas (nothing is free), but the ideas are quite similar, and the threshold of what I'll call Task-mapped memory can be anything.
I suggested 50%; your 100% means there is no common data area, but 75%, or any other slice, is possible to give some shared data area.

dMajo · 2014-04-06 15:00

I have only now seen this thread. This is my idea: http://forums.parallax.com/showthread.php/155089-Ease-of-use-P2-vs-P1-variant?p=1257058&viewfull=1#post1257058
I've linked it so as not to double-post.

jmg · 2014-04-06 15:03

moved from other thread.

dMajo wrote: »

If you go to a 4-cog P2, perhaps it is worthwhile to extend the cog RAM. You can have 2048 longs: one bank of 512 dedicated to each task, no register remapping, and each task starts execution from register 0 of its bank. If a task is not used then its bank is wasted.

"If a task is not used then its bank is wasted" is rather wasteful, but it need not be.
If you allow indirect addressing to access the banks, then an unused bank is not wasted, since it can hold data, and someone wanting to use a single-task model can put subroutine code in other task areas and use task-maps or flags to control it.
Some slicing point under 100% would make sense to allow a shared data area.

tonyp12 · 2014-04-06 15:30

>I suggested 50%, but your 100% means there is no common data area.
I suggested that the upper 17 longs be shared: the 16 special registers and then one extra long for bank communications.
But make it 4 longs for that. If there is Bank Switching, I want to make sure it adds as many extra instructions as possible, to be worth it.
Aux or Hub should handle the 5-32 longs that could be needed for some subroutines.

But if die space is tight and 3 x 96% will not fit, then 3 x 50% is better than 1 x 96%

Bill Henning · 2014-04-06 15:40

$000-$0FF x 4, one per task
$100-$1E1 - shared between tasks

Easy, makes sense.

jmg wrote: »

You still need to watch total areas (nothing is free), but the ideas are quite similar, and the threshold of what I'll call Task-mapped memory can be anything.
I suggested 50%; your 100% means there is no common data area, but 75%, or any other slice, is possible to give some shared data area.
