Does anyone recall how the internal stack works?

Cluso99 · 2016-04-19 06:48

The internal stack for call/ret/push/pop is 8 deep as I recall.

What happens when the stack is exceeded? I mean, what gets lost, the first or last entry?

I have in mind a hubexec routine that will use the internal stack. But it needs to work with any other user code, and just be called using the internal stack, which is 1 level.
When my hubexec code begins to execute, I propose to pop the stack 8 levels, storing each level. Then I can use the stack for my code. When my code is complete, just before returning to the user code, I will push all 8 saved levels back onto the stack and return via the stack.

I am expecting this to just work. Anyone think otherwise???

Cluso99 · 2016-04-19 07:01

Just found some info from OnSemi regarding OTP etc.
1T-OTP
http://www.onsemi.com/PowerSolutions/newsItem.do?article=2519

http://www.businesswire.com/news/home/20140506005549/en/Semiconductor-Introduces-Qualified-IP-Blocks-180-nm

cgracey · 2016-04-19 07:22

Cluso99 wrote: »

The internal stack for call/ret/push/pop is 8 deep as I recall.

What happens when the stack is exceeded? I mean, what gets lost, the first or last entry?

I have in mind a hubexec routine that will use the internal stack. But it needs to work with any other user code, and just be called using the internal stack, which is 1 level.
When my hubexec code begins to execute, I propose to pop the stack 8 levels, storing each level. Then I can use the stack for my code. When my code is complete, just before returning to the user code, I will push all 8 saved levels back onto the stack and return via the stack.

I am expecting this to just work. Anyone think otherwise???

That should work.

Here's the Verilog for the hardware stack:

// stack

reg [7:0][21:0] stk;

always @(posedge clk)
if (go && (push || callr || calli || pop || ret))
	stk <=	push || callr || calli	?	{stk[6:0], push ? d[21:0] : {c, z, p}}		// push/call
					:	{stk[7], stk[7:1]};				// pop/ret
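
Read as a shift register, the Verilog answers the overflow question. A push or call shifts every entry one slot deeper, so the oldest entry (stk[7]) is the one that gets lost. A pop or ret shifts everything back up while stk[7] keeps its value, so popping past the bottom keeps returning the deepest entry. Below is a minimal Python model of that behaviour. It is a sketch, not the RTL: the class name HwStack is made up for illustration, and it assumes pop/ret read stk[0], since the read path is not shown in the snippet.

```python
DEPTH = 8                    # reg [7:0][21:0] stk -> 8 entries
WIDTH_MASK = (1 << 22) - 1   # each entry is 22 bits wide


class HwStack:
    """Behavioural model of the quoted stack register (illustrative only)."""

    def __init__(self):
        # Index 0 models stk[0] (top of stack), index 7 models stk[7].
        self.stk = [0] * DEPTH

    def push(self, value):
        # Models {stk[6:0], value}: old stk[7] falls off the end.
        self.stk = [value & WIDTH_MASK] + self.stk[:DEPTH - 1]

    def pop(self):
        # Models {stk[7], stk[7:1]}: stk[7] is duplicated, not cleared.
        top = self.stk[0]    # assumption: pop/ret read stk[0]
        self.stk = self.stk[1:] + [self.stk[DEPTH - 1]]
        return top


s = HwStack()
for v in range(1, 10):       # nine pushes into an 8-deep stack
    s.push(v)
print([s.pop() for _ in range(10)])   # [9, 8, 7, 6, 5, 4, 3, 2, 2, 2]
```

The first value pushed (1) is gone after the ninth push, and the ninth and tenth pops simply repeat the deepest surviving entry.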

evanh · 2016-04-19 08:02

Cluso99 wrote: »

Just found some info from OnSemi regarding OTP etc.

Ha! What's wrong with this picture? ...

EDIT: Piccy from http://www.onsemi.com/PowerSolutions/content.do?id=16558

evanh · 2016-04-19 08:14

Hmm, the OTP announcement was five years ago, but it's still not listed in the feature matrix - http://www.onsemi.com/PowerSolutions/content.do?id=16678

They list it separately, and without any indication of expected density, speed or even power ratings.

evanh · 2016-04-19 08:24

Uh-oh, ONC18 HVT ROM is listed as 150 MHz max. Is that webpage just plain out of date?

ozpropdev · 2016-04-19 11:34

Cluso
The only other thing to consider is the stack is only 22 bits wide.
This caught me out early on.
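
A small sketch of what the 22-bit width means for a PUSH of a full 32-bit value: per the Verilog above, only d[21:0] is stored, so the top 10 bits never reach the stack. The function name is hypothetical, and nothing is assumed here about what a later POP puts in the upper bits.

```python
STACK_WIDTH = 22   # from "reg [7:0][21:0] stk"


def stored_on_stack(d):
    """Return the part of a 32-bit value that a PUSH would keep (d[21:0])."""
    return d & ((1 << STACK_WIDTH) - 1)


value = 0xDEADBEEF
kept = stored_on_stack(value)    # low 22 bits survive
lost = value >> STACK_WIDTH      # top 10 bits are dropped
print(hex(kept), hex(lost))      # 0x2dbeef 0x37a
```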

Cluso99 · 2016-04-19 13:52

Chip,
Thanks for the info.

ozpropdev,
Yes. It does not matter for the way I will be using it - just with "CALL" instructions, and for saving and restoring what was previously on the stack, whatever that may be.
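
The save/restore plan can be checked against the shift-register behaviour of the Verilog above. This is a rough Python sketch with hypothetical helper names; it assumes pop reads stk[0], and it models the stack as a list with index 0 as the top. The entries are popped top-first and pushed back in reverse order, so the caller's return address ends up on top again.

```python
DEPTH = 8
MASK = (1 << 22) - 1


def push(stk, value):
    """Model of push/call: shift deeper, drop the old deepest entry."""
    return [value & MASK] + stk[:DEPTH - 1]


def pop(stk):
    """Model of pop/ret: return the top, shift up, keep the deepest entry."""
    return stk[0], stk[1:] + [stk[DEPTH - 1]]


def save_all(stk):
    """Pop all 8 levels, top first, as the routine would on entry."""
    saved = []
    for _ in range(DEPTH):
        value, stk = pop(stk)
        saved.append(value)
    return saved, stk


def restore_all(stk, saved):
    """Push the saved levels back, deepest first, just before returning."""
    for value in reversed(saved):
        stk = push(stk, value)
    return stk


caller = list(range(0x100, 0x108))   # whatever user code left on the stack
saved, work = save_all(caller)
for v in range(12):                  # routine uses (even overflows) the stack
    work = push(work, v)
print(restore_all(work, saved) == caller)   # True
```

Because eight pushes replace every entry, whatever the routine left on the stack in between does not matter.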

Seairth · 2016-04-19 15:27

cgracey wrote: »
Cluso99 wrote: »

The internal stack for call/ret/push/pop is 8 deep as I recall.

What happens when the stack is exceeded? I mean, what gets lost, the first or last entry?

I have in mind a hubexec routine that will use the internal stack. But it needs to work with any other user code, and just be called using the internal stack, which is 1 level.
When my hubexec code begins to execute, I propose to pop the stack 8 levels, storing each level. Then I can use the stack for my code. When my code is complete, just before returning to the user code, I will push all 8 saved levels back onto the stack and return via the stack.

I am expecting this to just work. Anyone think otherwise???

That should work.

Here's the Verilog for the hardware stack:
// stack

reg [7:0][21:0] stk;

always @(posedge clk)
if (go && (push || callr || calli || pop || ret))
	stk <=	push || callr || calli	?	{stk[6:0], push ? d[21:0] : {c, z, p}}		// push/call
					:	{stk[7], stk[7:1]};				// pop/ret

Since you have some extra die space, you could make the stack a bit deeper. Ignoring recursion, I suspect 32 elements would be ample. Ray likely wouldn't even need to manipulate the stack at that point. I still wish there was an event for when the stack was "full" and "empty", so that it could be virtualized.

Edit: even increasing the stack to 16 would be an improvement. At size 8, I think people are going to be hesitant to use the stack, particularly if they have to worry about third-party code also using it. You should not need to contemplate copying the stack every time you enter a CALLed code block "just in case". If the stack can't be easily virtualized (e.g. with the addition of empty/full events), then making it relatively large makes it much safer to use without having to worry about overflowing.
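
To illustrate what "virtualized" could mean here, the following Python sketch spills the oldest entries to memory on a "full" event and refills on an "empty" event. Everything in it is hypothetical: the events do not exist in the design discussed in this thread, the hardware shown earlier does not track how many entries are in use, and the class name and spill size are invented for the example.

```python
HW_DEPTH = 8   # depth of the hardware stack
SPILL = 4      # hypothetical: entries moved per full/empty event


class VirtualStack:
    """Return stack that overflows into (hypothetical) hub RAM."""

    def __init__(self):
        self.hw = []    # entries held in hardware, newest last
        self.hub = []   # spilled entries, newest last

    def push(self, value):
        if len(self.hw) == HW_DEPTH:            # hypothetical "full" event
            self.hub.extend(self.hw[:SPILL])    # spill the oldest entries
            del self.hw[:SPILL]
        self.hw.append(value)

    def pop(self):
        if not self.hw and self.hub:            # hypothetical "empty" event
            self.hw = self.hub[-SPILL:]         # refill with newest spilled
            del self.hub[-SPILL:]
        return self.hw.pop()


vs = VirtualStack()
for v in range(20):                    # far deeper than 8 levels
    vs.push(v)
print([vs.pop() for _ in range(20)])   # 19 down to 0, nothing lost
```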

Dave Hein · 2016-04-19 15:36

Seairth wrote: »

Since you have some extra die space, you could make the stack a bit deeper. Ignoring recursion, I suspect 32 elements would be ample. Ray likely wouldn't even need to manipulate the stack at that point. I still wish there was an event for when the stack was "full" and "empty", so that it could be virtualized.

It might be good to use some of that extra die space to make the stack a full 32-bits wide also.

Seairth · 2016-04-19 15:45

Dave Hein wrote: »

Seairth wrote: »

Since you have some extra die space, you could make the stack a bit deeper. Ignoring recursion, I suspect 32 elements would be ample. Ray likely wouldn't even need to manipulate the stack at that point. I still wish there was an event for when the stack was "full" and "empty", so that it could be virtualized.

It might be good to use some of that extra die space to make the stack a full 32-bits wide also.

I thought about suggesting that, but I like the fact that the stack is really just a "return stack". This discourages its use as a data stack. Once you start including data, then even a 32-deep stack is nowhere near large enough.

cgracey · 2016-04-19 18:16

Adding lots of flops to the hardware stack takes lots of area and power. At some point, using a RAM instance would be better.

Last night, I made the LUT RAM dual-port, so that you can update it while you stream from it. We could just use that as the hardware stack. It would save resources.

Seairth · 2016-04-19 19:06

cgracey wrote: »

Adding lots of flops to the hardware stack takes lots of area and power. At some point, using a RAM instance would be better.

Last night, I made the LUT RAM dual-port, so that you can update it while you stream from it. We could just use that as the hardware stack. It would save resources.

Okay, then. No change to the return stack.

It's unfortunate you can't use PTRx with RDLUT/WRLUT. That would give you PUSH/POP behavior without additional instructions. You could even provide an alias PUSHLUT/POPLUT.
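
As a rough sketch of that idea, here is a pointer with post-increment on write and pre-decrement on read, which turns a plain 512x32 RAM into a stack. PUSHLUT/POPLUT are only the aliases suggested above, not existing instructions; the class name and the wrap-around at 512 are assumptions made for the example.

```python
LUT_LONGS = 512   # LUT modelled as 512 x 32-bit longs


class LutStack:
    """Hypothetical pointer-addressed stack living in LUT RAM."""

    def __init__(self, base=0):
        self.lut = [0] * LUT_LONGS
        self.ptr = base                 # stands in for a PTRx-style register

    def pushlut(self, value):
        # Write at the pointer, then post-increment.
        self.lut[self.ptr] = value & 0xFFFFFFFF
        self.ptr = (self.ptr + 1) % LUT_LONGS   # assumption: wraps at 512

    def poplut(self):
        # Pre-decrement, then read at the pointer.
        self.ptr = (self.ptr - 1) % LUT_LONGS
        return self.lut[self.ptr]


ls = LutStack()
ls.pushlut(0xDEADBEEF)      # full 32 bits survive, unlike the 22-bit stack
ls.pushlut(1)
print(ls.poplut(), hex(ls.poplut()))
```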

Also, the term "LUT" just doesn't quite match anymore. The old "AUX" term might be better.

Dave Hein · 2016-04-19 21:18

I apologize for suggesting a change in the stack width. I would rather see the chip produced with the current design than speculate on any additions at this time. I have struck out my previous post.

jmg · 2016-04-19 21:23

Dave Hein wrote: »

I apologize for suggesting a change in the stack width. I would rather see the chip produced with the current design than speculate on any additions at this time. I have struck out my previous post.

The question is valid, and Chip did then say this:

Last night, I made the LUT RAM dual-port, so that you can update it while you stream from it. We could just use that as the hardware stack. It would save resources.

Cryptic, but sounds rather promising.
A more conventional stack, without width and depth caveats, would be easier to use and explain, and could simplify things, as Chip says it would save resources.

evanh · 2016-04-19 22:52

It might be time to put in new suggestions for extending Streamer features, methinks. Make use of this additional LUT investment.

cgracey · 2016-04-19 23:36

I've been thinking over the past 24 hours that this thing just needs to get done. Period.

There are some simple enhancements I'm making to the smart pins and then I want to get the streamer doing 1/2/4 bits under the RFBYTE/WFBYTE modes. That's it.

We need to get on with the tools and start making more progress on software fronts. I'm thinking today that I've been buried in this project for so long that the microcontroller world has likely changed in many ways I'm not aware of. I wish I had an up-to-date knowledge of what can be done with current microcontrollers and what it feels like to program them. If it feels awful, then we are in good shape. If they've actually made it pleasurable, then we may be in trouble.

ozpropdev · 2016-04-19 23:45

cgracey wrote: »

I've been thinking over the past 24 hours that this thing just needs to get done. Period.

There are some simple enhancements I'm making to the smart pins and then I want to get the streamer doing 1/2/4 bits under the RFBYTE/WFBYTE modes. That's it.

We need to get on with the tools and start making more progress on software fronts. I'm thinking today that I've been buried in this project for so long that the microcontroller world has likely changed in many ways I'm not aware of. I wish I had an up-to-date knowledge of what can be done with current microcontrollers and what it feels like to program them. If it feels awful, then we are in good shape. If they've actually made it pleasurable, then we may be in trouble.

They need to amend the Oxford Dictionary with the following:
"FUN" feeling experienced when programming/using a Parallax Propeller 2 Microcontroller
IMHO you're in good shape, Chip.

Seairth · 2016-04-19 23:49

cgracey wrote: »

I've been thinking over the past 24 hours that this thing just needs to get done. Period.

There are some simple enhancements I'm making to the smart pins and then I want to get the streamer doing 1/2/4 bits under the RFBYTE/WFBYTE modes. That's it.

Feature freeze from the man himself! Bravo!

evanh · 2016-04-19 23:56

I was teasing just a teensy bit.

You are! in good shape.

mindrobots · 2016-04-19 23:57

cgracey wrote: »

I've been thinking over the past 24 hours that this thing just needs to get done. Period.

There are some simple enhancements I'm making to the smart pins and then I want to get the streamer doing 1/2/4 bits under the RFBYTE/WFBYTE modes. That's it.

We need to get on with the tools and start making more progress on software fronts. I'm thinking today that I've been buried in this project for so long that the microcontroller world has likely changed in many ways I'm not aware of. I wish I had an up-to-date knowledge of what can be done with current microcontrollers and what it feels like to program them. If it feels awful, then we are in good shape. If they've actually made it pleasurable, then we may be in trouble.

Yay!!!

What little I've done lately has been on micros with higher-level languages (micro-python, espruino, lua, micromite basic, raspberry pi's). Those are fun, but at a different level. The most pure fun I've had has been with the P2 FPGA things I've played with since the reborn P2 images started coming out. It's still understandable (barely) and I can still wrap my pea brain around the hardware at the lowest levels. Anything else nowadays is just a computer: feed in high-level code and get results. I don't think about the wheels and gears spinning inside to get something done anymore.

cgracey · 2016-04-19 23:58

Seairth wrote: »

cgracey wrote: »

I've been thinking over the past 24 hours that this thing just needs to get done. Period.

There are some simple enhancements I'm making to the smart pins and then I want to get the streamer doing 1/2/4 bits under the RFBYTE/WFBYTE modes. That's it.

Feature freeze from the man himself! Bravo!

In talking about what more we could do for stack space using the LUT, we were heading right back into Prop2 Hot territory. It's like a giant Merry-Go-Round. We need to get off the thing.

Cluso99 · 2016-04-20 00:18

We need a P1+ now!!!
After all, the P1+ is really the original P2, so call it that. This is what we all screamed out for after P2HOT.
We know you could do the few required things in under a week! So even take a month.

Put it in the P2 frame, and take the required pins from the I/O pins to check the frame.
It will then be compatible with the P2/P3/??
Maximum delay to P2/P3 might be a month - it has been 8+??? years already. It would give you breathing space to clear your head.

If it all works, then we have a P1+ ready for final silicon.
No tools need to be done and we have existing objects that will work!

It will give you a break to refresh. Then back to P2 (which will be P3 or something else).
Do the few things left to tidy the P2/P3 up, produce the FPGA images for testing.
Then we can all get on with the tools and objects.
No use in silicon if we don't have any reasonable tools and objects!

BTW All modern micros now have internal Flash.
Seriously, why on earth can't the P2 have some Flash or OTP (OnSemi has 1T-OTP in 180nm)???

Cluso99 · 2016-04-20 00:29

Regarding dual port LUT...

Do we need it?
What code does it benefit?
Wouldn't an extra 512x32 LUT be better use of the die space?

Wouldn't a pair of [32-bit latch plus a data available SR latch] (for 2-way use) between each set of adjacent cogs be of more use?
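
A minimal Python sketch of one such latch, to make the proposal concrete: a 32-bit data latch plus a data-available flag that is set by the writer and reset by the reader. The class and method names are hypothetical, and refusing a write while data is still pending is an assumption; the post does not say how that case would be handled. Two of these, one per direction, would give the 2-way link between adjacent cogs.

```python
class Mailbox:
    """One direction of a hypothetical cog-to-cog latch."""

    def __init__(self):
        self.data = 0            # the 32-bit latch
        self.available = False   # the SR "data available" latch

    def write(self, value):
        """Sender side: latch a long and set the flag."""
        if self.available:       # assumption: do not overwrite unread data
            return False
        self.data = value & 0xFFFFFFFF
        self.available = True
        return True

    def read(self):
        """Receiver side: take the long and reset the flag."""
        if not self.available:
            return None
        self.available = False
        return self.data


a_to_b, b_to_a = Mailbox(), Mailbox()   # the "pair" for 2-way use
a_to_b.write(0x12345678)
print(hex(a_to_b.read()), b_to_a.read())
```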

jmg · 2016-04-20 00:53

Cluso99 wrote: »

Regarding dual port LUT...

Do we need it?
What code does it benefit?

I think Chip said it allows the Streamer to operate, while code updates Streamer LUT.
That's a fairly important-sounding feature, as many streamer apps will not want to run/pause/run/pause.

jmg · 2016-04-20 01:02

cgracey wrote: »

There are some simple enhancements I'm making to the smart pins and then I want to get the streamer doing 1/2/4 bits under the RFBYTE/WFBYTE modes. That's it.

Sounds good. Smart pins are nearly there, and the streamer does need 4 bits, which brings the others along too.
I would say the Streamer also needs simple HW handshake support, as you cannot rely on whatever is connected to be a continual-capable device.

cgracey wrote: »

We need to get on with the tools and start making more progress on software fronts.

True, but good test coverage of the FPGA image should not be lost in the rush to 'tools and software'.

cgracey wrote: »

I'm thinking today that I've been buried in this project for so long that the microcontroller world has likely changed in many ways I'm not aware of. I wish I had an up-to-date knowledge of what can be done with current microcontrollers and what it feels like to program them. If it feels awful, then we are in good shape. If they've actually made it pleasurable, then we may be in trouble.

For most, you can at least program the core in C, so that 'feels' much the same across all vendors.
The peripherals are where many software types come unstuck, and many vendors struggle to provide adequate docs and examples for their peripherals.
Some get around that with various frameworks, but that adds considerable bloat, and slows things down, so you need another ratchet up on the Code-Size and MHz settings...
