Does anyone recall how the internal stack works?
Cluso99
Posts: 18,069
The internal stack for call/ret/push/pop is 8 deep as I recall.
What happens when the stack is exceeded? I mean, what gets lost, the first or last entry?
I have in mind a hubexec routine that will use the internal stack. But it needs to work with any other user code, and just be called using the internal stack, which is 1 level.
When my hubexec code begins to execute, I propose to pop the stack 8 levels, storing each level. Then I can use the stack for my code. When my code is complete, just before returning to the user code, I will push all 8 saved levels back onto the stack and return via the stack.
I am expecting this to just work. Anyone think otherwise???
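For anyone who wants to poke at the idea, here is a minimal Python sketch of the save/restore scheme described above. It is a toy model, not the real hardware: the thread does not say how the stack behaves on overflow, so the sketch assumes a shift-register stack where a push past the 8th level drops the oldest entry and a pop leaves the bottom level unchanged. The names `HwStack` and `run_with_private_stack` are made up for illustration.

```python
class HwStack:
    """Toy model of a fixed-depth hardware return stack.

    Assumed behaviour (not confirmed in the thread): a shift register,
    where levels[0] is the top of the stack.
    """

    def __init__(self, depth=8):
        self.levels = [0] * depth

    def push(self, value):
        # Assumption: everything shifts down one level and the
        # oldest entry falls off the bottom.
        self.levels = [value] + self.levels[:-1]

    def pop(self):
        # Assumption: everything shifts up one level and the
        # bottom level keeps its old value.
        top = self.levels[0]
        self.levels = self.levels[1:] + [self.levels[-1]]
        return top


def run_with_private_stack(stack, body):
    """Save every level, let body use the stack freely, then restore."""
    # saved[0] is what was on top (the caller's return address).
    saved = [stack.pop() for _ in range(len(stack.levels))]
    body(stack)
    # Push the old bottom first so the old top ends up on top again.
    for value in reversed(saved):
        stack.push(value)


if __name__ == "__main__":
    stack = HwStack()
    for return_address in (0x100, 0x200, 0x300):
        stack.push(return_address)

    def body(s):
        # Deliberately overflow the stack inside the routine.
        for i in range(20):
            s.push(i)

    before = list(stack.levels)
    run_with_private_stack(stack, body)
    print("restored:", stack.levels == before)
```

Under those assumptions the restore works no matter what the routine leaves behind, because eight pushes overwrite every level. The cost is eight pops and eight pushes per call.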
Comments
1T-OTP
http://www.onsemi.com/PowerSolutions/newsItem.do?article=2519
http://www.businesswire.com/news/home/20140506005549/en/Semiconductor-Introduces-Qualified-IP-Blocks-180-nm
That should work.
Here's the Verilog for the hardware stack:
Ha! What's wrong with this picture? ...
EDIT: Piccy from http://www.onsemi.com/PowerSolutions/content.do?id=16558
They list it separately, and without any indication of expected density, speed or even power ratings.
The only other thing to consider is the stack is only 22 bits wide.
This caught me out early on.
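A quick Python sketch of why the width matters if you push arbitrary data rather than return addresses. The thread only says the stack is 22 bits wide; the sketch assumes that simply means the low 22 bits of a pushed value survive, which may not match how the hardware packs its bits. `stored_on_stack` is a made-up helper name.

```python
STACK_WIDTH = 22                      # width stated in the thread
STACK_MASK = (1 << STACK_WIDTH) - 1   # assumption: the low 22 bits are kept


def stored_on_stack(value):
    """Return what a 32-bit value would look like after a push/pop round trip."""
    return value & STACK_MASK


if __name__ == "__main__":
    for original in (0x000FFFFF, 0x12345678):
        print(hex(original), "->", hex(stored_on_stack(original)))
```

Values that fit in 22 bits come back intact; anything wider loses its upper bits, so a full 32-bit register cannot be parked on the stack under this model.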
Thanks for the info.
ozpropdev,
Yes. It does not matter for the way I will be using it - just with "CALL" instructions, and for saving and restoring what was previously on the stack, whatever that may be.
Since you have some extra die space, you could make the stack a bit deeper. Ignoring recursion, I suspect 32 elements would be ample. Ray likely wouldn't even need to manipulate the stack at that point. I still wish there was an event for when the stack was "full" and "empty", so that it could be virtualized.
Edit: even increasing the stack to 16 would be an improvement. At size 8, I think people are going to be hesitant to use the stack, particularly if they have to worry about third-party code also using it. You should not need to contemplate copying the stack every time you enter a CALLed code block "just in case". If the stack can't be easily virtualized (e.g. with the addition of empty/full events), then making it relatively large makes it much safer to use without having to worry about overflowing.
I thought about suggesting that, but I like the fact that the stack is really just a "return stack". This discourages its use as a data stack. Once you start including data, then even a 32-deep stack is nowhere near large enough.
Last night, I made the LUT RAM dual-port, so that you can update it while you stream from it. We could just use that as the hardware stack. It would save resources.
Okay, then. No change to the return stack.
It's unfortunate you can't use PTRx with RDLUT/WRLUT. That would give you PUSH/POP behavior without additional instructions. You could even provide an alias PUSHLUT/POPLUT.
Also, the term "LUT" just doesn't quite match anymore. The old "AUX" term might be better.
Last night, I made the LUT RAM dual-port, so that you can update it while you stream from it. We could just use that as the hardware stack. It would save resources.
Cryptic, but sounds rather promising.
A more conventional stack, without width and depth caveats, would be easier to use and explain, and could simplify things, as Chip says it would save resources.
There are some simple enhancements I'm making to the smart pins and then I want to get the streamer doing 1/2/4 bits under the RFBYTE/WFBYTE modes. That's it.
We need to get on with the tools and start making more progress on software fronts. I'm thinking today that I've been buried in this project for so long that the microcontroller world has likely changed in many ways I'm not aware of. I wish I had an up-to-date knowledge of what can be done with current microcontrollers and what it feels like to program them. If it feels awful, then we are in good shape. If they've actually made it pleasurable, then we may be in trouble.
They need to amend the Oxford Dictionary with the following:
"FUN" feeling experienced when programming/using a Parallax Propeller 2 Microcontroller
IMHO you're in good shape, Chip.
Feature freeze from the man himself! Bravo!
You are in good shape!
Yay!!!
What little I've done lately has been micros with higher level languages (micro-python, espruino, lua, micromite basic, raspberry pi's). Those are fun but at a different level. The most pure fun I've had has been with the P2 FPGA things I've played with since the reborn P2 images started coming out. It's still understandable (barely) and I can still wrap my pea brain around the hardware at the lowest levels. Anything else nowadays is just a computer: feed in high level code and get results. I don't think about the wheels and gears spinning inside to get something done anymore.
In talking about what we could do, additionally, for stack space, using LUT, we were heading right back into Prop2 Hot territory. It's like a giant Merry-Go-Round. We need to get off the thing.
After all, the P1+ is really the original P2, so call it that. This is what we all screamed out for after P2HOT.
We know you could do the few required things in under a week! So even take a month
Put it in the P2 frame, and take the required pins from the I/O pins to check the frame.
It will then be compatible with the P2/P3/??
Maximum delay to P2/P3 might be a month - it has been 8+??? years already. It would give you breathing space to clear your head.
If it all works, then we have a P1+ ready for final silicon.
No tools need to be done and we have existing objects that will work!
It will give you a break to refresh. Then back to P2 (which will be P3 or something else).
Do the few things left to tidy the P2/P3 up, produce the FPGA images for testing.
Then we can all get on with the tools and objects.
No use in silicon if we don't have any reasonable tools and objects!
BTW All modern micros now have internal Flash.
Seriously, why on earth can't the P2 have some Flash or OTP (OnSemi has 1T-OTP in 180nm)?
Do we need it?
What code does it benefit?
Wouldn't an extra 512x32 LUT be better use of the die space?
Wouldn't a pair of [32-bit latch plus a data available SR latch] (for 2-way use) between each set of adjacent cogs be of more use?
I think Chip said it allows the Streamer to operate, while code updates Streamer LUT.
That's a fairly important sounding feature, as many streamer apps will not want to run/pause/run/pause
I would say the Streamer also needs simple HW handshake support, as you cannot rely on whatever is connected to be a continual-capable device.
True, but good test coverage of the FPGA image should not be lost in the rush to 'tools and software'.
For most, you can at least program the core in C, so that 'feels' much the same across all vendors.
The peripherals are where many software types come unstuck, and many vendors struggle to provide adequate docs and examples for their peripherals.
Some get around that with various frameworks, but that adds considerable bloat, and slows things down, so you need another ratchet up on the Code-Size and MHz settings...