Chip,
I would like to buy a board with a P2 mounted on it. So that I can write assembler programs that I can upload to the obex.
HydraHacker
I talked to VonSzarvas today who is going to design a really robust base board which we'll put the P2 onto. He may have something roughed out by the end of the week.
I just did a test to determine if my limited use of '$signed' caused the problem in the modulator.
It did.
Altera's Quartus II Verilog compiler sign-extends any term within a $signed() construct, whereas the ASIC Verilog compiler only honors $signed() if all other terms on the right-hand part of the equation are also of signed type.
This caused the modulator to not work right.
I'm wondering what OnSemi will say about this Verilog issue.
I checked all my Verilog source files for use of '$signed()' where I involved another term which was not signed, which would cause errant results. I found three things:
1) The colorspace modulator is broken (as observed).
2) The ALTSN..ALTB instructions don't sign-extend S[17:09] before adding it into D.
3) The smart pin measurement modes that are supposed to count +1/-1 can only count +1/+3.
These are all simple to remedy at the Verilog level, but will require a respin to fix.
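To see why item 3 comes out as +1/+3, here is a minimal Python sketch of what happens when a two's-complement field is zero-extended instead of sign-extended. The 2-bit step encoding is an assumption made for illustration; the actual smart pin Verilog is not shown in this thread.

```python
def zero_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits as an unsigned number."""
    return value & ((1 << width) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits as a two's-complement number."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


# Hypothetical 2-bit step field: 0b01 encodes +1, 0b11 encodes -1.
STEP_UP, STEP_DOWN = 0b01, 0b11

# What the design intended: the sign of the step is honored.
intended = [sign_extend(step, 2) for step in (STEP_UP, STEP_DOWN)]

# What you get if the sign is dropped because another term is unsigned.
observed = [zero_extend(step, 2) for step in (STEP_UP, STEP_DOWN)]

print(intended)  # [1, -1]
print(observed)  # [1, 3]
```

Under that assumed encoding, a dropped sign turns -1 into +3, which matches the +1/+3 behavior described above.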
Could the XORO32 constants be changed at the same time?
Same inputs and same outputs, just different 'internal wiring'.
We could change whatever we need to. A whole new set of files would be submitted.
Has that full P2 power thing been understood yet? ie. the P2 being far less dependent on actual COG workload for its total 1.8V current draw than expected, more like an FPGA. Would that also be fixed in some (potential future) respin, or stay as is? I only ask as I had a possible battery powered application for the P2 in mind...
I think that all clock gating was turned off in favor of performance.
If we backed off to 160 MHz, I think we could reduce dynamic power consumption through clock gating that the synthesis tool would implement, based on compiler switches.
Ok thanks Chip, I had thought from your earlier ideas it seemed like the RAMs were being accessed more frequently than they should be when COGs were not running, so maybe it could still have been some other Verilog issue. But from what you say here it seems as if the resulting power/load profile is a little bit more "as expected" based on the design choices and/or tool options.
Cheers.
... I only ask as I had a possible battery powered application for the P2 in mind...
What MHz and core count do you expect when running, and what does it need to do during idle? Can it power off and re-boot to extend battery life?
I've not seen a static Iq number yet, but the Cpd fit I did on the mA/MHz values suggested around 5 mA?
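For anyone who wants to repeat that kind of fit, here is a rough Python sketch. It assumes the fit is a straight line, I = Iq + slope × f, whose intercept is read as the static current. The data points below are placeholders generated from a made-up line (only the 5 mA intercept echoes the figure above); they are not measurements from a P2.

```python
def fit_static_current(freqs_mhz, currents_ma):
    """Least-squares line I = iq + slope * f.

    Returns (iq_ma, slope_ma_per_mhz). The intercept is the current
    extrapolated to 0 MHz, taken here as the static current.
    """
    n = len(freqs_mhz)
    if n < 2 or n != len(currents_ma):
        raise ValueError("need at least two matching (frequency, current) points")
    mean_f = sum(freqs_mhz) / n
    mean_i = sum(currents_ma) / n
    sxx = sum((f - mean_f) ** 2 for f in freqs_mhz)
    if sxx == 0:
        raise ValueError("frequencies must not all be equal")
    sxy = sum((f - mean_f) * (i - mean_i)
              for f, i in zip(freqs_mhz, currents_ma))
    slope = sxy / sxx
    return mean_i - slope * mean_f, slope


# Placeholder points, NOT measurements: generated from I = 5 + 1.5 * f.
demo_freqs = [20.0, 80.0, 160.0]
demo_currents = [5.0 + 1.5 * f for f in demo_freqs]

iq_ma, slope_ma_per_mhz = fit_static_current(demo_freqs, demo_currents)
print(round(iq_ma, 3), round(slope_ma_per_mhz, 3))
```

Swap in real (MHz, mA) pairs to get an actual estimate; with noisy data the intercept is only as good as the lowest-frequency points.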
Wendy checked the simulations and saw the hub RAM enables toggling. So, the hub RAMs were not always firing, as I suspected they might be. We just have a case here of no clock-gating.
If you switch down to the 20 kilohertz oscillator, leakage power will exceed dynamic power. As far as performance, things are as expected. Our anticipation, though, was for all eight cogs running at a high speed. We never looked at a single Cog running at high speed.
Chip, should the same settings be used, would these issues not boil down to literally a few gates? I don't know how masks work. It seems like such a small difference!
On one hand, better power utilization is nice. But, on the other, this code and tool chain configuration just proved out nice and sweet. We all have to bang on it of course, but dang! It basically does what it says on the tin!
Which makes more sense?
Gate it, process changes, do this again, hope it all works? Doing that could put us here again. Could be lower power, a bit lower clock, working too. Doing this is like a double down.
Or...
Don't gate it, make the minor changes, do it again with a high expectation of it working great? This seems the safer bet. Could always gate one later.
The only way I know to do it (get baseband) is old school. Ignore the modulator, make the pixel clock a multiple of the colorburst, and use pixels to make a signal. Then it boils down to using the streamer more like WAITVID.
At 200 MHz, we can get 14x colorburst. That is plenty. If people clock it off a colorburst crystal, the whole thing gets easier and more stable still. Make no mistake, doing that is a kludge.
Stream back porch, colorburst, border.
Stream active.
Stream front porch.
Wrap all that into a frame loop.
Or, do it greyscale for now. RGB mode will work for doing that. And doing it that way means almost no code changes to get color once these signed problems are resolved.
Forget color on the engineering chips. For people intending to use baseband displays, component, what does work might be enough to develop on. I may do that when the time comes, rather than hack an all software driver.
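The "pixels as signal" idea can be sketched off-chip in Python to check the numbers. This is only a model of the sample stream, not P2 streamer code. The NTSC subcarrier frequency (315/88 MHz) and 227.5 subcarrier cycles per line are standard figures; the 9-cycle burst length, the segment lengths, and the normalized levels are assumptions, and the sync pulse is left out.

```python
import math

COLORBURST_HZ = 315e6 / 88   # NTSC color subcarrier, about 3.579545 MHz
SAMPLES_PER_CYCLE = 14       # "14x colorburst" from the post above
CYCLES_PER_LINE = 227.5      # subcarrier cycles per NTSC scanline
BURST_CYCLES = 9             # assumed burst length in cycles

sample_rate_hz = COLORBURST_HZ * SAMPLES_PER_CYCLE
samples_per_line = round(CYCLES_PER_LINE * SAMPLES_PER_CYCLE)


def burst_samples(amplitude=1.0):
    """Sine burst sampled at SAMPLES_PER_CYCLE points per subcarrier cycle."""
    total = BURST_CYCLES * SAMPLES_PER_CYCLE
    return [amplitude * math.sin(2 * math.pi * n / SAMPLES_PER_CYCLE)
            for n in range(total)]


def scanline(active, pre_burst=8, post_burst=20, border=40, front_porch=21):
    """One line as a flat sample list, in the order the steps are listed.

    All segment lengths are placeholder values; blanking level is 0.0.
    """
    line = [0.0] * pre_burst       # back porch before the burst
    line += burst_samples()        # colorburst
    line += [0.0] * post_burst     # rest of the back porch
    line += [0.0] * border         # border
    line += list(active)           # active pixels
    line += [0.0] * front_porch    # front porch
    return line


print(samples_per_line)                   # 3185
print(round(sample_rate_hz / 1e6, 3))     # 50.114
print(round(200e6 / sample_rate_hz, 3))   # system clocks per sample at 200 MHz
```

The last line shows the ratio is close to, but not exactly, 4 system clocks per sample at 200 MHz, which is presumably why clocking from a colorburst crystal makes things easier.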
Yeah, it would be good to go back in time and just make these edits over 5 minutes before I sent the files out.
No kidding! Hey, you never know. Maybe On will cut us a deal of some sort?
Now we do have this mask set. Maybe enough can be sold to take some of the pain out of the respin?
They are going to be useful. What we got looks like a pretty binary thing. The stuff that works is golden.
If testing proves out, a ton of good stuff works! Maybe there are takers. A lot can be made anyway. Maybe people will when they know they can just drop new chips in when ready.
Just thinking out loud here, trying to think through what so much working as intended can mean.
Yeah there are certainly ways to deal with it such as sleeping at low frequencies etc. The application I had in mind would primarily be running user generated code so there is no fixed MHz/core count requirement known at this stage. It's just that battery life when the system is operating would not be greatly extended by having the COGs waiting on timers/pins and external events etc as perhaps expected might be the case, at least based on P1 scaling. To maximize run time on a portable device you'd need to drop the frequency down dynamically and try to manage all that in the code itself. That's probably the only way around it with a P2. Use it or lose it.
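A back-of-envelope way to reason about that trade-off is sketched below. It assumes current depends only on clock frequency (I = Iq + slope × f) and not on whether the cogs are busy or waiting, which is the no-clock-gating behavior described in this thread. The 5 mA and 1.5 mA/MHz values are placeholders, not P2 measurements.

```python
def current_ma(freq_mhz, iq_ma, slope_ma_per_mhz):
    """Supply current at a given clock, independent of cog workload."""
    return iq_ma + slope_ma_per_mhz * freq_mhz


def average_current_ma(schedule, iq_ma, slope_ma_per_mhz):
    """Time-weighted average current.

    `schedule` is a list of (fraction_of_time, freq_mhz) pairs whose
    fractions must sum to 1.
    """
    total = sum(fraction for fraction, _ in schedule)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(fraction * current_ma(freq, iq_ma, slope_ma_per_mhz)
               for fraction, freq in schedule)


# Placeholder model parameters, NOT measured values.
IQ_MA = 5.0
SLOPE_MA_PER_MHZ = 1.5

# Cogs mostly waiting, but the clock stays at 160 MHz the whole time.
always_fast = average_current_ma([(1.0, 160.0)], IQ_MA, SLOPE_MA_PER_MHZ)

# Same workload, but code drops to the 20 kHz oscillator 90% of the time.
scaled = average_current_ma([(0.1, 160.0), (0.9, 0.02)],
                            IQ_MA, SLOPE_MA_PER_MHZ)

print(always_fast, scaled)
```

In this model, idle waiting at full clock saves nothing; only lowering the frequency moves the average.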
Does this sign extension problem have any impact on the AUGx instructions or only the ALTx instructions?
Just the ALTx instructions which intend to add a negative value to D. It has to do with addition, where sign-extension is performed on an addend (S[17:9] in this case), but the other addend (D register) is not expressed as a signed type.
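A small Python model of that addition, for anyone who wants to see the size of the error. It assumes a 32-bit D and treats `honor_signed=False` as the case where the sign extension of S[17:9] is lost because D is unsigned; the function name and flag are made up for illustration and are not taken from the actual Verilog.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits as a two's-complement number."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


def alt_update_d(d: int, s: int, honor_signed: bool = True) -> int:
    """Add the 9-bit field S[17:9] into a 32-bit D (hypothetical model)."""
    field = (s >> 9) & 0x1FF
    addend = sign_extend(field, 9) if honor_signed else field
    return (d + addend) & MASK32


s = 0x1FF << 9  # S[17:9] all ones, meant as -1

print(alt_update_d(100, s, honor_signed=True))   # 99
print(alt_update_d(100, s, honor_signed=False))  # 611
```

Positive S[17:9] values give the same result either way, so only code that steps D backwards would see the difference.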
Comments
I talked to VonSzarvas today who is going to design a really robust base board which we'll put the P2 onto. He may have something roughed out by the end of the week.
That really sounds great!...please excuse the mistyped sentence, I should have previewed it first.
HydraHacker
I was really looking forward to the colorspace modulator working, though. It does IQ, PSK, AM, and FM modulation for free.
Seems like it should be (NTSC is so slow), but I don't know...
There might be other things lurking. Let's have a really good time pushing these early chips and see what else we can find.
You might need to be sitting down...
That respin cost might vary, depending on whether you want to try to enable clock gating to lower the mA/MHz for single-COG operation?
Could the XORO32 constants be changed at the same time?
Same inputs and same outputs, just different 'internal wiring'.
We could change whatever we need to. A whole new set of files would be submitted.
That would be a minor add, I think.
The IQ modulator took care of a huge problem for making baseband video. I don't know how easy it would be to replace its function.
Seems like there should be a lint program of some kind to help with subtle things like this.
Wendy checked the simulations and saw the hub RAM enables toggling. So, the hub RAMs were not always firing, as I suspected they might be. We just have a case here of no clock-gating.
She sent me the compiler log, and it did issue some warnings in the colorspace modulator. However, they did not touch on the problem issues.
Bummer.
That said, I am quite impressed! These appear to be unfortunate, niche gaffes. So much is spot on.
NTSC is not universal. PAL composite video is also broken.
Have you measured the 20 kHz oscillator frequency and Icc yet?
What Vcc (and Icc) can the core run down to with a 20 kHz sysCLK?
There are. For example:
https://www.synopsys.com/verification/static-and-formal-verification/spyglass/spyglass-lint.html
Also it would be interesting to see what various OSS tools say about it: Verilator, Yosys, ...
Some tools, such as VCS, are also accessible via https://www.edaplayground.com. I'm not sure whether you can enable the VCS linter there, and I'm also not sure that Parallax would want to put whole files up there.
Edited to add: I tried on EDA playground and couldn't get VCS or Incisive to flag this type of mistake. I also couldn't get Verilator 4 to flag it. It does seem to have a warning about compares and signed #'s, but I didn't pursue that.
Other stuff:
Do you put "`default_nettype none" at the top of your files? That can catch some typos.