New P2 Silicon Observations

David Betz · 2018-10-02 11:10

cgracey wrote: »

David Betz wrote: »

Does this sign extension problem have any impact on the AUGx instructions or only the ALTx instructions?

Just the ALTx instructions which intend to add a negative value to D. It has to do with addition, where sign-extension is performed on an addend (S[17:9] in this case), but the other addend (D register) is not expressed as a signed type.

That's good. I imagine that a compiler will make heavy use of AUGx instructions but maybe not so much the ALTx instructions.
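To make the ALTx problem concrete, here is a minimal Python sketch of adding the 9-bit field S[17:9] to D with and without sign extension. It is not Parallax's Verilog, and the helper names and 32-bit wraparound mask are illustrative assumptions. The point is only that a field of 9'h1FF should act as -1, but it adds 511 when it is zero-extended.

```python
# Hypothetical model (not Parallax's Verilog): ALTx adds the 9-bit field
# S[17:9] to D. Intended: the field is signed (-256..+255).
# As described for first silicon: the field is zero-extended (0..511).

MASK32 = 0xFFFF_FFFF

def field_s17_9(s: int) -> int:
    """Extract the raw 9-bit field S[17:9]."""
    return (s >> 9) & 0x1FF

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def alt_add_intended(d: int, s: int) -> int:
    # Sign-extend the field first, then add with 32-bit register wraparound.
    return (d + sign_extend(field_s17_9(s), 9)) & MASK32

def alt_add_as_built(d: int, s: int) -> int:
    # Zero-extended field: a "negative" value adds a large positive number.
    return (d + field_s17_9(s)) & MASK32

s = 0x1FF << 9                     # S[17:9] = 9'h1FF, i.e. -1 if read as signed
print(alt_add_intended(100, s))    # 99
print(alt_add_as_built(100, s))    # 611
```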

Heater. · 2018-10-02 12:00

No need to feel bad about it. Intel makes the same mistakes.

Back in the 1980s, when the Intel 286 was new, a guy on our team discovered that if you did a MUL by an immediate value that happened to be negative, you always got an incorrect result. Eventually Intel provided us at Northern Telecom with a 286 errata sheet (well, a 30-page document) under NDA that included that issue and 40 or 50 others.

Chip, following on from what KeithE said, is it possible you could show us the actual lines of Verilog that show the problem? It would be interesting to see what Verilator, Icarus, and Yosys make of it.

ErNa · 2018-10-02 12:23

We had the same experience with a Siemens peripheral for the 8080, which showed spurious errors. After we had isolated the causes (interrupts hanging, ...), we also saw a confidential note showing that they were aware of the bugs and hadn't communicated them. That hit us severely.

Rayman · 2018-10-02 13:33

I personally have no need for NTSC modulation at all. FM modulation would be fun though...
NTSC demodulation would be more interesting because then you could capture video from cheap cameras...

Heater. · 2018-10-02 13:34

Hmm...did you use the Intel 8259 interrupt controller there?

That would turn a short spike on any interrupt pin into an INT 7 instead of whatever interrupt that pin was. If you did not have code in place to catch unexpected INT 7 and check that it was spurious all kinds of confusion happened.

KeithE · 2018-10-02 15:21

Heater. wrote: »

Chip, following on from what KeithE said, is it possible you could show us the actual lines of Verilog that show the problem? It would be interesting to see what Verilator, Icarus, and Yosys make of it.

It would be interesting to ask the formal enthusiasts about it as well.

It looks like you can get a 14-day evaluation copy of this tool, which includes a linter:

https://www.hdlworks.com/products/companion/index.html

It doesn't look like it supports SystemVerilog though, and I believe that's going to be an issue for a few things in the Propeller.

potatohead · 2018-10-02 16:13

If anything, this gotcha should go into the linters ON Semi uses.

Chip got warnings, but not on this.

David Betz · 2018-10-02 16:16

potatohead wrote: »

If anything, this gotcha should go into the linters ON Semi uses.

Chip got warnings, but not on this.

Any chance any of the other warnings indicate a real problem?

MIchael_Michalski · 2018-10-02 16:32

jmg wrote: »

cgracey wrote: »

I checked all my Verilog source files for use of '$signed()' where I involved another term which was not signed, which would cause errant results. I found three things:

Should you just remove all uses of '$signed()', even if some uses appear to follow the rules? It just seems a high-risk item, best avoided entirely...

cgracey wrote: »

3) The smart pin measurement modes that are supposed to count +1/-1 can only count +1/+3.

Does that mean quadrature counting is broken? What other modes are affected?
What about live-clear-on-read? I think that just loads 0 or +1?

We can play a little game. The Prop II will count coin flips with a smart pin: heads, it gives me a dollar; tails, it takes away a dollar. I'm liking this already.... ;-)
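One plausible way a +1/-1 counter ends up counting +1/+3, sketched in Python under a stated assumption: if each step is held as a 2-bit two's-complement code (2'b01 = +1, 2'b11 = -1) and that code is zero-extended instead of sign-extended before it is added to the accumulator, every -1 becomes +3. The 2-bit encoding and all names below are hypothetical, chosen only to reproduce the symptom Chip reported; with them, the coin-flip game only nets out correctly when the step is sign-extended.

```python
# Hypothetical model: the 2-bit step encoding is an assumption chosen to
# reproduce the "+1/+3 instead of +1/-1" symptom, not Parallax's RTL.

MASK32 = 0xFFFF_FFFF
STEP_UP, STEP_DOWN = 0b01, 0b11      # 2-bit two's complement: +1 and -1

def step_value(step: int, sign_extend: bool) -> int:
    """Widen a 2-bit step code to a Python int."""
    step &= 0b11
    if sign_extend and step & 0b10:
        return step - 4              # 0b11 -> -1
    return step                      # zero-extended: 0b11 -> +3

def count(steps, sign_extend: bool) -> int:
    """Accumulate steps into a 32-bit counter."""
    acc = 0
    for st in steps:
        acc = (acc + step_value(st, sign_extend)) & MASK32
    return acc

# Coin flips: heads = +1, tails = -1. Two of each should net zero.
flips = [STEP_UP, STEP_DOWN, STEP_UP, STEP_DOWN]
print(count(flips, sign_extend=True))    # 0
print(count(flips, sign_extend=False))   # 8
```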

garryj · 2018-10-02 16:35

cgracey wrote: »

"$signed()" worked where I used it amid other signed values. I think it is okay.

Smart pin quadrature counting is broken! The other affected modes are the INC/DEC modes. Modes that just INC and reload 0 on read are unaffected.

May not be related, but a couple of days ago I was looking for ways to make some routines more efficient and I have an interrupt-friendly polling version of WAITX where I changed this:

poll_waitx
                getct   hct2
                addct2  hct2, hctwait
.wait
                jnct2   #.wait
                ret

to this:

poll_waitx
                getct   hct2
                addct2  hct2, hctwait
.wait
    _ret_       jnct2   #.wait

and it broke. Is this a bug, or is this an instance where the _RET_ prefix cannot be used?

potatohead · 2018-10-02 16:35

David Betz wrote: »

potatohead wrote: »

If anything, this gotcha should go into the linters ON Semi uses.

Chip got warnings, but not on this.

Any chance any of the other warnings indicate a real problem?

Chip did not think so.

David Betz · 2018-10-02 16:40

potatohead wrote: »

David Betz wrote: »

potatohead wrote: »

If anything, this gotcha should go into the linters ON Semi uses.

Chip got warnings, but not on this.

Any chance any of the other warnings indicate a real problem?

Chip did not think so.

I'm not sure how many warnings he's seeing but I've seen C programs that generate hundreds or thousands of warnings and looking at each to make sure it really doesn't matter can be a pain. Sometimes people just assume that if it's a warning they don't really need to worry about it. The compiler is just being too picky. I'm not sure if the same sort of thing happens with Verilog. I know *I* get lots of warnings when I try to compile my Verilog but that undoubtedly is because I don't really know what I'm doing!

potatohead · 2018-10-02 16:41

Yes, me too. I am sure many of us experience these things.

Our test run should prove things out. Let's hope!

Seairth · 2018-10-02 17:51

garryj wrote: »

May not be related, but a couple of days ago I was looking for ways to make some routines more efficient and I have an interrupt-friendly polling version of WAITX where I changed this:

poll_waitx
                getct   hct2
                addct2  hct2, hctwait
.wait
                jnct2   #.wait
                ret

to this:

poll_waitx
                getct   hct2
                addct2  hct2, hctwait
.wait
    _ret_       jnct2   #.wait

and it broke. Is this a bug, or is this an instance where the _RET_ prefix cannot be used?

The instruction docs have this to say:

Execute <inst> always and return if no branch. If <inst> is not branching then return by popping stack[19:0] into PC.

Based on that, your code is correct. In what way did it break?
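A toy Python model of the documented rule may help compare the two versions. It is a sketch, not the P2 pipeline: every instruction is assumed to cost one abstract tick, and "CT2 event fired" is simplified to tick >= target. Under those assumptions both variants leave the loop on the same pass; the _RET_ version simply returns one instruction earlier, so its exit timing differs slightly from the RET version.

```python
# Toy model of: "Execute <inst> always and return if no branch."
# Assumptions (not P2 facts): each instruction costs 1 tick, and the CT2
# event counts as fired once tick >= target.

def run_poll_waitx(target: int, start: int, use_ret_prefix: bool):
    """Return (tick_at_return, instructions_executed) for the .wait loop."""
    tick = start
    executed = 0
    while True:
        executed += 1                # jnct2 #.wait (with or without _ret_)
        tick += 1
        if tick < target:            # event not fired yet: branch to .wait
            continue
        if not use_ret_prefix:
            executed += 1            # separate RET instruction
            tick += 1
        return tick, executed        # back in the caller

print(run_poll_waitx(5, 0, use_ret_prefix=False))  # (6, 6)
print(run_poll_waitx(5, 0, use_ret_prefix=True))   # (5, 5)
```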

garryj · 2018-10-02 18:16

It appears that the code never executes the branch to the .wait label when the ct2 event has not yet triggered.

cgracey · 2018-10-02 18:50

garryj wrote: »

It appears that the code never executes the branch to the .wait label when the ct2 event has not yet triggered.

I will look into this today. It should work.

garryj · 2018-10-02 21:40

cgracey wrote: »

garryj wrote: »

It appears that the code never executes the branch to the .wait label when the ct2 event has not yet triggered.

I will look into this today. It should work.

In looking at it further, I think it's likely a logic problem for me, and not you :sick:
The difference in the "wait" cycles taking place between the RET and _RET_ versions may be just enough to upset some tight timing situations in other areas.

The offending code is from my "demo" for USB Bulk-Only Mass Storage (BOMS) devices that I'm working on. Since BOMS devices must be full-speed or faster, at 80 MHz I'm barely squeaking by in some bus timing situations, especially with IN data transfers.
In the "poll_waitx" screenshot, the first group of values is RET vs. _RET_ poll_waitx using 2-second waits from the non-interrupt driver cog and the second group is from the host cog that uses a single interrupt routine that transmits the full-speed 1ms frame counter.

The second group is "jittery", but since poll_waitx may be interrupted by the interrupt, they're consistent enough that I don't think anything can be blamed on _RET_.

cgracey · 2018-10-02 22:09

garryj wrote: »

cgracey wrote: »

garryj wrote: »

It appears that the code never executes the branch to the .wait label when the ct2 event has not yet triggered.

I will look into this today. It should work.

In looking at it further, I think it's likely a logic problem for me, and not you :sick:
The difference in the "wait" cycles taking place between the RET and _RET_ versions may be just enough to upset some tight timing situations in other areas.

The offending code is from my "demo" for USB Bulk-Only Mass Storage (BOMS) devices that I'm working on. Since BOMS devices must be full-speed or faster, at 80 MHz I'm barely squeaking by in some bus timing situations, especially with IN data transfers.
In the "poll_waitx" screenshot, the first group of values is RET vs. _RET_ poll_waitx using 2-second waits from the non-interrupt driver cog and the second group is from the host cog that uses a single interrupt routine that transmits the full-speed 1ms frame counter.

The second group is "jittery", but since poll_waitx may be interrupted by the interrupt, they're consistent enough that I don't think anything can be blamed on _RET_.

I will try this in a bit. Just fixing up the $signed() stuff, still.

Rayman · 2018-10-02 22:32

BOMS looks great garryj!

150 MHz lets us do 1080p@60 Hz I think.
200 MHz maybe lets us do HyperRam at 200 MHz?

ozpropdev · 2018-10-02 22:33

I just tried this code and it seems Ok

dat		org

test_code	hubset	#$ff

		call	#mywait
		drvh	#32	'we get here after 2 seconds ok
		jmp	#$

mywait		getct	pa
		addct2	pa,##80_000_000 * 2
here	_ret_	jnct2	#here

		jmp	#$

jmg · 2018-10-02 23:48

garryj wrote: »

..... Since BOMS devices must be full-speed or faster, at 80 MHz I'm barely squeaking by in some bus timing situations, especially with IN data transfers.
..

Sounds like Chip needs to get you some real P2 samples ASAP?

Confirming USB on a P2 is going to be quite pivotal.
How tolerant do you think the USB code is to ramping up the clock speed? Will it work better at N*12 MHz?

garryj · 2018-10-03 00:29

ozpropdev wrote: »

I just tried this code and it seems Ok

Yep, I'm pretty sure now it's not a _RET_ issue. I'm thinking that at 80 MHz, the resolution of poll_waitx may not be fine-grained enough in certain situations. For example, a packet turn-around timeout is 16 to 18 USB bit periods at full-speed, which translates to 1.238 us to 1.494 us. The method I currently use to schedule USB transfers is not very exact when it comes to calculating when, or if, I might be in a delay that gets caught executing when the 1 ms frame ISR triggers. Bad things happen very quickly if WAITX is used. Using poll_waitx keeps bad things from happening, but in the _RET_ case there may have been just enough poll_waitx "jitter" difference from the RET variant to trigger this condition and instantly blow up SETUP transactions.

jmg wrote: »

How tolerant do you think the USB code is to ramping up the clock speed? Will it work better at N*12 MHz?

I'm not enough of an engineer to tell you definitively whether it would work better at N*12 MHz; that's the kind of thing I leave up to you guys. I am pretty certain that 180 MHz would provide a lot more breathing room.
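To put the margin in numbers, the Python snippet below converts the 1.238 us to 1.494 us turn-around window quoted above into system-clock cycles at 80 MHz and at 180 MHz. It only does arithmetic on figures from the post; it says nothing about how many of those cycles the USB code actually needs.

```python
# Convert garryj's quoted turn-around window into system-clock cycles.
# The window is taken from the post; the clock rates are the two discussed
# in the thread (80 MHz now, 180 MHz hoped for).

WINDOW_US = (1.238, 1.494)

def window_in_clocks(sysclk_hz: float, window_us=WINDOW_US):
    """Return (low, high) edges of the window expressed in clock cycles."""
    lo_us, hi_us = window_us
    return lo_us * 1e-6 * sysclk_hz, hi_us * 1e-6 * sysclk_hz

for mhz in (80, 180):
    lo, hi = window_in_clocks(mhz * 1e6)
    print(f"{mhz} MHz: {lo:.1f} to {hi:.1f} clocks (window width {hi - lo:.1f})")
```

At 180 MHz the same window is 2.25 times as many clocks wide, which is one way to read "a lot more breathing room".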

Rayman · 2018-10-03 00:39

I think Chip once said it didn't have to be a multiple of 12 MHz, because the clock gets resynced with every frame of data...

evanh · 2018-10-03 00:40

Good point. Everyone has been assuming 192 MHz would be the target for USB, but you're right: 192 - 12 is 180 MHz, which is still a multiple of 12 MHz (15 × 12).
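The multiple-of-12 question is easy to check numerically. This Python sketch computes system clocks per full-speed USB bit (12 Mb/s) for the frequencies mentioned in the thread. Whether a whole-number ratio actually matters depends on how the P2 USB code resynchronizes, which is the open question above.

```python
# Ratio of candidate system clocks to the 12 Mb/s full-speed USB bit rate.
from fractions import Fraction

USB_FS_BIT_RATE_MHZ = 12

def clocks_per_bit(sysclk_mhz: int) -> Fraction:
    """Exact number of system clocks per full-speed USB bit."""
    return Fraction(sysclk_mhz, USB_FS_BIT_RATE_MHZ)

for mhz in (80, 180, 192):
    r = clocks_per_bit(mhz)
    kind = "integer" if r.denominator == 1 else "fractional"
    print(f"{mhz} MHz -> {float(r):.3f} clocks per bit ({kind})")
```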

jmg · 2018-10-03 01:38

cgracey wrote: »

HydraHacker wrote: »

Chip,
I would like to buy a board with a P2 mounted on it, so that I can write assembler programs that I can upload to the OBEX.

HydraHacker

I talked to VonSzarvas today who is going to design a really robust base board which we'll put the P2 onto. He may have something roughed out by the end of the week.

Here is my shortlisted suggested BOM for a 4-layer, 2 oz copper, simpler linear-regulator approach, with a focus on ON Semi parts to keep supply lines simpler...
some 'really robust' elements included ....

Summary BOM:
Power source USB 5V / 2A chargers or similar...

LDO, PGood, Adj, Split RegIn, SS:
NCP59744MN2ADJTBG ONSemi LDO 3A 5x5mm (9 vias to GND planes)

Reverse polarity protection:
MBR230LSFT1G ON Semi DIODE SCHOTTKY 30V 2A(3A DC) SOD123L 430mV @ 2A

Thermal spread:
NHP220SFT3G ON Semi DIODE GEN PURP 200V 2A SOD123FL 200V 2A 1.05V @ 2A

Clamp/voltage tolerance:
1SMB5919BT3G ON Semi 5.6V ZENER DIODE 3W SMB (ON Semi only offers SMA and SMB package choices)

LDL212PU50R STMicro DFN6 (3x3): 1.5 A typ (1.7 A typ per graph), Vdo ~350 mV @ 1.2 A, 10 °C/W. Optional: adds oops protection up to 20 V abs max Vin; runs saturated normally.

Addition: smaller-package Zener, 2% tolerance, 2.3 W, as a less bulky spike clamp.
BZD27B5V6P-M3-18 Vishay Zener 2.3W SMF DO-219AB, B = 2%: 5.49 / 5.6 / 5.71 V (100 mA)
BZD27B5V1P-M3-08 Vishay Zener 2.3W 5.1V DO-219AB, B = 2%: 5.00 / 5.1 / 5.20 V (100 mA), likely OK with a series Schottky diode.

Cluso99 · 2018-10-03 15:11

Chip,
Just curious. Do you know how many transistors are in the P2?

David Betz · 2018-10-03 15:30

cgracey wrote: »

TonyB_ wrote: »

cgracey wrote: »

I just did a test to determine if my limited use of '$signed' caused the problem in the modulator.

It did.

Altera's Quartus II Verilog compiler sign-extends any term within a $signed() construct, whereas the ASIC Verilog compiler only honors $signed() if all other terms on the right-hand part of the equation are also of signed type.

This caused the modulator to not work right.

I'm wondering what OnSemi will say about this Verilog issue.

cgracey wrote: »

I checked all my Verilog source files for use of '$signed()' where I involved another term which was not signed, which would cause errant results. I found three things:

1) The colorspace modulator is broken (as observed).
2) The ALTSN..ALTB instructions don't sign-extend S[17:09] before adding it into D.
3) The smart pin measurement modes that are supposed to count +1/-1 can only count +1/+3.

These are all simple to remedy at the Verilog level, but will require a respin to fix.

Could the XORO32 constants be changed at the same time?
Same inputs and same outputs, just different 'internal wiring'.

We could change whatever we need to. A whole new set of files would be submitted.

This statement is a bit scary. I was thinking that now that we have a silicon version of P2 that would mean that no significant changes would be coming. Is that not true? Might there be instruction set changes in a respin?
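The compiler difference Chip described can be modeled roughly in Python. Each operand is a (value, width, is_signed) tuple, which is a simplification of Verilog's typing, and the two styles follow Chip's description: the ASIC tool honors $signed() only when every term on the right-hand side is signed (otherwise the narrow term is zero-extended), while Quartus II sign-extends the $signed() term regardless. As far as I know, the ASIC behaviour is also what the IEEE 1364 expression-typing rules call for.

```python
# Rough model of widening narrow operands inside a wider Verilog sum.
# Operand = (value, width, is_signed); this representation is a simplification.
from typing import List, Tuple

Operand = Tuple[int, int, bool]

def widen(value: int, width: int, target: int, sign_extend: bool) -> int:
    """Extend a width-bit value to target bits (sign- or zero-extension)."""
    value &= (1 << width) - 1
    if sign_extend and value & (1 << (width - 1)):
        value -= 1 << width                  # two's-complement interpretation
    return value & ((1 << target) - 1)       # keep only target bits

def add(operands: List[Operand], result_width: int, style: str) -> int:
    all_signed = all(sig for _, _, sig in operands)
    total = 0
    for value, width, is_signed in operands:
        if style == "asic":
            ext = all_signed                 # signed only if every term is signed
        elif style == "quartus":
            ext = is_signed                  # each $signed() term on its own
        else:
            raise ValueError(f"unknown style: {style}")
        total += widen(value, width, result_width, ext)
    return total & ((1 << result_width) - 1)

# result[7:0] = $signed(4'b1111) + 8'd10   (the 4-bit term is meant to be -1)
ops = [(0b1111, 4, True), (10, 8, False)]
print(add(ops, 8, "quartus"))   # 9   (-1 + 10)
print(add(ops, 8, "asic"))      # 25  (15 + 10)
```

In this model, making every term signed (for instance wrapping an unsigned operand as $signed({1'b0, x}) so its value is unchanged) gives the same answer in both styles.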

Peter Jakacki · 2018-10-03 15:47

It seems that if I pull a pin low, then high, and then float it, instead of the high state decaying gradually it drops immediately to 1.78 V and then decays from there, taking 120 us to reach 1 V. So Chip, what part of the I/O structure could be doing that?

TAQOZ# 0 PIN L H F  ok

This is the F word in TAQOZ:

' F - float pin
F               _ret_   dirl    pinreg

It seems that if I do this to P59, where I have a 10k pullup, this strange thing happens:

TAQOZ# 59 PIN L H F  ok

It's as if "dirl" is also driving the pin low briefly.

jmg · 2018-10-03 16:14

Peter Jakacki wrote: »

It seems that if I pull a pin low, then high, and then float it, instead of the high state decaying gradually it drops immediately to 1.78 V and then decays from there, taking 120 us to reach 1 V. So Chip, what part of the I/O structure could be doing that?

Do all pins do that?
If you zoom in on the initial fall, what slew rate is that?
Does that 1.78 V vary with VCORE? (Or not vary with 3V3, which may be easier to test.)
Maybe something is awry in the core-ring connections and level translators?

cgracey · 2018-10-03 16:41

Cluso99 wrote: »

Chip,
Just curious. Do you know how many transistors are in the P2?

I estimate about 33M transistors, 27M of which are in the SRAMs, with the remaining 6M being logic. Within the I/O pins there are an additional ~400k transistors.
