@evanh said:
Nope, no chance. The 380 MHz example is one possible explanation for what happens at 360 MHz, which Ross has also listed as problematic. I've just been following this possibility through.
360 MHz was not working when I first tested a whole bunch of different clock frequencies, but is now working. Possibly SD card related (I may have been using a different SD card for the first round of testing).
The 180 MHz issue has been sidelined until either another idea comes to mind or Ross can find a shareable case example. Mainly the latter.
The next version of Catalina/Catalyst will recommend using 200 MHz, which seems to work very reliably. However, I will continue to test at 180 MHz in the hope of finding another example.
Ross.
I said I would keep testing at 180 MHz, but unfortunately I have not ...
However, it is very, very noticeable that since moving to 200 MHz as my default P2 clock speed instead of 180 MHz, I have not seen any of the weird problems I used to see. None at all. But it occurred to me today that this change also coincides fairly closely with when I got my P2 EDGE, which I have been using almost exclusively ever since. And I think I am correct in assuming that the P2 EDGE has a Rev C P2 chip, whereas the P2 EVAL boards had Rev A or Rev B P2 chips.
I can't see anything in the Propeller 2 documents which leads me to think the earlier chips were prone to Hub RAM problems that could be fixed simply by moving an instruction to a different address ... but could the Rev C chip changes have fixed this?
No. Rev C silicon was a single wire cut in the ADC front end to remove the pinB signal, because it was injecting crosstalk. OnSemi wasn't willing to do anything more unless Parallax paid the additional costs.
There were big changes from Rev A to Rev B though. So much so that Rev A is not supported and was never a sold product; it was only given away to testers as engineering samples.
Shuffling/adding instructions was a known, but not understood, workaround for inline PASM in the Flex suite for some time. It was obscure and rare enough that no one was looking at the compiler for ages. There was possibly more than one bug fix needed to solve it. I know the one I narrowed down was far from an obvious bug.
It was never frequency sensitive though, just alignment sensitive. And they were purely compiler bugs; the hardware was not at fault.
Ah well, 'twas just a thought. I will wait for the problem to rear its ugly head again!
Ha! Finally found it!
And it was in my own code, naturally!
I've been running everything at 300 MHz recently in the hope that this would exacerbate the problem, and so it did. Same symptoms: a failure at one speed, but the same code would generally work fine at both faster and slower speeds. And the timing was so critical that inserting a single NOP instruction was often enough to make the problem disappear completely.
The problem was in my SD Card plugin for the Propeller 2, but the confusing part was that it was not in the SD Card code at all, which is where I kept looking because the first sign of failure was generally a bad SD Card read. But on the Propeller 2 the same cog can also be used to run the Real-Time Clock plugin when both are enabled, and that's where the problem was: not in the SD Card code, and not in the RTC code either, but in the clumsy way I put the two together when both were used (to save a cog!). It also takes a specific order and timing of requests to make it fail, and this can depend on both the system clock and the SD Card in use. Even the location of the program on the SD Card can affect it: you could have two copies of the same program on the SD Card and one might work while the other does not. The temperature of the SD Card also seemed to make a difference; you could be busy debugging it and then the problem would just disappear.
I will test the new version a bit longer before including it in the next Catalina release. But if anyone wants a hot fix, I can provide one fairly quickly.
Ross.
I think I may have once bumped into a peculiarity with the FIFO hardware that was timing sensitive. I didn't conclusively verify the behaviour, though, because it's very unlikely ever to be of practical concern. I only found it because I was trying to measure the limits of FIFO performance. What appeared to happen was this: if a non-blocking RDFAST was issued and the code then returned to hubexec before that RDFAST had time to fetch valid data into the FIFO, hubexec's hidden RDFAST would mess up and crash the program.
It would presumably also apply to any situation that involves reissuing a non-blocking RDFAST in short order. Within something like 25 sysclock ticks.
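To put "something like 25 sysclock ticks" in perspective, here is a quick conversion at the clock speeds mentioned in this thread, plus a trivial check for two RDFASTs landing inside that window. The 25-tick figure is evanh's rough estimate, not a documented limit, and `reissue_too_soon` is a hypothetical helper for illustration only.

```python
# Assumption: evanh's rough estimate that a non-blocking RDFAST needs
# about 25 sysclock ticks before the FIFO holds valid data.
WINDOW_TICKS = 25


def ticks_to_ns(ticks: int, sysclk_mhz: float) -> float:
    """Convert sysclock ticks to nanoseconds at the given clock in MHz."""
    return ticks * 1000.0 / sysclk_mhz


def reissue_too_soon(first_tick: int, second_tick: int,
                     window: int = WINDOW_TICKS) -> bool:
    """True if a second RDFAST (explicit, or hubexec's hidden one) is
    issued inside the estimated window after the first."""
    return 0 <= second_tick - first_tick < window


for mhz in (180, 200, 300, 360):
    print(f"{mhz} MHz: {WINDOW_TICKS} ticks = "
          f"{ticks_to_ns(WINDOW_TICKS, mhz):.1f} ns")

print(reissue_too_soon(100, 110))  # True: only 10 ticks apart
print(reissue_too_soon(100, 130))  # False: 30 ticks apart
```

Because the window is fixed in ticks, it shrinks in absolute time as the clock rises (about 139 ns at 180 MHz versus about 69 ns at 360 MHz), so anything else timed in nanoseconds lines up with it differently at each clock setting. That is one plausible reason hazards like this can show up at one speed and not another.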
The Propeller is by its very nature a difficult beast to tame. We all attempt complicated things with it ... because it can do them. When things go right it shines as bright as a diamond, but when things go wrong it is about as transparent as a lump of coal.
One documented hardware bug is the REP branch bug: a relative branch instruction (absolute branches aren't affected) placed as the last instruction of a REP block gets its branch address summed with the REP's own repeating relative branch.
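A toy model of that description, just to show why the landing address comes out wrong. The `RepBlock` class and the convention that relative offsets are measured from the next instruction are assumptions for illustration; the Parallax silicon documentation is the authority on the real behaviour. Per the post above, using an absolute branch avoids the problem.

```python
from dataclasses import dataclass


@dataclass
class RepBlock:
    start: int   # address of the first instruction in the REP block
    length: int  # number of instructions repeated

    @property
    def last(self) -> int:
        return self.start + self.length - 1

    @property
    def loop_offset(self) -> int:
        # The REP's implicit branch back to the top, expressed as a
        # relative offset from the address after the last instruction.
        return self.start - (self.last + 1)


def intended_target(addr: int, rel: int) -> int:
    # Assumption: the relative offset is measured from the next address.
    return addr + 1 + rel


def modeled_target(block: RepBlock, addr: int, rel: int) -> int:
    """Target of a relative branch at `addr`, per the bug description:
    if it is the block's last instruction, REP's own offset gets added."""
    target = intended_target(addr, rel)
    if addr == block.last:
        target += block.loop_offset
    return target


blk = RepBlock(start=10, length=4)     # instructions at 10..13
print(blk.loop_offset)                 # -4
print(modeled_target(blk, 12, 5))      # 18: not the last instruction, unaffected
print(modeled_target(blk, 13, 5))      # 15 instead of the intended 19
```

In this model the error is exactly the REP loop offset, so a branch in the last slot lands one block-length short of where it was meant to go.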
This is more of an FYI:
The 180 nm process limits itself to about 350 MHz unless you use special clocking tricks, which the P2 does not.
When I worked at National Semiconductor, one of the projects was a 10/100/1000 MACPhyter (Media Access Controller Physical layer).
For Gigabit Ethernet speeds on a 180 nm process you actually had to use four oscillators, each running at 250 MHz and staggered by 90 degrees, to achieve 1 Gbit/s throughput. Synchronization was achieved with a ring oscillator configuration.
Oh, ring oscillators adjust by switching varying numbers of inverters into the ring, right? I do see the advantage: it replaces the PLL (or just the VCO in the PLL?) and provides the phase-selectable clocks all in one. How is temperature compensation handled? Is it just a matter of adding a lot more switched inverters? I suppose accuracy of frequency is a bendable parameter that can be traded for ring size. Ah, it's not the accuracy, it's the frequency resolution that's traded for ring size, isn't it.
evanh - The ring configuration just keeps synchronization between the four individual oscillators, with a specific R-C value that creates a 90-degree phase shift at the target frequency. With a 180 nm process the top frequency is about 350 MHz for any one oscillator, due to leakage in the substrate and other contributing factors. However, that doesn't mean that you can't have multiple oscillators in different phases on the same substrate. You can't combine the clock outputs or you will end up where you started with the frequency limitation, but you can in effect "poll" the incoming and outgoing data between the individual clocks. It is a complex timing puzzle with fan-outs and buffer delays to meet timing requirements, but if done correctly it is effective. The place-and-route (PR) software (Avant!) would sometimes take 4 hours or more to complete, and even then timing might not be met, so after looking at a congestion map you would make a few tweaks and run the PR tool again.
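To make the "poll between the individual clocks" idea a little more concrete, here is a toy timing model: four ideal 250 MHz clocks, each offset by a quarter period (90 degrees). No single clock ever runs faster than 250 MHz, but taken together their rising edges are 1 ns apart, so a serial stream can be split round-robin across four lanes, each lane handled by its own oscillator. This is an idealized sketch (perfect edges, one bit per rising edge, hypothetical function names), not a description of the actual National Semiconductor circuit.

```python
PERIOD_NS = 4.0  # one 250 MHz oscillator
PHASES = 4       # four oscillators, 360 / 4 = 90 degrees apart


def rising_edges(phase: int, cycles: int) -> list[float]:
    """Rising-edge times (ns) of one oscillator; a 90-degree step is a
    quarter period, i.e. 1 ns at 250 MHz."""
    offset = PERIOD_NS * phase / PHASES
    return [offset + n * PERIOD_NS for n in range(cycles)]


def lane_for_bit(bit_index: int) -> int:
    """Round-robin: serial bit i is handled by oscillator i mod 4."""
    return bit_index % PHASES


edges = sorted(t for p in range(PHASES) for t in rising_edges(p, 3))
spacing = {b - a for a, b in zip(edges, edges[1:])}
print(edges)                                # [0.0, 1.0, 2.0, ..., 11.0]
print(spacing)                              # {1.0}: one edge per ns overall
print([lane_for_bit(i) for i in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
print(PHASES * 1000 / PERIOD_NS, "Mbit/s aggregate")  # 1000.0
```

The clocks are never merged into a single 1 GHz signal; each lane only ever sees its own 250 MHz clock, which matches the point that combining the clock outputs would bring back the frequency limit.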
Hmm, I can't visualise the synchronisation.
DDR buffering will be similar I'd guess.
evanh - look up "180nm 4 stage differential ring oscillator"
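Following that pointer, the first-order textbook relationships for an N-stage differential ring oscillator are easy to tabulate: f = 1/(2 * N * t_d) for a per-stage delay t_d, and adjacent stage outputs 180/N degrees apart. With four stages the taps are 45 degrees apart, so every other tap gives the 0/90/180/270 set. The 0.5 ns stage delay is a made-up value chosen to land on 250 MHz, and this is the generic textbook picture, not necessarily how the MACPhyter did it.

```python
def ring_frequency_hz(stages: int, stage_delay_s: float) -> float:
    """First-order estimate: an edge must go around the ring twice per
    period, so f = 1 / (2 * N * t_d)."""
    return 1.0 / (2 * stages * stage_delay_s)


def tap_phases_deg(stages: int) -> list[float]:
    """Phases of the true outputs of each differential stage (180/N apart),
    followed by their complements, which are 180 degrees further on."""
    step = 180.0 / stages
    true_outputs = [k * step for k in range(stages)]
    return true_outputs + [p + 180.0 for p in true_outputs]


T_D = 0.5e-9  # hypothetical 0.5 ns per stage
for n in (3, 4, 5):
    print(n, "stages:", round(ring_frequency_hz(n, T_D) / 1e6, 1), "MHz")

phases = tap_phases_deg(4)
print(phases)        # [0.0, 45.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0]
print(phases[0::2])  # [0.0, 90.0, 180.0, 270.0]: quadrature clocks
```

Changing N alone moves the frequency in coarse steps (about 333, 250 and 200 MHz here), which bears on the frequency-resolution question above; practical ring VCOs typically also tune the stage delay itself. Real delay cells drift with process, voltage and temperature, which this sketch ignores.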
Congratulations! I know, it's always a great relief when you find a bug, especially a very obstinate one.
So true.