No, the video driver outputs the virtual line number to #$F8. I left it as it is, but I also added lines to WRLONG this value to the other address, and changed the address in the _lower file.
I have done several more experiments, and now I have closed NeoYume. Instead, I want to test the hub RAM in this P2. The last experiment was:
cmp lspc_curline,#4 wcz     ' C = (line < 4), Z = (line = 4)
if_ae mov ptra,##$7c000     ' from line 4 onward, point PTRA near the top of hub RAM
in the lspc cog. I expected it to fill the first 4 lines of sprites at the top of the screen and send the rest far away into outer space, so I would see the same 4 lines of sprites repeating. This was not the case: in the lower part of the screen the sprites started to disappear (the lower the line, the more of them disappeared), all of it blinky and noisy.
Here's my recent code for testing hubRAM reliability at and near PLL-self-limiting frequency. It just so happens to read and write 100% of hubRAM with verification.
The report is a tad terse, but the sequence is: write, then read, all of hubRAM at the target frequency. Then, if there is an error, repeat the read at 10 MHz, followed by a fresh write at 10 MHz and a read-back at the target frequency. Each of those results is reported on a single line of the terminal.
The program then waits for a key press before incrementing the target frequency and repeating the above.
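A minimal software model of that sequence, as a sketch only: the real program runs on the P2, whereas here hub RAM is a Python list, "target frequency" versus "10 MHz" is just a fast flag, and the test pattern, fault injection and all names are assumptions. It shows why the two follow-up lines are informative: errors that persist on a slow re-read point at the write, while errors that appear only when a slow write is read back fast point at the read.

```python
# Hypothetical model of the hubRAM test sequence. fast=True stands for the
# target clock frequency, fast=False for the 10 MHz fallback.
SEED = 0x5A5A5A5A  # arbitrary pattern seed (assumption)


def pattern(addr: int) -> int:
    """Address-dependent 32-bit test value (an assumed pattern)."""
    return ((addr * 2654435761) ^ SEED) & 0xFFFFFFFF


class FakeHub:
    def __init__(self, size, bad_fast_write=(), bad_fast_read=()):
        self.mem = [0] * size
        self.bad_fast_write = set(bad_fast_write)  # stores corrupted at target freq
        self.bad_fast_read = set(bad_fast_read)    # loads corrupted at target freq

    def write(self, addr, value, fast):
        if fast and addr in self.bad_fast_write:
            value ^= 1  # flip bit 0 to model a corrupted store
        self.mem[addr] = value

    def read(self, addr, fast):
        value = self.mem[addr]
        if fast and addr in self.bad_fast_read:
            value ^= 1  # flip bit 0 to model a corrupted load
        return value


def write_all(hub, fast):
    for addr in range(len(hub.mem)):
        hub.write(addr, pattern(addr), fast)


def count_errors(hub, fast):
    return sum(hub.read(a, fast) != pattern(a) for a in range(len(hub.mem)))


def run_pass(hub):
    """One pass at one target frequency; returns one report line per step."""
    write_all(hub, fast=True)
    errors = count_errors(hub, fast=True)
    report = [f"target write+read: {errors} errors"]
    if errors:
        # Slow re-read: errors that persist were stored wrongly at target freq.
        report.append(f"10 MHz re-read: {count_errors(hub, fast=False)} errors")
        # Fresh slow write, then a read at target freq: isolates the read path.
        write_all(hub, fast=False)
        report.append(f"10 MHz write, target read: {count_errors(hub, fast=True)} errors")
    return report


print(run_pass(FakeHub(1024, bad_fast_write=[5, 6])))
# prints ['target write+read: 2 errors', '10 MHz re-read: 2 errors', '10 MHz write, target read: 0 errors']
```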
@pik33 said:
Add several NOPs between IF_C JMP and WRFAST and the picture starts blinking again.
Did you try four NOPs? Is that the same as zero NOPs?
These two options have the same egg-beater slice difference between RDLONG and WRFAST.
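The reasoning behind that question can be checked with a little modular arithmetic. This sketch assumes NOP takes 2 clocks and that, on the 8-cog P2, each cog's hub window advances one of 8 egg-beater slices per clock; it only models timing offsets, not the actual driver code.

```python
NOP_CLOCKS = 2   # most P2 instructions, including NOP, take 2 clocks
HUB_SLICES = 8   # 8-cog P2: the egg beater cycles through 8 slices, one per clock


def slice_shift(nops: int) -> int:
    """Slices the egg beater advances while `nops` NOPs execute, modulo one revolution."""
    return (nops * NOP_CLOCKS) % HUB_SLICES


for n in range(6):
    print(n, "NOPs ->", slice_shift(n), "slices")
# 0 and 4 NOPs both give a shift of 0: 4 NOPs = 8 clocks = one full revolution.
```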
No, 4 NOPs <> 0 NOPs. With 4 NOPs only every 3rd sprite is displayed on the start screen (the simplest of them); with 0 NOPs, 18 gradient columns are visible. It is as if something overwrites the sprite buffer when the loop time is too long.
This "every 3rd column" effect is also present with the untouched program. Something overwrites hub RAM locations "in between". Time to find an electronics shop and replace this P2.
There is a simpler fix.
Add a NOP before rdlong ma_mtmp1,ptrb[2] wc and it works.
It seems PTRB in this P2's cog needs some more time to settle after something is added to it. Otherwise RDLONG reads the wrong, somewhat random, location, and the code then writes to who knows where.
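For reference, rdlong ma_mtmp1,ptrb[2] reads from PTRB plus two longs, i.e. PTRB + 8 bytes, because the index in the PTRx[index] form is scaled by the transfer size, and this form leaves PTRB itself unchanged. The sketch below only spells out that address calculation; the helper name and the sample PTRB value are hypothetical.

```python
LONG_BYTES = 4  # RDLONG moves one long, so the PTRx[index] offset is index * 4


def rdlong_ptr_index_address(ptr: int, index: int) -> int:
    """Hub byte address accessed by RDLONG reg,PTRx[index] (sketch).

    The [index] form is an offset only: PTRx itself is not modified.
    """
    return ptr + index * LONG_BYTES


ptrb = 0x12340  # hypothetical PTRB value at the time of the RDLONG
print(hex(rdlong_ptr_index_address(ptrb, 2)))  # 0x12348
```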
NOP inside the .slotlp or outside?
Also, if you replace PTRB with PTRA, does it still happen?
I wonder if the implied AUGS with the if_c add ptrb, ##96*4*4*2 is somehow messing things up in the pipeline prior to using ptrb with indexing? Maybe adding that NOP (or the mov qqq stuff) adds another clock and cancels something after this. In this code, maybe try adding the constant to ptrb from a new long holding the constant, instead of adding the ##96*4*4... values directly with the ## form.
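The "implied AUGS" comes from the size of that constant. In P2 assembly a plain # immediate only fits the 9-bit S field (0 to 511); anything larger needs ##, for which the assembler emits an AUGS instruction in front of the ADD. A quick check of the arithmetic:

```python
# A plain "#" immediate is limited to the 9-bit S field, i.e. 0..511.
IMMEDIATE_MAX = (1 << 9) - 1

step = 96 * 4 * 4 * 2  # the constant added to ptrb in "if_c add ptrb, ##96*4*4*2"
print(step, hex(step))        # 3072 0xc00
print(step > IMMEDIATE_MAX)   # True: "##" (an AUGS prefix) is required
```

Holding the value in a register instead (for example a hypothetical "ptrb_step long 96*4*4*2" used as "if_c add ptrb, ptrb_step") removes that AUGS, which is what the suggestion amounts to.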
It seems to be cog #0 that is damaged in this P2. I did this at the start:
DAT ' startup code & zero page
org
coginit #1,##UPPER_LOAD   ' load cog 1 from hub at UPPER_LOAD and start it
cogstop #0                ' then stop cog 0 (the cog running this code)
orgh $14
Now the main cog has become cog #1. Cog #0 now does something else, but it is cog #1 that replaced itself with this memory arbiter, if I understand this code well enough. The NOP is no longer needed in the other cog.
Time to find an electronics shop and replace this P2.
IMHO, and considering the maxim "every cog is equal to any other one", you have a one-out-of-the-crowd chip in your possession.
If possible (and "in the name of science"), keep it as it is now, in spite of the need to switch to cog 1, as an alternate way of conducting the tests you've planned.
Perhaps subjecting that P2 to the extra thermal stress of a removal could destroy any useful evidence it might eventually produce.
As more and more P2s hit the road and go through "who knows what, how, and why" stress-related misbehaviours, maybe someone in Rocklin (@cgracey) would find it useful to keep it "as is"...
@Yanomani said:
Wasn't most of the torture testing initially conducted by @VonSzarvas, before you got it as the proud winner of that "contest"?
The pik33 module was fresh from manufacturing; I probably hadn't even opened the bag (although I might have had a peek!).
The modules that were stress tested and that I happen to have with me run NeoYume fine.
In testing, a bunch of modules had crazy power-stress code running in all cogs at 350 MHz+ until they failed. Certainly very hot for long periods of time. I don't have the data with me, but board temperatures were >70 °C.
So yeah, it's all pretty weird. I'm not sure any of us has seen a cog partially fail yet. Of course, we are all running these things much faster than recommended, so without proper cooling, what we are doing is far beyond what the semiconductor manufacturer designed for.
Is the issue with cog0 only at high speed, or is there an issue now at 180 MHz too? I'm wondering whether this is a "lasting damage" type of thing, or rather just an edge-case chip with lower overclocking capability than most.
I tried this at something slightly below 300 MHz, and the bug was still there. The lower bound was set by the monitor, which didn't accept VGA at less than 50 Hz. One of the monitors I have access to works even at 40 Hz, so I managed to run it at 10x the base frequency instead of 14x.
Now that I know why NeoYume didn't work, I can write a simple test (a loop with several instructions, jumps and ptrx[index]) to check this. For science! After all, I work at a technical university. And there is hope that the university robotics project will restart, so we will be able to buy several of these EC32s. But then, as a side effect of this project, I have 3 bare, unused P2s in my drawer.
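One possible shape for the checking side of such a test, sketched here only as an assumption: if hub RAM is first filled so that every long holds its own byte address, any read that goes astray returns the address it actually hit, which makes a "somewhat random location" visible. The logging format, names and sample data are hypothetical.

```python
def self_addressed_image(size_longs: int) -> list[int]:
    """Hub image in which the long at byte address 4*i contains 4*i."""
    return [4 * i for i in range(size_longs)]


def find_misreads(samples):
    """samples: (intended byte address, value read) pairs.

    With a self-addressed image the value read *is* the address actually
    accessed, so every mismatch is returned as (intended, actual).
    """
    return [(want, got) for want, got in samples if got != want]


hub = self_addressed_image(16)
# The second read was meant for byte address 12 but returned the long at 4.
samples = [(8, hub[8 // 4]), (12, hub[4 // 4])]
print(find_misreads(samples))  # [(12, 4)]
```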
Hi @VonSzarvas,
Sorry for my misunderstanding of one of your comments (https://forums.parallax.com/discussion/comment/1538540/#Comment_1538540)...
"First person with a EU delivery address to guess that part can get a test module from me when I'm done with them!" (emphasis mine)
In my muddled reading of that particular excerpt, I got the wrong impression that this module was part of the test batch. My bad!
Anyway, part of what makes a "natural" gem so valuable is the random defects buried inside its structure...
Comments
Fixed!
About line 2974,
instead of
I used
and this is all I had to change (of course, I also had to declare qqq) to fix NeoYume.
No, what kind of P2 malfunction would do such a thing?
Yep, I get repeating sprites with that hack. So I guess there's something really bad going on.
huh? Buggy PTR indexing?
Though wait, wouldn't that be a more obvious fail? Maybe it only affects PTRB, which IIRC neither flexspin nor Chip's interpreter actually uses.
There is a simpler fix.
Add a NOP before rdlong ma_mtmp1,ptrb[2] wc and it works.
It seems PTRB in this P2's cog needs some more time to settle after something is added to it. Otherwise RDLONG reads the wrong, somewhat random, location, and the code then writes to who knows where.
NOP inside the .slotlp or outside?
Also, if you replace PTRB with PTRA, does it still happen?
Inside. Outside doesn't work.
Well, that's really strange, since in the loop path there's always the 4-cycle DJNZ in between the ADD and the RDLONG.
PTRA does exactly the same: either a NOP or it blinks.
What is stranger, a NOP before .slotlp plus a NOP before DJNZ (2 NOPs added) doesn't work; the NOP has to be added between DJNZ and RDLONG.
I wonder if the implied AUGS with the
if_c add ptrb, ##96*4*4*2
is somehow messing things up in the pipeline prior to using ptrb with indexing? Maybe adding that NOP (or the mov qqq stuff) adds another clock and cancels something after this. In this code, maybe try adding the constant to ptrb from a new long holding the constant, instead of adding the ##96*4*4... values directly with the ## form.
OK, I just read this comment; you posted while I was formulating my post above.
No, it is the RDLONG after DJNZ that fails. Adding a NOP after add ptrb, and outside the loop, doesn't work.
Also, this works without NOPs:
so it is ptrx[index] that fails after DJNZ.
This doesn't work:
but this does:
How about reversing these two lines at the end of the transfer to give ptrb longer to update....
Ok we just cross posted again. So having a ptrx[index] directly after djnz seems to corrupt things on this board.
You would have to move both inside the label, though? I'm not sure whether a pending IRQ hits immediately after the last REP instruction.
Yes I should have mentioned that too. Both inside irqshield.
It seems to be cog #0 that is damaged in this P2. I did this at the start:
Now the main cog has become cog #1. Cog #0 now does something else, but it is cog #1 that replaced itself with this memory arbiter, if I understand this code well enough. The NOP is no longer needed in the other cog.
Bad P2. You might have cooked it too much pik33.
Nothing cooks this P2 more than NeoYume. None of my tests makes it as hot as this.
@Wuerfel_21 said:
I tried this; it doesn't work. DJNZ does something bad here that prevents ptrx[index] from working in cog #0 directly after it.
Wasn't most of the torture testing initially conducted by @VonSzarvas, before you got it as the proud winner of that "contest"?
https://forums.parallax.com/discussion/174586/new-edge-module-spotted-in-rocklin
Yea, you better keep it. FOR SCIENCE! I'm sure Parallax can just send you a fresh one.
I guess this is a yolk-less P2. The search for the double yolk is still on.
Is the issue with cog0 only at high speed, or is there an issue now at 180 MHz too? I'm wondering whether this is a "lasting damage" type of thing, or rather just an edge-case chip with lower overclocking capability than most.
pik said he tried it at much lower speeds (though still > 180 MHz) and it didn't change anything.