@rogloh said:
What's the lowest sysclk you can currently reproduce your problem @pik33?
About 265 MHz (multiplier=11 instead of 14 in NeoVGA code)
And, maybe more important, can it be reproduced with a (lot) less code, to see if other chips/cogs may be affected ?
I have to check this, but I think what is important to reproduce this bug is this:
drvl #PSRAM_SELECT
.irqshield
djnz ma_slotleft,#.slotlp
<------------------------------------------------- nop here partially repairs the problem
'debug("canary alive. Lorem ipsum dolor sit amet. Take it easy!")
jmp #ma_lineloop
ma_do_adpcm
'drvh #38
shl ma_mtmp1,#2 ' ADPCM cache lines are 4 longs <---------------------------------------------------------------------------- here
setbyte ma_mtmp1,#$EB,#3
splitb ma_mtmp1
There is an instruction that does something with ma_mtmp1 in the pipeline, and it may be "partially executed", overwriting the rdlong result. That's why this nop changes the behavior: there is no longer any instruction in the pipeline that changes ma_mtmp1.
Then there is the conditional add to ptrb before the loop. If there is no nop between it and the rdlong, there is a "2 lines good, 2 lines empty" effect.
This effect disappears if I change the algorithm to this
mov qqq, ma_curline
and qqq, #3
mul qqq,##$600 '(=96*16)
add ptrb,qqq
While adding qqq to ptrb is still right before rdlong, the effect disappears, as if what causes the effect is somewhat connected with the conditional instruction
if_c add ptrb,##96*4*4*2
It is the same problem, the instruction is in the pipeline and if not "fully cancelled" it may alter several bits in PTRB while the condition is not met. However,
add ptrb,#8
rdlong ma_mtmp1,ptrb wc
sub ptrb,#8
repairs this, so the problem only occurs with ptrx[index] instructions. Switching to ptra changes nothing in the "special effects".
The hypothesis now:
To trigger the bug, you have to:
use the ptrx[index] addressing mode to load a register
no more than 2 instructions earlier, use something which modifies either the ptrx or the target register
make that instruction get cancelled in the pipeline by using a jump or if_something
Having this hypothesis I can start to write something which may (or may not if it is false) check this thing.
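One way to plan such a check is to enumerate the combinations the hypothesis talks about before writing any PASM. The Python sketch below is only a planning aid, not a reproduction of the bug: the field names, the value sets and the `predicted_to_glitch` helper are made up for illustration, and the prediction simply restates the three conditions listed above.

```python
from itertools import product

# Hypothetical test matrix for the three conditions in the hypothesis.
# "distance" is how many instructions before the rdlong the modifying
# instruction sits (1 = immediately before).
POINTERS = ("ptra", "ptrb")
MODIFIED = ("pointer", "destination")
DISTANCES = (1, 2, 3)
CANCELS = ("jump", "condition", "none")


def hypothesis_cases():
    """Return every combination as a dict, one per planned test variant."""
    return [
        {"pointer": p, "modified": m, "distance": d, "cancel": c}
        for p, m, d, c in product(POINTERS, MODIFIED, DISTANCES, CANCELS)
    ]


def predicted_to_glitch(case):
    """Restate the hypothesis: a cancelled modifier within 2 instructions."""
    return case["distance"] <= 2 and case["cancel"] != "none"


if __name__ == "__main__":
    for case in hypothesis_cases():
        marker = "expect glitch" if predicted_to_glitch(case) else "expect clean"
        print(case, marker)
```

If real hardware disagrees with the prediction for any variant, the hypothesis needs revising for that combination.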
@evanh said:
Isn't it exclusive to Pik's revB EC32MB? No one else has reproduced it, right?
Maybe because there isn't a simpler test than running the full MegaYume ?
I can't run it because I don't have the RAM module. However, I occasionally experienced that "magic nop" that fixed things. It has been a while since I last saw this issue; the affected code has gone through several revisions and I don't remember if the ptr..index was used or how (and I always thought that it was something stupid I did in the code... not that this is excluded at all).
There's a hundred reasons for a NOP to make a difference. I've had a few myself - Either a slip-up or lack of knowledge when writing some pasm.
I've just come across something similar that I don't understand. I can change the order of the instructions and avoid it. But when I tried to simplify the environment it goes away completely, reordered or not. - https://forums.parallax.com/discussion/comment/1539753/#Comment_1539753
@pik33 said:
The nop can be even more magic. Add it after (under) djnz so it is never executed in the loop but it is still doing (partially: the error after if_c before the loop remains) its job
uiuiui, pipelining funny.
@evanh said:
Isn't it exclusive to Pik's revB EC32MB? No one else has reproduced it, right?
Yea, only that one P2.
@macca said:
@evanh said:
Isn't it exclusive to Pik's revB EC32MB? No one else has reproduced it, right?
Maybe because there isn't a simpler test than running the full MegaYume ?
I can't run it because I don't have the RAM module. However, I occasionally experienced that "magic nop" that fixed things. It has been a while since I last saw this issue; the affected code has gone through several revisions and I don't remember if the ptr..index was used or how (and I always thought that it was something stupid I did in the code... not that this is excluded at all).
I've never had a "magic NOP" issue that didn't eventually turn out to be a programming mistake somewhere else and thus theoretically consistent across chips, but usually changing with shifting addresses. pik's funny rdlong does not react to shifting the surrounding code and only happens on one chip.
Getting a simpler test going is certainly important.
...
Maybe the interrupts have something to do with it? I don't think we really considered them yet.
The Reimu prop.
Reimu: What's a "microcontroller", anyways?
[Ken jumps out of the woodwork and buries her under a comically large pile of educational materials]
Magically got some files ... works flawlessly for me, not getting any glitches watching the Metal Slug demo cycling. Pulling about 1.8 Watts from my benchtop, so well within USB supply capable as well.
Attached a photo of my wiring to use add-on accessories because I didn't buy a carrier board.
Ada,
I'm just watching your presentation video now. You mention having to perform pixel doubling for HDMI output and that that consumes the cog. This could be avoided but you seemed hesitant to go that path? HDMI displays are pretty good at accepting low hsync frequency, they aren't restricted to >30 kHz the way VGA displays are.
The limit is a minimum of 25 MP/s. So, yeah, large blanking will be required. But again, HDMI displays are capable of wider blanking.
EDIT: Quick check ... yep, working with 320x480@60Hz:
I guess you could do 320x480 with hueg hblank (remember, vertical timing must be SDTV-like), but that'll very likely mess up the aspect ratio on a lot of displays. Also, pre-scaling the pixels stops crappy scaling algorithms from messing up the image too much...
The current VGA modes are more eyeballed than anything (read: set correct resolution and tweak divider until monitor says the framerate is right). Should probably use brain to actually calculate the mathematically correct dividers.
'' HDMI 800x480 mode
hdmi_wide_config
long VIDEO_CLKFREQ
' Timing
long 2 - 1 ' line multiplier minus one
long 12 ' native front porch lines
long 2 ' native sync lines
long 34 ' native back porch lines
long $0CCCCCCD ' Sync NCO value
long 0 ' H40 NCO value
long 0 ' H32 NCO value
long HDMI_BLANK ' blanking color
long X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 800 ' blank line
long X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 80 ' extra pillar
long X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 56, HDMI_BLANK ' HSync section 1 (front porch)
long X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 96, HDMI_HSYNC ' HSync section 2 (sync pulse)
long X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 123, HDMI_BLANK ' HSync section 3 (back porch)
long 0,0 ' HSync padding 4
long 0,0 ' HSync padding 5
long 0,0 ' HSync padding 6
long 0,0 ' HSync padding 7
long 0,0 ' HSync padding 8
long X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 56, HDMI_VSYNC ' VSync section 1 (front porch)
long X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 96, HDMI_HVSYNC ' VSync section 2 (sync pulse)
long X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 123 + 800,HDMI_VSYNC ' VSync section 3 (back porch + active)
long 0,0 ' VSync padding 4
long 0,0 ' VSync padding 5
long 0,0 ' VSync padding 6
long 0,0 ' VSync padding 7
long 0,0 ' VSync padding 8
' Color conversion
long %10_0000000 + negx' CMOD mode + flags
long 0 ' CY
long 0 ' CI
long 0 ' CQ
long 0 ' CQ XORlternate value
long 0 ' CFRQ
So that's 640 active pixels, 160 pixels of pillarbox and 275 pixels of blanking: 1075 total. At 10 clocks per pixel and 2 output lines per virtual line, that is 21500cy per virtual line, only 4 short of the ideal hardware-accurate value (384 clocks/line times 4 (pixel clock divider) times 14 (master clock multiplier) = 21504).
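The arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions: the pixel counts are read off the listing above, the 10 clocks per pixel comes from the TMDS encoder running at sysclk/10, the factor of 2 is the line multiplier in the config, and the active width is taken as the 800-pixel line minus two 80-pixel pillars. The function and constant names are made up for illustration.

```python
# Values read from the hdmi_wide_config listing.
BLANK_LINE_PIXELS = 800      # "blank line" streamer length
EXTRA_PILLAR_PIXELS = 80     # one pillar; assumed to appear on both sides
HSYNC_SECTIONS = (56, 96, 123)  # front porch, sync pulse, back porch

CLOCKS_PER_PIXEL = 10        # TMDS encoder runs at sysclk/10
LINE_MULTIPLIER = 2          # two output lines per virtual (emulated) line


def hdmi_line_budget():
    """Return the per-line pixel and clock budget of the 800x480 mode."""
    blanking = sum(HSYNC_SECTIONS)
    pillarbox = 2 * EXTRA_PILLAR_PIXELS
    active = BLANK_LINE_PIXELS - pillarbox
    h_total = BLANK_LINE_PIXELS + blanking
    cycles = h_total * CLOCKS_PER_PIXEL * LINE_MULTIPLIER
    ideal = 384 * 4 * 14     # clocks/line * pixel clock divider * multiplier
    return {
        "active": active,
        "pillarbox": pillarbox,
        "blanking": blanking,
        "h_total": h_total,
        "cycles_per_virtual_line": cycles,
        "ideal_cycles": ideal,
        "shortfall": ideal - cycles,
    }


if __name__ == "__main__":
    for name, value in hdmi_line_budget().items():
        print(f"{name}: {value}")
```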
@evanh said:
Vtotal = 1006 is more than enough for 4*240 visible. Or do you also need the longer Vblanking interval as well?
Yep, needs it. Updating VRAM outside of blanking causes tearing/glitches. Some games exhibited this before I fixed the interrupt trigger line to be the first blank line instead of the first vsync line. So most games don't really care, some do.
MegaYume really needs the proper vertical timing, some games are very particular about what values end up in the scanline counter. (and updating the sprite table outside of VBLANK just doesn't work right). In NeoYume the equivalent feature isn't even implemented lmao.
@Rayman said:
So you actually got this to work over HDMI to TV?
That's neat.
With my cheap small LCD TV (2013 model), yes. The older large plasma TV, no - Limited testing indicates it only accepts strict published timings of a few common resolutions.
Thanks for sharing this project and for the video presentation!
As the project is very complex, it would be interesting to know more about the debugging methods you use?
@evanh said:
... testing indicates it only accepts strict published timings of a few common resolutions.
It's worse than that. All prior success must have been via VGA only. With HDMI, the plasma TV may require CEC negotiation or something. So far, I've not managed to put any picture up using an HDMI link. PS: 5 Volt pull-up is present.
EDIT: LOL, no, I was right first time. Just had to be more exact with the frequencies. It won't accept anything other than a 31.5 kHz line rate for 640x480. And that's the only documented resolution in range of the Prop2.
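For reference, the line rate is just the pixel clock divided by the total pixels per line. The Python sketch below assumes the standard 640x480 totals of 800 pixels per line and 525 lines per frame, plus a hypothetical 252 MHz sysclk so that sysclk/10 gives a 25.2 MHz pixel clock; neither the sysclk value nor the function names come from the thread.

```python
def line_rate_khz(pixel_clock_hz, h_total):
    """Horizontal line rate in kHz for a given pixel clock and line length."""
    return pixel_clock_hz / h_total / 1000.0


def refresh_hz(pixel_clock_hz, h_total, v_total):
    """Vertical refresh rate in Hz."""
    return pixel_clock_hz / (h_total * v_total)


if __name__ == "__main__":
    sysclk_hz = 252_000_000          # assumed value, chosen for illustration
    pixel_clock_hz = sysclk_hz / 10  # TMDS encoder runs at sysclk/10
    print(line_rate_khz(pixel_clock_hz, 800))
    print(refresh_hz(pixel_clock_hz, 800, 525))
```

A display as strict as the plasma TV described here would presumably reject a mode whose computed line rate drifts from the published figure.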
@"Christof Eb." said:
Thanks for sharing this project and for the video presentation!
As the project is very complex, it would be interesting to know more about the debugging methods you use?
Well, my debugging technique isn't terribly great, that's why there's still many bugs that elude me.
The first important thing is to build components, if possible, in a way where they can be tested on their own (see: standalone sound chip objects, Z80 test rig and the VRAM dump rendering tests). That not only makes it easier to debug certain issues, but also proves the viability of the project without too much investment.
The main weapon beyond that is the DEBUG feature / BRK instruction. One trick I used in MegaYume when the 68000 was still too buggy to launch any games was to trace each instruction executed and compare that with a similar trace from a working emulator on PC. That (after filtering out noise (= busy wait loops)) then pinpointed the instruction where the program goes awry.
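The trace comparison can be sketched in a few lines of Python. This is an illustration, not the tooling actually used for MegaYume: it assumes each trace is a list of (program counter, mnemonic) tuples and that busy-wait loops are filtered out by a known set of addresses, and all names here are hypothetical.

```python
def first_divergence(reference, candidate, busy_wait_pcs=frozenset()):
    """Return (index, ref_entry, cand_entry) at the first mismatch, or None.

    Entries whose program counter is in busy_wait_pcs are dropped from both
    traces first, so loops that spin a different number of times on each
    side do not count as differences. The index refers to the filtered traces.
    """
    def filtered(trace):
        return [entry for entry in trace if entry[0] not in busy_wait_pcs]

    ref = filtered(reference)
    cand = filtered(candidate)

    for index, (ref_entry, cand_entry) in enumerate(zip(ref, cand)):
        if ref_entry != cand_entry:
            return index, ref_entry, cand_entry

    # One trace is a strict prefix of the other: report where it ends.
    if len(ref) != len(cand):
        index = min(len(ref), len(cand))
        ref_entry = ref[index] if index < len(ref) else None
        cand_entry = cand[index] if index < len(cand) else None
        return index, ref_entry, cand_entry

    return None


if __name__ == "__main__":
    good = [(0x100, "move"), (0x102, "tst"), (0x104, "beq"),
            (0x102, "tst"), (0x104, "beq"), (0x106, "add"), (0x108, "rts")]
    bad = [(0x100, "move"), (0x102, "tst"), (0x104, "beq"),
           (0x106, "add"), (0x10A, "nop")]
    print(first_divergence(good, bad, busy_wait_pcs={0x102, 0x104}))
```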
Thanks, Ada, for the insights.
"The first important thing is to build components, if possible, in a way where they can be tested on their own." Well, that's some reason, why I like the strange thing called Forth...
Have a good hot Sunday! Christof
Comments
As mentioned by @Wuerfel_21 above, she's got some interesting work to share with us tomorrow! Register here https://www.parallax.com/open-discussion-around-propeller-1-and-2-june-15th-2022/
If you ever want to relive the amazing experience of witnessing my expertly crafted LibreOffice(tm) slides, you may: https://mega.nz/file/ma4WzQgL#IKGVH7VHz9dGhjBLrNAmzfsJtwLSjv-XOxFJsNoEivU
Now that's a real joystick! I had never seen the home console NeoGeo before. In fact never knew there was one until you said so.
I do in fact have the power to retroactively materialize obscure video game hardware into existence. I use my powers with care.
Ada,
I'm just watching your presentation video now. You mention having to perform pixel doubling for HDMI output and that that consumes the cog. This could be avoided but you seemed hesitant to go that path? HDMI displays are pretty good at accepting low hsync frequency, they aren't restricted to >30 kHz the way VGA displays are.
The limit is a minimum of 25 MP/s. So, yeah, large blanking will be required. But again, HDMI displays are capable of wider blanking.
EDIT: Quick check ... yep, working with 320x480@60Hz:
EDIT2: Vtot could be reduced if Htot was boosted. It worked with my TV that way though. That's just the timings my old code auto-generated.
I guess you could do 320x480 with hueg hblank (remember, vertical timing must be SDTV-like), but that'll very likely mess up the aspect ratio on a lot of displays. Also, pre-scaling the pixels stops crappy scaling algorithms from messing up the image too much...
TVs probably won’t accept that signal. Monitors might.
Here's a better mode suited to a VGA cable:
Not sure if DVI output can do such dividers ... I have to re-engineer the calculation ...
TMDS encoder only supports sysclk/10.
The current VGA modes are more eyeballed than anything (read: set correct resolution and tweak divider until monitor says the framerate is right). Should probably use brain to actually calculate the mathematically correct dividers.
Okay, right, at 338 MHz, sysclock/10 creates huge blankings then.
Well, current HDMI config looks like this:
So that's 640 active pixels, 160 pixels of pillarbox and 275 pixels of blanking: 1075 total. At 10 clocks per pixel and 2 output lines per virtual line, that is 21500cy per virtual line, only 4 short of the ideal hardware-accurate value (384 clocks/line times 4 (pixel clock divider) times 14 (master clock multiplier) = 21504).
I got my TV displaying this mode:
And this one is less lopsided:
And this one is pretty close to max hblank:
EDIT: Hmm, the 58 kHz line rate is too fast for the emulation, isn't it? It needs the slower 31 kHz. Sysclock/20 would've done the job.
Here we go (line quad):
And the widescreen version:
For line4x, vtotal should be 1056 (264*4). Do note that it comes out to ~59.6Hz, that is correct.
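As a rough check of those numbers, here is a Python sketch. It assumes 264 native lines per frame, the ideal 21504 sysclk cycles per native line quoted earlier, and the rounded 338 MHz sysclk mentioned in the thread; with that rounded clock the result lands near 59.5 Hz, a little under the ~59.6 Hz stated above. The function names are made up for illustration.

```python
def multiplied_vtotal(native_lines, line_multiplier):
    """Total output lines per frame when each native line is repeated."""
    return native_lines * line_multiplier


def frame_rate_hz(sysclk_hz, cycles_per_native_line, native_lines):
    """Frame rate implied by the sysclk budget of one native line."""
    return sysclk_hz / (cycles_per_native_line * native_lines)


if __name__ == "__main__":
    print(multiplied_vtotal(264, 4))
    # 338 MHz is the rounded figure from the thread, not an exact clock.
    print(frame_rate_hz(338_000_000, 384 * 4 * 14, 264))
```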
So you actually got this to work over HDMI to TV?
That's neat.
I haven't tried it on an actual TV, but apparently the 800x480 mode works for people.
Vtotal = 1006 is more than enough for 4*240 visible. Or do you also need the longer Vblanking interval as well?
Yep, needs it. Updating VRAM outside of blanking causes tearing/glitches. Some games exhibited this before I fixed the interrupt trigger line to be the first blank line instead of the first vsync line. So most games don't really care, some do.
MegaYume really needs the proper vertical timing, some games are very particular about what values end up in the scanline counter. (and updating the sprite table outside of VBLANK just doesn't work right). In NeoYume the equivalent feature isn't even implemented lmao.
Can certainly tweak the blankings to suit. Make Htotal a little shorter maybe.
Looks like I was able to get an eMMC chip reading at 28 MB/s a while ago.
Is that fast enough for console emulation ROM?
https://forums.parallax.com/discussion/171653/fsrw-for-emmc-with-8-bit-bus-now-at-28-mb-s-example-code-posted/p1