Console Emulation

pik33 · 2022-06-14 10:06

@rogloh said:
What's the lowest sysclk you can currently reproduce your problem @pik33?

About 265 MHz (multiplier=11 instead of 14 in NeoVGA code)

And, maybe more important, can it be reproduced with a (lot) less code, to see if other chips/cogs may be affected ?

I have to check this, but I think what is important to reproduce this bug is this:

              drvl #PSRAM_SELECT
.irqshield

              djnz ma_slotleft,#.slotlp
              ' <-- a NOP here partially repairs the problem

              'debug("canary alive. Lorem ipsum dolor sit amet. Take it easy!")
              jmp #ma_lineloop



ma_do_adpcm

              'drvh #38

              shl     ma_mtmp1,#2 ' ADPCM cache lines are 4 longs   <-- here
              setbyte ma_mtmp1,#$EB,#3
              splitb  ma_mtmp1

An instruction that modifies ma_mtmp1 is in the pipeline, and it may be "partially executed", overwriting the rdlong result. That's why this NOP changes the behavior: with it, there is no longer an instruction that changes ma_mtmp1 in the pipeline.

Then there is this:

              testb ma_curline,#0 wc
        if_c  add ptrb,##96*4*4
              testb ma_curline,#1 wc
        if_c  add ptrb,##96*4*4*2

before the loop. If there is no NOP between this and the rdlong, the "2 lines good, 2 lines empty" effect appears.

This effect disappears if I change the algorithm to this:

    mov qqq, ma_curline
    and qqq, #3
    mul qqq,##$600 '(=96*16)
    add ptrb,qqq

Although the add of qqq to ptrb still comes right before the rdlong, the effect disappears, as if its cause were somehow connected with the conditional instruction

if_c add ptrb,##96*4*4*2

It is the same problem: the instruction is in the pipeline and, if it is not "fully cancelled", it may alter several bits in PTRB even though the condition is not met. However,

    add ptrb,#8
    rdlong ma_mtmp1,ptrb wc
    sub ptrb,#8

repairs this, so the problem only affects ptrx[index] instructions. I can use ptra with no change in the "special effects".
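The two ways of computing the PTRB offset (the TESTB / if_c ADD pair and the MOV / AND / MUL version) should be arithmetically identical, which is what points the behavior difference at the pipeline rather than at the math. A quick Python check of that equivalence; it models only the arithmetic, not P2 timing or the pipeline, and the function names are hypothetical:

```python
def offset_conditional(curline: int) -> int:
    """Mirror of the TESTB/if_c ADD pair: bit 0 adds 96*4*4, bit 1 adds 96*4*4*2."""
    offset = 0
    if curline & 1:
        offset += 96 * 4 * 4
    if curline & 2:
        offset += 96 * 4 * 4 * 2
    return offset


def offset_multiply(curline: int) -> int:
    """Mirror of MOV / AND #3 / MUL ##$600 (0x600 = 96*16 = 1536)."""
    return (curline & 3) * 0x600


# Both forms give the same byte offset for every line number tried
for line in range(1024):
    assert offset_conditional(line) == offset_multiply(line)
print(offset_multiply(3))  # 4608
```

Since the results always match, whatever differs between the two versions has to come from how the instructions behave in flight, not from the value added to PTRB.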

The hypothesis now:

To trigger the bug, you have to:

- use the ptrx[index] addressing mode to load a register
- no more than 2 instructions earlier, use something that modifies either a ptrx or the target register
- make that instruction get cancelled in the pipeline, by a jump or an if_something condition
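The three conditions can be turned into a small test matrix: distance (0 to 2 instructions), what the cancelled instruction would modify (the PTRx or the load target), and how it gets cancelled (false condition or taken jump). Below is a hedged Python sketch that emits one PASM2 snippet per combination. The register name `target`, the `case_N` labels, the `#16` increment and the `[2]` index are hypothetical placeholders, and C is forced clear with `modc _clr wc` so the `if_c` instruction never executes. It only generates source text; checking the loaded value on real hardware is left out.

```python
from itertools import product

# Hypothetical test-matrix generator for the hypothesis above.
DISTANCES = (0, 1, 2)           # NOPs between the cancelled instruction and the RDLONG
MODIFIED = ("ptrb", "target")   # register the cancelled instruction would write
CANCELLED_BY = ("if_c", "jmp")  # how the modifying instruction gets cancelled


def make_case(n: int, distance: int, modified: str, cancelled_by: str) -> str:
    """Emit one PASM2 snippet in which the ADD must never take effect."""
    lines = [f"case_{n}"]
    if cancelled_by == "if_c":
        lines.append("        modc    _clr wc              ' force C = 0")
        lines.append(f"if_c    add     {modified}, #16       ' condition false: cancelled")
    else:
        lines.append("        jmp     #.skip               ' taken jump cancels the next fetch")
        lines.append(f"        add     {modified}, #16       ' fetched but never executed")
        lines.append(".skip")
    lines.extend(["        nop"] * distance)
    lines.append("        rdlong  target, ptrb[2]      ' PTRx[index] load under test")
    return "\n".join(lines)


cases = [make_case(n, *combo)
         for n, combo in enumerate(product(DISTANCES, MODIFIED, CANCELLED_BY))]
print(len(cases))  # 12
print(cases[0])
```

If the hypothesis holds, only the cases where the cancelled ADD sits close to the RDLONG should misbehave on the affected chip, and the NOP-padded ones should not.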

With this hypothesis, I can start writing something that should confirm it (or not, if it is false).

macca · 2022-06-14 11:00

@evanh said:
Isn't it exclusive to Pik's revB EC32MB? No one else has reproduced it, right?

Maybe because there isn't a simpler test than running the full MegaYume ?
I can't run it because I don't have the RAM module. However, I occasionally ran into that "magic NOP" that fixed things. It's been a while since I last hit this issue, the affected code has gone through several revisions, and I don't remember whether or how ptr..index was used (I always thought it was something stupid I did in the code, and that isn't excluded at all...).

evanh · 2022-06-14 12:11

There are a hundred reasons for a NOP to make a difference. I've had a few myself, either a slip-up or a lack of knowledge when writing some PASM.

I've just come across something similar that I don't understand. I can change the order of the instructions and avoid it. But when I tried to simplify the environment, it went away completely, reordered or not. - https://forums.parallax.com/discussion/comment/1539753/#Comment_1539753

Wuerfel_21 · 2022-06-14 13:52

@pik33 said:
The nop can be even more magic. Add it after (under) the djnz, so it is never executed in the loop, and it still does its job (partially: the error after the if_c before the loop remains).

uiuiui, pipelining funny.

@evanh said:
Isn't it exclusive to Pik's revB EC32MB? No one else has reproduced it, right?

Yea, only that one P2.

@macca said:
Maybe because there isn't a simpler test than running the full MegaYume?
I can't run it because I don't have the RAM module. However, I occasionally ran into that "magic NOP" that fixed things. It's been a while since I last hit this issue, the affected code has gone through several revisions, and I don't remember whether or how ptr..index was used (I always thought it was something stupid I did in the code, and that isn't excluded at all...).

I've never had a "magic NOP" issue that didn't eventually turn out to be a programming mistake somewhere else and thus theoretically consistent across chips, but usually changing with shifting addresses. pik's funny rdlong does not react to shifting the surrounding code and only happens on one chip.

Getting a simpler test going is certainly important.
...
Maybe the interrupts have something to do with it? I don't think we really considered them yet.


evanh · 2022-06-14 14:22

Magically got some files ... works flawlessly for me, no glitches while watching the Metal Slug demo cycle. Pulling about 1.8 Watts from my benchtop supply, so well within what a USB supply can provide as well.

Attached a photo of my wiring to use add-on accessories because I didn't buy a carrier board.

Ken Gracey · 2022-06-14 19:05

As mentioned by @Wuerfel_21 above, she's got some interesting work to share with us tomorrow! Register here https://www.parallax.com/open-discussion-around-propeller-1-and-2-june-15th-2022/

Wuerfel_21 · 2022-06-16 00:52

If you ever want to relive the amazing experience of witnessing my expertly crafted LibreOffice(tm) slides, you may: https://mega.nz/file/ma4WzQgL#IKGVH7VHz9dGhjBLrNAmzfsJtwLSjv-XOxFJsNoEivU

evanh · 2022-06-16 12:23

Now that's a real joystick! I had never seen the home console NeoGeo before. In fact never knew there was one until you said so.

Wuerfel_21 · 2022-06-16 18:34

I do in fact have the power to retroactively materialize obscure video game hardware into existence. I use my powers with care.

evanh · 2022-06-17 10:28

Ada,
I'm just watching your presentation video now. You mention having to perform pixel doubling for HDMI output and that this consumes the cog. That could be avoided, but you seemed hesitant to go that path? HDMI displays are pretty good at accepting low hsync frequencies; they aren't restricted to >30 kHz the way VGA displays are.

The limit is a minimum pixel rate of 25 MP/s. So, yeah, large blanking will be required. But again, HDMI displays can handle wider blanking.

EDIT: Quick check ... yep, working with 320x480@60Hz:

 timings[]: 00000000 0ee6b280 08400828 7fa989e0 00000a00 00000000 00000000
 Sysclock freq=250 MHz   Divider=2560
 Hres=320  Htot=400   Hfreq=62500 Hz
 Vres=480  Vtot=1042   Vfreq=60.0 Hz

EDIT2: Vtot could be reduced if Htot was boosted. It worked with my TV that way though. That's just the timings my old code auto-generated.
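The figures in these printouts follow directly from sysclock, divider and totals: pixel rate = sysclock / divider, line rate = pixel rate / Htot, frame rate = line rate / Vtot. A minimal Python sketch of that arithmetic, assuming an effective divide-by-10 for the 250 MHz mode (which is what its 62500 Hz line rate implies) and divide-by-27 for the 338 MHz VGA mode posted later in the thread. The function names are hypothetical, not evanh's code:

```python
MIN_HDMI_PIXEL_RATE = 25e6  # the 25 MP/s floor mentioned above


def pixel_rate_hz(sysclk_hz: float, divider: float) -> float:
    """Pixels per second emitted at sysclk / divider."""
    return sysclk_hz / divider


def line_rate_hz(sysclk_hz: float, divider: float, htot: int) -> float:
    """Horizontal line rate: pixel rate divided by total pixels per line."""
    return pixel_rate_hz(sysclk_hz, divider) / htot


def frame_rate_hz(sysclk_hz: float, divider: float, htot: int, vtot: int) -> float:
    """Vertical refresh: line rate divided by total lines per frame."""
    return line_rate_hz(sysclk_hz, divider, htot) / vtot


# 320x480 mode from the printout: 250 MHz, /10, Htot=400, Vtot=1042
print(pixel_rate_hz(250e6, 10) >= MIN_HDMI_PIXEL_RATE)  # True (exactly 25 MP/s)
print(line_rate_hz(250e6, 10, 400))                      # 62500.0
print(round(frame_rate_hz(250e6, 10, 400, 1042), 1))     # 60.0

# 338 MHz VGA mode: /27, Htot=400
print(round(line_rate_hz(338e6, 27, 400)))               # 31296
```

The 250 MHz mode sits exactly on the 25 MP/s floor, which is why the line rate ends up at 62.5 kHz and a tall Vtot is needed to bring the refresh down to 60 Hz.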

Wuerfel_21 · 2022-06-17 10:38

I guess you could do 320x480 with hueg hblank (remember, vertical timing must be SDTV-like), but that'll very likely mess up the aspect ratio on a lot of displays. Also, pre-scaling the pixels stops crappy scaling algorithms from messing up the image too much...

Rayman · 2022-06-17 10:54

TVs probably won’t accept that signal. Monitors might.

evanh · 2022-06-17 11:49

Here's a better mode, suited to a VGA cable:

 timings[]: 00000000 14257880 08400828 0a20a1e0 00001b00 00000000 00000000
 Sysclock freq=338 MHz   Divider=27
 Hres=320  Htot=400   Hfreq=31296 Hz
 Vres=480  Vtot=522   Vfreq=60.0 Hz

Not sure if DVI output can do such dividers ... I have to re-engineer the calculation ...

Wuerfel_21 · 2022-06-17 12:00

TMDS encoder only supports sysclk/10.

The current VGA modes are more eyeballed than anything (read: set correct resolution and tweak divider until monitor says the framerate is right). Should probably use brain to actually calculate the mathematically correct dividers.
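Calculating the divider instead of eyeballing it is a single division: divider = sysclock / (Htot × target line rate). A hedged Python sketch: it assumes the P2 streamer NCO convention where $8000_0000 means one pixel per sysclock, so a divider d maps to roughly 2^31 / d (this reproduces the $0CCCCCCD value used for sysclk/10 TMDS output in this thread). The 31.469 kHz target (standard 640x480 VGA line rate) and Htot=400 (from evanh's 338 MHz mode) are example inputs, not MegaYume's actual settings:

```python
def divider_for_line_rate(sysclk_hz: float, htot: int, hfreq_hz: float) -> float:
    """Sysclocks per pixel needed so that htot pixels take exactly 1/hfreq seconds."""
    return sysclk_hz / (htot * hfreq_hz)


def streamer_nco(divider: float) -> int:
    """NCO increment for a (possibly fractional) divider, assuming $8000_0000 = sysclk/1."""
    return round(2**31 / divider)


d = divider_for_line_rate(338e6, 400, 31_469)
print(f"{d:.3f}")                  # 26.852
print(f"${streamer_nco(d):08X}")   # NCO value for that fractional divider
print(f"${streamer_nco(10):08X}")  # $0CCCCCCD, the TMDS sysclk/10 value
```

Because the TMDS encoder is fixed at sysclk/10, a fractional divider like this only helps the analog VGA output.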

evanh · 2022-06-17 12:02

Okay, right, at 338 MHz, sysclock/10 creates huge blankings then.

Wuerfel_21 · 2022-06-17 12:54

Well, current HDMI config looks like this:

'' HDMI 800x480 mode
hdmi_wide_config
long VIDEO_CLKFREQ
' Timing
long    2 - 1 ' line multiplier minus one
long    12 ' native front porch lines
long    2 ' native sync lines
long    34 ' native back porch lines
long    $0CCCCCCD ' Sync NCO value
long    0 ' H40 NCO value
long    0 ' H32 NCO value
long    HDMI_BLANK ' blanking color

long    X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 800 ' blank line
long    X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 80 ' extra pillar

long    X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 56, HDMI_BLANK ' HSync section 1 (front porch)
long    X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 96, HDMI_HSYNC ' HSync section 2 (sync pulse)
long    X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 123, HDMI_BLANK ' HSync section 3 (back porch)
long                         0,0 ' HSync padding 4
long                         0,0 ' HSync padding 5
long                         0,0 ' HSync padding 6
long                         0,0 ' HSync padding 7
long                         0,0 ' HSync padding 8

long   X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 56, HDMI_VSYNC ' VSync section 1 (front porch)
long   X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 96, HDMI_HVSYNC ' VSync section 2 (sync pulse)
long   X_DACS_3_2_1_0|X_IMM_1X32_4DAC8 + 123 + 800,HDMI_VSYNC ' VSync section 3 (back porch + active)
long                         0,0 ' VSync padding 4
long                         0,0 ' VSync padding 5
long                         0,0 ' VSync padding 6
long                         0,0 ' VSync padding 7
long                         0,0 ' VSync padding 8

' Color conversion
long    %10_0000000 + negx' CMOD mode + flags
long    0 ' CY
long    0 ' CI
long    0 ' CQ
long    0 ' CQ XOR alternate value
long    0 ' CFRQ

So that's 640 active pixels, 160 pixels of pillarbox and 275 pixels of blanking: 1075 total, and 1075 pixels × 10 sysclocks per TMDS pixel × 2 output lines = 21500 cycles per virtual line. That's only 4 short of the ideal hardware-accurate value (384 clocks/line times 4 (pixel clock divider) times 14 (master clock multiplier) = 21504).
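The same line budget, written out as Python arithmetic. This is a sketch that assumes the 800-pixel "blank line" span consists of a 320-pixel source doubled to 640 plus the two 80-pixel pillars, and it takes the porch and sync widths, the sysclk/10 TMDS rate and the line multiplier of 2 straight from the config listing above:

```python
ACTIVE = 320 * 2        # assumption: 320-pixel source, pixel-doubled
PILLARS = 2 * 80        # the two "extra pillar" spans
FRONT_PORCH, SYNC, BACK_PORCH = 56, 96, 123
TMDS_DIVIDER = 10       # TMDS encoder only supports sysclk/10
LINE_MULTIPLIER = 2     # "2 - 1 ' line multiplier minus one"

blanking = FRONT_PORCH + SYNC + BACK_PORCH                        # 275
total = ACTIVE + PILLARS + blanking                               # 1075
cycles_per_virtual_line = total * TMDS_DIVIDER * LINE_MULTIPLIER  # 21500
ideal = 384 * 4 * 14                                              # 21504

print(blanking, total, cycles_per_virtual_line, ideal - cycles_per_virtual_line)
# 275 1075 21500 4
```

Since every HDMI pixel costs exactly 10 sysclocks, the line length can only be adjusted in steps of 20 cycles per virtual line here, so 21500 is as close to 21504 as this mode can get.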

evanh · 2022-06-17 14:37

I got my TV displaying this mode:

 timings[]: 00000000 14257880 30403028 7fada9e0 0ccccccd 00000000 00000000
 Sysclock freq=338 MHz   Divisor=10
 Hres=320  hfp=48 hsync=64 hbp=48  Htot=480   Hfreq=70417 Hz
 Vres=480  vfp=255 vsync=2 vbp=437  Vtot=1174   Vfreq=60.0 Hz

And this one is less lopsided:

 timings[]: 00000000 14257880 40404028 7fab59e0 0ccccccd 00000000 00000000
 Sysclock freq=338 MHz   Divisor=10
 Hres=320  hfp=64 hsync=64 hbp=64  Htot=512   Hfreq=66016 Hz
 Vres=480  vfp=255 vsync=2 vbp=363  Vtot=1100   Vfreq=60.0 Hz

And this one is pretty close to max hblank:

 timings[]: 00000000 14257880 60406028 7c27c1e0 0ccccccd 00000000 00000000
 Sysclock freq=338 MHz   Divisor=10
 Hres=320  hfp=96 hsync=64 hbp=96  Htot=576   Hfreq=58681 Hz
 Vres=480  vfp=248 vsync=2 vbp=248  Vtot=978   Vfreq=60.0 Hz

EDIT: Hmm, the 58 kHz line rate is too fast for the emulation, isn't it? It needs the slower 31 kHz. Sysclock/20 would've done the job.

evanh · 2022-06-17 16:29

Here we go (line quad):

 timings[]: 00000000 14257880 50405028 12a12bc0 0ccccccd 00000000 00000000
 Sysclock freq=338 MHz   Divisor=10
 Hres=320  hfp=80 hsync=64 hbp=80  Htot=544   Hfreq=62132 Hz
 Vres=960  vfp=37 vsync=2 vbp=37  Vtot=1036   Vfreq=60.0 Hz

And the widescreen version:

 timings[]: 00000000 14257880 30403032 0b20b3c0 0ccccccd 00000000 00000000
 Sysclock freq=338 MHz   Divisor=10
 Hres=400  hfp=48 hsync=64 hbp=48  Htot=560   Hfreq=60357 Hz
 Vres=960  vfp=22 vsync=2 vbp=22  Vtot=1006   Vfreq=60.0 Hz

Wuerfel_21 · 2022-06-17 17:20

For line4x, vtotal should be 1056 (264*4). Do note that it comes out to ~59.6Hz, that is correct.
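Here is where 1056 and the ~59.6 Hz come from, reusing the 21504-sysclock emulated line (384 × 4 × 14) quoted earlier. This Python sketch assumes a 264-line emulated frame (from 1056 = 264 × 4) and the exact 338 MHz sysclock shown in the printouts, which puts the refresh just over 59.5 Hz. It also shows that, at sysclk/10, four output lines only fit one emulated line if Htot is 537.6 pixel clocks:

```python
SYSCLK_HZ = 338e6                    # sysclock used in the printouts above
VIRTUAL_LINE_CYCLES = 384 * 4 * 14   # 21504 sysclocks per emulated scanline
EMULATED_LINES = 264                 # 1056 / 4
LINES_PER_EMULATED_LINE = 4          # "line quad" output
TMDS_DIVIDER = 10                    # HDMI pixels are always sysclk/10

vtotal = EMULATED_LINES * LINES_PER_EMULATED_LINE                           # 1056
refresh_hz = SYSCLK_HZ / (VIRTUAL_LINE_CYCLES * EMULATED_LINES)             # ~59.54
htot_exact = VIRTUAL_LINE_CYCLES / LINES_PER_EMULATED_LINE / TMDS_DIVIDER  # 537.6

print(vtotal, round(refresh_hz, 2), htot_exact)  # 1056 59.54 537.6
```

The 544-pixel Htot from the line-quad printout is a bit longer than 537.6, so combined with Vtot=1056 it would land near 58.8 Hz instead.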

Rayman · 2022-06-17 18:32

So you actually got this to work over HDMI to TV?
That's neat.

Wuerfel_21 · 2022-06-17 18:33

I haven't tried it on an actual TV, but apparently the 800x480 mode works for people.

evanh · 2022-06-17 23:51

Vtotal = 1006 is more than enough for 4*240 visible. Or do you also need the longer Vblanking interval as well?

Wuerfel_21 · 2022-06-17 23:57

@evanh said:
Vtotal = 1006 is more than enough for 4*240 visible. Or do you also need the longer Vblanking interval as well?

Yep, it needs it. Updating VRAM outside of blanking causes tearing/glitches. Some games exhibited this before I fixed the interrupt trigger line to be the first blank line instead of the first vsync line. So most games don't really care, but some do.

MegaYume really needs the proper vertical timing, some games are very particular about what values end up in the scanline counter. (and updating the sprite table outside of VBLANK just doesn't work right). In NeoYume the equivalent feature isn't even implemented lmao.

evanh · 2022-06-18 00:23

Can certainly tweak the blankings to suit. Make Htotal a little shorter maybe.

evanh · 2022-06-18 00:59

@Rayman said:
So you actually got this to work over HDMI to TV?
That's neat.

With my cheap small LCD TV (2013 model), yes. The older large plasma TV, no - Limited testing indicates it only accepts strict published timings of a few common resolutions.

Christof Eb. · 2022-06-18 05:47

Thanks for sharing this project and for the video presentation!
As the project is very complex, it would be interesting to know more about the debugging methods you use?

evanh · 2022-06-18 09:24

@evanh said:
... testing indicates it only accepts strict published timings of a few common resolutions.

It's worse than that. All prior success must have been via VGA only. With HDMI, the plasma TV may require CEC negotiation or something. So far, I've not managed to put any picture up using a HDMI link. PS: 5 Volt pull-up is present.

EDIT: LOL, no, I was right the first time. I just had to be more exact with the frequencies. It won't accept anything other than a 31.5 kHz line rate for 640x480, and that's the only documented resolution within range of the Prop2.

Wuerfel_21 · 2022-06-18 12:53

@"Christof Eb." said:
Thanks for sharing this project and for the video presentation!
As the project is very complex, it would be interesting to know more about the debugging methods you use?

Well, my debugging technique isn't terribly great; that's why there are still many bugs that elude me.

The first important thing is to build components, if possible, in a way where they can be tested on their own (see: standalone sound chip objects, Z80 test rig and the VRAM dump rendering tests). That not only makes it easier to debug certain issues, but also proves the viability of the project without too much investment.

The main weapon beyond that is the DEBUG feature / BRK instruction. One trick I used in MegaYume, when the 68000 was still too buggy to launch any games, was to trace every executed instruction and compare that with a similar trace from a working emulator on PC. After filtering out the noise (busy-wait loops), that pinpointed the instruction where the program goes awry.
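The trace comparison can be automated with a small script. A minimal Python sketch, assuming both traces have already been parsed into (PC, register-snapshot) pairs and that the busy-wait loop addresses are known; the trace format, the addresses and the function names are hypothetical, not MegaYume's actual DEBUG output:

```python
from typing import Hashable, List, Optional, Sequence, Set, Tuple

# Hypothetical trace entry: (program counter, snapshot of the logged registers)
Entry = Tuple[int, Hashable]


def drop_busy_waits(trace: Sequence[Entry], busy_wait_pcs: Set[int]) -> List[Entry]:
    """Remove entries whose PC lies inside a known busy-wait loop (the 'noise')."""
    return [entry for entry in trace if entry[0] not in busy_wait_pcs]


def first_divergence(p2_trace: Sequence[Entry], ref_trace: Sequence[Entry]) -> Optional[int]:
    """Index of the first entry where the traces disagree, or None if they match."""
    for i, (mine, ref) in enumerate(zip(p2_trace, ref_trace)):
        if mine != ref:
            return i
    if len(p2_trace) != len(ref_trace):
        return min(len(p2_trace), len(ref_trace))  # one trace ended early
    return None


busy = {0x200, 0x204}  # example busy-wait loop addresses
p2 = [(0x100, "d0=0"), (0x200, "poll"), (0x204, "poll"), (0x104, "d0=1"), (0x108, "d0=3")]
ref = [(0x100, "d0=0"), (0x104, "d0=1"), (0x108, "d0=2")]
print(first_divergence(drop_busy_waits(p2, busy), drop_busy_waits(ref, busy)))  # 2
```

The filtering step matters because a busy-wait loop rarely spins the same number of times on the P2 as on the PC emulator, so an unfiltered diff would stop at the first poll instead of at the real bug.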

Christof Eb. · 2022-06-19 09:43

Thanks, Ada, for the insights.
"The first important thing is to build components, if possible, in a way where they can be tested on their own." Well, that's some reason, why I like the strange thing called Forth...
Have a good hot Sunday! Christof

Rayman · 2022-06-19 11:16

Looks like I got an eMMC chip reading at 28 MB/s a while ago.

Is that fast enough for console emulation ROM?

https://forums.parallax.com/discussion/171653/fsrw-for-emmc-with-8-bit-bus-now-at-28-mb-s-example-code-posted/p1
