Console Emulation

rogloh · 2024-09-09 10:02

Ok, so I think I am getting good data now for the capture. Was a stupid bug in the write to PSRAM where I was taking the address of a buffer pointer, instead of passing the buffer pointer value. Doh one of those again! I'm now seeing a clock transitioning every 5 P2 clocks in the nibble bit which is what I want to see. I'm also messing with the IO pin output voltages in case the loading has any effect on the read back state of the pin.

Fixed code attached. Now I need to capture it to my Mac and decode it all to see if it is valid.

EDIT: Added ability to write LOGFILE to SD card if fitted. Press "w" after a capture is done to write 2MB of nibbles expanded into 4MB of bytes. This will help analyze the data offline.
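
For anyone processing such a log offline, the nibble-to-byte expansion mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `expand_nibbles` and the `low_first` ordering are assumptions, since the post does not say which nibble of each byte holds the earlier sample.

```python
def expand_nibbles(packed: bytes, low_first: bool = True) -> bytes:
    """Expand each packed byte (two 4-bit samples) into two bytes.

    low_first is an assumption: it selects whether the low nibble
    is treated as the earlier sample.
    """
    out = bytearray()
    for b in packed:
        lo, hi = b & 0x0F, b >> 4
        out += bytes((lo, hi)) if low_first else bytes((hi, lo))
    return bytes(out)


if __name__ == "__main__":
    sample = bytes([0x21, 0x43])
    print(expand_nibbles(sample).hex())  # 01020304
```

With this layout, 2MB of packed nibbles becomes 4MB of one-sample-per-byte data, matching the sizes quoted in the post.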

Wuerfel_21 · 2024-09-09 14:55

Going to look at it later, will have to set up the edge (usually I develop on EVAL 96MB, which doesn't have the bandwidth for this)

rogloh · 2024-09-09 23:55

@Wuerfel_21 said:

Going to look at it later, will have to set up the edge (usually I develop on EVAL 96MB, which doesn't have the bandwidth for this)

Cool, give it a try when you can. Unless my code is bad and somehow corrupting incoming data (maybe my PSRAM driver's default timing doesn't like your 337MHz operation, still have to check), I have yet to achieve a stable sampled clock with various IO settings. It bounces around a bit and does not always remain at 5 P2 clocks high and 5 P2 clocks low, though each period still adds up to 10 clocks. Something weird at the IO pin, with reflections perhaps. Leaving it unterminated gives other results as well, even just zeroes depending on the IO strengths. I do imagine that P_LOW_FAST and P_HIGH_1K5 would have the IO pin remain low for a long time. I wonder what the best setting would be, P_HIGH_FAST and P_LOW_FAST perhaps?
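
One way to quantify the "5 high, 5 low" clock stability described above is to measure run lengths of the clock bit in the expanded capture. The sketch below is a hypothetical offline check, not part of the attached code; the function names and the choice of which bit carries the clock are assumptions.

```python
from itertools import groupby


def run_lengths(samples: bytes, bit: int) -> list[tuple[int, int]]:
    """Return (level, length) for each run of the selected bit."""
    levels = ((s >> bit) & 1 for s in samples)
    return [(lvl, sum(1 for _ in grp)) for lvl, grp in groupby(levels)]


def unstable_runs(samples: bytes, bit: int, expected: int = 5) -> list[int]:
    """Indices of interior runs whose length differs from `expected`.

    The first and last runs are skipped because the capture may start
    or stop in the middle of a run.
    """
    runs = run_lengths(samples, bit)
    return [i for i, (_, n) in enumerate(runs[1:-1], start=1) if n != expected]
```

A capture where the clock alternates 4 and 6 samples would still sum to 10 per period but would be flagged here, which is the symptom being described.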

I'm also expecting that Mouser order of the Adafruit HDMI to parallel LCD/RGB decoder board to arrive today so I might be able to try that out too. I'd expect it should give stable data to the P2 pins of a second system, probably a P2EVAL at this stage.

Wuerfel_21 · 2024-09-10 00:05

As mentioned, for logic capture I'm pretty sure you want the fast drivers and P_SYNC_IO.

Later is going to be tomorrow, spent the evening cooking potato soup and drawing some art for cirno day.

rogloh · 2024-09-10 00:11

@Wuerfel_21 said:
As mentioned, for logic capture I'm pretty sure you want the fast drivers and P_SYNC_IO.

Yeah that's what I was using IIRC.

Wuerfel_21 · 2024-09-10 00:49

I guess you could try having it run at lower sysclk for the capture (take care that NeoVGA init sets and then reads sysclk)

rogloh · 2024-09-10 01:08

Yeah I might have to do that. I did see some PLL stuff in NeoVGA and reading in that sysclk long, so gotta be careful to do it right. Hopefully I can just use that CLK_MULTIPLIER setting in one change..? Well two changes perhaps.

Wuerfel_21 · 2024-09-10 01:19

No, that one does a bunch of things. You want to just launch NeoVGA and then switch the clock down afterwards, that's safest.

rogloh · 2024-09-10 01:32

@Wuerfel_21 said:
No, that one does a bunch of things. You want to just launch NeoVGA and then switch the clock down afterwards, that's safest.

Okay will have to do that. I am seeing PSRAM readback errors so that might be the problem (or there's yet another bug in my testing).

This is failing at random offsets.

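PUB testMem(buf1,buf2) | addr, i, j
    ' Write a pseudo-random 1 KB pattern to PSRAM, read it back and compare
    repeat j from 1 to 100
        i:=j                                ' different seed for each pass
        repeat addr from buf2 to buf2+1023
            byte[addr]:=i
            ??i                             ' step the pseudo-random sequence
        psram.write(buf2, 0,1024)
        psram.read(buf1, 0, 1024)
        repeat addr from 0 to 1023
            if byte[buf1][addr]<>byte[buf2][addr]
                send("Mismatch! offset=", f.hexn(addr,8), " iteration ", f.dec(j),13,10)
                quit                        ' report only the first mismatch per pass

    send("Memory tested",13,10)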

l -t myvgatest.binary
( Entering terminal mode. Press Ctrl-] or Ctrl-Z to exit. )
Mismatch! offset=0000001B iteration 1
Mismatch! offset=0000000B iteration 2
Mismatch! offset=0000000A iteration 3
Mismatch! offset=00000003 iteration 4
Mismatch! offset=0000000E iteration 5
Mismatch! offset=00000005 iteration 6
Mismatch! offset=00000003 iteration 7
Mismatch! offset=00000010 iteration 8
Mismatch! offset=00000003 iteration 9
Mismatch! offset=00000002 iteration 10
Mismatch! offset=00000001 iteration 11
Mismatch! offset=00000000 iteration 12

UPDATE: setting PSRAM delay to 12 (from 13) fixes the memory errors. Now checking if captured HDMI data is fixed.

rogloh · 2024-09-10 10:06

Finally getting some TMDS capture results that are starting to make a bit of sense but there is a problem I think with skew between bit lanes in the same 8 bit port. This is causing what looks like pixel differences - see highlighted line for RGB differences in the 10 bit codes received on each lane - they should all be the same. I do at least see the colour bar pixel data from the test pattern now (pixel doubled per scanline). The TERC4 decoding is still messed up though (probably by same issue) and isn't able to lock to anything or these packets are not being sent (which I doubt).

I found I needed to use P_SCHMITT_A to help stabilize the data and have a monitor plugged into the HDMI port (switched off) while driving IO at FAST high/low 3.3V strengths. Hopefully this isn't hurting my monitor in any way vs typical CML levels.

Updated capture source code attached. Analyzed by running this command on the LOGFILE written to SD.
./tm 3 2 1 0 < LOGFILE

Update: I did try to capture with 10b symbols being output slower than the regular P2 sysclk rate but I need to be sure I am reliably sampling in the middle of the bit. I was still getting strange inconsistent results, perhaps due to phase differences between capture COG streamer starting up and the HDMI cog output timing, though this was before I stumbled onto the Schmitt trigger setting so I should have another attempt at this. Probably need to lock it to a reference edge and then fine adjust with waitx to nicely phase align with the now underclocked HDMI signal spanning 2 P2 clocks.

Ltech · 2024-09-10 19:52

@Wuerfel_21 said:

Which one are you trying to build?
.....

OK, I built flexprop on my old MacBook Air. On an M3 chip, make fails with a missing ARM symbol error:
"ld: symbol(s) not found for architecture arm64"

I compiled for the P2 Edge 32MB RAM on an Edge module breadboard.
I put in the SD card with Metal Slug.

Audio runs on my LG HDMI monitor.

Video is strange: text is fine, but the game is just big blocks, moving and building up. You can perceive the game through it,
I guess from the moving parts.

Wuerfel_21 · 2024-09-10 20:34

@Ltech
Can you show a picture or video of the problem?
That sounds a bit like the same old @pik33 chip glitch.

Ltech · 2024-09-11 06:49

Wuerfel_21 · 2024-09-11 07:23

Ok, that's a new one, good graphics but the wrong ones. Can you double-check that the files that end in c1,c2,c3,c4 are the right ones in the right place?

Wuerfel_21 · 2024-09-11 08:02

From the screenshots I figured out what appears to be going on, but I have zero idea how that would happen on accident.
It looks exactly like what I get when the first half of each C-rom file is swapped with its second half. (i.e. 201-C1 is a 4MB file and the first 2MB of data are swapped with the second 2MB - the same for all C files)

Rayman · 2024-09-11 10:31

Kinda similar to what I got when first fired up crossed swords…. Went away after hit start. Never came back, so no way to troubleshoot…

Ltech · 2024-09-11 19:17

Bad ROM files...
I put overtop on the micro SD and it is running fine,
with sound over HDMI.

Impressive!

Wuerfel_21 · 2024-09-11 19:30

@Ltech said:

Bad ROM files...
I put overtop on the micro SD and it is running fine,
with sound over HDMI.

Are you using the video-nextgen version or the hdmi-clock-regen-test version? (If unsure which one you built, search NeoVGA.spin2 for packet_aclk_v. Only the latter will have that symbol.)

Ltech · 2024-09-11 20:26

Sorry, I think I used the old master.
I will try the nextgen tomorrow, but I can't find "packet_aclk_v".

Wuerfel_21 · 2024-09-11 21:03

If you have audio over HDMI (without a separate aux cable), it must be one of the new branches.

Ltech · 2024-09-12 06:52

Yes, I have sound on HDMI, no audio board.

And sorry, yesterday I was looking on my new M3 Mac, but I compile on my old Intel Mac.

I tried so hard to get flexspin working that I forgot which branch I used.
I use the new nextgen video of 8 September.

Wuerfel_21, I do not really understand your code; it is way too far beyond my skills.
I just pick out some parts.

If there were a hall of fame at Parallax, you would surely be near the HIGHEST score!

Respect!

rogloh · 2024-09-12 14:33

Hi @Wuerfel_21 , I finally managed to capture some decent(ish) HDMI samples from your driver. Had to run at full speed in the end and readback HDMI bytes using the P_OUTBIT_A setting for input pin control on HDMI port pins. Reading real pin signals wasn't accurate enough and was plagued by lane skew/reflections etc. Dropping to half NCO pixel frequencies also didn't help - for some reason the streamer doesn't seem to like to keep up and push data bytes back through the fifo into HUB at sysclk/2 rates, and apparently corrupts these samples. It does seem to work ok at sysclk/1 rates however. The only problem with this rate is that the PSRAM bandwidth does incur about a 15% overhead so eventually the ping pong buffer head chasing its tail catches up. With the two 128kB buffers I use this happens about 768kB (6*1.15 ~ 7 buffers worth) in so you do still get a reasonable amount of captured data to examine before it wraps/corrupts itself, around 78643 pixels worth or ~73 lines worth with your 1075 pixels/line timing.
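
The capture-length figures above can be reproduced with a little arithmetic. This sketch assumes one captured byte per sysclk and 10 sysclks (one 10-bit TMDS symbol) per pixel, which is what the sysclk/1 capture rate and the 10-clock pixel period imply; the function name is made up for illustration.

```python
def capture_span(captured_bytes: int,
                 bytes_per_pixel: int = 10,
                 pixels_per_line: int = 1075) -> tuple[int, float]:
    """Return (whole pixels, scan lines) covered by a capture.

    bytes_per_pixel=10 assumes one byte sampled per sysclk and
    10 sysclks per TMDS symbol.
    """
    pixels = captured_bytes // bytes_per_pixel
    return pixels, pixels / pixels_per_line


if __name__ == "__main__":
    pixels, lines = capture_span(768 * 1024)
    print(pixels, round(lines, 1))  # 78643 73.2
```

The 768kB figure itself comes from the buffer arithmetic in the post: 6 swaps of a 128kB buffer, with 6 * 1.15 = 6.9, or roughly 7 buffers' worth of write traffic by the time the head catches the tail.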

I've synced up the capture to the top of frame using VSYNC pin (am also sending VGA) and see the AVI frame and clock regen and audio sample packets being sent in the TERC4 decoded output. CRCs/Checksums seem to match and the N/CTS fields look the same as you computed in the COG when I enabled DEBUG mode and printed them out. I did notice that you seem to send a long preamble. I think the encoding table in the spec called for 8 pixels prior to leading guard bands being preamble and I'm assuming prior to those 8 pixels would just be DVI standard encoding, not HDMI data/video preamble but whether that makes an actual difference who knows.

Included in this zip along with the source updates is a captured LOGFILE and the DECODE.TXT file which shows the TMDS decoded output after using ./tm 7 5 3 1 < LOGFILE > DECODE.TXT . The initial capture shows the vertical sync and blanking lines at the top of the frame. The real color bar RGB pixel data from active scan lines starts after sample 56325.

If you look at this output and use GREP etc you should see you are sending audio sample packets with 0,1 or 2 samples. Is 0 samples a valid packet to send? Also the channel status seems to reset before it hits 192, which I am counting per sample. Is this expected? Are you sure you are sending all 192 samples before wrapping this in the header? Or maybe I have some bug in the decode logic or it's hitting my own ping pong buffer wrap point.

Wuerfel_21 · 2024-09-12 16:14

@rogloh said:
Hi @Wuerfel_21 , I finally managed to capture some decent(ish) HDMI samples from your driver. Had to run at full speed in the end and readback HDMI bytes using the P_OUTBIT_A setting for input pin control on HDMI port pins. Reading real pin signals wasn't accurate enough and was plagued by lane skew/reflections etc. Dropping to half NCO pixel frequencies also didn't help - for some reason the streamer doesn't seem to like to keep up and push data bytes back through the fifo into HUB at sysclk/2 rates, and apparently corrupts these samples. It does seem to work ok at sysclk/1 rates however. The only problem with this rate is that the PSRAM bandwidth does incur about a 15% overhead so eventually the ping pong buffer head chasing its tail catches up. With the two 128kB buffers I use this happens about 768kB (6*1.15 ~ 7 buffers worth) in so you do still get a reasonable amount of captured data to examine before it wraps/corrupts itself, around 78643 pixels worth or ~73 lines worth with your 1075 pixels/line timing.

I think half NCO just doesn't work, ever. The TMDS encoder I think only works when NCO is at $CCCCCCD.

I've synced up the capture to the top of frame using VSYNC pin (am also sending VGA) and see the AVI frame and clock regen and audio sample packets being sent in the TERC4 decoded output. CRCs/Checksums seem to match and the N/CTS fields look the same as you computed in the COG when I enabled DEBUG mode and printed them out. I did notice that you seem to send a long preamble. I think the encoding table in the spec called for 8 pixels prior to leading guard bands being preamble and I'm assuming prior to those 8 pixels would just be DVI standard encoding, not HDMI data/video preamble but whether that makes an actual difference who knows.

I think the spec doesn't say anything on that, but I think I saw some Altera document where the preamble was drawn as long in a diagram. I think it just needs to be at least 8 cycles. In the demo driver I had it generate exactly 8, I don't think it makes a difference.

Included in this zip along with the source updates is a captured LOGFILE and the DECODE.TXT file which shows the TMDS decoded output after using ./tm 7 5 3 1 < LOGFILE > DECODE.TXT . The initial capture shows the vertical sync and blanking lines at the top of the frame. The real color bar RGB pixel data from active scan lines starts after sample 56325.

If you look at this output and use GREP etc you should see you are sending audio sample packets with 0,1 or 2 samples. Is 0 samples a valid packet to send? Also the channel status seems to reset before it hits 192, which I am counting per sample. Is this expected? Are you sure you are sending all 192 samples before wrapping this in the header? Or maybe I have some bug in the decode logic or it's hitting my own ping pong buffer wrap point.

Empty audio packet is valid, yes. For channel status, I'm doing this, so it should be every 192 samples.

              incmod spdif_phase,#191 wc
        if_c  alts temp3,#20
        if_c  bith packet_header,#0-0
              ' Add S/PDIF status bit
              cmp spdif_phase,#32 wc
        if_c  testb spdif_status,spdif_phase wc
              bitc spleft,#26
              bitc spright,#26
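
For readers following along without a P2 handy, the counter logic in the snippet above can be modelled offline. The Python sketch below is an approximation under stated assumptions: `incmod` follows the PASM2 INCMOD behaviour (wrap to 0 with carry set when the value equals the limit), the carry on wrap is treated as the block-start flag that ends up in packet_header, and the names `channel_status_step` and `status` are made up for illustration.

```python
def incmod(value: int, limit: int) -> tuple[int, bool]:
    """Model of PASM2 INCMOD: wrap to 0 (carry set) when value == limit."""
    if value == limit:
        return 0, True
    return value + 1, False


def channel_status_step(phase: int, status: int) -> tuple[int, bool, int]:
    """Advance one audio sample.

    Returns (new phase, block_start flag, channel status bit).
    Only the first 32 status bits are taken from `status`; for
    phases 32..191 the bit is 0, as in the CMP/TESTB pair.
    """
    phase, block_start = incmod(phase, 191)
    cbit = (status >> phase) & 1 if phase < 32 else 0
    return phase, block_start, cbit
```

Stepping this model from phase 191 yields a block start on the first sample and then once every 192 samples, which is the cadence a decoder should see if no packets are dropped.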

Wuerfel_21 · 2024-09-12 23:32

@rogloh
Actually, looking at the log, there does appear to be something strange going on. Packet at clock 73209 is the beginning of a frame, but then the next frame begins at 181370 when the count is at 156. Also the actual status bits are wrong, bit 25 should be set.

The capture did lose data at around 89876 though, who knows how many scanlines got lost there.

I did push a minor fix (there were some unnecessary AUGS instructions that technically took too many cycles between scanfunc calls), but that's probably got nothing to do with it.

rogloh · 2024-09-13 01:52

@Wuerfel_21 said:
@rogloh
Actually, looking at the log, there does appear to be something strange going on. Packet at clock 73209 is the beginning of a frame, but then the next frame begins at 181370 when the count is at 156. Also the actual status bits are wrong, bit 25 should be set.

The capture did lose data at around 89876 though, who knows how many scanlines got lost there.

I did push a minor fix (there were some unnecessary AUGS instructions that technically took too many cycles between scanfunc calls), but that's probably got nothing to do with it.

Yeah, I am going to try to capture more by increasing the buffer size. You lose 15% of the prior buffer due to the head catching the tail per buffer swap. About 6 iterations will wrap it and the first corruption would happen. The buffers are only 128kB in size, but I can probably increase them by 100kB depending on the total code size. That will let proportionally more valid data be captured. I'd like to figure out a way to capture the entire frame, but I'd need two external memories. Probably not worth it.

EDIT: Actually I could potentially compress the 8 bits in HUB first to a nibble with Merge/Split etc, but even doing that at the speeds I need has to be validated. Probably needs multiple COGs.

Wuerfel_21 · 2024-09-13 02:10

simple and braindead way: use RFBYTE->LUT->pins mode to read back the buffer and compress it onto 4 pins, then read those back in.

Wuerfel_21 · 2024-09-13 02:17

Also, just to mention, when I was logging on the packet encoder, the channel status did decode correctly.

rogloh · 2024-09-13 03:18

Yeah may need to look at that RFBYTE LUT approach. Otherwise I could use two COGs with this approach:

'total is 15 clocks per 8 nibbles processed or ~1.875 clocks per nibble excluding any minor outer loop overhead

rep #7,#512  ' 14 clocks for 8 compressed 4 "RGBC" bits saved per long
rflong x
rflong y
splitw y
splitw x
rolword z,y,#1
rolword z,x,#1
wrlut z,ptrb++

setq2 #511
wrlong 0,ptra++ ' 1 clock per 8 nibbles to write back to HUB

x res 1
y res 1
z res 1
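
To sanity-check what the loop above packs into each long, here is a bit-level Python model of one iteration. It is a sketch under assumptions: `splitw` and `rolword` follow my reading of the PASM2 definitions (SPLITW moves odd bits to the high word and even bits to the low word; ROLWORD shifts D left by 16 and inserts word N of S), and `compress` and `odd_bits` are illustrative names.

```python
MASK32 = 0xFFFFFFFF


def splitw(d: int) -> int:
    """Model of SPLITW: odd bits to the high word, even bits to the low word."""
    hi = lo = 0
    for i in range(16):
        lo |= ((d >> (2 * i)) & 1) << i
        hi |= ((d >> (2 * i + 1)) & 1) << i
    return (hi << 16) | lo


def rolword(d: int, s: int, n: int) -> int:
    """Model of ROLWORD: shift D left 16 bits and insert word n of S."""
    return ((d << 16) & MASK32) | ((s >> (16 * n)) & 0xFFFF)


def compress(x: int, y: int) -> int:
    """One loop iteration: 8 captured bytes (two longs) -> 8 nibbles."""
    z = 0
    z = rolword(z, splitw(y), 1)  # odd bits of the second long
    z = rolword(z, splitw(x), 1)  # odd bits of the first long end up lowest
    return z


def odd_bits(byte: int) -> int:
    """Reference: pick bits 1, 3, 5, 7 of one byte as a nibble."""
    return sum(((byte >> (2 * k + 1)) & 1) << k for k in range(4))
```

Under this model each captured byte contributes its four odd-numbered bits, and the eight nibbles stay in capture order from the least significant nibble upward.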

rogloh · 2024-09-13 12:29

So I was able to use the code above to compress the data on the fly with 2 extra COGs before writing into PSRAM. I then decompress on the way out before writing to SD. So we can now capture entire frames.

This way is simpler than dealing with continuous streaming and lots of buffering and signalling between COGs with RFBYTE->LUT->Pins.

Here's the updated code.

rogloh · 2024-09-13 13:07

I also fixed a bug I found in the tm.c program related to channel status stuff where I reused a loop variable inside a loop. Fixed and updated in the zipfile above.

I do still see one discontinuity early in the capture below, at channel status count = 84, but I think this is okay. My decoder has simply not yet seen the channel status block reset bit, so it starts its own count from zero, which doesn't match the HDMI driver's current position. Once that first B.X bit arrives and the counter gets reset, it seems to be correctly reset every 192 samples after that (it reports from 0..191).
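
The counting behaviour described above can be illustrated with a small Python sketch. It is not the tm.c code; `channel_status_counts` is a hypothetical helper that numbers samples from zero and resets whenever a block-start flag is seen, which is enough to show why the first reset lands at an arbitrary count.

```python
def channel_status_counts(block_start_flags) -> list[int]:
    """Number each audio sample within its channel status block.

    Counting starts at 0 on the first sample seen and resets to 0
    whenever a block-start flag arrives.
    """
    counts = []
    n = -1
    for flag in block_start_flags:
        n = 0 if flag else n + 1
        counts.append(n)
    return counts


if __name__ == "__main__":
    # Hypothetical capture: 84 samples arrive before the first block start.
    flags = [i >= 84 and (i - 84) % 192 == 0 for i in range(84 + 2 * 192)]
    counts = channel_status_counts(flags)
    print(counts[83], counts[84], max(counts))  # 83 0 191
```

Only the first, partial block is short; every later block runs 0..191 as long as the capture itself has no gaps.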
