Console Emulation - Page 19 — Parallax Forums

Console Emulation

  • @rogloh said:
    Also be sure that you meet all PSRAM timing requirements between accesses in your arbiter. You probably will due to the overhead instructions but it might be worth double checking anyway.

    UPDATE: And check that any of the page boundary crossings are ok too.

    What is the timing requirement between accesses? I remember reading about it in the datasheet, but it's like 1:30 AM, so I can't be trusted to read that now.

    The boundary check/split code is copied 1:1 from megayume. Interestingly, at some point I realized that the page size was set too large, but fixing it changed literally nothing, so uhhhh. The code looks like this (it is set as the ATN interrupt. Yes, the graphics reads are appropriately shielded from being interrupted, I hope):

    ma_68krequest
                  setq #2
                  rdlong ma_prog_addr,#_progrq_addr
                  'debug("Got request ",uhex_long(ma_prog_addr,ma_prog_length,ma_prog_target))
    
    
                  wrfast ma_bit31,ma_prog_target
                  mov ma_itmp2,ma_prog_addr
                  add ma_itmp2,ma_prog_length
                  mov ma_itmp1,ma_prog_addr
                  bith ma_itmp1,#0 addbits 9
                  add ma_itmp1,#1 ' ma_itmp1 has start of next page
                  cmpsub ma_itmp2,ma_itmp1 wcz ' IF C, ma_itmp2 now has longs that cross over boundary
            if_c  sub ma_prog_length,ma_itmp2
                  call #ma_psram_read68k
       if_nc_or_z cogatn ma_mk_cogatn_val
       if_nc_or_z reti2
                  mov ma_prog_addr,ma_itmp1
                  mov ma_prog_length,ma_itmp2
                  call #ma_psram_read68k
                  cogatn ma_mk_cogatn_val
                  reti2
    
    ma_psram_read68k
                  mov ma_itmp0,#(8+PSRAM_WAIT)*2
                  shl ma_prog_length,#2
                  add ma_itmp0,ma_prog_length
                  setbyte ma_prog_addr,#$EB,#3
                  ' Reverse nibble order
                  splitb  ma_prog_addr
                  rev     ma_prog_addr
                  movbyts ma_prog_addr, #%%0123
                  mergeb  ma_prog_addr
                  drvh  #PSRAM_SELECT
                  drvl  ma_psram_pinfield
                  xinit ma_psram_addr_cmd,ma_prog_addr
                  wypin ma_itmp0,#PSRAM_CLK
                  setq ma_nco_fast
                  xcont #PSRAM_WAIT*2+PSRAM_DELAY,#0
                  shr ma_prog_length,#1
                  setword ma_psram_read68k_cmd,ma_prog_length,#0
                  waitxmt
                  fltl ma_psram_pinfield
                  setq ma_nco_slow
                  xcont ma_psram_read68k_cmd,#0
                  waitxfi
            _ret_ drvl #PSRAM_SELECT
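
As a rough model of what the ATN handler above does with a request, here is a Python sketch of the boundary split. It assumes a page of 1024 address units (matching `bith ma_itmp1,#0 addbits 9`) and, like the assembly, that a request never spans more than two pages. `split_at_page` is a hypothetical name used only for illustration, not something from the emulator source.

```python
def split_at_page(addr, length, page_bits=10):
    """Split (addr, length) into one or two transfers that stay inside a page.

    Mirrors the CMPSUB-based logic: compute the end of the request and the
    start of the next page, then cut off whatever spills past the boundary.
    """
    end = addr + length                                  # ma_itmp2
    next_page = (addr | ((1 << page_bits) - 1)) + 1      # ma_itmp1
    if end <= next_page:
        # No crossing (NC), or the request ends exactly on the boundary (Z).
        return [(addr, length)]
    spill = end - next_page                              # units past the boundary
    return [(addr, length - spill), (next_page, spill)]


print(split_at_page(0x3F0, 0x20))  # crosses the boundary at 0x400
```

A request that ends exactly on the boundary stays a single transfer, which corresponds to the `if_nc_or_z` early return in the handler.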
    
    

    "Sync bus + Async clock" setup has not crashed so far (2 continues worth of letting Metal Slug time out), so maybe that really is the ticket.

  • rogloh Posts: 4,437

    @Wuerfel_21 said:
    What is the timing requirement between accesses? I remember reading about it in the datasheet, but it's like 1:30 AM, so I can't be trusted to read that now.

    I just read it from an APmemory data sheet and it lists 18ns min CS HIGH time between accesses. But not sure if this is the right datasheet for P2 Edge.

    "Sync bus + Async clock" setup has not crashed so far (2 continues worth of letting Metal Slug time out), so maybe that really is the ticket.

    Fingers crossed.

  • @rogloh said:

    "Sync bus + Async clock" setup has not crashed so far (2 continues worth of letting Metal Slug time out), so maybe that really is the ticket.

    Fingers crossed.

    Final continue. If it survives a full game's worth of "Marco Rossi wins by doing nothing", it'll go on to the overnight attract mode test. It survived that before I started with the audio, so that's the benchmark.

  • @rogloh said:

    @Wuerfel_21 said:
    What is the timing requirement between accesses? I remember reading about it in the datasheet, but it's like 1:30 AM, so I can't be trusted to read that now.

    I just read it from an APmemory data sheet and it lists 18ns min CS HIGH time between accesses. But not sure if this is the right datasheet for P2 Edge.

    2 AM mind says that that's ~12 cycles. I think the case of an interrupt hitting right before the non-interrupt code is about to enter the protected section might only do 10 cycles? Eh we're overclocking, anyways.

  • rogloh Posts: 4,437
    edited 2022-05-26 00:58

    Well at 330MHz or so each P2 clock is ~3ns, so 18ns is 6 clock cycles, or 3 P2 instructions. You should be good as long as the next transfer delays by this number of instructions between CS going high and CS going low.

    Update: Is it still running okay with the updated timing?
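
The cycle arithmetic in this post is easy to double check. A minimal sketch, assuming the 18 ns CE# high figure quoted from the datasheet and two clocks per P2 instruction; `min_cs_high` is a made-up helper name.

```python
def min_cs_high(t_ns, sysclk_hz):
    """Return (clocks, instructions) needed to cover t_ns at sysclk_hz."""
    # Integer ceiling division, so a fractional clock rounds up.
    clocks = -((-t_ns * sysclk_hz) // 1_000_000_000)
    # Assumption: a plain P2 instruction takes 2 clocks.
    instructions = -(-clocks // 2)
    return clocks, instructions


print(min_cs_high(18, 330_000_000))  # (6, 3)
```

At 330 MHz this gives 6 clocks, i.e. 3 instructions between CS going high and CS going low again.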

  • Oh owie, no idea how that doubled in my mind. Yeah there's certainly 6 cycles in any case.

    And yep, still going. I think it works. Oh man, P_SYNC_IO, gotta be my new favorite semi-undocumented magic flag.

  • rogloh Posts: 4,437

    Just let it run while you hopefully get some sleep.

  • Yanomani Posts: 1,505
    edited 2022-05-26 04:25

    @rogloh said:

    @Wuerfel_21 said:
    What is the timing requirement between accesses? I remember reading about it in the datasheet, but it's like 1:30 AM, so I can't be trusted to read that now.

    I just read it from an APmemory data sheet and it lists 18ns min CS HIGH time between accesses. But not sure if this is the right datasheet for P2 Edge.

    "Sync bus + Async clock" setup has not crashed so far (2 continues worth of letting Metal Slug time out), so maybe that really is the ticket.

    Fingers crossed.

    If it's a P2-EC32MB Rev.A, its BOM states APS6404L-3SQR-ZR (ZR-code stands for USON-8L pkg).

    APMEMORY website lists "APS6404L-3SQR QSPI PSRAM" - "APM SPI 3V PSRAM Datasheet.pdf - Rev. 2.4 Oct 08, 2021" as the current datasheet for that part:

    https://apmemory.com/wp-content/uploads/APM_PSRAM_E3_QSPI-APS6404L-3SQR-v2.4-KGD_PKG.pdf

    Figure 19 shows tCPH as only defined for "input", so that limit seems to be valid just for Writes.

    Item 8.6 "Command Termination" (page 10) gives more detail on how reads and writes are expected to come to an end, while showing tCHD as the limiting factor, and also carries the following warning: "Not doing so will block internal refresh operations and cause memory failure."

    Perhaps I'm wrong, but, as suggested by those timing diagrams, it's worth noting that those PSRAMs seem to differ quite a bit from Hypers, in the sense that CLK appears to be held "LOW" while ~CE is "HIGH"; IIRC, Hypers explicitly allow continuous clocking while they're unselected.

    Hope it helps...

    Henrique

    Addit: Convoluted as it appears to be, Table 14.5 (Page 22, DC Characteristics) shows the way Standby Current is measured (3. Standby current is measured when CLK is in DC low state.)

    Again, I could be wrong, but, patents being what they really are (... subdued narratives..., never completely descriptive...), Standby mode needs to be forcefully warranted between valid operations, or sh...t will happen...

  • rogloh Posts: 4,437

    If it's a P2-EC32MB Rev.A, its BOM states APS6404L-3SQR-ZR (ZR-code stands for USON-8L pkg).

    APMEMORY website lists "APS6404L-3SQR QSPI PSRAM" - "APM SPI 3V PSRAM Datasheet.pdf - Rev. 2.4 Oct 08, 2021" as the current datasheet for that part:

    Yeah that's the same one then. 18ns CE# high time is required between transfers.

  • evanh Posts: 13,392
    edited 2022-05-26 07:54

    @Yanomani said:
    Item 8.6 "Command Termination" (page 10) gives more detail on how reads and writes are expected to come to an end, while showing tCHD as the limiting factor, and also carries the following warning: "Not doing so will block internal refresh operations and cause memory failure."

    Perhaps I'm wrong, but, as suggested by those timing diagrams, it's worth noting that those PSRAMs seem to differ quite a bit from Hypers, in the sense that CLK appears to be held "LOW" while ~CE is "HIGH"; IIRC, Hypers explicitly allow continuous clocking while they're unselected.

    Don't think there are any problems comparing/reusing there. All Hyper transactions are two edges, 16-bit, per cycle. The diagrams show these as a low-to-high-to-low sequence. And I've not seen any continuous clocking for any of these memories. The only code where I've seen continuous clocking used is for the Ethernet chip that ManAtWork did an impressive driver for, and that relies on a preambled data start sequence.

  • Well, survived the night. That solves that then.

  • rogloh Posts: 4,437

    Awesome. Glad we worked it out.

  • evanh Posts: 13,392

    ewww, Roger, I'm just having a nosey at your driver doc, MemoryDriverDocumentation_v09b.pdf, and trying to work out how to use it and how Ada has used it ... not matching up very well so far. The first function Ada's code calls is "getDriverAddr()" but that isn't even a documented function!

  • Just found an interesting pickle: Apparently the YM2149 SSG (and thus presumably the SSG subcomponent of Yamaha FM chips?) has a 32-step envelope generator while the original AY-3-8910 only has 16 (matching the fixed volume levels). Did @Ahle2 ever realize this? Well, I think the tweak isn't going to be too difficult, but still, how odd.

  • Well, got SSG going (and removed the YM2612 bitcrush/distortion). Haven't messed with the envelope steps yet. Anyways, it can do this now.

  • Slight problem encountered: There may not actually be enough cog space for YM2608 Rhythm channels (let alone YM2610 ADPCM). FM section and its envelopes are too dang complex.

  • I guess some space could be saved by un-unrolling (rerolling?) the SSG code. That would violate all my principles that, as aforementioned, I discovered by transcending the boundary between time and space, but if it has to be done...

    It has to be done tomorrow because holy shat where did the hours go?

  • rogloh Posts: 4,437

    Where's the fun if it is easy to get it to fit the first time around?

    Well, it didn't fit after all and I had to move the rhythm generator to hubexec... (The YM2610 ADPCM needs to be hubexec anyways; it's a decent bit more complex and then there's the ADPCM-B channel to worry about (which YM2608 technically also has, but at least on PC-98 soundcards, the bus for it wasn't hooked up to anything (and in later YM2608 compatible chips, they just removed it), so I'd have to dig to even find a soundtrack that uses it))

    So anyways, here's the mega-unpolished YM2608 (so unpolished I didn't even bother to change the file header) for personal enjoyment. Excuse the poor tune selection, too lazy to hunt for anything that doesn't immediately come to mind. Not that my example tune collections are ever of particularly notable quality.

  • Thoughts regarding ADPCM: There's a bit of a pickle when it comes to handling key-on: If the channel is already active, it will have already requested the next data block, so a key-on command can really only be handled when that request is acknowledged (which may take a sample or two (for ADPCM-A), since ADPCM uses the bandwidth left over from the sprite reads at the end of each scanline). That's gonna be a somewhat complicated state machine I reckon. (Also there may be weird phasing issues if there are samples that are supposed to be triggered simultaneously but end up out of alignment due to high bus usage. Not sure if that ever happens (two samples being triggered at once even. That wouldn't happen with a usual track-per-channel type music driver (I guess the ability to trigger more than one sample at a time is simply a holdover from the YM2608 rhythm generator))).

  • pik33 Posts: 1,732
    edited 2022-05-28 16:57

    I have a similar situation in the player: the main portion of the bandwidth is used by the video driver, so I have to wait up to 20-25 us for a sample, way too long for the driver to work. That's why I implemented a simple cache system in the audio driver. On a hit, it loads the sample from hub; on a miss, it preloads the next 256 bytes in one PSRAM transfer. As it is way too simple, the driver may still have to wait these 30 us, but only on a cache miss, which can then be absorbed by a 512-sample buffer.
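
The single-line cache described in this post could look roughly like the sketch below. It is only a model of the idea (hit reads from a local buffer, miss refills 256 bytes starting at the requested address in one transfer); `SampleCache` and `read_block` are hypothetical names, and the real driver is PASM, not Python.

```python
class SampleCache:
    """One-line read cache: a miss refills LINE bytes in a single transfer."""

    LINE = 256

    def __init__(self, read_block):
        # read_block(addr, n) -> bytes; stands in for one PSRAM burst read.
        self.read_block = read_block
        self.base = None      # address of the first cached byte
        self.line = b""       # cached bytes (the "hub" copy)
        self.misses = 0

    def read_byte(self, addr):
        hit = self.base is not None and self.base <= addr < self.base + len(self.line)
        if not hit:
            # Miss: fetch the next LINE bytes starting at the requested address.
            self.base = addr
            self.line = self.read_block(addr, self.LINE)
            self.misses += 1
        return self.line[addr - self.base]


backing = bytes(range(256)) * 4                      # fake external memory
cache = SampleCache(lambda a, n: backing[a:a + n])
data = [cache.read_byte(a) for a in range(300)]      # sequential playback
print(cache.misses)                                  # 2
```

For sequential playback the slow path is taken once per 256 bytes, which is the wait that the 512-sample output buffer has to absorb.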

  • And once again we're having fun:

    Putting a NOP anywhere above this ALTR sequence seems to break the Z80 entirely. Below doesn't matter. I'm having a headache already....

  • Ah yes, the wonders of having no proper namespacing.

  • pik33 Posts: 1,732

    Another esoteric nop trap. I also had this kind of stuff: any nop anywhere before the particular point made the code crash and I still don't understand why and how.

  • So, uh, anyways, ADPCM. Kinda works after I hit my head against a stupid typo for a while. I think the levels (FM vs SSG vs ADPCM) are all wrong and the HMG sounds a bit odd and it's only the ADPCM-A channels (ADPCM-B uses a different codec and has a frequency register). IDK will fix tomorrow am tired.

    Here's some 2AM smoothbrained Metal Slug for demonstration purposes:

  • Wuerfel_21 Posts: 3,012
    edited 2022-05-30 15:16

    Welp, got the ADPCM-B to kinda work (still freaks out occasionally) and also made the sample end register work (I think). Still so many audio issues. Anyways, I put together another ZIP because why not. Maybe @pik33 wants to have a go at it with his new edge or smth. (Perhaps interesting to know if the timing tweak is consistent across edge units)

    Eitherhow, take a look at the horridness that is the ADPCM-B implementation (I'm pretty sure there's still a bug in this (or the register handler)):

    opn_adpcm_b_run
                  'ret
                  loc ptra,#adpb_regbase
    
                  setq #1
                  rdlong opn_arg3,ptra[0] ' get repeat flag, pan flags, start/end address
    
                  rdlong opn_tempValue3,#adpcm_pollbox+6*4 wz
                  'modcz _clr,_nz wcz ' If outstanding memory rq, ignore commands
            {if_x0} rdbyte opn_tempValue,ptra[8] wcz ' NZ-> got command. NC-> key-on, C-> dump
            'if_x0 drvh #38
            if_x0 wrbyte #0,ptra[8]
            if_x0 bitl adpcm_active,#7
            if_10 neg adpb_nextblock,#1
            if_00 getword adpb_nextblock,opn_arg3,#1
            if_00 shl adpb_nextblock,#4
    
                  tjf adpb_nextblock,#.norq
                  tjnz opn_tempValue3,#.nosample ' Stall if previous request isn't done
                  add adpb_nextblock,adpcm_b_base
                  debug(ubin_byte(adpcm_active),uhex_long(adpb_nextblock,adpcm_b_base,adpb_ptr,adpb_phase))
                  wrlong adpb_nextblock,#adpcm_pollbox+6*4
                  testbn adpcm_active,#7 wc ' C if key-on
            if_nc neg adpb_nextblock,#1
            if_c  bith adpcm_active,#7
            if_c  jmp #.key_on
    .norq
    
                  rdword opn_tempValue,ptra[##9]
                  'debug(uhex_long(adpb_phase,opn_tempValue))
                  add adpb_phase,opn_tempValue
                  cmpsub adpb_phase,##$1_0000 wc
            if_nc jmp #.nosample
    
                  mov adpb_oldsample,adpb_newsample
    
                  '' Get sample and decode
                  mov opn_tempValue,adpb_ptr
                  and opn_tempValue,#63
                  shr opn_tempValue,#1 wc ' odd/even nibble
                  add opn_tempValue,##adpcm_buffers+6*32
                  rdbyte opn_tempValue,opn_tempValue
            if_nc shr opn_tempValue,#4
                  bitl opn_tempValue,#3 addbits 7 wcz
                  altgb opn_tempValue,#adpb_adaptionTable
                  getbyte opn_tempValue3
                  shl opn_tempValue,#1
                  add opn_tempValue,#1
    
                  mul opn_tempValue,adpb_step
                  shr opn_tempValue,#3
                  sumc adpb_newsample,opn_tempValue
                  fges adpb_newsample,##-32768
                  fles adpb_newsample,##+32767
    
                  mul adpb_step,opn_tempValue3
                  shr adpb_step,#6
                  fge adpb_step,#127
                  fle adpb_step,##24576
    
                  '' Increment ptr and check end address
                  add adpb_ptr,#1
                  getword opn_tempValue,opn_arg4,#0
                  shl opn_tempValue,#9
                  sub opn_tempValue,adpb_ptr wz
                  testbn opn_arg3,#4 wc ' Check repeat flag
            if_11 bitl adpcm_active,adpa_which
            if_11 wrbyte #0,ptra[6] ' set end flag
        if_not_01 jmp #.norep_retrigger
    .key_on       ' If we enter with NC, we repeat. If we enter with C, we key-on...
                  wrbyte #1,ptra[6] ' clear end flag
                  mov adpb_oldsample,#0
                  mov adpb_newsample,#0
                  mov adpb_phase,#0 ' Doing this for repeat feels wrong..
                  getword adpb_ptr,opn_arg3,#1
                  shl adpb_ptr,#9
                  mov adpb_step,#127
                  mov opn_tempValue,#0 ' for safety
    .norep_retrigger
                  testb adpcm_active,#7 wc
                  ' For repeat, we actually need to check one block early for the load
                  cmp opn_tempValue,#32 wz
            if_11 getword adpb_nextblock,opn_arg3,#1
            if_11 shl adpb_nextblock,#4
            if_11 skip #%1111
    
                  test adpb_ptr,#31 wz
            if_11 mov adpb_nextblock,adpb_ptr
            if_11 shr adpb_nextblock,#5
            if_11 add adpb_nextblock,#1
    
    .nosample
                  ' Linear interpolation (slightly low quality for speed)
                  getword opn_tempValue,adpb_phase,#0
                  shr opn_tempValue,#2
                  mov opn_tempValue2,adpb_oldsample
                  scas adpb_newsample,opn_tempValue
                  adds opn_tempValue2,0-0
                  scas adpb_oldsample,opn_tempValue
                  subs opn_tempValue2,0-0
    
                  rdbyte opn_tempValue,ptra[$B]
                  muls opn_tempValue2,opn_tempValue
                  sar opn_tempValue2,#8+1
                  muls opn_tempValue2,##OPN_VOLUME_MUL
                  sar opn_tempValue2,#14
    
                  testb adpcm_active,#7 wz
                  testb opn_arg3,#7+8 wc
            if_11 add opn_outl,opn_tempValue2
                  testb opn_arg3,#6+8 wc
            if_11 add opn_outr,opn_tempValue2
    
                  ret
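
For readers who don't speak PASM, the "Get sample and decode" part of the listing above boils down to the following per-nibble step. This is a Python sketch, not the emulator's code. The contents of `adpb_adaptionTable` are not shown in the post, so the `ADAPT` values here are an assumption (the commonly documented Yamaha Delta-T step multipliers), and `decode_nibble` is a made-up name.

```python
# Assumed step multipliers (scaled by 64); the post does not show the real table.
ADAPT = (57, 57, 57, 57, 77, 102, 128, 153)


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def decode_nibble(sample, step, nibble):
    """Decode one 4-bit ADPCM-B code; returns the new (sample, step)."""
    magnitude = nibble & 7                    # low 3 bits
    negative = bool(nibble & 8)               # bit 3 is the sign (C flag in the PASM)
    delta = ((2 * magnitude + 1) * step) >> 3
    sample = sample - delta if negative else sample + delta   # SUMC
    sample = clamp(sample, -32768, 32767)     # FGES / FLES
    step = (step * ADAPT[magnitude]) >> 6     # adapt the step size
    step = clamp(step, 127, 24576)            # FGE / FLE
    return sample, step


print(decode_nibble(0, 127, 0xF))  # (-238, 303)
```

Starting from the key-on state in the listing (`adpb_newsample` = 0, `adpb_step` = 127), feeding successive nibbles through this function gives the stream that the linear interpolation stage then smooths.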
    
    
  • pik33 Posts: 1,732
    edited 2022-05-30 17:15

    Maybe @pik33 wants to have a go

    I compiled this and I have a picture (Neo Yume alpha 03... .... .... file1 fail: 4!) on the monitor: I have to stuff the SD card with this MSLUG thing now. (and find it somewhere in the darknet first)

  • Wuerfel_21 Posts: 3,012
    edited 2022-05-30 17:33

    You also need a NeoGeo AES BIOS file at /sd/neoyume/neogeo/neo-epo.bin. You'd usually find all the BIOS variants together in a neogeo.zip.

    Also, side note: For reasons entirely unrelated to the continued nonexistence of a preprocessor in propeller tool, some configuration is duplicated between config and neoyume_lower. (In particular, audio pins)

  • pik33 Posts: 1,732

    Yes, I now have the error, lack of neo-epo.bin

  • hinv Posts: 1,148

    @JonnyMac said:
    I'm not a gamer -- retro or otherwise -- but if someone built a P2 version of the Hydra I would buy one just to play with the code you PASM wizards are writing. Neat stuff.

    Like a P2-based DemoBoard? I would love something like that too, especially if it came from Parallax themselves so it becomes a standard of sorts.
