@rogloh said:
So how full is the COG+LUT RAM now @Wuerfel_21 ?
Not sure (about to go to bed, it's like 4 am), but I think there's a decent bit of space left in both, and the only things left to go in there are some I/O-related bits, interrupt polling and of course the external ROM interface (and related opcode queue code). But for the ROM interface I'm already using a workalike (i.e. this is the same interface I plan for the PSRAM/HyperRAM bits):
mk_readrom_ea ' read single long, offset such that the requested
              ' address ends up at mk_romio_area
              mov       pb,mk_effaddr
              mov       mk_romio_length,#1
              rczr      pb wcz
              mov       mk_romio_target,mk_romio_area_ptr
        if_c  sub       mk_romio_target,#2
        if_z  sub       mk_romio_target,#1
mk_readrom    ' arbitrary block read
              shl       pb,#2
              zerox     pb,#15 ' <- ROM SIZE HERE
              add       pb,##fake_rom
              mov       mk_memtmp1,mk_romio_target
              debug("ROM read from ",uhex_long(pb)," to ",uhex_long(mk_romio_target))
              rep       @.readrom,mk_romio_length
              rdlong    mk_memtmp0,pb
              add       pb,#4
              wrlong    mk_memtmp0,mk_memtmp1
              add       mk_memtmp1,#4
.readrom
              ret       wcz
Note that opcode fetch is currently very primitive though, no queue and doesn't go through the actual ROM interface because uhhhh. So that might consume quite a couple longs.
... but if the space is not enough, I can simply move a few more opcodes to hub (or inversely, bring some opcodes or addressing modes into cog/lut (as of now, all addressing modes that are not register direct or simple (An) are a hub call))
Ok, it should be possible to convert/hook that type of thing into the existing PSRAM driver at least for initial testing.
One thing you might find is that if you can cache some code snippets read from external RAM into a (smaller) simulated ROM area stored in HUB RAM you might get better performance reading from that block whenever you can, rather than making lots of individual accesses to the external memory. There's probably some scope there for some interesting performance improvements by playing with the burst sizes read and see how much you gain from the latency savings vs the overhead of checking addresses fall within a range already available in HUB. Or you can just try the individual random reads and compare those too.
Dealing with data cache/prefetch seems a bit... ehhhh.
Caching repeated access to the same address would be easy enough, but how often does that happen?
Code of course will be fetched in bigger blocks, maybe 16 words at a time? That should be big enough to eliminate code reads in hot loops and speed up short branches.
Also, current memory usage is as such: Registers up to $1D4 and LUT up to $30f. So basically, 3/4 full.
Also, current idea for interrupt implementation is to use JSEx instructions to check for lock changes before each instruction. Only need two of them because on the megadrive, there are only really two interrupt sources: VBlank and scanline counter interrupts from the VDP. There's technically also an external interrupt line for peripherals to use, but that's kinda out-of-scope.
@Wuerfel_21 said:
... Code of course will be fetched in bigger blocks, maybe 16 words at a time? That should be big enough to eliminate code reads in hot loops and speed up short branches.
16 words read in at a time seems like a good place to start, given we can clock in at 320MB/s @ 320MHz. Can be tweaked further if required, eg. 8 or 32 etc (maybe nibble masked addresses will be fastest for testing given the getnib/setnib opcodes in the P2). A proper I-cache is hopefully not required to achieve some decent performance for this emulator, although for some it could be an interesting thing to examine if you ever wanted to execute directly from PSRAM in general.
Also, current memory usage is as such: Registers up to $1D4 and LUT up to $30f. So basically, 3/4 full.
Impressive, looks like it should fit nicely in the end.
Got that code block fetching system implemented now. Size can be configured to any even number of words and checking if a branch is inside the currently cached block is simple - we already need to keep track of how many words are left, so simply subtract the branch displacement from that counter and check if it is still in the valid range (which, since the valid range is 0..MK_ROMQUE_MAX, only requires a single unsigned compare).
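For anyone following along, the single-compare trick can be modelled outside of PASM2. This is only a Python sketch of the arithmetic, not the actual MotoKore code: the value chosen for MK_ROMQUE_MAX, the function name and the idea that the displacement is counted in words are all assumptions made for illustration.

```python
# Model of the "is the branch target still inside the cached block?" check.
# MK_ROMQUE_MAX = 16 is an assumed value; the post only says the size is configurable.
MK_ROMQUE_MAX = 16
MASK32 = 0xFFFFFFFF  # emulate a 32-bit register


def branch_stays_in_block(words_left: int, disp_words: int) -> bool:
    """Subtract the (signed) displacement from the words-left counter.

    A result below zero wraps around to a huge unsigned number, so one
    unsigned compare against MK_ROMQUE_MAX rejects both "too far forward"
    and "too far backward" at once.
    """
    new_left = (words_left - disp_words) & MASK32
    return new_left <= MK_ROMQUE_MAX
```

With 10 words left, a forward branch of 3 words leaves 7 (still cached), while a forward branch of 11 words wraps to $FFFF_FFFF and fails the same compare that catches a backward branch past the start of the block.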
It will be good to simply try my existing PSRAM driver as is with your pre-fetching support to see if that has any hope of working without it being directly coupled to your COG. At 320MHz it can probably get close to 1us per request so 16 words is then 32MB/s and some of it can potentially be done in parallel to your emulator code running (so not sure how that translates to final 68k MIPs, it depends on branches and wasted reads).
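The 32MB/s figure is just bytes per request divided by time per request. A small Python sketch of that arithmetic follows; the function name is made up, and holding the request time at 1us for other burst sizes is an assumption (in practice a longer burst takes somewhat longer).

```python
def effective_bandwidth_mb_s(words_per_request: int,
                             request_time_us: float,
                             bytes_per_word: int = 2) -> float:
    """Bytes per microsecond equals MB/s (taking 1 MB as 10**6 bytes)."""
    return words_per_request * bytes_per_word / request_time_us


# 16 words of 68k code (2 bytes each) at roughly 1us per request
sixteen_words = effective_bandwidth_mb_s(16, 1.0)
```

That is about a tenth of the raw 320MB/s transfer rate, which is why the fixed per-request latency matters more than the burst length at these sizes.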
I have attached some sample code showing a simple PSRAM config for the new P2EDGE and how the mailbox could be used from PASM2 for block reads/writes, but there are far more commands the mailbox can use than are shown here. Memory addresses can also be mapped differently in the banks if the address space used is to be shared with HUB RAM (sometimes it is useful to be able to do that and to reserve 0-16MB for indicating HUB addresses, other times not). This PASM2 demo already works with my driver.
{
Propeller 2 PSRAM demo (PASM)
=============================
This software contains a simple demo showing how a PASM COG can use the PSRAM driver
without requiring the overhead of the complete SPIN2 based memory driver as well.
The driver is initialized and a PASM COG then accesses the PSRAM with direct mailbox
access using the burst write and read commands to transfer data.
No QoS policy is used, so any COG can access the memory without prioritization.
Run this with DEBUG mode enabled.
}
'-----------------------------------------------------------------------
CON
_clkfreq = 160000000
DEBUG_BAUD = 115200
MAXBURST = 512 ' set to a suitable device burst size & keep under maximum CS low time of 8us
DELAY = 8 ' set to an input delay suitable for this P2 clock frequency (from 0-15)
ADDRSIZE = 25 ' number of address bits used in 32MB of PSRAM
' P2 EDGE PSRAM pin mappings
DATABUS = 40
CLK_PIN = 56
CE_PIN = 57
OBJ
psram : "psramdrv"
PUB main() | driverAddr
' patch in the proper HUB addresses for Propeller Tool (redundant for FlexSpin)
long[@startupData][5]:=@deviceData
long[@startupData][6]:=@qosData
long[@startupData][7]:=@mailboxes
' get the address of the PSRAM memory driver so we can start it
driverAddr:= psram.getDriverAddr()
' start the PSRAM memory driver and wait for it to complete initialization
coginit(NEWCOG, driverAddr, @startupData)
repeat until long[@startupData] == 0
' now just continue running as the PSRAM reader cog, pass mailbox base address as argument
coginit(cogid(), @reader, @mailboxes)
DAT
orgh
'-----------------------------------------------------------------------
' Reader Cog PASM2 code entry point
'-----------------------------------------------------------------------
reader
org 0
cogid pb 'get COG ID
mov pa, #12
mul pa, pb 'scale by 12 bytes per mailbox (3 longs)
add ptra, pa 'compute real mailbox start address for this COG
add msgaddr, ptrb 'determine real HUB RAM location of the test message
'write the test message into PSRAM
'NOTE: the setq burst write method used here only works without interruption. It
'relies on these sequential addresses being written in order on each clock cycle, before the
'RAM driver poller can read incomplete mailbox data. If you are using the streamer,
'or if interrupts could delay the burst write partway through, this would not work
'and you would need to write the first mailbox long after the other two longs.
setnib addr, #%1111, #7'include the write burst command in the cmd+address parameter
setq #3-1 'write 3 longs to mailbox in a burst (can do this only without interruption)
wrlong addr, ptra 'trigger the write burst to external memory
pollwrite rdlong pa, ptra wcz 'check for the result
if_c jmp #pollwrite 'wait until done or error
if_nz jmp #error 'error check (optional but useful if you encounter setup problems)
'read the message back to a new address (just using this COG's own HUB space as scratch buffer)
setnib addr, #%1011, #7'setup read burst command in the cmd+address parameter
mov msgaddr, ptrb 'update destination hub address to COG's scratch area
setq #3-1 'write 3 longs to mailbox in a burst (can do this only without interruption)
wrlong addr, ptra 'trigger the read burst command
pollread rdlong pa, ptra wcz 'get the result
if_c jmp #pollread 'wait until done or error
if_nz jmp #error 'error check (optional but useful if you encounter setup problems)
'display the message we just read back with DEBUG statements
DEBUG (ZSTR(ptrb)) 'print string we just read
cogstop pb 'stop here
'if an error occurred, display the error code to help debug code
error DEBUG ("Test failed, error code=-", SDEC_LONG_(pa), 13, 10)
cogstop pb 'stop here
' 3 long structure to be written to mailbox
addr long $__0abcdef ' command & some address in external memory
msgaddr long message - reader ' HUB source/destination address for burst
msglen long msgend - message ' length in bytes
fit 502
'-----------------------------------------------------------------------
orgh
' data to be passed to driver when starting it
startupData
long _clkfreq ' use current frequency
long 0 ' optional flags
long 0 ' reset pin mask on port A for PSRAM (none)
long 0 ' reset pin mask on port B for PSRAM (none)
long DATABUS ' PSRAM data bus start pin
long deviceData ' address of devices data structure in HUBRAM
long qosData ' address of QoS data structure in HUBRAM
long mailboxes ' address of mailbox structure in HUBRAM
deviceData
' 16 bank parameters follow
long (MAXBURST << 16) | (DELAY << 12) | (ADDRSIZE - 1) ' bank 0
long (MAXBURST << 16) | (DELAY << 12) | (ADDRSIZE - 1) ' bank 1
long 0[14] ' bank 2-15
' 16 banks of pin parameters follow
long (CLK_PIN << 8) | CE_PIN ' bank 0
long (CLK_PIN << 8) | CE_PIN ' bank 1
long -1[14] ' bank 2-15
qosData
long $FFFF0000 ' cog 0 default QoS parameters
long $FFFF0000 ' cog 1 default QoS parameters
long $FFFF0000 ' cog 2 default QoS parameters
long $FFFF0000 ' cog 3 default QoS parameters
long $FFFF0000 ' cog 4 default QoS parameters
long $FFFF0000 ' cog 5 default QoS parameters
long $FFFF0000 ' cog 6 default QoS parameters
long $FFFF0000 ' cog 7 default QoS parameters
mailboxes
long 0[8*3] ' 3 longs per mailbox per COG
message byte "This message is coming at you today all the way from PSRAM!", 0
msgend byte 0
{{
-------------
LICENSE TERMS
-------------
Copyright 2020, 2021 Roger Loh
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
}}
Unrelatedly, I've just completed implementing all the instructions. Probably a million bugs, but I can't find a test program that can run from ROM so uhhhhh, let's not think about that too much. Next stop: hooking this up to the VDP so I can run some simple test programs
Status report: got video driver and VDP integrated into the binary and set up (using flexspin to compile the Spin code into high memory). Also, a simple "Hello World" ROM seems to end up at the correct STOP opcode (end of program, as opposed to an exception handler or getting stuck on some inappropriately decoded opcode), so I presume that when the VDP register interface is working, there'll be some letters on screen.
Note that there's a bunch of SEGA-isms in the code (memory map, broken TAS (though I know it's broken on Amiga, too. Not sure about other 68k systems), assumption that vector table is in ROM, etc), so you'd need to rework it a bit to use it for a different purpose.
Semi-relatedly, anyone got any idea of how colors in %0000_BBB0_GGG0_RRR0 CRAM format could be expanded to RGB24 at decent speed? I guess there's the nuclear option of precomputing a big table (would only be 8K!)... That could also allow emulating an accurate luma curve (real VDP DAC is slightly nonlinear), though that might look goofy with the shadow/highlight effect (which happens on the final RGB values)
Yes, it is timing. If I insert a waitx #174 (and not one cycle less!) into the instruction loop, it works in non-debug mode....
So, I am sending an interrupt by pulsing (locking and immediately releasing) a lock inside the VDP. This is picked up by one of the event channels of the 68k cog and before loading the next instruction, is checked using JSE2. How would timing differences cause this to break after one iteration???($E0E (magenta) + 2 = $E10 (blue)) The 68k code never touches the lock, it only ever polls the SE2 flag and only in the instruction fetch code. And I can verify that the VDP is still getting and releasing the lock just fine.
@Wuerfel_21 said:
Semi-relatedly, anyone got any idea of how colors in %0000_BBB0_GGG0_RRR0 CRAM format could be expanded to RGB24 at decent speed? I guess there's the nuclear option of precomputing a big table (would only be 8K!)... That could also allow emulating an accurate luma curve (real VDP DAC is slightly nonlinear), though that might look goofy with the shadow/highlight effect (which happens on the final RGB values)
What is "decent speed"? Presumably this is for converting the palette data to P2 format as each palette word is loaded/written?
Converting %0000__B2B1B0_0__G2G1G0_0__R2R1R0_0
to say %R2R1R0_R2R1R0_R2R1__G2G1G0_G2G1G0_G2G1__B2B1B0_B2B1B0_B2B1__00000000
looks pretty horrible.
A table in hub RAM seems best option in terms of speed, saving cog/LUT space and emulating non-linearity.
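To make the table idea concrete, here is a Python sketch of how such a table could be precomputed. It assumes the table is indexed by the CRAM word shifted right by one (bit 0 is always zero), which gives 2048 longs, i.e. one way to arrive at the 8K mentioned above. It uses plain bit replication rather than the real (nonlinear) DAC curve, and all names are invented for the example.

```python
def expand3to8(v: int) -> int:
    """Replicate a 3-bit value into 8 bits: abc -> abcabcab."""
    return (v << 5) | (v << 2) | (v >> 1)


def cram_to_rgb24(cram: int) -> int:
    """Convert %0000_BBB0_GGG0_RRR0 to %RRRRRRRR_GGGGGGGG_BBBBBBBB_00000000."""
    r = (cram >> 1) & 7
    g = (cram >> 5) & 7
    b = (cram >> 9) & 7
    return (expand3to8(r) << 24) | (expand3to8(g) << 16) | (expand3to8(b) << 8)


# Index with (cram >> 1): 11 bits -> 2048 entries of 4 bytes each = 8K.
# The always-zero bits of the CRAM word are ignored, so some entries are unused copies.
CRAM_TABLE = [cram_to_rgb24(index << 1) for index in range(2048)]
```

A lookup is then a single indexed read, e.g. `CRAM_TABLE[0x0E0E >> 1]` for magenta. Swapping `expand3to8` for a measured 8-entry curve would be the place to model the DAC nonlinearity.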
Unrelatedly, have not fixed the interrupt issue, but at least narrowed it down: it seems that for some reason it ends up doing a branch to an odd address, triggering an address error and getting stuck in that. That explains why it doesn't acknowledge any more interrupts, but how it gets into that situation and why it is timing-sensitive is still mystifying me.
Okay, I think I got it? I was reading the PC from the wrong address for RTE (incremented SP correctly after reading SR, but didn't move it into EA register again). Don't ask me how that is timing-sensitive though.
Since I'll be away for a couple days, I guess this is as good as any occasion to release... something.
Introducing MegaYume version alpha 0.0.0, the terrible emulator that is entirely borken! Applause!
Not with that attitude
Features include:
- VGA graphic (just the one)
- USB keyboard input
- "Plays" Pong (it does not know the rules of Pong)
- bugs
- non-functioning spaghetti code 68000 emulator which is now called "MotoKore" because I had to figure out what mk_ actually stands for, lol. Just claim it's a mortal kombat reference or smth idk idgaf.
- LED debugging nonsense on pins 38/39 that I was too lazy to edit out
To build, use the included build.sh (or read it and just do the same thing on the terminal) and then load megayume_lower.binary. Needs a reasonably recent flexspin, but for a change from my usual releases, I don't think you need the bleeding edge version.
Keyboard is mapped as follows:
Enter -> START
X -> A
C -> B
V -> C
Directions are on arrow keys
There's also a few other contrived test ROMs bundled in - just scroll to the bottom of megayume_lower.spin2 and you'll find where the ROM file is included.
Ok, so to make the paddle not be.. like that, change the branch in line 193 from mk_cmp_common to mk_sub_common. Something something confusing parameter order.
"But Ada, what does MOVE USP have to do with the aforementioned problem?" you say. Well...
As you may be able to tell, there's two mistakes here:
direction test is wrong (should test bit 3, but tests bit 0 xor bit 1 instead)
Address registers are indexed starting from A7
So what is supposed to be a MOVE A6,USP is interpreted as MOVE USP,A13. Now, A13 doesn't exist, so it writes the USP into the jump table entry for opcodes $3xxx (MOVE.W) instead. The USP is zero, so it writes zero.
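The two mistakes can be reproduced in a few lines of Python. The correct decoder follows the 68000 encoding of MOVE USP ($4E60 plus a direction bit in bit 3 and the register number in bits 2..0). The buggy decoder is only a reconstruction from the description above, not the original PASM2, and both function names are invented.

```python
def decode_move_usp(opcode: int) -> str:
    """Correct decode: bit 3 = direction, bits 2..0 = address register."""
    assert opcode & 0xFFF0 == 0x4E60, "not a MOVE USP opcode"
    reg = opcode & 7
    usp_to_an = bool(opcode & 0b1000)
    return f"MOVE USP,A{reg}" if usp_to_an else f"MOVE A{reg},USP"


def decode_move_usp_buggy(opcode: int) -> str:
    """Reconstruction of the bug: direction taken from bit 0 xor bit 1,
    and the register index counted from A7 instead of A0."""
    reg = 7 + (opcode & 7)
    usp_to_an = bool((opcode ^ (opcode >> 1)) & 1)
    return f"MOVE USP,A{reg}" if usp_to_an else f"MOVE A{reg},USP"
```

Feeding $4E66 to both gives "MOVE A6,USP" from the correct version and "MOVE USP,A13" from the buggy one, matching what happened.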
Now, the next time a MOVE.W executes, the contents of D0 onwards will be executed as P2 instructions. D0 is zero (-> NOP), but D1 contains $0000FFFF which disassembles to _RET_ ROL $07F,INB. INB on my setup happens to be $FFFF_FF30, so it rotates the instruction at $07F by 16 bits and returns. As seen above, that instruction is the tailcall through mk_writef for MOVE.B instructions. But after the corruption it isn't a branch anymore, so execution falls through into the MOVE.W implementation, where an address error is triggered due to an odd address. This leads to an infinite loop of address errors, because people writing these test ROMs don't seem to realize that you can't simply RTE from a bus or address error due to the special stack frame (will end up branching to the fault address, which in case of address error will always be odd, causing another address error in an infinite loop (until you run the SP out of RAM))
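To see where the $07F and INB operands come from, the P2 instruction word can be split into its fields (EEEE OOOOOOO CZI DDDDDDDDD SSSSSSSSS). The Python sketch below only does the field extraction and the rotate-count arithmetic; the helper names are made up, and it does not try to be a disassembler.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit P2 instruction word into its bit fields."""
    return {
        "cond": (instr >> 28) & 0xF,     # %0000 is the _RET_ condition
        "opcode": (instr >> 21) & 0x7F,
        "czi": (instr >> 18) & 0x7,
        "d": (instr >> 9) & 0x1FF,       # destination register address
        "s": instr & 0x1FF,              # source register address ($1FF = INB)
    }


def rotate_count(s_value: int) -> int:
    """Rotate instructions only use the low 5 bits of the source value."""
    return s_value & 0x1F


def rotate16(value: int) -> int:
    """A 32-bit rotate by 16 swaps the two halves, whichever direction it goes."""
    return ((value << 16) | (value >> 16)) & 0xFFFFFFFF


fields = decode_fields(0x0000FFFF)
count = rotate_count(0xFFFFFF30)
```

Here `fields["d"]` comes out as $07F, `fields["s"]` as $1FF and `count` as 16, so the long at $07F gets its upper and lower words swapped, which is enough to turn the tailcall into something that is no longer a branch.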
Changes:
- fixed aforementioned bugs
- removed pin 38/39 led stuff
- included some more ROMs (check out wintest.bin and sampler.bin for some interactivity)
So, I've
a) started implementing DMA
b) tried some more ROMs
So, it transpires that Flicky actually already sorta does something. The SEGA logo, title screen and instruction screen work, but in-game the map is corrupted and it hangs immediately after Flicky comes out of the door. Oddly enough, the newer version with the DMA stuff seems to work less (no roof and floor, immediate crash), even if I disable it... odd.
So, it seems that shifting the VDP I/O code around in hub seems to affect which of the two results (immediate crash, no roof/floor/score vs. crash after entry animation) happens. Oh, this is going to be another painful one, isn't it?
Comments
Mind you, there's a significant amount of hub code. Only what's really needed is in cog/lut right now.
Yeah, that looks doable.
Ok let me know when you need a PSRAM driver. I do intend to get this out before xmas and take a little break.
Any chance of seeing the 68K emulator PASM2 code?
When it is done and working.
Well, after tracking down a stupid issue wherein I fumbled TEST and TESTB....
BEHOLD
Well done. All coming together now, and the external memory driver is now available for you too when you need it.
Cool register description...
In other words, owie ouch the interrupts hurt my head
In particular, I'm at the point where this simple and contrived program works.... BUT ONLY IN DEBUG MODE???? And without DEBUG, it's solid blue??????
Something something timing, I fear. Or maybe the debugger messes stuff up.
Yea, table is what I went with.
Great, I'll have a browse through this, maybe I can port it to my Voyager board too for fun.
Ok, so.... can you spot the problem(s)?
To that, a new ZIP.
So well, I'm back.