TMS9918 Renderer Example

rogloh · 2021-04-12 08:34

@macca said: Don't we have an instruction to duplicate bits (0101 -> 00110011)?

setword data, data, #1
mergew  data
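To see why this pair works, here is a minimal host-side model in Python, not P2 code. It assumes SETWORD copies the low word of S into word N of D, and MERGEW interleaves the high word (odd bit positions) with the low word (even bit positions). The function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def setword(d, s, n):
    """Model of SETWORD D,S,#N: copy the low word of s into word n of d."""
    shift = 16 * (n & 1)
    cleared = d & ~(0xFFFF << shift)
    return (cleared | ((s & 0xFFFF) << shift)) & MASK32

def mergew(d):
    """Model of MERGEW D: result = {D[31], D[15], D[30], D[14], ..., D[16], D[0]}."""
    hi = (d >> 16) & 0xFFFF
    lo = d & 0xFFFF
    out = 0
    for i in range(16):
        out |= ((lo >> i) & 1) << (2 * i)      # low-word bit lands on an even position
        out |= ((hi >> i) & 1) << (2 * i + 1)  # high-word bit lands on the odd position above it
    return out

def duplicate_bits16(data):
    """setword data,data,#1 then mergew data: each of the low 16 bits appears twice."""
    data = setword(data, data, 1)  # both words now hold the same 16 bits
    return mergew(data)

print(format(duplicate_bits16(0b0101), "08b"))  # 00110011
```

Because both words are identical after SETWORD, every interleaved pair holds the same bit twice, which is the 0101 -> 00110011 behaviour asked for.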

macca · 2021-04-12 08:58

@rogloh said:

@macca said: Don't we have an instruction to duplicate bits (0101 -> 00110011)?
setword data, data, #1
mergew  data

Ah, so it takes two instructions to duplicate bits ...

Thanks, that solves a lot of problems!

TonyB_ · 2021-04-12 10:42

@macca said:
@rogloh said:

@macca said: Don't we have an instruction to duplicate bits (0101 -> 00110011)?
setword data, data, #1
mergew  data
Ah, so it takes two instructions to duplicate bits ...

Thanks, that solves a lot of problems!

MOVBYTS can also be used:

'duplicate bits in bytes 1 & 0

        movbyts x,#%%1010
        mergew  x

'duplicate bits in bytes 3 & 2

        movbyts x,#%%3232
        mergew  x
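A rough host-side model in Python (not P2 code) of the two MOVBYTS variants. It assumes MOVBYTS picks each result byte using a 2-bit field of the selector, with %%1010 read as the base-4 digits 1,0,1,0 from byte 3 down to byte 0, and that MERGEW interleaves the two words. The helper names are hypothetical.

```python
def movbyts(d, selector):
    """Model of MOVBYTS D,#S: result byte i is source byte number S[2i+1:2i]."""
    out = 0
    for i in range(4):
        src = (selector >> (2 * i)) & 3
        out |= ((d >> (8 * src)) & 0xFF) << (8 * i)
    return out

def mergew(d):
    """Model of MERGEW D: interleave high-word bits (odd) with low-word bits (even)."""
    hi, lo = (d >> 16) & 0xFFFF, d & 0xFFFF
    out = 0
    for i in range(16):
        out |= ((lo >> i) & 1) << (2 * i)
        out |= ((hi >> i) & 1) << (2 * i + 1)
    return out

def quad(digits):
    """Turn a %%-style base-4 literal such as "1010" into its integer value."""
    return int(digits, 4)

def duplicate_low_word(x):
    """movbyts x,#%%1010 then mergew x: duplicate the bits of bytes 1 and 0."""
    return mergew(movbyts(x, quad("1010")))

def duplicate_high_word(x):
    """movbyts x,#%%3232 then mergew x: duplicate the bits of bytes 3 and 2."""
    return mergew(movbyts(x, quad("3232")))
```

The selector copies the chosen word into both halves of the long, so the same MERGEW step then doubles every bit. The other word of x is discarded either way.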

macca · 2021-04-13 08:41

Several updates today:

Added an initial implementation of the sprite collision check. It needs more testing but seems to work so far. I have added a simple demo, tms9918_collide.spin2, with two square sprites moving up and down; when they collide, one sprite changes color.

Optimized sprite drawing a bit. The timing dropped dramatically to 5_354 cycles per line (with sprite collisions implemented).

Added a VGA driver, single-cog with interrupts. I hope I have not messed up the video syncs; it was tested only on my TV's VGA input.

TonyB_ · 2021-04-16 10:39

@macca, I think your sprite code is not 100% correct.

To discover which sprites are active, the VDP reads the first byte, y, of the Sprite Attribute Table (SAT) for each sprite in turn, requiring 32 separate VRAM reads. If y = D0H then sprite processing is terminated. Six VRAM read cycles are allocated for each of the possible four active sprites. The first four cycles read the sprite's SAT entry in full (y is read again), the fifth reads the first pattern byte and the sixth the second pattern byte (if 16x16).

Thus the VDP does not read the x or colour/early clock bytes for every sprite. If y matches the current line then the sprite is active even if it is entirely off the left side of the display or its colour is transparent.

The lowest-priority / highest-numbered sprite must be drawn first and the highest-priority / lowest-numbered sprite last. Is this what you do?

A quicker way to clear your collision buffer is to do a fast block read from hub RAM above 512K but below top 16K. Non-existent hub RAM has a read value of zero.
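The sprite scan and draw order described above can be sketched on the host side in Python. This is only a model under stated assumptions, not the renderer's code: the hit test ignores the chip's one-line y offset and magnification, sprites are drawn as solid 8-pixel spans instead of fetched patterns, the early clock bit is ignored, and all names are made up for illustration.

```python
END_OF_LIST = 0xD0   # y value that terminates sprite processing
MAX_ACTIVE = 4       # sprites that can be displayed on one line
WIDTH = 256

def active_sprites(sat, line, height=8):
    """Return the numbers of the sprites active on `line`, highest priority first.

    Only the y byte of each 4-byte SAT entry is examined, so a transparent or
    horizontally off-screen sprite still occupies one of the four slots.
    """
    active = []
    for n in range(32):
        y = sat[4 * n]
        if y == END_OF_LIST:
            break
        # Assumed hit test; the real chip's y offset is not modelled here.
        if ((line - y) & 0xFF) < height:
            if len(active) == MAX_ACTIVE:
                break            # a fifth sprite on the line is not displayed
            active.append(n)
    return active

def render_sprites(sat, line, backdrop=0):
    """Draw back to front so the lowest-numbered sprite ends up on top."""
    pixels = [backdrop] * WIDTH
    for n in reversed(active_sprites(sat, line)):
        x = sat[4 * n + 1]
        colour = sat[4 * n + 3] & 0x0F
        if colour == 0:
            continue             # transparent: active, but nothing is drawn
        for px in range(x, min(x + 8, WIDTH)):
            pixels[px] = colour
    return pixels
```

Note that the x and colour bytes are read only for sprites already chosen by their y byte, which mirrors the point that the VDP does not read them for every sprite.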

macca · 2021-04-16 11:07

@TonyB_ said:
To discover which sprites are active, the VDP reads the first byte, y, of the Sprite Attribute Table (SAT) for each sprite in turn, requiring 32 separate VRAM reads. If y = D0H then sprite processing is terminated. Six VRAM read cycles are allocated for each of the possible four active sprites. The first four cycles read the sprite's SAT entry in full (y is read again), the fifth reads the first pattern byte and the sixth the second pattern byte (if 16x16).

Thus the VDP does not read the x or colour/early clock bytes for every sprite. If y matches the current line then the sprite is active even if the sprite is entirely off the left side of the display or the sprite colour is transparent.

I need to check that.

The lowest-priority / highest-numbered sprite must be drawn first and the highest-priority / lowest-numbered sprite last. Is this what you do?

No, the priority check is correct but the drawing is reversed. I noticed that a few days ago when looking at the openMSX code. I have fixed the sprite draw code and now it matches an example I wrote with openMSX. It will be more complicated to fix in the P1 implementation.

A quicker way to clear your collision buffer is to do a fast block read from hub RAM above 512K but below top 16K. Non-existent hub RAM has a read value of zero.

Good, because I'm nearly running out of cog RAM with the composite driver. I managed to free some space, but every long counts right now.

macca · 2021-04-17 10:07

Archive updated with sprite draw fixes. Sprites should now be checked and drawn correctly, including transparent and off-screen ones.

TonyB_ · 2021-04-18 10:11

@macca said:
Archive updated with sprite draw fixes. Now sprites should be checked and drawn correctly, including transparent and off-screen.

It would be better to replace the P1-style CALLD with either CALL or CALLPA/PB.

Change this

                mov     sprt,vbuf+3    wz
    if_nz       calld   draw_sprite_ret, #draw_sprite
'...
                jmp     draw_sprite_ret
draw_sprite_ret long    0
'...
                mov     cy,#24
                calld   border_ret, #border
'...
                mov     cy,#10
                calld   blank_ret, #blank
'...
                djnz    cy,#blank
                jmp     blank_ret
blank_ret       long    0
'...
                djnz    cy,#border
                jmp     border_ret
border_ret      long    0

to this

                mov     sprt,vbuf+3    wz
    if_nz       call    #draw_sprite
'...
                ret
'...
                callpa  #24,#border     'or callpb
'...
                callpa  #10,#blank      'or callpb
'...
       _ret_    djnz    pa,#blank       'or pb
'...
       _ret_    djnz    pa,#border      'or pb

N.B. Not all changes shown.

macca · 2021-04-19 05:45

@TonyB_ said:

It would be better to replace the P1-style CALLD with either CALL or CALLPA/PB.

OK, that requires an explanation: I used the P1-style call because I had the impression that P2 calls would interfere with the interrupt, specifically with the resume instruction (resiN). I changed the code to use call/callpa as per your suggestion and it still works, but I would like to know why it works. AFAIK there is only one stack, so a call in the interrupt routine will corrupt the stack of the main program. I have an idea why it works in this specific case, but I want to know if there is something else I have not considered.

TonyB_ · 2021-04-19 09:11

@macca said:

@TonyB_ said:

It would be better to replace the P1-style CALLD with either CALL or CALLPA/PB.

OK, that requires an explanation: I used the P1-style call because I had the impression that P2 calls would interfere with the interrupt, specifically with the resume instruction (resiN). I changed the code to use call/callpa as per your suggestion and it still works, but I would like to know why it works. AFAIK there is only one stack, so a call in the interrupt routine will corrupt the stack of the main program. I have an idea why it works in this specific case, but I want to know if there is something else I have not considered.

The hardware stack is 8 longs deep.

macca · 2021-04-19 09:25

@TonyB_ said:
The hardware stack is 8 longs deep.

Yes, I know that, but what happens if:

1. Main is in a subroutine.
2. An interrupt occurs and the interrupt routine calls another subroutine.
3. The interrupt exits with resiN before returning from that subroutine.
4. Main returns from its subroutine.

I believe this will not work: the top of the stack will hold the interrupt subroutine's return address and not the main program's return address.

It works in this specific case because the line rendering is synchronized with the line output, so no calls are made from the interrupt routine while the main program is rendering a line. I believe that if the two were truly asynchronous it would crash sooner or later.

TonyB_ · 2021-04-19 09:46

The programmer must ensure an interrupt routine does not mess up the stack.

Incidentally, the stack is implemented as a single shift register and there is no stack pointer.

macca · 2021-04-19 10:09

@TonyB_ said:
The programmer must ensure an interrupt routine does not mess up the stack.

And that brings back the P1-style call, which is the only safe way to make subroutine calls when using the resi feature.
My deduction was correct after all.

evanh · 2021-04-19 12:52

Macca,
That's some reckless coding if intentional. At the very least POP the stack before the RETI so the CALL is then negated. Oh, it's a RESI. Ugh, why have the RESI in the subroutine at all? Return then do the RESI.

macca · 2021-04-19 15:35

Archive updated with the 5th sprite and vsync flags, and with the blank bit enabled.
Registers are updated at each scanline to be more in line with the original chip.
Updated the palette with what is used in openMSX.

With these changes I think the emulation is feature complete.

TonyB_ · 2021-04-20 09:37

@macca said:

@TonyB_ said:
The hardware stack is 8 longs deep.

Yes, I know that, but what happens if:

1. Main is in a subroutine.
2. An interrupt occurs and the interrupt routine calls another subroutine.
3. The interrupt exits with resiN before returning from that subroutine.
4. Main returns from its subroutine.

I believe this will not work: the top of the stack will hold the interrupt subroutine's return address and not the main program's return address.

It works in this specific case because the line rendering is synchronized with the line output, so no calls are made from the interrupt routine while the main program is rendering a line. I believe that if the two were truly asynchronous it would crash sooner or later.

The interrupt call and return mechanism does not use the hardware stack. Ideally an ISR would not do a CALL, but if it does, a POP between stages 2 and 3 would restore the stack.
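The four-stage scenario and the effect of the POP can be modelled on the host side in Python. This is a sketch under assumptions, not a description of the silicon: the stack is treated as an 8-entry shift register with no pointer, the value shifted in at the bottom on a pop is a guess, and the addresses and names are hypothetical.

```python
class HwStack:
    """8-level hardware stack modelled as a shift register (no stack pointer)."""

    def __init__(self):
        self.regs = [0] * 8

    def push(self, value):
        # CALL: everything shifts down one place and the oldest entry is lost.
        self.regs = [value] + self.regs[:7]

    def pop(self):
        # RET or POP: the top entry leaves and everything shifts up one place.
        value = self.regs[0]
        self.regs = self.regs[1:] + [self.regs[7]]  # bottom fill is an assumption
        return value

MAIN_RET = 0x100        # hypothetical return address pushed by main's CALL
ISR_HELPER_RET = 0x200  # hypothetical return address pushed by the ISR's CALL

def main_return_target(pop_before_resi):
    """Return the address main's RET jumps to after the interrupt has run."""
    stack = HwStack()
    stack.push(MAIN_RET)        # stage 1: main is in a subroutine
    # Interrupt entry does not touch the stack.
    stack.push(ISR_HELPER_RET)  # stage 2: the ISR calls another subroutine
    if pop_before_resi:
        stack.pop()             # POP between stages 2 and 3 discards the ISR's entry
    # Stage 3: resiN resumes main without executing the helper's RET.
    return stack.pop()          # stage 4: main returns from its subroutine

print(hex(main_return_target(False)))  # 0x200: main returns into the ISR helper
print(hex(main_return_target(True)))   # 0x100: main returns where it should
```

In this model the stack is only wrong when the ISR leaves an unmatched CALL behind. A single POP before the resume rebalances it, regardless of how main and the interrupt are synchronized.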

macca · 2021-04-20 10:23

@TonyB_ said:
The interrupt call and return mechanism does not use the hardware stack. Ideally an ISR would not do a CALL, but if it does, a POP between stages 2 and 3 would restore the stack.

OK, I'll try to be clearer: I'm not going to change the calls. Your suggestion may work because of the specific state of the code (read: synchronization between main and interrupt), but it is subject to random and hard-to-track failures if the conditions change. I'll stick with P1-style calls because in this context they are safer.
