Yeah, border colour and width per region would be handy, but that requires more double buffering of the information, because it gets used one scan line later than when it is updated and we can't clobber its working register. It means two extra longs and two more instructions to copy these values over, plus the extra adjustments to the streamer commands and their restoration would add more. Doing it once for the whole display uses far fewer COG resources.
I sort of originally envisaged the optional border being used for managing overscan-type problems and screen centering (where it would often be black, but could still be coloured according to the overall application preference/style, so the border needs to be active pixels), or for retro C64-type screens in text modes - though I think that platform was actually able to change the border colour per scanline, IIRC.
Well, I got JET Engine going on P2 (see the Projekt Menetekel thread - I even posted a little example there, but it's outdated; I actually have a really cool-looking example now and should probably post that along with the current JET code) and yep, the idea with the forever-looping regions works splendidly. I even implemented the high-res text feature from the PC version by toggling DOUBLE_HIGH.
I also have it generate all the necessary borders itself. That also helped with performance, since it now doesn't need to worry about not drawing outside the line buffer - there's plenty of space on either side that gets covered up with border color. Not that performance is a problem to begin with: it can handle many more sprites per line than the P1 version, yet in one cog instead of four.
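For anyone curious how that trick looks in practice, here's a minimal sketch of a line buffer with overdraw slack on both sides (the sizes are made up, not JET's actual numbers):

CON
    ACTIVE_PIX = 512                        ' visible pixels per scanline (illustrative value)
    GUARD_PIX  = 64                         ' overdraw slack either side, later covered by the border colour

VAR
    byte linebuf[GUARD_PIX + ACTIVE_PIX + GUARD_PIX]

PUB activeStart() : p
    ' sprite blitters write from here and may safely run a little past either edge of the active area
    return @linebuf + GUARD_PIX

The border fill then overwrites whatever spilled into the guard areas, so the blitter never needs a clip test.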
Just had a quick look at your other Project Menetekel thread which I'd not seen before, looks interesting @Wuerfel_21 . I'm glad my driver can be used for porting that to P2.
I'm working on a PSRAM driver right now (adaptation of my HyperRAM driver) and in theory this should allow video frame buffers to be read in applications that need it, in addition to the existing HyperRAM/HyperFlash capability we already have. The next P2 Edge may have PSRAM.
Do you think having access to more external RAM would work well for your game development with the P2, or do you plan to stick to SD-only solutions for more compatibility? My memory driver architecture allows any number of COGs to share a global external memory pool, with the video driver COG still getting priority access if it is sourcing an external frame buffer, if/when required. I know that for sprite drivers you don't really need full frame buffers stored in external RAM, but perhaps larger game assets can be stored there and rapidly read into hub RAM as needed, with simple SPIN2 APIs or directly via a mailbox setup. External RAM will be ideal for GUIs, and also for storing larger amounts of bulk information like high-quality wavetable audio for MIDI synths etc. I expect the 32 MB of HyperFlash will be especially nice for audio. It should be much faster to get audio from external memory than from SD, with much less latency, and it would be handy to be able to map a filesystem to the external memory for simple access of file data (which could be copied from SD initially at startup).
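As a rough illustration of the "read assets into hub RAM on demand" idea (the object name and read() call here are placeholders, not the actual driver API):

OBJ
    ram : "psram_driver"                    ' placeholder name only

VAR
    byte spriteSheet[16384]                 ' hub RAM staging area for one asset

PUB loadAsset(extAddr, bytes)
    ' hypothetical blocking copy of one asset from external RAM into hub RAM before it is needed
    ram.read(@spriteSheet, extAddr, bytes <# 16384)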
@rogloh said:
Just had a quick look at your other Project Menetekel thread which I'd not seen before,
Yeah, I have a hunch that people don't check the lower forum sections often...
Do you think having access to more external RAM would work well for your game development with the P2, or do you plan to stick to SD only solutions for more compatibility?
Both, really. I have ideas that work within 512k and ones that don't. Let's see which ones will be pursued.
I know for sprite drivers you don't really need full frame buffers stored in external RAM, but perhaps larger game assets can be stored there and rapidly read into hub RAM as needed with simple SPIN2 APIs or directly via mailbox setup.
Well, I have in fact been thinking about what the most ideal setup for rendering large assets into a framebuffer would be. I think it'd be like this:
Front buffer is stored in external RAM, back buffer in hub RAM
One cog processes draw commands of sorts (could be anything: scaled sprites, Doom-style walls/planes or full 3D) into a simpler form, which one or more cogs then pick up to do the low-level pixel pushing.
That command processor also manages a texture cache. When pushing a low-level command onto the queue, it requests the necessary textures. By the time the drawing cog(s) gets to the command, the texture should ideally already be loaded. If not, it'd have to block.
I don't think you actually have a facility for the latter yet. It'd need some way of submitting commands without having to wait for completion (ring buffer?) and some way of knowing when the requests complete (I think being able to give a pointer to a byte that is decremented when the command completes would be ideal)
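To make that concrete, here's a minimal hub RAM sketch of the kind of ring buffer plus per-command "unfinished requests" byte being described (all names and the layout are made up, not tied to any existing driver):

CON
    QDEPTH = 16                             ' depth of the low-level draw command ring

VAR
    long cmdRing[QDEPTH*2]                  ' each entry: command word + pointer to its pending-count byte
    long head, tail                         ' monotonically increasing producer/consumer indices
    byte pending[QDEPTH]                    ' outstanding texture loads per queued command

PUB queueCommand(cmd, textures) : ok | slot
    ' submit a draw command without blocking; textures = number of cache fills requested for it
    if (head - tail) == QDEPTH              ' ring full
        return false
    slot := head +// QDEPTH
    pending[slot] := textures               ' the memory driver would decrement this as each fill completes
    cmdRing[slot*2] := cmd
    cmdRing[slot*2+1] := @pending[slot]
    head++
    return true

PUB nextCommand() : cmd | slot
    ' drawing cog: take the oldest command, blocking only if its textures are still in flight
    repeat until head <> tail
    slot := tail +// QDEPTH
    repeat while pending[slot]              ' wait for the per-command counter to reach zero
    cmd := cmdRing[slot*2]
    tail++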
@Wuerfel_21 said:
I don't think you actually have a facility for the latter yet. It'd need some way of submitting commands without having to wait for completion (ring buffer?) and some way of knowing when the requests complete (I think being able to give a pointer to a byte that is decremented when the command completes would be ideal)
Well, I do have a facility for request lists, where a sequence of external memory transfer operations can be loaded into a list and sent to the driver; the memory driver COG will then operate on it in its own context while the initiating COG can move on to other things (it doesn't always need to block) - it will be notified at the end via a mailbox update or optionally also COGATN. The status of where the request list is currently up to in HUB RAM is also available as the list entries get processed. Perhaps this feature can be used for what you want?
@rogloh said:
Well, I do have a facility for request lists, where a sequence of external memory transfer operations can be loaded into a list and sent to the driver; the memory driver COG will then operate on it in its own context while the initiating COG can move on to other things (it doesn't always need to block) - it will be notified at the end via a mailbox update or optionally also COGATN. The status of where the request list is currently up to in HUB RAM is also available as the list entries get processed. Perhaps this feature can be used for what you want?
Just a COGATN seems like it's not enough, because that'd mean the cog making the requests needs to service an interrupt before the next memory request completes, figure out which low-level draw command that request belonged to, and then decrement that "unfinished requests" byte itself.
Ok. Some software experiments could be done to see if request lists will suit various application needs or not. Basically, in the non-blocking mode a requesting COG can poll the mailbox to determine when its request list is fully complete / where it is currently up to, or can wait for the COGATN with WAITATN/POLLATN or in an ISR.
It is pretty much up to the application to prepare its own lists of requests and incorporate whatever blocking/tracking logic is required to achieve the desired outcome. As it stands today I think it's going to be a rather useful general facility, maybe not for everything, but certainly in cases where it can free the COG to do more work in parallel while memory transfers happen in the background. You could have a lot of this pre-configured or controlled in interrupts, and hub/external memory will be copied in the background to your application - it's probably quite handy for streaming buffers, automatic timed transfers, data logging, GUI operation sequences etc.
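In usage-sketch form (startRequestList() and renderNextSlice() are made-up placeholders; POLLATN() is the standard Spin2 builtin), the non-blocking pattern would look roughly like this:

PUB overlapWork()
    startRequestList()                      ' placeholder: submit the prepared request list, non-blocking
    repeat
        renderNextSlice()                   ' placeholder: useful work done while transfers run in the background
        if POLLATN()                        ' memory driver strobes COGATN when the whole list has completed
            quit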
Here's a nice NTSC timing I've come up with. It uses the same 32/10*3.579 MHz pixel clock as JET Engine on P1, so slightly wide pixels. I had to add an extra blanking line to get the phase to toggle. Maybe the h blanking is a bit low; I had to increase the active width to get around that one weird bug. I haven't tried it on my CRT TV yet (which presumably would actually require good blanking - OTOH, it is a "100Hz" field-doubling one, so maybe not). Also, I think the NTSC output may actually be too saturated. Or maybe I'm just used to the low saturation from the P1.
prog240_timing  '576x240p timing @ 60Hz with 11.4545MHz pixel clock (32/10 NTSC)
    long $01553AFB                          ' 80x NTSC colorburst (286 MHz)
    long $11118FF4                          ' ^^
    '      _HSyncPolarity___FrontPorch__SyncWidth___BackPorch__Columns
    '          1 bit          7 bits      8 bits     8 bits     8 bits
    long (video.SYNC_NEG<<31) | ((24)<<24) | ( 60<<16) | ( (52+16)<<8 ) | ((512+64)/8)
    '      _VSyncPolarity___FrontPorch__SyncWidth___BackPorch__Visible
    '          1 bit          8 bits      3 bits     9 bits    11 bits
    long (video.SYNC_NEG<<31) | ( 0<<23) | ( 0<<20) | ( 14<<11) | 240
    long 25<<8
    long (8<<24) + (36<<16)
    long round(1.0/80.0 * 4294967296.0)
Will try to figure out a similar PAL60 timing next. P1 JET slightly squishes the image when going PAL (32/12*4.433 MHz), I want to avoid that, so I'll have to figure out a pixel clock that's close enough to the NTSC one while also resolving to the correct number of color cycles per line.
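For reference, the two P1 JET pixel clocks compared as plain constant arithmetic (using the exact burst frequencies; only to illustrate why the 32/12 PAL clock squishes):

CON
    NTSC_BURST_HZ = 3_579_545
    PAL_BURST_HZ  = 4_433_619
    NTSC_PIX_HZ   = NTSC_BURST_HZ * 32 / 10 ' 11_454_544 Hz - the clock used in the timing above
    PAL_PIX_HZ    = PAL_BURST_HZ * 32 / 12  ' 11_822_984 Hz - about 3.2% faster, hence narrower pixels on P1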
Oh yes, there's certainly something very very wrong about the NTSC and PAL colorspace settings. Have not yet found anything that looks good on both my TV and capture card.
I should also note that @cgracey 's driver is also wrong and is very blown out on my capture card (oddly enough, only on the S-Video input), but fine on my TV.
Somehow, I ended up with some settings that seem to satisfy every device in plugging-in range. That being my big Philips CRT TV, my small portable CRT TV, my cheapie Chinese LCD and my tuner/capture card.
Relatedly, here's the PAL 576x240 timing. The fractions don't quite work out, but the quality is still pretty good.
@Wuerfel_21 said:
Oh yes, there's certainly something very very wrong about the NTSC and PAL colorspace settings. Have not yet found anything that looks good on both my TV and capture card.
I would agree my NTSC colours seem oversaturated, and in comparison PAL looks undersaturated and anemic. I took some of this originally from Chip's earlier code, but I then modified it further when looking at colour bar amplitudes on a simple oscilloscope. I think Chip now has more saturated PAL in his latest dot-crawl-reduction driver values.
Ideally someone with good PAL colour knowledge and test equipment (like a vectorscope?) could play about further to tune it right up to optimal; this stuff is tricky and doing it by eye is not the way to go. I'll have a look to see what your values look like later today @Wuerfel_21 .
Yeah, I'm just eyeballing it with multiple displays connected. They all have different strange quirks.
Ok, here's some further refinement on the NTSC coefficients. They needed to be attenuated a little bit. The blanking levels also needed tweaking to make the S-Video input on my capture card happy (it seems to be very picky about S-Video levels in particular)
Roger,
Just tried to compile your 0.92b helloworld demo sources using Flexspin and got the following:
p2textdrv.spin2:112: error: request for member REGION_SIZE in something not an object
p2textdrv.spin2:112: error: request for member REGION_SIZE in something not an object
p2textdrv.spin2:112: error: request for member REGION_SIZE in something not an object
p2textdrv.spin2:113: error: request for member DISPLAY_SIZE in something not an object
p2textdrv.spin2:113: error: request for member DISPLAY_SIZE in something not an object
p2textdrv.spin2:113: error: request for member DISPLAY_SIZE in something not an object
p2textdrv.spin2:114: error: request for member CONTEXT_SIZE in something not an object
p2textdrv.spin2:114: error: request for member CONTEXT_SIZE in something not an object
p2textdrv.spin2:114: error: request for member CONTEXT_SIZE in something not an object
p2textdrv.spin2:118: error: request for member TIMING_SIZE in something not an object
p2textdrv.spin2:118: error: request for member TIMING_SIZE in something not an object
p2textdrv.spin2:118: error: request for member TIMING_SIZE in something not an object
It works in Pnut so I presume this is some Spin compatibility issue. My Spin foo is near non-existent, so I don't know how to resolve it.
What version of flexspin are you using?
I just tried 5.0.8 and it worked ok. I'll need to try another newer version to reproduce it, or perhaps you will need to resolve it.
Ok @evanh I just downloaded the latest and tried the code in 5.3.1, and I was able to reproduce the error. Looks like something changed in flexspin that doesn't like constants being brought in from included objects that are declared later; this used to work before, but it was probably bad practice.
If you swap the order of the OBJ and VAR declarations then it seems to fix it.
So instead of this:
VAR
    long region[video.REGION_SIZE/4]        ' text region structure
    long display[video.DISPLAY_SIZE/4]      ' display structure
    byte context[video.CONTEXT_SIZE]        ' context data for text region
    byte lineBuffers[LINEBUFSIZE*2]         ' space for two line buffers
    word screenbuf[SCREENSIZE]              ' screen buffer size
    long palette[16]                        ' 16 colour palette
    long custom[video.TIMING_SIZE/4]        ' custom timing

OBJ
    video: "p2videodrv"                     ' underlying video driver
do this
OBJ
    video: "p2videodrv"                     ' underlying video driver

VAR
    long region[video.REGION_SIZE/4]        ' text region structure
    long display[video.DISPLAY_SIZE/4]      ' display structure
    byte context[video.CONTEXT_SIZE]        ' context data for text region
    byte lineBuffers[LINEBUFSIZE*2]         ' space for two line buffers
    word screenbuf[SCREENSIZE]              ' screen buffer size
    long palette[16]                        ' 16 colour palette
    long custom[video.TIMING_SIZE/4]        ' custom timing
I've updated the top post to include v0.93 of this driver with this issue fixed.
@"Ken Gracey" you may wish to update your QuickBytes link to this newer version of this zipfile so others don't also encounter this issue.
I think someone some time back mentioned that they thought the automatic PLL calculations in my video driver were not working right. Might have been @evanh or @Wuerfel_21, but I'm not sure anymore who it was.
In any case, today I spent a little while seeing if I could find any problems there. I created a test program to exercise the relevant computeClockMode function that determines the PLL settings for a given frequency, iterated through desired P2 clock frequencies from 4MHz to 350MHz, and printed the computed values out. Certainly with a 20 MHz crystal fitted, all integer MHz frequencies appear achievable without any frequency errors.
I do know there will be some tolerance errors introduced for certain crystal frequencies and desired frequencies when there are no multiplier & divisor values available that can provide the precise frequency (which can't be overcome), but my code should work with that and still give you a close value, within the nominated tolerance, wherever it can.
So until someone can find something specifically wrong with this code and provide a case where it computes the wrong PLL settings, for now I think I'll just have to assume it is still working okay.
Note: the PLL calculating code does return 0 in the case where it could not find a valid PLL setting meeting the nominated frequency tolerance. This zero value does need to be checked for by the caller rather than simply using the zero value as the PLL clock mode. Perhaps that was the issue encountered? There is a comment in my driver also indicating this situation, but it is somewhat buried. I could potentially define a new error code in the initDisplay code that could be checked in case this occurs... but it should not be possible with the default 20MHz crystal and 500kHz tolerance at least.
' (c) If the new P2 frequency is non-zero and the clock mode is 0, an attempt is made to auto-configure
' the PLL based on the specified new frequency. The crystal or input clock frequency are required to
' be specified as well as the tolerance in Hz. The closest PLL settings are computed based on these
' criteria and will be used in CLKSET. If the tolerance is not met then no PLL timing will be changed.
' Take that into consideration when setting up the tolerance and don't set values that are unachievable.
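For callers, a guard along these lines is all that's needed (sketch only; trySetClock is a made-up wrapper around the same computeClockMode function shown in the test program below):

PUB trySetClock(desiredHz) : ok | pllmode
    pllmode := computeClockMode(desiredHz)
    if pllmode == 0
        return false                        ' nothing within TOLERANCE_HZ was found - leave the clock alone
    ' ...otherwise pllmode is safe to hand to the clock-switching code (CLKSET)...
    return true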
Here's the test program I used - you also need Eric's ers_fmt and SmartSerial objects for it to run over the serial port of a P2.
{{
PLL setup test
-------------
LICENSE TERMS
-------------
Copyright 2021 Roger Loh
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
}}
CON
    ' clock source used below
    #0, CLKSRC_XTAL, CLKSRC_XIN

    ' setup one of these based on your P2 HW input clock,
    ' this will only be used if the PLL settings get automatically computed (see code below)
    'CLKIN_HZ = _xtalfreq    ' also only enable CLKSRC_XTAL below as CLKSRC
    'CLKIN_HZ = _xinfreq     ' also only enable CLKSRC_XIN below as CLKSRC
    CLKIN_HZ = 20000000      ' assume 20MHz crystal by default

    CLKSRC = CLKSRC_XTAL     ' enable this for crystal clock source (default)
    'CLKSRC = CLKSRC_XIN     ' enable this for direct input clock source on XI (no crystal)

    ' parameters used when automatically determining PLL settings
    TOLERANCE_HZ = 500000    ' pixel clock accuracy will be constrained by this when no exact ratios are found
    MAXVCO_HZ = 350000000    ' for safety, but you could try to overclock even higher at your own risk
    MINVCO_HZ = 100000000
    MINPLLIN_HZ = 500000     ' setting lower can find more PLL ratios but may begin to introduce more PLL jitter

    ' serial port BAUD rate
    BAUD = 115200

    ' P2 clock frequency
    _clkfreq = 10000000

OBJ
    uart : "SmartSerial"
    f    : "ers_fmt"         ' useful formatting object

VAR
    long xtal
PUB plltest() | desired, esc, pll, s, p, m, d, c, freq
    ' setup serial port
    uart.start(BAUD)
    send:=@uart.tx
    send("PLL Test Tool, ESC aborts", f.nl())
    repeat
        send("Enter new P2 crystal frequency (1000000-200000000) Hz [",f.dec(CLKIN_HZ),"] : ")
        xtal, esc := getdec(CLKIN_HZ)
        if esc
            quit
        ' restrict range
        if xtal < 1000000 or xtal > 200000000
            send("Invalid frequency ignored",13,10)
            next                            ' skip the sweep and prompt again
        ' iterate over desired frequencies from 4-350MHz
        repeat desired from 4000000 to 350000000 step 1000000
            pll := computeClockMode(desired)
            send("desired = ", f.dec(desired), "Hz", 9," PLL = ", f.hexn(pll,8), " ", 9)
            'PLL clock mode format is this  #%0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS
            s:=pll & 3
            c:=(pll >> 2) & 3
            p:=(pll >> 4) & $f
            m:=(pll >> 8) & $3ff
            d:=(pll >> 18) & $3f
            freq:=(xtal / (d+1)) * (m+1)
            send(" vco = ", f.dec(freq), "Hz", 9)
            if p <> 15
                freq := freq / (2*(p+1))
            send("P2freq = ", f.dec(freq), "Hz", 9, " error = ", f.dec(abs(freq-desired)), "Hz", 9)
            send("s=",f.dec(s), ",")
            send("c=",f.dec(c), ",")
            send("p=",f.dec(p), ",")
            send("m=",f.dec(m), ",")
            send("d=",f.dec(d),10,13)
    send(13,10,255,0,0)
PRI getdec(default) : num, escape | ch, c
    c := 0
    repeat
        ch := uart.rx()
        if ch == 13
            if c==0
                num:=default
                f.dec(default)
            f.nl()
            return
        if ((ch == 8) or (ch == 127)) && c
            send(8,32,8)
            num:=num/10
            c--
        if ch == 27
            escape := 1
            return
        if ch < "0" or ch > "9" or c == 10
            next
        send(ch)
        num := num * 10 + ch-"0"
        c++
PRI computeClockMode(desiredHz) : mode | vco, finput, f, p, div, m, error, bestError
    bestError := -1
    repeat p from 0 to 30 step 2
        ' compute the ideal VCO frequency f at this value of P
        if p <> 0
            if desiredHz > MAXVCO_HZ/p      ' test it like this to not overflow
                quit
            f := desiredHz * p
        else
            f := desiredHz
            if f > MAXVCO_HZ
                quit
        ' scan through D values, and find best M, retain best case
        repeat div from 1 to 64
            'compute the PLL input frequency from the crystal through the divider
            finput := xtal/div
            if finput < MINPLLIN_HZ         ' input getting too low, and only gets lower so quit now
                quit
            ' determine M value needed for this ideal VCO frequency and input frequency
            m := f / finput
            ' check for the out of divider range case
            if m +> 1024
                quit
            ' zero is special and gets a second chance
            if m == 0
                m++
            ' compute the actual VCO frequency at this particular M, D setting
            vco := finput * m
            if vco +< MINVCO_HZ
                quit
            if vco +> MAXVCO_HZ
                next
            ' compute the error and check next higher M value if possible, it may be closer
            error := abs(f - vco)
            if m < 1024 and (vco + finput) +< MAXVCO_HZ
                if error > abs(f - (vco + finput))
                    error := abs(f - (vco + finput))
                    m++
            ' retain best allowed frequency error and divider bits found so far
            if error +< bestError and error +< TOLERANCE_HZ+1
                bestError := error
                mode := ((div-1) << 18) + ((m-1) << 8) + (((p/2 - 1) & $f) << 4)
            ' quit whenever perfect match found
            if bestError == 0
                quit
        if bestError == 0
            quit
    ' final clock mode format is this  #%0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS
    if mode
        ' also set 15 or 30pF capacitor loading based on input crystal frequency
        mode |= (1<<24)                     ' enable PLL
        if (CLKSRC == CLKSRC_XTAL)          ' enable oscillator and caps for crystal
            mode |= (xtal < 16000000) ? %1111 : %1011
        else
            mode |= %0111                   ' don't enable oscillator
Wasn't me, I never got round to trying it. I struggled with using Spin's structure and constants handling in your p2textdrv.spin2 to open a custom screen mode.
Running your test program, I do see one problem. MAXVCO_HZ is doing two jobs that need two limits. One job is simply limiting the top allowable sys-clock frequency.
The other job is above where DIVP is always 1 (%PPPP = %1111). You're generating VCO frequencies up to 360 MHz with sys-clock frequencies much lower. I'd limit that to maybe VCO of 250 MHz. Chip has a limit of 201 MHz in his code.
There's another limit Chip has imposed on the PLL: it's the crystal frequency divided by DIVD, named Fpfd. The limit imposed is 500 kHz. For a 20 MHz crystal, that limits DIVD to a max of 40. Err, that's weird... I got the limit wrong. It's 250 kHz, therefore max DIVD is 80, so not a limit for 20 MHz crystals.
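One way to express that two-limit idea (constant names made up; the 250 MHz figure is just the "maybe" value above) would be to pick the VCO ceiling per P value inside the search:

CON
    MAX_SYSCLK_HZ = 350_000_000             ' ceiling for the DIVP = %1111 case, where the VCO is the sys-clock
    SAFEVCO_HZ    = 250_000_000             ' separate, lower ceiling whenever a post-divider is in use

PRI vcoLimit(p) : limit
    ' p = 0 encodes DIVP = %1111 (no post divide) in computeClockMode's search loop
    return (p == 0) ? MAX_SYSCLK_HZ : SAFEVCO_HZ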
@evanh said:
Running your test program, I do see one problem. MAXVCO_HZ is doing two jobs that need two limits. One job is simply limiting the top allowable sys-clock frequency.
The other job is above where DIVP is always 1 (%PPPP = %1111). You're generating VCO frequencies up to 360 MHz with sys-clock frequencies much lower. I'd limit that to maybe VCO of 250 MHz. Chip has a limit of 201 MHz in his code.
Well, to hit any sysclk frequencies over 200MHz you always need to use p=%1111. The VCO needs to be operated this high in these cases because you can't multiply the VCO frequency up, only optionally divide it down. My algorithm prioritizes lower p values first (treating p=%1111 as a special case that counts as the "lowest" value). So if an exact match is found it should have used the lowest p value, meaning the VCO is operated at its lowest frequency where there is a choice. Although if there is no perfect match but a lower tolerance error with a higher P value, it will use that, as it still aims for the lowest frequency error.
@evanh said:
There's another limit Chip has imposed on the PLL: it's the crystal frequency divided by DIVD, named Fpfd. The limit imposed is 500 kHz. For a 20 MHz crystal, that limits DIVD to a max of 40. Err, that's weird... I got the limit wrong. It's 250 kHz, therefore max DIVD is 80, so not a limit for 20 MHz crystals.
Yeah, I have this MINPLLIN_HZ setting which is conservatively defaulting to 500kHz right now instead of 250kHz. I guess I could reduce it further, which can help improve error tolerance. My only concern was increasing the PLL jitter more in that case.
@rogloh said:
Well, to hit any sysclk frequencies over 200MHz you always need to use p=%1111. The VCO needs to be operated this high in these cases because you can't multiply the VCO frequency up, only optionally divide it down. My algorithm prioritizes lower p values first (treating p=%1111 as a special case that counts as the "lowest" value). So if an exact match is found it should have used the lowest p value, meaning the VCO is operated at its lowest frequency where there is a choice. Although if there is no perfect match but a lower tolerance error with a higher P value, it will use that, as it still aims for the lowest frequency error.
Problem is, if a better match is found with DIVP > 1 then that gets used even if it produces a VCO frequency of 360 MHz.
@rogloh said:
Yeah, I have this MINPLLIN_HZ setting which is conservatively defaulting to 500kHz right now instead of 250kHz. I guess I could reduce it further, which can help improve error tolerance. My only concern was increasing the PLL jitter more in that case.
Oh, in that case, your limit isn't doing its job. I'm seeing DIVDs greater than 40. Doh! No, the log file goes right up to 40, that one is good.
It should be 350MHz, not 360MHz. In some P2 document I read that the VCO can be operated in the range from 100-350MHz.
Where? What values were you using in the test to see this?
Oops, yes, 350 MHz. Didn't double check the detail. Still, that's far from ideal.
I guess the current algorithm doesn't have a way to trade off accepting a higher frequency error in exchange for operating at a lower VCO frequency when there is no perfect parameter match for the desired pixel frequency. It just looks for the set of PLL parameters that comes closest to the desired pixel frequency within the allowed tolerance and uses those (and that may include running the VCO at a higher frequency).
I think I've found the right fix. The MAXVCO_HZ here needs to be reduced to the safer VCO frequency.
Yeah that may help. I'll look at it more tomorrow.
Hehe, been playing around and run out of multiplier!
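On the "run out of multiplier" point, a plausible reading in numbers (assuming the 20 MHz crystal and the 250 kHz Fpfd floor mentioned earlier): the 10-bit multiplier field caps what the VCO can reach once DIVD gets large.

CON
    XTAL_HZ  = 20_000_000
    DIVD_MAX = 80                           ' DIVD that puts Fpfd right at the 250 kHz floor
    FPFD_HZ  = XTAL_HZ / DIVD_MAX           ' 250_000 Hz
    VCO_TOP  = FPFD_HZ * 1024               ' 256_000_000 Hz - the multiplier maxes out well below 350 MHz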