NTSC Spiral Demo - Now with HDMI and VGA output
cgracey
Posts: 14,232
Here is a short program that displays a rotating spiral on an NTSC monitor via P16 (pin can be changed).
It uses the CORDIC for cartesian-to-polar conversion and the RGBI8 streamer mode for RGB select with 5-bit intensity.
An MP4 is below.
Here is the code, NTSC driver first, then spiral program at the end:
'**********************
'* NTSC Spiral Demo *
'**********************

DAT             org

                hubset  ##%1_000001_0000011000_1111_10_00       'config PLL, 20MHz/2*25*1 = 250MHz
                waitx   ##20_000_000 / 200                      'allow crystal+PLL 5ms to stabilize
                hubset  ##%1_000001_0000011000_1111_10_11       'switch to PLL

                coginit #1,##@pgm_ntsc                          'launch video cog
                coginit #0,##@pgm_bmap                          'launch bitmap cog


'*********************************
'* NTSC 256 x 192 x 8bpp rgbi8 *
'*********************************

CON

  f_color       = 3_579_545.0           'colorburst frequency
  f_scanline    = f_color / 227.5       'scanline frequency
  f_pixel       = f_scanline * 400.0    'pixel frequency for 400 pixels per scanline

  f_clock       = 250_000_000.0         'clock frequency

  f_xfr         = f_pixel / f_clock * float($7FFF_FFFF)
  f_csc         = f_color / f_clock * float($7FFF_FFFF) * 2.0

  s             = 84                    'scale DAC output (s = 0..128)
  r             = s * 1000 / 1646       'precompensate for modulator expansion of 1.646

  mody          = ((+38*s/128) & $FF) << 24 + ((+75*s/128) & $FF) << 16 + ((+15*s/128) & $FF) << 8 + (110*s/128 & $FF)
  modi          = ((+76*r/128) & $FF) << 24 + ((-35*r/128) & $FF) << 16 + ((-41*r/128) & $FF) << 8 + (100*s/128 & $FF)
  modq          = ((+27*r/128) & $FF) << 24 + ((-67*r/128) & $FF) << 16 + ((+40*r/128) & $FF) << 8 + 128

  video_pin     = 16
  ntsc_map      = $1000

DAT             org

' Setup

pgm_ntsc        rdfast  ##256*192/64,##ntsc_map 'set rdfast to wrap on bitmap

                setxfrq ##round(f_xfr)          'set transfer frequency
                setcfrq ##round(f_csc)          'set colorspace converter frequency

                setcy   ##mody                  'set colorspace converter coefficients
                setci   ##modi
                setcq   ##modq

                setcmod #%11_1_0000             'set colorspace converter to YIQ mode (composite)

                cogid   .x                      'enable dac mode in pin
                setnib  .dacmode,.x,#2
                wrpin   .dacmode,#video_pin
                drvl    #video_pin

' Field loop

.field          mov     .x,#35                  'top blanks
                call    #.blank

                mov     .x,#192                 'set visible lines
.line           call    #.hsync                 'do horizontal sync
                xcont   .m_rf,#0                'visible line
                xcont   .m_av,#1                'after visible spacer
                djnz    .x,#.line               'another line?

                mov     .x,#27                  'bottom blanks
                call    #.blank

                mov     .x,#6                   'high vertical syncs
.vlow           xcont   .m_hl,#2
                xcont   .m_hh,#1
                djnz    .x,#.vlow

                mov     .x,#6                   'low vertical syncs
.vhigh          xcont   .m_ll,#2
                xcont   .m_lh,#1
                djnz    .x,#.vhigh

                mov     .x,#6                   'high vertical syncs
.vlow2          xcont   .m_hl,#2
                xcont   .m_hh,#1
                djnz    .x,#.vlow2

                jmp     #.field                 'loop

' Subroutines

.blank          call    #.hsync                 'blank lines
                xcont   .m_vi,#0
                xcont   .m_av,#1
        _ret_   djnz    .x,#.blank

.hsync          xcont   .m_sn,#2                'horizontal sync
                xcont   .m_bc,#1
                xcont   .m_cb,.c_cb
        _ret_   xcont   .m_ac,#1

' Data

.dacmode        long    %0000_0000_000_1011100000000_01_00000_0

.m_sn           long    $7F010000+29            'sync
.m_bc           long    $7F010000+7             'before colorburst
.m_cb           long    $7F010000+18            'colorburst
.m_ac           long    $7F010000+40            'after colorburst
.m_vi           long    $7F010000+256           'visible
.m_av           long    $7F010000+50            'after visible (400 total)

.m_rf           long    $BF030000+256           'visible rfbyte 8bpp rgbi8

.m_hl           long    $7F010000+15            'vertical sync high low
.m_hh           long    $7F010000+185           'vertical sync high high (200 total)

.m_ll           long    $7F010000+171           'vertical sync low low
.m_lh           long    $7F010000+29            'vertical sync low high (200 total)

.c_cb           long    $507000_01              'colorburst reference color

.x              res     1
.y              res     1


'**************************************
'* Make spirals in 256 x 192 bitmap *
'**************************************

                org

pgm_bmap        wrfast  ##256*192/64,##ntsc_map 'set wrfast to wrap on bitmap

.pixel          mov     .px,.x                  'translate (x,y) to (x-256/2,y-192/2)
                sub     .px,#256/2
                mov     .py,.y
                sub     .py,#192/2

                qvector .px,.py                 'convert (x,y) to polar (rho,theta)
                getqx   .px
                getqy   .py

                shr     .py,#32-9               'get 9 MSBs of theta
                add     .py,.px                 'add rho to twist it
                add     .py,.z                  'add z to slowly spin it

                mov     .px,.py                 'convert 6 LSBs to 5-bit up/down ramp
                test    .px,#$20        wc
        if_c    xor     .px,#$3F
                and     .px,#$1F

                shr     .py,#1                  'apply 3 MSBs to RGB bits
                and     .py,#$E0
                or      .px,.py

                wfbyte  .px                     'write rgbi8 pixel to bitmap

                incmod  .x,#256-1       wc      'step x
        if_c    incmod  .y,#192-1       wc      'step y
        if_c    add     .z,#1                   'step z

                jmp     #.pixel

.x              long    0
.y              long    0
.z              res     1
.px             res     1
.py             res     1
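For anyone who wants to follow the per-pixel math without a P2 on the bench, here is a rough Python model of the .pixel loop above. It is only a sketch: the function name spiral_pixel is made up for illustration, and the CORDIC is approximated with math.hypot and math.atan2 (rho as a rounded length, theta as a 32-bit fraction of a full turn), so it will not be bit-exact against the hardware.

```python
import math

def spiral_pixel(x, y, z, width=256, height=192):
    """Approximate model of the .pixel loop: returns one RGBI8 byte.

    Assumption: QVECTOR's results are modeled as a rounded Euclidean
    length (rho) and an angle scaled so a full turn is 2**32 (theta).
    """
    px = x - width // 2                      # translate to screen center
    py = y - height // 2
    rho = int(round(math.hypot(px, py)))
    theta = int(math.atan2(py, px) / (2 * math.pi) * 2**32) & 0xFFFFFFFF

    t = (theta >> (32 - 9)) + rho + z        # 9 MSBs of theta, twist, spin

    ramp = t & 0x3F                          # 6 LSBs -> 5-bit up/down ramp
    if ramp & 0x20:
        ramp ^= 0x3F
    ramp &= 0x1F

    color = (t >> 1) & 0xE0                  # next 3 bits -> RGB select bits
    return color | ramp

# Example: one full 256 x 192 frame for spin value z = 0
frame = bytes(spiral_pixel(x, y, 0) for y in range(192) for x in range(256))
```

Stepping z by one per frame, as the PASM does, shifts the whole pattern by one intensity step, which is what makes the spiral appear to rotate.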
Comments
I made three optimizations:
1) Overlapped the CORDIC instructions to get 16 pixels through at once
2) Used the LUT for fast lookup of RGBI8 pixel values
3) Unrolled the loop for 16 pixels at a time
Now, it's running 4.7 times as fast, so it's watchable.
The NTSC works fine, but I am not able to get any of your recent HDMI samples to run. I tried two different cables and two different Visio monitors.
Signal not detected.
I have no 640x480, but 800x600 works just fine.
Adjust the kaleidoscope constant for interesting effects. 22 is also good.
This is a fun little demo, Chip!
Here you go:
It's using the streamer's RGBI8 mode, which is a byte per pixel. The top 3 bits select the color and the lower 5 bits select the intensity. No palette needed.
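To make the byte layout concrete, here is a small Python sketch that packs and unpacks the two fields described above. The helper names rgbi8_pack and rgbi8_unpack are hypothetical, and the sketch deliberately does not say which actual color each 3-bit select value produces; check the P2 documentation for that.

```python
def rgbi8_pack(select, intensity):
    """Build one RGBI8 byte: 3-bit color select on top, 5-bit intensity below."""
    if not 0 <= select <= 7:
        raise ValueError("select must be 0..7")
    if not 0 <= intensity <= 31:
        raise ValueError("intensity must be 0..31")
    return (select << 5) | intensity

def rgbi8_unpack(pixel):
    """Split one RGBI8 byte into (select, intensity)."""
    return (pixel >> 5) & 0x7, pixel & 0x1F
```

For instance, rgbi8_unpack($A3) in Spin notation, i.e. rgbi8_unpack(0xA3) here, splits into select 5 and intensity 3.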
I just got my EVAL board four days ago. So I can finally move out of the peanut gallery and do some actual testing/coding/playing. Yay! But it will take some time for me to get up to speed (meaning a slow walk, in my case).
In playing with the VGA version, I noticed that the LED for P56 rapidly flickers, which it doesn't do when running the NTSC version. I haven't tried to figure out why (and likely couldn't if I did), but I assume that it's some kind of "artifact" from the code. I did comment out the launch of the bitmap cog (Cog 0) and I didn't see any activity on any of the LEDs when I did that, so I assume that something in the code to drive the spiral is somehow affecting P56.
A funny thing happened when I commented out the coginit line for the bitmap cog and recompiled and ran it. The colorful spiral image appeared on the screen in all its beauty. And at first, I thought it was moving, but moving very slowly. So I thought, "What? How can that be? I killed the cog!" But the perception of movement was just my eyes playing tricks on me from having stared too much at the moving spiral.
Then I thought to myself, "But wait a second. Why is the spiral there at all, moving or not?" Well, I hadn't powered down between launches of the moving version and the static version. So, apparently, the hub RAM doesn't get cleared on a reset. I may have read that somewhere back during my peanut gallery days, but it caught me by surprise.
Anyway, I manually hit reset on the board and then re-ran the static version to see if the screen would be black. But it actually displayed a somewhat ghostly-looking version of the screen that shows up from a program that I stored in the flash memory. So, that program actually briefly loaded and put some data in the hub, which was then retained when I quickly relaunched the static spiral version. It makes sense, but it's interesting.
I then set the FLASH dip switch to off (such that my program in flash would not run on reset) to see what would happen. And rather than a black screen, I got a screen filled with random "static," kind of like white noise. So that's interesting, too.
I then re-enabled the bitmap cog (Cog 0) such that it would put up the spiral and fill the corresponding area in the hub. I then commented out the bitmap cog again and cut the power ever so briefly and reloaded the board. I was trying to see if the hub SRAM elements might retain any data, but they didn't (I got snow/static again). I believe that I recall reading that data in a DRAM (oddly enough with its refreshing needs) can persist for several seconds.
Moving on, I modified the original VGA version to allow me to pass in the video base_pin as a parameter (using setq before coginit). I then launched seven instances of the VGA program with separate base pins (0, 8, 16, 24, 32, 40 and 48), such that all eight cogs were running (with the bitmap cog occupying Cog 0). I wanted to see how hot the chip got to the touch. And it did get warm but not hot, much to my (pleasant) surprise, especially with the clock frequency set to 320 MHz for the VGA version (instead of 250 MHz for the NTSC one).
Unfortunately, I only made up a single VGA adapter board to plug into the headers, so I could only look at one cog's output at a time, but I moved the adapter around to confirm that all cogs were outputting on their respective headers (though I cut the power between each move of the adapter board just to be a bit safer when moving it around). Anyway, that pretty much confirms that all the cogs are running concurrently, with Cogs 1..7 all pulling data from the same bitmap cog (Cog 0).
I wouldn't be surprised if the bitmap cog is using the most power, as it uses the CORDIC. Then again, the VGA cogs do have to access the memory slices at a pretty good clip. I haven't wrapped my head around RDFAST entirely yet, so I don't know if a video cog can "rest" a bit with the streamer handling video data transfers in the background. So, I'm not really sure how a video cog's power consumption compares to that of the bitmap cog.
But I did monkey around with the fclk constant to try lower values than 320 MHz. It seemed to work for everything that I tried down to 243 MHz. The monitor sometimes had to shift the image a couple of times as it tried to lock on to the signals, but it took everything that I threw at it down to 243 MHz. Below that, my little 7" monitor protested "not support" (without the "ed"). But that's pretty impressive.
By the way, things seem to run equally fast at 320 MHz as at 243 MHz. It took about 10 seconds for an arm of the spiral to go all the way around (though I didn't get out a stopwatch). There seems to be another limiting factor involved other than the fclk setting. Perhaps it's due to waiting for the cordic to finish in the bitmap cog. Anyway, I'm not clear on why Chip used 320 MHz for the VGA version (It's not even an even multiple of 25 MHz). But it seems to work well, which is to say that my monitor didn't need to shift the screen to lock on to it.
For all I know (which is practically nothing at this point) perhaps some change could be made in the video cog code to accommodate running at a lower frequency (without any shifting occurring when locking on to the signal). But even if so, eventually such lowering of the clock frequency would cut in to the rate at which the bitmap cog is able to rotate the spiral.
I do know from running Chip's (?) program to display a bitmap (said program being modified by rayman to run on the Eval B version), that I could take the clock frequency all the way down to 23 MHz (Yes, 2 MHz below 25 MHz, not 250 MHz) and the image (bird or whatever) still displayed rock solid.
Incidentally, I know that I once stated my lack of comfort with using 25 MHz for VGA (640x480) instead of 25.175 MHz or something a bit closer to it than 25 MHz. But it seems to work fine on my little monitor (no complaints from it). And 25 MHz and multiples thereof (such as 200 and 250 MHz) are pretty convenient to work with when setting the clock speed.
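For a sense of how small the difference is, here is a quick Python calculation of the resulting refresh rate. It assumes the common 640x480 timing of 800 total pixel clocks per line and 525 total lines per frame; the thread does not say whether the VGA driver here uses exactly those totals, so treat the numbers as illustrative.

```python
H_TOTAL = 800   # assumed total pixel clocks per line (visible + blanking)
V_TOTAL = 525   # assumed total lines per frame (visible + blanking)

def refresh_hz(pixel_clock_hz, h_total=H_TOTAL, v_total=V_TOTAL):
    """Vertical refresh rate for a given pixel clock and frame totals."""
    return pixel_clock_hz / (h_total * v_total)

nominal = refresh_hz(25_175_000)    # about 59.94 Hz
rounded = refresh_hz(25_000_000)    # about 59.52 Hz
error_percent = (rounded - nominal) / nominal * 100
print(f"{nominal:.2f} Hz vs {rounded:.2f} Hz ({error_percent:.2f}%)")
```

Under those assumptions the 25 MHz dot clock runs the frame rate about 0.7% slow, which would be consistent with a monitor accepting it without complaint.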
Still on the To-Do list for my play with this VGA spiral program is to investigate why the base_pin supposedly must be a multiple of 8 instead of 4. I presume that some instruction(s) are so limited in this particular code design (probably for speed). But theoretically, the code could be written such that the base_pin would work on multiples of 4, or am I wrong? But I haven't thought about this much, partly because the VGA adapter board that I built the other day only works on multiples of 8. I guess that I should build one that works for VGA base_pins of 4, 12, 20, 28, 36, 44 and 52 (I likely would not try for pin 60).
Anyway, sorry for the long post, but maybe it's understandable considering that I've spent so much time in the peanut gallery. And many thanks to ersmith, rayman and ozpropdev, et al., as I've used their compilers, GUIs and loaders to play with the P2. And thanks again, Chip, for providing the VGA version of your spiral program (not to mention the design of the P2 chip). Cheers! --Jim
The P56 toggle is a diagnostic for monitoring the rendering frame rate.
Go @JRetSapDoog !!
Power consumption will be relatively small for the video out cogs. Even the hubram bandwidth is not very high. The renderer is the only one doing any hard work.
As for the flicker on P56, yes, I see the pair of "toggle P56 for speed check" lines in the code now. Thanks, Evan. I'm sorry that I did not look myself earlier (that would have taken less time than posting about it).
As for the rotation rate of the spiral, on first thought, it makes sense that it would be faster at 320 MHz than 243 MHz. That's what I expected, but it just seemed like the rates were the same from a cursory look when I ran things yesterday. Too bad that I couldn't run two versions side-by-side to watch (I'd need another P2 board for that).
Anyway, I figured that I had just misjudged the rates. So I went back again today to have another look. This time, I compared running at 375 MHz with running at 250 MHz, the former being 50% faster than the latter. But I still wasn't sure if my eye would be able to reliably detect any difference from successive runs. For this, I only changed the fclk constant in the code, nothing else.
I recorded both runs on video for one minute using a countdown timer. For both of them, the whitish spiral arms "hit" the top of the screen about 11.2 times over the course of a minute. I was expecting the run at 250 MHz to be about a third less (say between 7 and 8 hits). But it wasn't (or at least it didn't seem to be). So, I'm still open to the idea that there might be some other limiting factor, even though that runs counter to expectations.
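The expectation in that paragraph is plain proportional scaling, which can be written out as a tiny Python check. The function name expected_hits is just for illustration, and the model assumes the renderer is purely clock-bound, which is exactly the assumption the measurement calls into question.

```python
def expected_hits(measured_hits, measured_mhz, other_mhz):
    """Hits per minute at other_mhz if speed scaled linearly with clock."""
    return measured_hits * other_mhz / measured_mhz

# 11.2 hits per minute observed at 375 MHz; linear scaling predicts this at 250 MHz:
predicted = expected_hits(11.2, 375, 250)
print(round(predicted, 2))
```

That works out to roughly 7.5 hits per minute, inside the "between 7 and 8" range guessed above, so seeing about 11.2 at both clock rates is a real discrepancy rather than a rounding effect.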
I apologize in advance if there was something wrong with my test/measurements described above. Perhaps someone else can do a check (or if a code warrior, analyze the code for any possible limiting factor, but I'd see if there's really a difference in the rate first). As for me, I'll likely go on to some other things as I break in (but hopefully not break) this new board.
By the way, as an alternative to running 7 video cogs, I tried the opposite: I launched 7 instances of the bitmap manipulation (spiral animation) code. I assumed that they would kind of clobber each other and produce strange output (after the video cog displayed the bitmap data). But the spiral ran fine (and at the same speed).
As to why they didn't step on each other, maybe the 7 instances were all in lockstep. Or not. They all use the same pipelined CORDIC, I realize. Well, in all likelihood, it's a case of "last man standing wins," with the final cog clobbering any earlier hub writes (likely with the same or very similar data). Or maybe I botched the code to launch them or they weren't really contending for the same memory for some reason.
Anyway, I was mostly trying to see if I could get the P2 chip to run warmer. But I couldn't tell if it ran any warmer (using my fingertip) than the 7-cog video version.
Actually, I still need to open up the P2 docs to see if there's any reason why the video cog is not running flat out (like I assume that the bitmap cog is), despite the streamer having more than enough time to supply the data (from the bitmap cog). If there's an automatic mechanism to make or allow it to "rest" (and thus take less power), I'm not aware of it yet, and I don't see any wait instructions in this particular VGA video generation code, either.
Anyway, as far as heat generation and dissipation goes, so far, so good. But the EVAL is a four-layer board, with two-ounce (or maybe it's four-ounce?) copper on the bottom layer. I recall that someone (evanh?) got all eight cores/cogs running flat out and was able to consume upwards of 2 amps at 320 MHz (for Vdd). So, I guess I should try that code if I really want to heat things up. But based on the results so far, it doesn't sound like I'll be roasting any marshmallows over the chip any time soon.
Update: Okay, I just ran the speed check described above (375 MHz vs. 250 MHz) again just using a countdown timer (there was no real need to make a video). Again, the whitish spiral arms "hit" the border at the top of the screen a tad over 11 times per minute either way, so it still seems like there's some other limiting factor for the max speed of the bitmap cog. But what? It doesn't seem to make sense. I see references in the code to 10 and 47 frames per second, but I assume those are measured results, not imposed/calculated results. Assuming that Chip is not limiting things, you'd think the faster the clock, the faster the animation. Hmm, could some logic handling the spiral rotation code be directly tied to the 20 MHz crystal speed instead of the 320 MHz (or whatever) clock speed? Hope not. Or is the code imposing an FPS limit (such as 47 frames per second)? Again, sorry if I've made some mistake in measuring the rotation speeds.
At 250 MHz sys-clock:
P56 is pulsing at 18.495 Hz (36.99 frames per second). Rotation is moving at about 34.5 arms per minute, or about 14 seconds for one rotation.
If you ran six cogs for rendering and had each one handle 80 lines, based on its COGID, then you'd see a 6x speed increase, making a frame rate of 6 x 42 fps = 252 fps. That's way faster than the VGA could keep up with.
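Here is one way that split could look, sketched in Python rather than PASM. It assumes contiguous 80-line bands over a 480-line frame and a zero-based render index derived somehow from COGID; both the banding scheme and the names lines_for_cog and ideal_fps are illustrative guesses, not something taken from the actual code.

```python
def lines_for_cog(cog_index, num_cogs=6, total_lines=480):
    """Scanlines rendered by one cog, assuming equal contiguous bands."""
    per_cog = total_lines // num_cogs        # 80 lines each for 6 cogs
    start = cog_index * per_cog
    return range(start, start + per_cog)

def ideal_fps(single_cog_fps, num_cogs):
    """Best-case frame rate if the work divides perfectly across cogs."""
    return single_cog_fps * num_cogs

for cog in range(6):
    band = lines_for_cog(cog)
    print(f"render cog {cog}: lines {band.start}..{band.stop - 1}")
print(ideal_fps(42, 6), "fps (ideal)")
```

The ideal figure ignores any contention for the shared CORDIC and hub, so the real speedup could come in lower.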