Sequence below generated using first 256 XORO32 outputs when seed = 1. Byte input to TMDS encoder is the low byte of the XORO32 output, i.e. PRN[7:0]; PRN[31:8] is ignored.
For xoroshiro32++ [14,2,7,5] as in first silicon and v32i image
8b = 8-bit input
10b = 10-bit output
dispi = disparity input from previous stage
dispo = disparity output to next stage
disparity = number of '1' bits - number of '0' bits for 8-bit value
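The disparity definition above can be sketched in a few lines of Python. Python is used here only as a host-side checking language, and the name `disparity8` is illustrative rather than anything from the thread:

```python
def disparity8(value: int) -> int:
    """Return (number of 1 bits) - (number of 0 bits) for an 8-bit value."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected an 8-bit value")
    ones = bin(value).count("1")
    # zeros = 8 - ones, so the result is equivalently 2*ones - 8
    return ones - (8 - ones)
```

For example, `disparity8(0x00)` gives -8, `disparity8(0xFF)` gives 8 and `disparity8(0x0F)` gives 0.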
ke4pjw,
My understanding is that he's producing the data with the FPGA (too slow to actually work with a TV), saving it and then playing it back on the real chip at 250MHz via the streamer.
Current plan is to generate HDMI output in software, write it to hub RAM, then stream it out at 250MHz, all on the P2.
I'm synthesizing the data with a bit of software on the P2 silicon, laying it into memory, then streaming it out at 250MHz. I've confirmed that the software algorithm matches the one in the FPGA, which in turn agrees with the data that TonyB_ posted. Should be able to try it tonight.
Here's my HDMI test setup. I got a little breakout board on Amazon with screw terminals:
Does that have resistors for the 10mA HDMI current drive?
Just straight connections. Do you think we need resistors at 250 megahertz?
Yes, I would start with a 'correct' HDMI drive, which is 10mA (500mV into far end's 50 Ohms) & if that works, then you can see if it tolerates a full 3V3 swing.
3v3 is likely to be outside the common mode range of any receiver, not to mention you are asking P2 to sink up to ~ 66mA if driving that 50 Ohms direct, and you are above 200mW peak power in that 50 ohms... vs 5mW in 10mA drive.
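The figures quoted above follow from Ohm's law. A rough sketch, assuming an ideal 3.3 V source driving straight into an ideal 50 ohm far-end termination (the function names and the ideal-source model are assumptions for illustration):

```python
R_TERM_OHMS = 50.0   # far-end termination per TMDS line

def direct_drive(v_swing: float = 3.3, r_term: float = R_TERM_OHMS):
    """Current (A) and peak power (W) when a full swing is applied across r_term."""
    current = v_swing / r_term
    power = v_swing * v_swing / r_term
    return current, power

def current_drive(i_drive: float = 0.010, r_term: float = R_TERM_OHMS):
    """Voltage swing (V) and power (W) for a fixed drive current into r_term."""
    swing = i_drive * r_term
    power = i_drive * i_drive * r_term
    return swing, power
```

With the defaults, `direct_drive()` gives about 66 mA and about 218 mW, while `current_drive()` gives 500 mV and 5 mW.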
For testing you may be able to get by with 0.1uF coupling caps on all 8 TMDS lines. This works with Lattice FPGAs on 3 of my DVI/HDMI monitor devices at least, and I think the signal there is 3.3Vpp. But the closer you can get to a real CML interface the better; otherwise you might be spending time debugging signal integrity rather than the protocol.
I could just use the 123.75-ohm DAC in digital output mode. That would be like having 123.75-ohm resistors in series with the I/O pins. Do you guys think that would be okay?
Try both. Start with the simplest, and if that works, you can try other ideas until something breaks (hopefully, not irreversibly).
I calculate the internal resistance of 123.75 ohm will give ~950mV differential voltage at the receiver, which is less than 1200mV max (DC) in the spec. Give it a go!
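The ~950mV figure can be reproduced with a simple divider model. This is only a sketch: it assumes the pin behaves like an ideal switch to ground in series with the DAC resistance when low, draws no current when high, and that the receiver terminates each line with 50 ohms to 3.3 V. The function name is made up for illustration:

```python
def receiver_swing(r_series: float, v_term: float = 3.3, r_term: float = 50.0):
    """Return (voltage drop across the termination in V, sink current in A)
    while the driver pulls the line low through r_series."""
    current = v_term / (r_series + r_term)
    return current * r_term, current

swing, current = receiver_swing(123.75)
# swing is roughly 0.95 V and current roughly 19 mA under this model
```

Since the other line of the pair sits at the termination voltage while this one is pulled down, the drop across the termination is also the differential voltage seen by the receiver in this model. Other series values can be tried by changing the argument.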
Although this is probably irrelevant at this time, CML drivers for DVI/HDMI are open-collector. Will the DAC mode be slower than an OC?
One last comment today about the 10-bit control codes output during blanking. DVI mode is simpler than HDMI and this is how I think it works:
CTL0 = 1101010100  Output this always on R & G channels; Hsync & Vsync inactive on B channel
CTL1 = 0010101011  Hsync active on B channel
CTL2 = 0101010100  Vsync active on B channel
CTL3 = 1010101011  Hsync & Vsync active on B channel
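As a sketch of the scheme just described (which its author presents as an understanding of DVI mode, not a confirmed reading of the spec), the blanking symbols can be modelled as a small lookup. The names `CTRL_CODES` and `blanking_symbols` are illustrative:

```python
# 10-bit control codes keyed by (vsync, hsync), as listed above
CTRL_CODES = {
    (0, 0): 0b1101010100,  # CTL0: both syncs inactive
    (0, 1): 0b0010101011,  # CTL1: hsync active
    (1, 0): 0b0101010100,  # CTL2: vsync active
    (1, 1): 0b1010101011,  # CTL3: hsync and vsync active
}

def blanking_symbols(vsync: int, hsync: int):
    """Return the (R, G, B) 10-bit symbols for one blanking pixel.

    R and G always carry CTL0; only B encodes the sync state."""
    idle = CTRL_CODES[(0, 0)]
    return idle, idle, CTRL_CODES[(vsync, hsync)]
```

For instance, `blanking_symbols(0, 1)` returns CTL0 for R and G and CTL1 for B.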
It could be a good idea to test both CMOS and open collector.
CMOS will have more predictable tr/tf, but open collector could allow users to modify Vio, at least a little, and it gives 3V3 mismatch tolerance.
There are still clamp diodes present, so Vio should not be set below ~2.7V in open collector.
I've got the P2 generating the data in memory, then streaming out only three different lines:
1) display line
2) hidden line
3) hidden line with VSYNC
It's pumping out the 8-bit patterns of {R+, R-, G+, G-, B+, B-, CLK+, CLK-} at 250MHz and one of my desk displays is taking it in, without any special resistors or drive levels - just CMOS outputs.
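To make the byte pattern concrete, here is a host-side sketch of how three 10-bit symbols could be spread over ten bytes for P[7:0]. It mirrors what the `output_rgb` routine in the listing in this thread appears to do (LSB first, clock high for the first five bits of each symbol); the Python function itself is an illustration, not code from the thread:

```python
def pack_pixel(sym_r: int, sym_g: int, sym_b: int) -> bytes:
    """Turn three 10-bit TMDS symbols into ten bytes laid out as
    {R+, R-, G+, G-, B+, B-, CLK+, CLK-} on bits 7..0."""
    out = bytearray()
    for n in range(10):                 # bit 0 of each symbol goes out first
        r = (sym_r >> n) & 1
        g = (sym_g >> n) & 1
        b = (sym_b >> n) & 1
        clk = 1 if n < 5 else 0         # one clock period spans 10 bit times
        out.append((r << 7) | ((r ^ 1) << 6) |
                   (g << 5) | ((g ^ 1) << 4) |
                   (b << 3) | ((b ^ 1) << 2) |
                   (clk << 1) | (clk ^ 1))
    return bytes(out)
```

Each differential pair is just a bit and its complement on adjacent pins, so an all-zero pixel packs to five bytes of $56 followed by five bytes of $55.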
Here's the code that implements, in software, the hardware TMDS algorithm that will be in the next silicon (and much simpler to use). If any of you with a P2 hook P[7:0] to an HDMI connector, this program will generate a display:
con base = $1000 'base address of line bytes
lineticks = 800 'ticks per line
linebytes = lineticks * 10 '*10 for 10b output per tick
'
'
' HDMI display test for early P2 silicon with 20MHz crystal and
' P[7:0] connected to HDMI {R+, R-, G+, G-, B+, B-, CLK+, CLK-}
'
' The next version of silicon will have TMDS hardware to compose LVDS stream on the fly.
'
dat org
hubset ##%1_000001_0000011000_1111_10_00 'enable crystal+PLL, stay in 20MHz+ mode
waitx ##20_000_000/100 'wait ~10ms for crystal+PLL to stabilize
hubset ##%1_000001_0000011000_1111_10_11 'now switch to PLL running at 250MHz
setxfrq ##$80000000 'set streamer to output on every clock
mov dira,#$FF 'P[7:0] = {R+, R-, G+, G-, B+, B-, CLK+, CLK-}
'
'
' Write visible line into hub
'
vline wrfast #0,line_vis 'ready to write line data
mov bal_r,#0 'reset balance accumulators
mov bal_g,#0
mov bal_b,#0
mov z,##640 'write visible pixels
.prep callpa #0*256/3,#pretty
setbyte rgb,pa,#3
callpa #1*256/3,#pretty
setbyte rgb,pa,#2
callpa #2*256/3,#pretty
setbyte rgb,pa,#1
call #convert_rgb
djnz z,#.prep
mov z,#16 'write pre-hsync
.hpre mov color_r,sync0
mov color_g,sync0
mov color_b,sync0
call #output_rgb
djnz z,#.hpre
mov z,#16 'write hsync
.hsync mov color_r,sync0
mov color_g,sync0
mov color_b,sync1
call #output_rgb
djnz z,#.hsync
mov z,#128 'write post-sync
.hpost mov color_r,sync0
mov color_g,sync0
mov color_b,sync0
call #output_rgb
djnz z,#.hpost
'
'
' Write invisible line into hub
'
iline mov z,##640+16
.hpre mov color_r,sync0
mov color_g,sync0
mov color_b,sync0
call #output_rgb
djnz z,#.hpre
mov z,#16
.hsync mov color_r,sync0
mov color_g,sync0
mov color_b,sync1
call #output_rgb
djnz z,#.hsync
mov z,#128
.hpost mov color_r,sync0
mov color_g,sync0
mov color_b,sync0
call #output_rgb
djnz z,#.hpost
'
'
' Write sync line into hub
'
sline mov z,##640+16
.hpre mov color_r,sync2
mov color_g,sync2
mov color_b,sync2
call #output_rgb
djnz z,#.hpre
mov z,#16
.hsync mov color_r,sync2
mov color_g,sync2
mov color_b,sync3
call #output_rgb
djnz z,#.hsync
mov z,#128
.hpost mov color_r,sync2
mov color_g,sync2
mov color_b,sync2
call #output_rgb
djnz z,#.hpost
'
'
' Output screen data over and over
'
rdfast line_cnt,line_vis 'ready to loop-read visible line buffer
.field mov x,#480 'ready for 480 visible lines
.vis xcont line_mod,#3 'output visible line
cmp x,#1 wz 'if last line, update fblock to invisible line
if_z fblock line_cnt,line_inv
djnz x,#.vis
mov x,#10 'ready for 10 invisible lines
.pre xcont line_mod,#3 'output invisible line
cmp x,#1 wz 'if last line, update fblock to sync line
if_z fblock line_cnt,line_syn
djnz x,#.pre
mov x,#2 'ready for 2 sync lines
.sync xcont line_mod,#3 'output sync line
cmp x,#1 wz 'if last line, update fblock to invisible line
if_z fblock line_cnt,line_inv
djnz x,#.sync
mov x,#33 'ready for 33 invisible lines
.post xcont line_mod,#3 'output invisible line
cmp x,#1 wz 'if last line, update fblock to visible line
if_z fblock line_cnt,line_vis
djnz x,#.post
jmp #.field
'
'
' Make pretty color from pa and z into pa (x is scratch)
'
pretty mov x,z
sca x,##round(256.0/480.0 * float($10000))
add pa,0
shl pa,#24
qrotate #127,pa
getqx pa
_ret_ bitnot pa,#7
'
'
' Data
'
sync0 long %1101010100 '
sync1 long %0010101011 ' hsync
sync2 long %0101010100 'vsync
sync3 long %1010101011 'vsync + hsync
line_mod long $1080<<16 + linebytes 'RFBYTE streamer mode
line_cnt long linebytes / 64
line_vis long base + linebytes*0 'visible line address
line_inv long base + linebytes*1 'invisible line address
line_syn long base + linebytes*2 'sync line address
'
'
' Convert 8:8:8:0 RGB into 10-byte TMDS pattern in hub
'
convert_rgb getbyte color,rgb,#3
mov bal,bal_r
call #color_tmds
mov bal_r,bal
mov color_r,color
getbyte color,rgb,#2
mov bal,bal_g
call #color_tmds
mov bal_g,bal
mov color_g,color
getbyte color,rgb,#1
mov bal,bal_b
call #color_tmds
mov bal_b,bal
mov color_b,color
output_rgb mov x,#0
.loop shr color_r,#1 wc
bitc y,#7
bitnc y,#6
shr color_g,#1 wc
bitc y,#5
bitnc y,#4
shr color_b,#1 wc
bitc y,#3
bitnc y,#2
cmp x,#5 wc
bitc y,#1
bitnc y,#0
wfbyte y
incmod x,#9 wc
if_nc jmp #.loop
ret
'
'
' Convert R/G/B in color[7:0] into TMDS in color[9:0]
'
color_tmds ones bal_m,color 'ones > 4 || ones == 4 && !color[0]?
cmp bal_m,#4 wcz
if_z testb color,#0 wc 'c=0 for XNOR
testb color,#0 wz
if_z_eq_c bitnot color,#1
testb color,#1 wz
if_z_eq_c bitnot color,#2
testb color,#2 wz
if_z_eq_c bitnot color,#3
testb color,#3 wz
if_z_eq_c bitnot color,#4
testb color,#4 wz
if_z_eq_c bitnot color,#5
testb color,#5 wz
if_z_eq_c bitnot color,#6
testb color,#6 wz
if_z_eq_c bitnot color,#7
ones bal_m,color 'get bal_m
bitc color,#8
sub bal_m,#4 wcz 'sign of bal_m into c, (bal_m == 0) into z
if_nz cmp bal,#0 wz 'get (bal_m == 0 || bal == 0) into z
testbn bal,#31 xorc 'get (bal_m[31] == bal[31]) into c
wrc bal_sign 'get (bal_m[31] == bal[31]) into bal_sign
if_z testbn color,#8 wc 'inv_m = bal_zero ? !m[8] : bal_sign
bitc color,#9 'finalize TMDS pattern
if_c xor color,#$FF
shl bal_m,#1 'adjust bal
testbn color,#8 wc
if_z jmp #.sum
testb bal_sign,#0 wc
testb color,#8 wz
if_c_eq_z sumnc bal,#2
.sum _ret_ sumc bal,bal_m
'
'
' Data
'
color res 1
bal res 1
bal_m res 1
bal_sign res 1
bal_r res 1
bal_g res 1
bal_b res 1
color_r res 1
color_g res 1
color_b res 1
rgb res 1
x res 1
y res 1
z res 1
t res 1
'
'
' Output screen data over and over
'
rdfast line_cnt,line_vis 'ready to loop-read visible line buffer
.field mov x,#480 'ready for 480 visible lines
.vis xcont line_mod,#3 'output visible line
cmp x,#1 wz 'if last line, update fblock to invisible line
if_z fblock line_cnt,line_inv
djnz x,#.vis
mov x,#10 'ready for 10 invisible lines
.pre xcont line_mod,#3 'output invisible line
cmp x,#1 wz 'if last line, update fblock to sync line
if_z fblock line_cnt,line_syn
djnz x,#.pre
mov x,#2 'ready for 2 sync lines
.sync xcont line_mod,#3 'output sync line
cmp x,#1 wz 'if last line, update fblock to invisible line
if_z fblock line_cnt,line_inv
djnz x,#.sync
mov x,#33 'ready for 33 invisible lines
.post xcont line_mod,#3 'output invisible line
cmp x,#1 wz 'if last line, update fblock to visible line
if_z fblock line_cnt,line_vis
djnz x,#.post
jmp #.field
So, this code streams out bytes from the FIFO at full blast, on every clock, and the address and block size can be changed on the fly using FBLOCK, to switch between buffers that are being streamed out. You could output 32 bits per clock, as well, if you wanted. And every cog could do the same concurrently.
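For anyone checking the numbers, the timing implied by the listing can be worked out on a host machine. The sketch below takes the tick and line counts from the code in this thread and assumes one byte leaves the streamer per 250MHz clock; the function and dictionary keys are illustrative names:

```python
SYSCLK_HZ = 250_000_000      # one streamed byte (one TMDS bit time) per clock
BITS_PER_SYMBOL = 10

H_TICKS = (640, 16, 16, 128)   # visible, pre-hsync, hsync, post-sync
V_LINES = (480, 10, 2, 33)     # visible, invisible, sync, invisible

def frame_timing(sysclk_hz: int = SYSCLK_HZ) -> dict:
    ticks_per_line = sum(H_TICKS)
    lines_per_frame = sum(V_LINES)
    line_bytes = ticks_per_line * BITS_PER_SYMBOL
    pixel_clock_hz = sysclk_hz / BITS_PER_SYMBOL
    return {
        "ticks_per_line": ticks_per_line,
        "lines_per_frame": lines_per_frame,
        "line_bytes": line_bytes,              # matches linebytes in the listing
        "fifo_blocks": line_bytes // 64,       # matches line_cnt in the listing
        "buffer_bytes": 3 * line_bytes,        # visible + invisible + sync lines
        "pixel_clock_hz": pixel_clock_hz,
        "frame_rate_hz": pixel_clock_hz / (ticks_per_line * lines_per_frame),
    }
```

This works out to 800 ticks per line, 525 lines per frame, a 25MHz pixel clock and a refresh rate of roughly 59.5Hz, using 24,000 bytes of hub RAM for the three line buffers.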
I think this HDMI thing is done. Now, I'm going to see about clock gating to get dynamic power consumption down.
It works!
...
It's pumping out the 8-bit patterns of {R+, R-, G+, G-, B+, B-, CLK+, CLK-} at 250MHz and one of my desk displays is taking it in, without any special resistors or drive levels - just CMOS outputs.
Good result!
Did you also check the DAC in output mode, and a 270 Ohm series R, and also a 0.1uF in series with 270R as mentioned above?
The R+C may be a good way to get some extra supply tolerance, and avoid phantom power effects.
I just needed to know that the digital guts work. They do, without any special analog treatment, so the rest can be sorted later. We've got lots of options there.
Comments
For xoroshiro32++ [14,2,7,5] as in first silicon and v32i image
8b = 8-bit input
10b = 10-bit output
dispi = disparity input from previous stage
dispo = disparity output to next stage
disparity = number of '1' bits - number of '0' bits for 8-bit value
No, disparity applies only to 8-bit input during first stage of encoding and low 8 bits of provisional output during second stage.
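The two stages mentioned here can be illustrated with a host-side reference model. This is a sketch of the TMDS data encoding as it is usually described for DVI, not a transcription of the P2 code or of the FPGA logic; `tmds_encode` and `popcount` are illustrative names:

```python
def popcount(x: int) -> int:
    return bin(x).count("1")

def tmds_encode(d: int, cnt: int):
    """Encode one 8-bit value; return (10-bit symbol, updated running disparity)."""
    d &= 0xFF
    # Stage 1: transition minimisation, chosen from the ones count of the input
    n1_d = popcount(d)
    use_xnor = n1_d > 4 or (n1_d == 4 and (d & 1) == 0)
    q_m = d & 1
    for i in range(1, 8):
        bit = ((q_m >> (i - 1)) ^ (d >> i)) & 1
        if use_xnor:
            bit ^= 1
        q_m |= bit << i
    q8 = 0 if use_xnor else 1          # bit 8 records XOR (1) or XNOR (0)
    # Stage 2: DC balance, chosen from the low 8 bits of the provisional output
    n1 = popcount(q_m)
    n0 = 8 - n1
    if cnt == 0 or n1 == n0:
        invert = q8 == 0
        cnt += (n0 - n1) if invert else (n1 - n0)
    elif (cnt > 0 and n1 > n0) or (cnt < 0 and n0 > n1):
        invert = True
        cnt += 2 * q8 + (n0 - n1)
    else:
        invert = False
        cnt += -2 * (1 - q8) + (n1 - n0)
    low = (q_m ^ 0xFF) if invert else q_m
    return (int(invert) << 9) | (q8 << 8) | low, cnt
```

Starting from a running disparity of 0, `tmds_encode(0x00, 0)` returns `(0x100, -8)`, and feeding that disparity back in, `tmds_encode(0x00, -8)` returns `(0x3FF, 2)`. In this model the running disparity always changes by the ones-minus-zeros count of the 10-bit symbol just emitted.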
Both the FPGA and the software versions check out okay with your random data.
Now I'll see if I can stream out a software-generated 640x480 HDMI signal from the P2 running at 250MHz. I need to get some sleep, first.
Here's my HDMI test setup. I got a little breakout board on Amazon with screw terminals:
My understanding is that he's producing the data with the FPGA (too slow to actually work with a TV), saving it and then playing it back on the real chip at 250MHz via the streamer.
Current plan is to generate HDMI output in software, write it to hub RAM, then stream it out at 250MHz, all on the P2.
That captured bitstream goes to a current rev A P2 for testing on a TV.
Just straight connections. Do you think we need resistors at 250 megahertz?
Are the outputs digital? I think you'll need series resistors, 100 ohm minimum. HDMI receivers have 50 ohm pull-ups to 3.3V.
270 ohm series resistor, then?
Looks simple, clean, easy to understand. Nice work Chip.