Consensus on the P16X32B?

potatohead · 2014-04-07 06:53

I agree as well. P1 video can do some neat things, but when doing them, that's about all it does.

...which is why I asked about some subset of the P2 video system. Depending on how it gets done, a few COGS on the P1+ could potentially deliver video that makes more sense.

The most notable improvement seen in the P2 design was overall signal quality and obtaining reasonable resolutions being easy. Those are the two attributes I would consider important.

Brian Fairchild · 2014-04-07 06:54

potatohead wrote: »

...subset of the P2 video system.

Is there a description of the P2 video system in one place?

RossH · 2014-04-07 06:59

potatohead wrote: »

I agree as well. P1 video can do some neat things, but when doing them, that's about all it does.

...which is why I asked about some subset of the P2 video system. Depending on how it gets done, a few COGS on the P1+ could potentially deliver video that makes more sense.

What limits the P1 on video is lack of RAM more than anything else. This is one reason I'd prefer 512K to 256K - this would allow us to have a couple of large video buffers and still have extra application code space.

Ross.

Bill Henning · 2014-04-07 07:03

Agreed.

I also found the AUX CLUT modes and stack usage extremely valuable, and loved hubexec, INDA/INDB, and PTRA/PTRB/PTRX/PTRY.

The pointer instructions save a ton of code space, as they replace two or three instructions with one instruction, very frequently.

hubexec not only executed at cog native speed (minus a tiny delta) thanks to the caching, it also greatly reduced code size (2 longs to 1) for every JMP and CALL... and made code relocatable.

That is why I prefer a 4 cog P2 over 16/32 cog P1+. A 512KB P1+ will have roughly the same LMM/HUBEXEC code capacity as a 256KB P2, and will also lose fast SDRAM.

potatohead wrote: »

I agree as well. P1 video can do some neat things, but when doing them, that's about all it does.

...which is why I asked about some subset of the P2 video system. Depending on how it gets done, a few COGS on the P1+ could potentially deliver video that makes more sense.

The most notable improvement seen in the P2 design was overall signal quality and obtaining reasonable resolutions being easy. Those are the two attributes I would consider important.

Bill Henning · 2014-04-07 07:08

Agreed, mostly lack of memory.

The next biggest limit is memory bandwidth (which is why P2 went to QUADs, then WIDEs).

P1+ cogs as described in this thread only have a 32 bit wide bus to the hub (no QUAD or WIDE)

Without the hub slot assignment mechanism I proposed, the per cog hub bandwidth would be:

P1+ 16 cog variant: 200/16 = 12.5M * 4 bytes in a long = 50MB/sec (2.5x P1 cog @ 80MHz, 2x P1 cog @ 100MHz)

P1+ 32 cog variant: 200/32 = 6.25M * 4 bytes in a long = 25MB/sec (1.25x P1 cog @ 80MHz, 1x P1 cog @ 100MHz)

A BIG thank you to JRetSapDoog for finding my error in the P1+ 32 cog calculation above and PMing me!

As you said, I went the wrong way, and as you also said, irrelevant with Chip's nice 128 bus version!

You would have been welcome to post the technical correction

640x480 VGA @ 8bpp takes 307KB for one bitmap

800x480 WVGA @ 8bpp takes 384KB for one bitmap

1024x768 XGA @ 2bpp (4 color) takes 197KB

1280x720 hd720p @ 2bpp (4 color) takes 230KB
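The buffer sizes above are just width x height x bits per pixel, divided by 8. A minimal Python sketch to reproduce them follows. The function name and the comparison against 256KB and 512KB of hub RAM (taken here as 2^18 and 2^19 bytes) are my own assumptions for illustration; the "KB" figures in the list match decimal kilobytes.

```python
def bitmap_bytes(width, height, bits_per_pixel):
    """Bytes needed for one packed bitmap of the given geometry."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes

# Assumed hub sizes for the comparison (not from a datasheet).
HUB_256K = 256 * 1024
HUB_512K = 512 * 1024

modes = [
    ("640x480 VGA", 640, 480, 8),
    ("800x480 WVGA", 800, 480, 8),
    ("1024x768 XGA", 1024, 768, 2),
    ("1280x720 hd720p", 1280, 720, 2),
]

for name, w, h, bpp in modes:
    size = bitmap_bytes(w, h, bpp)
    print(f"{name} @ {bpp}bpp: {size} bytes, "
          f"fits 256KB: {size <= HUB_256K}, fits 512KB: {size <= HUB_512K}")
```

Note that a single bitmap says nothing about room left for code or a second buffer; that is a separate budget.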

No AUX means NO CLUT, so no 16 color mode.

And we can forget 16bpp and 24bpp graphics (maybe could do 320x240 @ 16bpp w/ 2 cogs, 24bpp w/ 3 cogs)

For comparison,

P2 8 cog variant: 200/8 = 25M * 32 bytes in a WIDE = 800MB/sec (40x P1 cog @ 80MHz, 32x P1 cog @ 100MHz)

P2 4 cog variant: 200/4 = 50M * 32 bytes in a WIDE = 1600MB/sec, but as there is no way to move/copy that fast, it is limited to 800MB/sec (40x P1 cog @ 80MHz, 32x P1 cog @ 100MHz)
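The per-cog figures in this post can be reproduced with a short Python sketch. It assumes a plain round-robin hub (no slot reassignment), and the P1 baseline of one long every 16 clocks; the function and variable names are mine.

```python
def per_cog_bandwidth_mb(clock_mhz, clocks_between_accesses, bytes_per_access):
    """Per-cog hub bandwidth in MB/sec for a plain round-robin hub."""
    accesses_per_usec = clock_mhz / clocks_between_accesses
    return accesses_per_usec * bytes_per_access

# P1 baseline: one long (4 bytes) every 16 clocks.
p1_80 = per_cog_bandwidth_mb(80, 16, 4)
p1_100 = per_cog_bandwidth_mb(100, 16, 4)

variants = {
    "P1+ 16 cog": per_cog_bandwidth_mb(200, 16, 4),
    "P1+ 32 cog": per_cog_bandwidth_mb(200, 32, 4),
    "P2 8 cog": per_cog_bandwidth_mb(200, 8, 32),
    "P2 4 cog (raw)": per_cog_bandwidth_mb(200, 4, 32),
}

for name, mb in variants.items():
    print(f"{name}: {mb:g}MB/sec "
          f"({mb / p1_80:g}x P1 @ 80MHz, {mb / p1_100:g}x P1 @ 100MHz)")
```

The raw P2 4 cog number comes out at 1600MB/sec; the 800MB/sec figure quoted above is the practical cap, not something this formula produces.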

RossH wrote: »

What limits the P1 on video is lack of RAM more than anything else. This is one reason I'd prefer 512K to 256K - this would allow us to have a couple of large video buffers and still have extra application code space.

Ross.

potatohead · 2014-04-07 07:08

@Ross: Lack of RAM is one limiting factor, and the primary one. The secondary one is lack of overall video signal quality. The P1 waitvid is depth limited, only able to serve up four 8-bit pixels per waitvid, which dominates a video COG, and limits color processing / mode options in the display.

Re: P2 video system docs in one place.

Yes! And the most useful subset is below, where I've left out the pixel engine:

VIDEO
-----

Each cog has a video generator (VID) that can stream pixel data and perform colorspace
conversion and modulation, so that final video signals can be output to the 75-ohm DACs
on the I/O pins.

Pixel streaming, colorspace conversion, modulation, DAC channel driving, and DAC pin
updating are all performed in a pipelined fashion on each cycle of VID's dot clock.

VID gets its dot clock from CTRA's PLL. CTRA must be configured for PLL operation in
order for VID to operate.

The DAC channel(s) must be configured for video output by using CFGDAC0..CFGDAC3 or
CFGDACS. To set all DAC channels to video, do 'CFGDACS #%11_11_11_11'.

The I/O pins which will output the DAC channels must be configured to do so via CFGPINS.

To turn on VID and configure its DAC channel outputs, the SETVID instruction is used:

    SETVID  D/#     - Set video configuration register (VCFG)

        %00xxxxx = off (default)             SIG3    SIG2    SIG1    SIG0
                                             ----------------------------
        %01xxxxx = SDTV/HDTV/VGA             Y_R     I_G     Q_B     SYN
        %10xxxxx = NTSC/PAL S-VIDEO          YIQ     YIQ     _IQ     Y__
        %11xxxxx = NTSC/PAL COMPOSITE        YIQ     YIQ     YIQ     YIQ

        %xx0xxxx = zero-extend Y/I/Q coefficients for VGA colorspace (allows +$80, or '+1.0')
        %xx1xxxx = sign-extend Y/I/Q coefficients for NTSC/PAL/SDTV/HDTV colorspace

        %xxx0xxx = no sync on Y_R         (VGA)
        %xxx1xxx = sync on Y_R            (SDTV/HDTV)

        %xxxx0xx = no sync on I_G         (VGA)
        %xxxx1xx = sync on I_G            (SDTV/HDTV)

        %xxxxx0x = no sync on Q_B         (VGA)
        %xxxxx1x = sync on Q_B            (SDTV/HDTV)

        %xxxxxx0 = positive sync on SYN   (VGA)
        %xxxxxx1 = negative sync on SYN   (VGA)


Before any meaningful video signals can be output, you must set the colorspace coefficients
and offset levels, which are each 8 bits:

    SETVIDY D/#     - Set Y_R's offset level and RGB colorspace coefficients: $YO_YR_YG_YB

    SETVIDI D/#     - Set I_G's offset level and RGB colorspace coefficients: $IO_IR_IG_IB

    SETVIDQ D/#     - Set Q_B's offset level and RGB colorspace coefficients: $QO_QR_QG_QB


All pixels are internally handled by VID as 8:8:8 bit R:G:B data.

Colorspace conversion is performed as sum-of-products calculations on the R:G:B pixel data
and the colorspace coefficients, yielding Y, I, and Q components:

    Where R, G, B are 8-bit pixel color components and Y, I, Q are 9-bit sums (MOD 512):

        Y = (R*YR + G*YG + B*YB)/64        Where YR, YG, YB are 8-bit Y coefficients
        I = (R*IR + G*IG + B*IB)/64        Where IR, IG, IB are 8-bit I coefficients
        Q = (R*QR + G*QG + B*QB)/64        Where QR, QG, QB are 8-bit Q coefficients


    For outputs Y_R, I_G, and Q_B, offset levels are added to the Y, I, and Q components to
    properly position the final signals for SDTV/HDTV. In the case of VGA outputs, the
    offset levels are set to 0, since they are ground-based.

    For modulated outputs YIQ and _IQ, the I and Q components, treated as (I,Q), are rotated
    around (0,0) by an angle that steps 1/16th of a revolution on each dot clock, yielding
    Q'. In the case of YIQ output, the Y component (luma) and Q' (chroma) are added to form
    a composite video signal. In the case of _IQ output, an offset level is added to Q' to
    form an s-video chroma signal. For Y__ output, the Y component (luma) is output alone to
    form an s-video luma signal.
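The sum-of-products step above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the function names are mine, the "/64" is modelled as an arithmetic right shift by 6 (the text does not say how negative sums are rounded), and "MOD 512" is modelled as masking to 9 bits.

```python
def extend(coeff, signed):
    """Interpret an 8-bit coefficient as zero- or sign-extended (VCFG[4])."""
    coeff &= 0xFF
    if signed and coeff & 0x80:
        return coeff - 256
    return coeff

def component(r, g, b, coeffs, signed):
    """One of Y, I or Q: (R*cR + G*cG + B*cB)/64, kept as a 9-bit value."""
    cr, cg, cb = (extend(c, signed) for c in coeffs)
    total = (r * cr + g * cg + b * cb) >> 6  # assumption: /64 is an arithmetic shift
    return total & 0x1FF                     # MOD 512

# VGA identity set: 128 stands for 1.0, so full red gives 255*128/64 = 510.
print(component(255, 0, 0, (128, 0, 0), signed=False))

# With sign extension, $80 reads as -128, which is why the zero-extend
# option is what allows a coefficient of +1.0.
print(extend(0x80, signed=True), extend(0x80, signed=False))
```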


Below are some common colorspace coefficient sets. Note that these values are normalized
to 1.0. In the sum-of-products calculations, 128 is equal to 1.0, so the values below
should all be multiplied by 128 to get the proper 8-bit values for usage as coefficients.
In practice, the values will need to be scaled down so that under 75-ohm load, they will
peak at 1.0V (not 1.65V, which is 3.3V/2). This scaling will compromise DAC span by ~39%,
leaving you with a still-sufficient ~8.3 bits of DAC resolution. However, if you'd like
to keep DAC span maximal, you may leave the coefficients as originally computed and
achieve the proper voltage under load by using an external voltage divider made from two
resistors, being sure to maintain the 75 ohms source impedance.
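The ~39% and ~8.3 bit figures in the paragraph above follow from the ratio of the 1.0V target to the 1.65V full-scale level, applied to a 9-bit DAC range. A quick Python check (variable names are mine):

```python
import math

full_scale_v = 3.3 / 2   # 1.65V peak under 75-ohm load, per the text
target_v = 1.0           # desired peak video level

scale = target_v / full_scale_v        # fraction of the DAC span still used
span_lost = 1.0 - scale                # fraction of the span given up
effective_bits = math.log2(512 * scale)  # 9-bit DAC, 512 levels

print(f"scale factor: {scale:.3f}")
print(f"span lost: {span_lost:.1%}")
print(f"effective resolution: {effective_bits:.2f} bits")
```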


coefficient positions
-----------------------
YR       YG       YB
IR       IG       IB
QR       QG       QB
-----------------------

RGB (VGA)     VCFG[4]=0
-----------------------
1        0        0           R sums to 1
0        1        0           G sums to 1
0        0        1           B sums to 1
-----------------------

YPbPr (HDTV)  VCFG[4]=1                             x128
-----------------------                             -------------
+.213    +.715    +.072       Y  sums to 1          +27  +92  +9
-.115    -.385    +.500       Pb sums to 0          -15  -49  +64
+.500    -.454    -.046       Pr sums to 0          +64  -58  -6
-----------------------

YPbPr (SDTV)  VCFG[4]=1
-----------------------
+.299    +.587    +.114       Y  sums to 1
-.169    -.331    +.500       Pb sums to 0
+.500    -.419    -.081       Pr sums to 0
-----------------------

YIQ (NTSC)    VCFG[4]=1
-----------------------
+.299    +.587    +.114       Y sums to 1
+.596    -.274    -.322       I sums to 0 *
+.212    -.523    +.311       Q sums to 0 *
-----------------------

YUV (PAL)     VCFG[4]=1
-----------------------
+.299    +.587    +.114       Y sums to 1
-.147    -.289    +.436       U sums to 0 *
+.615    -.515    -.100       V sums to 0 *
-----------------------

* These sets of three coefficients must be scaled by 0.608 to pre-compensate for
  CORDIC rotator expansion which will occur in the video modulator.
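To go from the normalized tables above to register values, each coefficient is multiplied by 128 and the three results are packed with an offset byte in the $xO_xR_xG_xB layout given for SETVIDY/SETVIDI/SETVIDQ. The Python sketch below is illustrative only: the helper names are mine, rounding to nearest is an assumption that happens to reproduce the x128 column, and the offset byte of 0 is a placeholder, not a recommended level.

```python
def to_8bit(row, scale=1.0):
    """Scale normalized coefficients so that 128 represents 1.0."""
    return tuple(round(c * scale * 128) for c in row)

def pack_coefficients(offset, coeffs):
    """Pack offset and three coefficients as $xO_xR_xG_xB (two's complement bytes)."""
    cr, cg, cb = coeffs
    return (((offset & 0xFF) << 24) | ((cr & 0xFF) << 16)
            | ((cg & 0xFF) << 8) | (cb & 0xFF))

hdtv = {
    "Y":  (+0.213, +0.715, +0.072),
    "Pb": (-0.115, -0.385, +0.500),
    "Pr": (+0.500, -0.454, -0.046),
}

for name, row in hdtv.items():
    q = to_8bit(row)
    print(f"{name}: {q} -> ${pack_coefficients(0, q):08X}")  # offset 0 is a placeholder

# Rows marked with * (I/Q or U/V) get the extra 0.608 pre-scale.
ntsc_i = to_8bit((+0.596, -0.274, -0.322), scale=0.608)
print("NTSC I, pre-scaled:", ntsc_i)
```

The further scale-down for 1.0V output under load would be applied through the same `scale` argument.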


Once VID is configured, WAITVID instructions are used to issue contiguous commands
to keep the pixel streamer busy:

    WAITVID --> pixel streamer --> colorspace/modulator --> DAC signals --> I/O pins


VID double-buffers WAITVID commands to relax WAITVID timing requirements.

In single-task mode (on cog start or after 'SETTASK zero'), WAITVID will stall the
pipeline as it waits for VID to take the command. In multi-task mode (after
'SETTASK nonzero'), WAITVID will keep jumping back to itself until VID takes the
command, in order to free up clock cycles for other tasks. In either case, the
POLVID instruction may be used to test whether or not VID is ready for another
command, in which case WAITVID will release immediately, taking only one clock.

    POLVID  WC      - Check if VID ready for another WAITVID, C=1 if ready


Here is the WAITVID instruction:

    WAITVID D/#,S/# - Wait for VID ready, then give next command via D and S

When WAITVID executes, the D and S values are captured by VID and used for the duration
of the command.

The WAITVID instruction has special encoding so that immediate D values can range from
0 to 3583, or $DFF. These large immediate D values are helpful in reducing code size
when issuing WAITVIDs that generate sync signals.

The D operand of WAITVID has four fields:

    %AAAAAAAA_MMMM_PPPPPPP_CCCCCCCCCCCCC

             %AAAAAAAA = AUX base address for pixel lookup (0..255)
                 %MMMM = pixel mode (0..15), elaborated below
              %PPPPPPP = number of dot clocks per pixel (1..127, 0 acts as 128)
        %CCCCCCCCCCCCC = number of dot clocks in WAITVID (1..8191, 0 acts as 8192)
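As a worked example of the field layout above, here is a small Python helper that packs the four fields into a 32-bit D value. The helper names are hypothetical; the field widths, the "0 acts as 128 / 8192" rule and the 3583 immediate limit come from the text.

```python
def waitvid_d(aux_base, mode, clocks_per_pixel, total_clocks):
    """Pack %AAAAAAAA_MMMM_PPPPPPP_CCCCCCCCCCCCC into one 32-bit value."""
    if not 0 <= aux_base <= 255:
        raise ValueError("aux_base must be 0..255")
    if not 0 <= mode <= 15:
        raise ValueError("mode must be 0..15")
    if not 1 <= clocks_per_pixel <= 128:
        raise ValueError("clocks_per_pixel must be 1..128")
    if not 1 <= total_clocks <= 8192:
        raise ValueError("total_clocks must be 1..8192")
    ppp = clocks_per_pixel & 0x7F   # 128 is encoded as 0
    ccc = total_clocks & 0x1FFF     # 8192 is encoded as 0
    return (aux_base << 24) | (mode << 20) | (ppp << 13) | ccc

def fits_immediate(d):
    """True if D can be written as a WAITVID immediate (0..3583)."""
    return 0 <= d <= 3583

# LIT_RGBS32 (mode 0) only uses the clock count, so a 640-clock command
# is simply D = 640 and fits the immediate range.
d_sync = waitvid_d(0, 0, 128, 640)
print(d_sync, fits_immediate(d_sync))

# CLU8_RGB24 (mode %0100), AUX base $10, 1 clock per pixel, 4 clocks total.
print(hex(waitvid_d(0x10, 0b0100, 1, 4)))
```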


The D operand's %MMMM field determines which pixel mode will be used for the WAITVID and
what the S operand will be used for:

    %0000 = LIT_RGBS32    - S is used as a literal 8:8:8:8 bit R:G:B:SYNC pixel. This is
                            the only mode which can generate sync signals. In this mode,
                            only the %CCCCCCCCCCCCC bits of D are used, so all other bits
                            can be 0.

    %0001 = CLU1_RGB24    - 32 1-bit offsets in S lookup 8:8:8 pixel longs in AUX
    %0010 = CLU2_RGB24    - 16 2-bit offsets in S lookup 8:8:8 pixel longs in AUX
    %0011 = CLU4_RGB24    - 8 4-bit offsets in S lookup 8:8:8 pixel longs in AUX
    %0100 = CLU8_RGB24    - 4 8-bit offsets in S lookup 8:8:8 pixel longs in AUX
    %0101 = CLU8_RGB15    - 4 8-bit offsets in S lookup 5:5:5 pixel words in AUX
    %0110 = CLU8_RGB16    - 4 8-bit offsets in S lookup 5:6:5 pixel words in AUX

                            The CLUx modes use the 1/2/4/8-bit fields of S, lowest field
                            first, as offsets for looking up pixels in AUX, starting at
                            %AAAAAAAA. Upon completion of each pixel, the next higher
                            bit field is used, with the highest field repeating.

                            For CLU1_RGB24..CLU8_RGB24, the 1/2/4/8-bit fields are used
                            as long offsets into AUX, yielding 8:8:8 pixel data from AUX
                            data bits 23..0.

                            For CLU8_RGB15 and CLU8_RGB16, bits 7..1 of each 8-bit field
                            are used as the long offset into AUX, while bit 0 selects the
                            low or high word containing the 5:5:5 (LSB-justified) or
                            5:6:5 pixel data.

    %0111 = STR1_RGB9     - 1-bit pixels streamed from AUX select between 3:3:3 colors in
                            S[17..9] and S[26..18]. The stream start address in AUX is
                            %AAAAAAAA plus S[7..0], with S[31..27] selecting the starting
                            bit.

    %1000 = STR4_RGBI4    - 4-bit pixels are streamed from AUX starting at %AAAAAAAA plus
                            S[7..0], with S[31..29] selecting the starting nibble. The
                            pixels are colored as:

                            %0000 = black
                            %0001 = dark grey
                            %0010 = dark blue
                            %0011 = bright blue
                            %0100 = dark green
                            %0101 = bright green
                            %0110 = dark cyan
                            %0111 = bright cyan
                            %1000 = dark red
                            %1001 = bright red
                            %1010 = dark magenta
                            %1011 = bright magenta
                            %1100 = olive
                            %1101 = yellow
                            %1110 = light grey
                            %1111 = white

    %1001 = STR4_LUMA4    - 4-bit pixels are streamed from AUX starting at %AAAAAAAA plus
                            S[7..0], with S[31..29] selecting the starting nibble. The
                            pixels are used as brightness values for colors determined by
                            S[11..9]:

                            %000 = black..orange
                            %001 = black..blue
                            %010 = black..green
                            %011 = black..cyan
                            %100 = black..red
                            %101 = black..magenta
                            %110 = black..yellow
                            %111 = black..white

    %1010 = STR8_RGBI8    - 8-bit pixels are streamed from AUX starting at %AAAAAAAA plus
                            S[7..0], with S[31..30] selecting the starting byte. The
                            pixels are colored as:

                            $00..$1F = black..orange
                            $20..$3F = black..blue
                            $40..$5F = black..green
                            $60..$7F = black..cyan
                            $80..$9F = black..red
                            $A0..$BF = black..magenta
                            $C0..$DF = black..yellow
                            $E0..$FF = black..white

    %1011 = STR8_LUMA8    - 8-bit pixels are streamed from AUX starting at %AAAAAAAA plus
                            S[7..0], with S[31..30] selecting the starting byte. The
                            pixels are used as brightness values for colors determined by
                            S[11..9]:

                            %000 = black..orange
                            %001 = black..blue
                            %010 = black..green
                            %011 = black..cyan
                            %100 = black..red
                            %101 = black..magenta
                            %110 = black..yellow
                            %111 = black..white

    %1100 = STR8_RGB8     - 8-bit 3:3:2 pixels are streamed from AUX starting at %AAAAAAAA
                            plus S[7..0], with S[31..30] selecting the starting byte.

    %1101 = STR16_RGB15   - 15-bit 5:5:5 pixels are streamed from AUX starting at %AAAAAAAA
                            plus S[7..0], with S[31] selecting the starting word.

    %1110 = STR16_RGB16   - 16-bit 5:6:5 pixels are streamed from AUX starting at %AAAAAAAA
                            plus S[7..0], with S[31] selecting the starting word.

    %1111 = STR32_RGB24   - 24-bit 8:8:8 pixels are streamed from AUX starting at %AAAAAAAA
                            plus S[7..0].
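The CLUx field walk described in the list above (lowest field first, highest field repeating) can be sketched in Python. This models only CLU1_RGB24..CLU8_RGB24; the function names are mine, and wrapping the AUX address at 256 longs is an assumption.

```python
def clu_offsets(s, bits_per_field, pixel_count):
    """Offsets taken from S for each pixel: lowest field first, top field repeats."""
    fields = 32 // bits_per_field
    mask = (1 << bits_per_field) - 1
    offsets = []
    for i in range(pixel_count):
        field = min(i, fields - 1)  # stay on the highest field once S is used up
        offsets.append((s >> (field * bits_per_field)) & mask)
    return offsets

def clu_rgb24(aux, aux_base, s, bits_per_field, pixel_count):
    """Look up 8:8:8 pixels (bits 23..0) in a 256-long AUX model."""
    return [aux[(aux_base + off) & 0xFF] & 0xFFFFFF  # assumption: address wraps at 256
            for off in clu_offsets(s, bits_per_field, pixel_count)]

# Four 8-bit offsets in S; asking for six pixels repeats the last field.
print(clu_offsets(0x03020100, 8, 6))

# A 2-entry palette at AUX base 0 driven by 1-bit fields.
aux = [0] * 256
aux[0], aux[1] = 0x000000, 0xFFFFFF
print([hex(p) for p in clu_rgb24(aux, 0, 0b1101, 1, 4)])
```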


For outputting SYNC signals, the LIT_RGBS32 mode must be used. Because WAITVID's D can be an
immediate value up to 3583, and because S values that generate sync all fit within 9 bits, any
fixed sync pattern can be coded directly with a few 'WAITVID #D,#S' instructions.


    DAC channel outputs (9 bits each, MOD 512) according to S input using LIT_RGBS32 mode
    --------------------------------------------------------------------------------------------------------
    Y_R     %RRRRRRRR_GGGGGGGG_BBBBBBBB_xxxxxxxx = YO*2 + Y                 component/vga pixel  VCFG[3] = 0
            %RRRRRRRR_GGGGGGGG_BBBBBBBB_SSSSSSSS = YO*2 + Y + SSSSSSSS*2    component sync       VCFG[3] = 1

    I_G     %RRRRRRRR_GGGGGGGG_BBBBBBBB_xxxxxxxx = IO*2 + I                 component/vga pixel  VCFG[2] = 0
            %RRRRRRRR_GGGGGGGG_BBBBBBBB_SSSSSSSS = IO*2 + I + SSSSSSSS*2    component sync       VCFG[2] = 1

    Q_B     %RRRRRRRR_GGGGGGGG_BBBBBBBB_xxxxxxxx = QO*2 + Q                 component/vga pixel  VCFG[1] = 0
            %RRRRRRRR_GGGGGGGG_BBBBBBBB_SSSSSSSS = QO*2 + Q + SSSSSSSS*2    component sync       VCFG[1] = 1

    SYN     %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxxx0 = VCFG[0]*511              vga sync unasserted
            %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxxx1 = !VCFG[0]*511             vga sync asserted

    Y__     %RRRRRRRR_GGGGGGGG_BBBBBBBB_xxxxxx00 = YO*2 + Y                 s-video luma pixel
            %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxx01 = IO*2                     s-video luma sync high
            %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxx1x = 0                        s-video luma sync low

    _IQ     %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxxxx = QO*2 + Q'                s-video chroma

    YIQ     %RRRRRRRR_GGGGGGGG_BBBBBBBB_xxxxxx00 = YO*2 + Y + Q'            composite pixel
            %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxx01 = IO*2 + Q'                composite sync high
            %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxx1x = Q'                       composite sync low
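Two rows of the table above, SYN and Y_R, are easy to model directly. The Python sketch below follows the table's formulas; the function names are mine, and Y is passed in as an already computed 9-bit colorspace result.

```python
def syn_output(s, vcfg):
    """SYN channel level: VCFG[0]*511 when unasserted, !VCFG[0]*511 when asserted."""
    polarity = vcfg & 1
    asserted = s & 1
    return (polarity ^ asserted) * 511

def y_r_output(s, vcfg, yo, y):
    """Y_R channel level: YO*2 + Y, plus SSSSSSSS*2 when VCFG[3] = 1, MOD 512."""
    level = yo * 2 + y
    if vcfg & 0b0001000:
        level += (s & 0xFF) * 2
    return level & 0x1FF

VGA_POSITIVE_SYNC = 0b01_0_000_0  # SDTV/HDTV/VGA mode, no component sync, VCFG[0] = 0
VGA_NEGATIVE_SYNC = 0b01_0_000_1  # same, with VCFG[0] = 1

for vcfg in (VGA_POSITIVE_SYNC, VGA_NEGATIVE_SYNC):
    print(bin(vcfg), "idle:", syn_output(0, vcfg), "sync:", syn_output(1, vcfg))
```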


The following example programs display luma-graduated color bars in various output modes:

    simple_VGA_1280x1024.spin
    simple_VGA_800x600.spin
    simple_VGA_640x480.spin
    simple_HDTV_1920x1080p.spin
    simple_HDTV_1280x720p.spin
    simple_NTSC_256x192.spin

For basic GUI, industrial control, technical information display, text, pictures, and other things that aren't game related, this subset is a slam dunk, particularly with a couple of COGS able to write to the display buffer region, offer graphics primitives, bitblit, etc... It might take 4 or 5 COGS to really nail it P1 COG style, but that's not too many of the COGS.

@Bill: AUX

Yeah, me too. That also allowed VID to be super easy to code. This opened the door for mixed mode displays being easy. Have a text / character display, with a bitmap region for a picture, or perhaps an incoming data display, for an overall thrifty use of RAM. Doing that kind of thing on a P1 is possible, but mostly a demo / curio. The driver code itself consumes a COG, an assist COG is usually needed just to build the display, and a third COG is required for reasonable library support to drive it. That's sans sprites, and at low color depths.

This P1+, assuming P1 waitvid could drive a DAC pin with 8 bits, could produce higher color depth images at moderate resolutions (640x480), depending on slot allocation as Bill mentions, at a cost of three COGS to drive a component / HDTV signal. A composite / TV one would not make much sense, and would be grey scale only, unless we play artifacting tricks on PAL / NTSC, and that's demo / curio territory again.

Depending on how things worked out, a feed-the-display COG and a library COG may both be needed, taking the total to 4 - 5 COGS for a reasonable display.

Adding AUX would be nice, because the VID gets easy, but the overall HUB throughput numbers would be low, limiting its usefulness as scan line buffer data would have to end up in the COG somehow.

Maybe it doesn't make sense at the throughput numbers we get on P1+

Brian Fairchild · 2014-04-07 07:26

potatohead wrote: »

Re: P2 video system docs in one place.

Yes! And the most useful subset is below, where I've left out the pixel engine:

Thanks.

I'm very surprised to see mention of "modulation/NTSC/PAL/TV" in there. Do people still expect that in a professional chip these days? And I'll warrant that the PAL output isn't phase correct PAL.

potatohead · 2014-04-07 07:34

Actually, the TV output is insane good! PAL works as it needs to, and yes it's properly phase correct. Best PAL signal I've ever seen actually, and it's on par with high end game consoles / media players. Andy (Ariba) coded up a PAL driver that is awesome.

What really makes the difference is the necessary timing being in the silicon this time, allowing a very accurate PAL signal, and since it goes through the DACS, a very clean color / sync signal, to spec, no harmonics, etc... I've got a picture somewhere I snagged on my scope. It's perfect.

Doing the chroma modulation for TV isn't a big deal, nor a big cost. Doing it right makes smaller displays, and legacy display support, like for a retrofit, doable with a likely improvement over what was in there before. Rather than push it as a feature, it's a compatibility option. And there really isn't anything out there, besides modern game consoles, that can make as effective use of ordinary composite.

Here's the neat part: Because of the nice color space correction, scaled displays are easy! One can prototype on TV, then deploy on HDTV / VGA, with few changes. The driver I was building with Baggers was working toward just that. Have it calculate parameters from the system clock frequency, and then output accordingly on whatever display is desired. A PAL / NTSC TV simply gets all the horizontal pixels, and it displays whatever it can. I stuffed 2560 pixels through a TV display, no problem, just for fun, stretching it.

Everything just nicely blended and there were no issues. That same line can go into VGA / HDTV, no color changes needed. Nobody can make effective use of that many pixels of course. Just checking against the future, that's all. Was fun.

Vertically, a TV would need to line skip higher resolution displays to prototype on, but otherwise, VGA / HDTV, both scan all the lines.

Short story: Complete and easy display portability across just about anything ever made in most parts of the world. Nice. One could retrofit just about any industrial control, using the display that's in there to peak performance, if desired. And if not, nearly all of the code moves to a VGA / HDTV, with no changes.

A well written driver could just be dropped in too. Say it's running on TV. Stop that COG, fire up VGA, and now the display is running on VGA! I did this in my FPGA emulation using the monitor. Spiffy! Running both at the same time is no big deal either, but the FPGA didn't really allow for that as the DAC emulation is on COG 0 only, but it's easy to understand how that would work.

Personally, I will often prototype on a TV display to make use of inexpensive PC capture devices. Being able to do that, then deploy on the target display with few changes is sweet!

BTW: An 80's era TV display would render 80x24 text very readable. A monochrome composite display would do 640 pixels nicely, at whatever vertical you get with PAL or NTSC. 50 / 60Hz.

Modern TVs actually decode more pixels off composite, if you send them. I tested on my modern HDTV, and got very good displays at 720+ pixels horizontally. Impressive. It's the lack of spurious harmonics and clean signal that makes the most of the display electronics.

Of course, the big win is component YPbPr, which allows one to drive standard TV scan rates on nice displays, which is good for games and such. (low sweep frequency = max scan line processing time = max visual manipulation possible per frame)
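To put a number on "low sweep frequency = max scan line processing time", here is a rough Python comparison of clock cycles available per scan line. The 200MHz clock is an assumption borrowed from the bandwidth figures earlier in the thread, and the line rates are the nominal NTSC and 640x480 VGA values.

```python
CLOCK_HZ = 200_000_000  # assumed system clock

LINE_RATES_HZ = {
    "NTSC SD (composite or component)": 15_734.26,
    "VGA 640x480 @ 60Hz": 31_468.75,
}

def cycles_per_line(clock_hz, line_rate_hz):
    """Clock cycles that elapse during one horizontal sweep."""
    return clock_hz / line_rate_hz

for name, rate in LINE_RATES_HZ.items():
    usec = 1e6 / rate
    print(f"{name}: {usec:.2f} usec per line, "
          f"about {cycles_per_line(CLOCK_HZ, rate):.0f} clocks")
```

At SD scan rates each line gets roughly twice the clocks of a 640x480 VGA line, which is the headroom being pointed at here.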

At higher sweep frequencies, and ignoring games, component allows consumer HDTV displays to perform very nicely. Analog decode on those is generally pixel perfect in my tests. HDMI is nice, but analog in to those displays is on par given good signal quality, which the P2 video system delivers with essentially no headaches related to IP / copy protection, etc... Let somebody else package that up into a chip; we just use it when warranted.

RossH · 2014-04-07 16:03

All,

Chip has now announced the specifications of the new chip, and started a discussion thread about it here.

Thanks to all that contributed to this thread. I don't mind if this thread stays open, but I suggest the moderators may want to lock it to prevent confusion with the new thread.

Ross.

hippy · 2014-04-07 16:53

RossH wrote: »

If Chip elects to add some kind of Hub Exec into the P16X32B or P32X32B, then I will include it in the specs at post #1 in this thread.

Please don't. It will completely misrepresent what people did express an opinion for.

RossH · 2014-04-07 16:59

hippy wrote: »

Please don't. It will completely misrepresent what people did express an opinion for.

Fair point.

Seems a bit academic now though - we'll just have to wait. Now that we have achieved what I think we all really wanted, I'm not going to agitate one way or the other on this issue. If Bill can convince Chip, and Chip can then come up with a solution that doesn't have the jitter problem that seems to be inherent in Bill's proposed solution, then I'd be okay with that.

Ross.

jmg · 2014-04-07 17:05

potatohead wrote: »

Actually, the TV output is insane good! PAL works as it needs to, and yes it's properly phase correct. Best PAL signal I've ever seen actually, and it's on par with high end game consoles / media players.....

Of course, the big win is component YPbPr, which allows one to drive standard TV scan rates on nice displays, which is good for games and such. (low sweep frequency = max scan line processing time = max visual manipulation possible per frame)

Interesting. Did you generate a component YPbPr test on P2 ? - How did that compare with the PAL ?

potatohead · 2014-04-07 17:42

I'm supposed to be on a little break, but my mind is thinking through some of this video related stuff.

Component video at NTSC or PAL SD scan rates is basically pixel perfect, like VGA is. You run it with the same sync signals and scan rates as you do composite, or you can do progressive scan for a non-interlaced, faster display.

I didn't build a progressive scan.

PAL compared to NTSC has greater vertical resolution, and better composite color resolution. Composite signals have fairly low color resolution, on the order of a couple hundred pixels, and above that you get artifacts. Additionally, luma resolution is limited, due to the color sub carrier.

Component bypasses all of that. On an analog display, you throw all the horizontal pixels you want at it, and the TV / display renders all it can. Most displays will do 640 pixels in color nicely. Analog ones don't care and will render whatever is in the signal, subject to their bandwidth roll off. 640 pixels horizontal is a very reasonable expectation. Higher may work on analog displays, and smarter digital ones. I got 800 or so on one of mine.

Frankly, I always wanted to do it on P1, because it makes great use of TV displays that have component inputs. On P1, it would have to be a three COG driver, and I never finished one, because three COGS was too much.

On the P1+ we will have to see what Chip does exactly, but I think this kind of display is no big deal, given the COG speed and dacs.

Component blows away composite, comparable to VGA, with the nice slow sweeps retired from VGA a long time ago. On P1+, we will have to work out a color table. Once done, it's done. Well, it can get done for a sweet spot case, leaving extremes to those who might want them.

The other really nice thing about a component display is a monochrome display is one pin! And if somebody wants to, they can run color at a lower pixel clock too, leaving luma sharp, color a bit less. I will play with that some on P1+

Monochrome TV type displays are found on many older industrial controls. One pin for that, and anything up to HDTV resolutions. This makes most TV's quite useful.

Overall, the loss of the nice color space is a bummer, but it was huge!! We can get very good results in software with the 128 bit waitvid; it should work just fine at the resolutions people need. VGA, or whatever.

I like to use a composite for some work because it is portable. We have the speed and accuracy for a very good one in NTSC for sure. Given the power considerations, doing the alternative displays in software is a bit of work, but should work out just fine.

IMHO, it is much better to invest that power budget in larger faster programs. Chip is making the right cuts here. And we have software video routines done for a lot of this on P1 too, including a full on software composite NTSC done by Eric Ball, which should sing with the DAC outputs. On P1, it is impressive with the little three bit DAC P1 used standard. TV will be just fine, even excellent in NTSC land at least on P1+

If we can keep jitter under the tight PAL requirement, due to how it does color across multiple vertical scanlines, TV should be great on PAL too.

If not, we do a component YPbPr display and do it anyway, optionally using a little converter box for anyone wanting to use an older set which lacks component. Most of the sets I see here have them.

Bill Henning · 2014-04-07 17:47

Ross, I am confused.

What inherent jitter problem?

The low speed driver hub slot access? That is not a problem.

If you can come up with a case where it is, I would love to hear it, as I'd learn something.

Being able to change the modulus is a HUGE problem, as every time someone changes the modulus, it has a system wide effect, and breaks any VGA or other high bandwidth cog.

RossH wrote: »

Fair point.

Seems a bit academic now though - we'll just have to wait. Now that we have achieved what I think we all really wanted, I'm not going to agitate one way or the other on this issue. If Bill can convince Chip, and Chip can then come up with a solution that doesn't have the jitter problem that seems to be inherent in Bill's proposed solution, then I'd be okay with that.

Ross.

RossH · 2014-04-07 18:42

Bill Henning wrote: »

The low speed driver hub slot access?

That's the one.

Ross.

Bill Henning · 2014-04-07 19:09

That is a non-issue, as the alternative - high bandwidth video drivers being busted by "modulus" is a problem. See my analyses, written for jmg.

If you can show me a case where the low bandwidth jitter matters more than high bandwidth cogs, I'd appreciate it.

RossH wrote: »

That's the one.

Ross.

RossH · 2014-04-07 19:17

Bill Henning wrote: »

That is a non-issue.

Anything that breaks the determinism of the Propeller can hardly be considered a "non-issue". I'm not in a position to judge how significant it might be, but I'm prepared to leave it to Chip to decide.

Ross.

Bill Henning · 2014-04-07 19:33

Ross,

What break in determinism? jmg is the one trying to break it!

Without a logical reason, just a wish "it would be nice to use any modulus".

FOR NO GOOD REASON. NEVER MIND IT BREAKS OBEX, AND ADDS JITTER. (MODULUS DOES)

Serial driver assigned 1/128 slotmap[cnt&$7F]

The "break in determinism" is entirely due to jmg's attempt at a programmable modulus.

My proposal, fixed 128 slots, is 100% deterministic. That is why I proposed it!
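
For readers following along, the fixed-table lookup `slotmap[cnt&$7F]` mentioned above can be sketched roughly as follows. This is only an illustration, not Chip's design or Bill's exact proposal: the cog count, the round-robin fill for the other cogs, the choice of cog 5 as the serial driver, and all function names are assumptions, and `cnt` is treated as advancing once per hub slot.

```python
# Hypothetical model of a fixed 128-entry hub slot table.
# Every name and the fill policy here are illustrative assumptions.
SLOTS = 128      # fixed table length, a power of two
NUM_COGS = 16    # assumed cog count


def build_slot_map(serial_cog=5, serial_slot=0):
    """Round-robin the other cogs; give serial_cog exactly one slot in 128."""
    others = [c for c in range(NUM_COGS) if c != serial_cog]
    slot_map = [others[i % len(others)] for i in range(SLOTS)]
    slot_map[serial_slot] = serial_cog
    return slot_map


def hub_owner(slot_map, cnt):
    """Cog that owns the hub at counter value cnt: slotmap[cnt & $7F]."""
    return slot_map[cnt & 0x7F]


def access_times(slot_map, cog, ticks):
    """Counter values in range(ticks) at which the given cog owns the hub."""
    return [cnt for cnt in range(ticks) if hub_owner(slot_map, cnt) == cog]


slot_map = build_slot_map()
times = access_times(slot_map, 5, 4 * SLOTS)
gaps = [b - a for a, b in zip(times, times[1:])]
print(times, gaps)
```

In this model the low speed cog's gaps between hub accesses are all equal to the table length, because the index is just the low 7 bits of the counter. The sketch says nothing about what happens with a programmable modulus, which is the point under dispute in this thread.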

Adding a programmable modulus is what breaks determinism.

You took his word on "jitter", when in fact he was introducing it, without examining the analysis.

I was just too nice to say it flat out. No more.

RossH wrote: »

Anything that breaks the determinism of the Propeller can hardly be considered a "non-issue". I'm not in a position to judge how significant it might be, but I'm prepared to leave it to Chip to decide.

Ross.

RossH · 2014-04-07 19:39

Bill Henning wrote: »

You took his word on "jitter", when in fact he was introducing it, without examining the analysis. I was just too nice to say it flat out. No more.

Actually, I took the "jitter" stuff from your posts, not jmg's. You appear to agree in several posts (e.g. here) that there is jitter in your proposal, but you keep saying it is "irrelevant", "does not matter" or is "a non-issue" without ever explaining why.

So your solution does not introduce jitter?

Ross.

Cluso99 · 2014-04-07 19:47

I cannot see the jitter either. But let's move on to discuss the new chip and its capabilities.

jmg · 2014-04-07 19:48

Bill Henning wrote: »

If you can show me a case where the low bandwidth jitter matters more than high bandwidth cogs, I'd appreciate it.

See my simple DDS and Audio case examples. - and you can have both, there is no 'or' choice here, so no risk.

RossH · 2014-04-07 21:59

jmg wrote: »

See my simple DDS and Audio case examples. - and you can have both, there is no 'or' choice here, so no risk.

I don't think Bill is going to answer jmg - can you provide a link? Never mind - found it.

Ross.

koehler · 2014-04-08 00:19

RossH wrote: »

All,

Just to be clear here - this thread is about the P16X32B, and the other chip defined by Chip in post #1. He didn't give the other one an "official" name, but I now will - I'm going to call it the P32X32B (I will amend the first post to reflect this). These are P1 variants, not P2 subsets. They may end up with some P2 enhancements in them, but that is not the point of this thread. Cluso has started another thread for discussing such hybrids (which he calls the P16+X32B) here.

This thread is for saying Yes or No to whether you want to see something that could reasonably fall into the description Chip has given of the P16X32B or P32X32B. Feel free to discuss related issues, but the main purpose of this thread is to give Chip the input he asked for on whether we could achieve consensus on such a beast.

Ross.

I vote yes for either, though x16 seems more than adequate.

EDIT - Chip's apparently made the decision.

pik33 · 2014-04-12 23:46

P2 became overweighted and overcomplicated. So yes for P16x32B.

16 cogs = no need for 2 propellers on one board
64 I/Os = no need for 2 propellers on one board
More MIPS = for example SIDCog running @128 kHz instead of 32 and fullHD video (available even on P1 but with big difficulty)... and maybe soft core USB

So, yes, I am waiting for P1+

RossH · 2014-04-13 00:58

Tally updated: 47 in favor, 4 against.

Seems a bit academic now since the new chip has now been announced, but it is worth keeping this thread alive just in case Parallax does decide they need to use some kind of crowd-sourcing.

Ross.

Heater. · 2014-04-13 02:40

Make that 48.

Now that the PII design has back tracked, slimmed down and started again with some extra new ideas thrown in for good measure, and now that the opcode count has dropped from an insane 500 to an almost reasonable 100 and something, I'm enthusiastically for it.

RossH · 2014-04-13 03:03

Heater. wrote: »

Make that 48.

Done: 48 in favor, 4 against.
