The New 16-Cog, 512KB, 64 analog I/O Propeller Chip - Page 51 — Parallax Forums


Comments

  • jmgjmg Posts: 15,175
    edited 2014-04-30 19:41
    If you break the fixed round-robin hub-access order, what you'll end up with is utter chaos.

    Lucky there is a very simple solution to that - use the default setting. Mapping is a safe-superset feature.

    I do not really follow the 'logic' of avoiding user control, just in case, whilst nobbling power users, who may want to choose a Bandwidth/Power balance themselves for each particular design.

    There are already shiploads of things users have to manage in a system when merging P1+ objects like Memory and Pin mappings, as well as Clock and Power Budgets.
    These are not apps running on an iPhone.
  • jmgjmg Posts: 15,175
    edited 2014-04-30 19:52
    cgracey wrote: »
    Going from a 16x16 multiplier to a 24x24 increased the total ALU area by only 10%.

    With extra space provided for 68% area utilization (to accommodate buffering and clock tree insertion downstream), the total ALU size w/24-bit multiplier is now 0.25 square mm per cog. That was for 160MHz, which is way faster than the RAMs could run (320 MHz). The RAMs are only good for 250MHz.

    I think you are saying the COG Multiplier is now going to be 24x24b, and still not affect the 200MHz/100MOP targets?
    Sounds great.


    Have the 32b COG counter/adders been confirmed at comfortably > 200MHz? (i.e. enough margin to include the Reload and Saturate mode options, talked about earlier, which can help Software tasking & general purpose timing)
  • SeairthSeairth Posts: 2,474
    edited 2014-04-30 19:59
    jmg wrote: »
    I do not really follow the 'logic' of avoiding user control, just in case, whilst nobbling power users, who may want to choose a Bandwidth/Power balance themselves for each particular design.

    I agree. As long as any enhanced access mechanism supports a safe fallback to the current behavior, then it shouldn't be an impediment to those that want to keep it simple like it is now. (And, of course, doesn't slow down the clock frequency, consume a lot of power, or delay the release of the chip.)
  • potatoheadpotatohead Posts: 10,261
    edited 2014-04-30 20:13
    "do not understand the logic"

    Having COG code always work is a big deal. Either we do that, or we don't. It's binary.
  • Roy ElthamRoy Eltham Posts: 3,000
    edited 2014-04-30 20:31
    jmg, Seairth:
    I'm arguing against it because it's a feature that isn't implemented yet, may cost a fair bit, and has some unknowns (like how does it interact with the mathblock stuff?). I am a power user that would love to have high bandwidth between the HUB memory and COGs, but I also am moving closer and closer to the Heater camp of not wanting to keep adding stuff and instead get it done ASAP.

    The mess that *will* happen if slot sharing/mooching/stealing/whatever is implemented is that objects that would normally all work together just fine, might not anymore. You won't be able to use any video driver, with any serial driver, with any QSPI driver, with any USB driver. You say this is fine, and just another axis of making things work along with pins and whatnot. I disagree, because it's not a hard will or won't work. It's a might work some/most of the time, but fail sometimes and/or in odd ways. It will lead to horrible experiences for users.

    Sometimes the technically best things are not the actual best things for the market. Sometimes, it's better to have a nice ecosystem, instead of a messy one that happens to have some niche use cases.

    The feature set as is, is a big leap up from the P1, and very powerful. Obviously, Chip will do what he feels is reasonable to do in the time available within the size and power constraints.
  • Dave HeinDave Hein Posts: 6,347
    edited 2014-04-30 20:33
    What is the target date for the FPGA image?
  • potatoheadpotatohead Posts: 10,261
    edited 2014-04-30 20:38
    Personally, I'm in favor of mooch for sure at some point in a future design. Maybe more, but that needs some testing and code written against it, and we can do that with an FPGA, proving out something new, while people are making stuff with the chip we know will work well.

    Given the time that passed, and what Chip initially told us, "a few weeks?" is my best guess.

    Just saw Roy's post. Totally agreed.
  • jmgjmg Posts: 15,175
    edited 2014-04-30 20:53
    Roy Eltham wrote: »
    jmg, Seairth:
    I'm arguing against it because it's a feature that isn't implemented yet, may cost a fair bit, and has some unknowns (like how does it interact with the mathblock stuff?). I am a power user that would love to have high bandwidth between the HUB memory and COGs, but I also am moving closer and closer to the Heater camp of not wanting to keep adding stuff and instead get it done ASAP.

    The HW that allocates time slots, changes minimally.
    Currently, it accepts outputs of 4b registers, and decides which-cog from that.
    Mapping simply swaps in a small 4b registered memory instead. (same Tco)
    The run-time operation is the same, all that is needed is a means to set the contents of that memory.
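
    The arbiter change jmg describes can be sketched in software. This is a hypothetical Python model, not actual P2 HDL: the fixed arbiter is a 4-bit counter naming the owning cog, and "mapping" swaps in a 16-entry table of 4-bit values indexed by that same counter, with the default contents reproducing round-robin exactly (the "safe superset").

    ```python
    # Illustrative model of the hub-slot arbiter as jmg describes it.
    # All names and table contents here are assumptions for the sketch.

    def hub_owner_fixed(slot_counter):
        """Current behavior: slot N always belongs to cog N (fixed round-robin)."""
        return slot_counter & 0xF

    def hub_owner_mapped(slot_counter, slot_table):
        """Mapped behavior: a small 16 x 4-bit registered memory chooses the
        owner instead of the counter value itself (same Tco, per jmg)."""
        return slot_table[slot_counter & 0xF]

    # The safe-superset default: a table preloaded with 0..15 is
    # bit-identical to the fixed round-robin arbiter.
    default_table = list(range(16))
    assert all(hub_owner_fixed(n) == hub_owner_mapped(n, default_table)
               for n in range(64))

    # A power user could give cog 0 every even slot (half the hub bandwidth);
    # the even-numbered cogs 2..14 give up their slots in this configuration.
    greedy_table = [0 if n % 2 == 0 else n for n in range(16)]
    ```

    The run-time datapath is unchanged in this model; only the table contents differ, which is jmg's point about it being a minimal hardware change.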
    Roy Eltham wrote: »
    The mess that *will* happen if slot sharing/mooching/stealing/whatever is implemented is that objects that would normally all work together just fine, might not anymore. You won't be able to use any video driver, with any serial driver, with any QSPI driver, with any USB driver. You say this is fine, and just another axis of making things work along with pins and whatnot. I disagree, because it's not a hard will or won't work. It's a might work some/most of the time, but fail sometimes and/or in odd ways. It will lead to horrible experiences for users.

    This effect is not a result of Mapping, as users do not have to use it. It is a safe-superset feature.
  • potatoheadpotatohead Posts: 10,261
    edited 2014-04-30 21:13
    There are no superset features, only features.

    We know from the body of experiences we've had with P1 that moving to a faster cog will be the number one thing tried when people really should either refactor the code to perform better or parallelize the problem.

    Once we permit a distribution of cycles among COGS, we no longer can say COG code always works. And that's a big deal. We know about the reuse advantages. Even poorly written COG code works, and it can be used again and again. Heck, some of us wrote poor COG code early on (me, for one), and it still got used, because polishing it up didn't require in-depth analysis. Many assumptions can be made and acted on when COG code just works; they can't be when it doesn't, and that complicates things.

    For a couple of problems, I know for a fact, I would have maxed out a COG, making the reuse by others moot. Converting something to a mailbox comms method, for example, to be used in C or whatever, is way easier than downgrading something complicated is. In the end, we get a lot more out of a lot more code that way, and we avoid very painful problems, leaving ones we can more easily discuss, share and debug.

    Ever follow some of the XMOS threads where this is at issue? It's not pretty and the things we work on to solve "it won't play well with others" problems are a walk in the park by comparison!

    Features and controls have costs. Infinite control, for example, isn't a freebie. There is the cost of making choices, which increases along with the cost of finding an optimal solution. On the other end of things, no choice at all carries the cost of never being optimal, but there is no cost of finding an optimal solution either.

    A good balance is control that adds a lot of value without also adding a lot of costs. We see this all the time in UX design where less choice is actually very high value. Interestingly, people may well save a keystroke or click here and there, but they also don't have to make discovery investments and can very easily figure out how the vast majority of things are done.

    When more controls are added, we see some small percentage of the user base improve efficiency, however we see new users fail due to the increased barrier to discoverability, and/or common, casual, non-power users fail or see degraded performance due to the increased complexity competing for their mindshare related to the product and their personal goals of using the product.

    Somewhere in my brain I had a reference for this dynamic, and it's a Googler that taught it to me as well. :)

    Anyway, that's the rough dynamics in play here.

    It's my opinion that mooch, of all the options, is the easiest one. It can deliver many of the performance benefits we see as desirable, and it does so as a single control, very discoverable. Since it's passive, it's also got the benefit of being something one can't really depend on, which would encourage COG code that always works, but might work a lot better / faster given a mooch.

    Beyond that, I've seen tons of schemes, each with lots of different trade-offs between discoverability, complexity, flexibility, etc... We are a LONG way from settling how this kind of thing should be done. All we really know is a whole lot of ways it could be done, and none of those have the simple clarity mooch does.

    If we did anything at all, I would grudgingly support mooch, but I really don't want any of it as much as I want this design to settle and ship, because we need to transition people who have outgrown P1 into something new so our community can carry on and compete, not end up a backwater of sorts.

    That's more important, and is the "ecosystem" Roy speaks of, than getting the max performance out of a COG on this chip. If we do this right, we get more chips sooner. Ken said. :)

    Since we've not really agreed on much of anything other than it being highly desirable, and mooch seeing a very high consensus apart from that, it makes no sense at all to delay this design to put the feature in there.

    Work needs to be done to optimize it, factor all the complexity out of it we can, polish, present, etc... so that its costs don't exceed its benefits overall.

    Doing that work would be a kick! Let's do it on the FPGA along with the other advanced things we know we can target for 90nm or smaller and do it right, not clog the works on this design.

    IMHO, Chip is nearly finished with the COG, or is finished, meaning right now he's working on the pins, math block and guts to make it work. We may well get an image in a couple few weeks, and there is a ton of work to do!

    Gcc has gotta get moving. Catalina too.
    We've got SPIN 2 to get sorted out.
    Regarding tools, Pnut needs to support hubex, and get various cleanups.
    Chip needs to do booter, monitor, crypt.
    We need to test instructions, and get a lot of basics written!

    The way I see it, a killer package, well presented with as few barriers to adoption as possible, will maximize our opportunity to continue this adventure. It means taking the HUB slot idea and some other killer stuff that came up on the more aggressive design, and going big! That's the place to do this kind of thing, not here, not now, not on this transitory and increasingly necessary design.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-30 23:58
    This insanity against slot priority is ridiculous.

    1. On P1 we have objects that require overclocking. Examples are video modes where we need to overclock to get higher resolution displays. Yet no-one has commented about this.

    2. Most of my boards are overclocked, usually 104MHz, but also 96MHz.

    3. Now for a real commercial application...
    I have a commercial application that uses 3 Propeller chips. All could have been done using cheaper processor chips.
    One of the props runs at 104MHz and has 512KB of SRAM and an SD card. It runs a large Catalina C program.
    If the P1 could not be overclocked, did not have C, and did not have 512KB SRAM plus an SD card, that application would probably not use a P1.
    If it did not use a P1, then the benefit of using P1's for the other 2 processors (1 at 96MHz and 1 at 80MHz) would have evaporated, and I would most likely have used different processors - ie not Propellers.

    While my app will not use large numbers, at least each board set has 3 P1's.


    If the P2 doesn't get the opportunity to vary slot bandwidth, either by mooching, co-operating pairs, or some other method, how many commercial sales might Parallax lose?
    I cannot answer this - only Ken & Chip might know.
    I don't know how simple or complex it might be, but my engineering design background tells me it is not that difficult.

    Opposing something outright under the guise that not all objects might work is plain unnecessary fear-mongering, at the expense of well-engineered products being able to take advantage of otherwise unused processing power. This method has to be enabled deliberately, and if you don't want objects using slot sharing, it would be quite simple to ban objects utilising it from the OBEX.

    What if you cannot do USB FS simply because there is no slot sharing?
    What if you cannot do those amazing graphics at high resolutions because there is no slot sharing?
    What if you cannot do Quad SPI because there is no slot sharing?

    The main arguments are not...
    How much silicon will it take?
    How long will it take?
    How much power will it use?

    BUT just simply: we don't like it because it may break someone's object.
    Commercial Engineers will not care!
  • Roy ElthamRoy Eltham Posts: 3,000
    edited 2014-05-01 00:28
    Cluso,
    What if it can do all those things without slot sharing?

    What if slot sharing complicates things too much?

    What if everyone thinks the P2 is unreliable because of it (however misinformed or not)?

    What if...

    What if...

    What if...
  • RossHRossH Posts: 5,477
    edited 2014-05-01 03:14
    Cluso99 wrote: »
    This insanity against slot priority is ridiculous.

    Sorry, Ray - I'll never agree that this is a good idea.

    If the Propeller gives away its claim to symmetry, orthogonality and determinism, then what does it have left to offer that can't be had more cheaply elsewhere?

    Ross.
  • Invent-O-DocInvent-O-Doc Posts: 768
    edited 2014-05-01 03:56
    I'm a proponent of keeping determinism and independent, predictable cogs for many of the reasons discussed here. Performance of this new microcontroller design is much higher than we have now. Applications that need much faster execution and much greater power that need these byzantine features would really benefit from special chips designed for this purpose. They are called microprocessors.
  • koehlerkoehler Posts: 598
    edited 2014-05-01 05:02
    Roy Eltham wrote: »
    I have to agree with Phil on this one. Any form of sharing or remapping or whatever is just going to be a mess.
    I think the only "safe" way would be if Chip can make the hub to cog window wider or occur more often (for all of them equally).

    While I agree, I'm still confused why allowing the 'potential' for a Cog to have greater bandwidth is automatically being shot down.
    I'm probably incorrect, however this is what I think is going to be available:

    1- Standard COG usage
    2- Hubexec available on ALL COGs, yes/no (?)

    A #3 would be some form of hubslot sharing, of which Tubular's suggestion seems quite nice from a balancing perspective among others.

    PRO- Would allow double hub bandwidth, or greater depending upon implementation.

    CON- Would potentially impact current OBEX, or would it (8 cogs vs 16)?

    Question- Since this would be a voluntary design decision made by the programmer, why in this day and age would anyone want to castrate the new Prop on this potential bandwidth bonanza?
    This has high upside potential in any application which actually needs the bandwidth, and is a great feature to advertise/sell.
    Considering the speed differential between the Legacy and New Prop, it seems there would be a high probability that many current OBEX objects will need updates for specific timing changes anyway, so what is really being saved?

    The elephant in the room is the fact that new OBEX objects will start to come in which require multiple COGs/sharing, which can cause unknown dependency issues/complexity.

    So the answer is to remove this potentially high-value feature from the new Prop entirely, rather than allow those who might need the feature the option to work around issues.

    Unless there are real die area/power or ChipTime(TM) impacts, it seems like this might be best discussed as a Final Feature discussion once Chip has reached 90% or so completion.

    Also, how does COG0 utilizing COG9's hub access actually impact determinism of COGs 1-8, 10-15?
  • SRLMSRLM Posts: 5,045
    edited 2014-05-01 05:35
    koehler wrote: »
    Question- Since this would be a voluntary design decision made by the programmer, why in this day and age would anyone want to castrate the new Prop on this potential bandwidth bonanza?
    This has high upside potential in any application which actually needs the bandwidth, and is a great feature to advertise/sell.
    Considering the speed differential between the Legacy and New Prop, it seems there would be a high probability that many current OBEX objects will need updates for specific timing changes anyway, so what is really being saved?

    The elephant in the room is the fact that new OBEX objects will start to come in which require multiple COGs/sharing, which can cause unknown dependency issues/complexity.

    So the answer is to remove this potentially high-value feature from the new Prop entirely, rather than allow those who might need the feature the option to work around issues.

    The problem with mapping is that there's nothing free. The mapping scheme would allow the available hub slots to be divided among a number of cogs in a non-even manner, which means that some cogs would get less than others. The problem is not with multi-cog objects. The risk is then that code will be mixed and matched, and an object will be placed in the "reduced bandwidth" cog leading to runtime failures.

    Mooching is a different scheme, where only "free" slots would be allocated. This sets a common guarantee for minimum hub access that cannot be broken.
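
    The distinction SRLM draws can be made concrete with a small sketch. This is an illustrative Python model under assumed semantics (neither scheme was finalized at this point in the thread): a mapping table can drop a cog below the one-in-16 baseline, while mooching only redistributes slots left unused, so the minimum guarantee survives.

    ```python
    # Illustrative comparison of "mapping" vs "mooching" hub-slot schemes.
    # The allocation rules below are assumptions for the sketch, not P2 spec.

    def slots_per_cog_mapped(slot_table):
        """How many of the 16 slots each cog owns under a mapping table."""
        counts = [0] * 16
        for owner in slot_table:
            counts[owner] += 1
        return counts

    def slots_per_cog_mooch(active_cogs, moocher):
        """Mooch: each active cog always keeps its own slot; only the slots
        of idle cogs are donated to the mooching cog."""
        counts = [0] * 16
        for cog in range(16):
            if cog in active_cogs:
                counts[cog] += 1       # guaranteed slot, never taken away
            else:
                counts[moocher] += 1   # free slot donated to the moocher
        return counts

    # Mapping can starve: give cog 0 over half the slots and cogs 8..15 get none.
    uneven = [0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 0]
    assert min(slots_per_cog_mapped(uneven)) == 0      # guarantee broken

    # Mooching cannot: with cogs 0..3 active and cog 0 mooching, every
    # active cog still has at least its own slot.
    mooch = slots_per_cog_mooch({0, 1, 2, 3}, moocher=0)
    assert all(mooch[c] >= 1 for c in {0, 1, 2, 3})
    ```

    This is why the thread treats mooching as the "safer" of the two: an object dropped into any cog still sees at least the stock one-slot-in-16 timing.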
  • Heater.Heater. Posts: 21,230
    edited 2014-05-01 06:09
    Hey guys,

    I've got this great new device. It's like an FPGA but easier and cheaper. It has logic blocks, but these are not simple FPGA logic blocks, no, with this device a logic block can implement whole subsystems, you know, like a UART, or multi-channel PWM, or video, all sorts. It's really cool.

    Not only that, you can run up to half a megabyte of C code without having to blow a processor core into your FPGA. So easy.

    One great thing is that there are hundreds of ready made IP Blocks, "objects" as they like to call them, out there on the net that provide for all those subsystems. Just check github or the old folks OBEX library. You don't have to do any of that hard work. Just concentrate on your application code.

    Let me show you what it can do, you'll love it. See this chip here, well as you can see it's already driving a bunch of servos on my robot arm, measuring the temperatures in my processing tanks, logging to an SD card and communicating with my PC and monitoring the emergency stop switches. Imagine, I did all that without having to mess with an RTOS, or interrupts, or priorities. Just drop in components and "wire" them up. So easy it's unbelievable.

    But watch this. This will knock your socks off. See I just downloaded this turbo-video driver object. Drop it into my project and add some calls to it to display live data on this screen here.

    Oh, wait, that's not right, my servos have gone mad. That robot arm is going to smash something. That's odd it's been working all week. Can't get anything on the screen either, strange. Let me just have a look at this....just a sec...

    Hey guys, where are you going? Come back it does work. Well it did, honest, this won't take long...guys...guys...oh ****.




    And the moral of the story is:

    Don't break determinism with any silly HUB slot sharing scheme. Don't add coupling between COGS, in timing, by doing away with the round-robin HUB.

    As our story illustrates this is akin to all the problems of building programs out of component parts that need interrupts and priorities and/or an RTOS.

    If you can do it without inducing the above embarrassment, then all well and good.
  • ctwardellctwardell Posts: 1,716
    edited 2014-05-01 06:16
    I see FUD is still alive and well.

    Chris Wardell
  • Heater.Heater. Posts: 21,230
    edited 2014-05-01 06:21
    Chris,

    Please elaborate. No FUD here. Just exploring ideas and expressing opinions.
  • Dave HeinDave Hein Posts: 6,347
    edited 2014-05-01 06:27
    All these feature creep suggestions are great, but when are we getting the FPGA image?
  • ctwardellctwardell Posts: 1,716
    edited 2014-05-01 06:35
    Heater. wrote: »
    Chris,

    Please elaborate. No FUD here. Just exploring ideas and expressing opinions.

    Sure there is. The majority of arguments against sharing hub slots, allowing them to be allocated, etc. is based on *what if*.

    It is implied that the cogs are totally independent and that life in propeller land is full of milk and honey and nothing will ever go wrong as long as we keep our nice little round-robin slot scheme.

    Never mind that the cogs share the hub and can already step all over each other's data. Since the pins are all OR'ed, any cog can potentially force a state that wasn't intended.

    But yea, I guess you are right, hub slot sharing or allocation would be a real boogey man.

    C.W.
  • Heater.Heater. Posts: 21,230
    edited 2014-05-01 06:52
    Chris,

    I don't buy into your arguments re: RAM and pins. Assuming one actually has COGs, RAM and pins available, one can drop objects in and they "just work". Issues of code accidentally messing with each other's data or pins are software bugs that need to be fixed.

    Causing timing coupling between processes is a bug in the chip that cannot be fixed in software.

    Yes, there is a lot of "what if". Must be as we don't have the chip in hand. I can turn that around and say: What if we didn't have HUB slot priorities? My guess is nobody would notice.

    A new question comes to my mind. If I understand correctly, Chip is working on a shared, pipelined, CORDIC engine. Basically another HUB resource. How would HUB slot priorities mess with the scheduling of that and complicate the design?

    "What if" we actually get the frikken chip already ! :)
  • ctwardellctwardell Posts: 1,716
    edited 2014-05-01 07:19
    Heater. wrote: »
    Causing timing coupling between processes is a bug in the chip that cannot be fixed in software.

    Bug? Really?

    A failure to understand consequences would be a better way to describe the scenario. Proper coding can certainly prevent any issues, so that statement, including "bug in the chip" and "cannot be fixed in software", is what I would call FUD.

    It has already been mentioned more times than I care to count that the primary use case for sharing is to reduce latency for hub access.

    No cog can 'steal' another cog's slot, so as long as you code objects that don't try to use any extra slots, their timing cannot be affected.

    If you do create objects that make use of extra slots it is your responsibility to write them such that they do not fail when the extra slots are not available, typically by just running slower.

    How slot sharing might work with the pipelined hub resources like CORDIC is a valid question that will need to be worked out.

    Chris Wardell
  • Todd MarshallTodd Marshall Posts: 89
    edited 2014-05-01 07:19
    RossH wrote: »
    Agreed.
    Is "fixed" (in "fixed round-robin") a crucial word here? If I wanted a COG to get 100% (or 50% or 25%) of the HUB accesses, could I view the round-robin as time slots rather than COG slots? Then if I have just one COG, say 0, and put 0 in all 16 time slots, COG0 gets 100% of the HUB. Put it in odd (or even) slots and it gets 50%, etc. Does this break some use-case of the model we have now where COG0 runs in the 0 slot; COG1 runs in the 1 slot; etc. and if that slot is not used, that slot's access to the HUB is just lost?
  • potatoheadpotatohead Posts: 10,261
    edited 2014-05-01 07:38
    Right now, the behavior is 16 access windows, each associated with one of the 16 COGS. If it's not used, it's not used.
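
    The arithmetic behind Todd's time-slot reading is easy to work out. A minimal sketch, assuming a 16-slot rotation, evenly spaced slot ownership, and the 200 MHz clock target mentioned earlier in the thread (none of these figures were final):

    ```python
    # Worked numbers for the time-slot view of the hub round-robin.
    # CLOCK_HZ is an illustrative figure from the thread, not a final spec.

    CLOCK_HZ = 200_000_000
    SLOTS = 16

    def hub_share(slots_owned):
        """Fraction of hub bandwidth and worst-case wait for a cog owning
        `slots_owned` evenly spaced slots (slots_owned must divide 16)."""
        fraction = slots_owned / SLOTS
        worst_wait_clocks = SLOTS // slots_owned   # gap between owned slots
        worst_wait_ns = worst_wait_clocks * 1e9 / CLOCK_HZ
        return fraction, worst_wait_ns

    assert hub_share(16) == (1.0, 5.0)     # cog 0 in all 16 slots: 100%, 5 ns
    assert hub_share(8) == (0.5, 10.0)     # every other slot: 50%, 10 ns
    assert hub_share(1) == (0.0625, 80.0)  # today's default: 1/16, 80 ns wait
    ```

    The default case in the last line is exactly the unused-slot-is-lost behavior potatohead describes; the other lines show what Todd's "time slots rather than COG slots" reading would buy.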
  • Heater.Heater. Posts: 21,230
    edited 2014-05-01 07:42
    Todd,

    You have walked into a debate that has been going on since the dawn of time. In the PII design time scale that is.

    The "use case" that these slot sharing ideas break is that:

    1) I create an object or some C library code that requires more HUB bandwidth than would be available in the simple round-robin style.

    2) You create something that also requires more than its fair share of HUB bandwidth to work.

    3) Somebody else puts those two together in his project and poof, one or the other or both stops working. Or worse, fails in odd ways, seemingly at random. Or simply starting up free COGS with "normal" code can slow that "greedy, needy" COG and cause failure.

    This is what I mean by "coupling" between COGs. It's in the time domain. It breaks determinism. It impacts confident, painless, code reuse.

    That simple code reuse idea is a unique, valuable and essential part of the Propeller multi-core design idea. At least as far as I'm concerned. Seems I am not alone.

    Imagine, if you will, that you were creating a design in an FPGA. But then you find out that adding some IP core to your design impacts the operation of some other IP core you have there already. That would not fly, now would it?

    As you see there are those who value the possibility of "go faster" COGs and don't give two hoots about the determinism of which we speak. I have no idea what dramatic benefits they think that will bring. If they really want an interrupt driven machine with a nest of priorities why not just use an STM32 F4 already?

    @Chris,
    If you do create objects that make use of extra slots it is your responsibility to write them such that they do not fail when the extra slots are not available, typically by just running slower.
    I like the idea.

    Sadly, it's not sufficient. If your beautifully crafted, speed-independent driver can run fast or slow depending on the current environment, that can still cause my application to fail, as the speed is modulated at run time by some other code I have added to my project.
  • ctwardellctwardell Posts: 1,716
    edited 2014-05-01 07:54
    Heater. wrote: »
    @Chris,

    I like the idea.

    Sadly, it's not sufficient. If your beautifully crafted, speed-independent driver can run fast or slow depending on the current environment, that can still cause my application to fail, as the speed is modulated at run time by some other code I have added to my project.

    It is true that such modulation can occur, but again it is your responsibility to plan for it.

    I would say this is an argument for a table-based slot assignment scheme, like has been mentioned, that would allow assigning more slots to some cogs and fewer to others.

    For some reason though it seems that those opposed to sharing are also opposed to assignment and consider them equally evil.

    Chris Wardell
  • Heater.Heater. Posts: 21,230
    edited 2014-05-01 07:54
    potatohead,
    Ever follow some of the XMOS threads where this is at issue? It's not pretty...
    Do you have a link to such a thread? Because I don't recall ever coming across one. (Mind you, I have not been "over there" for a long time now.)

    The XMOS chips do have exactly the problem we are discussing here. They can run 4 hardware scheduled threads at the fastest possible speed. As you grow to 5, 6, 7, 8 threads the speed of all of them slows down.

    That might sound like a disaster except:

    1) XMOS does not require instruction cycle counting to get things timed correctly. It has enough timers and clocked I/O features to make that unnecessary.

    2) The XMOS dev tools contain a timing analyser that will tell you at compile time if your code is not going to meet its deadlines. This takes into account any other threads that may be running. This prevents random run time failures.

    Still, there is the fact that when you add a thread past 4, you slow down the existing 4. Which can make mixing and matching code tricky.
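
    The XMOS behavior Heater describes fits in one line. An illustrative model only: the 400 MIPS total is an assumption typical of a 400 MHz XS1-generation part, not a measurement, and the scheduling rule is the commonly described one where per-thread speed is flat up to 4 threads and then divides among however many are running.

    ```python
    # Sketch of XS1-style hardware thread scheduling: the pipeline issues one
    # instruction per runnable thread per pass, so a single thread can never
    # use more than 1/4 of the issue slots, and beyond 4 threads the total
    # rate is split evenly.  Figures are illustrative assumptions.

    def per_thread_mips(total_mips, n_threads):
        """Per-thread instruction rate on a core that round-robins threads
        through a pipeline at least 4 stages deep."""
        return total_mips / max(4, n_threads)

    assert per_thread_mips(400, 1) == 100.0   # one thread can't exceed 1-in-4
    assert per_thread_mips(400, 4) == 100.0   # still full speed at 4 threads
    assert per_thread_mips(400, 5) == 80.0    # adding thread 5 slows them all
    assert per_thread_mips(400, 8) == 50.0
    ```

    The drop from 100 to 80 MIPS at the fifth thread is precisely the timing coupling between otherwise independent pieces of code that the round-robin hub is meant to rule out.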
  • Heater.Heater. Posts: 21,230
    edited 2014-05-01 08:02
    Chris,
    It is true that such modulation can occur, but again it is your responsibility to plan for it.
    Yes, push the problem onto the user. If they are too stupid to sort it out, well never mind.


    No. I just want to fetch objects and libs and use them. Life is short, time is expensive. I just want to think about my application. Not the timing needs of all the code I'm incorporating.


    Yes, all these schemes are equally evil :)
  • RamonRamon Posts: 484
    edited 2014-05-01 08:03
    Heater: the same useless discussion, the same useless reply: #3973, #3964 (the big thread)

    I do not think Todd is trying to break anything. It's all about flexibility.

    Groundhog Day, again!
  • potatoheadpotatohead Posts: 10,261
    edited 2014-05-01 08:07
    1. On P1 we have objects that require overclocking. Examples are video modes where we need to overclock to get higher resolution displays. Yet no-one has commented about this.

    Those displays ended up being possible when we parallelized the problem. One key factor in that is the limits of WAITVID on the P1. This chip will have a larger WAITVID buffer, which will bring us plenty of resolution at the clocks being discussed. A P1 with a larger WAITVID buffer would improve in the same way, sans slot sharing.

    This chip will have 16 COGS! Parallelizing the problem costs us less now! And when we do that, it always works, period.

    Also related to those displays was use of smarter data, much better aligned to how the P1 functions. An example would be C64 style color map memory layout for text compared to IBM style. Both do the very same thing; namely, assign one nibble of a byte to either the foreground or background color of a text character. IBM style means fetching a word per character, which is free. C64 style meant fetching two bytes, which isn't free.

    With sprites, similar things are true, and those parallelize easily. Done right, a P1 can sling a ton of things around the screen, and/or display high resolutions given its capability / capacity.

    It is worth noting that the P1 video system really wasn't intended for a lot of what we ended up doing with it. Display tech is considerably better than the P1 video tech is. On this design, with just the simple VGA support Chip said he's doing, that's not anywhere near as true. Composite will work in software, VGA should drive 1024 pixels with no slot sharing games, etc... Of course, we will push it, but when we do, it will always work too. If we turn to COG allocation, it won't always work, yet people will still want the high resolution displays.

    Done the P1 way, they will get them and they will work. And there are more COGS too. Less painful to use two COGS as a video solution, for example.

    Something to think about.

    Again, if I were to support anything on this chip, it's mooch. That one has the least control cost impact, and it's passive, which preserves a lot of the dynamics we are discussing here, but not all. Maybe enough.

    Anything active? We need to have a much longer discussion than makes sense in terms of putting this design into people's hands.

    And I want that discussion too. Just not on this one.

    Re: FPGA, as far as I can see this is idle chatter. Chip's working and you can always tell when he's mostly silent, but for a query or update. Last update was about clock speeds, and he expressed being tired of messing with COGS.

    If I were to guess, he's on the math block and the HUB, gluing things together right now. I think he's got the pins sorted for a first pass as that one hasn't seen discussion in a while.

    Bet it's a few weeks tops until we get an image.

    Maybe he can give us another estimate. His last one was something like 5 weeks, and that was three weeks ago, or something along those lines. One of us could go pick through the posts and nail it down more...