I'm wondering if the P2 can solve my FPGA headaches.... — Parallax Forums

Hello everyone. It's been a long while since I've been around here.

Lately I've been learning some CPLD and FPGA design. I even built a custom CPLD board for the ATF1508 but after upgrading to Fedora 36, I'm having issues with running WinCUPL in Wine. And, to be honest, WinCUPL is complete garbage.

Anyway, my ultimate goal is to develop a video chip for use in a 6502 based computer. Something that either the 6502 could drive directly, or possibly send simple commands to it like the TMS9918.

I would target anywhere from 4 - 8MHz and could run the 65C02 at 3.3V to avoid any voltage issues.

I'm just not getting FPGA like I want. I'm a software guy anyway and I do have SOME PASM experience with the P1. Plus the FPGA scene seems to change daily and I just can't keep up.

Oh, before anyone mentions it, I'm not looking to emulate a 6502. I know that's popular and the P2 can easily do it but that's not what I want.

So I'm hoping you guys can offer some suggestions and answer a couple questions.

1) Can the P2 keep up with the 6502 running at or near 8MHz? For example, could the P2 be used as fast SRAM for the 6502? I remember the Propeddle could only run about 1 MHz with a 6502 using a P1. I assume the P2 is much faster.

2) Could one of the cores of the P2 (and enough pins) be used as a general purpose MMU or other glue logic for the 6502 and other devices such as sound chips, etc.?

Thanks for any information!

Comments

  • 1) Can the P2 keep up with the 6502 running at or near 8MHz? For example, could the P2 be used as fast SRAM for the 6502? I remember the Propeddle could only run about 1 MHz with a 6502 using a P1. I assume the P2 is much faster.

    Not entirely sure. I know that the P1 is most capable of keeping up with a ~7 MHz Z80 (see the couple RC2014 expansion cards that use one). Though using that as RAM is a poor idea, since it gets there by inserting wait states for the CPU. P2 idk, is of course faster (and can in some cases have instant reaction time due to smart pin logic), but I/O latency to the CPUs is a bit of a tricky story that I don't quite understand.

    A 6502 bus cycle @8MHz takes 125ns. Some of that the CPU needs internally, so I guess(tm) it's more like 90ns of response time that is required, which is about 30 P2 cycles when you overclock it to 320MHz (for which you need good thermals and voltage regulation to not crash). I'd say enough to read/write some data from a flat 64k RAM buffer (assuming the address bus is nicely aligned into the IO pins), but not much else without halting the 6502.

    2) Could one of the cores of the P2 (and enough pins) be used as a general purpose MMU or other glue logic for the 6502 and other devices such as sound chips, etc.?

    Define "general purpose MMU". HuC6280 style or more like modern page tables. With enough wait states all is possible.

    Sound chips, you say? Oh, we got plenty of those and more. Some might say too many.

    Oh, before anyone mentions it, I'm not looking to emulate a 6502. I know that's popular and the P2 can easily do it but that's not what I want.

    But why not? Unless you're making some sort of expansion card for an existing system, there's negative sense in having a discrete 6502 but implementing everything else in a P2. You could even emulate a CPU that doesn't suck ;3
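
The response-time estimate in this post can be checked with a few lines of arithmetic. This is only a sketch: the 35 ns of CPU overhead is an assumption chosen to reproduce the ~90 ns guess above, not a datasheet figure, and `p2_cycles_available` is a made-up helper name.

```python
def p2_cycles_available(bus_mhz: float, p2_mhz: float, cpu_overhead_ns: float) -> float:
    """P2 clocks that fit in the part of a 6502 bus cycle left for the P2 to respond."""
    cycle_ns = 1000.0 / bus_mhz              # one 6502 bus cycle in nanoseconds
    window_ns = cycle_ns - cpu_overhead_ns   # what remains after the CPU's own needs (assumed)
    return window_ns * p2_mhz / 1000.0       # express the window in P2 clock periods


# 8 MHz 6502, assumed 35 ns overhead, P2 overclocked to 320 MHz
print(round(p2_cycles_available(8, 320, 35), 1))   # 28.8
```

Calling it with `p2_mhz=180` or `bus_mhz=4` shows how the budget changes at other clock rates.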

  • The video chip idea will work right away.
    SRAM / ROM emulation with MMU will work, but pore over the datasheets to find the right clock edge to sample the address and data.

    Don't get caught up in overclock, nor is it really necessary.

    Start out at a slower clock than what you need, and then speed up is my advice for not going insane.
    Make it work, and then make it work well.

    Be sure to tie your clocks together between the P2 and the 6502. Independent clock drift would be a project killer.

  • @Wuerfel_21 said:

    Not entirely sure. I know that the P1 is most capable of keeping up with a ~7 MHz Z80 (see the couple RC2014 expansion cards that use one). Though using that as RAM is a poor idea, since it gets there by inserting wait states for the CPU. P2 idk, is of course faster (and can in some cases have instant reaction time due to smart pin logic), but I/O latency to the CPUs is a bit of a tricky story that I don't quite understand.

    The more I think about it, the more I think I want the P2/6502 to work asynchronously. Like the TMS9918 did for so many computers/consoles. So my dream is to have a great video chip that can just receive commands from the 6502 and then go off and do the work. Honestly, I've been studying how the Atari 800 did it as well (along with the Amiga) and they used display lists.

    Anyway, basic idea is for the 6502 to send data to P2 like it would to a 6522 VIA or some UART chip back in the day. So maybe a cog could just buffer the commands it receives (hopefully at 8 MHz) and other cogs would do the work. Maybe I could use a 6522 to send the data to the P2 if it can't keep up. That way, the 6502 can move along and do other things while the VIA proxies the data/commands.

    2) Could one of the cores of the P2 (and enough pins) be used as a general purpose MMU or other glue logic for the 6502 and other devices such as sound chips, etc.?

    Define "general purpose MMU". HuC6280 style or more like modern page tables. With enough wait states all is possible.

    That one was a bit of a stretch for me and I should have clarified my question better. Basically, I was going to use a CPLD in place of a few 74xx logic chips for address decoding. I'd like to get fairly granular and enable/disable RAM, UARTs, etc. depending on what address the 6502 is accessing. But when I think about it in more detail, the issue is that I would need a lot of address pins and several chip select pins on the P2 to make this practical, which would reduce other functions. So I may just stick to a CPLD for that.

    Sound chips, you say? Oh, we got plenty of those and more. Some might say too many.

    These audio chip emulations are what excite me about the P1 and P2. That and the video chip emulations. That's sort of the entire point for the Prop for me when it comes to retro computers. I want to design my own custom chips that drive a 6502 computer I design.

    Oh, before anyone mentions it, I'm not looking to emulate a 6502. I know that's popular and the P2 can easily do it but that's not what I want.

    But why not? Unless you're making some sort of expansion card for an existing system, there's negative sense in having a discrete 6502 but implementing everything else in a P2. You could even emulate a CPU that doesn't suck ;3

    Emulating a 6502 is not off the table completely. In fact, it might be a good starting point while I build some basic commands, etc. Thing is, I've already designed a 6502 based computer a few years ago that could use some audio/video. It's just serial now and pretty limited.

    But my biggest issue with doing everything in emulation is that I don't feel like I've actually designed anything. Being a software developer by trade, I'm trying to learn more hardware design. And running everything on one chip isn't really designing hardware. :-)

    Having said that, I got inspired enough that I bit the bullet and ordered a P2 evaluation board! So I'm excited to get started ASAP.

    Thanks for your help.
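
The command-buffering idea in this post (one cog accepts bytes from the 6502, other cogs consume them) is essentially a single-writer, single-reader ring buffer. Below is a minimal Python model of that data structure, not P2 code; the class name, the default size, and the "stall when full" policy are all assumptions made for illustration.

```python
class CommandFifo:
    """Fixed-size ring buffer: one writer (the bus cog), one reader (a worker cog)."""

    def __init__(self, size: int = 16):
        self.buf = [0] * size
        self.size = size
        self.head = 0   # next slot to write
        self.tail = 0   # next slot to read

    def push(self, byte: int) -> bool:
        """Store one byte; return False when full (the 6502 would have to wait)."""
        nxt = (self.head + 1) % self.size
        if nxt == self.tail:
            return False
        self.buf[self.head] = byte & 0xFF
        self.head = nxt
        return True

    def pop(self):
        """Return the oldest byte, or None when the buffer is empty."""
        if self.tail == self.head:
            return None
        byte = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.size
        return byte


fifo = CommandFifo(4)                         # one slot stays free, so 3 bytes fit
print([fifo.push(b) for b in (1, 2, 3, 4)])   # [True, True, True, False]
print(fifo.pop())                             # 1
```

Because only the writer touches `head` and only the reader touches `tail`, the same layout would need no lock between two cogs.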

  • @whicker said:
    The video chip idea will work right away.

    That's what I'm hoping. Just use the P2 as a powerful video chip that the 6502 sends commands to (like the TMS9918). I'm just hoping that I don't have to stall the 6502 too much if I clock it between 4-8 MHz.

    SRAM / ROM emulation with MMU will work, but pore over the datasheets to find the right clock edge to sample the address and data.

    I still may try and do something with ROM emulation on the P2. Even if I copy it to real RAM. I like the idea of not dealing with actual ROM burning which was a pain on my last computer.

    Don't get caught up in overclock, nor is it really necessary.

    Start out at a slower clock than what you need, and then speed up is my advice for not going insane.
    Make it work, and then make it work well.

    Yeah, for sure. I believe in crawling before running so I will take my time. Which is why I like the emulation idea at first just to get my code working and then move to real hardware using a 6502.

    Be sure to tie your clocks together between the P2 and the 6502. Independent clock drift would be a project killer.

    What about using the P2 as the clock for the 6502? Seems like that might have some major advantages.

    Thanks!

  • @cbmeeks said:

    @Wuerfel_21 said:

    Not entirely sure. I know that the P1 is most capable of keeping up with a ~7 MHz Z80 (see the couple RC2014 expansion cards that use one). Though using that as RAM is a poor idea, since it gets there by inserting wait states for the CPU. P2 idk, is of course faster (and can in some cases have instant reaction time due to smart pin logic), but I/O latency to the CPUs is a bit of a tricky story that I don't quite understand.

    The more I think about it, the more I think I want the P2/6502 to work asynchronously. Like the TMS9918 did for so many computers/consoles. So my dream is to have a great video chip that can just receive commands from the 6502 and then go off and do the work. Honestly, I've been studying how the Atari 800 did it as well (along with the Amiga) and they used display lists.

    Anyway, basic idea is for the 6502 to send data to P2 like it would to a 6522 VIA or some UART chip back in the day. So maybe a cog could just buffer the commands it receives (hopefully at 8 MHz) and other cogs would do the work. Maybe I could use a 6522 to send the data to the P2 if it can't keep up. That way, the 6502 can move along and do other things while the VIA proxies the data/commands.

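    With some clever code you can probably get writes to the P2 to go through without wait states. The 6502 rarely issues two write cycles back to back (stack pushes in JSR and interrupt entry do, as do read-modify-write instructions on the NMOS part), so a write to a P2 register is normally followed by at least one read cycle, which essentially doubles the response time.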

    But my biggest issue with doing everything in emulation is that I don't feel like I've actually designed anything. Being a software developer by trade, I'm trying to learn more hardware design. And running everything on one chip isn't really designing hardware. :-)

    Valid excuse then :) . Though rumor has it that designing a board that can sustain an overclocked P2 like the EVAL and EDGE boards can is actually a tricky endeavor in its own right.

    What about using the P2 as the clock for the 6502? Seems like that might have some major advantages.

    Can totally do that, doesn't really take up any resources and makes the clocks stay perfectly in sync.

  • @cbmeeks said:
    What about using the P2 as the clock for the 6502? Seems like that might have some major advantages.

    It's a good way to go. Also if the 6502 supports it (a static CMOS 65C02 should), you may be able to stretch or even stop the clock to slow things down if required for the P2 to service something complex that takes longer than normal to process...
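
If the P2 generates the 6502 clock, each 6502 cycle becomes a whole number of P2 clocks. The sketch below picks that number for a target frequency. Rounding up (so the 6502 never exceeds the target) and forcing an even count (so both clock phases are the same length) are design assumptions of this example, and `clock_divider` is a hypothetical helper, not a P2 API.

```python
def clock_divider(p2_hz: int, target_hz: int) -> tuple[int, float]:
    """Return (P2 clocks per 6502 cycle, resulting 6502 frequency in Hz)."""
    # Ceiling division: never run the 6502 faster than requested.
    divider = -(-p2_hz // target_hz)
    # Keep the count even so the high and low phases can be equal.
    if divider % 2:
        divider += 1
    return divider, p2_hz / divider


print(clock_divider(320_000_000, 8_000_000))      # (40, 8000000.0)
print(clock_divider(200_000_000, 8_000_000)[0])   # 26
```

Stretching a cycle for a slow access then amounts to holding one phase for extra P2 clocks before toggling the pin again.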

  • @cbmeeks said:
    Hello everyone. It's been a long while since I've been around here.

    Lately I've been learning some CPLD and FPGA design. I even built a custom CPLD board for the ATF1508 but after upgrading to Fedora 36, I'm having issues with running WinCUPL in Wine. And, to be honest, WinCUPL is complete garbage.

    Anyway, my ultimate goal is to develop a video chip for use in a 6502 based computer. Something that either the 6502 could drive directly, or possibly send simple commands to it like the TMS9918.

    I would target anywhere from 4 - 8MHz and could run the 65C02 at 3.3V to avoid any voltage issues.

    I'm just not getting FPGA like I want. I'm a software guy anyway and I do have SOME PASM experience with the P1. Plus the FPGA scene seems to change daily and I just can't keep up.

    Oh, before anyone mentions it, I'm not looking to emulate a 6502. I know that's popular and the P2 can easily do it but that's not what I want.

    So I'm hoping you guys can offer some suggestions and answer a couple questions.

    1) Can the P2 keep up with the 6502 running at or near 8MHz? For example, could the P2 be used as fast SRAM for the 6502? I remember the Propeddle could only run about 1 MHz with a 6502 using a P1. I assume the P2 is much faster.

    2) Could one of the cores of the P2 (and enough pins) be used as a general purpose MMU or other glue logic for the 6502 and other devices such as sound chips, etc.?

    Thanks for any information!

    Hi,
    perhaps some comments from a less gifted programmer. I tried to port a 6809 emulation written in C, together with a CoCo 3 MMU, to the P2 and then gave up. The goal was to achieve >1.7MHz emulated speed with a P2 @ 200MHz. https://forums.parallax.com/discussion/174794/towards-os9-operating-system-on-p2/p1
    The main problem for speed was that, due to the 8 cogs competing for HUB RAM access, random access is slowed down to about ??? average 1/4 ??? of the total bandwidth. So I think random HUB data access can only run at about 25MHz (P2 @ 200MHz). So if the MMU (or emulated CPU) needs additional HUB RAM accesses, it is unlikely that you can achieve 8MHz emulated speed. The MMU needed a lot of the time. The best idea I implemented was to just check whether the page was the same as for the last access and, in that case, reuse the same memory offset; otherwise do all the other lengthy checks, for example for memory-mapped I/O.

    In any case I learned that, while FlexProp is a great tool with C, I would not use the C compiler for this kind of job on the P2 again. I spent too much time trying to trick the compiler into doing things, for example inlining a function or holding a variable in cog RAM.
    Christof
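
The "same page as last access" shortcut described in this post can be sketched as a one-entry translation cache. This is an illustration in Python, not the C code from the linked thread; the class name, the 8 KiB page size, and the `slow_lookups` counter are assumptions of this sketch.

```python
class PagedMmu:
    """Address translation with a one-entry cache of the last page used."""

    def __init__(self, page_table, page_bits: int = 13):   # 13 bits = 8 KiB pages (assumed)
        self.page_table = page_table    # logical page number -> physical base address
        self.page_bits = page_bits
        self.offset_mask = (1 << page_bits) - 1
        self.last_page = None
        self.last_base = 0
        self.slow_lookups = 0           # how often the lengthy path had to run

    def translate(self, addr: int) -> int:
        page = addr >> self.page_bits
        if page != self.last_page:
            # Slow path: table lookup (memory-mapped I/O checks would go here too).
            self.slow_lookups += 1
            self.last_base = self.page_table[page]
            self.last_page = page
        # Fast path: reuse the base found on the previous access.
        return self.last_base + (addr & self.offset_mask)


mmu = PagedMmu([0x10000 + i * 0x2000 for i in range(8)])
print(hex(mmu.translate(0x0005)), hex(mmu.translate(0x0006)), mmu.slow_lookups)
# 0x10005 0x10006 1
```

Sequential code and data accesses tend to stay within one page, which is why the fast path pays off.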

  • pik33 (edited 2022-10-27 08:04)

    We already have a good working 6502 emulator which I use for a SID player: https://forums.parallax.com/discussion/174487/compukit-uk101-6502-emulator
    (so there is of course SID emulation available): https://forums.parallax.com/discussion/169566/sid-s-adventure-in-p2-land
    I also wrote several video drivers which use display lists. https://forums.parallax.com/discussion/173228/a-display-list-is-fun-0-50a-beta-0-07g-psram-command-list-used/
    And Amiga inspired audio driver which I use for playing modules. https://forums.parallax.com/discussion/174214/paula-amiga-inspired-audio-driver-0-93-non-integer-skip-fine-tuning-enabled

    There can be a problem with HUB RAM read latency, which can eat up to 17 clocks, and that is >50 ns even at 320 MHz. About 30 pins are needed to interface the 6502... the emulator seems to be a much simpler solution.

  • pik33 (edited 2022-10-27 08:05)

    The main problem for speed was that, due to the 8 cogs competing for HUB RAM access, random access is slowed down to about ??? average 1/4 ??? of the total bandwidth.

    There is no such thing as cogs competing for a HUB RAM access in a P2. The egg beater does its job so every cog has access to its own slice in every clock and it can use the access or not use it and that is transparent for the rest of cogs.

    What can compete for HUB RAM access is the cog itself with its FIFO + streamer + setq/read or write bulk.

  • @pik33 said:

    The main problem for speed was that, due to the 8 cogs competing for HUB RAM access, random access is slowed down to about ??? average 1/4 ??? of the total bandwidth.

    There is no such thing as cogs competing for a HUB RAM access in a P2. The egg beater does its job so every cog has access to its own slice in every clock and it can use the access or not use it and that is transparent for the rest of cogs.

    What can compete for HUB RAM access is the cog itself with its FIFO + streamer + setq/read or write bulk.

    Sorry for my bad English. What I meant to say is that a P2 cog only has instant random access to 1/8 of the RAM at any given moment, while a single-core processor has immediate access to all of its RAM. Some other multi-core processors have caches per core and/or RAM that can be configured to reduce the problem.

    The emulated 6502 achieves 1MHz speed without MMU?
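
Both views in this exchange fit a simple rotating-slice model: cogs never block each other, but a cog may still wait for the slice it wants to come around. The sketch below is a deliberately simplified model under stated assumptions (8 slices, consecutive longs in consecutive slices, a rotation direction picked arbitrarily), and it ignores the fixed overhead of the hub instructions themselves.

```python
SLICES = 8   # assumed: hub RAM split into 8 slices, one per cog slot


def wait_clocks(cog: int, long_index: int, clock: int) -> int:
    """Clocks a cog waits until the slice holding `long_index` rotates to it."""
    slice_wanted = long_index % SLICES       # consecutive longs sit in consecutive slices
    slice_now = (cog + clock) % SLICES       # slice this cog can reach on this clock
    return (slice_wanted - slice_now) % SLICES


waits = [wait_clocks(cog=0, long_index=i, clock=0) for i in range(SLICES)]
print(waits, sum(waits) / SLICES)   # [0, 1, 2, 3, 4, 5, 6, 7] 3.5
```

In this model the wait depends only on the requesting cog's own address and timing, never on what the other cogs are doing.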

  • pik33 (edited 2022-10-27 13:44)

    The emulated 6502 achieves 1MHz speed without MMU?

    It is faster than 1 MHz. The original Compukit emulator code uses throttling to run at 1 MHz. I have to test how fast it can be without the throttling loop.


    Edit: a short test

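            ldy #255        ; 2 cycles: loop counter
    p1:     lda $4100       ; 4 cycles: absolute load
            sta $4101       ; 4 cycles: absolute store
            dey             ; 2 cycles
            bne p1          ; 3 cycles when taken, 2 on the final pass
            lda #$12        ; 2 cycles
            sta $4102       ; 4 cycles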
    

    which is 3322 6502 clocks, took (less than) 79291 P2 clocks (with the overhead for measuring time and starting the procedure in the emulated 6502 added). This gives about 24 P2 clocks for one 6502 clock and ~14 emulated MHz with a P2 at 336 MHz.

    This was an oversimplified and not very accurate test, but it is a rough approximation that gives the order of magnitude of this emulated 6502's speed. It looks too good; I have to check it again and use something more complex to test.
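
The figures in this post can be reproduced from the standard 6502 cycle counts. This is a sketch under one assumption: the branch does not cross a page boundary, so a taken `bne` costs 3 cycles. `loop_cycles` is a helper name invented for this example.

```python
# Standard 6502 cycle counts for the instructions in the test loop
LDY_IMM, LDA_ABS, STA_ABS, DEY, LDA_IMM = 2, 4, 4, 2, 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2   # assumes no page crossing on the branch


def loop_cycles(iterations: int = 255) -> int:
    """Total 6502 cycles for the whole test routine."""
    body = LDA_ABS + STA_ABS + DEY
    loop = iterations * body + (iterations - 1) * BNE_TAKEN + BNE_NOT_TAKEN
    return LDY_IMM + loop + LDA_IMM + STA_ABS


cycles = loop_cycles()
p2_per_6502 = 79291 / cycles        # measured P2 clocks per emulated 6502 clock
emulated_mhz = 336 / p2_per_6502    # P2 running at 336 MHz
print(cycles, round(p2_per_6502, 1), round(emulated_mhz, 1))   # 3322 23.9 14.1
```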

  • Thanks everyone for your input. I am 99% sure I am going to follow the design below:

    1) Use an emulated 6502 for testing and basic code development

    2) Use a real 65C02 when I feel things are stable (including video)

    3) Use actual SRAM for the physical 65C02. This reduces pins needed for interfacing to the 65C02.

    4) Pass commands and data to the P2 like the TMS9918 did.

    5) Clock the real 65C02 with the P2. Since it is static, this will allow me to stop it, slow it down, stretch, etc. for when the P2 can't keep up. Maybe not ideal, but this is a hobby computer anyway.

    6) Still working on an MMU solution. I may use a modern PLD chip (my original intent) because eventually, I would like to bank in additional RAM for up to 128 KiB or so.

    Thanks!
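
For step 6, one possible banking scheme is a single switchable window in the 6502's 64 KiB address space. The sketch below models that mapping in Python. The window location ($8000-$BFFF), the 16 KiB bank size, and the 8-bank layout (8 x 16 KiB = 128 KiB) are all hypothetical choices for illustration, not part of the plan above.

```python
WINDOW_START = 0x8000   # hypothetical: banked window at $8000-$BFFF
WINDOW_SIZE = 0x4000    # 16 KiB per bank
NUM_BANKS = 8           # 8 x 16 KiB = 128 KiB of physical RAM


def physical_address(cpu_addr: int, bank: int) -> int:
    """Map a 16-bit CPU address to a 17-bit physical RAM address."""
    if not 0 <= cpu_addr <= 0xFFFF:
        raise ValueError("cpu_addr must fit in 16 bits")
    if not 0 <= bank < NUM_BANKS:
        raise ValueError("bank out of range")
    if WINDOW_START <= cpu_addr < WINDOW_START + WINDOW_SIZE:
        # Inside the window: the bank register supplies the upper address bits.
        return bank * WINDOW_SIZE + (cpu_addr - WINDOW_START)
    # Outside the window: fixed memory, passed through unchanged.
    return cpu_addr


print(hex(physical_address(0x8010, 7)))   # 0x1c010
print(hex(physical_address(0x1234, 7)))   # 0x1234
```

With bank 2 selected the window maps onto itself, so the machine starts out looking like a flat 64 KiB system.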

  • @pik33 said:

    The emulated 6502 achieves 1MHz speed without MMU?

    It is faster than 1 MHz. The original Compukit emulator code uses throttling to run at 1 MHz. I have to test how fast it can be without the throttling loop.


    Edit: a short test

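            ldy #255        ; 2 cycles: loop counter
    p1:     lda $4100       ; 4 cycles: absolute load
            sta $4101       ; 4 cycles: absolute store
            dey             ; 2 cycles
            bne p1          ; 3 cycles when taken, 2 on the final pass
            lda #$12        ; 2 cycles
            sta $4102       ; 4 cycles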
    

    which is 3322 6502 clocks, took (less than) 79291 P2 clocks (with the overhead for measuring time and starting the procedure in the emulated 6502 added). This gives about 24 P2 clocks for one 6502 clock and ~14 emulated MHz with a P2 at 336 MHz.

    This was an oversimplified and not very accurate test, but it is a rough approximation that gives the order of magnitude of this emulated 6502's speed. It looks too good; I have to check it again and use something more complex to test.

    14MHz is nice! So about 7.5MHz without overclock and without MMU. Impressive!

  • Yeah, an emulated 6502 running at 7 MHz would also be very nice!
