What would be a good idea for a new CPU and platform to try on a P2? - Page 2 — Parallax Forums


Comments

  • @jmg said:
    When I write in assembler, I can use any radix I like, and the assembler converts that to 'native' hex.
    It's possible for source to have almost no HEX numbers.

    The biggest challenges in assembler are nothing to do with the number base, they are related to the core, opcodes and peripherals.

    Only a junior student who had never seen HEX, could have a slight benefit, but the vast majority will come from other MCUs and already know HEX.

    I agree. Let the assembler/compiler/interpreter handle converting user input into what the machine needs. If I needed to explicitly use hex, I used a chart. Nowadays, you can use whatever calculator your OS provides.

  • cgracey Posts: 13,795

    @jmg said:

    @cgracey said:
    I think it would be 10x easier for a new person to learn an assembly language if the architecture was completely decimal.

    I'm not following here ?
    When I write in assembler, I can use any radix I like, and the assembler converts that to 'native' hex.
    It's possible for source to have almost no HEX numbers.

    The biggest challenges in assembler are nothing to do with the number base, they are related to the core, opcodes and peripherals.

    Only a junior student who had never seen HEX, could have a slight benefit, but the vast majority will come from other MCUs and already know HEX.

    I mean that instead of hex, you'd have decimal digits down at the hardware level. Registers could be 8 digits, and DACs and ADCs could be 4 digits. Addresses could be 5 digits, however things get proportioned. The point is that everything is decimal.

  • PurpleGirl Posts: 97
    edited 2022-08-29 19:32

    Here are some ideas that come to mind:

    • While not really necessary, let 0 be your NOP instruction.

    • To make porting to another design easier, such as a real hardware build with logic chips, little-endian seems the most logical encoding: the low byte arrives first, so math can start during the next fetch.

    • For a Von Neumann machine, block instructions would be nice. That implies having at least 6 registers: the accumulator, 2 source indices, 2 destination indices, and a counter. Then you can blit a block of memory fairly efficiently with minimal source code. Depending on your microcode, an unsigned counter could move up to 255 or 256 items per instruction if it is 8 bits, or 65,535 or 65,536 if it is 16 bits.

    • For a Harvard design, a bit-banging mode or some other sort of parallel computing, so you could send x number of bytes/words out a port from SRAM while, at the same time, doing register ops, manipulating a private stack or other local memory pool, receiving input, or whatever.

    • A simple form of MMX-style math/logic should be considered. What if you could do 2 additions, subtractions, multiplications, or divisions at the same time, whether from registers, immediate operands, or memory? I don't know if this would be useful, but what if you could add two 16-bit words bit by bit (16 single-bit additions) in one op and get back 16 2-bit sums? Would that have any use? Or, if you want to bit-bang sound Gigatron-style but with a larger CPU, why not add two sets of 4 nibbles at a time and get 20 bits out (four 5-bit sums)? Or two sets of four 6-bit groups (24 bits each) and get 28 bits out? Or, with that approach, even use 7-bit samples? In that case, why not have 4-operand additions? There would be no need to work in pairs unless one wanted 8 channels bit-banged that way. Whatever the width of the 4 numbers going in, you get a single sum with up to 2 more bits. So one could add four 8-bit samples and get a 10-bit result, or, going further, have the opcode also shift it 2 places to the right to give DAC-ready results. If one wants to break standards and do adaptive mixing, that would take other math, and it would have an audible effect (which is what I don't care for in the Windows mixer, since the more sounds you play, the quieter each one becomes).

    • Auto-incrementing memory indices should be an option. That speeds up loops since you don't need extra instructions.

    • Some out-of-order or parallel processing might be nice.

    • Features for various languages might be nice. For BASIC, instructions such as random integers, bounded random integers, binary to ASCII conversion, ASCII to binary conversion, etc., would be nice.

    • If one wants to use different instruction lengths, it would be wise to put a descriptor at the beginning, such as a 2-3 bit field giving the length of the operand payload. So 0 could mean 0-1 arguments, 1 could mean 2, 2 could mean 3, and 3 could mean 4, or whatever. For the ambiguous case, if the guess is wrong, keep the extra fetched word in the instruction pipe for the next instruction and skip that fetch there.

    • There should be some external sync opcodes. If you want to sync to the horizontal or vertical sync, an external FPU, a keyboard (full or clear), or whatever, there should be instructions for that. For a minimalist machine that doesn't rely much on interrupts, this can give programmers more control over how their program reacts to external devices.
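    A rough software model of the packed and 4-operand addition ideas from the MMX bullet above, in Python as a stand-in for what a hypothetical opcode would compute in one step. The lane widths (4-bit lanes widened to 5-bit sums, four unsigned 8-bit samples summed to 10 bits and then shifted right 2) come from the post; the function names are made up for illustration, and real hardware would compute all lanes in parallel rather than in a loop.

```python
def packed_nibble_add(a: int, b: int) -> int:
    """Add two 16-bit words lane by lane as four unsigned 4-bit lanes.
    Each lane's sum (0..30) needs 5 bits, so the result is 20 bits wide
    and no lane can overflow into its neighbor."""
    result = 0
    for lane in range(4):
        x = (a >> (4 * lane)) & 0xF
        y = (b >> (4 * lane)) & 0xF
        result |= (x + y) << (5 * lane)
    return result


def mix4(s0: int, s1: int, s2: int, s3: int, dac_ready: bool = True) -> int:
    """Four-operand add of unsigned 8-bit samples.
    The raw sum (0..1020) fits in 10 bits; with dac_ready=True it is
    shifted right 2 places to give an 8-bit value for the DAC."""
    samples = (s0, s1, s2, s3)
    if any(not 0 <= s <= 0xFF for s in samples):
        raise ValueError("samples must be unsigned 8-bit values")
    total = sum(samples)
    return total >> 2 if dac_ready else total
```

    Because the shift is fixed, each voice keeps the same level no matter how many other voices are playing, which is the non-adaptive behavior the post prefers.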

    Other opcode thoughts and principles?
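    A minimal sketch of the block-move and auto-increment ideas from the list above, modeled in Python with a source index, a destination index, and a counter. The 64 KB address space, the forward-only copy, and the rule that a counter of 0 means 256 transfers are assumptions chosen to illustrate the "255 or 256" microcode choice; `block_move` is a hypothetical name, not an existing instruction.

```python
def block_move(mem: bytearray, src: int, dst: int, count: int):
    """Hypothetical block-move opcode: copy bytes from src to dst, using
    auto-incrementing 16-bit index registers and an 8-bit counter.
    Assumption: a counter value of 0 means 256 transfers."""
    src &= 0xFFFF
    dst &= 0xFFFF
    n = (count & 0xFF) or 256
    for _ in range(n):
        mem[dst] = mem[src]
        src = (src + 1) & 0xFFFF   # auto-increment source index (wraps at 64 KB)
        dst = (dst + 1) & 0xFFFF   # auto-increment destination index
    return src, dst               # indices end up just past the copied block
```

    For example, with `mem = bytearray(0x10000)` and `mem[0:4] = b"ABCD"`, calling `block_move(mem, 0x0000, 0x0100, 4)` copies the four bytes and leaves the indices at `(0x0004, 0x0104)`, ready for the next block without any extra pointer arithmetic.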

  • pik33 Posts: 1,759
    edited 2022-08-28 18:34

    A native hardware or, if not yet, FPGA or P2 emulated, fast, big, B-r-a-i-n-f*** interpreter :)
    BF is a minimal, yet Turing-complete programming language.
    There are actual people who can program in it.


    Edit: Links were broken by forum moderation, deleted

  • Yo, @VonSzarvas, fix the censoring so it stops messing up hyperlinks (or just disable it. There is no such thing as a "bad word").

  • PurpleGirl Posts: 97
    edited 2022-08-28 17:41

    @pik33 said:
    A native hardware or, if not yet, FPGA or P2 emulated, fast, big, B-r-a-i-n-f*** interpreter :)
    BF is a minimal, yet Turing complete programming language :)
    There are actual people who can program in it:

    I'd include all the most common logic instructions. AND is good for turning off bits, OR for turning them on, and XOR for things like clearing, inverting, and partial inversions. I don't know if I should add a NEG. Sure, you can XOR with the maximum value (all 1s) and then add 1, but I don't know how often NEG would be needed for working with signed numbers.

    I wonder if a table logic/math feature would be good. Thus you get closer to native speed by doing things in batch.
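    The NEG-via-XOR trick mentioned above, sketched in Python: invert all bits by XORing with all 1s, add 1, and wrap to the register width. The 8-bit default width and the helper names are assumptions for illustration.

```python
def neg(x: int, bits: int = 8) -> int:
    """Two's-complement negate using only XOR and ADD:
    invert every bit (XOR with all 1s), add 1, wrap to `bits` width."""
    mask = (1 << bits) - 1
    return ((x ^ mask) + 1) & mask


def to_signed(x: int, bits: int = 8) -> int:
    """Interpret a `bits`-wide pattern as a signed two's-complement value."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x
```

    One edge case worth noting: negating the most negative value (0x80 in 8 bits) gives 0x80 back, so a dedicated NEG opcode would probably want to flag overflow for it.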

  • I can only seem to arrive at a base 2x5 computer.

    5 4321
    0 0000 = 0
    0 0001 = 1
    0 0011 = 2
    0 0111 = 3
    0 1111 = 4
    1 0000 = 5
    1 0001 = 6
    1 0011 = 7
    1 0111 = 8
    1 1111 = 9

    (1 1111) (1 1111) = 99
    (1 1111) (1 1111) (1 1111) = 999
    (1 0001) (0 0111) (1 0111) = 638

    Add one more top bit as a negative sign bit to reach a commercially standard 16 bits, for a range of -999 to 999.

    (1) (1 1111) (1 1111) (1 1111) = -999

    Optionally, make -000 an exception code for conditions like invalid or uninitialized.

    ...I'm curious what encoding a native base-10 computer would use.
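    A sketch of the 2x5 scheme above in Python, assuming bit 5 is worth 5 and bits 4-1 are a "thermometer" count of 0-4 exactly as in the table, with the sign in bit 15 and -000 treated as reserved. The function names are hypothetical. Decoding also rejects any 5-bit pattern that is not in the table, which is one benefit of the redundant code.

```python
# 4-bit "thermometer" codes for 0..4, taken from the table above
THERMO = [0b0000, 0b0001, 0b0011, 0b0111, 0b1111]


def encode_digit(d: int) -> int:
    """Digit 0-9 -> 5-bit code: bit 4 (the "5" column) plus a thermometer count."""
    if not 0 <= d <= 9:
        raise ValueError("digit out of range")
    fives, rest = divmod(d, 5)
    return (fives << 4) | THERMO[rest]


def decode_digit(code: int) -> int:
    """5-bit code -> digit, rejecting patterns that are not in the table."""
    rest = code & 0xF
    if rest not in THERMO:
        raise ValueError(f"invalid digit code {code:05b}")
    return 5 * ((code >> 4) & 1) + THERMO.index(rest)


def encode_number(n: int) -> int:
    """Pack -999..999 into 16 bits: sign in bit 15, then three 5-bit digits."""
    if not -999 <= n <= 999:
        raise ValueError("out of range")
    word = (1 << 15) if n < 0 else 0
    for i, ch in enumerate(f"{abs(n):03d}"):      # hundreds, tens, units
        word |= encode_digit(int(ch)) << (5 * (2 - i))
    return word


def decode_number(word: int) -> int:
    """16-bit word -> signed value; -000 is reserved and raises."""
    d = [decode_digit((word >> shift) & 0x1F) for shift in (10, 5, 0)]
    value = 100 * d[0] + 10 * d[1] + d[2]
    negative = (word >> 15) & 1
    if negative and value == 0:
        raise ValueError("-000 is reserved (invalid/uninitialized)")
    return -value if negative else value
```

    With this packing, -999 comes out as 0xFFFF, matching the all-ones pattern shown above.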

  • pik33 Posts: 1,759

    Because of the "out of order" name of the programming language I mentioned earlier, links to the Wiki page and programming example were broken so I deleted them.

    If someone is interested in how to implement the Game of Life in that programming language, which has only 8 simple instructions (<, >, +, -, ., ,, [, ]) and seems to be one of the simplest, if not the simplest, Turing-complete languages (~= you can write anything in it), search for Linus Akesson's Game of Life program.

    By the way, Linus Akesson is the man who wrote the "Turbulence" demo for the P1.
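    For a feel of how small that 8-instruction language is, here is a minimal interpreter sketch in Python (unrelated to Akesson's program). Assumptions: 8-bit wrapping cells, a fixed 30,000-cell tape, `,` returning 0 at end of input, and every other character ignored as a comment.

```python
def run_bf(program: str, data: bytes = b"", tape_len: int = 30000) -> bytes:
    """Minimal interpreter for the 8 instructions < > + - . , [ ]"""
    # Pre-match brackets so '[' and ']' can jump directly.
    jumps, stack = {}, []
    for pc, op in enumerate(program):
        if op == "[":
            stack.append(pc)
        elif op == "]":
            if not stack:
                raise SyntaxError(f"unmatched ']' at {pc}")
            start = stack.pop()
            jumps[start], jumps[pc] = pc, start
    if stack:
        raise SyntaxError(f"unmatched '[' at {stack[-1]}")

    tape = bytearray(tape_len)
    ptr = pc = inp = 0
    out = bytearray()
    while pc < len(program):
        op = program[pc]
        if op == ">":
            ptr += 1
        elif op == "<":
            ptr -= 1
        elif op == "+":
            tape[ptr] = (tape[ptr] + 1) & 0xFF
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) & 0xFF
        elif op == ".":
            out.append(tape[ptr])
        elif op == ",":
            tape[ptr] = data[inp] if inp < len(data) else 0
            inp += 1
        elif op == "[" and tape[ptr] == 0:
            pc = jumps[pc]          # skip past the matching ']'
        elif op == "]" and tape[ptr] != 0:
            pc = jumps[pc]          # loop back to the matching '['
        pc += 1
    return bytes(out)
```

    For example, `run_bf("++++++++[>++++++++<-]>+.")` builds 8 x 8 + 1 = 65 in the second cell and returns `b"A"`.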

  • PurpleGirl Posts: 97
    edited 2022-09-10 12:04

    @pik33 said:
    Because of the "out of order" name of the programming language I mentioned earlier, links to the Wiki page and programming example were broken so I deleted them.

    If someone is interested in how to implement the Game of Life in that programming language, which has only 8 simple instructions (<, >, +, -, ., ,, [, ]) and seems to be one of the simplest, if not the simplest, Turing-complete languages (~= you can write anything in it), search for Linus Akesson's Game of Life program.

    By the way, Linus Akesson is the man who wrote the "Turbulence" demo for the P1.

    Well, I followed those links and edited the URL, so I skimmed the article. Interesting language.

    And it is interesting to use humor, wordplay, etc., to help remember things. And in remembering what I said about XOR earlier, I sometimes think of it as a "homophobe." Opposite unions are "valid" (1), but unions of the same are "invalid" (0). I know, maybe it is best to not voice such humor and mnemonics, just like the "resistor poem" isn't PC these days. You know "Bad boys..." I will give the cleaner version here. "Big boys race our young girls but Violet generally wins." (That's less graphic and sexist than the more typical one that mentions a form of assault and ends up saying something derogatory about Violet.)

  • @whicker said:
    I can only seem to arrive at a base 2x5 computer.

    5 4321
    0 0000 = 0
    0 0001 = 1
    0 0011 = 2
    0 0111 = 3
    0 1111 = 4
    1 0000 = 5
    1 0001 = 6
    1 0011 = 7
    1 0111 = 8
    1 1111 = 9

    (1 1111) (1 1111) = 99
    (1 1111) (1 1111) (1 1111) = 999
    (1 0001) (0 0111) (1 0111) = 638

    Add one more top bit as a negative sign bit to reach a commercially standard 16 bits, for a range of -999 to 999.

    (1) (1 1111) (1 1111) (1 1111) = -999

    Optionally, make -000 an exception code for conditions like invalid or uninitialized.

    ...I'm curious what encoding a native base-10 computer would use.

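    BCD is more compact than that 5-bit code. 1001 (9) is the greatest valid value per nibble; if a digit sum goes over that, you correct it by adding 6 (0110), which skips the six unused codes and produces the carry into the next digit. So 99 fits in a byte as 1001 1001. If a nibble greater than 1001 is presented as an operand, a NaN-style invalid-operand exception flag should be set.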
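    A minimal Python sketch of the add-6 correction for one packed-BCD byte (two digits), with an invalid-nibble check playing the role of the NaN flag suggested above. The function name and the use of an exception instead of a status flag are illustrative choices.

```python
def bcd_add_byte(a: int, b: int, carry_in: int = 0):
    """Add two packed-BCD bytes with decimal adjust. Returns (result, carry_out).
    If a nibble's binary sum exceeds 9 (1001), adding 6 (0110) skips the
    unused codes 1010-1111 and generates the decimal carry."""
    for v in (a, b):
        if (v & 0xF) > 9 or (v >> 4) > 9:
            raise ValueError(f"invalid BCD operand {v:#04x}")  # the "NaN" case
    lo = (a & 0xF) + (b & 0xF) + carry_in
    carry = 0
    if lo > 9:
        lo += 6
        carry = 1
    hi = (a >> 4) + (b >> 4) + carry
    carry_out = 0
    if hi > 9:
        hi += 6
        carry_out = 1
    return ((hi & 0xF) << 4) | (lo & 0xF), carry_out
```

    For example, `bcd_add_byte(0x99, 0x99)` returns `(0x98, 1)`, i.e. 99 + 99 = 198.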

  • pik33 Posts: 1,759

    just like the "resistor poem" isn't PC these days.

    Now I had to find this poem in its original version (found :) ). I live in Poland, so of course we have our own poems for such purposes... but I can't remember any for resistor color codes in Polish, so let's memorize the English version. This can be useful when I need a resistor :)

  • jmg Posts: 14,968

    @cgracey said:

    @jmg said:

    @cgracey said:
    I think it would be 10x easier for a new person to learn an assembly language if the architecture was completely decimal.

    I'm not following here ?
    When I write in assembler, I can use any radix I like, and the assembler converts that to 'native' hex.
    It's possible for source to have almost no HEX numbers.

    The biggest challenges in assembler are nothing to do with the number base, they are related to the core, opcodes and peripherals.

    Only a junior student who had never seen HEX, could have a slight benefit, but the vast majority will come from other MCUs and already know HEX.

    I mean that instead of hex, you'd have decimal digits down at the hardware level. Registers could be 8 digits, and DACs and ADCs could be 4 digits. Addresses could be 5 digits, however things get proportioned. The point is that everything is decimal.

    But decimal at the hardware level is wasteful and does not perform as well. So who would pay for a more expensive device that isn't second-sourced?

    You can see the effect on memory: 2^32/10^8 = 42.94967296, i.e. 32 binary bits can represent about 43 times as many values as 8 decimal digits. That's an enormous level of waste.
    Making a decimal ADC or DAC also means losing the easy-to-scale 2:1 ratio between adjacent elements, as well as needing more connections just to get the same precision.

    Then there are external upgrades. What if your internal ADC is not quite enough for a project and an external ADC is needed?
    The code now has to change radically to read from an external ADC/DAC.
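    The arithmetic behind the 2^32/10^8 figure, as a short Python check. The second quantity, about 26.6 bits of information in 8 decimal digits versus 32 bits of storage if each digit sits in a 4-bit BCD nibble, is added here for comparison and assumes BCD storage.

```python
import math


def storage_comparison(digits: int = 8, bits: int = 32):
    """Compare an 8-digit decimal register with a 32-bit binary one."""
    ratio = 2 ** bits / 10 ** digits         # how many times more values binary holds
    info_bits = digits * math.log2(10)       # information in `digits` decimal digits
    bcd_bits = 4 * digits                    # storage if each digit is a BCD nibble
    return ratio, info_bits, bcd_bits


ratio, info_bits, bcd_bits = storage_comparison()
print(f"2^32 / 10^8 = {ratio}")
print(f"{info_bits:.2f} useful bits in {bcd_bits} stored bits "
      f"({info_bits / bcd_bits:.0%} efficiency)")
```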

  • Beau Schwabe Posts: 6,497
    edited 2022-08-28 21:30

    @cgracey said:
    There's no really fast way to detect Tri-state. It takes time and testing. Better to hold all data affirmatively.

    • Perhaps not tri-state, but a window comparator at some 'standardized' voltage level. Think CAN bus as far as the physical layer goes.

    We have a multi-drop COM link with over 300 nodes that uses a CAN bus physical layer capable of communication speeds of up to 2 Mbaud (tested over a distance of 200 feet; it could be more). The COM link uses a 24V system, and the "center" or Mark position needs to be at 12V +/-2V for the window to detect a valid Mark. The COM link is ratiometric and will work down to 5V levels.
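    A sketch of the window decoding described above, scaled ratiometrically from the 24V numbers (12V +/-2V is half the supply +/- 1/12 of it). Treating voltages above and below the window as "high" and "low" is an assumption for illustration; the post only specifies the Mark window.

```python
def classify_line(v: float, supply: float = 24.0) -> str:
    """Three-level decoding with a window comparator, ratiometric to `supply`.
    At 24 V the Mark window is 12 V +/- 2 V."""
    center = supply / 2
    half_window = supply / 12        # 2 V at 24 V, about 0.42 V at 5 V
    if v > center + half_window:
        return "high"                # assumed meaning above the window
    if v < center - half_window:
        return "low"                 # assumed meaning below the window
    return "mark"
```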

    Note: It is important to choose a comparator chip with a fast slew rate. I'm not sure what the slew rate of the P2 smart pins is in comparator mode.

    Note: You should also look into QAM communication. It extends this idea of "comparator and standardized voltage levels" by combining two channels, and the resulting points are called "constellation points". For the most part, this is how cable TV works.

    Reference:
    https://en.wikipedia.org/wiki/Quadrature_amplitude_modulation

  • hinv Posts: 1,190
    edited 2022-08-29 01:44

    @cgracey said:
    I think it would be 10x easier for a new person to learn an assembly language if the architecture was completely decimal.

    So do you mean twice as fast? ;^) Or maybe you mean sixteen times easier.
    lol

  • hinv Posts: 1,190

    @PurpleGirl said:

    @pik33 said:
    Because of the "out of order" name of the programming language I mentioned earlier, links to the Wiki page and programming example were broken so I deleted them.

    If someone is interested in how to implement the Game of Life in that programming language, which has only 8 simple instructions (<, >, +, -, ., ,, [, ]) and seems to be one of the simplest, if not the simplest, Turing-complete languages (~= you can write anything in it), search for Linus Akesson's Game of Life program.

    By the way, Linus Akesson is the man who wrote the "Turbulence" demo for the P1.

    Well, I followed those links and edited the URL, so I skimmed the article. Interesting language.

    And it is interesting to use humor, wordplay, etc., to help remember things. And in remembering what I said about XOR earlier, I sometimes think of it as a "homophobe." Opposite unions are "valid" (1), but unions of the same are "invalid" (0). I know, maybe it is best to not voice such humor and mnemonics, just like the "resistor poem" isn't PC these days. You know "Bad boys..." I will give the cleaner version here. "Bad boys race our young girls but Violet generally wins." (That's less graphic and sexist than the more typical one that mentions a form of assault and ends up saying something derogatory about Violet.)

    I grew up in Wisconsin, so I was taught "Blatz beer rots our young guts but Vodka goes well"

  • @pik33 said:
    Edit: Links were broken by forum moderation, deleted

    If you'd still like to include them, feel free to pm me the links, perhaps with dashes between the letters of the word you think is problematic. Always happy to help if improvements can be made.

  • Scratch that... found it!

    Yeah, can't do much about that unfortunate name.

    Seems like anyone could find it by searching for: Turing brainf***
    No need to include links / etc.. here IMO.

  • ersmith Posts: 5,446

    @pik33 said:
    If someone is interested in how to implement the Game of Life in that programming language, which has only 8 simple instructions (<, >, +, -, ., ,, [, ]) and seems to be one of the simplest, if not the simplest, Turing-complete languages (~= you can write anything in it), search for Linus Akesson's Game of Life program.

    That language is pretty simple, but it's still pretty far from the simplest. The SKI combinator language is pretty popular and has just 4 symbols (3 functions S, K, and I, and one symbol ` to denote function application); the "I" function is actually redundant, so you can get away with 3 symbols. A propeller implementation of the SKI calculus is available at https://github.com/totalspectrum/proplazyk It comes with a compiler from LISP to SKI, and the original LazyK compiler could do the old Colossal Caves adventure (that's too big to fit on the Prop, unfortunately, although it might work on the P2).

    Doing something radically different like a processor based on SKI or lambda calculus would be very interesting -- functional programming is incredibly well suited for parallel evaluation.

    (The smallest Turing-complete language has two symbols: iota (a function) and a function-application symbol. It makes sense that 2 is the minimum number of symbols.) :)
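    To show how little machinery SKI needs, here is a small normal-order reducer in Python for the three rules I x = x, K x y = x, and S x y z = x z (y z). It is a sketch, not the proplazyk implementation; terms are nested 2-tuples for application, and any string other than "S", "K", or "I" is treated as an inert variable (an assumption of this sketch).

```python
def app(*terms):
    """Left-associative application: app(a, b, c) builds ((a b) c)."""
    t = terms[0]
    for x in terms[1:]:
        t = (t, x)
    return t


def step(t):
    """One leftmost-outermost reduction step. Returns (term, changed)."""
    # Unwind the application spine: t == head a1 a2 ... an
    head, args = t, []
    while isinstance(head, tuple):
        head, arg = head
        args.append(arg)
    args.reverse()

    if head == "I" and len(args) >= 1:            # I x -> x
        return app(args[0], *args[1:]), True
    if head == "K" and len(args) >= 2:            # K x y -> x
        return app(args[0], *args[2:]), True
    if head == "S" and len(args) >= 3:            # S x y z -> x z (y z)
        x, y, z = args[:3]
        return app((x, z), (y, z), *args[3:]), True

    # Head is not a redex: reduce inside the arguments, left to right.
    for i, a in enumerate(args):
        new, changed = step(a)
        if changed:
            args[i] = new
            return app(head, *args), True
    return t, False


def normalize(t, max_steps=10_000):
    """Reduce until no rule applies (normal form) or give up."""
    for _ in range(max_steps):
        t, changed = step(t)
        if not changed:
            return t
    raise RuntimeError("no normal form found within the step limit")
```

    For example, `normalize(app("S", "K", "K", "x"))` returns `"x"`, which is why I is redundant: S K K behaves exactly like I.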

  • PurpleGirl Posts: 97
    edited 2022-08-29 17:57

    Okay, I am going more after what already is used, but in different ways, with some of the better practices.

    For instance, since I'd be going after a Von Neumann model, block operations, array operations, and some minor-MMX capabilities would be nice. That helps get the instruction fetches out of the way and allows for tighter code.

    So I'm more after what was available in the '80s, but with my own spin on it and some slightly newer features. Since I'd be doing it on a P2, I'd have at least multiplication, division with modulus, random numbers, etc., and a few FP features might be good. Auto-incrementing is good. Loops might be nice. And if a zero page (or a similarly small page) is used, it would be nice to have what the 65CE02 had: a register that lets you move it around. I'd probably also want to include external process sync commands, so essentially, spinlocks as opcodes. (The x86 has that, such as HLT and FWAIT, so the CPU pauses to let something finish.)

    So I'm not after weird bases, Forth-only, OISC (like MyNOR), serial computing, 1-bit computing, light waves, etc. What gets me is the number of folks trying those who think they are the only ones.

    I'm not sure if I'd have BCD. It can cut into your critical path if you are not careful. If it isn't used much, it is safe for those instructions to take more cycles to complete. The discrete 100 MHz TTL/CMOS 6502 board that Drass over at 6502.org is making will not be cycle-correct for BCD math; his choice was either to slow the clock or to give up cycle accuracy for BCD. Since I can't think of a case for using BCD in bit-banging, making it take longer shouldn't be a problem. When you go that fast, you have to rethink many things, such as having separate AU and LU units. When he made the 20 MHz version, he used a carry-select adder arrangement (3 adders and a mux to do the job of 2 adders, but faster). But at 100 MHz, the nibble adders were no longer fast enough, even with that trick, so he ended up using many transparent latches to do the job, and can add 8-bit numbers in maybe 6.4 ns. At that speed, your ALU can't really take more than 2/3 of the time for the entire stage it is in.
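    The "3 adders and a mux" arrangement is usually called a carry-select adder. A minimal Python model, assuming 4-bit nibble adders as the building blocks: in hardware, both versions of the high nibble are computed at the same time as the low nibble, so the low nibble's carry only has to drive the mux instead of rippling through the upper adder.

```python
def nibble_add(a: int, b: int, cin: int):
    """4-bit adder model: returns (4-bit sum, carry out)."""
    s = a + b + cin
    return s & 0xF, s >> 4


def carry_select_add8(a: int, b: int, cin: int = 0):
    """8-bit carry-select adder built from three nibble adders and a mux."""
    lo, c4 = nibble_add(a & 0xF, b & 0xF, cin)
    hi0, c8_0 = nibble_add(a >> 4, b >> 4, 0)      # high nibble assuming carry-in 0
    hi1, c8_1 = nibble_add(a >> 4, b >> 4, 1)      # high nibble assuming carry-in 1
    hi, cout = (hi1, c8_1) if c4 else (hi0, c8_0)  # the mux, steered by c4
    return (hi << 4) | lo, cout
```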

    The only reasons I'd see for BCD would be either if you need super-accurate accounting software or if you want scores in games and don't really have the power to convert to ASCII from binary. If you have division with remainder, then you can loop and use the remainders of a /10 division to build strings from right to left (after adding to convert the individual numbers to string characters).
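    The divide-by-10 approach above, sketched in Python for unsigned values: each remainder becomes one ASCII digit (adding 0x30, the code for '0'), and the string is built right to left. The function name is illustrative.

```python
def to_decimal_ascii(n: int) -> bytes:
    """Convert an unsigned binary integer to ASCII digits using only
    division with remainder, building the string right to left."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        n, r = divmod(n, 10)
        out.insert(0, r + 0x30)   # 0x30 is ASCII '0'
        if n == 0:
            return bytes(out)
```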

  • Cluso99 Posts: 18,063

    @cgracey said:
    Imagine an assembly language in base 10. Instead of numbers of bits, you have numbers of digits. An ADC might be 3 or 4 digits. It would make it really easy for people to learn assembly language.

    Take a look at the Singer/ICL System Ten (1970-1981) and System 25 (1981-1996/2000?). Both were pure decimal computers using assembler.

    System Ten had 16 instructions in the Model 22 version. It used 6-bit ASCII, and instructions were 10 characters (bytes) long, decimal aligned. Instructions referenced two decimal addresses, and operations were A to B.
    Add could add two numbers A + B and the decimal result placed in B.
    It had 20 partitions max which are similar to cogs, operating in hardware time slices.
    Each partition had its own memory space (cog memory) and common memory (hub memory).
    Multiply could multiply A (1-10 decimal digits) by B (1-10 decimal digits), and the result was placed in B for a total length of 20 digits, so overflow was not possible. Division was basically the reverse.
    You could move 1-100 characters.
    There were 3 hardware index registers that could modify the addresses, as well as indirection.
    Branches (jumps) were based on conditions, and the link (call) instruction was like the P1 where A was the 4 character address where the return address would be written and the B address was the goto address (mod 10).

    The assembler was extremely powerful with extensive macro expansion. It came with an OS (DMF, DMF II and later DMF III).

    When I first found the P1, I thought how similar the P1 was to the System Ten in so many ways.

    BTW I wrote a System 25 emulator in 486 assembler that was validated but never sold.

  • Let me put the opening premise differently. If you wanted a 6502 to be different and design a platform around it, what would you change?

    • I'd want opcodes for operations that didn't exist in hardware but were common in old BASIC programs: hardware multiplication and division, and hardware random numbers (the C64 let you get them from the SID, though the quality wasn't that great). On x86 machines, RND was a costly software function, usually 3 looped divisions with the state kept for next time, and it could cost 1000 or more cycles, depending on the machine. If the CPU was a V20/V30, a 186 (in some machines that weren't 100% IBM compatible), a 286, or higher, you were a little better off, as all of those had hardware multipliers. If you were stuck with an 8086, or worse, an 8088, it took a long time. The 286 not only had a hardware multiplier and 2 ALUs (the main one and one for memory), but it also lacked the 8-16 bus bottleneck and the multiplexing bottleneck. The 8086/8088 used the same lines for address and data, which added an extra cycle, since you had to use registers on the board to separate out the signals, sending the address first and then the data, and maybe a ready line to let the CPU know when the transfer was complete.

    • Maybe double the bus and ALU. So keep the instruction size as it is except maybe allow 24-bit immediate operands (or 16-bits and up to 256 cog registers). So imagine having up to 128K for the "Zero Page."

    • Add moves that auto-increment the memory indexes.

    • I/O traffic and spinlock instructions. That helps with syncing to external devices. While these should be used sparingly, there are times they are needed, and programmers should have the freedom to decide when they are needed. An example would be the FWAIT instruction on x86. That was for when a real FPU was installed. So if you needed the result for a CPU op, and you weren't sure the FPU would finish in time, you'd add the FWAIT to act as a spinlock so that the CPU would halt until the FPU was finished. You don't need that all the time, just when you have a risk of race conditions between the 2 devices. For instance, if you start an FPU operation long before you need it, enough time would have elapsed to where the data is mostly guaranteed to be good. So if you do it that way, you take less time overall, since the FPU and CPU can calculate at the same time. But if you cut things close, then while the result will be faster than having the CPU do it with code, you wouldn't be as fast as you could be. Now, a compiler library will assume that FWAIT is always needed, while a seasoned assembly coder would structure the code to ensure both speed and safety.

    • For I/O, I'd use bus-snooping when possible. Of course, the problem back in the day was the scarcity and expense of memory. For instance, if the Antic chip for the Atari 800 could snoop the bus (and SRAM were used), the Sally 6502 would never have been needed as there would be no hardware races. Antic could have simply used its RAM with the copy of the display list in it, and if something started to overwrite it before it was done, it could have simply toggled the Ready line on the 6502. Just hold the offending write in a register, toggle the Ready line, then store that in its own memory when safe to do so, and then let the CPU continue. But, there was another reason they did it as they did and needed bus-mastering. That was due to using DRAM. Not only did Antic need DMA, but so did the RAM refresh circuitry.
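    On the RND point in the first bullet: a pseudo-random generator doesn't have to cost 1000+ cycles. As one cheap alternative (not what any particular BASIC used), here is a 16-bit Galois LFSR that needs only a shift and an XOR per step, plus a bounded-integer helper that uses one multiply and a shift instead of a division. This is a Python sketch; the tap mask 0xB400 gives the maximal period of 65535, but the quality is modest, since consecutive outputs are closely related, and the multiply-shift scaling is slightly biased when the bound doesn't divide 65536.

```python
def lfsr16_step(state: int) -> int:
    """One step of a 16-bit Galois LFSR with tap mask 0xB400
    (x^16 + x^14 + x^13 + x^11 + 1), which cycles through all
    65535 nonzero states. The state must be nonzero."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state


def rnd_below(state: int, bound: int):
    """Bounded random integer in [0, bound) using one multiply and a shift
    instead of a division. Returns (value, new_state)."""
    state = lfsr16_step(state)
    return (state * bound) >> 16, state
```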

  • Or, why not a P2 design that balances instruction speed and memory throughput? For instance, it could use a "near-native" core, but only for 8-16 bits in terms of instructions. And since I'd want to make using parallel SRAM possible, it should be made to require word alignment. So using up to 24-bit immediate arguments might be an option, or letting 8-bit and 24-bit be the only immediate argument sizes (to account for alignment). I'd have a number of instructions that employ things such as block commands, loops, and many standard functions, subroutines, and calls. As far as that goes, why not have the most used instruction groups as opcodes?

    Speaking of which, which instructions do folks use together most often? If one knew that, one could create opcodes that combine them. A compiler could then recognize those opcode patterns and replace each with a single opcode. This works as a form of compression for speed, making up for slow memory and giving back much of the overhead used to translate instructions.
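    A sketch of how one might find those candidates: count adjacent opcode pairs in an instruction trace, then replace the most common pair with a fused opcode. The trace and the fused mnemonic `LDADC` are made-up examples; a real study would use traces from actual programs and would likely look at longer sequences too.

```python
from collections import Counter


def top_pairs(trace, n=3):
    """Most common adjacent opcode pairs in a trace: fusion candidates."""
    return Counter(zip(trace, trace[1:])).most_common(n)


def fuse(trace, pair, fused):
    """Greedily replace each occurrence of `pair` with one fused opcode."""
    out, i = [], 0
    while i < len(trace):
        if tuple(trace[i:i + 2]) == pair:
            out.append(fused)
            i += 2
        else:
            out.append(trace[i])
            i += 1
    return out


# Made-up trace of 6502-style mnemonics, for illustration only
TRACE = ["LDA", "ADC", "STA", "LDA", "ADC", "STA", "INX", "BNE"]
best_pair, count = top_pairs(TRACE, 1)[0]
print(best_pair, count)
print(fuse(TRACE, best_pair, "LDADC"))
```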

  • Christof Eb. Posts: 695
    edited 2022-09-16 11:39

    @cgracey said:
    Imagine an assembly language in base 10. Instead of numbers of bits, you have numbers of digits. An ADC might be 3 or 4 digits. It would make it really easy for people to learn assembly language.

    I have been thinking a lot about this statement. Hex is only needed to read hex dumps, and then you can read the contents.

    What I find the most difficult part of learning assembler is that all documents are written in an extremely condensed form. You have to read and understand every single word and sign. There is no redundancy in the documents: either you get it or you are lost. This is true for the datasheets of all the different brands. But the most condensed of all is https://docs.google.com/spreadsheets/d/1_vJk-Ad569UMwgXTKTdfJkHYHpc1rZwxB-DcIiAZNdk/edit#gid=0 I think that documents explaining things with some examples are very helpful, and not only for me.

  • Mickster Posts: 2,222

    For one, I wanna see the Propeller kick butt.

    If I possessed the skills, I would be emulating the RPi Pico. Be compatible with the ever-expanding range of goodies while offering the ridiculous P2 capabilities.

    Ride their coat-tails.

    Hook up with Pimoroni and Pi Hut.

    Craig

  • @Mickster said:
    For one, I wanna see the Propeller kick butt.

    If I possessed the skills, I would be emulating the RPi Pico. Be compatible with the ever-expanding range of goodies while offering the ridiculous P2 capabilities.

    Ride their coat-tails.

    Hook up with Pimoroni and Pi Hut.

    Craig

    Hi,
    yes, it's interesting to go in a direction where there is already software.
    As far as I understand, Eric has a fast emulator for RISC-V, and using it with GCC is astonishingly as fast as natively compiled P2 code. So perhaps this could be put to use somehow?
    Christof

  • I've always thought the FV-1 from Spin Semiconductor would be interesting to emulate with a P2.
    Small instruction set. Use the onboard ADCs and DACs. Good path forward to even exceed the FV-1 (larger program size, more 'cores', more audio channels).
