RISC V ?

Heater. · 2018-04-28 08:38

That's exactly what my suggestion is.

A currently unused bit in the ADD, SUB, etc, instruction encoding can steer the destination to one of 32 thirty-two bit output ports rather than a register.

Similarly another two bits could cause reads from one of 32 thirty-two bit input ports rather than registers.

That is direct instruction operation on ports without going through memory space with loads and stores. As in the Propeller.

Setting a pin would be:

or io1, io1, r1          // Assuming r1 has a bit set for a pin

Clearing a pin is:

and io1, io1, r1        // Where r1 has the inverse pattern.

Not sure I see the need for instructions that work on single pins if we can do the above.
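
For readers who want to check the bit patterns, here is a minimal Python sketch of the two operations above. It only models a 32-bit output port as an integer; the names `set_pins` and `clear_pins` are made up for illustration and are not part of any real toolchain.

```python
MASK32 = 0xFFFFFFFF  # the ports discussed above are 32 bits wide


def set_pins(port: int, mask: int) -> int:
    """Model of `or io1, io1, r1`: every bit set in mask goes high."""
    return (port | mask) & MASK32


def clear_pins(port: int, mask: int) -> int:
    """Model of `and io1, io1, r1`, where r1 holds the inverse of mask."""
    inverse = ~mask & MASK32  # the "inverse pattern" loaded into r1
    return port & inverse


port = 0
port = set_pins(port, 1 << 5)    # pin 5 high
port = set_pins(port, 1 << 0)    # pin 0 high, pin 5 untouched
port = clear_pins(port, 1 << 5)  # pin 5 low again, pin 0 untouched
```

Because the mask can carry several set bits, the same two instructions cover single pins and groups of pins alike.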

Sadly I did not get very far with my XMOS dev boards, after some initial tinkering. Clocked I/O and such gets us into a whole other world...

ersmith · 2018-04-28 10:02

I think it's better to put the pin registers in CSRs rather than memory mapping them. There are already standard instructions to do AND and OR on CSRs directly (CSRRC and CSRRS).
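
As a rough illustration of what those two instructions do, here is a small Python model of their read-and-set / read-and-clear behaviour. The CSR address `CSR_OUTA` is a hypothetical choice for this sketch, not an assigned number, and `CsrFile` is an invented name.

```python
MASK32 = 0xFFFFFFFF
CSR_OUTA = 0x7C0  # hypothetical CSR number for an output-pin register


class CsrFile:
    """Toy model of a CSR file, just enough to show CSRRS and CSRRC."""

    def __init__(self):
        self.regs = {}

    def csrrs(self, csr: int, rs1: int) -> int:
        """CSRRS: return the old value, then set the bits given in rs1."""
        old = self.regs.get(csr, 0)
        self.regs[csr] = (old | rs1) & MASK32
        return old

    def csrrc(self, csr: int, rs1: int) -> int:
        """CSRRC: return the old value, then clear the bits given in rs1."""
        old = self.regs.get(csr, 0)
        self.regs[csr] = old & ~rs1 & MASK32
        return old


csrs = CsrFile()
csrs.csrrs(CSR_OUTA, 1 << 3)  # pin 3 high
csrs.csrrc(CSR_OUTA, 1 << 3)  # pin 3 low
```

Note that CSRRC takes the mask of bits to clear directly, so no inverted pattern has to be prepared in a register first.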

Heater. · 2018-04-28 10:17

So they do. Cool. I have hardly looked at the CSR stuff yet.

Certainly we want to get away from memory mapping them.

Ale · 2018-04-30 09:11

There is space for 4096 CSRs but some of them are already defined. The advantage of a RISCV core for the P3 would be a proven design without tiered memory with an established working ecosystem of tools. OTOH, one would need quite a bit of extensions to support what the P2 does in terms of assembly language.
Ataradov's RISCV doesn't implement CSR support. It seems easy to add, I just wonder if such a proto-P3 is the way to go now. I like the idea already, but I think we would end up with something like the Epiphany core: it needs some serious infrastructure to get it going, something the P1 doesn't and the P2 probably would not need either.

jmg · 2018-04-30 21:39

Ale wrote: »

There is space for 4096 CSRs but some of them are already defined. The advantage of a RISCV core for the P3 would be a proven design without tiered memory with an established working ecosystem of tools. OTOH, one would need quite a bit of extensions to support what the P2 does in terms of assembly language.

That depends if you want to mix RISCV and P3 cores ?
If you have some COGs there, RISCV needs only be able to support some of the WAIT opcodes.
Or, you could choose to split RISCV into a large-memory version, and smaller memory versions with lockable caches that become new cogs... That would need more opcode work.

ersmith · 2018-05-01 13:07

jmg wrote: »

Ale wrote: »

There is space for 4096 CSRs but some of them are already defined. The advantage of a RISCV core for the P3 would be a proven design without tiered memory with an established working ecosystem of tools. OTOH, one would need quite a bit of extensions to support what the P2 does in terms of assembly language.

That depends if you want to mix RISCV and P3 cores ?
If you have some COGs there, RISCV needs only be able to support some of the WAIT opcodes.

I think the idea is that the P3 COGS would themselves be RISCV cores, that is they would use the Risc-V instruction set (suitably extended, of course). That way all the Risc-V tools (assembler, linker, languages like C, C++, Java, Rust, etc.) would be available right from the start. We'd want to enhance those tools to add the P3 specific features (like pin control, waiting, and so forth) but that's a lot less work than starting a new instruction set from scratch.

The P2 instruction set has a lot of redundancy, frankly, and most of the instructions aren't really needed. They're convenient for assembler programmers, but as memories get bigger and chips get faster less and less work is done directly in assembler. It would be interesting to factor out what's really key to the "Propeller" experience and add just those instructions / registers to a Risc-V.

My take on doing a P1V type core with Risc-V ISA:

1. OUTA, DIRA, INA, and all other pin control registers would be control/status registers (CSRs)
2. Ditto for the rest of the P1 hardware registers
3. Risc-V already has the equivalent of CNT, and a register holding the COG id, both also in CSRs.
4. Risc-V already has a full set of math instructions (add, sub, shift, etc.). The ones P1 has beyond these can be synthesized from Risc-V ones, so they can be implemented as macros.
5. coginit/cogstop could be done as subroutines that manipulate some internal registers.
6. Equivalents of lockclr/locknew/lockret/lockset could be implemented with the Risc-V atomic read/modify/write instructions.
7. movd/movs/movi would be left out (no self modifying code)
8. muxc/muxz/etc. would also be left out (no C or Z bits in Risc-V); programmers would use and/or instead
9. waitcnt/waitpeq/waitpne/waitvid would become custom instructions
10. djnz/tjnz could be left out initially (Risc-V has a compare and branch that does most of these). There is a proposal apparently for DSP extensions to Risc-V that includes a zero overhead looping construct; we could consider including that extension.
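
To make item 6 a little more concrete, here is a hedged Python sketch of how lockset/lockclr could sit on top of atomic read/modify/write operations in the style of the Risc-V AMOOR and AMOAND instructions. Everything here is an assumption for illustration: the class `LockWord`, keeping all lock bits in one shared word, and the use of `threading.Lock` as a stand-in for the hardware's atomicity.

```python
import threading

MASK32 = 0xFFFFFFFF


class LockWord:
    """One shared 32-bit word holding a lock bit per lock id (assumed layout)."""

    def __init__(self):
        self._value = 0
        self._atomic = threading.Lock()  # stands in for hardware atomicity

    def amoor(self, mask: int) -> int:
        """Atomic fetch-and-OR: returns the value seen before the update."""
        with self._atomic:
            old = self._value
            self._value = (old | mask) & MASK32
            return old

    def amoand(self, mask: int) -> int:
        """Atomic fetch-and-AND: returns the value seen before the update."""
        with self._atomic:
            old = self._value
            self._value = old & mask & MASK32
            return old


def lockset(word: LockWord, lock_id: int) -> bool:
    """Set the lock; True means it was already held (as P1 reports via C)."""
    bit = 1 << lock_id
    return bool(word.amoor(bit) & bit)


def lockclr(word: LockWord, lock_id: int) -> bool:
    """Clear the lock; True means it was held before the call."""
    bit = 1 << lock_id
    return bool(word.amoand(~bit & MASK32) & bit)
```

A caller would spin on `lockset` until it returns False, do its work, then call `lockclr`.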

The overall architecture would have 8 Risc-V cores, acting very much like COGs. Each core would have a 2K (or so) instruction cache, a 2K (or so) scratchpad RAM that it has exclusive access to, and slower access to a shared (HUB) memory. As in P1 COGs, there would be no interrupts.

Have I missed anything? I don't think any of that is particularly outlandish, nor would there be much work required on the Risc-V tool side (just adding the few new custom instructions for waitxxx). Heck, we could even implement the wait instructions implicitly as writes to special CSRs, which would mean the standard Risc-V toolchain could be used for everything.

P2 has a few new features. Frankly only the interrupts and smartpins represent a real change to the core abilities, and from the core perspective the smartpins are just exposed as additional registers to manipulate.

Heater. · 2018-05-01 18:38

The original idea in my opening this thread was to drop a RISC V core into a Propeller design as a good way to get to use existing C/++ and other language compilers and tools on the Propeller. The RISC V core would run the big application C code, the COGs would do what they do for real-time, deterministic work.

This was just a playful idea with no expectation of it happening.

As the months go by, it looks like a more attractive idea to use the RISC V instruction set for all the COGs. No "master" RISC V core, all COGs equal, like the current prop.

I guess this is really never going to happen. Chip has just spent 10 years refining the P2 instruction set. I cannot imagine he would want to throw it all away.

As Eric points out the RISC V instruction set would need some extension to handle some Propeller features. That is OK. RISC V is designed to be extensible/customized.

And as Eric says, high level languages will never use most of the P2 instructions.

As a practical matter, which is faster? My picorv32 on a DE0-nano runs at about 25MIPS. There are other RISC V designs that do better, at the cost of timing determinism. What is the speed, in instructions per second, of a P2 on a DE0-nano?

Ale · 2018-05-02 09:23

I thought P2 images run at a max 80 MHz with 2 cycles per instruction meaning up to 40 MIPS.

ozpropdev · 2018-05-02 10:52

Ale wrote: »

I thought P2 images run at a max 80 MHz with 2 cycles per instruction meaning up to 40 MIPS.

That's correct.
And for a brief moment the FPGAs ran P2 @ 120MHz with 60MIPS. :cool:
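
The arithmetic behind both figures, as a tiny Python sketch (the function name `peak_mips` is just for illustration, and it assumes every instruction takes the same fixed number of clocks):

```python
def peak_mips(clock_mhz: float, cycles_per_instruction: float) -> float:
    """Peak instruction rate in MIPS for a fixed cycles-per-instruction core."""
    return clock_mhz / cycles_per_instruction


p2_at_80 = peak_mips(80, 2)    # the 80 MHz FPGA image
p2_at_120 = peak_mips(120, 2)  # the short-lived 120 MHz image
```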

ersmith · 2018-05-02 15:42

Ale wrote: »

I thought P2 images run at a max 80 MHz with 2 cycles per instruction meaning up to 40 MIPS.

Of course it's hard to compare "instructions per second". Sometimes Risc-V instructions do more (like "add A,B,C" is one instruction in Risc-V, two in P2). Sometimes P2 instructions do more (some of the P2 bit manipulation instructions could replace 2 or 3 Risc-V instructions).

The real test would be getting fftbench or something similar running on both setups.

jmg · 2018-05-02 21:47

ersmith wrote: »

The real test would be getting fftbench or something similar running on both setups.

yes, and also the FPGA-LUT usage of each core needs to be defined.
I see this topic for RISC-V "Fast Interrupts for RISC-V", which means something like the P2 ROM could be ported to RISC-V as another benchmark.

Heater. · 2018-05-02 21:56

Why would a P2 like RISC V machine need LUT?

What about the P2 MIPS when running from HUB RAM?

jmg · 2018-05-02 22:22

Heater. wrote: »

Why would a P2 like RISC V machine need LUT?

I was meaning the FPGA logic fabric usage, not the P2 LUT RAM, so edited my post a little...

Heater. · 2018-05-02 22:37

Ah, I see.

I forget exactly the LUT usage of the picorv32 I was playing with last year but it was less than 10% of a DE0-nano. I remember thinking 8 of them should fit in there.

jmg · 2018-05-02 22:38

Some interesting embedded CPU features mentioned here, in a new MIPS MPU - and what they did to move toward hard real time...

http://www.eejournal.com/article/mips-i7200-breaks-the-chain/

relevant bits
"Like its more conventional brethren, the I7200 does multithreading, which has become a MIPS hallmark. The CPU core can handle up to nine threads and switch between threads with zero overhead. Snippets of code can also be preloaded and “parked,” ready for instant deployment in the case of an interrupt handler or a high-priority task. This feature, combined with new scratchpad RAMs that bypass the cache, is designed to make the I7200 more deterministic – another important feature for exotic 5G or LTE Advanced modems."
The new MIPS I7200 marks the debut of a brand-new instruction set called nanoMIPS, and it’s – gasp! – variable-length. The new nanoMIPS ISA is about 12% smaller than ARM’s Thumb2, and a good 15% to 20% smaller than MIPS16e,

Addit: The venerable 8051 is a variable length opcode, of 8,16,24 bits, and some of the very newest 8051 designs execute all of those in 1 SysCLK. Most Microcontrollers have fast local RAM.

Cluso99 · 2018-05-03 01:29

jmg,
You beat me to it. Was just about to post the link

Heater. · 2018-05-04 00:12

I almost stopped reading the article at "The High Sparrow and Lord Protector of RISC canon, MIPS Technologies, has decided that the RISC code is more what you’d call guidelines than actual rules."

Since when did MIPS "own" the RISC idea? They just followed what Patterson and co. described at the time. As did SPARC, ARM, IBM's POWER, Intel's i860 etc, etc.

That's not to say they don't have some neat ideas there to increase timing determinism and reduce latency, like bypassing the cache and so on.

The big idea there seems to be that of a compressed instruction set. As per ARM Thumb. Which is good, saves memory and cache space and memory bandwidth.

Except they are making the compressed ISA compulsory unlike the optional ARM Thumb.

Is it better than the compressed ISA spec. of ARM or RISC V?

Anyway, it's all useless until there is support for the ISA from GCC or Clang/LLVM. It's certainly not something open for use in a future Propeller.

jmg · 2018-05-04 00:28

Heater. wrote: »

I almost stopped reading the article at "The High Sparrow and Lord Protector of RISC canon, MIPS Technologies, has decided that the RISC code is more what you’d call guidelines than actual rules."

Well, yes the prose was of a certain style...

Heater. wrote: »

That's not to say they don't have some neat ideas there to increase timing determinism and reduce latency, like bypassing the cache and so on.
The big idea there seems to be that of a compressed instruction set. As per ARM Thumb. Which is good, saves memory and cache space and memory bandwidth.
Except they are making the compressed ISA compulsory unlike the optional ARM Thumb.
Is it better than the compressed ISA spec. of ARM or RISC V?

Some numbers are quoted, if you wade thru the prose far enough.
Locking or bypass of cache has been done before, and local RAM is also seen in microcontrollers.
Variable length opcodes have been around since CISC, but it is interesting they focused on an optimal merge of features over backward compatibility, and put all of this in an MPU level part.

From the sounds of it, they have a large customer which lets you do that type of design. (result is smaller silicon)

It would be interesting to see real devices, if they ever appear on the open market.
Tools will come, as they always do..

Imagine if Microchip released a 2GHz version of this, in a Raspi form factor ?

Heater. · 2018-05-04 02:08

I cannot see this as an MPU level part. It can run Linux for goodness sake!

My prediction is we will never see them available on the open market.

Either way, it's useless, it's not an instruction set open for use. No more use than x86 or ARM. It's not something Parallax could consider adding to a future Propeller.

jmg · 2018-05-04 02:51

Heater. wrote: »

I cannot see this as an MPU level part. It can run Linux for goodness sake!

?
Err, yes, I did say MPU not MCU ?
This is what most understand by MPU, and yes, running Linux is quite common.

Heater. · 2018-05-04 09:04

jmg,

Sorry, yes, I'm getting my wires crossed. My synapses are collapsing after 5 days of hacking C++ 20 hours per day.

Mind you, the lines between micro-processor, micro-controller, system on a chip, and such are pretty blurry nowadays. What with being able to cram so much into such small chips.

It would be interesting to see a comparison of the code sizes of the new MIPS instruction set vs the compressed RISC V ISA. Which beats out even ARM Thumb.

The rest of those new MIPS features could be implemented in a RISC V core of course.

jmg · 2018-08-27 20:20

Another press release on RISC-V

https://www10.edacafe.com/nbc/articles/1/1608832/GOWIN-Semiconductor-Corp.-Announces-RISC-V-Microprocessor-Implementation-GOWIN-FPGA-Solutions-Expands-Sales-Channels-Americas-Region

No prices or MHz there, but it does include this note
" The Arora® family is also the first FPGA with embedded DRAM in the industry, allowing customers to design without using up I/O for external memory."

Google does find this
https://shop.trenz-electronic.de/en/Products/Programmable-Logic/Gowin-Arora/

http://www.gowinsemi.com/product/arora/

Looks like that GW2AR-18 has:

LUT4                          20,736
Flip-Flop (FF)                15,552
Shadow SRAM S-SRAM (bits)     41,472
Block SRAM B-SRAM (bits)      828K
Number of B-SRAM              46
Embedded DDR/SDR SDRAM        128M DDR
18 x 18 Multiplier            48
PLLs+DLLs                     4+4
I/O Bank Number               8
Max. User I/O                 317
Core voltage                  1.0V

Heater. · 2018-08-28 03:27

Wow, a whole new FPGA vendor on the block. Interesting. I was thinking about getting adventurous and looking into Microsemi FPGA who also have a RISC V soft core. Now there is another option to consider.

I'm prompted to feel adventurous by the purchase of Altera by Intel.

That Gowin Arora dev board is a bit odd. It has 80 pins coming out to headers via ten 74HC245 octal bus transceivers. Why? That means you can only define I/O pins in banks of 8.

Disturbingly in the Arora User Guide there is a table of all pins and it labels all those pins as I/O = "O". Surely that is a mistake and they are not just output only?

Anyway, you have reminded me I should get back to my RISC V core project, it's been languishing too long...

jmg · 2018-08-28 04:29

Heater. wrote: »

That Gowin Arora dev board is a bit odd. It has 80 pins coming out to headers via ten 74HC245 octal bus transceivers. Why? That means you can only define I/O pins in banks of 8.
Disturbingly in the Arora User Guide there is a table of all pins and it labels all those pins as I/O = "O". Surely that is a mistake and they are not just output only?

Even more strange, is those HC245's have DIR wired to Vcc.
I wondered if they were a lazy place keeper for a series-mosfet translator, but I cannot find those in HC245 pinouts ?

Heater. · 2018-08-28 07:04

That is very strange.

I had a scan through the user manual earlier and was wondering why it said "provides a hardware development and test platform for LED users" and " can be used to test hardware environment in LED display applications, and Ethernet data transmission."

Now I see in Table 2-2 it describes those I/O pins as:

User extension GPIO I/O
used for testing LED display
80 GPIOs
5V

They really have designed that board for a very limited set of use cases. Driving LED panels and such.

Which makes it useless as a general purpose development/evaluation board. Unbelievable. It would have been an expensive mistake to eagerly buy that board without checking first.

On the other hand they have been kind enough to publish the full schematics so we could cook our own little board. If it's possible to get hold of the chips that is ...

jmg · 2018-08-28 07:33

Heater. wrote: »

...
Which makes it useless as a general purpose development/evaluation board. Unbelievable. It would have been an expensive mistake to eagerly buy that board without checking first.

On the other hand they have been kind enough to publish the full schematics so we could cook our own little board. If it's possible to get hold of the chips that is ...

Looks like you could lift pin1, and wire to GND to flip directions in groups of 8, but that's clumsy. Shame, as there are 20 pin, bidirectional level shifters they could have used.
The rar files indicate it's designed in Mentor expedition, so not an 'open platform' by any measure.
I did find a BOM in XLS and 2 forms of netlist, and the gerbers import with just a few errors, so someone motivated could use that to clone the 4L PCB in KiCad.
I did not find any XYR placement info.

Heater. · 2018-08-28 07:55

Too much hassle. That's a fail then.

jmg · 2018-08-28 08:24

On the plus news, I notice the smaller Lattice parts are now in stock at Digikey. P1V region parts.

Part Number        Stock            Price        Series  LAB    LE     RAM bits  I/O  Package
LFE5U-12F-6BG256C  119 - Immediate  $5.05000/1   ECP5    3000   12000  589824    197  256-CABGA (14x14)
LFE5U-25F-6BG256C  104 - Immediate  $9.29000/1   ECP5    6000   24000  1032192   197  256-CABGA (14x14)
LFE5U-45F-6BG256C  0 - 9 Weeks      $15.86000/1  ECP5    11000  44000  1990656   197  256-CABGA (14x14)
LFE5U-85F-6BG381C  64 - Immediate   $31.26000/1  ECP5    21000  84000  3833856   205  381-CABGA (17x17)

And at the 3k price column that's nudging price parity with the P1, and ahead if you factor in $/memory or $/io:

Part Number        Vendor         Description            I/O    Package            Stock              Price
P8X32A-Q44         Parallax Inc.  IC MCU 32BIT 32KB ROM  32io   44LQFP             2,827 - Immediate  $7.48800/3k
LFE5U-25F-6BG256C  Lattice        IC FPGA                197io  256-CABGA (14x14)  104 - Immediate    $7.85170/3k
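
A quick Python sketch of the $/io and $/memory comparison, using only the 3k prices and pin counts listed above. The FPGA memory figure is the 1032192 RAM bits from the earlier table; the 32 KB of hub RAM for the P1 is an assumption added here, since the listing above only mentions the ROM. The helper names are invented for illustration.

```python
def dollars_per_io(price: float, io_pins: int) -> float:
    """Unit price divided by the number of user I/O pins."""
    return price / io_pins


def dollars_per_kbyte(price: float, ram_bits: int) -> float:
    """Unit price divided by RAM size in kilobytes (1 KB = 8192 bits)."""
    return price / (ram_bits / 8192)


# 3k-quantity prices from the listing above
p1_io = dollars_per_io(7.488, 32)
ecp5_io = dollars_per_io(7.8517, 197)

# Assumption: 32 KB of hub RAM on the P1 (not stated in the listing)
p1_mem = dollars_per_kbyte(7.488, 32 * 8192)
ecp5_mem = dollars_per_kbyte(7.8517, 1032192)

print(f"$/io:    P1 {p1_io:.3f}   LFE5U-25F {ecp5_io:.3f}")
print(f"$/KB:    P1 {p1_mem:.3f}   LFE5U-25F {ecp5_mem:.3f}")
```

This compares the bare chips only; the FPGA still needs configuration flash and regulators that the figures above do not include.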

LFE5U-12F-6MG285C  12K LUTS,   118io  $4.98940/168  285-CSFBGA (10x10)  Mouser: 160 in stock; 1 @ $5.85,  25 @ $5.13,  100 @ $4.75
LFE5U-25F-6MG285C  24.3K LUTS, 118io  $9.23042/168  285-CSFBGA (10x10)  Mouser: 283 in stock; 1 @ $12.00, 25 @ $10.55, 100 @ $9.75


Eval Boards
FE5UM-45F-VERSA-EVN	DEV KIT FOR ECP5 VERSA	14 - Immediate $208.77000	
LFE5UM5G-85F-EVN	ECP5 EVALUATION BOARD	19 - Immediate  $99.99000

Expandable usability – Board features the ECP5 85K LUT FPGA in the 381-ball caBGA package (LFE5UM5G-85F-8BG381)
and the ability to expand the usability of the ECP5 with Arduino, Raspberry Pi, Digilent Peripheral Module (Pmod™), Microphone Daughter Card (MDC) and general purpose I/O expansion headers.

Rapid prototyping of user functions – Features of the ECP5 Evaluation Board can assist engineers with the rapid prototyping and testing of their specific designs.

Features
ECP5-5G FPGA (LFE5UM5G-85F-8BG381)
USB-B connection for device programming and Inter-Integrated Circuit (I2C) utility and future capability to support Improved Inter-Integrated Circuit (I3C)
On-board Boot Flash – 128 Mbit Serial Peripheral Interface (SPI) Flash, with Quad read feature
8 input DIP switches, 3 push buttons and 8 LEDs for demo purposes
Multiple reference clock sources

I added the 10x10 mm package above, as that's the most appealing for a FLiP style P1V module - Mouser show stock.

Heater. · 2018-08-28 10:02

Cool. If I were to go Lattice I would want the parts that can be configured with Yosys.

jmg · 2018-09-27 20:45

another news item on Risc V

https://riscv.org/2018/09/engadget-article-huamis-new-watch-and-bracelet-are-coming/

"At the same time the new product was released, Huami also disclosed that it was developing the “Huangshan No. 1” AI wearable device chip for the first time. This 55nm process has a maximum frequency of 240MHz, equipped with 608kB SRAM, and integrates the controller of the Always On module. It is worth mentioning that “Huangshan No. 1” uses the RISC-V instruction set, and Huami claims that its computational efficiency is 38% higher than the Arm Cortex-M4 architecture. The chip has been successfully streamed and is expected to be released in the first half of 2019."
