
RISC V ?


Comments

  • potatohead Posts: 10,253
    edited 2018-10-28 20:46
    Or, something gets made to use the features more directly.

    In the near term sense, inline PASM will get it done.
  • Heater. Posts: 21,230
    ersmith,
    ...fastspin is now comfortably ahead of PropGCC on P1, woo-hoo!
    Go fastspin!
    Does your picorv32 have hardware multiply?
    The timings reported above are with hardware multiply. And a barrel shifter.

    I thought I'd rebuild hardware and firmware without hardware multiply and report the result. But somehow just now I cannot get it to work. Which is odd because I have notes about how many LEs were used with and without hardware multiply from a few days ago, when everything worked fine. Grrr...
    ...as you know, the RISC-V instruction set is very well designed
    Actually, I do not know.

    As I'm struggling with my own RISC-V design in HDL, I wonder, for example, about the craziness in the RISC-V decoding of immediates (there's a sketch of it at the end of this post).

    The way I see it now is:

    There are things high-level language and compiler creators can do which are sometimes not well supported by the hardware instruction set. Think stacks and subroutines, or array indexing, and so on.

    There are things the hardware guys can do that the high-level language and compiler creators cannot support very well. Think VLIW, parallel processors, and so on.

    Perhaps, possibly, maybe, the likes of the RISC V instruction set hit that sweet spot where we can make best use of compiler technology and hardware resources.

    It's only taken 50 years of experimentation in hardware and software to arrive at the sweet spot in RISC-V!
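
    For a flavour of that immediate-decoding craziness, here is a minimal C sketch of how the scattered RV32I immediate bits get reassembled. The field positions are taken from the RV32I spec; the helper names are purely illustrative:

    #include <stdint.h>
    #include <stdio.h>

    /* Sign-extend the low 'bits' bits of x. */
    static int32_t sext(uint32_t x, int bits) {
        uint32_t m = 1u << (bits - 1);
        return (int32_t)((x ^ m) - m);
    }

    static int32_t imm_i(uint32_t in) {              /* ADDI, loads, JALR */
        return sext(in >> 20, 12);
    }

    static int32_t imm_s(uint32_t in) {              /* stores */
        return sext(((in >> 25) << 5) | ((in >> 7) & 0x1f), 12);
    }

    static int32_t imm_b(uint32_t in) {              /* conditional branches */
        uint32_t imm = (((in >> 31) & 1)    << 12)   /* imm[12]   = in[31]    */
                     | (((in >> 7)  & 1)    << 11)   /* imm[11]   = in[7]     */
                     | (((in >> 25) & 0x3f) << 5)    /* imm[10:5] = in[30:25] */
                     | (((in >> 8)  & 0xf)  << 1);   /* imm[4:1]  = in[11:8]  */
        return sext(imm, 13);
    }

    static int32_t imm_j(uint32_t in) {              /* JAL */
        uint32_t imm = (((in >> 31) & 1)     << 20)  /* imm[20]    = in[31]    */
                     | (((in >> 12) & 0xff)  << 12)  /* imm[19:12] = in[19:12] */
                     | (((in >> 20) & 1)     << 11)  /* imm[11]    = in[20]    */
                     | (((in >> 21) & 0x3ff) << 1);  /* imm[10:1]  = in[30:21] */
        return sext(imm, 21);
    }

    int main(void) {
        /* "addi x1, x0, 40" = 0x02800093 -> I-immediate 40 */
        printf("%ld\n", (long)imm_i(0x02800093u));
        /* "beq x5, x6, -12" = 0xFE628AE3 -> B-immediate -12 */
        printf("%ld\n", (long)imm_b(0xFE628AE3u));
        return 0;
    }

    (The spec scatters the bits like that so the register fields and the sign bit stay in fixed positions, which is kind to the hardware decoder and rather less kind to anyone writing the decode out by hand in HDL.)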
  • Heater. Posts: 21,230
    edited 2018-10-29 16:16
    potatohead,
    Or, something gets made to use the features more directly.
    That is the age-old prayer of hardware designers. Look, we have made this super fast hardware; all it needs is for you compiler guys to make it useful. It's just a matter of software.

    Turns out it's not always possible. See the history of the failures of the Intel i860 and the Itanium, for example.
    In the near term sense, inline PASM will get it done.
    Yep.

    That kind of worries me though. Not many MCU users will want to be getting their heads around the hundreds of obscure P2 instructions and related hardware.

    There had better be enough P2 nerds to wrap all that PASM stuff into libraries/objects people can use.

  • Well, here comes my curious mind saying that it has to be IMPOSSIBLE to use TAQOZ as glue between HLLs and the P2 hardware, because, after all my begging, @"Peter Jakacki" still does not see the value of having TAQOZ running as a subsystem WHILE software written in other languages runs in other COGs.

    The debugger part of the ROM provides nice entry points to use the existing code, like a BIOS call. It will be in every ROM and can be used for file access, for example.

    TAQOZ will also be in ROM, and using TAQOZ one-liners out of other languages could be the needed glue to all other languages, and would even be shareable between them.

    But for that, TAQOZ needs to coexist with other stuff in HUB. And that seems not to be planned in any way, shape, or form.

    Mike
  • Heater. Posts: 21,230
    edited 2018-10-29 00:13
    ersmith,

    OK, I got it. I managed to get it working with no hardware multiply. There seems to be some weirdness going on in my firmware Makefile when switching between hard and soft multiply that I have to sort out.

    The result:

    1024 point bit-reversal and butterfly run time = 93150us

    Four or five times slower.
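
    That is about what you would expect: without hardware multiply the compiler calls a software helper (libgcc's __mulsi3 or similar) for every multiply in the butterflies, and that helper is essentially a shift-and-add loop. A rough, illustrative sketch of what each of those calls does:

    #include <stdint.h>
    #include <stdio.h>

    /* Shift-and-add multiply, roughly the job done in software when there
       is no hardware MUL: loop over the bits of one operand, adding the
       shifted other operand for each set bit, instead of executing one
       multiply instruction. */
    static uint32_t soft_mul32(uint32_t a, uint32_t b) {
        uint32_t result = 0;
        while (b != 0) {
            if (b & 1)
                result += a;   /* add the current shifted multiplicand */
            a <<= 1;
            b >>= 1;
        }
        return result;
    }

    int main(void) {
        printf("%lu\n", (unsigned long)soft_mul32(1234, 5678));  /* prints 7006652 */
        return 0;
    }

    An FFT butterfly is multiply-heavy, so it pays for that loop on every pass, which is where most of that factor of four or five goes.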



  • jmg Posts: 15,140
    Heater. wrote: »
    potatohead,
    Or, something gets made to use the features more directly.
    That is the age old prayer of hardware designers. Look we have made this supper fast hardware, all it needs you compiler guys to make it useful. It's just a matter of software.

    Turns out it's not always possible. See the history of the failure of the Intel 860 and Itanium for example.
    In the near term sense, inline PASM will get it done.
    Yep.

    That kind of worries me though. Not many MCU users will want to be getting their heads around the hundreds of obscure P2 instructions and related hardware.

    There had better be enough P2 nerds to wrap all that PASM stuff into libraries/objects people can use.

    I’m not sure those failures are so much due to HW/SW boundary issues, but more due to the fundamentals of critical mass. The mass-market MPU business is a brutal place - you need not just the first part, you also need to follow the technology curve. The massive costs there mean not all survive.

    Fortunately, P2 is well clear of that space. The ideas are leading edge, but the tech is trailing edge, so Parallax do not need billions in sales to survive.
    The software side of P2 is already looking very good, and the smart pin cells are unique enough that new users will learn new tools.

    I’d agree good examples help immensely, but the P2 pin cell modes are not infinite, so not many examples are needed.

  • Heater. Posts: 21,230
    edited 2018-10-29 09:25
    jmg,
    I’m not sure those failures are so much due to HW/SW boundary issues, but more due to the fundamentals of critical mass.

    Warning -- wall of text. Skip to the end for the conclusion...

    Certainly Intel had a huge installed base in their x86 so the i860 was going to be a tough sell.

    The i860 was a fabulously fast device, on paper at least. It was RISC, it had a 4-stage pipeline, and it could operate on integers and floats at the same time (dual-instruction mode). It had DSP-like instructions, multiply-and-accumulate for example.

    The problem was that it was practically impossible to program it in a way that used all those features and reached peak performance:

    In order to make use of the pipelines (one for integer ops and one for floats) the programmer had to manage them. When doing an ADD, for example, the result would not appear in the destination register of that ADD instruction; rather, it would appear in the destination register specified by an instruction four lines further down your assembler source. This made things very confusing for the assembler programmer.

    In order to keep both the integer and float units busy (dual-instruction mode), one had to write two columns of instructions in assembler: integer ops on the left, float ops on the right, with the float instructions having the same pipeline delays to think about.

    All this made assembler programming very hard, as I learned during a one-day i860 training session at Intel. At the end of the day the Intel guy admitted that Intel itself did not know how to hit the peak FLOPS performance of the i860 during an FFT (another company figured it out later).

    Of course the hope was that compilers would take care of all that complex code generation, ordering things nicely for the pipes and keeping the execution units busy. That turned out not to be possible.

    This "impedance mismatch" between the instruction set architecture and the capability of the assembly language programmer or compiler meant the i860 performance sucked. The world had no need for it and continued with it's x86, the i860 died.

    Intel repeated this "impedance mismatch" mistake with the Itanium: this time a VLIW machine that compilers could not deal with efficiently.

    Of course that is all about high end general purpose computing, the requirements of tiny micro-controllers and DSP are another story.

    With thoughts of instruction set vs programmer/compiler impedance mismatch in mind, my claim is that most potential P2 users are not going to want to get familiar with its hundreds of "weird" instructions and hardware features. In order to deliver the P2's performance to them there will have to be a lot of ready-made, off-the-shelf libraries and objects that make use of, and show off, its capabilities.
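
    To make the compiler side of that impedance mismatch concrete, compare two plain C loops (nothing i860-specific here, just an illustration). In the first, the statements in the loop body are independent, so a static scheduler can overlap them and keep multiple units busy. In the second, each iteration depends on a load whose latency the compiler cannot know at build time, so statically scheduled slots simply go empty:

    /* Independent work: a compile-time scheduler can overlap these. */
    void scale3(float *a, float *b, float *c, int n) {
        for (int i = 0; i < n; i++) {
            a[i] *= 2.0f;       /* none of these three updates      */
            b[i] *= 2.0f;       /*   depends on the others, so they */
            c[i] *= 2.0f;       /*   can be issued in parallel      */
        }
    }

    struct node { struct node *next; float val; };

    /* Dependent work: each load needs the pointer fetched by the previous
       one, and whether that load hits the cache is unknowable at compile
       time, so there is nothing for a static scheduler to overlap. */
    float chase(const struct node *p) {
        float sum = 0.0f;
        while (p) {
            sum += p->val;
            p = p->next;
        }
        return sum;
    }

    Static scheduling shines on the first shape and falls apart on the second, which is a large part of why "the compiler will sort it out" never really arrived for the i860 or the Itanium.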
  • Oh, I don't know, Heater. The way SPIN is shaping up...

    CoolfeatureMethod (x, y, z) | i, j, k

    ASM

    [Spiffy P2 feature packaged for easy use]

    ENDASM

    Same with C, because HUBEXEC.

    = basically keywords, a quick demo, and ideally a commented use case.

    A whole lot will get done just that way. And the difference between that and a library, in terms of use? Not much, and often less, IMHO.

    The thing was made for that all to be the case. Soon, we shall see.

  • kwinn Posts: 8,697
    Heater. wrote: »
    potatohead,
    ......
    That kind of worries me though. Not many MCU users will want to be getting their heads around the hundreds of obscure P2 instructions and related hardware.

    There had better be enough P2 nerds to wrap all that PASM stuff into libraries/objects people can use.

    I don't think that will be a problem. I'm confident the folks on this forum will take advantage of in-line PASM and PASM running in cogs to create a lot of useful libraries, objects, and general code.

    BTW Heater I think you need to grab a snack. You must be hungry because you keep mentioning "supper" in your posts....or maybe you have a bouncy P key on your keyboard ;-)
  • Heater. Posts: 21,230
    kwinn,

    Odd that. I fixed the "supper" here. I found one other misspelled "super" by me on the forum, from 2011. Fixed now. There is another example here not by me.

    Then there is this:

    [attached image: supperbowl.jpg]

    Thanks.

    Thinking on the other comments.... got to go and get supper.
  • The way I understand it, Itanium was supposed to shift the burden for scheduling instructions from the hardware to the compiler. My take on that is, they released half a CPU, and said "Someone else will make the rest of it, just go ahead and throw it in some servers and workstations." Of course those awesome compilers never came.

    Sure, there were AMD's x64 extensions to the x86 CPUs. That didn't help. Then there was Microsoft. When x64 came out, Microsoft publicly stated that they were going to support THAT one, but under no circumstances would they support a second, incompatible 64-bit version of x86. (HUGE win for AMD, it gave them a lot of credibility.)

    But really, I think the biggest problem was that they created a CPU that didn't actually ship with a working mechanism for exploiting instruction-level parallelism. They expected someone else to do the heavy lifting and no one ever did. Maybe it wasn't possible, or maybe no one bothered, but it didn't matter.

    In fact, I think they jumped the gun on the whole project. Before the Itanium ever hit production, they should have had that compiler running efficiently rather than just assuming it would be easy. At a certain point, when the design was nailed down, they should have simulated it and gotten that compiler working. You don't need real hardware to do that. You just have to know that the compiler running on the simulated hardware is working at its peak simulated theoretical efficiency (regardless of whether that is 1,000,000,000 IPS or 10 IPS). If you can't get that to work, put the brakes on, full stop, until you can. Because if you can't, then you end up doing just what they did: shipping half a CPU.

    Think about Transmeta. Imagine if they had made their CPU, put it in a laptop, talked up how awesome the "code morphing" concept was, and then shipped it out promising that "once someone writes that code-morphing layer, it's going to be awesome!" (Yes, Transmeta wasn't really that competitive, but that was for other reasons.)
  • kwinn Posts: 8,697
    The way I understand it, Itanium was supposed to shift the burden for scheduling instructions from the hardware to the compiler. My take on that is, they released half a CPU, and said "Someone else will make the rest of it, just go ahead and throw it in some servers and workstations" Of course those awesome compilers never came. ...

    What they really needed to do was to develop the hardware and compiler in a parallel and iterative style. Then the feedback between the two groups could have morphed both the hardware and compiler into an optimal duo. IMO the fact that Chip and the contributors on this forum exchanged so many ideas is what has resulted in the P2 becoming the powerhouse chip that it is.
  • I have recently obtained a small RISC-V board from Kendryte, China, and it is pretty fast, and powerful enough even to do facial recognition.


    https://hackaday.io/project/162174-kendryte-k210-development-tutorial-for-windows-10


    Though, having these, I do not intend to abandon the ESP32 or the other microcontrollers, as I work with all of them to fit my usage.
  • Heater. Posts: 21,230
    edited 2018-11-06 14:21
    the_yonggrand,

    Wow, what?!!

    Never heard of this new kid on the block. I can't find anywhere that it mentions RISC-V, but the specs are amazing:
    Litchi Dan uses the AI chip k210 of Jianan Kanzhi Technology as the core unit, dual-core processing with independent FPU,
    64-bit CPU bit width, 8M on-chip SRAM, 400M adjustable nominal frequency, support for multiplication, division and square
    root operation. Double precision FPU. High-performance core processing features are exposed.
    
    In the AI processing, k210 can perform operations such as convolution, batch normalization, activation, and pooling. At the
    same time, the pre-processing of voice direction scanning and voice data output can also be performed.
    
    Litchi Dan has the following abilities
    
    Digital Video Interface (DVP)
    Universal Asynchronous Receiver Transmitter (UART)
    Direct Memory Access Controller (DMAC)
    Integrated circuit built-in bus (I2C)
    Serial Peripheral Interface (SPI)
    Integrated circuit built-in audio bus (I2S)
    Joint Test Working Group (JTAG)
    Field Programmable IO Array (FPIOA/IOMUX)
    
    What’s amazing is that Litchi Dan also has
    
    Neural Network Processor (KPU)
    Audio processor (APU)
    Fast Fourier Transform Accelerator (FFT Accelerator)
    Advanced Encryption Accelerator (AES Accelerator)
    Secure Hash Algorithm Accelerator (SHA256 Accelerator)
    
    And here is RISC V stamped on the chip:
    [attached photo showing RISC-V stamped on the chip]


  • Heater. wrote: »
    the_yonggrand,

    Wow, what?!!

    Never heard of this new kid on the block. [...]

    It's RISC-V - https://cnx-software.com/2018/10/22/sipeed-m1-risc-v-computer-vision-module.

    I have this in hand. I'm trying its peripherals - their toolchain is slightly difficult to use, but not more difficult than the ESP32 ones.

    I got their I2S, GPIO and the PWM working with its FreeRTOS.

    Next is to try testing its LCD module (?) and the new AI module too.

    You can check some of my sample codes there: https://github.com/uncle-yong?tab=repositories
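
    For anyone wondering what that looks like, the shape of a minimal FreeRTOS task on a board like this is roughly the sketch below. The task and delay calls are standard FreeRTOS; the two gpio_* helpers and the pin number are illustrative stand-ins, not the real Kendryte SDK names - see the repos above for the actual calls:

    #include "FreeRTOS.h"
    #include "task.h"

    /* Stand-ins for the vendor SDK's pin-mux and GPIO calls; replace with
       the real SDK functions (see the linked repositories). */
    static void gpio_setup_output(int pin) { (void)pin; }
    static void gpio_write(int pin, int level) { (void)pin; (void)level; }

    #define LED_PIN 12   /* purely illustrative pin number */

    static void blink_task(void *arg) {
        (void)arg;
        gpio_setup_output(LED_PIN);
        for (;;) {
            gpio_write(LED_PIN, 1);
            vTaskDelay(pdMS_TO_TICKS(500));   /* sleep 500 ms */
            gpio_write(LED_PIN, 0);
            vTaskDelay(pdMS_TO_TICKS(500));
        }
    }

    int main(void) {
        xTaskCreate(blink_task, "blink", 1024, NULL, 1, NULL);  /* 1024-word stack, priority 1 */
        vTaskStartScheduler();                                  /* hands control to FreeRTOS   */
        return 0;
    }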



  • jmg Posts: 15,140
    I have recently obtained a small Risc-V board from Kendryte, China, and it is pretty fast and strong enough to even do facial recognition.

    Wow, what did that module cost ? - I see it includes LCD and camera connectors.

  • jmg Posts: 15,140
    Heater. wrote: »
    [...]
    Litchi Dan has the following abilities
    [...]
    Field Programmable IO Array (FPIOA/IOMUX)

    Hmm, what's "Field Programmable IO Array " - is that like a small FPGA on the pins, like a Verilog-smart-pin ?
  • jmg Posts: 15,140
    I have this on my hand. I'm trying its peripherals - their toolchain is slightly difficult to use but not more difficult than the ESP32 ones.

    I got their I2S, GPIO and the PWM working with its FreeRTOS.

    What speeds can the UART/SPI/I2S work up to?

  • jmg wrote: »
    I have recently obtained a small Risc-V board from Kendryte, China, and it is pretty fast and strong enough to even do facial recognition.

    Wow, what did that module cost ? - I see it includes LCD and camera connectors.

    It's like $15 for the non-WiFi version. The slightly more expensive ones have WiFi and other stuff inside.

    This is available on Taobao, and I hope it's available internationally: https://m.intl.taobao.com/detail/detail.html?id=578484113485
  • jmg Posts: 15,140
    and more news on RISC-V


    "The Microsemi division of Microchip will describe a five-core complex that it will embed in its PolarFire FPGAs by early 2020. The chip marks its first step in a plan to standardize on use of RISC-V."

    https://www.eetimes.com/document.asp?doc_id=1334032

    and this
    NXP also just announced a chip (RV32M1) with a Cortex M0+, an M4F, and two RISC-V cores from the PULP project at ETH Zurich: a Zero-RISCY paired with the M0+, and a RI5CY paired with the M4F. It also has a ton of peripherals including Bluetooth LE 5.0, USB 2.0, uSDHC, RTC, ADC, CRC/AES/DES/SHA/RSA/ECC accelerator. https://github.com/open-isa-org/open-isa.org/blob/master/Reference Manual and Data Sheet/RV32M1DS_Rev.1.1.pdf

    Plus there is a board with this chip that you can order for free (as far as I can tell NXP themselves are absorbing the cost), and they were giving away a lot of them today at the RISC-V Summit http://open-isa.org/order/
  • KeithE Posts: 957
    edited 2018-12-06 15:35
    SoftCPU winners...

    1st place using SpinalHDL

    https://github.com/SpinalHDL/VexRiscv

    2nd place

    Engine-V

    https://github.com/micro-FPGA/engine-V

    3rd place

    https://github.com/PulseRain/Reindeer/blob/master/README.md

    Creativity Prize

    SERV - bit serial

    https://github.com/olofk/serv

    The tweet:

    “Congrats to the winners of the @risc_v SoftCPU contest: Charles Papon with VexRiscv in 1st place, Antti Lukats with Engine-V in 2nd place, Changyi Gu with PulseRain Reindeer in 3rd place and Olof Kindgren with SERV for the Creativity prize. Thanks to all who participated! #riscv”
  • jmg Posts: 15,140
    ... and it seems the open-source 32-bit cores get another member - MIPS

    https://www.eevblog.com/forum/microcontrollers/mips-goes-open-source/
  • Tor Posts: 2,010
    There are apparently still some patents involved, which may complicate issues for an open-source MIPS processor.
  • jmg Posts: 15,140
    and I'll bump this with another RISC-V board, which claims:
    "The new LoFive R1 features the latest SiFive FE310-G002 RISC-V MCU. This new G002 version of the FE310 adds an I2C port and a second UART over its predecessor, which makes it much easier to add sensor and peripheral interfaces. The PCB has been updated as well. It now has castellated edges in case you want to drop it on another board via SMT instead of through-hole. If the first LoFive was a little too "wild west" for you, LoFive R1 is a much friendlier introduction to the RISC-V ecosystem that seems to be growing in popularity on a daily basis"

    https://groupgets.com/campaigns/533-lofive-r1

    Price is OK, but it seems to need an expensive FTDI JTAG adaptor to program it....
    With just 16k of RAM, and code fetched from SPI flash, performance on larger programs will be modest...
  • Personally I think the P2 makes a nicer RISC-V board; lower raw performance perhaps, but very nice I/O facilities.
  • jmg Posts: 15,140
    ersmith wrote: »
    Personally I think the P2 makes a nicer RISC-V board; lower raw performance perhaps, but very nice I/O facilities.

    Yes, P2 is way ahead on peripherals.
    As for 'lower raw performance perhaps', that could be interesting to test.
    The SiFive FE310-G002 RISC-V MCU has just 16k of RAM; otherwise it has to fetch from SPI flash.
    With far more memory, could the P2, plus a JIT tool chain like the one you already have working, actually run faster?