Is it time to re-examine the P2 requirements ???

jmg · 2015-03-05 12:10

potatohead wrote: »

From the RISC V FAQ
We plan to define more optional instruction set extensions for RISC-V beyond the ones we already have, including Packed-SIMD Instructions (P), Bit Manipulation (B), Decimal Floating-Point (L), and Transactional Memory (T). One goal for the RISC-V foundation is to manage development of these future standard instruction-set extensions.

The currently defined extensions to the base Integer (I) ISA are Multiply-Divide (M), Atomic (A), Floating-point in multiple precisions (F, D, and Q), and Compressed Instructions (C).

Sounds great, so we come back in 2016, maybe 2017, and revisit this, when RISC-V is more complete, and we can also see what size it eventually compiles to. Maybe in time for a P4 ?

David Betz · 2015-03-05 12:13

jmg wrote: »

Sounds great, so we come back in 2016, maybe 2017, and revisit this, when RISC-V is more complete, and we can also see what size it eventually compiles to. Maybe in time for a P4 ?

By sometime in 2016 we should be able to play with both the P2 and the LowRISC chip. It will be interesting to compare them. :-)

jmg · 2015-03-05 12:26

David Betz wrote: »

By sometime in 2016 we should be able to play with both the P2 and the LowRISC chip. It will be interesting to compare them. :-)

True enough, and likely another generation of RaspPi to use as a small host as well...

evanh · 2015-03-05 12:29

Heater. wrote: »

Heck, even professional software teams will call in a specialist to deal with such things if they need it.

Back in the day ... I wanted to be able to throw copious amounts of floats at an ISR (in DOS) ... I certainly didn't expect to find anyone skilled at it nor anyone to recommend it. It was an industrial control app and my first time ever programming a PC.

This was all in the name of ease-of-coding. It was a 486 with a FPU and because I wasn't trying to win any awards and I wasn't about to put lots of effort into sorting out optimisations, 32-bit mode switching without ISR jitter and the likes, I decided that just using the FPU inside the ISR was the best answer.

I found a few snippets of assembly for saving and restoring the FPU state. After working out how to inline assembly code, and a bit of testing, it went smooth as. I was surprised it wasn't a common topic given how easy it ended up being.

rod1963 · 2015-03-05 12:32

What is there to compare between a cog-based Prop and a LowRISC chip? One is basically a memory-constrained microengine meant to do I/O grunt work and the other is for general processing.

Heater. · 2015-03-05 12:35

jmg,

Nope, the results above serve to prove that it is very poor at handling calls,...

No. Are you deliberately ignoring the examples I posted, including the call-heavy recursive fibo() that ends up being smaller than PASM?

Looks like nice code, but they have deftly sidestepped a true recursive call, and instead used a loop with an exit.

If you look at the generated code for riscv and propeller they look remarkably similar. So who is sidestepping what here?

OK. Let's compile fibo() without optimization for both machines. What do we get:

RISCV : 28 thirty two bit instructions. Including two recursive calls to fibo()

Propeller : 36 thirty two bit instructions. Including two recursive calls to fibo()

Clearly the RISC V call overhead is very good. After all, this program does pretty much nothing except get called, call itself twice, and return.

See attachments.
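
For readers without the attachments, here is a minimal sketch of the kind of function being compiled. It assumes the usual textbook definition of fibo(); the actual attached source may differ in types and details.

```c
/* Hypothetical reconstruction of the benchmark function;
 * the attached source may differ. */
unsigned int fibo(unsigned int n)
{
    if (n < 2) {
        return n;                       /* base cases: fibo(0) = 0, fibo(1) = 1 */
    }
    return fibo(n - 1) + fibo(n - 2);   /* the two recursive calls */
}
```

Almost all of the work here is call, return and stack traffic, which is why the compiled size says something about call overhead.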

...and still a work-in-progress on interrupts,

Perhaps a work in progress. For good reason. The Prop has no interrupts at all, are you saying the Prop "is not looking great for deeply embedded minion work"?

Err, the RISC-V docs actually say this (as well as all the other 'extensions' they mention )

Yes, that is true. My reason for saying 'No room to add anything "common sense"' was that sure you can use extensions to do what you like but you are only making a more complex device out of a simple one. And it would no doubt be totally incompatible with anything else. Sure someone can do it if they want to build the language tools to support it. I don't believe it's viable to try and turn a RISCV into something as simple as PASM using extensions.

...and needs still more wrappers for Port io...

No, why? Port I/O can be memory mapped on the RISCV as it is on the Propeller and many other devices without any impact on the instruction set or the need for wrappers.

Heck, even things like WAITPE can be done by a normal read from memory and having the hardware stall execution until the input condition is met.
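
A rough sketch of what that looks like from C. The register pointers are passed in as parameters because no real addresses are defined anywhere in this discussion; the function names are made up for illustration, and the stalling behaviour is an assumption about hypothetical hardware, not something any existing RISC V chip is known to do.

```c
#include <stdint.h>

/* Write a value to a memory-mapped output register.
 * Compiles to an ordinary store; no special I/O instruction is needed. */
static inline void port_write(volatile uint32_t *out_reg, uint32_t value)
{
    *out_reg = value;
}

/* Read a memory-mapped "wait" register (hypothetical).
 * On the hardware imagined above, this load would stall until the
 * input condition is met. On an ordinary host it simply returns the
 * current contents of the location. */
static inline uint32_t port_wait_read(volatile const uint32_t *wait_reg)
{
    return *wait_reg;
}
```

The `volatile` qualifier is what stops the compiler from caching or dropping the accesses, so each call really does touch the port.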

...and Boolean work...

That might be the subject of my next tests....

Wait up a minute, Chrome is having another spasm where it won't let me choose any files to attach....

Edit: One Chrome restart later and here are the attachments.

David Betz · 2015-03-05 12:39

rod1963 wrote: »

What is there to compare between a cog-based Prop and a LowRISC chip? One is basically a memory-constrained microengine meant to do I/O grunt work and the other is for general processing.

No, the LowRISC chip will also have minion processors to do the I/O grunt work as well as a hefty dual core CPU to run program logic.

Heater. · 2015-03-05 13:00

rod1963,

One is basically a memory-constrained microengine meant to do I/O grunt work and the other is for general processing.

I always thought so too.

Recent experiments suggest that RISCV might even be better at the memory-constrained micro-engine thing than PASM. Lower memory footprint. Low call overhead.

I can't help but investigate this comparison further...

Heater. · 2015-03-05 13:03

@David,

Have you read the documentation on GCC inline assembler?

Indeed I have. It starts like this:

asm("movl %ecx, %eax");

And immediately progresses to this:

__asm__ ("movl %eax, %ebx\n\t"
          "movl $56, %esi\n\t"
          "movl %ecx, $label(%edx,%ebx,$4)\n\t"
          "movb %ah, (%ebx)");

I rest my case.

@jmg

..so we come back in 2016, maybe 2017, and revisit this, when RISC-V is more complete, and we can also see what size it eventually compiles to. Maybe in time for a P4 ?

I do hope I'm very wrong, but there is a good chance that a RISCV machine arrives before a P2, never mind a P4.

True enough, and likely another generation of RaspPi to use as a small host as well...

Ha, ha, I have a sneaking suspicion that the LowRISC RISCV machine may actually be a future Raspberry Pi. Especially since two of the guys working on LowRISC are intimately connected with the Raspberry Pi Foundation. And one of them has described LowRISC as "A Raspberry Pi for grown ups". Remember you read that here first.

@evanh

Back in the day ..

Oh yeah!, back in the day I figured out how to get a 386 running good old 16 bit DOS to use 32 bit integer operations at will. You just have to know where to put the 32 bit mode prefix bytes in front of your arithmetic op codes and such. Then make a little assembler language macro preprocessor to insert those automatically for you. Happy hacking days.
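
For anyone curious, the trick can be sketched as below. It assumes the prefix in question is the x86 operand-size override byte 0x66; the helper name is hypothetical and this is not the original macro preprocessor, just an illustration of what it had to emit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPERAND_SIZE_PREFIX 0x66u   /* x86 operand-size override prefix */

/* Copy a 16-bit-mode instruction encoding into out[], preceded by the
 * operand-size prefix. Returns the number of bytes written, or 0 if
 * out[] is too small. Hypothetical helper, for illustration only. */
size_t emit_with_32bit_operands(uint8_t *out, size_t out_len,
                                const uint8_t *insn, size_t insn_len)
{
    if (out_len < insn_len + 1) {
        return 0;
    }
    out[0] = OPERAND_SIZE_PREFIX;
    memcpy(out + 1, insn, insn_len);
    return insn_len + 1;
}
```

For example, the bytes 01 D8 decode as `add ax, bx` in 16-bit code; with the prefix in front, 66 01 D8 decodes as `add eax, ebx`.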

evanh · 2015-03-05 13:04

Heater. wrote: »

Recent experiments suggest that RISCV might even be better at the memory-constrained micro-engine thing than PASM. Lower memory footprint. Low call overhead.

LMM is at a distinct disadvantage. HubExec should solve that.

evanh · 2015-03-05 13:06

Heater. wrote: »

Oh yeah!, back in the day I figured out how to get a 386 running good old 16 bit DOS to use 32 bit integer operations at will. You just have to know where to put the 32 bit mode prefix bytes in front of your arithmetic op codes and such. Then make a little assembler language macro preprocessor to insert those automatically for you. Happy hacking days.

So, then, you are saying you are the specialist that the pros all call?

David Betz · 2015-03-05 13:06

Heater. wrote: »
@David,

Indeed I have. It starts like this:
asm("movl %ecx, %eax");
And immediately progresses to this:
__asm__ ("movl %eax, %ebx\n\t"
          "movl $56, %esi\n\t"
          "movl %ecx, $label(%edx,%ebx,$4)\n\t"
          "movb %ah, (%ebx)");
I rest my case.

Those are minor syntactic issues. Yes, they are ugly but they don't stand in the way of getting things done. There are no difficult concepts here, just instructions in double quotes with line terminators. Not so difficult even if it isn't very attractive.
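
To illustrate the "instructions in double quotes" point in plain C, without needing an x86 target: adjacent string literals are concatenated by the compiler, so the multi-line form is a single string handed to the assembler. The names below are made up for this sketch.

```c
/* Adjacent string literals are joined by the compiler, so this is one
 * string holding two lines of assembler text, separated by "\n\t". */
static const char asm_text[] =
    "movl %eax, %ebx\n\t"
    "movl $56, %esi";

/* Count how many instruction lines the string holds. */
static int count_lines(const char *text)
{
    int lines = (*text != '\0');    /* a non-empty string has at least one line */
    for (; *text != '\0'; ++text) {
        if (*text == '\n') {
            ++lines;
        }
    }
    return lines;
}
```

The "\n\t" at the end of each literal only keeps the text tidy for the assembler; the quoting adds nothing conceptually new.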

jmg · 2015-03-05 13:08

David Betz wrote: »

No, the LowRISC chip will also have minion processors to do the I/O grunt work as well as a hefty dual core CPU to run program logic.

Will be interesting to see how that plays out, how much memory they allocate to each Minion, and what 'extended' extras they add. The RISC V does need a good chunk of fast local memory to work best.

jmg · 2015-03-05 13:18

Heater. wrote: »

Ha, ha, I have a sneaking suspicion that the LowRISC RISCV machine may actually be a future Raspberry Pi. Especially since two of the guys working on LowRISC are intimately connected with the Raspberry Pi Foundation. And one of them has described LowRISC as "A Raspberry Pi for grown ups". Remember you read that here first.

Now that would be very interesting to see !! - but don't underestimate how long these things take.
Microchip has a massive errata list on PIC32, with a fix pass expected sometime in 2015, and they are experts at this stuff.

A challenge for Raspberry Pi will not so much be in doing a LowRISC, but in slotting it into an already established lineup. Their pockets are not bottomless, and the MASK/NRE costs for the leading edge processes are eye-watering.
That means a LowRISC will be some process nodes behind the Broadcom alternative.
- Unless Broadcom fund this, of course.... but even they are unlikely to jump straight to a leading edge node ?

Heater. · 2015-03-05 13:28

@evanh,

LMM is at a distinct disadvantage. HubExec should solve that.

My demonstrations use native in-COG code, not LMM. It's a level playing field in that respect.

...are saying you are the specialist that the pros all call?

Strangely enough, there was a time when that was true.

@David,

Those are minor syntactic issues.

I would call them major issues. As I said you have been doing this so long and are so familiar with it that you cannot see the problem any more.

Intel assembler is complex, irregular and ugly enough already. Then it's wrapped up in "minor syntactic issues".

Don't expect more than a hard core few to be attracted to going anywhere near that.

PASM, embedded in DAT blocks is orders of magnitude easier to deal with.

@jmg

The RISC V does need a good chunk of fast local memory to work best.

How can you continue to draw that conclusion in light of the evidence presented? Which shows RISCV requiring less memory than even PASM?

What am I missing here?

David Betz · 2015-03-05 13:36

Heater. wrote: »

@David,

I would call them major issues. As I said you have been doing this so long and are so familiar with it that you cannot see the problem any more.

Intel assembler is complex, irregular and ugly enough already. Then it's wrapped up in "minor syntactic issues".

Don't expect more than a hard core few to be attracted to going anywhere near that.

PASM, embedded in DAT blocks is orders of magnitude easier to deal with.

Okay, I give in. PASM is perfect. The P1 architecture is perfect. P2 will be even more perfect if that is possible. I still don't believe that a few quote characters make something unusable. Even the complex addressing modes don't make assembler harder compared to explaining self-modifying code to a beginner. I maintain that you are so familiar with PASM that you don't see the problem anymore. :-)

Heater. · 2015-03-05 13:37

jmg,

I'm sure you are right. The LowRISC is still a "high risk" in that respect.

On the other hand, Microchip may well be experts at this stuff, but so are the guys working on LowRISC. And then look at how Andreas Olofsson put together the multi-core floating point accelerator, the Epiphany chip, with only a very small team. It's amazing what people can do.

I don't recall who is funding LowRISC; I'm sure it's not Broadcom. But given the connections going on in Cambridge, it would not surprise me if Broadcom adopted the LowRISC and churned it out themselves. And there it is, a new Raspberry Pi.

rod1963 · 2015-03-05 13:44

The only way to resolve the RISCV arguments is to field test one of them with real world applications, which isn't possible yet. That's the problem with bleeding edge, it ain't there.

The best we can do right now is simply compare a Cog against other embedded controller ISAs, such as the PIC32MX (not the MZ: that thing is a smelly mess and its software suite isn't ready for prime time), like that Micromite another poster mentioned, or a Freescale S12, or an LPC824 with an M0+ core.

Yeah they aren't sexy and cool like the RISCV but hey they are out and accessible.

Dave Hein · 2015-03-05 13:44

David Betz wrote: »

I still don't believe that a few quote characters make something unusable.

I agree. "I %s %s anything %s with %s syntax %s", "don't", "see", "wrong", "the", "either"

Heater. · 2015-03-05 13:49

David,

Ha, ha, touché.

Heater. · 2015-03-05 13:53

rod1963,

The only way to resolve the RISCV arguments is to field test one of them with real world applications, which isn't possible yet. That's the problem with bleeding edge, it ain't there.

You are quite right. It's all speculation.

And the P2 has this problem that it is not here yet either. That is why we are all here speculating:)

David Betz · 2015-03-05 14:07

Heater. wrote: »

David,

Ha, ha, touché.

jmg · 2015-03-05 14:19

Heater. wrote: »

"The RISC V does need a good chunk of fast local memory to work best."

How can you continue to draw that conclusion in light of the evidence presented? Which shows RISCV requiring less memory than even PASM?

That comment relates to their fundamental opcode design; it has nothing to do with any "less memory" claims.
It is a result of their 12b operand field, and I consider that a good feature. (as I've said before)
That fast local memory is essential for the SW stack operations to work well, and it also gives quick access to a working variable pool.
Personally, I would also add register bank switching, or maybe even a register frame pointer (as in Z8 and XC166)
- time will tell what finally makes it to silicon.
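
A small sketch of what that operand field implies, assuming the "12b operand field" refers to the 12-bit signed immediate that RISC-V uses for load/store offsets and add-immediate. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a byte offset fits in a 12-bit signed immediate,
 * i.e. lies in the range -2048 .. 2047. Offsets in this range can be
 * reached from a base register (such as the stack pointer) with a
 * single load or store instruction. */
static bool fits_imm12(int32_t offset)
{
    return offset >= -2048 && offset <= 2047;
}
```

So a stack frame or variable pool within about 2 KiB either side of a base register is reachable in one instruction; anything further out needs extra instructions to build the address.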

jmg · 2015-03-05 14:26

Heater. wrote: »

...
I wonder why they don't just go straight to in line x86 assembler and GCC?

Let's see how many takers we have for in line assembler in prop-gcc in the future.

? Most tools I use here allow in-line assembler.
Do I use it all the time, in every function ? Of course not.
Am I glad the feature is there. Absolutely.
We are even looking at generating the in-line syntax on an automated basis & inserting via an Include file.

Heater. · 2015-03-05 16:23

porcupine,

That is a very good point. I have not started to follow up on LowRISC yet so I'm not really sure what market they are targeting.

There are a great many users of Raspberry Pi and similar ARM boards, not to mention the millions of lesser ARM boards for embedded systems that do not require any video capability.

To be clear, RISC V has nothing to do with Open Source. RISC V is only a specification of an instruction set architecture. Yes, it is an open standard with no hassles about licensing or royalties etc., but that is as far as it goes. RISC V says nothing about how open or otherwise your implementation should be.

But there is that Broadcom connection in the back of my mind. These guys all hang out in Cambridge. They all know each other. Some work at Broadcom. In the same way that the Pi exists because someone convinced Broadcom to put that tiny ARM core in the corner of their old stand-alone GPU chip (the GPU is much bigger), it could happen that someone convinces Broadcom to put the LowRISC core there instead.

Bingo a RISC V Raspberry Pi with GPU and other peripherals that are totally compatible with the current Pi. Perfect.

This is all wild speculation of course. But it all connects together nicely.

Heater. · 2015-03-05 16:31

koehler,

Here we go, code sizes when compiling the main butterfly function of the fftbench for Propeller and RISC V.

The Propeller compile is for LMM but enabling fcache, this means the big inner loop of butterfly() is loaded to COG and run at full speed. That thing runs nearly as fast as the hand crafted PASM version.

$ propeller-elf-gcc -std=c99 -Os -mfcache -S -o fftbench.S fftbench.c

249 instructions

$ riscv64-unknown-elf-gcc -std=c99 -Os -m32 -S -o fftbench.S fftbench.c

105 instructions

Holy cow Batman. Looks like if the COG used the RISC V instruction set we could stuff more than twice the functionality into the same space!

Or looked at another way, that FFT would run more than twice as fast as it only has to execute half as many instructions. (Assuming those instructions can be dispatched at the same rate as PASM)

This is extreme, I must be doing something wrong.

See attachments for the output assembler listings.

potatohead · 2015-03-05 16:44

One obvious RISC V savings is a hardware multiply.

Heater. · 2015-03-05 17:06

potatohead,

Could well be.

I can't find a way to tell GCC for RISC V to use software multiply. I hope they have allowed for the compiler to work with the base instruction set, with no MUL and DIV, and are not relying on a TRAP on the multiply instructions that vectors off to a multiply handler.

Heater. · 2015-03-05 17:16

No, wait a minute.

The RISC V is using its hardware multiply instruction, which admittedly gives it an edge on code space.

But there are only 3 multiplies in the butterfly() function. The Propeller code for butterfly() contains 3 calls to a multiply routine. So the only code space overhead in that respect is getting the operands into and out of that multiply function.

This does not come close to explaining the huge difference in code sizes between Prop and RISC V.

potatohead · 2015-03-05 17:27

Yep.

I think a big part of it is having three register operands. Takes two PASM instructions to do some things that can happen in one in RISC V.

The mul is what? 10-20 instructions, plus overhead, so maybe 30 tops? Still a big gap.
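
For a sense of scale, a software multiply of the kind being estimated is typically a shift-and-add loop. This is a generic sketch, not the routine prop-gcc actually links in, and `soft_mul` is a made-up name.

```c
#include <stdint.h>

/* Shift-and-add multiply: for each set bit in b, add the matching
 * shifted copy of a. The result is the low 32 bits of the product. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u) {
            product += a;   /* this bit of b contributes a << position */
        }
        a <<= 1;            /* next bit is worth twice as much */
        b >>= 1;            /* move on to the next bit of b */
    }
    return product;
}
```

The loop body is only a handful of operations, so the routine is small in code space; the cost is mostly in cycles, since it can iterate up to 32 times.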
