Prop2 Interactive debugger

David Betz · 2017-04-01 00:10

evanh wrote: »

Nice piccys solves all. :P

On the other hand, I was trying to make a graphic for the egg-beater the other day but gave up. It's as much a visual tangle as it is to imagine. It'll need an animation I suspect.

Actually, the egg beater seems pretty easy to understand to me. Just 16 memories like the single memory in P1 where each COG gets access to one of them on each clock. The concept of SKIP is also pretty easy to understand. It's just the application that can lead to nearly opaque code especially with variable bit masks. Maybe there could be some way to automatically generate SKIP sequences from a bunch of separate straight-line code.
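A minimal sketch of the rotation described above, assuming 16 cogs and 16 memory slices as mentioned in the thread. The names `slice_for` and `schedule` are made up for illustration, and the direction of rotation is an assumption, not taken from the P2 documentation.

```python
NUM_COGS = 16  # the post talks about 16 memories shared by the cogs


def slice_for(cog, clock, n=NUM_COGS):
    """Memory slice that `cog` can reach on `clock` (assumed rotation order)."""
    return (cog + clock) % n


def schedule(clocks, n=NUM_COGS):
    """One row per clock; entry c of a row is the slice cog c can reach."""
    return [[slice_for(c, t, n) for c in range(n)] for t in range(clocks)]


if __name__ == "__main__":
    for t, row in enumerate(schedule(4)):
        # Every row is a permutation, so no two cogs share a slice on a clock.
        assert sorted(row) == list(range(NUM_COGS))
        print(t, row)
```

Under this model every cog reaches every slice once per 16 clocks, which is the same turn-taking idea as the single P1 hub memory, just with 16 memories rotating at once.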

potatohead · 2017-04-01 00:15

I see it this way.

A ton of this stuff can be ignored. People can take the most useful things and approach them fairly easily.

SPIN is going to include inline code for this reason, and for general utility and performance.

We have 16 COGS. It's going to be possible to take a feature and use it without a lot of baggage.

Do it P1 style, or do it as a method, or inline-block in SPIN.

Streamer is a great case here. Smart Pins too.

COG code remains easy for a ton of cases.

Interrupts are reasonably presented. The important thing is they are COG features. It won't be necessary, a lot of the time, to manage interrupts and other time dependent code because of how the COGS work.

Also, because of how the COGS work, P1 methods, polling, waiting, are there too.

HUB code is harder. There are more considerations.

People may not write much HUB code, or it will be inline, or compiled. Fine.

Now, there are specific tasks people want, and those are likely COG tasks. Drivers, SPIN, other performant code, can get complex, dense, but also associated with a COG and used no differently from P1 code.

Reuse, with well-written PASM, is good.

Overall, we still have an awful lot of what makes it all pretty easy. We do have some advanced things in there now too.

A new user has a very good chance at monkey-see, monkey-do type learning, as well as a potential introduction aimed right at the easy end of the pool.

We want and need that.

As they get good, there is a lot of depth and possibilities for them.

We want this too, but it won't show for a while. When it does, it's going to be possible to really squeeze a lot out of this design.

David Betz · 2017-04-01 00:19

I think this same thing could be said about any of the processors we have been criticizing for their complexity. If you bury the ugly parts in libraries or VM implementations or whatever then the rest is pretty easy to understand.

ersmith · 2017-04-01 00:33

David Betz wrote: »

I think this same thing could be said about any of the processors we have been criticizing for their complexity. If you bury the ugly parts in libraries or VM implementations or whatever then the rest is pretty easy to understand.

I very much agree with this. The original Propeller is a beautiful, elegant design -- Chip, you really made a work of art with it! P2 is far more complicated, and the complexity hasn't stopped growing. Every individual piece of that complexity makes sense -- interrupts let us do certain things more easily, SKIP lets us do other things more easily, etc. But it all adds up to something that seems somehow less coherent than the P1. And really, what have we gained by this extra complexity? Is it really worth the extra time and expense? Maybe, maybe not... it's all kind of moot now with respect to what's already behind us. Going forward, though, I'd *really* urge restraint. At some point an ARM or RISC-V based solution will be just as simple, and far easier to sell.

ozpropdev · 2017-04-01 00:36

I don't see any part of P2 as "ugly".
The only time I have needed to use that term around P2 is in reference to some of my solder joints/wiring.

David Betz · 2017-04-01 00:49

ozpropdev wrote: »

I don't see any part of P2 as "ugly".
The only time I have needed to use that term around P2 is in reference to some of my solder joints/wiring.

Try reading code written using SKIP. That will be ugly even if the feature itself seems elegant.

Rayman · 2017-04-01 00:49

Skip is something new for prop
Seems I barely understand jmp so might not be using skip anytime soon

ke4pjw · 2017-04-01 01:25

Seems lately, many in these threads don't like seeing the sausage being made. To those I say, Quaid, open your mind!

ozpropdev · 2017-04-01 01:29

David Betz wrote: »

Try reading code written using SKIP. That will be ugly even if the feature itself seems elegant.

Maybe you haven't noticed the title of this thread or perhaps read the previous posts relating to testing SKIP.
I have been immersed in testing this stuff and currently am adding features to this debugger code to assist in "reading" SKIP code.

Sure, some of the posted examples look complicated but we are testing extreme/weird cases to try and break things. Isn't that the purpose of this whole FPGA program?

The example that inspired Chip to add this feature is logical and elegant to me.
Maybe Chip should have kept this feature "hidden" and surprised us all later.

You don't have to use the "ugly" stuff, the "clean P1" stuff has been there since FPGA V1.

David Betz · 2017-04-01 01:30

ke4pjw wrote: »

Seems lately, many in these threads don't like seeing the sausage being made. To those I say, Quaid, open your mind!

Maybe sausage is an apt metaphor. Lots of ugly things that when assembled together make something tasty that isn't good for you. :-)

evanh · 2017-04-01 01:34

Not allowed to use that excuse, Oz! Not that I have anything against SKIP or any of these extras but it's like saying you're free to walk from civilisation any time you like.

potatohead · 2017-04-01 01:35

I'm not saying that exactly. We can for sure do it that way, David.

But we don't have to at all.

On other processors, there is no COG execute and memory isolation.

If SPIN is faster, and compiled will be for sure, a lot of PASM will end up in COGS like it does now. COG code is pretty easy still.

A whole lot about what makes a Propeller distinctive is COG assembly code. P2 still looks pretty good on that front.

Because there are more COGS, people will be able to pick a feature and use it. That is the best, easy case.

In some ways, I'm reminded of 6809. Write it easy. Is it small and fast enough? Done. That is the test code. Then write it again and test. Often possible to double performance and halve size.

P2 looks very similar. An obvious, easy solution is out there, sort of like how a lot of people just ignored the mux instruction on P1. That's a test. Now the cool part. The rewrite can be done on another COG. A / B test, and shrink size as well as improve performance.

Super cool. Streamer, pins, interrupts...

And again, dedicate a COG to it, and a lot of the pain of larger assembly code, and tasking goes away.

This COG isolation is a very nice thing. Helps limit the scope of what has to happen in order to make effective use of PASM.

Big difference, again, is on P2, that isn't a hard limit like it was on P1.

Tons of people wanted that. So now we got it.

jmg · 2017-04-01 01:40

Rayman wrote: »

But now, I think I see that I could have "SUB it,#1 wz" followed by an "if_nz JMP #\somewhere".
Either way wouldn't be relocatable to hub though, I think...

Relocatable depends.
Systems that have linkers use fixups in the linker, and the OBJ modules have offset placeholders that are then added to the final, absolute address. The only requirement is that the opcode used is one large enough to fit the eventual jump.

In systems without linkers, you can still relocate, provided the whole source is rebuilt by the assembler.

The appeal of a short, relative jump is none of that linker fixup stuff needs doing.
That is zero-effort relocatable.

To allow more use of short-rel jumps, more passes in the assembler can help.
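The difference between the two kinds of jump can be sketched roughly as below. This is an illustrative model only: the `Jump` record, the `encode` helper, and the one-unit instruction size are all assumptions, not the P2 encoding or any real linker's format.

```python
from dataclasses import dataclass


@dataclass
class Jump:
    at: int         # offset of the jump instruction inside the module
    target: int     # offset of the destination inside the module
    relative: bool  # True = short relative jump, False = absolute jump


def encode(jumps, load_address):
    """Operand each jump carries once the module is placed at load_address."""
    operands = []
    for j in jumps:
        if j.relative:
            # Distance from the instruction after the jump (assumed 1 unit
            # per instruction); the load address never enters into it.
            operands.append(j.target - (j.at + 1))
        else:
            # Linker-style fixup: module offset plus the final base address.
            operands.append(load_address + j.target)
    return operands
```

Calling `encode` with two different load addresses gives the same operand for the relative jump both times, while the absolute operand changes with the load address and so needs the fixup step.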

David Betz · 2017-04-01 01:41

Yup, we got kind of a kitchen-sink processor with lots of random features thrown in. Sort of reminds me of lots of other processors. I notice that Atmel has some simple AVR chips with some level of programmable logic in them and peripherals that can run independently of the main CPU. I guess people are experimenting with lots of hybrid solutions these days. It will be interesting to see which approach is most successful.

evanh · 2017-04-01 01:50

potatohead wrote: »

This COG isolation is a very nice thing. Helps limit the scope of what has to happen in order to make effective use of PASM.

I feel the nice part was the independence - as in hard multitasking - rather than isolation.

potatohead · 2017-04-01 01:53

Yeah. Me too. I think you are saying it better.

David Betz · 2017-04-01 02:03

ozpropdev wrote: »

David Betz wrote: »

Try reading code written using SKIP. That will be ugly even if the feature itself seems elegant.

Maybe you haven't noticed the title of this thread or perhaps read the previous posts relating to testing SKIP.
I have been immersed in testing this stuff and currently am adding features to this debugger code to assist in "reading" SKIP code.

Sure, some of the posted examples look complicated but we are testing extreme/weird cases to try and break things. Isn't that the purpose of this whole FPGA program?

The example that inspired Chip to add this feature is logical and elegant to me.
Maybe Chip should have kept this feature "hidden" and surprised us all later.

You don't have to use the "ugly" stuff, the "clean P1" stuff has been there since FPGA V1.

It seems to me that there is something wrong with a design where you can't tell what a piece of code is going to do until you see it in a debugger.

potatohead · 2017-04-01 02:48

In the example Chip gave, you can tell without a debugger.

Of course people can make pathological cases with runtime choices. Not advisable.

Hey, pass the random number generator to skip, fun times!

ozpropdev · 2017-04-01 03:15

potatohead wrote: »

In the example Chip gave, you can tell without a debugger.

Of course people can make pathological cases with runtime choices. Not advisable.

Hey, pass the random number generator to skip, fun times!

+1 AND

The way I see it, this SKIP feature will make SPIN2 "real quick" as well as it being finished "real quick".
Surely a win/win scenario.

cgracey · 2017-04-01 03:54

ke4pjw wrote: »

Seems lately, many in these threads don't like seeing the sausage being made. To those I say, Quaid, open your mind!

Ah, yes... Kuato.

jmg · 2017-04-01 04:17

Here is another useful target test for ByteCode engine design :

https://en.wikipedia.org/wiki/List_of_CIL_instructions

Looks to have basic types of (using byte-count suffix, and prefix of int, uint, real):
i1, i2, i4, i8, u1, u2, u4, u8, r4, r8

Of those, only the 8-byte vars are not native to P2.

JRetSapDoog · 2017-04-01 06:47

cgracey wrote: »

And it should be, within an order of magnitude, as fast as PASM, but with fractional user code size.

Am I figuring right below about the speed of SPIN2 vs SPIN or just dreaming?

I think I read that, on the P1, SPIN is about 40X slower than PASM on average. But from Chip's post, it sounds like SPIN2 closes the gap, coming in at roughly 10X slower than PASM2. That's a four-fold increase in relative speed (since PASM:SPIN=40X, PASM2:SPIN2=10X, and 40X/10X=4X). That's assuming that an "order of magnitude" is roughly 10 instead of something like 30.

Now the P2 is easily 4X faster than the P1 (though it could be a lot faster than that for some apps due to the Lazy Susan memory and so on). I think 4X is the bare minimum, but it's conservative.

If all that's correct, then that would seem to make SPIN2 at least 4*4 = 16X faster than the current SPIN (in head to head comparison). And I wouldn't be surprised if it's even faster than that. Sounds like SPIN2 is going to fly!
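The estimate above works out as follows. The inputs are the figures quoted in the post (40X, 10X, 4X), not measurements, and `spin2_vs_spin` is just a name chosen for this sketch.

```python
def spin2_vs_spin(p1_pasm_over_spin, p2_pasm_over_spin2, p2_over_p1):
    """Estimated SPIN2-over-SPIN speedup from the three ratios in the post."""
    relative_gain = p1_pasm_over_spin / p2_pasm_over_spin2  # 40X / 10X
    return relative_gain * p2_over_p1                       # times raw chip speedup


if __name__ == "__main__":
    print(spin2_vs_spin(40, 10, 4))  # "order of magnitude" read as 10X
    print(spin2_vs_spin(40, 30, 4))  # the pessimistic 30X reading
```

With the 10X reading the result is the 16X in the post; if "order of magnitude" meant something like 30X instead, the same arithmetic gives only a bit over 5X.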

cgracey · 2017-04-01 07:23

jmg wrote: »

Here is another useful target test for ByteCode engine design :

https://en.wikipedia.org/wiki/List_of_CIL_instructions

Looks to have basic types of (using byte-count suffix, and prefix of int, uint, real):
i1, i2, i4, i8, u1, u2, u4, u8, r4, r8

Of those, only the 8-byte vars are not native to P2.

It's interesting to look at that CIL bytecode description and see how they dedicated their bytecodes. Lots of similarities to Spin. Lots of stuff in there that would require lots more explanation.

I saw they dedicated single bytecodes to common constants. Also, common variable reads and writes.

Heater. · 2017-04-01 07:40

Hmm.. An order of magnitude faster would be 10 times faster. Two orders of magnitude would be 100. Etc.

It's all to do with the power of 10 in the factor.

So I guess an off-the-cuff order of magnitude speed gain is anywhere from about 7 to 70. Less than 7 is nothing and more than 70 is more like two orders. Or perhaps 5 to 50. Depends how you want to do the rounding.
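How much the rounding choice matters can be sketched with two conventions. Both function names are made up here, and neither is claimed to be the "official" definition: one rounds on the leading digit (which gives the 5 to 50 band), the other rounds the base-10 logarithm (which gives a band of roughly 3.2 to 31.6).

```python
import math


def by_leading_digit(factor):
    """Round up to the next power of ten once the leading digit reaches 5."""
    exponent = math.floor(math.log10(factor))
    mantissa = factor / 10 ** exponent
    return exponent + 1 if mantissa >= 5 else exponent


def by_log(factor):
    """Round the base-10 logarithm to the nearest integer."""
    return round(math.log10(factor))


if __name__ == "__main__":
    for f in (3, 4, 5, 7, 31, 32, 49, 50, 70):
        print(f, by_leading_digit(f), by_log(f))
```

Under the log convention a 7X gain counts as one order and a 70X gain already counts as two, so the band shifts noticeably depending on which rule is meant.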

jmg · 2017-04-01 09:29

cgracey wrote: »

It's interesting to look at that CIL bytecode description and see how they dedicated their bytecodes. Lots of similarities to Spin. Lots of stuff in there that would require lots more explanation.

I saw they dedicated single bytecodes to common constants. Also, common variable reads and writes.

Yes, common small values - I guess they had to fill the 256 somehow

I've found some rough numbers for eCLR on ARM MCUs which indicate ~256k, but much of that will be libraries.
This is enough to run PLC.iec61131 and C# compiler outputs.

Not sure what size the byte-code kernel is, but they have ported this to RaspPi, so it should be possible to get a build report?

David Betz · 2017-04-01 11:06

cgracey wrote: »

ke4pjw wrote: »

Seems lately, many in these threads don't like seeing the sausage being made. To those I say, Quaid, open your mind!

Ah, yes... Kuato.

It's just not as elegant and clean as we've come to expect of the Propeller.

cgracey · 2017-04-01 12:52

David Betz wrote: »

It's just not as elegant and clean as we've come to expect of the Propeller.

It's certainly not something that anyone is going to have a compiler take advantage of, but for those who code in PASM, it has value.

I've been dreaming about something like this for a long time. Many times, I'd look at similar code snippets when memory was tight and think that if there was just some way I could mix them together and select which instructions would execute in different cases, I could save lots of memory. The problem was that I kept picturing something like a list of nibbles that would direct the PC, which was not a very efficient approach. Realizing that a string of bits, that lays right onto the instruction sequence, would do everything needed was kind of a revelation for me. Now I can not only write very compressed code, but it runs as fast as it ever could.
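The bit-string idea can be modelled in a few lines. This is a sketch under assumptions, not P2 code: bit 0 of the mask is taken to govern the first instruction and a 1 bit to mean "skip", the instruction names are placeholders rather than real PASM, and `run_with_skip` and `mask_for` are hypothetical helpers.

```python
def run_with_skip(instructions, mask):
    """Return the instructions that execute under a SKIP-style bit mask."""
    return [ins for i, ins in enumerate(instructions) if not (mask >> i) & 1]


def mask_for(shared, wanted):
    """Derive the mask that makes `shared` behave like the sequence `wanted`."""
    mask, j = 0, 0
    for i, ins in enumerate(shared):
        if j < len(wanted) and ins == wanted[j]:
            j += 1            # this instruction is needed, leave its bit at 0
        else:
            mask |= 1 << i    # not needed for this variant, mark it skipped
    if j != len(wanted):
        raise ValueError("wanted sequence is not a subsequence of the shared code")
    return mask


# Three similar snippets merged into one shared sequence (placeholder names).
SHARED = ["load byte", "load word", "load long", "add base", "push"]

if __name__ == "__main__":
    for variant in ("load byte", "load word", "load long"):
        m = mask_for(SHARED, [variant, "add base", "push"])
        print(f"{m:05b}", run_with_skip(SHARED, m))
```

The three variants share one five-instruction body and differ only in a small mask each, which is the memory saving described above. `mask_for` also hints at how masks could be generated from separate straight-line sequences instead of being written by hand.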

David Betz · 2017-04-01 12:58

cgracey wrote: »

It's certainly not something that anyone is going to have a compiler take advantage of, but for those who code in PASM, it has value.

I've been dreaming about something like this for a long time. Many times, I'd look at similar code snippets when memory was tight and think that if there was just some way I could mix them together and select which instructions would execute in different cases, I could save lots of memory. The problem was that I kept picturing something like a list of nibbles that would direct the PC, which was not a very efficient approach. Realizing that a string of bits, that lays right onto the instruction sequence, would do everything needed was kind of a revelation for me. Now I can not only write very compressed code, but it runs as fast as it ever could.

It certainly is clever. In fact, a compiler might be able to make some use of it. We should try to find someone who will work on creating an LLVM backend for P2. I seem to recall that you were looking into LLVM at one point. Are you likely to work on it for P2?

cgracey · 2017-04-01 13:00

David Betz wrote: »

It certainly is clever. In fact, a compiler might be able to make some use of it. We should try to find someone who will work on creating an LLVM backend for P2. I seem to recall that you were looking into LLVM at one point. Are you likely to work on it for P2?

Personally, I think what I've already got on my plate could eat my whole life up, but I hope/suppose/expect that other people are going to be working on C.

David Betz · 2017-04-01 13:22

cgracey wrote: »

Personally, I think what I've already got on my plate could eat my whole life up, but I hope/suppose/expect that other people are going to be working on C.

Understood. I just asked because I think you expressed some interest a while back when someone pointed to a PDF book describing LLVM. Yes, I imagine it would be good to create a C compiler for P2 if Parallax education is going to continue using C. I'm not sure it's a good idea to dive back into GCC again though when LLVM seems to be taking over. Unfortunately, I don't know enough about LLVM to make much progress. We need to find a new compiler guy. Maybe Roy?
