HUB EXEC Update Here

Heater. · 2014-01-31 13:28

How do gcc and/or Catalina-style compressed code compare to Spin bytecodes? CMM, or whatever it is called?

Given that there is no Spin bytecode interpreter in ROM on the PII, one could imagine adopting something completely different from the old bytecodes.

Is there any merit in deciding, on an object-by-object basis, whether to compile to byte/compressed code or native PASM? Then no new syntax needs inventing, like CPUB/CPRI/CDAT/CVAR. Nothing would change in an object's source code to move it from being compiled to bytecodes to being compiled to native code. I'm not sure where or how we would specify the compile mode, though.

jmg · 2014-01-31 13:28

@Dave: The idea sounds good, but since then GCC has gained support for directing selected code to be compiled for a COG, I think (and it has inline PASM as well).

Would it make sense to look at the controls in GCC and apply the same or similar semantics to Spin, so there is not too much culture shock moving from one to the other?

SRLM · 2014-01-31 14:54

Heater. wrote: »

How do gcc and/or Catalina-style compressed code compare to Spin bytecodes? CMM, or whatever it is called?

Early benchmarks indicated that CMM compared to Spin was about 1:1 on size and 2:1 on speed (source)

SRLM · 2014-01-31 14:55

jmg wrote: »

@Dave: The idea sounds good, but since then GCC has gained support for directing selected code to be compiled for a COG, I think (and it has inline PASM as well).

GCC can support compiling an entire cog (direct), inlining LMM/CMM code, and inline code for the FCACHE (effectively inline cog code).

rjo__ · 2014-02-03 08:47

I'm using the Nano with the latest jic. Is there a list of commands that were excluded to squeeze the build into the Nano?
I saw speculation on the required changes but no confirmed list.

ozpropdev · 2014-02-03 16:37

rjo__ wrote: »

I'm using the Nano with the latest jic. Is there a list of commands that were excluded to squeeze the build into the Nano?
I saw speculation on the required changes but no confirmed list.

Rich, See here

Brian

rjo__ · 2014-02-03 17:31

Thanks Brian.

cgracey · 2014-02-04 16:30

I got these working the same way in all single-/multi-task modes:

REPS
<spacer instruction>
<REPS block>

REPD
<spacer instruction>
<spacer instruction>
<spacer instruction>
<REPD block>

These should be working, as well, after the current compile completes:

JMPD/CALLD/RETD
<trailing instruction>
<trailing instruction>
<trailing instruction>

These cog changes will make all code execute the same way, no matter the task mix, so that everything can be written for what was single-task mode. Now we'll write all future code using these spacer/trailer rules and it will run optimally in single-task mode, but still work in multi-task mode without any pipeline issues.

I hope to have an update out tonight.

Thanks for your patience.

Bill Henning · 2014-02-04 16:38

Sounds great, fewer "Duh" moments for everyone!

I am hoping to have some quality DE2-115 time in a couple of days

cgracey wrote: »

I got these working the same way in all single-/multi-task modes:

REPS
<spacer instruction>
<REPS block>

REPD
<spacer instruction>
<spacer instruction>
<spacer instruction>
<REPD block>

These should be working, as well, after the current compile completes:

JMPD/CALLD/RETD
<trailing instruction>
<trailing instruction>
<trailing instruction>

These cog changes will make all code execute the same way, no matter the task mix, so that everything can be written for what was single-task mode only. Now we'll write all future code using these spacer/trailer rules and it will run optimally in single-task mode, but still work in multi-task mode.

I hope to have an update out tonight.

Thanks for your patience.

Cluso99 · 2014-02-04 16:51

Fantastic news Chip. Life will be so much simpler this way!

Sapieha · 2014-02-04 16:54

Hi Chip.

Thanks

Looks good.

Now P2 will be user friendly (compilers too)

cgracey · 2014-02-04 16:56

Cluso99 wrote: »

Fantastic news Chip. Life will be so much simpler this way!

The only other pipeline issue regarding spacer instructions that I can think of is the write-before-execute issue, which must be coded with two spacers to work in single-task mode. In multi-task mode, zero, one, or two spacers may be needed, with two working for every single- and multi-task case:

ADD :i,#1
<spacer instruction>
<spacer instruction>
:i MOV OUTA,0

So, if these cases are coded with two spacers, they will work under any circumstance. I think that does it for unifying the coding rules across task mixes.
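
The two-spacer rule above can be checked mechanically. The following is only a toy lint sketch in Python, not part of any Propeller tool: the program representation (a list of `(label, dest_label)` pairs) and the name `find_hazards` are assumptions made for illustration, and it does not model the actual pipeline.

```python
# Toy lint for the write-before-execute rule described above.
# Assumption: each instruction is modelled as (label, dest_label), where
# dest_label names the instruction that this instruction modifies, if any.

def find_hazards(program, min_spacers=2):
    """Return (writer_index, target_index) pairs where an instruction
    modifies a later instruction with fewer than min_spacers
    instructions in between."""
    label_index = {label: i for i, (label, _) in enumerate(program) if label}
    hazards = []
    for i, (_, dest) in enumerate(program):
        target = label_index.get(dest)
        # target - i - 1 is the number of spacer slots between the two.
        if target is not None and 0 < target - i <= min_spacers:
            hazards.append((i, target))
    return hazards


# ADD :i,#1 followed by two spacers and then :i MOV OUTA,0
safe = [(None, ":i"), (None, None), (None, None), (":i", None)]
# Same sequence with only one spacer
unsafe = [(None, ":i"), (None, None), (":i", None)]

print(find_hazards(safe))
print(find_hazards(unsafe))
```

Using two spacers as the fixed threshold follows the post: it is the one count that is stated to work for every single- and multi-task case.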

Thanks for sticking around, Guys. I appreciate your help and enthusiasm.

ozpropdev · 2014-02-04 17:35

@Chip

Nice work!

Did you manage to squeeze in the 3 additional REPx blocks?
Brian

jmg · 2014-02-04 18:47

cgracey wrote: »

I got these working the same way in all single-/multi-task modes:

REPS
<spacer instruction>
<REPS block>

REPD
<spacer instruction>
<spacer instruction>
<spacer instruction>
<REPD block>

These should be working, as well, after the current compile completes:

JMPD/CALLD/RETD
<trailing instruction>
<trailing instruction>
<trailing instruction>

These cog changes will make all code execute the same way, no matter the task mix, so that everything can be written for what was single-task mode. Now we'll write all future code using these spacer/trailer rules and it will run optimally in single-task mode, but still work in multi-task mode without any pipeline issues.

Great

I can see a safe mnemonic form for the first two (borrowing from how other DSPs manage this opcode), like this:

REPSs  LoopCount, BlockStartAdr, BlockEndAdr 
<spacer instruction>
BlockStartAdr:
  <REPS block>
  <REPS block>
  <REPS block>
BlockEndAdr:


REPDs LoopCount, BlockStartAdr, BlockEndAdr 
<spacer instruction>
<spacer instruction>
<spacer instruction>
BlockStartAdr:
  <REPD block>
  <REPD block>
BlockEndAdr:

A matching 'safe' version of delayed exit is also worth having.
Perhaps something like this ?

  <preceding  instruction>
  <preceding  instruction>
JMPDs/CALLDs/RETDs DestAddress, ActualDepartureAdr 
  <trailing instruction>
  <trailing instruction>
  <trailing instruction>
ActualDepartureAdr :  ' actual departure point

  <other code>   
DestAddress:

In operation, the assembler does a simple sanity check on BlockStartAdr and ActualDepartureAdr and flags an error if either is wrong for that opcode (just like the other checks for an address out of range).
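
A minimal sketch of what such an assembler-side check might look like, written in Python for illustration. The gap sizes are taken from the thread (one spacer for REPS, three for REPD, three trailing instructions for JMPD/CALLD/RETD); the function name `check_gap` and the use of plain instruction indices as addresses are assumptions, not anything from an actual assembler.

```python
# Hypothetical sanity check for the "safe" mnemonic forms proposed above.
# Addresses are instruction indices (one per long).

REQUIRED_GAP = {
    "REPS": 1,   # one spacer before the REPS block
    "REPD": 3,   # three spacers before the REPD block
    "JMPD": 3,   # three trailing instructions before actual departure
    "CALLD": 3,
    "RETD": 3,
}

def check_gap(mnemonic, opcode_addr, label_addr):
    """Return None if label_addr sits exactly the required number of
    instructions after the opcode, otherwise an error message."""
    required = REQUIRED_GAP[mnemonic]
    actual = label_addr - (opcode_addr + 1)
    if actual == required:
        return None
    return (f"{mnemonic} at {opcode_addr}: expected {required} "
            f"instruction(s) before label, found {actual}")


print(check_gap("REPS", 10, 12))   # opcode at 10, spacer at 11, block at 12
print(check_gap("JMPD", 0, 3))     # only two trailing instructions
```

The check only compares label positions, so it costs the assembler almost nothing and catches a miscounted spacer or trailer at build time rather than on hardware.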

cgracey · 2014-02-04 20:52

ozpropdev wrote: »

@Chip

Nice work!
Did you manage to squeeze in the 3 additional REPx blocks?
Brian

Ah, yes. I forgot to mention that each task has its own REPS/REPD circuit now.

jmg · 2014-02-04 21:07

cgracey wrote: »

Ah, yes. I forgot to mention that each task has its own REPS/REPD circuit now.

Nice. How much added logic did that cost? IIRC, earlier comments suggested it was not insignificant.

cgracey · 2014-02-04 22:08

jmg wrote: »

Nice. How much added logic did that cost? IIRC, earlier comments suggested it was not insignificant.

It added over 1,000 flipflops to the chip, which already has probably 60,000. Not a big deal.

Heater. · 2014-02-04 22:08

This is great stuff Chip. Having such regularity in behaviour is a huge win.

Is it time to freeze things a bit? That next shuttle run is coming fast, isn't it?

cgracey · 2014-02-04 22:10

Heater. wrote: »

This is great stuff Chip. Having such regularity in behaviour is a huge win.

Is it time to freeze things a bit? That next shuttle run is coming fast, isn't it?

I think things are getting very near to "done".

Next, I want to add a CALLR instruction which writes the return address to a register, instead of a stack. The C compiler guys really want this for leaf functions. Spin could use it, too. It's handy for pulling arguments right after the CALLR, then jumping to the final register value.

Ale · 2014-02-05 01:18

Hei Chip,

I have sort of a general coding question regarding Verilog. You have added loads and loads of opcodes; they need arguments, and you have huge fan-outs and muxes for the results. How is it that it is so fast (I mean 80+ MHz)?
I had the idea of having more than one "opcode" register, so to speak, which would give smaller fan-outs; probably nothing new...

Thanks.

Edit: Maybe the FPGAs are that fast... Let's see... I'll try to compile my (unoptimized and sub-par) 6809 for the Cyclone V and see.

(It can do 40 MHz in the MachXO2 and 67 MHz in the Spartan3E; it only has 8- and 16-bit paths, but many muxes.)

Edit: It can do 90 MHz on the Cyclone V (5CEFA2F23C8N).

ctwardell · 2014-02-05 06:38

cgracey wrote: »

I think things are getting very near to "done".

Next, I want to add a CALLR instruction which writes the return address to a register, instead of a stack. The C compiler guys really want this for leaf functions. Spin could use it, too. It's handy for pulling arguments right after the CALLR, then jumping to the final register value.

Is there still a possibility of adding the non-hub flags and increasing the locks?

http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1236830&viewfull=1#post1236830

Thanks,

Chris Wardell

mindrobots · 2014-02-05 07:02

Great stuff, Chip!

The consistent operation will be a big help!

So, before the first P2 developer boards are made, is there going to be a signature sheet passed around so that anyone who contributed a feature suggestion or helped with the development and testing can sign it? How cool would it be to have an autographed mask on the board with Chip's signature and all the contributors' signatures?

bartgrantham · 2014-02-05 10:56

cgracey wrote: »

Ah, yes. I forgot to mention that each task has its own REPS/REPD circuit now.

That's fantastic! Having as much of the instruction set be task-agnostic in its usage is super helpful. I can imagine this resulting in a library of tight and simple background task code that can be quickly mixed in with larger cog programs, or used on its own.

Dave Hein · 2014-02-05 11:33

cgracey wrote: »

Next, I want to add a CALLR instruction which writes the return address to a register, instead of a stack. The C compiler guys really want this for leaf functions. Spin could use it, too. It's handy for pulling arguments right after the CALLR, then jumping to the final register value.

Instead of a CALLR instruction maybe it would be useful to have an instruction that sets a register location that is written to when any of the CALL instructions are used. It would be something like "SETRETREG register_number". This would cause all of the CALL instructions to write the return address to the designated register instead of writing it to the return stack. Does this make sense, or would it cause confusion? I think this would satisfy the requirement for the C compiler. David Betz, what do you think?

EDIT: I thought about it a bit more, and this might cause some problems when trying to mix code that expects to use the stack for the return address. So a CALLR instruction might be better as long as it works with both COG addresses and hub addresses.

Bill Henning · 2014-02-05 11:39

Sorry, that would be confusing, and might interfere with the other modes of CALL operation - the intent is to NOT interfere with them, but provide something David/Eric/you need.

I think Chip will either pick $1F1 as the link register, or provide a SETLR (or as you called it, SETRETREG) for the CALLR instruction.

Dave Hein wrote: »

Instead of a CALLR instruction maybe it would be useful to have an instruction that sets a register location that is written to when any of the CALL instructions are used. It would be something like "SETRETREG register_number". This would cause all of the CALL instructions to write the return address to the designated register instead of writing it to the return stack. Does this make sense, or would it cause confusion? I think this would satisfy the requirement for the C compiler. David Betz, what do you think?

Dave Hein · 2014-02-05 11:47

Bill, I agree, it would be confusing. I was just thinking out loud. So will the CALLR instruction allow for calling 9-bit and 16-bit constant addresses, and also calling indirectly through a register? It seems like that might require 2, or maybe 3 different instructions.

Bill Henning · 2014-02-05 12:00

I suspect that there will only be one:

CALLR D/#16bitconst

That would allow addressing all 256KB (64K longs) of hub; the other CALLx instructions distinguish between cog and hub addresses as 0-511 = cog, 512+ = hub.
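
The address split and the reach of a 16-bit constant can be spelled out in a few lines. This is just an illustrative Python sketch of the arithmetic in this post; `classify_target` and the constant names are made up for the example and do not describe the actual hardware decode.

```python
# Illustrative only: the cog/hub split and 16-bit reach described above.
COG_LONGS = 512            # cog addresses are 0..511
HUB_LONGS = 1 << 16        # a 16-bit constant reaches 64K longs
HUB_BYTES = HUB_LONGS * 4  # 4 bytes per long

def classify_target(addr):
    """Classify a long address as 'cog' or 'hub' using the 0-511 rule."""
    if not 0 <= addr < HUB_LONGS:
        raise ValueError(f"address {addr} does not fit in 16 bits")
    return "cog" if addr < COG_LONGS else "hub"


print(classify_target(511), classify_target(512))
print(HUB_BYTES // 1024, "KB")
```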

Dave Hein wrote: »

Bill, I agree, it would be confusing. I was just thinking out loud. So will the CALLR instruction allow for calling 9-bit and 16-bit constant addresses, and also calling indirectly through a register? It seems like that might require 2, or maybe 3 different instructions.

DaveJenson · 2014-02-05 12:24

Wow, Chip.
It just keeps getting better and better!

Bill Henning · 2014-02-05 12:48

I was thinking of Sapieha's request for a CALL-equivalent to the new list. Something like that could be useful for VM's, and even more so for operating systems / libraries.

CALLVECT D,#n
CALLVECT D,S

CALLLIST looked weird with 3 L's, so I changed it to VECT for the example

D holds the base address of a WORD table in the hub

n or S are the index

This way it could dispatch to 512 system/VM routines. I think only the 4-level hardware stack version would be needed.

It bears thinking on, even if only for P3.

Bill Henning · 2014-02-05 13:01

Addendum:

If the addresses in the word table are relative to their position, this would get us DLLs.

The calling task/cog would place the start of the DLL in the register "dllbase"

Then it could call any routine in the DLL with

CALLVECT dllbase,#routineindex ' (0..511)<<2

A DLL would be:

WORD @function0
WORD @function1
....
WORD @functionN-1
' local DLL data area
function0:

function1:

...

As relative addresses would be used, there is no need for a relocating loader.
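
To show why self-relative table entries remove the need for relocation, here is a small Python simulation. It is a sketch under stated assumptions: hub memory is modelled as a flat list of words, each table entry holds the distance from its own position to its function, and `build_dll`, `load`, and `resolve` are hypothetical helper names, not proposed instructions.

```python
# Toy model of a position-independent "DLL" with a self-relative word table.
# Assumption: addresses and offsets are both in word units.

def build_dll(function_sizes):
    """Build an image: a table of self-relative offsets followed by the
    function bodies (each body filled with a marker 0xF000 + index)."""
    n = len(function_sizes)
    image = [0] * n
    addr = n  # first function starts right after the table
    for i, size in enumerate(function_sizes):
        image[i] = addr - i               # offset from entry i to function i
        image.extend([0xF000 + i] * size)
        addr += size
    return image

def load(hub, image, base):
    """Copy the image into hub memory at base, with no fix-ups."""
    hub[base:base + len(image)] = image

def resolve(hub, dllbase, index):
    """What a CALLVECT-style dispatch would compute: entry address plus
    the relative offset stored in that entry."""
    entry = dllbase + index
    return entry + hub[entry]


hub = [0] * 64
image = build_dll([2, 3, 1])
load(hub, image, 10)   # same unmodified image at two different bases
load(hub, image, 40)
print(hex(hub[resolve(hub, 10, 1)]), hex(hub[resolve(hub, 40, 2)]))
```

The same image works at both load addresses because nothing in the table depends on where the DLL ends up in hub memory.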
