High-level languages and the LUT

cgracey · 2019-08-31 03:05

RossH wrote: »

msrobots wrote: »

and for that chip stole a bit in the instruction.

So just the lower half of it immediate.

Mike

I'm clearly missing something here

Are you saying that RDLUT/WRLUT can only immediately access the lower half of the LUT? I don't get that from reading the instruction definition, but if it is true I will change my use of the LUT so that instead of using the lower half for code, I will use the upper half.

It's a change that exists in the newest silicon. I am working on the documentation for it now. I already posted a new list of instructions:

https://docs.google.com/spreadsheets/d/1_vJk-Ad569UMwgXTKTdfJkHYHpc1rZwxB-DcIiAZNdk/edit?usp=sharing

RossH · 2019-08-31 03:26

cgracey wrote: »

RossH wrote: »

msrobots wrote: »

and for that chip stole a bit in the instruction.

So just the lower half of it immediate.

Mike

I'm clearly missing something here

Are you saying that RDLUT/WRLUT can only immediately access the lower half of the LUT? I don't get that from reading the instruction definition, but if it is true I will change my use of the LUT so that instead of using the lower half for code, I will use the upper half.

It's a change that exists in the newest silicon. I am working on the documentation for it now. I already posted a new list of instructions:

https://docs.google.com/spreadsheets/d/1_vJk-Ad569UMwgXTKTdfJkHYHpc1rZwxB-DcIiAZNdk/edit?usp=sharing

Ok, I get it. Allowing PTRx means you can only have 8 bits of immediate address. I hope this doesn't break too much existing code.

msrobots · 2019-08-31 03:52

You can now use ptra or ptrb with rd/wrlut, like you can with rdlong/wrlong.

So you have indexed access to all longs and immediate access to just the first half, as far as I have read about this.

I might be wrong, but since @TonyB_ said it, I am quite sure I got it right.
Maybe "immediate" is the wrong word.

My guess is that it is something like this:

rdlut value, #5

will work and

rdlut value, #511

will not work anymore

just #255

but

rdlut value, address

still works

but additionally you can now do something like

rdlut value, ptra++

maybe even

rdlut ptrb++, ptra++

copying inside the LUT and COG mem, not HUB

something like that. Feature creep.

Mike
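The guess above can be sketched as a small Python model. This is only an illustration of the reading given in this thread (8 bits of immediate address, so #0..#255), not a statement of the final silicon behaviour; the names `LUT_LONGS`, `IMMEDIATE_LIMIT` and `immediate_reachable` are made up for the example.

```python
LUT_LONGS = 512        # the LUT holds 512 longs
IMMEDIATE_LIMIT = 256  # assumption from the thread: only 8 bits of immediate address remain


def immediate_reachable(lut_address: int) -> bool:
    """Return True if the address could be written as an immediate (#n) operand.

    Hypothetical helper: it only models the 8-bit limit discussed in the thread.
    """
    if not 0 <= lut_address < LUT_LONGS:
        raise ValueError("LUT address must be in 0..511")
    return lut_address < IMMEDIATE_LIMIT


for addr in (5, 255, 256, 511):
    form = "immediate (#)" if immediate_reachable(addr) else "register or PTRx only"
    print(f"LUT[{addr}]: {form}")
```

Under this reading, code that lives in the lower half of the LUT is untouched, and only immediate accesses to the upper half would need a register or pointer instead.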

cgracey · 2019-08-31 05:07

One nice improvement to the instruction set is how the BITx/WxPIN/DIRx/OUTx/FLTx/DRVx all use the 5 bits above the bottom 6 bits to specify how many extra bits or pins are to be affected by the instruction.

DRVRND #16 + 7<<6 'output random states to P[23:16]

How many times have you needed to output random states to pins, but had to go through all kinds of trouble coming up with a random number, and then masking it to so many OUT bits, while setting the same DIR bits? Now you can just do it in two clocks with one instruction.

This is really handy:

WRPIN dacmode,#0 + 15<<6 'make P[15:0] into DAC outputs
DRVL #0 + 15<<6 'enable them all

RossH · 2019-08-31 05:19

I predict that the P2 will not be finished as long as there is an unused bit - or even just a rarely used bit - left anywhere in the P2 instruction set!

Cluso99 · 2019-08-31 05:20

Chip,
Don't the lower 6 bits set the pin# 0-63, and the upper 3? bits set up to 8 consecutive pins???
Do you allow more than 8 pins if the value is not immediate? Immediate limits to 9 bits total, so 6 bits for pin# and 3 bits available for number of pins ???

cgracey wrote: »

One nice improvement to the instruction set is how the BITx/WxPIN/DIRx/OUTx/FLTx/DRVx all use the 5 3??? bits above the bottom 6 bits to specify how many extra bits or pins are to be affected by the instruction.

DRVRND #16 + 7<<6 'output random states to P[23:16]

How many times have you needed to output random states to pins, but had to go through all kinds of trouble coming up with a random number, and then masking it to so many OUT bits, while setting the same DIR bits? Now you can just do it in two clocks with one instruction.

This is really handy:
So, I cannot see in the following how you pack 4 bits on top of 6 bits which truncate to 9 bits immediate???
WRPIN dacmode,#0 + 15<<6 'make P[15:0] into DAC outputs
DRVL #0 + 15<<6 'enable them all

evanh · 2019-08-31 05:28

Yes. It's getting to the point where the IN/OUT/DIR addressable special registers are becoming defunct. It was something that stood out to me a while back that the extensive list of features in the prop2 accessible only by new instructions is swamping out what the prop1 had traditionally relied on.

It shows up a more distinct load/store architecture than the prop1 did.

cgracey · 2019-08-31 05:32

Cluso99 wrote: »

Chip,
Don't the lower 6 bits set the pin# 0-63, and the upper 3? bits set up to 8 consecutive pins???
Do you allow more than 8 pins if the value is not immediate? Immediate limits to 9 bits total, so 6 bits for pin# and 3 bits available for number of pins ???

cgracey wrote: »

One nice improvement to the instruction set is how the BITx/WxPIN/DIRx/OUTx/FLTx/DRVx all use the 5 3??? bits above the bottom 6 bits to specify how many extra bits or pins are to be affected by the instruction.

DRVRND #16 + 7<<6 'output random states to P[23:16]

How many times have you needed to output random states to pins, but had to go through all kinds of trouble coming up with a random number, and then masking it to so many OUT bits, while setting the same DIR bits? Now you can just do it in two clocks with one instruction.

This is really handy:
So, I cannot see in the following how you pack 4 bits on top of 6 bits which truncate to 9 bits immediate???
WRPIN dacmode,#0 + 15<<6 'make P[15:0] into DAC outputs
DRVL #0 + 15<<6 'enable them all

Ah, yes, sorry. In a nine bit immediate, you could only fit 3 bits above the bottom 6.
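The arithmetic behind this correction can be checked with a short Python sketch. It assumes only what is stated above: a 9-bit immediate, with the pin number in the bottom 6 bits. The function names are illustrative.

```python
IMMEDIATE_BITS = 9  # size of an immediate operand, per the discussion above
PIN_BITS = 6        # bits 5:0 hold the base pin


def fits_immediate(value: int) -> bool:
    """True if the value can be expressed in a 9-bit immediate."""
    return 0 <= value < (1 << IMMEDIATE_BITS)


def max_extra_pins_immediate() -> int:
    """Largest extra-pin count that still fits above the 6 pin bits."""
    return (1 << (IMMEDIATE_BITS - PIN_BITS)) - 1


drvrnd_operand = 16 + (7 << 6)   # 464: base pin 16, 7 extra pins
dac_operand = 0 + (15 << 6)      # 960: base pin 0, 15 extra pins

print(fits_immediate(drvrnd_operand))  # 464 is below 512
print(fits_immediate(dac_operand))     # 960 needs a tenth bit
print(max_extra_pins_immediate())      # 3 bits above the pin field
```

So the DRVRND example fits as written, while the 16-pin WRPIN/DRVL example needs the value supplied some other way than a plain 9-bit immediate, such as from a register.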

cgracey · 2019-08-31 05:34

evanh wrote: »

Yes. It's getting to the point where the IN/OUT/DIR addressable special registers are becoming defunct. It was something that stood out to me a while back that the extensive list of features in the prop2 accessible only by new instructions is swamping out what the prop1 had traditionally relied on.

It shows up a more distinct load/store architecture than the prop1 did.

I guess if we had a way to specify a base pin and adder for reading and writing, we really could get rid of DIRA/DIRB/OUTA/OUTB/INA/INB. There are still advantages to having them mapped, though, like being able to do a SETNIB/GETNIB/ROLNIB, for example.

msrobots · 2019-08-31 05:55

Yeah, I was thinking that too, but being able to write DIRA = 0 and DIRB = 0 (or any other stop condition), and the same for OUTA and OUTB, in an emergency is unbeatable.

4 instructions and you are safe.

Don't touch INA, INB, OUTA, OUTB until you have 64-bit long longs in SPIN.

64 pins need a 64-bit data type, and you could support that in SPIN. We have byte, word, long; why not add another one for 64 bits? Not sure how to name it. Quad?

Mike

evanh · 2019-08-31 06:10

2x 32-bit operations on a 64-bit integer would have been easier to make work without interrupts. The prop1 could.

msrobots · 2019-08-31 06:33

not really true.

As long as the interrupt does not need the 64-bit values currently being changed, everything is still OK. One could even throw a lock at it; we still have 16, not just 8.

then it is atomic

Mike

evanh · 2019-08-31 06:50

Integers are the language of ISRs. You can't use locks in an interrupt; the only option is to wrap each case in a disable/enable to temporarily block the ISR, which adds jitter.
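The tearing risk being debated here can be sketched in Python. This is a toy model, not P2 code: a 64-bit value is stored as two 32-bit halves, and a callback stands in for an ISR that fires between the two writes. The `Quad` class, `write_quad` and the `interrupts_disabled` flag are all hypothetical names for the illustration.

```python
MASK32 = 0xFFFFFFFF


class Quad:
    """A 64-bit value kept as two 32-bit halves, as on a 32-bit machine."""

    def __init__(self, value: int = 0):
        self.lo = value & MASK32
        self.hi = (value >> 32) & MASK32

    def read(self) -> int:
        return (self.hi << 32) | self.lo


def write_quad(q: Quad, value: int, isr=None, interrupts_disabled: bool = False) -> None:
    """Store value as two 32-bit writes; isr models an interrupt during the update."""
    q.lo = value & MASK32
    if isr is not None and not interrupts_disabled:
        isr()  # the ISR runs between the two halves and may see a torn value
    q.hi = (value >> 32) & MASK32
    if isr is not None and interrupts_disabled:
        isr()  # with interrupts blocked, the ISR is deferred until the update is done


old, new = 0xFFFFFFFF, 1 << 32   # incrementing old by one carries into the high half

seen = []
q = Quad(old)
write_quad(q, new, isr=lambda: seen.append(q.read()))
print(hex(seen[0]))   # neither old nor new: the ISR saw a half-updated value

seen_safe = []
q2 = Quad(old)
write_quad(q2, new, isr=lambda: seen_safe.append(q2.read()), interrupts_disabled=True)
print(hex(seen_safe[0]))
```

The second call models the disable/enable wrapper: the ISR only ever sees the complete new value, at the cost of being delayed, which is the jitter mentioned above.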

msrobots · 2019-08-31 07:57

yeah, that could work too.

But to say Spin could not have a 64-bit data type like byte, word, long, quad because it has interrupts is not really fair.

amerite?

Mike

evanh · 2019-08-31 12:04

One way is to outright ban them from being used in ISRs. But I suspect there is a lot more to it, otherwise the prop1 could have had them long ago.

evanh · 2019-08-31 12:33

Ah, hubRAM, of course. Funnily, I think the prop2 has an advantage over the prop1 when it comes to atomic updating of hubRAM stored structures. The SETQ+RD/WRLONG bursts in a way that makes each burst appear to be atomic because all cogs burst as sequentially incrementing only, and an individual location is time interleaved just like a single RD/WRLONG on the prop1.

If all hubRAM structure accesses are done as burst transfers then I think they will effectively be atomic structures ... of any length.

EDIT: Well, any length that can fit in cog/lutRAM in a single burst that is.

potatohead · 2019-08-31 22:54

Seems to me the more distinct load store is an artifact of hubexec.

In that context, cog ram really is more like registers.

evanh · 2019-08-31 23:36

cgracey wrote: »

There are still advantages to having them mapped, though, like being able to do a SETNIB/GETNIB/ROLNIB, for example.

I've looked at much of my code that uses RDPIN or GETQX/GETQY and thought it would be faster if those were memory mapped instead. But of course there are trade-offs: in the case of RDPIN there would need to be a map of 64 locations. In the case of GETQX/Y the CORDIC result completion mechanism is built into the instruction, and that might be difficult to make happen as memory mapped.

Oh, speaking of something I felt could improve execution speed: With RDPIN, inline checks for a new result would be faster if the IN state is also optionally placed in C or Z as specified by instruction. EDIT: Or maybe have a new instruction that can wait on a new result, like how there is WAITxxx and POLLxxx.
