The addressing conundrum

Cluso99 · 2015-10-01 04:36

BTW, if we map cog/LUT to lower hub for instruction addresses, it does not mean that we cannot compile code to sit here and be loaded into the cog/LUT to run. So it is still useful for both data and code, just like the P1. It is just that we cannot run hubexec here.
In fact, we could run LMM in there, not that you would want to.

Roy Eltham · 2015-10-01 04:45

jmg,
Not sure what you are on about, did you even read all of what I wrote?

Of course, we need cog space code for the determinism and time critical stuff. However, do we need the exact same binary piece of code to both run in hub space and cog space? That's what I was talking about. I don't think so.

jmg · 2015-10-01 04:54

Cluso99 wrote: »

Anyway, with the same model using long addresses, @rel will work identically in both cogexec and hubexec. So it is possible (there will be some restrictions such as no rep instruction) to make the same routines able to be hub and/or cog resident.

Being able to have code run anywhere is a significant plus, as the HLL libraries can be optimised once.
It also makes cut/paste safer, with no lurking fish-hooks.

Cluso99 wrote: »

... skip the lower 8KB hub and map the cog/lut here. We now have 504KB of other hubexec space which gives us 126K instruction space.

Post 926 above has a memory map scheme that avoids any overlap-losses and gives some expansion room for future FPGA variants (P2V?)

potatohead · 2015-10-01 05:00

Except that it will be running 16 concurrent cores out of shared RAM.

Cluso99 · 2015-10-01 05:08

jmg wrote: »

Post 926 above has a memory map scheme that avoids any overlap-losses and gives some expansion room for future FPGA variants (P2V?)

Where do you get the post# from? I was looking for it before and I just looked again

I presume you mean the one with hub starting at $80000.

I don't like this at all. There are lots of reasons where we will want to use hub lower addresses without having to set the top bit as well.
We don't need to be able to run hubexec in every bit of hub. 8KB (2K instruction) loss is no big deal. Use that space for variables, tables, or just code to load into cog and/or lut. No biggie here at all.
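The figures quoted in this exchange (504KB, 126K instructions, 2K instructions lost) can be checked with a few lines of arithmetic. This is only a sketch: the 512KB hub total is not stated outright here, it is inferred from 504KB + 8KB, and one instruction is taken to be one 4-byte long.

```python
KB = 1024

HUB_BYTES = 512 * KB         # ASSUMPTION: total hub RAM, inferred from 504KB + 8KB
RESERVED_BYTES = 8 * KB      # low hub region shadowed by the cog/LUT instruction addresses
BYTES_PER_INSTRUCTION = 4    # one long per instruction

# Hub space that remains reachable by hubexec.
hubexec_bytes = HUB_BYTES - RESERVED_BYTES
hubexec_instructions = hubexec_bytes // BYTES_PER_INSTRUCTION

# Instruction slots given up in the low 8KB.
lost_instructions = RESERVED_BYTES // BYTES_PER_INSTRUCTION

print(hubexec_bytes // KB, "KB of hubexec space")          # 504
print(hubexec_instructions // 1024, "K instructions")      # 126
print(lost_instructions // 1024, "K instructions lost")    # 2
```

The low 8KB is still ordinary hub RAM in this scheme, so the "loss" applies only to hubexec, not to data or to code that gets copied into cog/LUT.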

jmg · 2015-10-01 05:20

Cluso99 wrote: »

jmg wrote: »

Post 926 above has a memory map scheme that avoids any overlap-losses and gives some expansion room for future FPGA variants (P2V?)

Where do you get the post# from? I was looking for it before and I just looked again

I presume you mean the one with hub starting at $80000.

Yes, this one
http://forums.parallax.com/discussion/comment/1346776/#Comment_1346776

Cluso99 wrote: »

I don't like this at all. There are lots of reasons where we will want to use hub lower addresses without having to set the top bit as well.

Once you have jumped into the space, relative jumps will leave the top bit as-is.

jmg · 2015-10-01 05:23

Roy Eltham wrote: »

Of course, we need cog space code for the determinism and time critical stuff. However, do we need the exact same binary piece of code to both run in hub space and cog space? That's what I was talking about. I don't think so.

If you choose to have Binary-incompatible code, you then have to run TWO copies of the libraries, and all the admin & gotchas that entails.

Cluso99 · 2015-10-01 05:24

While we are on the topic of instruction addresses...

                                       R=0                 R=1
CCCC 100111R 00I DDDDDDDDD SSSSSSSSS   DJZ     D,S/@       TJZ     D,S/@ 
CCCC 100111R 01I DDDDDDDDD SSSSSSSSS   DJNZ    D,S/@       TJNZ    D,S/@ 
CCCC 100111R 10I DDDDDDDDD SSSSSSSSS   DJS     D,S/@       TJS     D,S/@ 
CCCC 100111R 11I DDDDDDDDD SSSSSSSSS   DJNS    D,S/@       TJNS    D,S/@ 
                                       Q=0                 Q=1
CCCC 1100000 QLI DDDDDDDDD SSSSSSSSS   JP      D/#,S/@     JNP     D/#,S/@  'j pinD [not]positive?

In these instructions, the S operand variant uses the contents of cog register "S" to define the goto address.
Currently the lower 9-bits of the contents of "S" are used to define the goto address ($000-1FF) in COG.

Since the P1 & P2 are different enough...
Why couldn't we use the lower 11-bits of the contents of "S" to define the goto address ($000-7FF) in COG/LUT ?

CCCC 1010101 CZI DDDDDDDDD SSSSSSSSS   CALLD   D,S/@ {WC,WZ}  'save return address in Register DDDDDDDDD

In this case...
Why couldn't we use the lower 18-bits of the contents of "S" to define the goto address ($00000-3FFFF) in COG/LUT/HUB ?
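The three widths being discussed (9, 11 and 18 bits of the register contents) amount to different masks applied to the same 32-bit value. A minimal sketch follows; the helper name `goto_address` and the sample register value are made up for illustration and are not part of any Parallax tool.

```python
def goto_address(s_value: int, bits: int) -> int:
    """Keep only the low `bits` bits of a 32-bit register value."""
    return s_value & ((1 << bits) - 1)

s = 0xDEADB6A5  # arbitrary example contents of register S

print(hex(goto_address(s, 9)))    # 0xa5     -> within $000-1FF (COG only, current behaviour)
print(hex(goto_address(s, 11)))   # 0x6a5    -> within $000-7FF (COG/LUT, as proposed)
print(hex(goto_address(s, 18)))   # 0x1b6a5  -> within $00000-3FFFF (COG/LUT/HUB, as proposed)
```

The same register value lands in a different place depending on how many bits the instruction honours, which is why the width matters for any code that leaves junk in the upper bits of S.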

Cluso99 · 2015-10-01 05:30

jmg wrote: »

Cluso99 wrote: »

jmg wrote: »

Post 926 above has a memory map scheme that avoids any overlap-losses and gives some expansion room for future FPGA variants (P2V?)

Where do you get the post# from? I was looking for it before and I just looked again

I presume you mean the one with hub starting at $80000.

Yes, this one
http://forums.parallax.com/discussion/comment/1346776/#Comment_1346776

I mean where onscreen does it say post 926 (specifically the 926)???

Cluso99 wrote: »

I don't like this at all. There are lots of reasons where we will want to use hub lower addresses without having to set the top bit as well.

Once you have jumped into the space, relative jumps will leave the top bit as-is.

I am thinking of its use as tables, etc. Don't want to set the top bit too.
IMHO it's a kludge, because when we get 1MB of hub, we then use the lower half without the top bit set. Then we have exactly the same problems. May as well deal with them now.

jmg · 2015-10-01 05:31

Cluso99 wrote: »

Why couldn't we use the lower 11-bits of the contents of "S" to define the goto address ($000-7FF) in COG/LUT ?

Makes sense. I think Chip said COG can simply roll into LUT?
Any downsides? Does this mean larger code to load S?

evanh · 2015-10-01 05:32

HubExec was never going to be 100% compatible with CogExec. The differences between the various Prop1 LMM memory models are a good example of that.

evanh · 2015-10-01 06:06

Cluso99 wrote: »
While we are on the topic of instruction addresses...
                                       R=0                 R=1
CCCC 100111R 00I DDDDDDDDD SSSSSSSSS   DJZ     D,S/@       TJZ     D,S/@ 
CCCC 100111R 01I DDDDDDDDD SSSSSSSSS   DJNZ    D,S/@       TJNZ    D,S/@ 
CCCC 100111R 10I DDDDDDDDD SSSSSSSSS   DJS     D,S/@       TJS     D,S/@ 
CCCC 100111R 11I DDDDDDDDD SSSSSSSSS   DJNS    D,S/@       TJNS    D,S/@ 
                                       Q=0                 Q=1
CCCC 1100000 QLI DDDDDDDDD SSSSSSSSS   JP      D/#,S/@     JNP     D/#,S/@  'j pinD [not]positive?
In these instructions, the S operand variant uses the contents of cog register "S" to define the goto address.
Currently the lower 9-bits of the contents of "S" are used to define the goto address ($000-1FF) in COG.

Since the P1 & P2 are different enough...
Why couldn't we use the lower 11-bits of the contents of "S" to define the goto address ($000-7FF) in COG/LUT ?
CCCC 1010101 CZI DDDDDDDDD SSSSSSSSS   CALLD   D,S/@ {WC,WZ}  'save return address in Register DDDDDDDDD  
In this case...
Why couldn't we use the lower 18-bits of the contents of "S" to define the goto address ($00000-3FFFF) in COG/LUT/HUB ?

Is that not the way it is already? Chip just mentioned that HubExec FIFO filling starts and stops on branching instructions. Those branches, including DJxx and CALLx, will have full address range capability to trigger the mode-switch.

Cluso99 · 2015-10-01 06:15

evanh wrote: »
Cluso99 wrote: »
While we are on the topic of instruction addresses...
                                       R=0                 R=1
CCCC 100111R 00I DDDDDDDDD SSSSSSSSS   DJZ     D,S/@       TJZ     D,S/@ 
CCCC 100111R 01I DDDDDDDDD SSSSSSSSS   DJNZ    D,S/@       TJNZ    D,S/@ 
CCCC 100111R 10I DDDDDDDDD SSSSSSSSS   DJS     D,S/@       TJS     D,S/@ 
CCCC 100111R 11I DDDDDDDDD SSSSSSSSS   DJNS    D,S/@       TJNS    D,S/@ 
                                       Q=0                 Q=1
CCCC 1100000 QLI DDDDDDDDD SSSSSSSSS   JP      D/#,S/@     JNP     D/#,S/@  'j pinD [not]positive?
In these instructions, the S operand variant uses the contents of cog register "S" to define the goto address.
Currently the lower 9-bits of the contents of "S" are used to define the goto address ($000-1FF) in COG.

Since the P1 & P2 are different enough...
Why couldn't we use the lower 11-bits of the contents of "S" to define the goto address ($000-7FF) in COG/LUT ?
CCCC 1010101 CZI DDDDDDDDD SSSSSSSSS   CALLD   D,S/@ {WC,WZ}  'save return address in Register DDDDDDDDD  
In this case...
Why couldn't we use the lower 18-bits of the contents of "S" to define the goto address ($00000-3FFFF) in COG/LUT/HUB ?
Is that not the way it is already? Chip just mentioned that HubExec FIFO filling starts and stops on branching instructions. Those branches, including DJxx and CALLx, will have full address range capability to trigger the mode-switch.

It's certainly currently not the way I understand it. I was just presuming the @rel was the way around the DJxx/TJxx/JxP instructions when in hubexec. I thought a JMP/CALLx/RETx was required to switch between hubexec and cog/lutexec.

evanh · 2015-10-01 06:41

Ah, just re-reading a bit more of what Chip has said. It looks like only absolute branching can cause a mode-switch. Relative branching doesn't trigger a mode-switch.

cgracey · 2015-10-01 06:50

The D,S/@ branches treat the @ case as such:

The 9-bit S field is used literally and sign-extended to make a 20-bit offset. That offset is added to the PC to get the branch address. In the case of four byte addresses per instruction, that offset can be shifted up two bits to increase the range of those D,@ branch instructions.

In the case of S being a register, the lower 20 bits of S are used as the relative address.
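The address calculation described above can be sketched as follows. Several details are assumptions rather than anything confirmed in this post: the result is wrapped to 20 bits, `pc` stands for whatever base value the hardware actually adds to (current or next instruction is not stated), and the function names are invented for illustration.

```python
ADDR_BITS = 20
ADDR_MASK = (1 << ADDR_BITS) - 1

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def branch_target(pc: int, s: int, s_is_immediate: bool, shift: int = 0) -> int:
    """Relative branch address for the D,S/@ instructions.

    s_is_immediate=True : `s` is the literal 9-bit S field, sign-extended,
                          optionally shifted up (shift=2 for four byte
                          addresses per instruction).
    s_is_immediate=False: `s` is the register contents; its lower 20 bits
                          are used as the relative address.
    """
    if s_is_immediate:
        offset = sign_extend(s, 9) << shift
    else:
        offset = s & ADDR_MASK
    # ASSUMPTION: the sum wraps within the 20-bit address space.
    return (pc + offset) & ADDR_MASK

print(hex(branch_target(0x100, 0x1FF, True)))            # 0xff  (offset -1)
print(hex(branch_target(0x1000, 0x1FF, True, shift=2)))  # 0xffc (offset -4)
print(hex(branch_target(0x400, 0x0FF, True)))            # 0x4ff (offset +255)
```

With the two-bit shift, the 9-bit field still covers -256..+255 instructions, but expressed in byte addresses.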

Roy Eltham · 2015-10-01 06:50

jmg wrote: »

Roy Eltham wrote: »

Of course, we need cog space code for the determinism and time critical stuff. However, do we need the exact same binary piece of code to both run in hub space and cog space? That's what I was talking about. I don't think so.

If you choose to have Binary-incompatible code, you then have to run TWO copies of the libraries, and all the admin & gotchas that entails.

Only if you need to have the same bits of code in your libraries run in both cog and hub spaces. This was always going to have issues, so it's just more issues now. You could have fixup tables in your library and fixup the code as you copy it into cog space if you really need it.

It's just going to be incredibly rare to need binary compatibility between hub and cog spaces for code, and why make other common use cases worse to make a rare use case easier?

Honestly, I can't think of any real world code that I would want this on. I'm always going to want the code to be cog space or hub space, not both.

Also, again, since the code HAS to live in hub space in order to be copied to cog space, why not just call it in hub space instead of moving it, unless it truly requires running in cog space.
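For what it is worth, the fixup-table idea mentioned above might look roughly like the sketch below. Everything about the layout is hypothetical: the table is just a list of indices of longs whose 9-bit S field holds an address needing adjustment, and `delta` is the difference between where the code was assembled and where it ends up. No existing toolchain format is implied.

```python
S_MASK = 0x1FF          # low 9 bits: the S field in the encodings quoted earlier in the thread
LONG_MASK = 0xFFFFFFFF

def copy_with_fixups(hub_image, fixups, delta):
    """Return a copy of `hub_image` (a list of 32-bit longs) for loading into cog.

    fixups: HYPOTHETICAL table of indices whose S field must be rebased.
    delta : amount to add to each listed S field (wraps within 9 bits).
    """
    cog_image = list(hub_image)          # leave the hub copy untouched
    for index in fixups:
        word = cog_image[index]
        s_field = ((word & S_MASK) + delta) & S_MASK
        cog_image[index] = (word & ~S_MASK & LONG_MASK) | s_field
    return cog_image

hub_image = [0x12345610, 0xAAAAA005]
cog_image = copy_with_fixups(hub_image, fixups=[1], delta=0x20)
print([hex(w) for w in cog_image])   # ['0x12345610', '0xaaaaa025']
```

The cost is an extra table per routine and a patch pass at load time, which is the trade being argued over here: pay that only for the routines that really need to run in both places.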

cgracey · 2015-10-01 06:53

I realized last night that I was wrong when I said that execution can continue from LUT into hub. You need to branch into hub space to start hub exec properly.

cgracey · 2015-10-01 06:56

I don't think it's going to be realistic to routinely load hub exec code into cog/LUT and execute it, even if binaries are compatible. If we had a real caching mechanism, that would be different.

jmg · 2015-10-01 06:59

Roy Eltham wrote: »

Also, again, since the code HAS to live in hub space in order to be copied to cog space, why not just call it in hub space instead of moving it, unless it truly requires running in cog space.

By that logic, why use COG mode at all ?

The point remains, code is more deterministic in COG/LUT and yes, that matters.

Reading David's post above, the lack of binary-compatible code is going to impact HLL development.

evanh · 2015-10-01 07:00

cgracey wrote: »

In the case of S being a register, the lower 20 bits of S are used as the relative address.

Perfect!

Being a relative branch, that isn't able to change between HubExec and CogExec though is it? Or is that restriction limited to just the 9-bit immediate operands?

cgracey · 2015-10-01 07:06

evanh wrote: »

cgracey wrote: »

In the case of S being a register, the lower 20 bits of S are used as the relative address.

Perfect!

Being a relative branch, that isn't able to change between HubExec and CogExec though is it? Or is that restriction limited to just the 9-bit immediate operands?

Any branch that expresses an address that is in hub range, no matter via immediate, relative, only 9-bit, or whatever, enters hub exec.

The cog hardware initiates hub exec any time a branch to hub occurs, either from cog/LUT or hub. In the case of a hub branch to hub, hub exec must be reinitiated because a new instruction stream is needed.

When a branch from hub to cog/LUT occurs, nothing special is done, just an instruction read from cog or LUT.
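Put as a decision rule, only the branch target matters, not where the branch came from or how the address was formed. The sketch below models that. The boundary `HUB_START` is a placeholder: the thread has not settled where cog/LUT ends and hub begins, so the value is an assumption chosen only to make the example run.

```python
HUB_START = 0x800   # ASSUMPTION: first address treated as hub; the real map is still under discussion

def in_hub(address: int) -> bool:
    return address >= HUB_START

def fetch_action(source: int, target: int) -> str:
    """Describe what the cog does for a branch from `source` to `target`.

    `source` is accepted only to show that it does not change the outcome.
    """
    if in_hub(target):
        # cog/LUT -> hub starts hub exec; hub -> hub restarts it,
        # because a new instruction stream is needed either way.
        return "initiate hub exec"
    # hub -> cog/LUT or cog/LUT -> cog/LUT: nothing special.
    return "plain cog/LUT instruction read"

print(fetch_action(0x010, 0x2000))    # initiate hub exec
print(fetch_action(0x2000, 0x3000))   # initiate hub exec
print(fetch_action(0x2000, 0x010))    # plain cog/LUT instruction read
```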

evanh · 2015-10-01 07:11

That seems pretty clean to me.

jmg · 2015-10-01 07:15

cgracey wrote: »

I don't think it's going to be realistic to routinely load hub exec code into cog/LUT and execute it, even if binaries are compatible. If we had a real caching mechanism, that would be different.

So that means two HLL libraries are needed: one (Fast_Low_Jitter) for small functions that best reside in COG/LUT, and another variant library for HUB.
Can they share source, for at least some saving?

That requires code generation switches, and some means of tracking to ensure the code lands where it is binary compatible.
... or, you just tell users they are on their own for COG code, and have to manage their own libraries, or run in ASM only.
I could also see that some overlay scheme may be needed, and ASM then needs to allow COG_SEG to be not hard-limited, so users can build code that can be pulled into COGs on demand.

Cluso99 · 2015-10-01 07:25

cgracey wrote: »

evanh wrote: »

cgracey wrote: »

In the case of S being a register, the lower 20 bits of S are used as the relative address.

Perfect!

Being a relative branch, that isn't able to change between HubExec and CogExec though is it? Or is that restriction limited to just the 9-bit immediate operands?

Any branch that expresses an address that is in hub range, no matter via immediate, relative, only 9-bit, or whatever, enters hub exec.

The cog hardware initiates hub exec any time a branch to hub occurs, either from cog/LUT or hub. In the case of a hub branch to hub, hub exec must be reinitiated because a new instruction stream is needed.

When a branch from hub to cog/LUT occurs, nothing special is done, just an instruction read from cog or LUT.

Great news. Even better than I thought.
Now to solve the addressing issue... Any more thoughts Chip?

evanh · 2015-10-01 07:36

Oi! JMG! You can keep the dirty segmentation talk in the General forum!

Seriously though, Cog/Hub separation is not any sort of example of segmentation. Segmentation is just a way of using registers that are smaller than the general linear address range. It's relative addressing but nerf'd.

The Propeller literally has separate uses of RAM with different bussing methods. A fancy caching scheme might be able to mesh them in a logical sense but that still wouldn't resolve problems in determinism.

As for an overlay'ish arrangement, I think something like that is already implemented in LMM based code, where the compiler will optimise some code into CogExec automatically. There are compile flags for this too, I think.

cgracey · 2015-10-01 07:37

Roy Eltham wrote: »

jmg wrote: »

Roy Eltham wrote: »

Of course, we need cog space code for the determinism and time critical stuff. However, do we need the exact same binary piece of code to both run in hub space and cog space? That's what I was talking about. I don't think so.

If you choose to have Binary-incompatible code, you then have to run TWO copies of the libraries, and all the admin & gotchas that entails.

Only if you need to have the same bits of code in your libraries run in both cog and hub spaces. This was always going to have issues, so it's just more issues now. You could have fixup tables in your library and fixup the code as you copy it into cog space if you really need it.

It's just going to be incredibly rare to need binary compatibility between hub and cog spaces for code, and why make other common use cases worse to make a rare use case easier?

Honestly, I can't think of any real world code that I would want this on. I'm always going to want the code to be cog space or hub space, not both.

Also, again, since the code HAS to live in hub space in order to be copied to cog space, why not just call it in hub space instead of moving it, unless it truly requires running in cog space.

This is my philosophy, too. Hub and cog/LUT code are going to be two very different animals with different purposes and different structure.

Hub code is going to routinely reference hub data relatively. That same binary code won't work in cog/LUT, no matter what. Cog code will often use the RDFAST/WRFAST hardware for special purposes, which hub exec needs exclusive control over.

By executing hub code in cog/LUT, you could only get an execution speed improvement, but never a functional improvement. Real cog/LUT code will have the express purpose of doing things that just cannot be done in hub exec.

jmg · 2015-10-01 07:50

evanh wrote: »

Oi! JMG! You can keep the dirty segmentation talk in the General forum!

Seriously though, Cog/Hub separation is not any sort of example of segmentation. Segmentation is just a way of using registers that are smaller than the general linear address range.

??
You seem to be confusing x86 terminology with the Linker/memory segments I was referring to.

Roy Eltham · 2015-10-01 07:51

jmg,

org = cog/lut
orgh = hub

done.

cgracey · 2015-10-01 07:57

Cluso99 wrote: »

cgracey wrote: »

evanh wrote: »

cgracey wrote: »

In the case of S being a register, the lower 20 bits of S are used as the relative address.

Perfect!

Being a relative branch, that isn't able to change between HubExec and CogExec though is it? Or is that restriction limited to just the 9-bit immediate operands?

Any branch that expresses an address that is in hub range, no matter via immediate, relative, only 9-bit, or whatever, enters hub exec.

The cog hardware initiates hub exec any time a branch to hub occurs, either from cog/LUT or hub. In the case of a hub branch to hub, hub exec must be reinitiated because a new instruction stream is needed.

When a branch from hub to cog/LUT occurs, nothing special is done, just an instruction read from cog or LUT.

Great news. Even better than I thought.
Now to solve the addressing issue... Any more thoughts Chip?

I understand the notion of all longs being long-aligned, but like Roy said, it just simplifies a rare use case (binary compatibility) and complicates the Sunday driver case of hub memory having no alignment caveats.

jmg · 2015-10-01 08:02

cgracey wrote: »

This is my philosophy, too. Hub and cog/LUT code are going to be two very different animals with different purposes and different structure.

Which of those "two very different animals" will GCC support?
