Is there a better description of the p2 instructions than the Rev B doc and spreadsheet?

cgracey · 2020-04-04 06:31

Roy Eltham wrote: »

Chip,
If you made a new operator (@@@ for example), then when you compile it you do it similar to how @ would be compiled except you save the offset to the compiled bytecode in a fixup table and you force the bytecode compiled size to be always the same for this operator. This fixup table would need to be held in the blobs similar to the pub name and index values. Then when including child blobs during compiling you use their fixup tables to adjust their @@@ offsets by their offset with respect to the current compiling object, then you adjust the fixup table to be relative to the current object and add it to its fixup table. At the end you have one big fixup table that you can then use to do the final fixup of all @@@ offsets to be their absolute hub address. I think it might work out. If you even understand what I am trying to explain. It might be tricky to keep the fixup tables correct when distilling, but maybe not?

I tried a bunch to get @@@ working in OpenSpin way back whenever, and I failed to think of this idea above back then. If you don't poke it full of holes with things I missed, then I might get the gumption to implement it in OpenSpin.

Roy, I understand pretty well what you are saying. I can visualize that. One thing we would not be able to do, practically, would be to incorporate math operations around absolute addresses. It would have to be absolute address, only, to be simply patched.
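To make the fixup-table idea above concrete, here is a minimal sketch in Python. It is not how PNut, OpenSpin, or FastSpin actually work; the `Blob` layout, the 4-byte slot size, and every function name are assumptions made up for illustration. Each @@@ reference is emitted as a fixed-size slot holding an object-relative offset, child tables are rebased when a child blob is included, and a single final pass adds the hub load address.

```python
from dataclasses import dataclass, field

SLOT = 4  # assumed: every @@@ operand has the same fixed size, so patching never moves code


@dataclass
class Blob:
    """Hypothetical compiled object: bytes plus the offsets of its @@@ slots."""
    code: bytearray
    fixups: list = field(default_factory=list)


def emit_abs_ref(blob, target_offset):
    """Append a slot holding an object-relative offset and record where it lives."""
    blob.fixups.append(len(blob.code))
    blob.code += target_offset.to_bytes(SLOT, "little")


def include_child(parent, child):
    """Append a child blob, rebasing its slot values and its fixup entries."""
    base = len(parent.code)
    parent.code += child.code
    for off in child.fixups:
        pos = base + off
        rel = int.from_bytes(parent.code[pos:pos + SLOT], "little")
        parent.code[pos:pos + SLOT] = (rel + base).to_bytes(SLOT, "little")
        parent.fixups.append(pos)  # entry is now relative to the parent
    return base


def final_fixup(top, hub_base):
    """One pass over the merged table turns relative offsets into hub addresses."""
    for pos in top.fixups:
        rel = int.from_bytes(top.code[pos:pos + SLOT], "little")
        top.code[pos:pos + SLOT] = (rel + hub_base).to_bytes(SLOT, "little")
```

Because each slot holds only an address, the final pass is a plain add, which matches Chip's point that arbitrary math around the address could not be patched this simply.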

Cluso99 · 2020-04-04 06:55

evanh wrote: »

I fully understand. Humans have been taught to read most-significant-first but that's not the way little-endian works. So we concoct a strange mix to make it more readable ... and the result is confusion on the address boundaries because the display order is all jumbled.

Not every micro, or mainframe for that matter, is little endian. The Motorola 6800 and 68000 are big endian as is IIRC 6502. The 8080, and hence Z80, and the later derivatives are little endian. There are pros and cons for both. I prefer big endian, but that could be because I cut my teeth on big endian mainframes and minis.

evanh · 2020-04-04 07:19

Ya, the Motorola architecture had it right. The Internet and phone systems around the world were born of those times and still retain that heritage.

Cluso99 · 2020-04-04 09:23

The IBM RISC processor is big endian. The ARM micro has two big endian modes 8 and 32 - whatever that means.

And guess what, I think Intel is realising an error because there are big endian instructions to fetch from memory in the later x86-64. Who would have guessed this hey!

Wuerfel_21 · 2020-04-04 09:58

The only advantage of a big endian computer is that you can read multi-byte values easier when looking at a memory dump. Which is only because we're used to reading big endian numbers. Which is only because when people first started to borrow Arabic numerals for left-to-right languages, they failed to flip them around. (In right-to-left languages, they indeed are little endian)

Cluso99 wrote: »

And guess what, I think Intel is realising an error because there are big endian instructions to fetch from memory in the later x86-64. Who would have guessed this hey!

Conversely, POWER has endianness-reversing load/store ops (later POWER CPUs can be switched into either endianness).
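For anyone following the memory-dump argument, a small Python sketch shows the same 32-bit value laid out both ways, plus a byte swap of the kind an endianness-reversing load would perform. The helper names `dump` and `byteswap32` are made up for this example.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)  # least significant byte at the lowest address
big = struct.pack(">I", value)     # most significant byte at the lowest address


def dump(data):
    """Format bytes the way a hex dump shows them, lowest address first."""
    return " ".join(f"{b:02X}" for b in data)


def byteswap32(x):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


print(dump(little))
print(dump(big))
print(hex(byteswap32(value)))
```

The big-endian dump reads like the written number, which is the readability advantage being discussed; the stored value is the same either way.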

evanh · 2020-04-04 10:12

Wuerfel_21 wrote: »

... first started to borrow Arabic numerals for left-to-right languages, they failed to flip them around. (In right-to-left languages, they indeed are little endian)

I asked about that and I've been told the case there is, irrespective of the direction on the page, they read the numbers most significant first the same as the west. So we not only borrowed the number system but also the order sense. And Arabic borrowed it from the even older Indian numbers. Roman numerals were most significant first also, although arguably that doesn't count.

David Betz · 2020-04-04 10:34

cgracey wrote: »

Roy Eltham wrote: »

Chip,
If you made a new operator (@@@ for example), then when you compile it you do it similar to how @ would be compiled except you save the offset to the compiled bytecode in a fixup table and you force the bytecode compiled size to be always the same for this operator. This fixup table would need to be held in the blobs similar to the pub name and index values. Then when including child blobs during compiling you use their fixup tables to adjust their @@@ offsets by their offset with respect to the current compiling object, then you adjust the fixup table to be relative to the current object and add it to its fixup table. At the end you have one big fixup table that you can then use to do the final fixup of all @@@ offsets to be their absolute hub address. I think it might work out. If you even understand what I am trying to explain. It might be tricky to keep the fixup tables correct when distilling, but maybe not?

I tried a bunch to get @@@ working in OpenSpin way back whenever, and I failed to think of this idea above back then. If you don't poke it full of holes with things I missed, then I might get the gumption to implement it in OpenSpin.

Roy, I understand pretty well what you are saying. I can visualize that. One thing we would not be able to do, practically, would be to incorporate math operations around absolute addresses. It would have to be absolute address, only, to be simply patched.

As Eric mentioned a while back, FastSpin supports @@@ for P2.

@ersmith: Can you use @@@ in an expression in FastSpin?

ersmith · 2020-04-04 10:48

David Betz wrote: »

cgracey wrote: »

Roy Eltham wrote: »

Chip,
If you made a new operator (@@@ for example), then when you compile it you do it similar to how @ would be compiled except you save the offset to the compiled bytecode in a fixup table and you force the bytecode compiled size to be always the same for this operator. This fixup table would need to be held in the blobs similar to the pub name and index values. Then when including child blobs during compiling you use their fixup tables to adjust their @@@ offsets by their offset with respect to the current compiling object, then you adjust the fixup table to be relative to the current object and add it to its fixup table. At the end you have one big fixup table that you can then use to do the final fixup of all @@@ offsets to be their absolute hub address. I think it might work out. If you even understand what I am trying to explain. It might be tricky to keep the fixup tables correct when distilling, but maybe not?

I tried a bunch to get @@@ working in OpenSpin way back whenever, and I failed to think of this idea above back then. If you don't poke it full of holes with things I missed, then I might get the gumption to implement it in OpenSpin.

Roy, I understand pretty well what you are saying. I can visualize that. One thing we would not be able to do, practically, would be to incorporate math operations around absolute addresses. It would have to be absolute address, only, to be simply patched.

As Eric mentioned a while back, FastSpin supports @@@ for P2.

@ersmith: Can you use @@@ in an expression in FastSpin?

Yes, you can use @@@ in some simple expressions (you can add or subtract constants to an @@@label. and you can take the difference of @@@ labels).
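One way to see why these particular expressions are the easy ones is sketched below in Python. This is only an illustration, not FastSpin's implementation; the dictionary fields and function names are assumptions. A label plus or minus a constant is still "unknown base plus known number", and in the difference of two labels the unknown base cancels.

```python
def abs_plus_const(label_offset, addend):
    """@@@label + const: still one relocatable value; the constant folds into it."""
    return {"value": label_offset + addend, "needs_fixup": True}


def label_difference(a_offset, b_offset):
    """@@@a - @@@b: the unknown load address cancels, leaving a plain constant."""
    return {"value": a_offset - b_offset, "needs_fixup": False}


def resolve(expr, hub_base):
    """Final pass: add the load address only where a fixup is still pending."""
    if expr["needs_fixup"]:
        return expr["value"] + hub_base
    return expr["value"]
```

Something like multiplying an absolute address by two would need the patcher to re-evaluate the expression instead of adding a base, which is the limitation Chip describes above.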

evanh · 2020-04-04 10:50

Cluso99 wrote: »

The IBM RISC processor is big endian. The ARM micro has two big endian modes 8 and 32 - whatever that means.

POWER was touted as being able to operate in either right from the start, or at least from when the first single chip PPC came to market. I've never looked into the details but I do know IBM had a wacky numbering of the bits which suited little-endian really well.

Here's a snippet from an ARMv8 manual:

"In the AArch64 execution state, data accesses can be LE or BE, while instruction fetches are always LE."

So that's only halfway capable. Admittedly it'll clean up data structures and files, which are the most important parts.

I wouldn't be surprised to find PPC is actually the same.

EDIT: Note: Pointers/addresses probably don't count as data in that context.

ersmith · 2020-04-04 10:55

I've got to say that, as a tool author, it's extremely frustrating that everyone asks for features for Spin2 but ignores the tools that do provide those features.

We shouldn't expect Spin2 to do everything: Chip has enough on his plate, and in any event no single tool can be expected to provide every feature. There are other tools for the P2. There's Taqoz (which is severely underrated, I think). There are at least 3 C compilers, there's micropython, and there's fastspin with its Spin, C, and BASIC dialects. Should we throw away all those other tools now that Spin2 is here, and expect Chip to support all of their features? That's the impression I'm getting from a lot of users. If nobody uses those other tools then what's the point of having them?

evanh · 2020-04-04 11:09

I'm happily still using Fastspin. Although I'm only writing pasm code.

With the greater amount of RAM on the prop2 I probably should try out C. But so far it's all just testing the hardware in small routines so not really any need to go the next step.

David Betz · 2020-04-04 11:28

ersmith wrote: »

I've got to say that, as a tool author, it's extremely frustrating that everyone asks for features for Spin2 but ignores the tools that do provide those features.

We shouldn't expect Spin2 to do everything: Chip has enough on his plate, and in any event no single tool can be expected to provide every feature. There are other tools for the P2. There's Taqoz (which is severely underrated, I think). There are at least 3 C compilers, there's micropython, and there's fastspin with its Spin, C, and BASIC dialects. Should we throw away all those other tools now that Spin2 is here, and expect Chip to support all of their features? That's the impression I'm getting from a lot of users. If nobody uses those other tools then what's the point of having them?

I will certainly be using FastSpin when I get back to P2 programming. With the larger hub memory I would prefer to use a compiler that produces native code. I might switch to Spin2 if I start running out of code space but I don't see that happening for most if not all of what I am likely to do.

JRoark · 2020-04-04 11:44

@ersmith I’m using FastSpin and FastBASIC exclusively. These are such good, low-drama tools these days that the silence you think you are hearing is actually the happy ticking of many keyboards using them!

Cluso99 · 2020-04-04 11:50

Eric,
I love the tools you’ve done. It’s a tremendous task.

However, I want to be able to use my existing code and that is spin1 and pasm1. So I need to convert to spin2 and pasm2. Spin provides small bytecode. I really don’t want my spin code converted to pasm2 which I believe you can do.

So currently my only option is to use PNut and later PropTool, at least until you have spin2 working (if you ever do). I am disappointed Chip and Parallax haven’t supported you better.

My other option was to use my spin1 interpreter on the P2. I don’t have the time currently to complete this - I believe it only needs debugging. That way I could compile spin1 using P1 tools and just include the binaries. A year or more ago I did prove the basics for this method do work.

TonyB_ · 2020-04-04 11:58

wmosscrop wrote: »

evanh wrote: »

Maybe try moving the whole thing into hubexec. Assuming timing isn't that critical, this will need some additional work to split off the working variables because they still generally will want to be cog registers.

PS: lutram is excellent for indexed 32-bit tables and buffers.

Yes, but that then precludes the use of XBYTE (which needs the FIFO used by hubexec), which I definitely want to use. It has potential for saving quite a bit of code in this particular cog.

Apart from the EXECF table(s) for XBYTE, I've found that it's best to use LUT RAM for code. Your IBM 1130 emulator only needs 64 longs for the XBYTE LUT and the remaining 448 longs would probably be big enough to hold debugging routines, instead of in hub RAM.

The IBM 1130 is less complicated than other processors that can fit entirely in one cog with no hub RAM required and XBYTE should give you large code savings. I'm getting code compression of 3+, i.e. code size would be more than 3x larger if implemented as separate routines compared to EXECF/SKIPF sequences.

ersmith · 2020-04-04 12:17

Cluso99 wrote: »

However, I want to be able to use my existing code and that is spin1 and pasm1. So I need to convert to spin2 and pasm2. Spin provides small bytecode. I really don’t want my spin code converted to pasm2 which I believe you can do.

So currently my only option is to use PNut and later PropTool, at least until you have spin2 working (if you ever do).

fastspin can compile Spin1 directly into an executable P2 binary. So you don't have to port the spin part of your code to spin2, you can just convert the pasm part from p1 pasm to p2 pasm, and then compile everything directly.

Yes, the binaries produced by fastspin are probably larger than the ones produced by the Spin2 compiler (I don't think anyone has ever actually compared to verify this though, so it may not be very much larger). If you're porting P1 code to P2 I seriously doubt that will matter: you're moving from 32KB of RAM to 512 KB of RAM. If the compiled program ends up twice as large then there's still *plenty* of space left.

If I added bytecode support to fastspin would anyone use it? I really have the impression that people will use Spin2 no matter what, because it's the "official" compiler.

ManAtWork · 2020-04-04 12:46

ersmith wrote: »

I've got to say that, as a tool author, it's extremely frustrating that everyone asks for features for Spin2 but ignores the tools that do provide those features.
We shouldn't expect Spin2 to do everything ...
If nobody uses those other tools then what's the point of having them?
...
I really have the impression that people will use Spin2 no matter what, because it's the "official" compiler.

No. At least, it's me who is still using Fastspin and who doesn't complain. I've made a few suggestions (see operator methods and flexgui and pnut look and feel) but those can wait.

I think Fastspin will stay my favorite compiler (at least for big projects) because it produces faster code than interpreted spin, and that lets me use a high level language for things I'd otherwise have to write in assembler. So the code looks better and is easier to maintain (and to port if there's a P3 some day). For small programs, Parallax's Spin2 will probably be a better choice.

And BTW, I've never understood what that "@@" and "@@@" is all about. I've been coding for the P1 for nearly 10 years. I've done many and real big projects using all of the resources of the P1 and I've never ever needed them. I don't even understand why they are necessary and I refuse to care.

avsa242 · 2020-04-04 13:35

I use fastspin exclusively...it's the only cross-platform spin2 compiler for the P2. There are things I would love to see supported in fastspin, but even if they never were I'd still use it. The existence of the preprocessor, the ability to mix languages (though admittedly I haven't tried this yet, I do see a few uses for it personally that I can't find other solutions for), the ability to use it as an external tool in another editor... all pluses.

wmosscrop · 2020-04-04 14:03

cgracey wrote: »

Here is another way which moves the addresses out to the hub, to conserve on cog RAM:

A slight bug: the delay and hub variables need to be defined in this order (res have to be last).

delay	long	2_000_000	' initialized longs come first
hub	res	1		' RES only reserves space, so it goes last

This might work for my purposes. Still looking at moving the debug code to lut.
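The reason the order matters can be shown with a toy assembler model in Python. This is a simplified sketch, not PNut's actual behavior in detail; `assemble` and `value_seen_by_cog` are made-up names. The assumption is that `long` emits data into the image and advances the cog address, while `res` advances the cog address without emitting anything, and that on load image long i lands in cog register i.

```python
def assemble(lines):
    """lines: (label, directive, operand) tuples. Returns (symbols, image)."""
    symbols, image, cog_addr = {}, [], 0
    for label, directive, operand in lines:
        symbols[label] = cog_addr
        if directive == "long":
            image.append(operand)  # emits one long into the image
            cog_addr += 1
        elif directive == "res":
            cog_addr += operand    # reserves registers, emits nothing
        else:
            raise ValueError(directive)
    return symbols, image


def value_seen_by_cog(symbols, image, label):
    """After loading, image long i sits in cog register i (None if past the image)."""
    addr = symbols[label]
    return image[addr] if addr < len(image) else None


good = assemble([("delay", "long", 2_000_000), ("hub", "res", 1)])
bad = assemble([("hub", "res", 1), ("delay", "long", 2_000_000)])
```

In the `bad` ordering the symbol `delay` points at register 1, but the 2_000_000 was loaded into register 0, so the code would read the wrong value.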

wmosscrop · 2020-04-04 14:24

TonyB_ wrote: »

The IBM 1130 is less complicated than other processors that can fit entirely in one cog...

One "feature" of the 1130 was that the 3 index registers are actually core locations 1, 2, and 3.
For performance reasons, I used shadow hub registers to keep track of these indexes. But this means that I had to monitor changes to core in case one of these indexes was changed... and of course any changes to an index have to be reflected in core. Since most instructions reference these registers, it was worth the extra code to improve performance.
With the P2 being much faster, I may not have to go to this extreme.
And yes, the Fortran compiler did take advantage of accessing the indexes via core addresses. When you have to be able to run in 4k words (8k bytes) every word counts. There are nearly 20 phases in the compiler due to this restriction.
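The shadowing scheme described above can be sketched in a few lines of Python. This is not the emulator's actual code; the `Core` class, its method names, and the 4k-word default size are assumptions for illustration. Writes to core locations 1 to 3 refresh the shadow copies, and writes to an index register are mirrored back into core.

```python
class Core:
    """Toy model of memory-mapped index registers with fast shadow copies."""

    INDEX_ADDRS = (1, 2, 3)

    def __init__(self, words=4096):
        self.mem = [0] * words
        self.xr = {1: 0, 2: 0, 3: 0}  # shadow copies used on the hot path

    def write(self, addr, value):
        """Ordinary store: keep the shadow in sync if an index location is hit."""
        value &= 0xFFFF  # 16-bit words
        self.mem[addr] = value
        if addr in self.INDEX_ADDRS:
            self.xr[addr] = value

    def set_index(self, n, value):
        """Index-register update: must also be reflected in core."""
        value &= 0xFFFF
        self.xr[n] = value
        self.mem[n] = value

    def effective_address(self, n, displacement):
        """Indexed addressing reads the shadow, not core."""
        return (self.xr[n] + displacement) & 0xFFFF
```

The cost is the extra check on every store; the benefit is that indexed instructions, which are the common case, never touch core to fetch the index.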

Ariba · 2020-04-04 16:29

@ersmith

I think threads like this one are an attempt to get Chip's Spin 2 at least close to the quality of Fastspin.
Since Spin2 will be the official programming tool for the P2, this is simply necessary.
Once the PropTool is adapted to Spin2, many will probably use this IDE. FlexGUI is very functional, but not a full IDE, I miss syntax highlighting in particular.

I made some objects for Fastspin, but I stopped because I was afraid that Fastspin would be adapted to Spin2 and then everything would not work anymore.
So it would be important to know what you are planning to do with Fastspin. Will the substantial changes of Spin2 be incorporated into it, or will it remain as Spin1 compatible as it is today?

Actually I planned to write the general purpose objects in Spin, and then use them in main programs written in Flex-C.
This would have had the advantage that you could use the objects also in the official Spin2. But since the two Spin dialects are now quite incompatible, this makes less sense, and I could write anything in C.

Andy

ersmith · 2020-04-04 16:40

Ariba wrote: »

I made some objects for Fastspin, but I stopped because I was afraid that Fastspin would be adapted to Spin2 and then everything would not work anymore.
So it would be important to know what you are planning to do with Fastspin. Will the substantial changes of Spin2 be incorporated into it, or will it remain as Spin1 compatible as it is today?

I intend for fastspin to always be Spin1 compatible when compiling objects with a ".spin" extension. I intend to keep supporting P1 in it, so I think this is important.

I plan for it to be (mostly) Spin2 compatible when compiling objects with a ".spin2" extension. At the moment Spin1 and Spin2 are a bit mixed up, with some Spin2 features enabled when compiling for P2, regardless of the file's extension, but I'm going to try to fix that. The goal is to make the source language independent of the target processor, as long as no PASM is involved. You'll be able to compile a pure Spin1 (.spin) file for either P1 or P2, and similarly a .spin2 file for either P1 or P2.

Cluso99 · 2020-04-04 18:35

wmosscrop wrote: »

TonyB_ wrote: »

The IBM 1130 is less complicated than other processors that can fit entirely in one cog...

One "feature" of the 1130 was that the 3 index registers are actually core locations 1, 2, and 3.
For performance reasons, I used shadow hub registers to keep track of these indexes. But this means that I had to monitor changes to core in case one of these indexes was changed... and of course any changes to an index have to be reflected in core. Since most instructions reference these registers, it was worth the extra code to improve performance.
With the P2 being much faster, I may not have to go to this extreme.
And yes, the Fortran compiler did take advantage of accessing the indexes via core addresses. When you have to be able to run in 4k words (8k bytes) every word counts. There are nearly 20 phases in the compiler due to this restriction.

The IBM 1130 sounds interesting. I worked on an ICL System Ten and it had 3 index registers that were at specific core addresses. Programs were often only 5KB (6bit bytes -ASCII columns 2-5) but there was an additional 10KB that contained callable OS routines by all programs. There could be up to 20 separate partitions running simultaneously (like cog equivalents).

wmosscrop · 2020-04-04 20:11

Cluso99 wrote: »

The IBM 1130 sounds interesting. I worked on an ICL System Ten and it had 3 index registers that were at specific core addresses. Programs were often only 5KB (6bit bytes -ASCII columns 2-5) but there was an additional 10KB that contained callable OS routines by all programs. There could be up to 20 separate partitions running simultaneously (like cog equivalents).

Interesting. I had heard of ICL but not that machine. It's amazing what they could do with so little memory. Now we have tons of it and just waste it (IMHO).

The 1130 was my first "real" computer. I was lucky enough, in high school, to have physical access to one. I can still remember the smell of the machine oil & punched cards... I started working with the P1 to create my own 1130. I thought it would take 6 months. Hah. After 8 years I finally said I had done enough.

And now I'm moving it over to the P2. I don't think it will take 6 months. Time will tell.

Cluso99 · 2020-04-05 01:09

wmosscrop wrote: »

Cluso99 wrote: »

The IBM 1130 sounds interesting. I worked on an ICL System Ten and it had 3 index registers that were at specific core addresses. Programs were often only 5KB (6bit bytes -ASCII columns 2-5) but there was an additional 10KB that contained callable OS routines by all programs. There could be up to 20 separate partitions running simultaneously (like cog equivalents).

Interesting. I had heard of ICL but not that machine. It's amazing what they could do with so little memory. Now we have tons of it and just waste it (IMHO).

The 1130 was my first "real" computer. I was lucky enough, in high school, to have physical access to one. I can still remember the smell of the machine oil & punched cards... I started working with the P1 to create my own 1130. I thought it would take 6 months. Hah. After 8 years I finally said I had done enough.

And now I'm moving it over to the P2. I don't think it will take 6 months. Time will tell.

The System Ten was designed in 1969-70 by Friden for Singer (sewing machines - but they also make the flight simulators for 727 etc). ICL bought Singer's computer division in 1976. ICL did a major redesign in 1980 (the System 25), which was in production until 1993 and maintained until 2000+. I bought a large System Ten in 1977 and installed it in my garage where it worked faithfully until 2000 when I sold it for scrap - to recover the gold. There were only 16 instructions, which included multiply of 1-10 digits by 1-10 digits; the length of the result was the sum of the source digit counts, so overflow was not possible, all in decimal. Divide was the reverse of multiply. Memory was also addressed decimally. It used a 60-bit instruction which addressed both operands with optional indexing, so everything was memory to memory, much like the P1 & P2. It had up to 20 partitions which were just like cogs with their own memory, and a common memory like hub. In 1990 I wrote an emulator in 486 assembly that was 3x faster and had it fully validated.
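The no-overflow claim for the decimal multiply is easy to check: the largest m-digit number times the largest n-digit number is (10^m - 1)(10^n - 1), which is below 10^(m+n) and therefore fits in m+n digits. A quick Python check over the 1-10 digit range mentioned above (the function name is made up for this example):

```python
def product_fits(m, n):
    """True if the largest m-digit by n-digit product fits in m + n decimal digits."""
    largest = (10**m - 1) * (10**n - 1)
    return len(str(largest)) <= m + n


all_fit = all(product_fits(m, n) for m in range(1, 11) for n in range(1, 11))
print(all_fit)
```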

potatohead · 2020-04-05 03:05

6502 is little endian, FYI

ersmith wrote: »

If I added bytecode support to fastspin would anyone use it? I really have the impression that people will use Spin2 no matter what, because it's the "official" compiler.

I would. If it could be mixed with PASM it would be amazing, but not necessary.

In my travels, I may have caught the virus. Been quiet for a while getting past being sick, and caring for my family, who also got it. We are going to be fine. Was really rough.

Just want to report in, explain why I have been quiet.

I love FastSpin. Official or not, it is exemplary.

Bytecode would make for big programs. Bonus if that is transparent, in that one can inline PASM, and write PASM as it can be done today.

I think it would be near optimal use of P2, allowing really big programs.

Regarding Chip and Spin2: He has a vision. I am eager to see it realized. Will use the tool, but not as my primary. That is OK. Having such great and distinctive tools is a very nice problem to have.

Thanks for great work Eric.

The only standing request I have is being able to just declare an address in HUB for an included file, etc... we have discussed it, and it is OK that is not supported right now, if ever. Nice to have though, just saying ; D

Roy Eltham · 2020-04-05 03:41

Eric,
You already know I intend to use FlexC going forward, for P2 AND P1. I think once you get a few more features working it will be the ideal thing to use for P2 code.
Also, with how trivial it is to use Spin (or other language) objects directly from C, it means I can use objects other people write no matter what language they use.

evanh · 2020-04-05 04:29

potatohead wrote: »

In my travels, I may have caught the virus. Been quiet for a while getting past being sick, and caring for my family, who also got it. We are going to be fine. Was really rough.

Glad to hear all survived. No one went to hospital?

Door handles proved to be a huge vector for the flu at work. When we put hand sanitisers at the security door, where it was one person through at a time, the yearly cases dropped right off.

ersmith · 2020-04-05 13:40

potatohead wrote: »

In my travels, I may have caught the virus. Been quiet for a while getting past being sick, and caring for my family, who also got it. We are going to be fine. Was really rough.

Oh boy, that's scary. Glad to hear you're recovering. Take care!

cgracey · 2020-04-05 14:40

ersmith wrote: »

potatohead wrote: »

In my travels, I may have caught the virus. Been quiet for a while getting past being sick, and caring for my family, who also got it. We are going to be fine. Was really rough.

Oh boy, that's scary. Glad to hear you're recovering. Take care!

Yeah, Potatohead. I suppose your family has developed immunity now. You're ahead of the rest of us.
