De-Lurk: My P2 Project Plans:

potatohead · 2018-11-18 03:14

Red, cog addresses are in longs. No byte or word addresses.

Think registers, enumerated.

Hub is byte addressed. No alignment needed.

__red__ · 2018-11-18 03:19

Sure, but in the example given, when I decode A, I get -36... not the (expected) -8

It should be -8 longs... so, if I divide -36 by 4 I get -9. Adding 1 for the expected PC increment gives me -8.

Is that the correct logic to get to the instruction JMP #$-8 ?
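The arithmetic in this post can be sketched in Python. This is only an illustration of the steps described above: the 20-bit field width and the helper names `sign_extend` and `relative_jump_in_longs` are assumptions made for the example, not something confirmed in this thread.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    value &= mask
    return (value ^ sign) - sign


def relative_jump_in_longs(a_field, bits=20):
    """Turn an encoded byte offset into the $-relative long offset.

    Assumes (as in the post) that the field holds a byte offset measured
    from the already-incremented PC, and that it is a multiple of 4.
    """
    byte_offset = sign_extend(a_field, bits)  # e.g. -36
    long_offset = byte_offset // 4            # e.g. -9
    return long_offset + 1                    # undo the PC increment: -8


# The example from the post: a field that decodes to -36 bytes.
encoded = -36 & 0xFFFFF
print(relative_jump_in_longs(encoded))
```

With the field from the example, this gives the -8 that matches JMP #$-8.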

potatohead · 2018-11-19 21:27

Red, I have not found the discussion, but am pretty sure bytes are correct. The COG and HUB computations are done as each memory space needs.

The intent is code that may run from either memory space. Bytes are the least common denominator.

__red__ · 2018-11-19 22:06

Thanks, oddly that made far more sense to me than having the documentation say something like: "the PC increments +4 in hub and +1 in COG mode" in one place, and "addresses in Cog are bitshifted twice before stored".

I guess I would have found it easier to parse the idea that PC is always +4 wherever but your cogs must always be 4 aligned. I'm guessing we did it the other way to ensure that the compiler never lets a programmer do anything programmatically that could end up out of alignment.

Of course, my logic that surrounds:
0x000 -> 0x1FF (Cog)
0x200 -> 0x3FF (Lut)
>= 0x400 (Hub)

has to be wrong too now, right? If I'm examining the value actually encoded in the instruction, the ranges are really all multiplied by 4.
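For reference, the range check being described looks roughly like this in Python. It is a sketch only: `classify_address` is a hypothetical helper, the boundaries are the ones listed in this post, and treating 0x400 itself as the first hub address is an assumption. Whether a value pulled out of an instruction field needs scaling by 4 before this check is exactly the open question here.

```python
COG_LAST = 0x1FF  # last cog register address, per the list in the post
LUT_LAST = 0x3FF  # last LUT address, per the list in the post


def classify_address(addr):
    """Return which memory space an execution address falls in."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    if addr <= COG_LAST:
        return "cog"
    if addr <= LUT_LAST:
        return "lut"
    return "hub"  # assumption: 0x400 and up is hub


for addr in (0x000, 0x1FF, 0x200, 0x3FF, 0x400):
    print(hex(addr), classify_address(addr))
```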

This is all starting to take some shape now.

In theory I have 300-400 opcodes to implement in an emulator and I'm looking for an ordering to tackle them.

Any suggestions?

Part of me is toying with the idea of implementing instructions as they appear in another piece of software (like, I dunno - the spin2 interpreter). There's something quite meta about implementing the instructions needed to emulate an interpreter within a virtual machine on a foreign processor.

It's perverse enough that it might just work.

(Plus, hopefully it will also help me better understand how Chip's spin2 interpreter code works which should help me when I start writing my own implementation of the erlang VM for the P2).

I'm currently plotting dedicating one of the cogs to manage memory and having 7 of the others run as erlang schedulers.

It may not excel at doing "large-blob volume" data-processing but for dealing with streaming data and having the ability to scale CPUs at the price of a P2 for 7 cores... it will SCREAM.

(I'm serious btw as to suggestions for the order of which instructions to write emulation for. I'm guessing I probably shouldn't write how each instruction is implemented with the same level of verbosity thus far or it'll never get finished).

__red__ · 2018-11-19 22:11

Regarding Hardware:

I'm nearly done on my first revision of the hardware console for the P2 minicomputer. I'm pretty sure I'm going to have a P1 run the console.

The basic idea being that the debug/instrumentation routines in the P2 will stream the data to the P1 and the P1 will render it to the LEDs, take inputs from switches etc etc... all via some kind of protocol as yet to be defined.

The other advantage of making the separation there, as opposed to having the P2 do that, is that my emulator will have the exact same debugging interface, so I can build a simulated console that works the exact same way.

So - perhaps serial smartpin operations so we can read/write "hello world" should be next on the list of instructions to implement.

I just need to find some sample code to play with as I don't think smartpins have been implemented in spinsim yet.

potatohead · 2018-11-20 01:36

It is done the way it is with COG addressing so that addresses can occupy 9 bits, nice and neat. P1 works that way and has no ability to run code from HUB.

A little virtual machine technique called LMM (Large Memory Model) was used to fetch and execute instructions from the HUB, again on P1.
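The LMM idea is just a tiny fetch-execute loop living in the cog that pulls one instruction at a time from hub memory. A toy model in Python, to show the shape of it: the three operations ("add", "jmp", "halt") and the `run_lmm` name are invented for this sketch and are not P1 instructions.

```python
def run_lmm(hub_program, max_steps=1000):
    """Toy fetch-execute loop: fetch from 'hub', bump the PC, execute."""
    pc = 0
    acc = 0
    for _ in range(max_steps):
        op, arg = hub_program[pc]  # fetch the next instruction from hub
        pc += 1                    # point at the following instruction
        if op == "add":
            acc += arg
        elif op == "jmp":
            pc = arg               # a jump just rewrites the hub PC
        elif op == "halt":
            return acc
        else:
            raise ValueError(f"unknown op: {op}")
    raise RuntimeError("step limit reached")


program = [("add", 2), ("jmp", 3), ("add", 100), ("add", 5), ("halt", 0)]
print(run_lmm(program))  # prints 7
```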

potatohead · 2018-11-20 01:41

I saw Erlang for the P2. Thought that was where you were going. I think that's a very interesting project. And given my limited understanding now, a very good match, aside from the small HUB RAM.

The optimizer, when you get there, is going to really need to optimize.

__red__ · 2018-11-20 01:45

potatohead wrote: »

I saw her laying the M4 the P2.

M4 for the P2? Wassat?

potatohead · 2018-11-20 01:46

Google Voice got that one. I've made edits

__red__ · 2018-11-20 02:18

potatohead wrote: »

Google Voice got that one. I've made edits

Gotcha and yes, that is my destination.

I've been spending a significant amount of time in "the beam book" which goes into amazing levels of detail as to how it all fits together in the "open source implementation of the erlang virtual machine" - all the way down to how the compilers work, how the data structures are stored in memory, even down to individual bitfields. It's truly amazing.

One of the main questions I'm asking is how much of that infrastructure do I want to move over and does it make sense to do it the same way?

In some cases yes, in others no.

Arguably, it would be easier to implement the Erlang VM on the P2 than the P2 on the Erlang VM as there are only 140 or so instructions to emulate :-P

potatohead · 2018-11-20 20:51

I think that looks like a really neat project.

Will be interesting to see how the COGS and HUB work in a functional paradigm.

Re: puzzles

Would a random number function not just be to fetch a number from the source? /dev/random, or whatever source makes sense?

Seems to qualify on all aspects, given "always returns same output" means a random number, just not a specific one. A PRNG would do that given a seed, but it would also fail, or need to keep state, should it be required to give another random number with the same seed.

In that case, a random number is needed more than strict compliance is.

Is the point of that one to highlight where some lines are drawn? I.e., no function is possible, so the next best thing is done?

Not thinking about the others yet. But, I did see something odd that made me think, and it was a dumb ATM machine.

Most of the time, you give one your card, input things and other things happen. Deposit, get cash, get info. On smarter ones (and I am using that term to differentiate here mostly), you can do multiple things. It maintains a state. You essentially have to tell it you are done, or it has to behave that way when some conditions are met.

This dumb one got the card and would either do a single thing, or exit. In either case, the only option is to start again, as if nothing had ever happened.

No state in action! Each transaction is atomic, and it either completes, or it does not.

And the sequence is the same each time. The machine has no real concept of a user. Just does transactions, and it is anal about it too, even returning card prior to the actual transaction.

They made sure there can never be any pending state. And I am sure that machine keeps nothing. The screen even blinks as if it restarted each time. It may actually be doing that for all I know.

Thought that behavior looked familiar. Was thought provoking.

__red__ · 2018-11-20 21:42

potatohead wrote: »

Would a random number function not just be to fetch a number from the source? /dev/random, or whatever source makes sense?

Pretty much.

You've "wrapped" the actual random number in a function and that function looks the same regardless of what the actual value of that number is.

So, that function can propagate its way through your program, only ever being "unwrapped" at the very last second if it is ever needed.

For example:

z = x + y;
x = 3;
y = randomInt(100);

... the ordering didn't matter.

If we just ran:

print(x);

... then the randomInt function would never have been executed.
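To make that concrete in a language that is not lazy by default, here is a rough Python sketch. The `Lazy` wrapper, `build_program` and the `calls` list are all made up for the illustration; a lazy functional language does this for you.

```python
import random


class Lazy:
    """Wrap a computation; run it at most once, and only when forced."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


def build_program():
    calls = []  # records every real call to random_int

    def random_int(limit):
        calls.append(limit)
        return random.randrange(limit)

    # Declared in the same "wrong" order as the example above.
    z = Lazy(lambda: x.force() + y.force())
    x = Lazy(lambda: 3)
    y = Lazy(lambda: random_int(100))
    return x, y, z, calls


x, y, z, calls = build_program()
print(x.force())  # prints 3
print(calls)      # prints [] because random_int was never run
```

Only forcing `z` (or `y`) would actually trigger the random number call.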

To add more fun into the mix:

finalresult = wibble - y;
wibble = (z * 3) - (2 * y);

print(finalresult);

the **compiler** can do the full algebraic thing:

finalresult = wibble - y;
            = ((z * 3) - (2 * y)) - y;
            = ((x + y) * 3) - (2 * y) - y;
            = (3 * x) + (3 * y) - (2 * y) - y;
            = (3 * x) + (3 * y) - (3 * y);
            = 3 * x;

(if my algebra actually works)

The idea being that as the compiler knows that the randomInt function will always return an int, it can follow all the "int" rules during optimization - including ones that may result in the value being completely cancelled out.

Sure, this is a convoluted example, but you get the idea.
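A quick brute-force check of that algebra in Python, just to confirm the random value really does cancel out. The function names are only for this sketch.

```python
def final_result(x, y):
    """The unsimplified chain of definitions from the example."""
    z = x + y
    wibble = (z * 3) - (2 * y)
    return wibble - y


def simplified(x):
    """What the compiler could reduce it to: y has vanished."""
    return 3 * x


# y stands in for any value randomInt(100) could return.
ok = all(
    final_result(x, y) == simplified(x)
    for x in range(-10, 11)
    for y in range(100)
)
print(ok)  # prints True
```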

potatohead wrote: »

Most of the time, you give one your card, input things and other things happen. Deposit, get cash, get info. On smarter ones (and I am using that term to differentiate here mostly), you can do multiple things. It maintains a state. You essentially have to tell it you are done, or it has to behave that way when some conditions are met.

This dumb one got the card and would either do a single thing, or exit. In either case, the only option is to start again, as if nothing had ever happened.

No state in action! Each transaction is atomic, and it either completes, or it does not.

And the sequence is the same each time. The machine has no real concept of a user. Just does transactions, and it is anal about it too, even returning card prior to the actual transaction.

They made sure there can never be any pending state. And I am sure that machine keeps nothing. The screen even blinks as if it restarted each time. It may actually be doing that for all I know.

Thought that behavior looked familiar. Was thought provoking.

Glorious isn't it?

There's a project called "erlang on xen" which spawns a unique Xen VM for every web request. You read that right. Not a unique VM per user, or a unique VM per session... but a unique VM per web request.

Imagine if that was internet banking. The URL indicates the full extent of the data that should be available in that transaction, which means that the VM can be created with ONLY that data in it **as a constant**.

You can't attack static data. You can't cross security domains to another user if the VM has zero access to any database layer. Not only is that the ultimate in elastic VMs, but garbage collection is your VM being destroyed ;-)

potatohead · 2018-11-20 22:27

Very interesting. Not sure about glorious, since I know a dumb machine could be more capable.

But, from an infosec point of view? Ya!

The idea of spinning up VMs?

First, one could be set up in a provable, known start state. Just copy a blob and go. When it dies, the whole memory region returns to the free pool. Could be super quick, depending on the hardware.

Maybe just initializing one can get that quick. No blob.

Some of us talked about using P2 COGS that way. One can either point a COG at a memory region and run it HUBEXEC style, or load the COG RAM first and then run it that way.

I did not see any of this earlier, but it appears the P2 may be just as much of a playground for these kinds of things as it is one for signals and real time display, input.

Cool beans. I am thinking about this stuff now. A better start than anything prior.

Electrodude · 2018-11-20 22:38

You don't even have to copy the whole VM image thanks to copy-on-write and virtual memory and all that.

potatohead · 2018-11-21 01:40

You don't think all these funky page table and timing exploitz are a factor when doing that?

__red__ · 2018-11-21 12:50

potatohead wrote: »

You don't think all these funky page table and timing exploitz are a factor when doing that?

No, because the further away your bug is from your user input the harder it is to get malicious data to where it needs to be to exploit.

Or to put it another way, a private exploit that close to metal that can be triggered by user input is not going to be used by an attacker against any system that I'm building...

If it's public, other people will likely be targets before any of my previous employers.

potatohead · 2018-11-21 19:06

Totally, I was speaking to exploiting the VM. That data can be mutated.

__red__ · 2018-11-21 19:34

potatohead wrote: »

Totally, I was speaking to exploiting the VM. That data can be mutated.

Sure, but you can only potentially get at the data that's already pre-loaded... and anything you exploit is gone at the end of that specific web TCP session.

potatohead · 2018-11-21 22:07

Totally. I'm not saying it's very vulnerable. But what I was referring to, though, is virtual memory and copy-on-write type functions. That could leave things open to corrupt the virtual machine itself, and it could be persistent depending upon where the attack was.

As an attacker, I'm way less interested in the data than I am running some code. My code.

__red__ · 2018-11-21 22:50

potatohead wrote: »

As an attacker, I'm way less interested in the data than I am running some code. My code.

As an attacker, I'm the exact opposite :-) I'm guessing we're in different fields.

potatohead · 2018-11-21 22:56

Not at all. It's different goals. Red team is red team man. ;D

An awful lot of people just want the data. They want to corrupt it, exploit it, or capture it for sale or use.

Others want control for various reasons. Like to lock things up with encryption, shut systems down, repurpose them, that kind of thing. Of course that also gets at the data.

I tend to be in the control camp, that's just where my personal focus has been. I don't do any of that professionally, but I have cracked a few things in my time. Truly for educational purposes of course.

Functional is a relatively New Concept to me. Kind of ignored it over the years, because it was obtuse. This discussion really opened up quite a few lines of thinking in my mind. I really appreciate that. It's going to be fun watching you build this thing.

A lot of thoughts about states too. Often I struggle with that. I definitely see some ways where I don't have to do that, and I don't necessarily have to be all that functional either. That's high-value.

Sidebar, why does Google Voice randomly capitalize things? I swear they're trying to fingerprint everybody as they use this thing.

__red__ · 2018-11-22 04:37

Another post: Emulating Debugging:

https://evil.red/posts/emulating-debugging/

Overview of the plan to get the same debugging in emulator as in silicon.
Implementing async serial tx framework.

I'm starting to paint in much broader strokes...
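For anyone following along without reading the post, the bit-level side of async serial tx is small. Here is a generic 8N1 framing sketch in Python; it is not the framework from the blog post, and `frame_8n1` / `unframe_8n1` are names invented for the example.

```python
def frame_8n1(byte):
    """Return the line states for one byte: start bit, 8 data bits LSB first, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    bits = [0]                                   # start bit (line pulled low)
    bits += [(byte >> i) & 1 for i in range(8)]  # data bits, least significant first
    bits.append(1)                               # stop bit (line back to idle high)
    return bits


def unframe_8n1(bits):
    """Inverse of frame_8n1; checks the start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("bad frame")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


frames = [frame_8n1(b) for b in b"hello world"]
print(bytes(unframe_8n1(f) for f in frames))  # prints b'hello world'
```

An emulated console and a real one can share this kind of framing, since both only ever see the bitstream.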

potatohead · 2018-11-22 18:36

Interestingly, a simple bitstream messaging protocol is how the COG communicates with a Smart Pin.

__red__ · 2018-11-22 20:50

I toyed with the idea of asking whether the cogs directly connect to the smartpins or go via the hub.

I figured that it was unlikely given the amount of physical routing that would be required as well as the permissioning issues.

In the emulator it would of course be easier to go directly, as well as more performant... But if two cogs write to the same smartpin those requests will be serialized by something (probably the hub) so I figured it would be most likely to fail the same way.

I'm having a blast writing the emulator, the blogging is slowing me down a lot but it's interesting for me to look back on my thought processes...

evanh · 2018-11-24 13:42

potatohead wrote: »

Interestingly, a simple bitstream messaging protocol is how the COG communicates with a Smart Pin.

That progressively became fully 32-bit parallel in the end. I'm worried it may have contributed to the reduction in total cogs. The number of buses in the Propeller 2 is almost obnoxious. The "eggbeater" makes it look worse than it is I guess.

evanh · 2018-11-24 13:48

Red,
I think I've discovered another detail about addressing modes. PC-relative addressing appears to be always byte based as was pointed out not so long ago.

However, this is not true of absolute addressing. Absolute mode is as everyone has said all along, namely longword based within the cogs and byte based throughout hubram.

See https://forums.parallax.com/discussion/comment/1454799/#Comment_1454799

potatohead · 2018-11-25 06:10

Didn't I read that relative *could* be used, just that it was not intended or going to be put into the assembler?

(Not that it should be, I just remember us having that conversation)

evanh · 2018-11-25 09:12

For branching, absolute is used only for crossing between hubram and cog space. PC-relative is used otherwise. Because LOC uses the code addressing calculation, to make it useful it always needs a qualifier to tell Pnut to encode an absolute address. #\@label for hub addresses, and #\label for cog addresses.

See https://forums.parallax.com/discussion/comment/1454805/#Comment_1454805
