How close is the propeller v2 ?

cgracey · 2010-02-23 19:09

The inescapable problem of doing C on the Propeller is that the whole culture of C is rooted in infinite-memory-model thinking. It's not a fit with the Propeller, as the execution memory comes in limited and isolated sections. To a C-minded person, the "problem" of memory was already solved long ago, but the Propeller will blow it back in their face in hot, sulfurous chunks.

In making a C compiler, if you go small-model, you're limiting programs to be what would amount to tiny C sub-programs, which would be like clotheslining the typical C programmer. If you go large-model, you're not getting the raw MIPS that C thinking automatically supposes should be there (every C programmer knows that C is compiled and executes directly, and this is not debatable - any exception would indicate an arcane architectural flaw, for which there is absolutely no acceptable rationale). If you do some large/small combo, you've forced a fissure down the center of everyone's assumptions. There's no winning, and not because of the C language, but because of the voluminous baggage and presumption that is part-and-parcel of the typical C mindset.

With the next Prop, things will be bigger and higher-performance enough that Richard's ICC might be able to create an adequate illusion of that all-inclusive five-star club med C resort that the C culturists can only picture themselves in. And I'm not talking about the few of you here who would accept a realistic Propeller C, but that huge sea of "customers" out there who could/would use the Propeller, if only it was C.

For whatever reason, for as long as we've been making tools (1989), C has been a contentious matter. When the BASIC Stamp came out, C programmers went out of their way to deride it because it wasn't C. Anything we've ever done that wasn't C got the same treatment from the C crowd. C rather equates to "Cranky" and "Chip on the shoulder" in my head. C is like Cigarettes, for people who must have them. There's no negotiating. I've programmed in C, myself, but didn't like the feel of it, at all. It makes it hard for me, personally, to carry any torch for it. That's just my opinion, though. The practical matter is that I happen to not be designing things that are very C-friendly.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Phil Pilgrim (PhiPi) · 2010-02-23 19:18

One big reason I've gravitated to Parallax products over the years is that I've never had to program in C to make effective use of them. I've looked at C, own a C book, and dabbled in NQC for the Lego Mindstorms. 'Can't C the attraction, really.

-Phil

Mike Green · 2010-02-23 19:34

I've done operating system coding in C, in Pascal, and in PL360 to name a few. Of those, both C and Pascal required using subsets of the language to ensure that the code produced would be relatively efficient and that either no library routines were used or only select library routines that, in turn, were rewritten for efficiency and careful use of resources. PL360 didn't require this because it was an assembly language with a high-level syntax and all basic operations translated directly into individual instructions, something that wouldn't work on the Propeller because of the RISC design ... too hard to come up with a meaningful, useful high-level notation for all of the bits and pieces of a given instruction.

Roy Eltham · 2010-02-23 20:03

My job involves using C/C++ to program very large projects (think multi-million lines of code) as part of a team of many programmers. The clients are Windows/DirectX applications and the servers are clusters that handle many thousands of simultaneous users. I can't really imagine coding these things in anything other than C/C++ or similar high level languages. Anyway, needless to say I am quite comfortable coding in C/C++.

When I first started out on the Propeller I wanted to use C and went with ICC, but over time I learned PASM and Spin. I am now using PASM/Spin exclusively for my Prop stuff, and I don't think I would want it any other way with the scale and performance of the Prop 1 chip. When the Prop 2 comes out, I may want to use C for some larger projects that the Prop 2 will be capable of, but most likely I'll stick with PASM/Spin for most things.

The main thing I miss when working on PASM code is being able to debug the code by stepping through instructions and reading memory and state, like I do in Visual Studio at work. This would probably not be a very easy thing to support in the Prop/cog architecture without a lot of extra stuff being added to make all the cogs and hub run in lockstep when single stepping. Of course, we can get this when simulating things in a Prop emulator, but that doesn't let you debug things in circuit with all your sensors and such hooked up...

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Check out the Propeller Wiki and contribute if you can.

Dave Hein · 2010-02-23 22:38

heater said...
If you want C with byte codes use GCC and the Zog VM :)

Dave Hein: Is that 2500 lines of C code for JTAG open for inspection? I'd love to compile it for Zog byte codes and see how it compares in size to your C to Spin translated version.

Heater,
The C code was posted in the thread at http://forums.parallax.com/showthread.php?p=872289. The source is in a zip file about half-way down the thread.
Dave

grahamreitz · 2010-02-24 00:04

Chip Gracey (Parallax) said...
The inescapable problem of doing C on the Propeller is that the whole culture of C is rooted in infinite-memory-model thinking. It's not a fit with the Propeller, as the execution memory comes in limited and isolated sections. To a C-minded person, the "problem" of memory was already solved long ago, but the Propeller will blow it back in their face in hot, sulfurous chunks.

Hi Chip. The infinite-memory-model perspective is a function of the operating system. For a long time the upper limit of allocable memory in a Windows process has been 4GB, even when the machine might only have 1GB of physical RAM installed. Every process running in the OS is provided this illusion. This is also similar on Linux and Unix based systems.

C as a language says little about the underlying hardware memory model. That's why it works so well on 8-bit through n-bit systems on a variety of different architectures. C is even being successfully used on general purpose GPUs (see OpenCL, for example). We would expect this flexibility from a systems language.

I think what you are referring to is the C/C++/Java culture that has developed in the higher level consumer programming space, where development happens under a traditional operating system. Most of the C-fans here probably aren't a member of that club. We spend most of our time with devices without consumer OS'es on them and limited resources.

If there is little motivation for additional market segments by providing features for the C-language-fans, that's ok. The desire is to see the Prop2's unique architecture benefit the C/C++ community as well as its Spin counterpart. There is no intention to deride Spin, etc. Consider throwing us a bone in the Prop2 for C language development?

Kindly,
graham

Post Edited (greitz) : 2/24/2010 3:52:01 AM GMT

Bill Henning · 2010-02-24 00:16

The new indexed hub memory access modes will make generating code for accessing stack frames and data structures MUCH easier and more efficient - I expect C compilers will benefit greatly due to much more efficient stack and data structure manipulation being possible for LMM.

greitz said...

Chip Gracey (Parallax) said...
The inescapable problem of doing C on the Propeller is that the whole culture of C is rooted in infinite-memory-model thinking. It's not a fit with the Propeller, as the execution memory comes in limited and isolated sections. To a C-minded person, the "problem" of memory was already solved long ago, but the Propeller will blow it back in their face in hot, sulfurous chunks.

Hi Chip. The infinite-memory-model perspective is a function of the operating system. For a long time the upper limit of allocable memory in a Windows process has been 4GB, even when the machine might only have 1GB of physical RAM installed. Every process running in the OS is provided this illusion. This is also similar on Linux and Unix based systems.

C as a language says little about the underlying hardware memory model. That's why it works so well on 8-bit through n-bit systems on a variety of different architectures. C is even being successfully used on general purpose GPUs (see OpenCL, for example). We would expect this flexibility from a systems language.

I think what you are referring to is the C/C++/Java culture that has developed in the higher level consumer programming space, where development happens under a traditional operating system. Most of the C-fans here probably aren't a member of that club. We spend most of our time with devices without consumer OS'es on them and limited resources.

If there is little motivation for additional market segments by providing features for the C-language-fans, that's ok. The desire is to see the Prop2's unique architecture benefit the C/C++ community as well. There is no intention to deride Spin, etc. Consider throwing us a bone in the Prop2 for C language development?

Kindly,
graham

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com 5.0" VGA LCD in stock!
Morpheus dual Prop SBC w/ 512KB kit $119.95, Mem+2MB memory/IO kit $89.95, both kits $189.95 SerPlug $9.95
Propteus and Proteus for Propeller prototyping 6.250MHz custom Crystals run Propellers at 100MHz
Las - Large model assembler Largos - upcoming nano operating system

RossH · 2010-02-24 00:35

Hi Chip,

You know the funny thing is that I share your opinion of C - before I started Catalina I hadn't programmed in C for years - most of my experience was in Pascal, then Modula-2, then C++ (yuck! - even worse than C) then Ada. But I saw the Propeller and immediately fell in love with what it promised (and delivers!) - and I decided I wanted to write programs for it. Perhaps this was a reaction to years of Ada programming. Ada is a fantastic language - but a highly abstract one, too far removed from the actual bits and bytes. Many Ada programmers spend years obsessing over the merits of various obscure aspects of Ada's implementation of inheritance or its tasking model - and never actually write any programs in it!

My first Propeller project was in SPIN, but when I found out just how slow it was I realized that this was not viable. My next project was in PASM - but I found out that while I could get the speed I needed I just couldn't fit the program in - not even if I had all 8 cogs dedicated to executing PASM directly. I toyed with the idea of a multi-prop solution but decided against it - mainly because even if I had enough cogs to throw at the task, writing 10,000 lines of code in any assembly language is simply not my idea of fun. Then I discovered Bill's work on LMM and realized there was an answer - but it would require a compiler that could generate LMM code (which at the time there wasn't).

So I found the simplest compiler to which sources are publicly available (which happened to be a C compiler) and got Catalina running in a couple of months - mainly because the Prop architecture is so flexible and PASM is so simple (but I still don't want to write 10,000 lines of it!).

Yes, the Prop is not very "C friendly" - but I can now write and run the programs I originally wanted to build when I bought my first Propeller. I also happen to think C will help the Prop go "mainstream" - but that's just an opinion, and not really relevant to why I use it.

I'm perfectly happy with the Prop I the way you designed it (apart from the missing B port, which should have been wired internally even if it was not available externally - grrr!) and I look forward to the Prop II being even better (fingers crossed you will include internal ports this time round - but if not, well - c'est la vie!).

I don't need the 5-star club med resort. I'm happy with a tent pitched by the beach. The only problem I have now is that Catalina takes so much of my time I don't have much left to just sit back and enjoy the view

Ross.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina

evanh · 2010-02-24 00:45

greitz said...
... Every process running in the OS is provided this illusion. ...

... C as a language says little about the underlying hardware memory model. That's why it works so well on 8-bit through n-bit systems on a variety of different architectures. C is even being successfully used on general purpose GPUs (see OpenCL, for example). We would expect this flexibility from a systems language.

I think what you are referring to is the C/C++/Java culture that has developed in the higher level consumer programming space, where development happens under a traditional operating system. ...

And why does the illusion exist? After all, using more ram than exists is just asking for trouble. I think Chip is spot on, that it's always been a mentality. And it certainly predates C++ et al.

And Mike has hit another big nail square on the head - standard library code is notorious for bloating and causing subsequent bloat. This is where C++ has powered ahead of plain C but certainly is not alone in its exceptional fallout.

Funnily enough, those two points are probably more intertwined than first appears.

RossH · 2010-02-24 01:08

Roy Eltham said...

The main thing I miss when working on PASM code is being able to debug the code by stepping through instructions and reading memory and state, like I do in Visual Studio at work. This would probably not be a very easy thing to support in the Prop/cog architecture without a lot of extra stuff being added to make all the cogs and hub run in lockstep when single stepping. Of course, we can get this when simulating things in a Prop emulator, but that doesn't let you debug things in circuit with all your sensors and such hooked up...

Roy, there are several programs that allow this. One such is POD - it works a treat for a single cog. I actually incorporated a version into Catalina and it has helped me many times to debug the Catalina kernel (which is written in PASM), and single step through the code that Catalina generates. There are also other solutions that support multiple cogs. I also think Viewport can do what you want (although I've not used it).

Ross.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina

heater · 2010-02-24 02:35

evanh: "And why does the illusion exist?"

Because when they created C and Unix they had very small computers and wanted to run bigger programs.
Hence "Virtual Memory", swap space, etc. Seems the demand to do that has not changed over the years in the world of general purpose computers and their operating systems. It is not C's fault as such.

evanh: "After all, using more ram than exists is just asking for trouble."
In a real-time embedded control system almost certainly yes!
For a word processor on my desktop, perhaps I'd rather have it continue slowly, whilst page swapping, than just crash and burn when memory is exhausted. Or have it refuse to make a bigger document for me until I've installed more RAM.

The C language should not get the blame for memory bloat or the infinite memory model mind set. Don't forget the machines it was originally created for were of the same order of capacity as the Prop; speed and space efficiency were important.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Beanie2k · 2010-02-24 05:38

Well, I beg to differ with the view that C is based on an infinite memory model. Current implementations are, but anyone who wrote C code back in the days of DOS can tell you horror stories of tiny, compact, small, medium, large and HUGE memory models, each with their own limitations, and all of them bad.

My own story on the Prop and its limitations: I was looking for an FPGA-type chip when I ran across the Prop, and saw that it was a sort of "poor man's FPGA", in that you could have up to 8 processes running entirely independent of each other, and all for a reasonable price. This seemed like what I needed for some projects which had very strict timing requirements, and even X86 under DOS, let alone Windows and Linux, was just too jittery. I had used 8051-based chips in the past, which are easy to program in ASM (never used C on them, too abstract from the hardware), but as a single core processor, these are limited to a single thread unless you use some form of time-sharing, and its attendant jitters. Well, I became disappointed with the Prop for the following reasons:

1: Spin is just too slow for my applications, which involve real time processing of serial data streams in the tens to hundreds of kb/s.

2: ASM would be fast enough, but you are limited to 496 words of code max, and the instruction set is definitely reduced, especially with no bit level instructions. i.e. you can't just do "jb P5.3, here" to get the IP to jump to location "here" if pin 3 on port 5 goes high. You have to write a fairly lengthy procedure using masks, etc. With only 496 words of code space, this really becomes a limitation.

3: Also there is the issue of self-modifying code. My memories of SMC were of the mid-1980's when it was used as a hack by show-off geeks for the sole purpose of squeezing every last byte out of a program; programs which broke under the 80286 and were thoroughly demolished by the 80386. I really don't want to go down that path.

One area where the Prop definitely shines is in video, so I thought I would use it with a small NTSC composite display as a cool sort of graphics display for the project, along with some massively mathematical data crunching, and leave the fast real-time bit stream parsing to something else. Well, after I loaded the TV driver and the graphics package, I was out of hub RAM, with no place to put the data crunching code! In other words, it's been one frustration after another, with a chip that *almost* does what I want.

I do think the Prop has potential though, with 3 improvements. 1) Increase the COG ram to at least 16K words, even if it breaks backwards compatibility. This will allow decent sized programs to run within a COG, giving truly excellent performance. 2) Add some bit level instructions. A microcontroller really REALLY needs them. 3) Lose the self modifying code. Again, even if it breaks compatibility.

In light of the above, I guess it's pretty obvious the Prop just isn't for me. But who knows? Maybe some day I'll find a use for it.

Take care.

Post Edited (Beanie2k) : 2/24/2010 5:44:42 AM GMT

Mike Green · 2010-02-24 07:12

It may be that the Prop isn't for you, but it isn't because of a shortage of bit level instructions. The Prop has a barrel shifter which means that any shift takes the same amount of time (4 cycles). There isn't a single instruction "bit test and jump", but that can be easily done with two instructions (TEST and JMP) and, in some cases where the conditional execution bits can be used, you don't need the jump. You could have conditional subroutine calls with two instructions (TEST and CALL).
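For anyone coming from the 8051 "jb" style, the mask-and-test logic behind the TEST plus JMP pair can be sketched in Python. This only illustrates the logic and is not Propeller code; the function name `pin_is_high` and the sample `ina` value are made up for the example.

```python
def pin_is_high(ina: int, pin: int) -> bool:
    """Return True if bit `pin` of the 32-bit input word `ina` is set."""
    mask = 1 << pin            # single-bit mask, as you would build for TEST
    return (ina & mask) != 0   # TEST ANDs the operands and only sets flags


# Hypothetical snapshot of the input register with pins 3 and 5 high.
ina = 0b0010_1000
if pin_is_high(ina, 3):
    target = "here"            # stands in for the conditional jump being taken
else:
    target = "fall through"
```

The mask is a constant for a fixed pin, so on the chip it would normally be prepared once rather than rebuilt for every test.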

There is no way you'll get more cog RAM, certainly not 16K words. Not only would that require a completely different instruction set, but there's no room for that much memory on any kind of decent sized chip.

Bitmap graphics of any size is just not doable with the on-chip memory available. Remember that a 640 x 480 pixel display is 300K pixels. That's already an order of magnitude more memory than what's available when we're talking about 8 bits per pixel, not anywhere near 24-bit.
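The arithmetic behind that is easy to check. A rough sketch, assuming the Prop 1's 32 KB of hub RAM as the budget and ignoring everything else that has to live there (the function name is just for illustration):

```python
HUB_RAM_BYTES = 32 * 1024  # Propeller 1 hub RAM


def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed for a packed bitmap of the given size and depth."""
    return width * height * bits_per_pixel // 8


for bpp in (1, 8, 24):
    need = framebuffer_bytes(640, 480, bpp)
    ratio = need / HUB_RAM_BYTES
    print(f"{bpp:2d} bpp: {need:7d} bytes, {ratio:.1f}x hub RAM")
```

Even at 1 bit per pixel a 640 x 480 bitmap is slightly larger than the whole hub RAM, which is why tile-based drivers are the usual approach.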

Lastly, self-modifying code may be intellectually messy, but in practice, it's pretty easy. Because of the limited memory available, you're really better off in many cases having buffers in hub memory and accessing them with RDxxxx/WRxxxx instructions where you can use computed addresses quite easily.

The Propeller really was designed to have fairly straightforward high speed, heavily time dependent coding done in assembly language and complex, messy, control code done in Spin where there's more memory available directly and code is more compact with more facilities (like a stack).

evanh · 2010-02-24 11:24

heater said...
evanh: "And why does the illusion exist?"

Because when they created C and Unix they had very small computers and wanted to run bigger programs.
Hence "Virtual Memory", swap space, etc. Seems the demand to do that has not changed over the years in the world of general purpose computers and their operating systems. It is not C's fault as such.

No one is blaming C. It's the mentality of unlimited that is to blame. And, yes, unlimited is the mentality Mr Beanie! Even in DOS days, there were bloated apps that ran incredibly slowly and that was when they didn't even have the option of swapping.

heater said...
evanh: "After all, using more ram than exists is just asking for trouble."
In a real-time embedded control system almost certainly yes!
For a word processor on my desktop, perhaps I'd rather have it continue slowly, whilst page swapping, than just crash and burn when memory is exhausted. Or have it refuse to make a bigger document for me until I've installed more RAM.

Well, that's either buggy code or dumb code. If a document is treated as fitting in larger space than really exists then the program will inevitably start thrashing. That's dumb.

heater said...
The C language should not get the blame for memory bloat or the infinite memory model mind set. Don't forget the machines it was originally created for were of the same order of capacity as the Prop; speed and space efficiency were important.

Again, it's not the language that is being blamed here. It's the coders and system designers.

Swapping is not meant for one program to abuse, it's meant for the OS to manage a multitasking multiuser environment where there is likely to be lots of programs that are sitting idle.

evanh · 2010-02-24 11:53

As for the future:

I don't see a need for architectural changes within the Cogs other than some new instructions - particularly to support LMM and further virtual machine speed ups. The virtual machine support may eventually even reach to Java level. Not that I think much of Java at this stage but maybe I've been tainted by all that unlimited bloating ... ;P There is no need to compile to native PASM for anything beyond I/O handling and tight streaming functions.

More Hub ram is obvious, and any Prop3 will have masses of it. 16 Cogs still sounds viable to me. Chip has described how he's been getting impressive Hub<->Cogs throughput all at once.

For I/O the serializer/deserializer (existing video circuitry gets revamped for this?) will be the big winner after the ADCs/DACs. Don't need much else. Level shifting would be nice.

Graham Stabler · 2010-02-24 11:58

I may be completely off the mark because to be honest I don't understand at least every other thing being said here, but it seems that everyone has jumped on Chip's memory model dig and ignored the subsequent sentence, which I think was the real point he was making.

Infinite memory model meaning simply that within reason you know you have one large chunk of memory to use. Some programs will use it more wisely than others, but that is not the point when you are comparing it to the Propeller and its cog based system.

Is the C mindset also that within multi-core programming you should be able to just write code and the compiler will sort out the details of using multiple processors?

Graham

heater · 2010-02-24 12:12

evanh: I agree but:

"No one is blaming C." - Yes they are, Chip said "The inescapable problem of doing C on the Propeller is that the whole culture of C is rooted in infinite-memory-model thinking"

I was interpreting that as clearly linking C and "infinite memory". I was just pointing out that it is not quite so. Many people are using C on 8 bit PICs and AVR's for tiny little programs they'd like to throw together as quickly as possible.

Anyway I think we agree here.

"Swapping is not meant for one program to abuse,"

I agree. But I would say that it makes no difference if it is one user and one program that is not fitting in the space available or many users and many programs. As far as the user is concerned the result is the same: eventually there is a catastrophic failure as RAM runs out, or the system suddenly refuses to do what you want for the lack of space to work in. Virtual Memory at least provides for more "gradual degradation" in that situation.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

evanh · 2010-02-24 12:13

As a virtual machine (Spin and LMM for examples) the Hub ram becomes main memory. Cog ram vanishes to become registers at best. So, you now have flat Hub-space addressing. Prop2 is extending Hub addresses from 16 bit to 32 bit I believe.

As for making use of multiple cores at once, the best approach has always been "like a production line". You need more than just the compiler to sort it out, it's a whole model, not just a language.
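To make the "production line" idea concrete, here is a rough Python sketch. Threads stand in for cogs and queues stand in for whatever mailbox you would set up in hub RAM; the names `stage`, `run_line` and the three toy stations are all invented for the example, and nothing here is Propeller-specific.

```python
import queue
import threading

STOP = object()  # sentinel that tells a station to shut down


def stage(func, inbox, outbox):
    """Run one station of the line: take an item, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next station
            return
        outbox.put(func(item))


def run_line(funcs, items):
    """Chain one thread per function and push `items` through in order."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for w in workers:
        w.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for w in workers:
        w.join()
    return results


# Hypothetical three-station line: scale, offset, clamp.
line = [lambda x: x * 2, lambda x: x + 1, lambda x: min(x, 6)]
results = run_line(line, [1, 2, 3, 4])
```

Each station only knows its inbox and outbox, which is the "whole model" part: the partitioning is a design decision the programmer makes, not something the compiler discovers.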

evanh · 2010-02-24 12:16

heater said...
As far as the user is concerned the result is the same: eventually there is a catastrophic failure as RAM runs out, or the system suddenly refuses to do what you want for the lack of space to work in. Virtual Memory at least provides for more "gradual degradation" in that situation.

It should never be catastrophic. Only the buggy programs fail on low memory.

As far as the user is concerned, thrashing *is* catastrophic!

Rayman · 2010-02-24 15:43

Wow, this thread is outa control! Don't want to miss my chance to pile on though... :)

Mike Green said...
Well, the LPC1114 is pretty cheap as are most commodity microcontrollers. With
I understand, but I'm still amazed at these discussions. The Propeller is a very different concept from any of the "competition". While the ARM Cortex-M0 is nice, it's not very different from PICs, AVRs, etc. in that you have a processor core (with flash memory and SRAM) surrounded by special purpose peripheral processors. There are different processor cores, each with its own advantages and disadvantages, but they're not very different from each other.

I think the Cortex is a whole different class of MCU... The Propeller is perfect for small applications that either need TV/VGA output and/or time-critical control (which I suppose TV/VGA is a subset of). But, I wouldn't want to make a PDA with a Propeller... The ARM chips are 1000X better for that.

Also, regarding C. I think it's not a big deal since existing C codebase is mostly useless for Propeller apps. But, with Prop 2, I think the situation will be very different and it will be very nice to use existing C code for things like png/jpg images and mp3 audio and so forth.

Also, I'm not sure I agree to the points about multiple small cores being an issue for C. Somehow NVidia made a C library for CUDA, which uses the hundreds of little cores in my GPU to speed up apps... So, I think multiple cores can be handled by C.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
My Prop Info&Apps: http://www.rayslogic.com/propeller/propeller.htm

My Prop Products: http://www.rayslogic.com/Propeller/Products/Products.htm

kwinn · 2010-02-24 16:16

@ Rayman, re "I'm not sure I agree to the points about multiple small cores being an issue for C". You are probably correct that C can be used for multiple small cores since it was initially designed to be an assembly language replacement, but that does not necessarily make it the best choice for the propeller approach. As far as I understand the GPU architecture they are basically a "pipeline" of small cores that operate simultaneously and pass the data from one to the next whereas the Prop is a parallel architecture with a somewhat limited core to core communications capability.

It is simpler to use a sequential execution language like C on multiple cores that process data and pass it down the pipe than it would be to use it on truly parallel cores with limited core to core communications. Not saying it can't be done. The standard C library has provided a great deal of power and flexibility to the language and can probably be adapted to the Propeller (1 and 2). Being able to use existing C code for png/jpg and mp3 as you suggest would be nice, but I would also like to see something with a better fit as well.

Graham Stabler · 2010-02-24 16:23

Rayman, Cortex is different in performance but similar in operation. It would be hard to deny that the Propeller is very different in operation to PICs/AVRs, which is perhaps why C is not the best fit. But will all future micro development be limited to making it nice for C? In 50 years will it still all be C?

You access CUDA based GPUs via APIs/libraries, which seems different to running native code on it. You can also use Fortran, Java, Python, and the Microsoft .NET Framework.

Graham

Dave Hein · 2010-02-24 17:17

I view the COG memory as 512 32-bit registers. The COG's processor basically executes instructions only from its registers. In theory, the processor could be extended to directly execute from HUB memory as well. The program counter would need to be increased from 9 bits to 13 bits or more. Addresses below 512 would refer to the COG memory, and addresses of 512 and above would be interpreted as HUB addresses. HUB word addresses less than 512 (2048 bytes) would only be used for data, and not instructions. The instruction fetcher would have to be modified to essentially do a rdlong into the instruction register. This should allow the cog to execute a hub instruction every 16 cycles.
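The address decode described above can be sketched as a toy model. This is only an illustration of the proposal, not anything the chip does; `fetch`, `cog_ram` and `hub_ram` are names invented for the sketch, and hub memory is modelled as a list of longs so that a 13-bit long address covers the Prop 1's 32 KB.

```python
COG_WORDS = 512          # cog RAM: 512 longs, reachable with the 9-bit PC
HUB_LONGS = 1 << 13      # 13-bit long address -> 8192 longs = 32 KB


def fetch(pc, cog_ram, hub_ram):
    """Return (instruction, source) for a long-addressed program counter.

    Addresses below 512 read cog RAM; 512 and above are treated as hub
    long addresses, mimicking a rdlong into the instruction register.
    """
    if not 0 <= pc < HUB_LONGS:
        raise ValueError("pc outside the 13-bit range")
    if pc < COG_WORDS:
        return cog_ram[pc], "cog"
    return hub_ram[pc], "hub"


# Made-up contents so the two regions are easy to tell apart.
cog_ram = list(range(COG_WORDS))
hub_ram = [0x1000 + i for i in range(HUB_LONGS)]
low = fetch(10, cog_ram, hub_ram)
high = fetch(512, cog_ram, hub_ram)
```

The first 512 longs of `hub_ram` are never fetched as instructions in this model, which matches the point that the low 2048 bytes of hub would be data only.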

The program counter could be increased by adding a bank register. Anybody who's programmed the SX knows this is messy, but it works. Long jumps can be done by setting the bank register with the upper bits of the target address, and then jumping to the address in the lower bits. In addition, special jump instructions could be added that use both the source1 and source2 address fields to provide up to 18 bits for the jump address.
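The bank-register split is just a shift and a mask. A small sketch, assuming the low 9 bits stay in the existing PC and everything above goes to the bank register (the helper names are hypothetical):

```python
PC_BITS = 9                      # width of the existing cog program counter
PC_MASK = (1 << PC_BITS) - 1


def split_target(addr: int):
    """Split a long address into (bank, offset) for a banked long jump."""
    return addr >> PC_BITS, addr & PC_MASK


def join_target(bank: int, offset: int) -> int:
    """Rebuild the full address the fetcher would see after the jump."""
    return (bank << PC_BITS) | offset


bank, offset = split_target(0x1234)
```

Two 9-bit fields side by side give the 18 bits mentioned for the special jump instructions, so `join_target` with a 9-bit bank covers that whole range.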

Is it too late to get this into the Prop II?

Bill Henning · 2010-02-24 17:48

Prop II reduces the hub window to 8 cycles, and can handle 6 pasm instructions (in addition to RDLONG or WRLONG) in each window... so there is no need for the execute mode, especially with the indexed autoincrement code!

I don't know exactly what syntax pasm will use, so the following example is sure to change, but here is an 8 cycle LMM inner engine:

next     rdlong inst,[pc]++
         nop              ' could be used to count execution or something - spare delay slot to allow for executing inst
inst     nop
         jmp #next

So any simple LMM instruction would execute in 8 cycles - 20Mips - meaning LMM on Prop2 will be as fast as PASM on Prop1 for simple instructions.
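A quick sanity check of those numbers, plus a toy model of what the fetch loop does. The 160 MHz Prop2 clock is my assumption (it is what makes 8 cycles come out at 20 MIPS); 80 MHz and 4 cycles are the usual Prop1 figures. The tuple "instructions" are invented for the sketch and are not PASM.

```python
def mips(clock_hz: float, cycles_per_instruction: int) -> float:
    """Instruction rate in millions per second."""
    return clock_hz / cycles_per_instruction / 1e6


def run_lmm(hub, max_steps=1000):
    """Toy LMM loop: fetch from hub at pc, post-increment, then execute.

    `hub` is a list of (opcode, operand) tuples made up for this sketch.
    """
    pc, acc, steps = 0, 0, 0
    while steps < max_steps:      # each pass is one trip round "jmp #next"
        op, arg = hub[pc]         # rdlong inst,[pc]++
        pc += 1
        steps += 1
        if op == "add":           # the fetched instruction runs in the inst slot
            acc += arg
        elif op == "fjmp":        # a far jump simply reloads pc
            pc = arg
        elif op == "halt":
            break
    return acc, steps


program = [("add", 5), ("fjmp", 3), ("add", 100), ("add", 2), ("halt", 0)]
acc, steps = run_lmm(program)
prop2_lmm = mips(160e6, 8)   # assumed 160 MHz clock, 8-cycle hub window
prop1_pasm = mips(80e6, 4)   # 80 MHz, 4 cycles per simple instruction
```

The far jump skips the `("add", 100)` entry, so only two adds run before the halt.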

A long jump would also execute in 16 cycles (rdlong pc,pc)

A long call would take 24 cycles with a hub based stack, and I suspect 16 cycles with a cog based stack (maybe in the 128 extra fifo longs) depending on how INDIR works.

fcall    wrlong pc,--[sp]
         rdlong addr,newaddr
         jmp #next

A far return would take 16 cycles with a hub based stack, 8 with a cog based stack.

Basically with what Chip has disclosed about the Prop2, I've started a "paper" design for LMM2 - and it will be FAST, even without the ideal instruction for LMM (described below):

next     EXEC inst,[pc]++
         nop              ' delay slot, gets executed BEFORE inst
inst     nop
         ' no jump, because EXEC loops implicitly after executing inst (unless INST does a jump or call)

Exec would basically give us a "free" Pasm instruction to execute

Now if every fetch EXEC did was a double long, but only incremented PC by one long, it would allow for 8 cycle FJMP, FCALL, and load constant. Combined with using the FIFO 128 longs for a return stack, it would allow for a sustained 20MIPS, including far jumps, calls, 32 bit constant loads, and returns.

next    DEXEC   inst,[pc]++     ' double-long fetching EXEC
        nop             ' delay slot, gets executed BEFORE inst
inst    nop             ' instruction placed here by double-long fetch
c32     nop             ' constant 32 placed here by double-long fetch, can be ignored if not needed, or can be 32 bit data or address
        ' no jump, because EXEC loops implicitly after executing inst (unless INST does a jump or call)

No need for an FJMP kernel primitive, because FJMP becomes:

mov pc,c32

No need for an FRET kernel primitive, because it becomes

mov pc,INDIR--

FCALL is still needed, but fits within 8 cycle window:

mov ++INDIR,pc
mov pc,c32
jmp #next

Basically, DEXEC as described would allow a fully deterministic 20MIPS 8 cycle per instruction LMM mode.
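
Here is a toy model of the DEXEC idea, covering only the fetch step and FJMP (not FCALL or FRET). It is a sketch under assumptions: hub addresses are long indices rather than byte addresses, opcodes are plain strings, and `"halt"` is a hypothetical marker added only so the loop terminates.

```python
CYCLES_PER_FETCH = 8  # one hub window per DEXEC fetch, per the post

def run_dexec(hub, pc=0, max_steps=100):
    """Step a toy DEXEC loop; return (addresses fetched from, total cycles)."""
    fetched, cycles = [], 0
    for _ in range(max_steps):
        inst, c32 = hub[pc], hub[pc + 1]  # double-long fetch: inst plus the next long
        fetched.append(pc)
        pc += 1                           # PC advances by one long only
        cycles += CYCLES_PER_FETCH
        if inst == "halt":                # hypothetical marker so the toy stops
            break
        if inst == "fjmp":                # FJMP is just "mov pc,c32"
            pc = c32
        # any other value stands for a simple op; c32 is ignored
    return fetched, cycles

# Long-indexed hub image: op, fjmp, its target (4), a skipped long, op, halt, padding
hub = ["op", "fjmp", 4, "skipped", "op", "halt", 0]

print(run_dexec(hub))  # ([0, 1, 4, 5], 32)
```

Every step, including the far jump, costs one 8-cycle fetch, which is the point of the double-long fetch.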

If a return stack larger than 128 levels is needed, add 8 cycles to FCALL and FRET - a slight slowdown, but again, still deterministic.


▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com 5.0" VGA LCD in stock!
Morpheus dual Prop SBC w/ 512KB kit $119.95, Mem+2MB memory/IO kit $89.95, both kits $189.95 SerPlug $9.95
Propteus and Proteus for Propeller prototyping 6.250MHz custom Crystals run Propellers at 100MHz
Las - Large model assembler Largos - upcoming nano operating system

Post Edited (Bill Henning) : 2/24/2010 6:01:52 PM GMT

Dave Hein · 2010-02-24 18:54

Bill,

That sounds great! You have obviously looked into this with a lot of detail. With the ability to execute 6 instructions for every hub read it would be tempting to implement pseudo-ops. However, it would be hard to decode and execute a pseudo-op in such a small loop. It might be possible to build a pseudo-op interpreter that has the hub loads interspersed in it.

As an example, a 32-bit multiply can be implemented with 4 16-bit multiply instructions and some shifts and adds. It might be possible to implement a 32-bit multiply pseudo-op in the same amount of time it takes to do other simple operations. There are several DSP-type instructions that could be implemented using pseudo-ops. This could make an LMM implementation almost as fast as a PASM implementation for certain DSP functions.
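
A minimal sketch of that decomposition, assuming an unsigned 16x16 -> 32-bit hardware multiply (modelled here by the hypothetical helper `mul16`):

```python
MASK16 = 0xFFFF

def mul16(a, b):
    """Stand-in for a hardware 16x16 -> 32-bit unsigned multiply."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b

def mul32(a, b):
    """Unsigned 32x32 -> 64-bit product built from four 16-bit multiplies."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    lo = mul16(a_lo, b_lo)                       # bits 0..31
    mid = mul16(a_lo, b_hi) + mul16(a_hi, b_lo)  # cross terms, weighted by 2**16
    hi = mul16(a_hi, b_hi)                       # weighted by 2**32
    return lo + (mid << 16) + (hi << 32)

print(hex(mul32(0xFFFFFFFF, 0xFFFFFFFF)))  # 0xfffffffe00000001
```

If only the low 32 bits of the result are needed, the `hi` term drops out and three multiplies are enough.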

Dave

Bill Henning · 2010-02-24 19:27

Dave,

Thanks... I've been following every post having to do with PropII from the beginning, making suggestions as time goes by... Chip liked some of them, and did not like some others. I've been designing LMM2 from the beginning, adjusting the design as details are released.

There is no need for a pseudo-op interpreter, just a regular pasm "jmp #pseudoop", with the pseudo-op terminating with "jmp #next", or perhaps "jmp pseudo-vector" if we want to allow position-independent (in the cog) pseudo-ops, providing some independence from fixed pseudo-op addresses in exchange for a jump table.
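
The jump-table trade-off can be sketched like this. Everything here is hypothetical: the two pseudo-ops, the cog addresses, and the register names are made up to show that LMM code calling through a vector index keeps working when the kernel moves its handlers.

```python
def sat_add(regs):
    """Hypothetical pseudo-op: r0 = r0 + r1, saturating at 32 bits."""
    regs["r0"] = min(regs["r0"] + regs["r1"], 0xFFFFFFFF)

def swap(regs):
    """Hypothetical pseudo-op: exchange r0 and r1."""
    regs["r0"], regs["r1"] = regs["r1"], regs["r0"]

SAT_ADD, SWAP = 0, 1  # vector indices that the LMM code would use

def dispatch(cog, vectors, index, regs):
    """Indirect jump: look up the handler's cog address, then run it."""
    cog[vectors[index]](regs)

# Two kernel layouts with handlers at different cog addresses;
# only the vector table changes, the indices stay the same.
cog_a, vec_a = {0x100: sat_add, 0x110: swap}, [0x100, 0x110]
cog_b, vec_b = {0x180: swap, 0x1A0: sat_add}, [0x1A0, 0x180]

regs = {"r0": 0xFFFFFFF0, "r1": 0x20}
dispatch(cog_a, vec_a, SAT_ADD, regs)
print(hex(regs["r0"]))  # 0xffffffff
```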

Pseudo-ops that only take three instructions to implement (in addition to "jmp #next") would fit in the 8 cycle window.

I do fear that it is too late for Chip to implement DEXEC, or that it won't fit his vision - or it may be too difficult to implement - even though it would allow deterministic 20MIPS LMM, and would make C and other conventional languages quite viable on the Prop2.

Bill



jazzed · 2010-02-24 19:57

Bill Henning said...
...
I don't know exactly what syntax pasm will use, so the following example is sure to change, but here is an 8 cycle LMM inner engine:
next    rdlong  inst,[pc]++
        nop             ' could be used to count execution or something - spare delay slot to allow for executing inst
inst    nop
        jmp     #next

What other possibilities does your post-increment imply? Are you suggesting that [pc++] would change the address referenced by pc rather than the data, as [pc]++ would? Is that possible without an extra instruction? Would the current convention work? I.e. #pc++ and pc++

Assuming Propeller II uses a 4 port RAM and executes 1 instruction per clock in most cases (no pipeline), why would we need an instruction delay slot for a pipeline?

Bill Henning · 2010-02-24 20:11

As I said, I was guessing at the syntax :)

Here is the C equivalent:

long *pc;

inst = *pc++;   /* read the long at pc, then advance pc to the next long */

I am assuming that due to pipelining and the way the cog memory works on Prop1 that a delay slot will still be required for "self modifying code".



jazzed · 2010-02-24 20:24

Bill Henning said...
As I said, I was guessing at the syntax :)

Of course :) All we can do is guess :) Yes, C *pc++ is the same as [pc++] in Spin assuming a byte pointer.
The question to me is why would you need [pc]++? If [pc++] is not possible, does [pc]++ make sense?
I've not seen you *accidentally* post anything before :)

Bill Henning · 2010-02-24 20:27

Actually, I did mean [pc++] ... I blame a lack of coffee :)
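
For anyone following along, the difference between the two spellings can be sketched as below. This follows jazzed's reading, where [pc++] advances the pointer (like C's *pc++) and [pc]++ would bump the data it points at (like C's (*pc)++). Hub memory is modelled as a list indexed in longs, which is an assumption of the sketch.

```python
def fetch_post_inc_pointer(hub, pc):
    """[pc++] / C's *pc++: read the long at pc, then advance the pointer."""
    return hub[pc], pc + 1

def fetch_post_inc_data(hub, pc):
    """[pc]++ / C's (*pc)++: read the long at pc, then increment that long in place."""
    value = hub[pc]
    hub[pc] = (value + 1) & 0xFFFFFFFF  # wrap like a 32-bit long
    return value, pc

hub = [10, 20, 30]
print(fetch_post_inc_pointer(hub, 0), hub)  # (10, 1) [10, 20, 30]
print(fetch_post_inc_data(hub, 0), hub)     # (10, 0) [11, 20, 30]
```

Only the first form walks through the program, which is what an LMM fetch loop needs.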


