We'd have to put flops in to separate the hub timing from the instruction timing. It could easily create a critical path, since those flops would have to be mux'd with the live circuits which are already in the critical-path zone.
Everything able to boot can boot from SD these days, so if the PII can boot but not from SD it would be an exception that could possibly keep away some interest. Maybe I misunderstood, but when I read Cluso99's suggestion it didn't seem to need any hiding code in secret places or anything - what's needed is something with only rudimentary FAT filesystem knowledge that can drag in enough from the card to get the rest going (aka bootstrapping..;))
Or even not that. The good old LiLo (Linux Loader) is only a few bytes for example, it doesn't know anything about filesystems. Instead it uses a staged bootstrapping process. The so-called 'master boot record' (MBR) of the storage medium has room for a small bootloader. If that is the LiLo loader then it has been stored there together with some sector numbers. The first-stage bootloader reads the second-stage bootloader from those sectors (or enough of it so that it can read the rest by itself). There's also a 'map' file somewhere with sector numbers for (in Linux' case) the kernel image file. The catch, and it's a small one, is that you need to prepare the floppy/harddisk/SD card with a user-level program which figures out the sector numbers of the files, then updates the MBR with a) the first-stage bootloader, and b) a short list of sector numbers. And you need to re-run that whenever you move around or update those files (as that will usually mean different sector numbers).
Update: Of course it could also be done by having a first-stage bootloader in onboard flash that could then optionally boot from SD. Admittedly this is how some of those tons of small ARM- or MIPS boards get their SD boot capability. But you'll need that flash then..
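The staged scheme described above (a tiny first stage plus a stored list of sector numbers) can be sketched in a few lines. This is only an illustration on an in-memory disk image: the map location `MAP_OFFSET`, the table layout, and the function names `install` and `first_stage` are assumptions made for the example, not LiLo's actual on-disk format.

```python
import struct

SECTOR = 512
MAP_OFFSET = 0x100   # hypothetical location of the sector map inside sector 0


def install(disk, stage2, sectors):
    """Host-side step: store stage 2 in the given sectors and record them in sector 0."""
    assert len(sectors) * SECTOR >= len(stage2)
    for i, lba in enumerate(sectors):
        chunk = stage2[i * SECTOR:(i + 1) * SECTOR].ljust(SECTOR, b"\0")
        disk[lba * SECTOR:(lba + 1) * SECTOR] = chunk
    # Hypothetical table layout: u32 image length, u16 sector count, then u32 sector numbers.
    header = struct.pack("<IH", len(stage2), len(sectors))
    table = header + struct.pack("<%dI" % len(sectors), *sectors)
    disk[MAP_OFFSET:MAP_OFFSET + len(table)] = table


def first_stage(disk):
    """What the tiny loader does: no filesystem code, it just follows the sector list."""
    length, count = struct.unpack_from("<IH", disk, MAP_OFFSET)
    sectors = struct.unpack_from("<%dI" % count, disk, MAP_OFFSET + 6)
    image = b"".join(bytes(disk[lba * SECTOR:(lba + 1) * SECTOR]) for lba in sectors)
    return image[:length]


disk = bytearray(64 * SECTOR)
stage2 = bytes(range(256)) * 5           # 1280 bytes, so three sectors
install(disk, stage2, [40, 17, 23])      # the sectors need not be contiguous
loaded = first_stage(disk)
```

The `install` step is the "catch" mentioned above: it has to be re-run whenever the second-stage image moves, because the stored sector numbers would otherwise be stale.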
Chip, I have an appreciation of all the factors at play here.
I have concern that this is too specialized and only applicable in .1% of uses. It's better to have a well thought out architectural plan around this, and perhaps with more thought you'd come up with something more imaginative than just per-clock task instructions.
I'm also concerned about the QA aspect, since it would be necessary to rush this, I'm worried that proper exploration and depth of analysis can't be done and it could contribute a fatal flaw that could set you back.
You've already done so much to make the TASKSW really simple and transparent, I don't really feel this extra level of hackery is necessary and would only benefit 1 or 2 applications where the coder REALLY grokked its use and could exploit it to the fullest. The truth is that the audience for this is VERY small.
I'm sure you could run into all sorts of corner cases around this once you got down to brass tacks.
I was driving and thinking today about some architectural improvements that could make it into the P3, and I think you'd really appreciate how straightforward, non-traditional, and simple they are.
For instance, doing a Multi-Instruction-Single-Data would be pretty simple. You load data into a register and apply multiple, possibly different, instructions to the low and high longs independently.
The assembler would look like:
RDLLONG foo, bar
SHL foo, #4 ADD foo, baz
The restrictions to MISD instructions are 1 destination, 1 source, and 1 or 2 immediate values. This keeps the load/store cycles orthogonal.
You can parallel process data in 1 clock.
This opens the door to abuses like this:
That code would simultaneously shift out bits and shift in bits, full duplex. It would be clean to modify SHR and SHL to be able to shift in the Z or C flags.
The premise is that you could issue 2 of the 32bit instructions in parallel, since it would have a 64bit word size. The limitation would be that you can only write to 1 destination (it can be 1 high long or a full long, if the variable is specified twice), 1 source register, the Z and C flags are shared, and immediates are free, since they are part of the instruction anyway.
If you wanted to use the same variable for 2 ops, the way to specify it might be:
ADD foo.h, bar.l, foo.l, bar.h
It would be 1 instruction with the source and destination longs specified. The other way could be:
ADD foo.h, bar.l ADD foo.l, bar.h
In that manner it would resemble the other MISD instructions. I call it MISD, but some might qualify it as MIMD. Some might call the last 2 examples SIMD, since it's a single opcode and multiple data, but it's a single opcode only because the two opcodes are the same, they needn't be the same as long as they work on the same data.
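To make the proposal concrete, here is a small software model of the dual-issue idea: one 64-bit destination, split into low and high longs, with a possibly different 32-bit operation applied to each half in the same step. Everything in it is an assumption for illustration only (the helper names `split`, `join`, and `dual_issue`, and the choice to ignore the shared Z and C flags); it is one reading of the idea, not a description of any real hardware.

```python
MASK32 = 0xFFFFFFFF


def split(value64):
    """Return the (low, high) longs of a 64-bit value."""
    return value64 & MASK32, (value64 >> 32) & MASK32


def join(low, high):
    """Pack two longs back into one 64-bit value."""
    return ((high & MASK32) << 32) | (low & MASK32)


def add(d, s):
    return (d + s) & MASK32


def shl(d, s):
    return (d << s) & MASK32


def dual_issue(dest64, op_low, src_low, op_high, src_high):
    """Apply one op to the low long and another to the high long of the same destination."""
    low, high = split(dest64)
    return join(op_low(low, src_low), op_high(high, src_high))


foo = 0x0000000100000002
bar = 0x0000001000000020
bar_l, bar_h = split(bar)

# Models: ADD foo.h, bar.l   ADD foo.l, bar.h
foo = dual_issue(foo, add, bar_h, add, bar_l)
```

Passing `shl` for one half and `add` for the other models the mixed-opcode case, which is where the MISD versus SIMD naming question above comes from.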
Further, since the memory width would be 64bits, you'd have 256 bits in 4 long-longs, so you could have true SIMD instructions that take the full cache load and apply 1 opcode to long-sized chunks, in 4 cycles, perhaps less cycles depending on the memory architecture. This would be block processing of data and could benefit things like data processing and symmetrical cryptographic algorithms.
Oh, and these 32bit instructions are just freebies that are a subset of the 64bit instructions. Might even open the door to P2 code compatibility by only utilizing the lower 32bits of the long-long.
So, there is a lot more neat stuff that can be accomplished in the next architecture...
Argh, the SD boot thing! For crying out loud, if you want to boot from SD, put a 33 cent SPI flash on there with the "BIOS" and boot from the SD once your "BIOS" has loaded! The SD spec and FAT filesystem support is too much for the space. I draw the line at forcing the customer to repartition an off the shelf SD card to be able to boot from a hidden partition. I already discussed this idea with Chip and dismissed it as too error prone and too limiting. With a "BIOS" you have no such limitation, in fact the Prop 1 does this now with PropGCC.
Ah, I was updating my post at the same time. Yes, you can do it via flash. No, you don't need to have FAT filesystem support. Yes, you need the SD spec. No, you don't need a secret partition, you use what's there already (it's standard).
-Tor
I have concern that this is too specialized and only applicable in .1% of uses. It's better to have a well thought out architectural plan around this, and perhaps with more thought you'd come up with something more imaginative than just per-clock task instructions.
...
You've already done so much to make the TASKSW really simple and transparent, I don't really feel this extra level of hackery is necessary and would only benefit 1 or 2 applications where the coder REALLY grokked its use and could exploit it to the fullest. The truth is that the audience for this is VERY small.
Whoa! That seems a bit on the nose. People will use it if it makes the difference. Ready made driver sets will get used without having the knowledge of how they were built.
Slicing has speed and low latency on its side; how well does the switching instruction compare? What is its lowest latency? What percentage of the MIPS does this tight arrangement consume? I don't know myself. Can someone address this, please?
EDIT: And the finer grained use of the switch instruction will really eat up code space too.
In regards to booting, I think the best way is to stick with the way we are doing things now. Getting the propeller2 to bootstrap from SD is pointless IMO. Not only that but with all the out of spec cards out there *I have found many* this could make debugging difficult. You would not only need the basics to bootstrap off the card, you'd also need a way to handle the errors that would prevent booting.
Let's say you've got four tasks running and each task is coded using only single-clock instructions, so that timing is totally deterministic. This is great, because now you've got four fast tasks that can behave just like hardware. This isn't realistic, though, because at least one of those tasks will need to do RDxxxx/WRxxxx instructions to communicate with the hub, lest there be no memory communication with other cogs. This will destroy determinacy. Why do this if there can't be practical determinacy?
You need to define determinacy here, and clarify WAITxx.
Truckloads of microcontrollers already have jitter of a few cycles on their interrupts, and code lives with that.
Does this impact WAITxx?
- I guess the hold-PC side of WAITxx is done in hardware, away from the pipeline, but it is the feed of the next opcode that could have jitter, right?
Did timer capture make the cut ? - that's how Microcontrollers cope with Opcode jitter, they use hardware.
Because this is optional, if someone really has a task that cannot jitter, they can allocate a whole COG to that.
Plenty of other tasks can tolerate some timing jitter.
E.g. UARTs do not really care if the sample point moves 6.25 ns, even up to around 10 MBaud.
Can anyone make an argument for having this feature, despite its serious flaw?
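The UART figure can be checked with a quick calculation. The sketch below assumes that the 6.25 ns of jitter is one clock period at 160 MHz (the post does not say where the number comes from) and expresses it as a fraction of one bit time; `jitter_fraction` is just a name chosen for the example.

```python
def jitter_fraction(jitter_ns, baud):
    """Jitter expressed as a fraction of one bit period."""
    bit_ns = 1e9 / baud
    return jitter_ns / bit_ns


clock_hz = 160e6                 # assumption: 6.25 ns is one clock at 160 MHz
jitter_ns = 1e9 / clock_hz       # 6.25 ns

at_10_mbaud = jitter_fraction(jitter_ns, 10e6)     # bit period is 100 ns
at_115200 = jitter_fraction(jitter_ns, 115200)     # bit period is about 8.68 us
```

At 10 MBaud the sample point moves by 6.25% of a bit, and at 115200 baud by well under 0.1%, which is consistent with the claim that a UART receiver sampling near mid-bit would not notice.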
The most compelling immediate use I can see for this, is it allows quite serious Debug, almost for free.
A small overclock, and the average throughput would be back where it was.
Now come on guys. If you have an SD card, why do you need a Flash chip to boot it! It is just overcomplicating a hardware design. The P1 already has its detractors in that it needs an eeprom to boot. The SPI Flash chips are SOIC8 atm. I am now designing SOT23 into my prop boards that have SD (microSD). I wish I could avoid them totally.
So, for you supposed experts on SD, we DO NOT REQUIRE PROPER FAT SUPPORT !!! I suggest some of you are smart enough to know this. I am not advocating any risk here... only if flash is not found, does the ROM code look for an SD.
As for SPI Flash access speed vs SPI SD access speed, while I have not bothered to check the specs, I severely doubt that Flash is faster. On the contrary, it is more than likely that SD is faster, because of the smart processor in the SD card. Anyway, this is totally irrelevant. The hardware design will have Flash and/or SD. I expect there will be a lot more P2's requiring file storage, and that will be provided by SD and not Flash. The P2 will be capable of loading code that could switch to 4bit SD mode to go even faster.
I am not advocating a raw data SD card access (as apparently the Raspberry Pi does). I am advocating reading the few sectors that FAT16/32 occupies to locate the sectors where the boot code will be stored. There are two possibilities here... use the older boot sectors reserved for the old DOS boot sectors... or use the first data file.
The SD protocol (SPI access) is quite simple for read only. Apart from the few simple SD commands, IIRC it involves looking at the volume label for FAT then 16 or 32 to determine the offset to get the pointer to the directory, and looking at the first directory entry to get the first file data entry. We can then read up to 32KB, but in reality, we only require a short boot to load a fully functional SD/FAT driver.
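The lookup described above can be sketched as follows for the FAT16 case. It reads the standard BIOS parameter block fields from the volume's boot sector, walks the root directory, and returns the first sector and size of the first regular file. The function name `first_file_location` and the `read_sector` callback are assumptions for the example; FAT32 is not handled, and the file is assumed to be contiguous (as it would normally be when copied first onto a freshly formatted card), so the FAT chain is never followed.

```python
import struct

SECTOR = 512


def first_file_location(read_sector, volume_start=0):
    """Return (first_sector, size_in_bytes) of the first regular file, or None."""
    bpb = read_sector(volume_start)
    # Bytes per sector, sectors per cluster, reserved sectors, FAT count, root entries.
    bps, spc, reserved, nfats, root_entries = struct.unpack_from("<HBHBH", bpb, 11)
    (fat_size,) = struct.unpack_from("<H", bpb, 22)          # sectors per FAT (FAT16)
    root_dir = volume_start + reserved + nfats * fat_size
    root_sectors = (root_entries * 32 + bps - 1) // bps
    first_data = root_dir + root_sectors
    for s in range(root_sectors):
        sec = read_sector(root_dir + s)
        for off in range(0, bps, 32):
            entry = sec[off:off + 32]
            if entry[0] == 0x00:          # end of directory
                return None
            if entry[0] == 0xE5:          # deleted entry
                continue
            if entry[11] & 0x18:          # volume label, long-name entry, or directory
                continue
            cluster, size = struct.unpack_from("<HI", entry, 26)
            return first_data + (cluster - 2) * spc, size
    return None


# Build a tiny synthetic FAT16 volume to exercise the function.
img = bytearray(64 * SECTOR)
struct.pack_into("<HBHBH", img, 11, 512, 4, 1, 2, 16)   # 4 sectors/cluster, 1 reserved, 2 FATs
struct.pack_into("<H", img, 22, 8)                      # 8 sectors per FAT
root = 17 * SECTOR                                      # 1 reserved + 2 * 8 FAT sectors
img[root:root + 11] = b"MYCARD     "
img[root + 11] = 0x08                                   # volume label entry
img[root + 32:root + 43] = b"BOOT    BIN"
img[root + 43] = 0x20                                   # ordinary file
struct.pack_into("<HI", img, root + 32 + 26, 3, 1000)   # starts at cluster 3, 1000 bytes


def read(lba):
    return bytes(img[lba * SECTOR:(lba + 1) * SECTOR])


location = first_file_location(read)
```

On a partitioned card `volume_start` would be the partition's starting sector from the MBR rather than 0. None of this covers the SD command sequence itself, which is a separate concern from the filesystem lookup.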
I have some other critical work to do, so I don't want to waste my time in working out the ROM code to do this if there is no chance it will be used. However, I sincerely think this will be a huge opportunity missed. And if the ROM code fails, no big loss.
I have concern that this is too specialized and only applicable in .1% of uses. It's better to have a well thought out architectural plan around this, and perhaps with more thought you'd come up with something more imaginative than just per-clock task instructions.
I'm also concerned about the QA aspect, since it would be necessary to rush this, I'm worried that proper exploration and depth of analysis can't be done and it could contribute a fatal flaw that could set you back.
Valid observations, but this is an optional feature, right ?
In dormant mode, it is a low risk change.
If it enabled useful Real Time Debug, that covers a lot more than 0.1% of users.
So, for you supposed experts on SD, we DO NOT REQUIRE PROPER FAT SUPPORT !!! I suggest some of you are smart enough to know this. I am not advocating any risk here... only if flash is not found, does the ROM code look for an SD.
My understanding of FPGA booting, is they are quite position-tolerant, by way of using a distinct start-preamble.
ie I think you can relax more complex access, and simply stream past the invalid stuff.
This is not risk free, and might get complex in a design with multiple PGM blocks, but I guess only one would be ROM launched ?
Other Pgm Blocks could have slightly different preambles.
You need to define determinacy here, and clarify WAITxx.
Truckloads of microcontrollers already have jitter of a few cycles on their interrupts, and code lives with that.
Does this impact WAITxx?
Lowest level bit-bashing is the primary goal here. To allow drivers to have multiple tasks or have multiple drivers along side each other. Basically, the Cogs in the Prop2 are more capable, and a lot bigger/beefier, so using a whole Cog to get the timing right but spending most of that time waiting is not effective use of that much hardware.
The WAIT issue is resolved in that a two instruction polling loop is close enough.
The current road block is that, with the smallest implementation of slicing, hub instructions stall the singular pipeline and thereby introduce accumulated delays into the shared hardware threads.
EDIT: As for why this is so. I've come to understand that it's because Chip is offering this as a late addition as long as it doesn't affect the "critical path". The ideal version would work beautifully but requires too deep a re-engineering at this late stage. Hence Pedward's no-go input.
So, determinacy is going to suffer with this slicing implementation, end of story. Even using it for debugging is going to have the same result.
The question is, given it's easy to add and doesn't matter if it's not used, is there a useful application for it like this?
... I am not advocating any risk here... only if flash is not found, does the ROM code look for an SD.
... I am not advocating a raw data SD card access (as apparently the Raspberry Pi does). I am advocating reading the few sectors that FAT16/32 occupies to locate the sectors where the boot code will be stored. There are two possibilities here... use the older boot sectors reserved for the old DOS boot sectors... or use the first data file.
As far as I've seen sectors 1 (second sector) through 62 are not used in MBR partitioning schemes. For some odd reason sector 63 is sometimes the first block allocated after the MBR itself.
However using unallocated space is usually unwise. If repartitioning is considered a pain then I would go with the first file idea. At least this way it can be managed with ordinary file privileges by deleting everything else on the SD card before writing it.
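For reference, the gap being discussed can be read straight out of the MBR: each of the four 16-byte partition entries starts at offset 446 and holds its starting LBA at byte 8, with the `55 AA` signature at offset 510. The sketch below uses a synthetic MBR whose first partition starts at sector 63, as in the observation above; `first_partition_start` is a name chosen for the example.

```python
import struct


def first_partition_start(mbr):
    """Return the lowest starting LBA among the used MBR partition entries, or None."""
    if bytes(mbr[510:512]) != b"\x55\xaa":
        raise ValueError("no MBR signature")
    starts = []
    for i in range(4):
        base = 446 + 16 * i
        ptype = mbr[base + 4]                       # partition type, 0 means unused
        (lba,) = struct.unpack_from("<I", mbr, base + 8)
        if ptype != 0:
            starts.append(lba)
    return min(starts) if starts else None


mbr = bytearray(512)
mbr[510:512] = b"\x55\xaa"
mbr[446 + 4] = 0x06                                 # FAT16 partition type
struct.pack_into("<I", mbr, 446 + 8, 63)            # partition begins at sector 63

start = first_partition_start(mbr)
gap = range(1, start)                               # sectors between the MBR and the partition
```

The size of that gap depends entirely on the tool that partitioned the card, which is one more reason the first-file approach is easier to reason about than relying on unallocated sectors.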
The advantage to using the first file is that no special software is required to get the boot file onto the SD card. Just reformat a card and then copy the sd boot file to the card. Once the boot code is perfected, it should never require changing, unless encryption is used, in which case the encryption program modifies the file first.
The boot code is responsible for being able to access the file system on the SD card and then load the user program.
Despite some still-unresolved issues, I'd prefer to see the PropTool continue for the time being -- at least until SimpleIDE has a non-project-oriented mode available.
-Phil
Our cautious goal is to make SimpleIDE the official Parallax tool for programming Propellers, but only if it has enough of the features required (like your non-project-oriented mode) to be a qualified replacement. In keeping with this approach, Steve has been working closely with Daniel, Andy and forum members to make these improvements. Some improvements are a concern, like how the terminal operates across different platforms. The complete PST capability in Windows has proven to be a challenge on the Mac, for example. But the gains of having even a limited terminal are very significant: open-source, multi-platform and multi-lingual. These are absolutely necessary improvements outside of the USA. Thankfully, no professor cornered me at the presentations at Taiwan National University with the questions they really wanted to ask: can it support traditional Chinese comments and menus, is it open-source, etc? These things are a real show-stopper for Parallax, stopping us at the door of prospective customers faster than you could imagine. Whether or not SimpleIDE can overcome some of the challenges remains to be seen, but we're proceeding as if it will be a success.
The second tool we wish to have is a more full-featured version from Eclipse, like we started [and abandoned] last year as the costs went out of control beyond initial estimates.
Roy pointed out that time spent by Jeff towards an improved Propeller Tool around Propeller 2 is not a wise use of our time. That's correct - it's not. Continuing to engineer around the closed-source edit control we've been using for years would be a mistake. And please, nobody ask me if you could re-write the source for that section. We're still stuck in Delphi and last time I really looked it wasn't going to do much for us except allow other developers to work in our code base. The best future for the Propeller Tool ends up being some kind of externally driven life-support program like we had for the SX-Key IDE - anybody remember Peter Montgomery's effort to keep Chip's original code base going for so many years? As far as Jeff is concerned, he's a tremendous asset to our customers and all of us would benefit from anything else he contributes to Propeller 2, such as examples and documentation.
As for gabbing with Chip about single-clock-granular task switching: talk is cheap, for now. If for some reason Chip wants to implement such a feature he will have our support to see that the business can support an elongated design cycle [at almost any cost], but I can say with confidence that further changes don't have a cost-benefit ratio in anybody's favor. I could start a business-oriented thread about why it's a bad idea to continue design additions but nobody would read it because we primarily share technical interests on these forums, never mind that it's just not appropriate to do that anyway.
I think I might have to move in to Chip's house if this discussion goes much further. Today is Sunday so he'll take a break from this thread. Bootloader discussions only [or anything else that doesn't require more hardware design] for you guys.
The current road block is that, with the smallest implementation of slicing, hub instructions stall the singular pipeline and thereby introduce accumulated delays into the shared hardware threads.
...
The question is, given it's easy to add and doesn't matter if it's not used, is there a useful application for it like this?
The below comments could offer hope for finer grained applications than what TASKSW is good at.
I forgot that another advantage of WAITCNT over polling is removing timing jitter when bit-banging a serial protocol.
If a thread accessing the HUB causes jitter in its friends, a WAITCNT would help combat it.
You could square up the timing with a WAITCNT now and then, but it's probably not worth doing.
I'm keen to try a polling of the counter as a workaround for maintaining timing where it's important.
We'll never know if the slicing feature doesn't exist.
Now come on guys. If you have an SD card, why do you need a Flash chip to boot it! It is just overcomplicating a hardware design. The P1 already has its detractors in that it needs an eeprom to boot. The SPI Flash chips are SOIC8 atm. I am now designing SOT23 into my prop boards that have SD (microSD). I wish I could avoid them totally.
I do understand what you mean. I don't necessarily LIKE the idea of using the eeprom, but there are a few benefits IMO. I like the idea of eeprom since it allows you to configure the boot. First file is fine and all, but reformatting an SD card and copying everything back to it seems like quite a bit of work. Especially with large file systems. The file COULD be edited in place, but would it need to remain the same size or smaller? With the eeprom, you just modify the file name and re-program the eeprom. The propeller COULD do this all by itself.
Using a special partition is something I've thought about and decided I REALLY don't like the idea. IF there was a tool to do this, transparently, *I know there are several that could do this* it could be an option. Hiding anything in unallocated space is a VERY bad idea IMO.
Once again, this is just MY opinion. Everyone is entitled to their own. There will always be detractors, they will always have things to point to. I'm sure the complaint will change from "booting from SD" to "lack of interrupts" if the P2 can boot from SD.
For SPI flash this is true. Every SPI flash on the market today supports the 0x03 read instruction, which lets you transfer a byte per 8 clocks at up to 20 MHz. This is in fact much, much faster than a typical SD card because there is no delay in reading any byte. Additionally, there is usually (but not always) a fast read instruction, 0x0B, that will let you transfer a byte every 8 clocks at around 50 MHz or so. Finally, for specialized quad-SPI access there are vendor-specific functions which allow up to 4-bit parallel reads at 100 MHz.
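Those clock figures translate into raw transfer rates as follows. This is a back-of-the-envelope sketch only: it ignores the command, address, and dummy bytes at the start of each read, and the 32 KB image size is borrowed from the boot-size figure mentioned earlier in the thread. The function names are chosen for the example.

```python
def bytes_per_second(clock_hz, data_lines=1):
    """Raw throughput: one bit per data line per clock."""
    return clock_hz * data_lines / 8


def load_time_ms(size_bytes, clock_hz, data_lines=1):
    """Time to stream size_bytes at the raw rate, in milliseconds."""
    return 1000 * size_bytes / bytes_per_second(clock_hz, data_lines)


modes = {
    "read 0x03 at 20 MHz": (20e6, 1),
    "fast read 0x0B at 50 MHz": (50e6, 1),
    "quad read at 100 MHz": (100e6, 4),
}

for name, (clock_hz, lines) in modes.items():
    rate = bytes_per_second(clock_hz, lines) / 1e6
    ms = load_time_ms(32 * 1024, clock_hz, lines)
    print(f"{name}: {rate:.2f} MB/s, 32 KB in {ms:.2f} ms")
```

Even the slowest mode moves 2.5 MB/s, so a 32 KB image streams in roughly 13 ms before any per-read overhead is counted.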
Now, making a truly compatible SD card loader that works with everything under the sun out there is difficult. Parallax does not need to provide FAE support to people looking for the right SD card to use.
PC boot loaders have smaller code because they do not have to actually perform the low-level read and write operations for the SD card. Please understand that it takes more than half of a cog's register space just to implement an SD card controller. Even then, this functionality does not work with every SD card. It might be worth pursuing if it worked with every SD card. But this is not the case. If Parallax had access to the code for the operations needed to support every SD card under the sun, this could be a plausible idea. But they do not. The feature cannot be implemented if it is not universally compatible.
FAT support and booting after implementing a working SD card controller is only a second thought. If universal compatibility cannot be achieved then there is no point. Right now, both FSRW and the FATEngine do not have a perfect track record with SD cards, even though both of these drivers implement the basic SD card spec.
In truth, there is more stuff I could have done for the FATEngine to support more SD cards. But, I would have spent the entire cog image space handling reading in large registers from the SD card and adjusting device and operational parameters for each SD card. This is hard to do with limited space.
Ken,
Regarding SimpleIDE, I will do what I can to help and support Steve on this effort. Regarding Eclipse, it looks like I will probably be using it as part of my day job soon, so I may be able to contribute on that front also.
Something that might want to be on your list which is hardware, but not inside the chip, is the reference design for a board or two. I think the initial board should be something on the simpler side like the QuickStart (and I would have one connection on it be the same as the 40 pin one on the existing QuickStart) although I think it's vital to have an SDRAM chip on board.
Yes, a vital omission from the task list.
As there is no DIP version of the Prop Two I would like to see a very minimal board at launch. Basically just a break out board but with the absolute minimum of support circuitry on board. Regulator, EEPROM, XTAL and such.
Perhaps with a big DIP foot print so we can stuff it into bread boards or strip boards, proto boards and such.
Ken,
Regarding SimpleIDE, I will do what I can to help and support Steve on this effort. Regarding Eclipse, it looks like I will probably be using it as part of my day job soon, so I may be able to contribute on that front also.
Something that might want to be on your list which is hardware, but not inside the chip, is the reference design for a board or two. I think the initial board should be something on the simpler side like the QuickStart (and I would have one connection on it be the same as the 40 pin one on the existing QuickStart) although I think it's vital to have an SDRAM chip on board.
Roy
Thanks Roy - that list was just a quick one off the top of my head. Reference design is another item that requires engineering and manufacturing.
This time, we won't be producing scads of boards. We'll offer one initially, maybe two long term, and hope that the community and other developers can build a business around hardware.
I have an experimental PCB for the Propeller II almost finished. The only thing it still needs is to know which pins will be used for Flash and Serial, and optionally VGA.
Excellent, Sapieha. This time around we hope to provide early release information to PCB designers such as yourself and Bill (and others) so you can have hardware ready from the beginning. This will give everybody a much greater opportunity to sell them successfully.
Yes, a vital omission from the task list.
As there is no DIP version of the Prop Two I would like to see a very minimal board at launch. Basically just a break out board but with the absolute minimum of support circuitry on board. Regulator, EEPROM, XTAL and such.
Perhaps with a big DIP foot print so we can stuff it into bread boards or strip boards, proto boards and such.
Heater, I think the "minimal DIP package" thing is something better left to the community guys to make since it's primarily for the Hobbyists. I think Parallax needs to focus on a board like the QuickStart maybe just a smidge more on it because of the Prop2's capabilities, and then maybe eventually a Prop2 BOE.
A revised sampling of what lies ahead in no particular order (based on reply from Roy and Perry):
Spin interpreter and C kernel, plus the design team for the latter
GCC port to P2
As you're aware of from my email, I've already volunteered to be part of these two tasks as well as handling any updates that need to be made to propeller-load for P2.
I'm a bit too tired to follow this, but are we seeing that a Prop II can fetch code into COG from SDRAM as data and then execute that code in COG? Like a current XMM solution but without going through a cache driver COG and HUB RAM.
This sounds major.
Has this question been answered? David Betz asked the same thing. (both on page 4)
Maybe -- Maybe not.
If it had its own separate state machine running in the background of the COG's other instructions, then maybe it could work.
Or even not that. The good old LiLo (Linux Loader) is only a few bytes for example, it doesn't know anything about filesystems. Instead it uses a staged bootstrapping process. The so-called 'master boot record' (MBR) of the storage medium has room for a small bootloader. If that is the LiLo loader then it has been stored there together with some sector numbers. The first-stage bootloader reads the second-stage bootloader from those sectors (or enough of it so that it can read the rest by itself). There's also a 'map' file somewhere with sector numbers for (in Linux' case) the kernel image file. The catch, and it's a small one, is that you need to prepare the floppy/harddisk/SD card with a user-level program which figures out the sector numbers of the files, then updates the MBR with a) the first-stage bootloader, and b) a short list of sector numbers. And you need to re-run that whenever you move around or update those files (as that will usually mean different sector numbers).
Update: Of course it could also be done by having a first-stage bootloader in onboard flash that could then optionally boot from SD. Admittedly this is how some of those tons of small ARM- or MIPS boards get their SD boot capability. But you'll need that flash then..
-Tor
I have concern that this is too specialized and only applicable in .1% of uses. It's better to have a well thought out architectural plan around this, and perhaps with more thought you'd come up with something more imaginative than just per-clock task instructions.
I'm also concerned about the QA aspect, since it would be necessary to rush this, I'm worried that proper exploration and depth of analysis can't be done and it could contribute a fatal flaw that could set you back.
You've already done so much to make the TASKSW really simple and transparent, I don't really feel this extra level of hackery is necessary and would only benefit 1 or 2 applications where the coder REALLY grokked it's use and could exploit it to the fullest. The truth is that the audience for this is VERY small.
I'm sure you could run into all sorts of corner cases around this once you got down to brass tacks.
I was driving and thinking today about some architectural improvements that could make it into the P3, and I think you'd really appreciate how straightforward, non-traditional, and simple they are.
For instance, doing a Multi-Instruction-Single-Data would be pretty simple. You load data into a register and apply multiple, possibly different, instructions to the low and high longs independently.
The assembler would look like:
The restrictions to MISD instructions are 1 destination, 1 source, and 1 or 2 immediate values. This keeps the load/store cycles orthogonal.
You can parallel process data in 1 clock.
This opens the door to abuses like this:
That code would simultaneously shift out bits and shift in bits, full duplex. It would be clean to modify SHR and SHL to be able to shift in the Z or C flags.
The premise is that you could issue 2 of the 32bit instructions in parallel, since it would have a 64bit word size. The limitation would be that you can only write to 1 destination (it can be 1 high long or a full long, if the variable is specified twice), 1 source register, the Z and C flags are shared, and immediates are free, since they are part of the instruction anyway.
If you wanted to use the same variable for 2 ops, the way to specify it might be:
It would be 1 instruction with the source and destination longs specified. The other way could be:
In that manner it would resemble the other MISD instructions. I call it MISD, but some might qualify it as MIMD. Some might call the last 2 examples SIMD, since it's a single opcode and multiple data, but it's a single opcode only because the two opcodes are the same, they needn't be the same as long as they work on the same data.
Further, since the memory width would be 64bits, you'd have 256 bits in 4 long-longs, so you could have true SIMD instructions that take the full cache load and apply 1 opcode to long-sized chunks, in 4 cycles, perhaps less cycles depending on the memory architecture. This would be block processing of data and could benefit things like data processing and symmetrical cryptographic algorithms.
Oh, and these 32bit instructions are just freebies that are a subset of the 64bit instructions. Might even open the door to P2 code compatibility by only utilizing the lower 32bits of the long-long.
So, there is a lot more neat stuff that can be accomplished in the next architecture...
-Tor
Whoa! That seems a bit on the nose. People will use it if it makes the difference. Ready made driver sets will get used without having the knowledge of how they were built.
Slicing has speed and low latency on it's side, how well does the switching instruction compare? What is it's lowest latency? What percentage of the MIPS does this tight arrangement consume? I don't know myself. Can someone address this please.
EDIT: And the finer grained use of the switch instruction will really eat up code space too.
You need to define determinancy here, and clarify WAITxx.
Truckloads of Micro controllers already have jitter of a few cycles on their interrupts, and code lives with that.
Does this impact WAITxx ?
- I guess the hold-PC side of WAITx s done in hardware, away from the pipeline, but it is the feed of the next opcode that could have jitter, right ?
Did timer capture make the cut ? - that's how Microcontrollers cope with Opcode jitter, they use hardware.
Because this is optional, if someone really has a task that cannot jitter, they can allocate a whole COG to that.
Plenty of other tasks can tolerate some timing jitter.
eg UARTs do not really care if the sample point moves 6.25ns, even up to around 10MBaud.
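To put a number on that, here is a rough sketch of the arithmetic. It assumes the 6.25ns figure is one clock period at 160 MHz (my assumption, since 1/160 MHz = 6.25 ns) and simply expresses the jitter as a fraction of one bit period. The function name is made up for illustration.

```python
def jitter_fraction(jitter_s, baud):
    """Return sample-point jitter as a fraction of one UART bit period."""
    bit_period_s = 1.0 / baud
    return jitter_s / bit_period_s

# Assumed clock: 160 MHz, so one clock of jitter is 6.25 ns.
ONE_CLOCK_S = 1.0 / 160e6

if __name__ == "__main__":
    for baud in (115_200, 1_000_000, 10_000_000):
        pct = 100.0 * jitter_fraction(ONE_CLOCK_S, baud)
        print(f"{baud:>10} baud: jitter is {pct:.3f}% of a bit period")
```

At 10MBaud one clock of jitter is about 6% of a bit, and at ordinary baud rates it is far smaller.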
The most compelling immediate use I can see for this, is it allows quite serious Debug, almost for free.
A small overclock, and the average throughput would be back where it was.
Now come on guys. If you have an SD card, why do you need a Flash chip to boot it! It is just overcomplicating a hardware design. The P1 already has its detractors in that it needs an eeprom to boot. The SPI Flash chips are SOIC8 atm. I am now designing SOT23 into my prop boards that have SD (microSD). I wish I could avoid them totally.
So, for you supposed experts on SD, we DO NOT REQUIRE PROPER FAT SUPPORT !!! I suggest some of you are smart enough to know this. I am not advocating any risk here... only if flash is not found, does the ROM code look for an SD.
As for the speed of SPI Flash access vs SPI SD access, while I have not bothered to check the specs, I severely doubt that Flash is faster. On the contrary, it is more than likely that SD is faster, because of the smart processor in the SD card. Anyway, this is totally irrelevant. The hardware design will have Flash and/or SD. I expect there will be a lot more P2's requiring file storage, and that will be provided by SD and not Flash. The P2 will be capable of loading code that could switch to 4bit SD mode to go even faster.
I am not advocating a raw data SD card access (as apparently the Raspberry Pi does). I am advocating reading the few sectors that FAT16/32 occupies to locate the sectors where the boot code will be stored. There are two possibilities here... use the older boot sectors reserved for the old DOS boot sectors... or use the first data file.
The SD protocol (SPI access) is quite simple for read only. Apart from the few simple SD commands, IIRC it involves looking at the volume label for "FAT" followed by 16 or 32 to determine the offset to get the pointer to the directory, and looking at the first directory entry to get the first file data entry. We can then read up to 32KB, but in reality, we only require a short boot to load a fully functional SD/FAT driver.
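As a rough illustration of how little FAT knowledge that takes, here is a sketch in Python (not ROM code) that finds the root directory from a 512-byte FAT boot sector. The field offsets are the standard BPB ones; the function and helper names are invented for this example, and keying off the "FAT16"/"FAT32" type string follows the description above rather than the cluster-count method the FAT specification recommends. Sector numbers are relative to the start of the volume.

```python
import struct

def locate_root_dir(boot_sector):
    """Return (fat_type, root_dir_sector, first_data_sector) for a FAT boot sector."""
    if len(boot_sector) < 512 or boot_sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    # BPB fields from offset 11: bytes/sector, sectors/cluster, reserved
    # sectors, FAT count, root entries, total16, media, FAT size (16-bit).
    (bytes_per_sec, sec_per_clus, reserved, num_fats, root_entries,
     _total16, _media, fat_size16) = struct.unpack_from("<HBHBHHBH", boot_sector, 11)
    if boot_sector[82:87] == b"FAT32":
        fat_size32 = struct.unpack_from("<I", boot_sector, 36)[0]
        root_cluster = struct.unpack_from("<I", boot_sector, 44)[0]
        first_data = reserved + num_fats * fat_size32
        # On FAT32 the root directory is an ordinary cluster chain.
        root = first_data + (root_cluster - 2) * sec_per_clus
        return "FAT32", root, first_data
    if boot_sector[54:59] == b"FAT16":
        # On FAT16 the root directory sits right after the FATs.
        root = reserved + num_fats * fat_size16
        root_sectors = (root_entries * 32 + bytes_per_sec - 1) // bytes_per_sec
        return "FAT16", root, root + root_sectors
    raise ValueError("no FAT16/FAT32 type string found")

def demo_fat16_sector():
    """Build a synthetic FAT16 boot sector (made-up geometry) for testing."""
    s = bytearray(512)
    struct.pack_into("<HBHBHHBH", s, 11, 512, 4, 1, 2, 512, 0, 0xF8, 32)
    s[54:62] = b"FAT16   "
    s[510:512] = b"\x55\xaa"
    return bytes(s)

if __name__ == "__main__":
    print(locate_root_dir(demo_fat16_sector()))
```

From the root directory sector, the first 32-byte directory entry gives the starting cluster of the first file, which is all a minimal loader would need to follow.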
I have some other critical work to do, so I don't want to waste my time in working out the ROM code to do this if there is no chance it will be used. However, I sincerely think this will be a huge opportunity missed. And if the ROM code fails, no big loss.
So please Chip, am I wasting my time ???
Valid observations, but this is an optional feature, right ?
In dormant mode, it is a low risk change.
If it enabled useful Real Time Debug, that covers a lot more than 0.1% of users.
My understanding of FPGA booting, is they are quite position-tolerant, by way of using a distinct start-preamble.
ie I think you can relax more complex access, and simply stream past the invalid stuff.
This is not risk free, and might get complex in a design with multiple PGM blocks, but I guess only one would be ROM launched ?
Other Pgm Blocks could have slightly different preambles.
Lowest level bit-bashing is the primary goal here: to allow drivers to have multiple tasks, or to have multiple drivers alongside each other. Basically, the Cogs in the Prop2 are more capable, and a lot bigger/beefier, so using a whole Cog to get the timing right but spending most of that time waiting is not effective use of that much hardware.
The WAIT issue is resolved in that a two instruction polling loop is close enough.
The current road block is that, with the smallest implementation of slicing, hub instructions stall the singular pipeline and thereby introduce accumulated delays into the shared hardware threads.
EDIT: As for why this is so. I've come to understand that it's because Chip is offering this as a late addition as long as it doesn't affect the "critical path". The ideal version would work beautifully but requires too deep a re-engineering at this late stage. Hence Pedward's no-go input.
So, determinacy is going to suffer with this slicing implementation, end of story. Even using it for debugging is going to have the same result.
The question is, given it's easy to add and doesn't matter if it's not used, is there a useful application for it like this?
As far as I've seen sectors 1 (second sector) through 62 are not used in MBR partitioning schemes. For some odd reason sector 63 is sometimes the first block allocated after the MBR itself.
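A small sketch of how one could check that gap on a given card image: it reads the four MBR partition entries (16 bytes each, starting at byte offset 446, with the starting LBA at entry offset 8) and reports the sectors between the MBR and the lowest-starting partition. The function name is hypothetical.

```python
import struct

def unallocated_gap(mbr):
    """Return (first, last) unused sector after the MBR, or None if no gap."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA MBR signature")
    starts = []
    for i in range(4):
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        part_type = entry[4]  # 0 means the slot is empty
        lba_start, _num_sectors = struct.unpack_from("<II", entry, 8)
        if part_type != 0:
            starts.append(lba_start)
    if not starts:
        raise ValueError("no partitions defined")
    lowest = min(starts)
    return (1, lowest - 1) if lowest > 1 else None

if __name__ == "__main__":
    # Synthetic MBR: one FAT16 partition (type 0x06) starting at sector 63.
    mbr = bytearray(512)
    mbr[446 + 4] = 0x06
    struct.pack_into("<II", mbr, 446 + 8, 63, 100000)
    mbr[510:512] = b"\x55\xaa"
    print(unallocated_gap(bytes(mbr)))
```

With a partition starting at sector 63 this reports sectors 1 through 62 as the gap, which matches the layout described above. Cards partitioned by newer tools often start the first partition elsewhere, so the gap size is not something to rely on.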
However using unallocated space is usually unwise. If repartitioning is considered a pain then I would go with the first file idea. At least this way it can be managed with ordinary file privileges by deleting everything else on the SD card before writing it.
The boot code is responsible for being able to access the file system on the SD card and then load the user program.
This is how I load ZiCog now.
Our cautious goal is to make SimpleIDE the official Parallax tool for programming Propellers, but only if it has enough of the features required (like your non-project-oriented mode) to be a qualified replacement. In keeping with this approach, Steve has been working closely with Daniel, Andy and forum members to make these improvements. Some improvements are a concern, like how the terminal operates across different platforms. The complete PST capability in Windows has proven to be a challenge on the Mac, for example. But the gains of having even a limited terminal are very significant: open-source, multi-platform and multi-lingual. These are absolutely necessary improvements outside of the USA. Thankfully, no professor cornered me at the presentations at Taiwan National University with the questions they really wanted to ask: can it support traditional Chinese comments and menus, is it open-source, etc? These things are a real show-stopper for Parallax, stopping us at the door of prospective customers faster than you could imagine. Whether or not SimpleIDE can overcome some of the challenges remains to be seen, but we're proceeding as if it will be a success.
The second tool we wish to have is a more full-featured version from Eclipse, like we started [and abandoned] last year as the costs went out of control beyond initial estimates.
Roy pointed out that time spent by Jeff towards an improved Propeller Tool around Propeller 2 is not a wise use of our time. That's correct - it's not. Continuing to engineer around the closed-source edit control we've been using for years would be a mistake. And please, nobody ask me if you could re-write the source for that section. We're still stuck in Delphi and last time I really looked it wasn't going to do much for us except allow other developers to work in our code base. The best future for the Propeller Tool ends up being some kind of externally driven life-support program like we had for the SX-Key IDE - anybody remember Peter Montgomery's effort to keep Chip's original code base going for so many years? As far as Jeff is concerned, he's a tremendous asset to our customers and all of us would benefit from anything else he contributes to Propeller 2, such as examples and documentation.
As for gabbing with Chip about single-clock-granular task switching: talk is cheap, for now. If for some reason Chip wants to implement such a feature he will have our support to see that the business can support an elongated design cycle [at almost any cost], but I can say with confidence that further changes don't have a cost-benefit ratio in anybody's favor. I could start a business-oriented thread about why it's a bad idea to continue design additions but nobody would read it because we primarily share technical interests on these forums, never mind that it's just not appropriate to do that anyway.
I think I might have to move in to Chip's house if this discussion goes much further. Today is Sunday so he'll take a break from this thread. Bootloader discussions only [or anything else that doesn't require more hardware design] for you guys.
The below comments could offer hope for finer grained applications than what TASKSW is good at.
I'm keen to try a polling of the counter as a workaround for maintaining timing where it's important.
We'll never know if the slicing feature doesn't exist.
I do understand what you mean. I don't necessarily LIKE the idea of using the eeprom, but there are a few benefits IMO. I like the idea of eeprom since it allows you to configure the boot. First file is fine and all, but reformatting an SD card and copying everything back to it seems like quite a bit of work. Especially with large file systems. The file COULD be edited in place, but would it need to remain the same size or smaller? With the eeprom, you just modify the file name and re-program the eeprom. The propeller COULD do this all by itself.
Using a special partition is something I've thought about and decided I REALLY don't like the idea. IF there was a tool to do this, transparently, *I know there are several that could do this* it could be an option. Hiding anything in unallocated space is a VERY bad idea IMO.
Once again, this is just MY opinion. Everyone is entitled to their own. There will always be detractors, they will always have things to point to. I'm sure the complaint will change from "booting from SD" to "lack of interrupts" if the P2 can boot from SD.
Okay, look here's the issue with SD card booting:
It has to work!
For SPI flash this is true. Every SPI flash on the market today supports the 0x03 read instruction which lets you transfer a byte per 8 clocks at up to 20 MHz. This is in fact much, much, much faster than a typical SD card because there is no delay in reading any byte. Additionally, there is usually (but not always) a fast read instruction 0x0B that will let you transfer a byte every 8 clocks at around 50 MHz or so. Finally, for specialized quadSPI access there are vendor specific functions which allow up to 4 bit parallel 100 MHz reads.
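For reference, a minimal sketch of what those two read commands look like on the wire, plus the throughput arithmetic. It assumes the common 3-byte (24-bit) addressing and the usual single dummy byte after the address for 0x0B; larger parts may use 4-byte addresses, so check the datasheet. The function names are made up for illustration.

```python
def read_command(address, fast=False):
    """Build the command frame for a plain (0x03) or fast (0x0B) SPI flash read."""
    if not 0 <= address < (1 << 24):
        raise ValueError("address must fit in 24 bits")
    opcode = 0x0B if fast else 0x03
    frame = bytes([opcode,
                   (address >> 16) & 0xFF,
                   (address >> 8) & 0xFF,
                   address & 0xFF])
    if fast:
        frame += b"\x00"  # dummy byte clocked out before data starts (assumed)
    return frame

def bytes_per_second(clock_hz):
    """One byte per 8 clocks once the data phase has started."""
    return clock_hz / 8

if __name__ == "__main__":
    print(read_command(0x012345).hex())
    print(read_command(0x012345, fast=True).hex())
    print(bytes_per_second(20_000_000), "bytes/s at 20 MHz")
    print(bytes_per_second(50_000_000), "bytes/s at 50 MHz")
```

After the frame is sent the flash streams data continuously for as long as it is clocked, so a 20 MHz plain read works out to 2.5 MB/s with no per-sector setup.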
Now, making a truly compatible SD card loader that works with everything under the sun out there is difficult. Parallax does not need to provide FAE support to people looking for the right SD card to use.
PC boot loaders have smaller code because they do not have to actually perform the low level read and write operations for the SD card. Please understand that it takes more than half of a cog's register space just to implement an SD card controller. Even then, this functionality does not work with every SD card. It might be worth pursuing if it worked with every SD card. But, this is not the case. If Parallax had access to the code for the operations needed to support every SD card under the sun this could be a plausible idea. But, they do not. The feature cannot be implemented if it is not universally compatible.
FAT support and booting after implementing a working SD card controller is only a second thought. If universal compatibility cannot be achieved then there is no point. Right now, both FSRW and the FATEngine do not have a perfect track record with SD cards, even though both of these drivers implement the basic SD card spec.
In truth, there is more stuff I could have done for the FATEngine to support more SD cards. But, I would have spent the entire cog image space handling reading in large registers from the SD card and adjusting device and operational parameters for each SD card. This is hard to do with limited space.
Thanks,
Regarding SimpleIDE, I will do what I can to help and support Steve on this effort. Regarding Eclipse, it looks like I will probably be using it as part of my day job soon, so I may be able to contribute on that front also.
Something that might want to be on your list which is hardware, but not inside the chip, is the reference design for a board or two. I think the initial board should be something on the simpler side like the QuickStart (and I would have one connection on it be the same as the 40 pin one on the existing QuickStart) although I think it's vital to have an SDRAM chip on board.
Roy
As there is no DIP version of the Prop Two I would like to see a very minimal board at launch. Basically just a break out board but with the absolute minimum of support circuitry on board. Regulator, EEPROM, XTAL and such.
Perhaps with a big DIP foot print so we can stuff it into bread boards or strip boards, proto boards and such.
http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=139&No=593
Thanks Roy - that list was just a quick one off the top of my head. Reference design is another item that requires engineering and manufacturing.
This time, we won't be producing scads of boards. We'll offer one initially, maybe two long term, and hope that the community and other developers can build a business around hardware.
I have an experimental PCB for the Propeller II almost finished. The only thing it needs is to know which pins will be used for Flash and Serial, and optionally VGA.
Excellent, Sapieha. This time around we hope to provide early release information to PCB designers such as yourself and Bill (and others) so you can have hardware ready from the beginning. This will give everybody a much greater opportunity to sell them successfully.
Heater, I think the "minimal DIP package" thing is something better left to the community guys to make since it's primarily for the Hobbyists. I think Parallax needs to focus on a board like the QuickStart maybe just a smidge more on it because of the Prop2's capabilities, and then maybe eventually a Prop2 BOE.
Parallax has a list of enhancements for SimpleIDE.
Has this question been answered? David Betz asked the same thing. (both on page 4)
Rick