Given the Prop II has almost 128K RAM to load I take it that means you don't want Intel HEX or S Record loader formats in the PROM.
You may want it but there is no such thing. So you hit the button, all your code gets beamed to the Prop, over whatever medium, and if successful it starts to run. But I'm sure you will want to know if it was successful and did start to run. There is your user interaction. No way around it if you want to be confident.
I conclude you might as well use Chip's monitor. A simple one-page script on any OS will do the job.
Reminds me of the evolution of file transfer in the CP/M days. It started out with PIP, then moved on to XMODEM with some rude and crude error checking over dial-up lines. Flaws in that were fixed with YMODEM and then ZMODEM. All the while what they really wanted was TCP/IP and FTP.
Except for launching and completion, constant user interaction is of no use when downloading firmware over wireless or other half-duplex links. My method outlined previously would allow a good image to be transferred without relying on a protocol, just dumb chance. Now XMODEM would be fine for sure, but I'm just looking for something simple that Chip might be willing to squeeze in, and protocols mean memory. We can talk all we like and wish all we want. The question is, what would Chip consider acceptable?
I don't get it. There is always a protocol even if it's as simple as:
1) Send file in one go.
2) Get response, OK or FAIL.
3) User decides to retry at 1) or give up.
Now of course you might want to script all that for automatic checking and possible retries. At the end of the session the user will want to know if it worked or not.
Chip's monitor may add a bit more toing and froing to that but it's basically the same. I would script it and forget it.
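To make the "script it and forget it" idea concrete, here is a minimal Python sketch of the three steps above. It is only an illustration: `send_image` is a hypothetical callable standing in for whatever actually moves the bytes and reads back the reply, and `make_flaky_link` is a made-up test double, not anything from Chip's monitor.

```python
from typing import Callable


def load_with_retry(send_image: Callable[[bytes], str],
                    image: bytes,
                    max_attempts: int = 3) -> bool:
    """Send the whole image, read one OK/FAIL reply, retry on anything but OK."""
    for attempt in range(1, max_attempts + 1):
        reply = send_image(image)              # steps 1 and 2: send, get response
        if reply.strip().upper() == "OK":
            print(f"attempt {attempt}: load succeeded")
            return True
        print(f"attempt {attempt}: load failed ({reply!r})")
    return False                               # step 3: give up after max_attempts


def make_flaky_link(failures_before_success: int) -> Callable[[bytes], str]:
    """Hypothetical stand-in for a real link: fails N times, then succeeds."""
    remaining = [failures_before_success]

    def send(image: bytes) -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            return "FAIL"
        return "OK"

    return send
```

The user still gets a final yes or no at the end of the session, which is the only interaction that really matters.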
Also I don't understand what would happen if the load works OK but the program turns out to be buggy and crashes or is otherwise faulty. How do you get back into the monitor to recover the situation? Even if you reset the device it will run your EEPROM code again and fail again. Or have I missed a point here somewhere?
What about something like on Windows: if the last boot failed, the system comes up in the monitor.
This would require a single bit. In EEPROM or in the Prop itself?
During early boot it is set, and the application, if successfully loaded, will reset it.
Next time, if it is 0 then it is a normal boot; if it is 1 the system comes up in the monitor.
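A rough Python model of that one-bit scheme, just to pin down the order of operations. `BootFlagStore` is a hypothetical stand-in for wherever the bit would live, and leaving the bit set while in the monitor is my assumption, not something stated above.

```python
class BootFlagStore:
    """Hypothetical one-bit non-volatile store (EEPROM or on-chip)."""

    def __init__(self) -> None:
        self.failed_boot = False


def choose_boot_target(store: BootFlagStore) -> str:
    """Run at early boot: decide between the application and the monitor."""
    if store.failed_boot:
        # The bit is still set, so the last application start never cleared it.
        return "monitor"
    store.failed_boot = True    # set the bit before handing over to the application
    return "application"


def application_started_ok(store: BootFlagStore) -> None:
    """Called by the application once it considers itself up and running."""
    store.failed_boot = False
```

Note that this only catches failures that happen before the application clears the bit.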
What happens when the application is successfully loaded, starts to run, resets that bit and then fails due to some unforeseen problem? Perhaps you are no longer able to communicate with your Prop. Even if you are able to hit your remote Prop with a reset it will come up in the app again, and fail again.
At some point, this stuff needs to just be right*. NASA deals with problems like this, and they do so by working really, really, really, really hard on just about anything they do.
I have seen fail safe remote loads done on embedded Linux boxes used in Cell base stations and other places.
They had a fixed boot loader and two Linux OS images in FLASH.
You start out running the system from one of those images.
Remote loads go to the other image, and the system is rebooted by command from the host.
The system boots into the new image. A hardware watchdog then reboots it again if it does not get a command from the host within a given time.
If the new image comes up and is deemed good, a command from the host disables that watchdog and things proceed as normal.
If it does not come up or the host does not send the magic command then the watchdog reboots and the original image comes up again.
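The scheme above boils down to a small state machine. Here is a hedged Python sketch of it; the class and method names are invented for illustration and do not come from any real boot loader.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DualImageSystem:
    active: int = 0                 # slot of the image currently trusted (0 or 1)
    trial: Optional[int] = None     # slot of an image on probation, if any

    def remote_load(self) -> None:
        """Write the new image to the other slot and reboot into it on trial."""
        self.trial = 1 - self.active

    def running(self) -> int:
        """Slot of the image that is executing right now."""
        return self.active if self.trial is None else self.trial

    def host_confirms(self) -> None:
        """The host's 'magic command': stop the watchdog and commit the new image."""
        if self.trial is not None:
            self.active = self.trial
            self.trial = None

    def watchdog_expires(self) -> None:
        """No confirmation in time: reboot into the original image."""
        self.trial = None
```

The key property is that the known-good image is never overwritten until the host has confirmed the new one.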
This could be done on the Props I guess with a bit of external hardware assistance. Two EEPROMs, a means of switching from one to the other and a watchdog timer.
I think it might be as easy to use two Props. One is never upgraded and only performs the task of communicating with the host and loading/reloading the second which does the actual work.
I know this may have been asked many times, but since the Prop 2 specs have been changing rapidly since 2010: does the processor contain hardware-based SPI/I2C and UART? Is there any DMA controller inside?
How about the CORDIC system mentioned? How many cycles should it take to calculate one sample of sine(x) (x in radians)? Or, how would a Prop 2 handle transcendental functions in hardware?
My concern with the CORDIC is that a number of programmers here could be musicians as a primary, secondary, or hobby job, and sine wave generation becomes a primary topic if they want to build a synthesizer.
Plus, I assume that the Prop 2 is not SIMD (single instruction, multiple data) capable. It would be fun to have, but the circuitry is complicated and takes up a lot of silicon space. (That's my limited assumption.)
A fundamental philosophical point or guiding principle in the Prop designs is that they do not have specialized SPI, I2C, USB, or UART hardware. Rather than fixing features in silicon, everything should be "soft". As such, I am not surprised such hardware support is not included in the Prop II; indeed I am glad to see the philosophy continue.
I believe there are new instructions and features in the Prop II that will help with CORDIC. I am no expert on such things.
If you read the preliminary feature list on the Parallax Semiconductor site, you'll see that each cog has a CORDIC state machine along with a multiply, divide, and square root unit. There's no SPI/I2C or UART block since the whole design philosophy of the Propellers is to use software-defined peripherals. There are some features that make it easier to combine several relatively simple I/O drivers into one cog. There's no DMA controller, but there are also features that make some of the functions of a DMA controller easier to write. There is some simple hardware to make it easier to handle external SDRAM.
CORDIC is still alive and well. If you use MATLAB Stateflow to automatically generate Verilog HDL code for an FPGA then you need to use CORDIC blocks to get the fastest SIN/COS/TAN functionality.
In general fast CORDIC hardware is actually very nice. Just a few conversion functions need to be made to translate an INT to a FLOAT and then back again to use the CORDIC hardware.
For GCC, you will probably see the CORDIC hardware used to speed up floating point operations.
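For anyone wondering what a CORDIC actually does for sine generation, here is a generic fixed-point sketch in Python. It is the textbook rotation-mode algorithm, not a description of the Prop II hardware, and the iteration count and 16-bit scaling are arbitrary choices of mine. It says nothing about cycle counts on the real chip.

```python
import math

ITERATIONS = 16
SCALE = 1 << 16                     # fixed point with 16 fractional bits

# Arctangent table and gain compensation, precomputed once as integers.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * SCALE) for i in range(ITERATIONS)]
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))
K = round(SCALE / GAIN)             # start vector length, pre-divided by the gain


def cordic_sin_cos(angle):
    """Return (sin, cos) of angle in radians, |angle| <= pi/2.

    The loop itself uses only integer shifts, adds, and a table lookup.
    """
    x, y, z = K, 0, round(angle * SCALE)    # float -> fixed-point conversion
    for i in range(ITERATIONS):
        if z >= 0:                          # rotate counter-clockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:                               # rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / SCALE, x / SCALE             # fixed-point -> float conversion
```

The two conversions at the top and bottom are the "INT to FLOAT and back" step mentioned above; a synthesizer oscillator would more likely stay in fixed point the whole way.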
Mike:
There is special hardware to generate the video stream, so why not a general purpose shift block that can be used for all the other serial protocols like SPI, I2C, NRZI, Manchester, UART, IrDA, etc.?
In the PASM UART drivers it has been seen that jitter can be produced when the assembly code has to bounce between tasks. So I would think a bit of shift hardware would go a long way.
There are always tradeoffs in adding special purpose hardware. It takes silicon area to implement, needs special instructions or registers to control. Often some new variant comes along that the hardware doesn't quite handle properly and you still have to do it in software, etc. The video stream just can't be done without some hardware assistance. There are all sorts of pieces of new functionality that have been added to make it easier to implement things like SPI, I2C, UARTs, etc. at higher speeds in the Prop II with better reliability, less jitter, etc. One example is the multithreading where the pipeline can be used to execute a single or multiple separate instruction streams. There are now autoincrement data pointers that will speed up data access. You can use the special CLUT memory for I/O buffers if you're not trying to do video in the same cog.
The general philosophy of the design of the Propeller microcontrollers has been to provide for high level I/O functionality via software while adding small special purpose functional blocks to make it possible to do higher performance in software. A general purpose I/O shift block is way down on the priority list.
Thank you for clearing up some of this, Mike. I'm a relative newbie to Propellers but have been trying to keep up with the Prop II developments.
Needless to say, I've seen a great deal of talk, speculation, and suggestions for/about Prop II, all of which left me unsure as to what direction it was heading. I'm glad to see that a change in design philosophy isn't planned, as I think it would be a mistake to attempt to make the Propeller resemble other microcontrollers. It seems to me, and again, I'm only a newbie, that the main weakness the Propeller I has is an underwhelming amount of memory, making it difficult to use it for projects with many hardware features and, say, a robust user interface as well. What about more cogs (eg 16) and a faster clock? Those things, too, might be useful. I've also seen some people complain about the software tools, but that always seemed a non-issue to me, and more so now that GCC tools are available.
A long while back Chip put the question to the forum, "Do you want more COGs or more memory in the Prop II?"
There is only so much silicon available and you have to make the tradeoff. I seem to remember the majority wanted to go with more memory. I voted for not going to 16 COGs, as that means halving the bandwidth between COG and HUB, since they all have to share access in a round robin fashion. Not good.
As it happens, we now have the possibility to run multiple threads in a COG with hardware scheduling, so the need for more COGs is lessened substantially.
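The round-robin arithmetic behind that vote, as a tiny Python sketch. The `clocks_per_slot` parameter is a placeholder of mine; the real slot length on either chip is not given in this thread.

```python
def clocks_between_hub_turns(cogs: int, clocks_per_slot: int = 1) -> int:
    """Round robin: a cog waits out every cog's slot before its next turn."""
    return cogs * clocks_per_slot


def relative_bandwidth(cogs: int, baseline_cogs: int = 8) -> float:
    """Per-cog hub bandwidth relative to a chip with baseline_cogs cogs."""
    return clocks_between_hub_turns(baseline_cogs) / clocks_between_hub_turns(cogs)


for n in (4, 8, 16):
    print(n, "cogs ->", relative_bandwidth(n), "x the 8-cog hub bandwidth per cog")
```

Doubling the cog count doubles the wait between turns, so each cog's share of the hub is cut in half.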
On the point that the Prop designs have no specialized SPI, I2C, USB, or UART hardware and everything should be "soft": not quite. Yes, you DO want minimal hardware, but it is important not to make it so minimal that you cripple user choices.
The COGs do allow much more minimal hardware than most, but that does not mean you can ignore hardware entirely.
E.g. on the Prop II, the timer now has additional silicon support to fix some blind spots, and there IS support showing for a serial shifter for Prop-Prop arrays.
I2C and UART you can do in software at most practical speeds, but SPI certainly could use a silicon shift register for full application coverage.
There are SPI parts over 100MHz now.
The Prop II serial shifter may have enough control to allow fast SPI links. Details are sketchy.
If it were my silicon, I would have SPI support up to QuadSPI and DDR in hardware (basically option bits on the already defined serial link) and do I2C/UARTs in software. USB may need a differential sense/drive, but I think USB is on the 'will be possible' list?
The new threading allows you to pack out a COG now, so some very serious peripheral packing can be done.
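To show why software SPI runs out of steam at high clock rates, here is a generic bit-banged SPI mode 0 transfer modelled in Python. The pin callbacks and the `LoopbackPins` fixture are hypothetical; this is not Prop II code, just the shape of the per-bit work a software driver has to do.

```python
def spi_transfer_byte(out_byte, set_mosi, set_sclk, read_miso):
    """Bit-bang one SPI mode 0 byte, MSB first, and return the byte read back."""
    in_byte = 0
    for bit in range(7, -1, -1):
        set_mosi((out_byte >> bit) & 1)         # present data while the clock is low
        set_sclk(1)                             # rising edge: both sides sample
        in_byte = (in_byte << 1) | read_miso()
        set_sclk(0)                             # falling edge: next bit is shifted out
    return in_byte


class LoopbackPins:
    """Hypothetical test fixture with MISO wired straight to MOSI."""

    def __init__(self):
        self.mosi = 0
        self.sclk = 0
        self.rising_edges = 0

    def set_mosi(self, value):
        self.mosi = value

    def set_sclk(self, value):
        if value and not self.sclk:
            self.rising_edges += 1
        self.sclk = value

    def read_miso(self):
        return self.mosi
```

Every bit costs four separate steps here, which is the sort of overhead a hardware shift register removes.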
I agree. The trick is to be able to provide hardware support for things like serial and parallel buses that is flexible and general purpose, not dedicated to a single task. That general purpose hardware has to be available to all COGs equally.
The XMOS approach to this is to group pins into "ports" and provide ports with serializers and clocked I/O. In that way they can provide for all kinds of software driven peripheral support, including USB. This approach has its own penalties in complexity and lack of flexibility. For example, the four cores on a chip all have their own set of I/O pins and port supporting hardware. You cannot drive any pin on the device from any core as you can with the Prop.
For now I'm happy that the Prop II has not gotten to that state and seems to have stuck to simplicity as a guiding rule.
Seems to me that the video generator on the Prop 1 is already part way to providing hardware assist for serial I/O. It can currently shift 1 or 2 bits to the color lookup hardware. Adding the capability of outputting 4 bits and a clock signal would be a big assist on the output side, and adding the equivalent for the input side would complete the package.
You can already output the clock signal if it's 1:1 with the PLL driving it. Contrary to popular belief, any PLL mode can be used to drive the video circuitry, not just the one without a clock output.
Another instruction that's shown up in the boot code is CMPR. Based on the code, it seems that the carry flag is set only when D > S. Is this just the CMP reversed (for carry only)?
Does WAITPEQ/WAITPNE now have a timeout capability? I see a comment in the boot code "wait_rx" block that indicates as much. If so, how does this work?
wait_rx         getcnt  time                    'ready timeout
                add     time,timeout
:waitpxx        waitpne rx_mask,rx_mask wc      'wait for rx low/high with timeout
                notb    :waitpxx,#23            'toggle waitpeq/waitpne
wait_rx_ret     if_nc   ret                     'return if not timeout (boot_flash follows)
Is the "wc" effect latching the value in "time" and internally performing a "while getcnt < time"? The original WAITPNE documentation indicated that WC was used to indicate which port the mask was applied to. So how is the port selected now?
CMPR means compare-reverse, where D and S are swapped going into the comparator so that S-D is computed, instead of the usual (CMP) D-S.
WAITPEQ/WAITPNE now have timeouts. If the WC (write carry) bit is set, the last ALU result is used as a value to compare CNT to and upon exit of the instruction, C=1 if a timeout occurred, or C=0 if the WAITPxx condition was met. The port to watch is set using SETPORT D/#n (D or immediate).
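For anyone trying to picture the semantics, here is a small Python model of a WAITPxx with WC as described above: it returns the carry flag, 1 on timeout and 0 if the pin condition was met. The function, its callbacks, and the plain `>=` comparison against the deadline are my own simplifications; how the hardware actually compares CNT (wraparound and so on) is not covered here.

```python
def waitpxx(read_pins, read_cnt, mask, state, deadline, wait_equal=True):
    """Model of WAITPEQ (wait_equal=True) or WAITPNE (wait_equal=False) with WC.

    Returns the carry flag: 0 if the pin condition was met, 1 on timeout.
    """
    while True:
        matched = (read_pins() & mask) == state
        if matched == wait_equal:
            return 0                    # C=0: WAITPxx condition met
        if read_cnt() >= deadline:
            return 1                    # C=1: timed out first
```

In the boot code above, `deadline` corresponds to the result of the `add time,timeout` just before the wait, and `if_nc ret` returns only in the non-timeout case.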
The ROM is all done and installed into the memories. All we are waiting for now is the synthesized block to be re-routed to accommodate a new keep-out region which will make the signals between it and our pad frame (w/memories) more correct-by-design so that we are more confident about those 7,000+ connections. After that, it's ready to go to the foundry. The next foundry shuttle is in December and we plan to be on it.
What I am going to start next is a detailed description of all the assembly instructions, grouped by function. Then, I'll document the ROM and the three programs inside it. At that point, anyone who wants to start making their own tools will have sufficient data.
Lately at Parallax, Daniel Harris discovered Terasic DE2-115 FPGA boards that we've all been using to emulate the Prop II. These boards are $599 (or $299 "student" price) and can emulate 6 of the 8 cogs. They are based on Altera Cyclone IV chips, and not as quick as the Stratix III that I've been using, but they are 1/8th the cost. We could publish a config file that would turn that board into a Prop II (minus the analog I/O pins), if anyone was interested. We should probably make a simple I/O board that plugs into it and gives a Prop Plug connection and a few 9-bit DACs for video. Oh, and an 8-pin Flash chip for program storage. These Cyclone IV chips can run cogs at up to 80MHz. The Stratix III can go 200MHz!
Interesting. That should buy some software lead time.
If 6 cogs fill the FPGA, will you be offering a 4 COG version that leaves users with more usable FPGA resources?
With only a configuration file from Parallax, you can't add any more features to that FPGA. For that you need the source files in VHDL or Verilog, not a configuration file.
*Right enough to be recoverable.
Thanks for the explanation.
Regards,
Rich
Kanin,
Take a look at this thread:
http://forums.parallax.com/showthread.php?143679-Looking-for-methods-for-fast-SDI-or-SQI/page2
C.W.