It looks like DJNS keeps looping until the counter drops below zero. If so, I suggest renaming it DJNC, which would be more consistent with DJNZ (and the association of both names with the C/Z flags).
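A quick Python model of the two loop forms makes the off-by-one difference concrete. It assumes the reading above (DJNS falls through only once the decremented counter goes negative); it is a sketch of that reading, not documented P2 behavior.

```python
def djnz_iterations(count: int) -> int:
    """Body runs, then DJNZ decrements and jumps back while the result is non-zero."""
    if count < 1:
        # On hardware a 32-bit counter starting at 0 would wrap; not modeled here.
        raise ValueError("count must be >= 1 for this model")
    runs = 0
    while True:
        runs += 1          # loop body
        count -= 1         # DJNZ: decrement...
        if count == 0:     # ...and fall through once the result reaches zero
            break
    return runs


def djns_iterations(count: int) -> int:
    """Assumed DJNS: decrement and jump back while the result is not negative."""
    runs = 0
    while True:
        runs += 1          # loop body
        count -= 1         # decrement...
        if count < 0:      # ...and fall through once the result goes negative
            break
    return runs
```

With the same starting count, the DJNS form runs the body one extra time, which is the behavior the proposed name would need to convey.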
The name for this family of jumps could be clearer: if they began with DEC it would be really obvious what they do.
In Prop world, the letter D's most frequent association is with a register identified in the Destination field.
Names like DECJZ, DECJNZ, ... would parallel DECMOD (consistently using DEC for "Decrement").
So, if you want to use a branch instruction with a 20-bit immediate address, it looks like you use "#". If you want to use an instruction with a 9-bit immediate address, you also use "#". If you want to use AUGx, you use "##".
So my question is... why bother with "##" at all? Just use "#" for all of them. The assembler can determine whether to add AUGx by the value of the immediate. This also avoids an inadvertent "JMP ##addr".
(by the way, what happens if AUGx precedes a 20-bit branch instruction?)
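Here is a minimal sketch of the "let the assembler decide" idea. It assumes the layout later P2 documentation uses (a plain `#` immediate fits the 9-bit S field, 0..511, and an AUGS prefix supplies the upper 23 bits of a 32-bit value); the function name and return format are hypothetical.

```python
def encode_immediate(value: int) -> list[tuple[str, int]]:
    """Return the words a hypothetical assembler might emit for a '#value' source.

    Values that fit the 9-bit S field are emitted directly; larger values get an
    AUGS prefix carrying bits 31..9, and the instruction keeps bits 8..0.
    """
    if not 0 <= value <= 0xFFFF_FFFF:
        raise ValueError("immediate must fit in 32 bits")
    if value <= 0x1FF:
        return [("S", value)]                       # fits the 9-bit field as-is
    return [("AUGS", value >> 9), ("S", value & 0x1FF)]
```

One trade-off to weigh: with automatic selection, the extra AUGS word (and its cycles) no longer shows up in the source, which matters in cycle-counted code; an explicit `##` keeps it visible.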
Now that you're moving on to smart pins, how hard would it be to current-isolate one GPIO bank (maybe the last in the daisy-chain)?
Interesting, but power islands are tricky, and the next thing users expect is some means to wake up the rest of the chip, which complicates things even more.
(Pushing down to 1uA needs special buffer designs, and I've seen one chip where the designers forgot that. They had a nice 1uA 32kHz oscillator feeding a more generic Schmitt trigger, and the 1uA became >100uA thanks to the transition currents of that stage.)
Getting RTC support on P2 probably depends on OnSemi having a proven cell they can drop-in.
It would be useful to know the predicted Icc of the P2 die at 0MHz.
Starting and stopping clocks is easier than power-management and switching.
I just want something simple: you set the counter with UTC seconds since 2000 whenever it has access to that source.
With power failures that may last up to 72 hours, I just want the seconds to keep ticking on the regular cog counter,
so that when power comes back up, the seconds since 2000 are still accurate.
GPIO state is not that important. 8KB hub RAM retention would be nice, but if it's too much work, skip it; cog 0 RAM retention, maybe?
If the new IRQ system can be set to wait for the smart-pin/cog counter to reach a certain value, software RTC alarms are easy.
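The scheme above needs very little software: one counter of seconds plus conversions and a compare. A minimal sketch, assuming the counter holds whole seconds since 2000-01-01 00:00:00 UTC and that leap seconds are ignored (as most software RTCs do); all names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed epoch for the "seconds since 2000" counter.
EPOCH_2000 = datetime(2000, 1, 1, tzinfo=timezone.utc)


def to_utc(seconds_since_2000: int) -> datetime:
    """Convert the running counter to a calendar time (leap seconds ignored)."""
    return EPOCH_2000 + timedelta(seconds=seconds_since_2000)


def from_utc(dt: datetime) -> int:
    """Value to load into the counter when a time source is available."""
    return int((dt - EPOCH_2000).total_seconds())


def alarm_due(counter: int, alarm_at: int) -> bool:
    """A software alarm is just a compare against the running seconds counter."""
    return counter >= alarm_at
```

In hardware terms, `alarm_due` is the comparison the poster hopes the IRQ system could perform on the counter directly.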
Partial HUB RAM retention would be too difficult, but perhaps a COG could be ring-fenced enough to form a low-power island?
This still comes down to the Static ICC expected for the P2, and how that stacks up against a separate RTC chip with maybe TCXO and RAM.
The streamer can write data directly to the i/o pins, not just to the DACs, up to 32 bits per clock, from hub or LUT.
It captures bytes, words, or longs. I like the idea of one, two, or four bits, as well, getting written as bytes! The rate is already programmable by SETXFRQ: $80000000 = every clock, $40000000 = every 2nd clock, $2AAAAAAB = every 3rd clock. In the every-3rd-clock case, the LSB must be set to ensure that the accumulator rolls over (reaches $80000000+) on the third clock. Bit 31 is not kept by the phase accumulator.
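The rollover rule above can be checked with a few lines of Python. This models only what the post states: a phase accumulator that adds the SETXFRQ value every clock, triggers a transfer when the sum reaches $80000000, and drops bit 31. The function name is hypothetical.

```python
def transfer_clocks(freq: int, clocks: int) -> list[int]:
    """Clock numbers (1-based) on which a transfer would occur.

    Model: the accumulator adds `freq` each clock; reaching $80000000 means a
    transfer, after which bit 31 is discarded (it is not kept).
    """
    acc = 0
    hits = []
    for clk in range(1, clocks + 1):
        acc += freq
        if acc >= 0x8000_0000:
            hits.append(clk)
            acc &= 0x7FFF_FFFF       # bit 31 is not kept
    return hits
```

With $2AAAAAAB this gives transfers on clocks 3, 6, 9; with $2AAAAAAA (LSB clear) three additions only reach $7FFFFFFE, so the first transfer slips to clock 4.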
24 bits might be handy too. Or a counter, so bits can be 1..32, possibly causing an interrupt when the count is done.
I think Chip meant parallel, for those sizes.
(4-bit width covers QuadSPI, for example, and 24-bit width could be useful for LCDs.)
I guess x1 effectively means DMA into a serial engine.
Serial is slightly different, managed in the smart pins, but yes, a bit-size field allowing 1..32 would be useful. I think an Infineon MCU has that feature, along with good FIFOs on the serial ports.
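To make the "bit-size field of 1..32 plus a done event" idea concrete, here is a toy shifter model. It is purely illustrative: the class, its fields, and the `done` flag standing in for an interrupt are hypothetical and do not describe any real P2 smart-pin mode.

```python
class BitShifter:
    """Hypothetical serial shifter with a 1..32 bit-size field.

    `done` goes true after the last bit is shifted, standing in for the
    "interrupt when the count is done" idea.
    """

    def __init__(self, word: int, nbits: int, lsb_first: bool = True):
        if not 1 <= nbits <= 32:
            raise ValueError("bit size must be 1..32")
        self.word = word & ((1 << nbits) - 1)   # keep only the bits to be sent
        self.nbits = nbits
        self.lsb_first = lsb_first
        self.remaining = nbits
        self.done = False

    def clock(self) -> int:
        """Shift out one bit per call."""
        if self.done:
            raise RuntimeError("transfer already complete")
        sent = self.nbits - self.remaining           # bits already shifted
        pos = sent if self.lsb_first else self.nbits - 1 - sent
        self.remaining -= 1
        if self.remaining == 0:
            self.done = True                         # where an IRQ would fire
        return (self.word >> pos) & 1
```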
I know this breaks the idea that all cogs are just cookie cutters of themselves.
But why not make each cog really good at one thing? Just don't use the extra feature if you want them all to be plain.
Of course there needs to be some way cognew can specify what cog it wants.
One can have hardware AES, one can be good at USB 2.0,
and one at Ethernet, Fourier-transform math and so on... for 16 really useful features.
Maybe just one or two very specific opcodes to speed up encoding/decoding by 8x, or some kind of hardware assist so that 480p HDMI is possible, etc.
Say each feature adds ~20% in gate logic: giving every cog the same 16 features would add roughly 300%, but keeping each feature to just one cog is manageable.
Though designing the gate logic will take time and we don't want to wait, unless there are open-source blocks already out there to just drop in.
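Spelling out that estimate, using the post's own assumed ~20% per feature (an assumption, not a measured figure):

```latex
\begin{aligned}
\text{every cog gets all 16 features:}&\quad 16 \times 20\% = 320\% \text{ extra logic per cog} \\
\text{each cog gets one distinct feature:}&\quad 1 \times 20\% = 20\% \text{ extra logic per cog}
\end{aligned}
```

So the one-feature-per-cog split costs about a sixteenth of the extra logic, at the price of the cogs no longer being interchangeable.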
Because at that point you have totally destroyed any idea of somebody being able to mix and match that really smart code you have written with the really smart code I have written into the project of their dreams.
What you are suggesting is basically equivalent to having customised hardware for whatever task, like a normal SoC.
16 plain cogs: Jack of all trades, master of none.
It's highly unlikely you would have the need to mix and match two of the same "type"
They are still 95% the same, and the compiler will handle it, using a ~8x slower emulated version of a specific opcode if it's not available in that cog.
If an engineer sees a diagram of 16 cores showing "hardware-assisted <acronym>" in each box, that will get their attention.
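A toy cost model of the proposal shows both sides of the argument: the compiler can fall back to emulation, but the same routine's timing then depends on which cog it lands in, which is what the following replies object to. The per-cog feature table and cycle counts are hypothetical; only the ~8x ratio comes from the post.

```python
# Hypothetical per-cog feature table: which cog has which hardware assist.
COG_FEATURES: dict[int, set[str]] = {0: {"aes"}, 1: {"usb"}, 2: set()}

HW_CYCLES = 1        # assumed cost of one op when the assist exists in this cog
EMULATED_CYCLES = 8  # assumed cost of the software fallback (~8x, per the post)


def op_cycles(cog: int, feature: str) -> int:
    """Cycles for one op: hardware path if this cog has the feature, else emulation."""
    return HW_CYCLES if feature in COG_FEATURES.get(cog, set()) else EMULATED_CYCLES


def routine_cycles(cog: int, feature: str, ops: int) -> int:
    """Total cycles for a routine issuing `ops` feature-specific operations."""
    return ops * op_cycles(cog, feature)
```

Under these assumptions, a routine issuing 100 AES ops costs 100 cycles in cog 0 and 800 in any other cog.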
So, you are suggesting the compiler generates 8 times slower code for my object when I give it to you because you are using the special hardware my code expects.
And so your program does not work because it does not have the speed for my component.
And then you have to analyse the whole source code of all the objects you are using to find out what the problem is.
This is chaos.
No thanks. I like predictability and determinism, even at the cost of extreme performance. If I need extreme performance, I can find it elsewhere and accept that it will be more work on my part to attain.
Maybe it's USA-specific, but many handymen claim they can do a little bit of everything, and then the work turns out subpar.
So you are better off hiring someone who specializes in just one area.
It's not an SoC, it's just a boost in specific areas that is likely only needed for one routine.
It's probably too late to add anything to the P2 now.
I think making an opcode divergence would cause more problems than it would solve.
Chip already has this in the P2, in the way MathOPS and the CORDIC are managed,
i.e. that resource is not duplicated in each COG and is available for any COG to use.
If any special HW intensive opcodes are needed, they can best go into that common 'Math pool', following the model that exists already.
Serial items like USB2 could certainly benefit from some small HW level helpers, but that is best mapped into SmartPin resource, not pulled into any one COG.
Pin grouping is already implied in some operations, so that can continue.
>If any special HW intensive opcodes are needed, they can best go into that common 'Math pool'
15-20 cycles? Sometimes you need something that is done in 1-2 cycles for immediate use.
Just because only one cog would be good at something, should nobody have it, so that all are equally slow?
I'm pretty sure the compiler could have a "don't use hardware assist" option for that.
But probably something for the P3.
You need to give some real use case examples.
E.g. I can think of USB, so let's follow that:
That could benefit from bit stuffing/unstuffing, but rather than a special opcode, that is best done within the smart pins.
There was talk of a CRC opcode. I've not kept up with where that stands, but it could go into the math block; as USB is byte-based, 15~20 cycles is fine.
Notice how, by using existing P2 flows and blocks, this avoids needing any COG-specific divergence?
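For reference, here are plain software models of the two USB helpers mentioned: bit stuffing (a 0 inserted after every six consecutive 1s, applied before NRZI encoding) and the bytewise CRC-16 used on USB data packets. They show the work being discussed, not how or where the P2 would implement it.

```python
def bit_stuff(bits: list[int]) -> list[int]:
    """USB bit stuffing: insert a 0 after every run of six 1s."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        if b == 1:
            ones += 1
            if ones == 6:
                out.append(0)      # stuffed bit; it also resets the run
                ones = 0
        else:
            ones = 0
    return out


def bit_unstuff(bits: list[int]) -> list[int]:
    """Drop the bit that follows each run of six 1s (stuff errors not checked)."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:
            skip = False           # discard the stuffed 0
            ones = 0
            continue
        out.append(b)
        if b == 1:
            ones += 1
            if ones == 6:
                skip = True
        else:
            ones = 0
    return out


def crc16_usb(data: bytes) -> int:
    """CRC-16/USB: poly 0x8005 (reflected 0xA001), init 0xFFFF, final XOR 0xFFFF."""
    crc = 0xFFFF
    for byte in data:              # byte-at-a-time, matching USB's byte framing
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF
```

The CRC loop does eight shift/XOR steps per byte, which is why a shared, multi-cycle math-block operation could keep pace with a byte-oriented protocol.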
Hey, could the "specialized cog" conversation be moved to a separate thread? It would be nice to keep the topic of this thread primarily about the FPGA image that was just released.
I saw that, but how is it specified that the S field means that and not its usual (P1) meaning? How are three possibilities for the address (S/#/PTRx) encoded in just one bit (I), or is the documentation wrong?
Immediate source values are invalid for RDLONG, so the I flag changes the SSSSSSSSS values to mean the following.
Note: Chip indicated that this has changed slightly, but this should show how it all works.
Belated answer: those names were chosen to convey that pending requests aren't forgotten, i.e. any IRQ that fires while STALLI is in effect will still be marked as pending ... and will immediately generate a call to its related ISR upon ALLOWI.
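A toy state machine captures the stall/allow behavior just described: an event always sets the pending flag, STALLI only defers dispatch, and ALLOWI dispatches anything still pending. The class is hypothetical, and it assumes a single pending flag, so several events arriving while stalled collapse into one ISR call.

```python
class IrqModel:
    """Toy model of STALLI/ALLOWI pending behavior (not P2 silicon)."""

    def __init__(self):
        self.stalled = False
        self.pending = False
        self.calls = 0             # number of ISR invocations

    def fire(self):
        self.pending = True        # an event always marks the IRQ pending
        self._dispatch()

    def stalli(self):
        self.stalled = True        # defer dispatch, but keep recording events

    def allowi(self):
        self.stalled = False
        self._dispatch()           # a request that arrived while stalled runs now

    def _dispatch(self):
        if self.pending and not self.stalled:
            self.pending = False
            self.calls += 1        # stand-in for calling the ISR
```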
In the new Prop2, for RDxxxx/WRxxxx instructions, #0..255 is allowed for S. Values 256..511 are used by PTRx expressions. This means we have 5, not 6, relative index bits.
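A decoder sketch for that split. Only the 0..255 vs 256..511 division comes from the post; the sub-field layout for PTRx expressions (%1SUPNNNNN: S selects PTRA/PTRB, U = update, P = post-modify, NNNNN = 5-bit signed index) is an assumption borrowed from later P2 documentation and may not match this FPGA image. Index scaling by access size and any special update encodings are not modeled.

```python
def decode_rd_s(s: int) -> dict:
    """Decode the 9-bit S field of RDxxxx/WRxxxx when I=1 (assumed layout)."""
    if not 0 <= s <= 0x1FF:
        raise ValueError("S is a 9-bit field")
    if s < 256:
        return {"mode": "imm", "addr": s}            # #0..255 hub address
    idx = s & 0x1F
    if idx & 0x10:
        idx -= 32                                    # 5-bit signed index: -16..+15
    return {
        "mode": "ptr",
        "ptr": "PTRB" if s & 0x80 else "PTRA",       # assumed S bit
        "update": bool(s & 0x40),                    # assumed U bit
        "post": bool(s & 0x20),                      # assumed P bit (post-modify)
        "index": idx,
    }
```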
Permits absolute timing between props for data exchange at max clock frequency, as well as slave SPI.
This is the 0.9uA 1Hz MEMS oscillator; with it, implementing an internal 32kHz crystal oscillator is not needed.
http://www.mouser.com/ds/2/3/ASTMK-604412.pdf
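For the 72-hour outage mentioned earlier, the oscillator's share of the holdup charge is easy to work out. Here I_static is the P2's own static Icc, which isn't known yet, so it stays symbolic:

```latex
Q_{72\,\mathrm{h}} = (I_{\mathrm{osc}} + I_{\mathrm{static}}) \times 72\,\mathrm{h}
  = 0.9\,\mu\mathrm{A} \times 72\,\mathrm{h} + I_{\mathrm{static}} \times 72\,\mathrm{h}
  = 64.8\,\mu\mathrm{Ah} + I_{\mathrm{static}} \times 72\,\mathrm{h}
```

The oscillator term is tiny, so whether the P2 can replace a separate RTC chip depends almost entirely on I_static.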
In either case, if you aren't designing a battery-run solution, it would be advisable to use a separate RTC chip.
I like the clocking.
Could be pretty great too. But it's not the project at hand.
Maybe someone else can attempt this, or some ideas get considered after P2 is done.
Besides, what is this master of none business?
Good software will do the job well, and it can improve over time. This chip will do a lot more.
Getting good at things in hardware means dealing with IP, testing, etc., all of which will come at considerable time and expense.
That is time in which software could be developed on real chips.
There are a ton of SoC devices out there now.
If these hardware assists are in fact minor, good software is needed anyway.
Yes.
EDIT: actually read the entirety of that post
So, should in the (maybe) latest docs read ?