Pins' outputs can be inverted via the whole smartpin arrangement. If any cog wrote a '1', the pin would go low.
Just remember that if smart pins are to be "streamlined" that we have the inverter available although IMO the AND arrangement is far more sensible and in keeping with how signals are actually used between cogs.
On the matter of streamlined smartpins if that is what is needed to induce a P2 birth then we would still want some form of serdes and if analog complicates everything then at least have the counter modes from the P1 that allowed us to create DACs easily and output selectable frequencies.
Also, mentioned before about upgrading the bootloader to at least load from SPI Flash. This would be a really big help for me and allow me to build up Tachyon with all the filesystem and networking present at boot.
P.S. I came across this post while looking for information on the new ALTx instructions to include in my assembler document. Would be nice if each of these instructions had their own thread rather than getting buried deep and then not knowing which thread it was discussed in.
EDIT: Found it in its own thread!
Dave,
Chip said the extras he put in the other day just for NTSC took, IIRC, 250+ blocks and a couple of multipliers, which will need to be converted to gates. And if I understand correctly, this is duplicated in every cog. This is not the streamer, which was already done.
I suppose my comment was based on evanh's responses to your posts. Those were specific to the streamer, and didn't mention NTSC. So I apologize if you were only referring to the NTSC feature. I agree with you that all cogs would not need this, but I don't like the idea of having cogs with different functionality. The NTSC feature should be a single resource that is shared by all the cogs.
However, I wonder whether an NTSC feature is needed at all. I would much rather see HDMI.
... The streamer just consists of a 16-long FIFO and the logic needed to make it work with the rest of the cog.
That's only the FIFO. The Streamer makes use of that FIFO when fetching from HubRAM but they are independent hardware otherwise.
With the lack of documentation it is difficult to understand what is contained in the streamer. Could someone please post a list of its features? My understanding is that the streamer consists of the FIFO and the logic that controls the FIFO. This is used with hubexec and the read/write fast instructions. I understand it is also used to stream ADC and DAC data. The actual ADC and DAC circuitry is separate from the streamer since these are a shared resource. Or maybe not. Could somebody please point to some documentation on this?
True, the FIFO was only for the Streamer initially. I consider it separate now mainly because it can be, and mostly will be, used for non-Streamer activities. Secondly, the Streamer can function from the LUT alone. It doesn't use the FIFO then. Thirdly, and I don't know if this was always so, the FIFO has to be initialised with its own RD/WRFAST instructions.
Streamer documentation: The only one I know of is page 3 of Chip's doc, where there is passing mention of what can use the FIFO.
There are probably code examples in Chip's PNUT/FPGA releases.
T... the FIFO has to be initialised with its own RD/WRFAST instructions.
That's intriguing. That RDFAST dictates the HubRAM addresses. It hadn't really dawned on me until now that the Streamer has no say in addressing its Hub data. On the other hand, if it is to fetch from the LUT it has to perform its own addressing.
Obviously, I don't know what Chip has done for this.
Oddity with V4 FPGA files release 7th Nov. My test harness has stopped transmitting to the PST.
start
mov dira, ##$ffff
mov outa, #0
setb outb, #TX_PIN
setb dirb, #TX_PIN
setint1 %000 ' disable interrupt
setint2 %000 ' disable interrupt
setint3 %000 ' disable interrupt
loc ptra ,#main_stack' set stack for framework
' load COG code for direct calling - do not Coginit it
loc ptrb, #cog_start
setq #(cog_end-cog_start) ' thank you Searith
rdlong 0, PTRB
calla #rcv_char ' kickoff on receive char
Executes if I sprinkle some LED driving ops in there, but nothing comes out on the serial terminal. Some facts: I am running in Hubexec and using ptra for the stack exclusively, thus using calla/reta/pusha/popa instructions.
When I press CR on PST, the LEDs which indicate a cog is running start lighting up. How does this happen on the bare board Nano when there are only two cogs?!
I have changed some code to reflect the change from ALTDS to ALTS. It picks up a register's address in cog RAM and loads the value into scratch. It looks correct to me, but it is the only thing I can think of. Original code was:
mov modifier, ptrb ' == register address
send_get_inst altds modifier, #000_000_100 ' use modifier as source reg
mov scratch, 0-0 ' read from [ptrb]
loc ptrb, #send_hex_value
wrlong scratch, ptrb ' write it to hub
New code is:
mov modifier, ptrb ' == register address
send_get_inst alts modifier, #0 ' use modifier as source reg
mov scratch, 0-0 ' read from [ptrb]
loc ptrb, #send_hex_value
wrlong scratch, ptrb ' write it to hub
The Prop-Plug is functional, and if I run a modified version of mindrobots' comm_test_hi.spin, which is a hubexec version, characters I enter at PST are echoed back correctly.
So it must be in my code somewhere, but I am just not seeing where the problem is.
Having problems with V4 too.
Code that runs fine in V3 is broken in V4.
Spent the last 3 hours diving into my code (it's large)
Can't put a finger on the problem yet.
Aha, I shall wait a while on Chip before delving further.
I thought my Prop-Plug had died originally.
My code is also large, over 5000 lines. It was working without obvious error in V3.
Is your code cog, lut or hub exec, or a mixture?
Do you use a ptra or b stack at all?
Did you use altds and now use altd/s/i ?
So the streamer is just a very simple DMA controller. This just requires an address counter and a small state machine. It can't be more than a few hundred gates per cog, which would be a very small fraction of a cog.
@78rpm
Both cog and hub exec.
PTRA and PTRB stacks.
ALTDS (now ALTI)
Pnut appears to be compiling ALTx Ok.
Program with data >500K in hub.
Mine is mainly hub with some calls into cog.
I use PTRA stack, but I use ptrb a lot for indirect addressing.
ALTDS (now ALTS) in my case
Program with data c.9-10KB mainly in hub, some cog.
I also load the initial cog but do not issue a coginit, just call directly into the cog.
I do use a lot of rdlong / wrlong and byte variants to get at data in hub. Of course, a "PUSHA my_reg" is an alias for "WRLONG my_reg,PTRA++".
I changed the CALLA #send_char in my previous post to a CALL and modified the RETA to RET, but that made no difference, so I'm not sure it is a stack problem, unless perhaps stack direction has suddenly changed, i.e. downwards instead of up?
I am grabbing at straws here, or perhaps thin air. Without being able to see inside easily with the serial terminal I can really only resort to driving LEDs to find how far I get.
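The stack-direction question can be reasoned about with a small host-side model. This is a minimal Python sketch, not P2 code. It takes from the post that PUSHA is WRLONG reg,PTRA++, and it assumes that POPA is the mirror operation (RDLONG reg,--PTRA) with a 4-byte step. The class and method names are hypothetical.

```python
class HubStack:
    """Toy model of an upward-growing PTRA stack in hub RAM."""

    def __init__(self, base, size=64):
        self.base = base            # hub address where the stack starts
        self.ptra = base            # models the PTRA register
        self.mem = bytearray(size)  # stand-in for the hub RAM behind the stack

    def pusha(self, value):
        # Modelled on WRLONG value,PTRA++ : write first, then post-increment.
        off = self.ptra - self.base
        self.mem[off:off + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")
        self.ptra += 4

    def popa(self):
        # Assumed mirror, RDLONG reg,--PTRA : pre-decrement, then read.
        self.ptra -= 4
        off = self.ptra - self.base
        return int.from_bytes(self.mem[off:off + 4], "little")


stack = HubStack(base=0x1000)
stack.pusha(0x11111111)
stack.pusha(0x22222222)
print(hex(stack.ptra))                       # PTRA after two pushes
print(hex(stack.popa()), hex(stack.popa()))  # values come back last-in, first-out
```

If the hardware had switched to a downward-growing stack, matched CALLA/RETA pairs would still balance, which fits the observation that swapping them for CALL/RET made no difference.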
All my little demos work with v4 so far but none of them are large, use much in the way of memory or do any ALT or instruction modification. I've mostly been playing with interrupts (I know, I'm a heathen!) and execution modes.
I get to play some today, so I'll see if I can find any issues that aren't mine.
Wowzer! I'm not sure I'll ever be able to write 5000+ lines of PASM again!
1236 lines of the program were generated by the program itself, then cut'n'pasted from PST. It is all part of my rd/wr byte/word/long unit test with every permissible use of index value and --/++. It's also to verify that PNut and the test, which synthesises each instruction, agree on the binary representation.
206 individual forms of the ptr expression
* 3 for byte, word, long instruction forms
* 2 for read and write
gives us 1236, but
* 3 for execution of the instruction in cog, lut and hub exec modes. The instructions of course only rd/wr between cog and hub, but their execution has to be verified in all exec memory spaces. So a total of 3708 tests.
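The test count above is plain multiplication, and it checks out. A short Python sketch of the arithmetic follows; the constant names are made up for illustration.

```python
PTR_FORMS = 206    # individual forms of the ptr expression
SIZES = 3          # byte, word, long instruction forms
DIRECTIONS = 2     # read and write
EXEC_MODES = 3     # cog, lut and hub exec

generated_lines = PTR_FORMS * SIZES * DIRECTIONS
total_tests = generated_lines * EXEC_MODES
print(generated_lines, total_tests)  # 1236 3708
```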
At least you now know it's not just your code experiencing problems, not that it is a tremendous help, but hopefully we can get things resolved soon and get back on track.
Enjoy your kip, I think I may tackle my soldering that cries out for attention.
I have done more testing with the LEDs and it looks to me that rcv_char does not execute as before. In fact it gets stuck at WAITEDG, as LEDs 5 and 4 are on, all others are off. Yet the same code works in the comm_test_hi.spin example. The only difference in my code is the CALLA and RETA. Changing those to CALL and RET makes no difference.
rcv_char
setedg #%0_10_000000 | RX_PIN 'select negative edge on p63
setb outa,#5
polledg 'clear edge detector
setb outa,#4
waitedg 'wait for start bit
setb outa,#3
waitx bit_time 'wait for middle of 1st data bit
setb outa,#2
rep @.rep,#8 'ready for 8 bits
testb inb,#RX_PIN wc 'sample rx
rcr rx_char,#1 'rotate bit into byte
waitx bit_time 'wait for middle of nth data bit
.rep
setb outa,#1
shr rx_char,#32-8 'justify received byte
setb outa,#0
reta
My program is larger, is that a clue? rcv_char is @ $43d2 in my code.
Seriously, put an ALIGNL before rcv_char to bring it back to LONG aligned memory allocation.
In the code below, it works fine with the ORGH before rcv_char set to $43D0 but breaks if set to $43D2. Either byte aligned code execution got broken or we lost that feature someplace and you need to make sure you manually bring any code into alignment after you've defined WORD or BYTE data in HUBRAM.
con
SYS_CLK = 50_000_000
BAUD_RATE = 115_200
RX_PIN = 63
TX_PIN = 62
dat
orgh 0
org 0
jmp #@cog_entry
' these need to stay here (under org 0) so they are in COGRAM and get initialized
bit_time long SYS_CLK / BAUD_RATE
tx_char res 1
timer res 1
rx_char res 1
' plenty of room to play in High HUBRAM (COG0 in HUBEXEC at this point)
orgh $4300
cog_entry
setb outb, #TX_PIN
setb dirb, #TX_PIN
'*******************************************************
'********* TEST CODE - PUT YOUR CODE HERE **************
' ***** try some input/output
loopback
call #@rcv_char
mov tx_char, rx_char
call #@send_char
jmp #@loopback
'********* TEST CODE - REPLACE WITH YOUR CODE ***********
'********************************************************
'*******************************************************************************
' Get one character from the input port.
' Input none
' Changes parm, temp, temp1, temp2
' Output parm
'*******************************************************************************
orgh $43d0
rcv_char
setedg #%0_10_000000 | RX_PIN 'select negative edge on p63
polledg 'clear edge detector
waitedg 'wait for start bit
waitx bit_time 'wait for middle of 1st data bit
rep @.rep,#8 'ready for 8 bits
testb inb,#RX_PIN wc 'sample rx
rcr rx_char,#1 'rotate bit into byte
waitx bit_time 'wait for middle of nth data bit
.rep
shr rx_char,#32-8 'justify received byte
ret
'*******************************************************************************
' Output a single character to the tx_pin.
' executes in COG mode
' Input: txchar - character to be sent
' Changes parm, temp1, temp2
' Output none
'*******************************************************************************
send_char setb tx_char,#8
shl tx_char,#1
getct timer
rep @.txrep,#10
testb tx_char,#0 wz
setbnz outb,#TX_PIN
addct1 timer,bit_time
waitct1
shr tx_char,#1
.txrep
ret
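For readers following the bit handling in send_char and rcv_char, here is a minimal Python model of the framing, using the SYS_CLK and BAUD_RATE values from the listing. It is a host-side sketch, not P2 code. It assumes the TESTB/SETBNZ pair drives the pin with bit 0 of tx_char on each pass, and that RCR rotates the sampled bit into bit 31. The function names are hypothetical.

```python
SYS_CLK = 50_000_000
BAUD_RATE = 115_200
BIT_TIME = SYS_CLK // BAUD_RATE   # clocks per bit, integer division as in the CON block


def tx_frame_bits(ch):
    """Bits in the order send_char shifts them out: start, 8 data (LSB first), stop."""
    frame = ((ch & 0xFF) | (1 << 8)) << 1   # setb tx_char,#8 then shl tx_char,#1
    return [(frame >> i) & 1 for i in range(10)]


def rx_byte(data_bits):
    """Mimic eight 'rcr rx_char,#1' steps followed by 'shr rx_char,#32-8'."""
    rx = 0
    for bit in data_bits:
        rx = (rx >> 1) | (bit << 31)   # rotate the sampled bit in at the top
    return rx >> 24                    # justify the received byte


bits = tx_frame_bits(ord("A"))
print(BIT_TIME, bits, hex(rx_byte(bits[1:9])))
```

The model shows why the final shift is by 32-8: after eight rotates the first sampled bit sits at bit 24, so shifting right by 24 leaves it as the LSB.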
See? This is fun!!!! (OK, I had fun playing detective!)
Trouble is, there are byte strings near routines all over the place at present; it is convenient when developing and debugging.
What a pain, so it's alignment which has broken for some reason.
YUP! Confirmed by placing one ALIGNL, but there are millions, ok, perhaps a couple of dozen, other places to add them. Well at least we have a workaround for the moment.
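To hunt for the other places that need the workaround, it may help to check label addresses from the listing for long alignment. Below is a minimal Python sketch using addresses mentioned in this thread. It assumes ALIGNL pads up to the next multiple of 4 bytes. The helper names are hypothetical.

```python
def is_long_aligned(addr):
    """True when addr falls on a 4-byte (long) boundary."""
    return addr % 4 == 0


def align_long(addr):
    """Round addr up to the next multiple of 4 (what ALIGNL is assumed to do)."""
    return (addr + 3) & ~3


for addr in (0x43D0, 0x43D2, 0x402):
    print(hex(addr), is_long_aligned(addr), hex(align_long(addr)))
```

$43D0 is already aligned, while $43D2 would be padded up to $43D4.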
I can narrow that down to either PNUT or the FPGA image! As I think about it more, I think PNUT is in the clear. Chip was working on the streamer, which also plays a part in hub exec, maybe?
Since my little test program doesn't use anything that has changed recently, I think I can go backward and use old PNUTs against the new FPGA image.
If I narrow it down, I'll start another thread with just this problem.
Ah, but the LED never gets set after WAITEDG which puts the cog into a low power state whilst it waits for the event to occur. Perhaps Chip now only saves or restores a PC with the two lsbs clear? Maybe it is a bug introduced with the event/polling/getint update, just laying there dormant until now?
'SETDACS D/#' is used to set the four 8-bit DACs associated with each cog. For cog0, this means pins 0..3, while for cog1, this means pins 4..7, etc.
For any clock on which the streamer is outputting, it selectively mux's its four outputs in lieu of original SETDACS four-byte data, on a per-byte/DAC basis, as the streamer can be configured to affect only certain bytes/DACs.
The colorspace converter, when enabled, grabs the DAC-bound data from the streamer and outputs new DAC-bound data 4 clocks later, mux'ing its output data in lieu of what was going to the DACs from the streamer/SETDACS.
So, if you don't use the streamer, but turn on the colorspace converter, any SETDACS instruction will directly establish the inputs to the colorspace converter. For something as low-bandwidth as NTSC, it's not even necessary to use the streamer, but it can help a lot, and allow you to move video updating entirely into an interrupt.
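The per-byte selection described above can be pictured with a small Python model. This is only a sketch of the data flow, not the hardware. The byte_mask encoding (bit i selects the streamer's byte i) is a hypothetical stand-in for however the streamer is actually configured, and the colorspace converter stage and its 4-clock delay are left out.

```python
def dac_pins(cog):
    """Pins served by a cog's four DACs: cog0 -> 0..3, cog1 -> 4..7, etc."""
    return list(range(4 * cog, 4 * cog + 4))


def mux_dacs(setdacs, streamer, byte_mask):
    """Per-byte mux: bit i of byte_mask (hypothetical encoding) selects the
    streamer's byte i in place of the SETDACS byte i."""
    out = 0
    for i in range(4):
        src = streamer if (byte_mask >> i) & 1 else setdacs
        out |= src & (0xFF << (8 * i))
    return out


print(dac_pins(1))
print(hex(mux_dacs(0x11223344, 0xAABBCCDD, 0b0101)))
```

With a mask of zero the SETDACS value passes through untouched, which corresponds to the case where the streamer is not used and SETDACS feeds the colorspace converter directly.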
One thing that has changed with PNut is that it only downloads up to where the last code was emitted. It used to download the whole memory image, but that was wasting time.
If your .spin file left off emitting hub data at an address below some other data, it only loads up to the lower address. Could that be the problem?
There has been no change in alignment rules.
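The download behaviour described above can be illustrated with a small Python model. It is a sketch built only on that description: the chunk list, its addresses, and the function name are hypothetical, and PNut's real bookkeeping may differ.

```python
def download_extent(chunks):
    """chunks: (start, length) pairs in the order the assembler emitted them.
    Returns (end of the last emitted chunk, highest end address of any chunk)."""
    last_start, last_len = chunks[-1]
    last_end = last_start + last_len
    highest_end = max(start + length for start, length in chunks)
    return last_end, highest_end


# Hypothetical layout: code at $0000, a routine at $4300, then data emitted last at $1000.
chunks = [(0x0000, 0x40), (0x4300, 0x100), (0x1000, 0x20)]
last_end, highest_end = download_extent(chunks)
print(hex(last_end), hex(highest_end), last_end < highest_end)
```

When the last emitted chunk ends below the highest one, anything above it would not be downloaded under this model, which is the situation the question is asking about.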
I don't think so, it's narrowed down later on. Mindrobots has started a new thread about a _v4 bug, which links to this one.
I could see that being the case except the ALIGNL makes it work.
I'll try more testing, I have an even smaller program that exhibits the problem.
con
main_led = 0
isr_led = 1
dat
orgh 0
org 0
start
setb dirb,#main_led
setb dirb,#isr_led
getct isrticks
addct1 isrticks,isr_wait
loc adra,#@isr
mov ijmp1,adra
setint1 #1
blink
notb outb,#main_led 'flip its output state
waitx main_wait ' WAITX blocks interrupts
jmp #blink 'do it again
isr_wait long 50_000_000
main_wait long 5_000_000
isrticks long 0
isr_in_hub long 0
orgh $402
ALIGNL ' needed to work on V4 FPGA
isr
notb outb,#isr_led
addct1 isrticks, isr_wait
reti1
(I never thought I'd need a SECOND 1-2-3 board for testing!!! Don't tell Ken!)
So I apologize if you were only referring to the NTSC feature. I agree with you that all cogs would not need this, but I don't like the idea of having cogs with different functionality. The NTSC feature should be a single resource that is shared by all the cogs.
Not sure how practical that is, as it needs to bolt onto the streamer flow?
Adding a 17th streamer means a 17th slot = not possible?
One channel of NTSC may be too light, but 16 is unlikely to be used.
It comes down to logic cost, and if something else more important can fit, then COG peripherals do not have to be all equal.
Perhaps with the device routing, an NTSC cell can be MUX'd between two COGS, halving the NTSC count, and every COG can access one.
However, I wonder whether an NTSC feature is needed at all. I would much rather see HDMI.
Yes, but 180nm is not going to manage HDMI.
You can do HDMI with add-on parts.
However, note that everyone does HDMI in the Big Iron ARM parts, and P2 would be lost in the noise.
I think the Composite Video space (and direct LCD drive) are going to be important enough for a long time, and P2 can excel there.
Comments
I guess the powers that be will state what goes in or not anyway
Now, if I add the following to the code above: when I press CR on PST the leds which indicate a cog is running start lighting up. How does this happen on the bare board Nano when there are only two cogs?!
Any thoughts?
I wonder if it is a PNut problem or FPGA?
@Chip: How about it? Just scribble it onto some paper or other. We can make it nice and pretty.
Time to step away and have another look after some sleep.
Don't put your rcv_char at address $43d2!
The question to ask is, why has it broken?
Now, get back to your testing!!!
New thread coming.