Propeller II update - BLOG - Page 206 — Parallax Forums

Propeller II update - BLOG


Comments

  • ctwardell Posts: 1,716
    edited 2014-03-09 17:44
    Ahle2,

    A problem with the solution you have suggested is that it makes things much more difficult on the scheduler side because it no longer has a simple absolute test for a non-zero value or a specific value in a flag bit.

    With Bill's approach the scheduler needs to watch for the register loaded by TPAUSE to be non-zero.
    Bill puts any return value to the client in a separate register from the register loaded by TPAUSE and then clears the register loaded by TPAUSE to signal the TPAUSE to resume.
    The scheduler can immediately resume polling.

    With jmg's approach the scheduler needs to watch for bit31 to be set in the register loaded by TPAUSE. A value is returned to the client using the same register used by TPAUSE, but bit31 must be cleared to tell the TPAUSE to resume.
    The scheduler can immediately resume polling.

    With your approach I don't see a simple way for the scheduler to operate.

    If we want to do simple non-zero testing in the scheduler the value in the register to be loaded by TPAUSE must be zero until TPAUSE causes it to change.
    Whenever the scheduler writes a non-zero value back to a client the register is not in the proper state for polling, so the scheduler would need to wait some amount of time to allow the client to grab the return value and then clear the register so it could start polling again.

    C.W.
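
The two polling schemes compared above can be sketched as a small behavioral model. This is only an illustration under stated assumptions, not Chip's hardware: the register names (`req`, `result`), the `service` callback, and both function names are hypothetical, and the TPAUSE stall itself is not modeled, only what the scheduler tests and writes.

```python
BIT31 = 1 << 31
MASK32 = 0xFFFFFFFF


def poll_nonzero(regs, req, result, service):
    """Bill-style poll: a non-zero request register means the client is paused."""
    request = regs[req]
    if request == 0:
        return False                           # nothing pending, keep polling
    regs[result] = service(request) & MASK32   # reply goes in a separate register
    regs[req] = 0                              # clearing the request releases the client
    return True


def poll_bit31(regs, req, service):
    """jmg-style poll: bit 31 set means the client is paused."""
    request = regs[req]
    if not request & BIT31:
        return False                           # flag clear, keep polling
    reply = service(request & ~BIT31)          # strip the flag before servicing
    regs[req] = reply & MASK32 & ~BIT31        # same register carries the reply; bit 31 clear releases
    return True


if __name__ == "__main__":
    regs = {"req": 5, "result": 0}
    poll_nonzero(regs, "req", "result", lambda x: x * 2)
    print(regs)

    regs = {"req": BIT31 | 5}
    poll_bit31(regs, "req", lambda x: x * 2)
    print(regs)
```

In both models the request register is back in its idle state as soon as the reply is written, which is why the scheduler can resume polling immediately.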
  • Cluso99 Posts: 18,069
    edited 2014-03-09 18:07
    ctwardell wrote: »
    Ahle2,

    A problem with the solution you have suggested is that it makes things much more difficult on the scheduler side because it no longer has a simple absolute test for a non-zero value or a specific value in a flag bit.

    With Bill's approach the scheduler needs to watch for the register loaded by TPAUSE to be non-zero.
    Bill puts any return value to the client in a separate register from the register loaded by TPAUSE and then clears the register loaded by TPAUSE to signal the TPAUSE to resume.
    The scheduler can immediately resume polling.

    With jmg's approach the scheduler needs to watch for bit31 to be set in the register loaded by TPAUSE. A value is returned to the client using the same register used by TPAUSE, but bit31 must be cleared to tell the TPAUSE to resume.
    The scheduler can immediately resume polling.

    With your approach I don't see a simple way for the scheduler to operate.

    If we want to do simple non-zero testing in the scheduler the value in the register to be loaded by TPAUSE must be zero until TPAUSE causes it to change.
    Whenever the scheduler writes a non-zero value back to a client the register is not in the proper state for polling, so the scheduler would need to wait some amount of time to allow the client to grab the return value and then clear the register so it could start polling again.

    C.W.
    Am I missing something really basic with Ahle2's suggestion???
    His would work with both Bill's suggestion and jmg's suggestion, as well as perhaps a lot of others.

    For Bill,
    TCHECK D, #somevalue
    ...
    MOV D, #0 'release the TCHECK because 0 <> #somevalue

    For jmg,
    AUGS #bit31
    TCHECK D, #somevalue 'or S=b31 | somevalue
    ...
    CLRB D,#31 'clear bit31: release the TCHECK because 0 | #somevalue <> 1 | somevalue

    Apart from the extra silicon to hold a copy of the value set by TCHECK, what could be simpler and more powerful ??? What am I missing???
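
A rough model of the "release when the register no longer equals what TCHECK wrote" idea, as I read the two snippets above. The class and method names are hypothetical, and the `latched` field stands in for the extra per-task storage the post mentions; nothing here reflects actual P2 logic.

```python
BIT31 = 1 << 31
MASK32 = 0xFFFFFFFF


class TCheckModel:
    """Behavioral sketch: the task stays paused while reg still equals the latched value."""

    def __init__(self):
        self.reg = 0
        self.latched = None        # hypothetical extra copy of the value TCHECK wrote

    def tcheck(self, value):
        """Client side: write the value and remember it."""
        self.reg = value & MASK32
        self.latched = self.reg

    def write(self, value):
        """Scheduler side: any write that changes the register releases the client."""
        self.reg = value & MASK32

    def paused(self):
        return self.latched is not None and self.reg == self.latched


if __name__ == "__main__":
    bill = TCheckModel()
    bill.tcheck(7)
    bill.write(0)                      # MOV D, #0
    print(bill.paused())

    jmg = TCheckModel()
    jmg.tcheck(BIT31 | 7)
    jmg.write(jmg.reg & ~BIT31)        # CLRB D, #31
    print(jmg.paused(), jmg.reg)
```

The same model also releases on a non-zero write (for example `write(99)` after `tcheck(7)`), which is the extra flexibility being argued for.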
  • ctwardell Posts: 1,716
    edited 2014-03-09 18:29
    Cluso99 wrote: »
    Am I missing something really basic with Ahle2's suggestion???
    His would work with both Bill's suggestion and jmg's suggestion, as well as perhaps a lot of others.

    For Bill,
    TCHECK D, #somevalue
    ...
    MOV D, #0 'release the TCHECK because 0 <> #somevalue

    For jmg,
    AUGS #bit31
    TCHECK D, #somevalue 'or S=b31 | somevalue
    ...
    CLRB D,#31 'clear bit31: release the TCHECK because 0 | #somevalue <> 1 | somevalue

    Apart from the extra silicon to hold a copy of the value set by TCHECK, what could be simpler and more powerful ??? What am I missing???

    Got it now! I was mixing Bill's and jmg's when working through Ahle2's.

    Looks like that is a good solution for TPAUSE (TCHECK?).

    While it does leave a more flexible TPAUSE, there is still the need for the additional jump negative instruction to fully implement jmg's approach.

    C.W.
  • ozpropdev Posts: 2,791
    edited 2014-03-09 18:47
    jmg wrote: »
    You can just use positive and negative, but I think that is also a new opcode.

    Hi All
    Having finished coffee #7 (or was it #8?) I just realised something.
    Mindrobots was on the right track. We don't need new instructions.

    We already have JP (Jump positive) and JNP (jump not positive) instructions and JPD,JNPD (delayed).
    The JNP instruction is actually the "JB31" instruction jmg suggested.

    Cheers
    Brian
  • cgracey Posts: 14,133
    edited 2014-03-09 18:54
    The problem with implementing the 'current == original' test is that it requires another 32 bits of state storage per task (128 more flops per cog, or 1024 per chip), of which 32 bits must also be stored in the WIDEs for T3LOAD/T3SAVE. There are only 12 more bits available there, though. So, this could work if the value was only tracked to 12 bits. For what extra benefit this would give, it seems way too expensive, even if we limited the value checking to 9 bits, all around. You can always use another register to convey a feedback value.
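
The flop counts above can be reproduced with a few lines of arithmetic. The task and cog counts are not stated in the post; they are inferred from 128 / 32 and 1024 / 128, so treat them as assumptions. The function name is made up for this sketch.

```python
TASKS_PER_COG = 4      # assumption: implied by 128 flops per cog / 32 bits per task
COGS_PER_CHIP = 8      # assumption: implied by 1024 flops per chip / 128 per cog
SPARE_WIDE_BITS = 12   # from the post: bits still free in the WIDEs for T3LOAD/T3SAVE


def extra_flops(bits_tracked):
    """Return (flops per cog, flops per chip) needed to track bits_tracked bits per task."""
    per_cog = bits_tracked * TASKS_PER_COG
    return per_cog, per_cog * COGS_PER_CHIP


if __name__ == "__main__":
    for bits in (32, 12, 9):
        per_cog, per_chip = extra_flops(bits)
        fits = bits <= SPARE_WIDE_BITS
        print(f"{bits:2d} bits: {per_cog} flops/cog, {per_chip} flops/chip, fits in WIDE spare bits: {fits}")
```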
  • ctwardell Posts: 1,716
    edited 2014-03-09 18:56
    ozpropdev wrote: »
    Hi All
    Having finished coffee #7 (or was it #8?) I just realised something.
    Mindrobots was on the right track. We don't need new instructions.

    We already have JP (Jump positive) and JNP (jump not positive) instructions and JPD,JNPD (delayed).
    The JNP instruction is actually the "JB31" instruction jmg suggested.

    Cheers
    Brian

    Those test pin states, not bit31.

    From Prop2_Docs.txt:
    To avoid excessively stalling the pipeline during multi-tasking, the WAITCNT/WAITPEQ/WAITPNE
    instructions can be substituted with non-stalling alternatives:
    
      PASSCNT D/#        jumps to itself if some amount of time has not passed, use instead of WAITCNT
      JP/JNP  D/#,S/#    jumps based on pin states, use instead of WAITPEQ/WAITPNE
    

    C.W.
  • rogloh Posts: 5,171
    edited 2014-03-09 18:59
    I really like how Ahle2's idea can be worked both ways and is method agnostic. Having the ability to automatically get a return value (aka jmg method) could be quite useful in some cases. The extra scheduler task overhead of needing multiple cycles to poll a set bit (e.g. bit31) in the value written by the TPAUSE instruction (or whatever it is called nowadays), instead of using a quick and atomic JZ (or TJNZ), may not necessarily be a big problem. After all, if we are giving 1/16 of the CPU to the scheduler, and it is more than likely scheduling threads at a rate much slower than this, there will be plenty of time for extra, more complex instructions to do checks like this. And we can still do the zero-based approach too, so we give up nothing from a software point of view.

    I guess we will see if Chip likes it too vs what he already has now.

    UPDATE: Oops, too late. Chip has already responded. I gotta remember to check before submitting so I am not out of date. :frown:
  • cgracey Posts: 14,133
    edited 2014-03-09 19:00
    ctwardell wrote: »
    Those test pin states, not bit31.

    C.W.


    When we had lots of 'D,S/#' opcode space, I added Jump-positive/negative instructions, but they've since been removed to make room for other things. They were freebies at one time, because they take almost zero logic to implement, given the opcode space is available.
  • ozpropdev Posts: 2,791
    edited 2014-03-09 19:30
    ctwardell wrote: »
    Those test pin states, not bit31.

    C.W
    Yikes! :lol:

    Clearly I need to change my "brand" of coffee!

    Where I went wrong on that one was a quick test of JP,JNP gave me the correct results.
    Looking at it further I just happened to meet the "right states" in all of my tests.
    1,2,3,4 ok positive
    -1,-2,-3 ok negative
    garbage in = garbage out!

    Cheers
    Brian
  • rogloh Posts: 5,171
    edited 2014-03-09 19:32
    cgracey wrote: »
    The problem with implementing the 'current == original' test is that it requires another 32 bits of state storage per task (128 more flops per cog, or 1024 per chip), of which 32 bits must also be stored in the WIDEs for T3LOAD/T3SAVE. There are only 12 more bits available there, though. So, this could work if the value was only tracked to 12 bits. For what extra benefit this would give, it seems way too expensive, even if we limited the value checking to 9 bits, all around. You can always use another register to convey a feedback value.

    I really hope we don't find too many more state resources per task to add from here on in. Those 12 free bits you still have left in the WIDE are going to get pretty precious now. :smile: We will have to be selective. Any new hardware we request like CRC etc might want to get targeted per COG not per task.
  • cgracey Posts: 14,133
    edited 2014-03-09 19:36
    rogloh wrote: »
    I really hope we don't find too many more state resources per task to add from here on in. Those 12 free bits you still have left in the WIDE are going to get pretty precious now. :smile: We will have to be selective. Any new hardware we request like CRC etc might want to get targeted per COG not per task.

    If we can keep any new instructions ATOMIC, so that they work on an addressable register, it will be smooth sailing.
  • ctwardell Posts: 1,716
    edited 2014-03-09 19:40
    ozpropdev wrote: »
    C.W
    Yikes! :lol:

    Clearly I need to change my "brand" of coffee!

    Where I went wrong on that one was a quick test of JP,JNP gave me the correct results.
    Looking at it further I just happened to meet the "right states" in all of my tests.
    1,2,3,4 ok positive
    -1,-2,-3 ok negative
    garbage in = garbage out!

    Cheers
    Brian

    lol, no worries! I think I must have been drinking that same coffee the past few weeks, some would say a lot longer...

    C.W.
  • mindrobots Posts: 6,506
    edited 2014-03-09 20:39
    ozpropdev wrote: »
    Hi All
    Having finished coffee #7 (or was it #8?) I just realised something.
    Mindrobots was on the right track. We don't need new instructions.

    We already have JP (Jump positive) and JNP (jump not positive) instructions and JPD,JNPD (delayed).
    The JNP instruction is actually the "JB31" instruction jmg suggested.

    Cheers
    Brian

    I knew you had the wrong brand of coffee when you said I was on the right track.

    I still go back to my earlier statement that Chip's latest proposal was simple, elegant and propeller-like and gives enough hardware support to let the software play.
  • Cluso99 Posts: 18,069
    edited 2014-03-09 21:38
    cgracey wrote: »
    The problem with implementing the 'current == original' test is that it requires another 32 bits of state storage per task (128 more flops per cog, or 1024 per chip), of which 32 bits must also be stored in the WIDEs for T3LOAD/T3SAVE. There are only 12 more bits available there, though. So, this could work if the value was only tracked to 12 bits. For what extra benefit this would give, it seems way too expensive, even if we limited the value checking to 9 bits, all around. You can always use another register to convey a feedback value.
    I wasn't even thinking that they would require saving.

    Agreed, way too expensive. The current TCHECK (or whatever) waiting for =0 is a fine solution.

    How close is a release?
  • cgracey Posts: 14,133
    edited 2014-03-09 21:51
    Cluso99 wrote: »
    How close is a release?


    I've just got to test the T3SAVE/T3LOAD instructions, then update the documentation. So, hopefully early this week.
  • Ahle2 Posts: 1,178
    edited 2014-03-10 00:06
    cgracey wrote: »
    The problem with implementing the 'current == original' test is that it requires another 32 bits of state storage per task (128 more flops per cog, or 1024 per chip), of which 32 bits must also be stored in the WIDEs for T3LOAD/T3SAVE. There are only 12 more bits available there, though. So, this could work if the value was only tracked to 12 bits. For what extra benefit this would give, it seems way too expensive, even if we limited the value checking to 9 bits, all around. You can always use another register to convey a feedback value.
    Ouch! :(
    It seems so easy when the sausage maker adds the ingredients. It is hard being a naive sausage trainee with dirty little fingers.

    /Johannes
  • Baggers Posts: 3,019
    edited 2014-03-10 02:26
    cgracey wrote: »
    I've just got to test the T3SAVE/T3LOAD instructions, then update the documentation. So, hopefully early this week.

    Awesome news Chip, can't wait to play :)
  • Heater. Posts: 21,230
    edited 2014-03-10 02:50
    Ahle2,
    It is hard being a naive sausage trainee with dirty little fingers.
    Yep. It's hard to not get your fingers caught in the sausage machine!
  • MJB Posts: 1,235
    edited 2014-03-10 05:53
    jmg wrote: »
    I still do not like the inefficiency of using 32 bits as a boolean.., and the asymmetry of message passing.

    mov task3result,result ' optionally pass back result
    mov task3req,#0 ' release task if PC not incremented past TPAUSE

    - but I also do not see an opcode that neatly allows compact mixing of flags and params

    I think we don't have any
    JBITZ D, addr, #0..31
    JBITNZ D, addr, #0..31
  • jmg Posts: 15,148
    edited 2014-03-10 13:02
    MJB wrote: »
    I think we don't have any
    JBITZ D, addr, #0..31
    JBITNZ D, addr, #0..31

    :) Those would be nice.

    In #6159 Chip said there were sign-bit-testing jump opcodes, but they were removed to make room for others, and there are a couple of USB opcodes still to fit in somewhere...
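
For reference, the branch conditions MJB's proposed JBITZ/JBITNZ would evaluate can be written out as below. These mnemonics are wishes from the thread, not existing opcodes, and the Python function names are made up for this sketch. The sign-bit test jmg wanted is just the bit-31 case, which is what lets one register carry both a flag and a parameter.

```python
BIT31 = 1 << 31


def jbitz_taken(d, bit):
    """Condition for the proposed JBITZ: branch if the selected bit of D is 0."""
    return ((d >> bit) & 1) == 0


def jbitnz_taken(d, bit):
    """Condition for the proposed JBITNZ: branch if the selected bit of D is 1."""
    return ((d >> bit) & 1) == 1


def split_flag_and_param(d):
    """Separate a bit-31 flag from the 31-bit payload sharing the same register."""
    return bool(d & BIT31), d & (BIT31 - 1)


if __name__ == "__main__":
    request = BIT31 | 0x123
    print(jbitnz_taken(request, 31), split_flag_and_param(request))
```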
  • cgracey Posts: 14,133
    edited 2014-03-10 19:48
    Today I got the preemptive tasking proven and added a few new instructions:

    TPOP D,S/# 'pop task S/#'s LIFO stack into D
    TPUSH D/#,S/# 'push D/# into task S/#'s LIFO stack
    TJMP D/#,S/# 'set task S/#'s PC to D/# and reset all task-related states for a clean task restart

    TPOP and TPUSH were needed to facilitate storing task 3's LIFO stack, in conjunction with T3SAVE/T3LOAD.

    TJMP replaces the old JMPTASK instruction, but with reversed operands, so that it agrees with TPOP/TPUSH.

    Right now, I'm rearranging the bit order of data that get stored into the WIDEs for T3SAVE/T3LOAD, so that it will be very simple to start up a thread by pre-setting the WIDE values: PC goes into the lower word of WIDE[0] and all other bits, through WIDE[7], can be cleared to 0's to give a sensible startup configuration for a thread.
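
A sketch of what pre-setting the WIDE values for a fresh thread might look like under the layout described above: PC in the lower word of WIDE[0], everything else zero. It assumes each WIDE entry is a 32-bit long and that "word" means 16 bits; the function name is hypothetical and the final bit layout was still being rearranged at the time of the post.

```python
WIDE_LONGS = 8  # WIDE[0] through WIDE[7], as named in the post


def make_thread_image(pc):
    """Build an 8-long startup image: PC in the low word of WIDE[0], all other bits 0."""
    if not 0 <= pc <= 0xFFFF:
        raise ValueError("PC must fit in the lower 16-bit word of WIDE[0]")
    wides = [0] * WIDE_LONGS
    wides[0] = pc          # upper word of WIDE[0] stays clear
    return wides


if __name__ == "__main__":
    print([hex(x) for x in make_thread_image(0x0123)])
```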
  • rogloh Posts: 5,171
    edited 2014-03-10 20:03
    Sounds good. I guess for TJMP you didn't want a new task to start up thinking it was in the middle of a REP loop or something crazy like that, so you had to clear things.

    What are the known defaults at startup as far as task specific registers? For example are the PTRA/B/X/Y going to be cleared here as well?
  • cgracey Posts: 14,133
    edited 2014-03-10 20:13
    rogloh wrote: »
    Sounds good. I guess for TJMP you didn't want a new task to start up thinking it was in the middle of a REP loop or something crazy like that, so you had to clear things.

    What are the known defaults at startup as far as task specific registers? For example are the PTRA/B/X/Y going to be cleared here as well?


    When you do a TJMP, none of the pointers or flags are affected, though everything indicating that some special state was in progress is reset, like REPS/REPD, delayed jump pending, TLOCK pending, THALT post first iteration, and a few other things I can't remember at the moment. It just makes sure, like you said, that the task starts off cleanly, without its shoe laces tied together or some hitch-up in its get-along.
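
The split described above (pointers and flags kept, in-progress states reset) can be modeled roughly as follows. All field names are invented for illustration, and the list of transient states is incomplete, since the post itself mentions "a few other things".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TaskState:
    # Preserved across TJMP according to the post: pointers and flags.
    pc: int = 0
    ptra: int = 0
    ptrb: int = 0
    c: bool = False
    z: bool = False
    # Hypothetical names for the "special state in progress" bits that TJMP resets.
    rep_active: bool = False
    delayed_jump_pending: bool = False
    tlock_pending: bool = False
    thalt_first_pass_done: bool = False


def tjmp(state, target):
    """Set the PC and clear every transient state; leave pointers and flags alone."""
    return replace(
        state,
        pc=target,
        rep_active=False,
        delayed_jump_pending=False,
        tlock_pending=False,
        thalt_first_pass_done=False,
    )


if __name__ == "__main__":
    before = TaskState(pc=0x10, ptra=0x200, c=True, rep_active=True, tlock_pending=True)
    print(tjmp(before, 0x40))
```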
  • Bill Henning Posts: 6,445
    edited 2014-03-10 20:23
    Sounds good!
    cgracey wrote: »
    Today I got the preemptive tasking proven and added a few new instructions:

    TPOP D,S/# 'pop task S/#'s LIFO stack into D
    TPUSH D/#,S/# 'push D/# into task S/#'s LIFO stack
    TJMP D/#,S/# 'set task S/#'s PC to D/# and reset all task-related states for a clean task restart

    TPOP and TPUSH were needed to facilitate storing task 3's LIFO stack, in conjunction with T3SAVE/T3LOAD.

    TJMP replaces the old JMPTASK instruction, but with reversed operands, so that it agrees with TPOP/TPUSH.

    Right now, I'm rearranging the bit order of data that get stored into the WIDEs for T3SAVE/T3LOAD, so that it will be very simple to start up a thread by pre-setting the WIDE values: PC goes into the lower word of WIDE[0] and all other bits, through WIDE[7], can be cleared to 0's to give a sensible startup configuration for a thread.
  • Cluso99 Posts: 18,069
    edited 2014-03-11 01:38
    Chip,
    Would you mind posting the Verilog for one of the simple P2 instruction sections? Perhaps an ADD would be nice.

    That way I can try and keep the USB instruction Verilog using some of the same variable names, and also perhaps understand Verilog a little more.
  • cgracey Posts: 14,133
    edited 2014-03-11 02:06
    Cluso99 wrote: »
    Chip,
    Would you mind posting the Verilog for one of the simple P2 instruction sections? Perhaps an ADD would be nice.

    That way I can try and keep the USB instruction Verilog using some of the same variable names, and also perhaps understand Verilog a little more.


    All those sections are in the context of the whole design, with special flop declarations to expedite decoding, and output aimed at the result mux, so I think it might only turn you off. Whatever you come up with I'll have to implement, anyway, to make it fit into the overall design.

    For what it's worth, I'll post a submodule that stands alone, without any greater context, so that you can see the whole picture of a functional section. The main cog Verilog code makes all kinds of global references everywhere, so any piece of it won't make much sense. Here is the 32x32 multiplier, though. This can be compiled all by itself:
    // MLT
    
    module		mlt
    (
    input		clk,
    input		ena,
    input		set,
    input		sign,
    input	[31:0]	d,
    input	[31:0]	s,
    
    output		done,
    output	[63:0]	p
    );
    
    
    reg  [5:0]	n;
    reg [33:0]	t;
    reg [33:0]	m;
    reg [69:0]	a;
    
    
    // multiply
    
    wire busy	= n[5];
    
    always @(posedge clk or negedge ena)
    if (!ena)
    	n <= 6'b0;
    else if (set || busy)
    	n <= set ? 6'b101111
    		 : n + 1'b1;
    
    always @(posedge clk)
    if (set || busy)
    	t <= set ? {sign && d[31], d, 1'b0}
    		 : {{2{t[33]}}, t[33:2]};
    
    always @(posedge clk)
    if (set)
    	m <= {{2{sign && s[31]}}, s};
    
    
    // booth functions
    
    wire onex	= t[1] ^ t[0];
    wire twox	= t[2] ^ t[1] && !onex;
    wire notx	= t[2];
    
    wire [35:0] b	= {36{notx}} ^ {{2{(twox || onex) && m[33]}} ^ 2'b10, {34{twox}} & {m[32:0], 1'b0} | {34{onex}} & m[33:0]};
    
    wire [37:0] sum	= {2'b00, a[69:34]} + {2'b01, b};
    
    always @(posedge clk)
    if (set || busy)
    	a <= set ? {3'b100, {{2{sign && d[31]}}, d} & 34'h2AAAAAAAA, 33'b0}
    		 : {sum[37:0], a[33:2]};
    
    
    // result
    
    assign done	= !busy;
    
    assign p	= a[63:0];
    
    endmodule
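
For anyone trying to follow the "booth functions" section of the module above, here is a generic textbook radix-4 Booth multiply in Python. It is not a cycle-accurate model of Chip's Verilog (the accumulator packing and the constant terms in `a` and `sum` are not reproduced). It only shows the recoding idea: each step looks at three overlapping multiplier bits and adds 0, ±1 or ±2 times the multiplicand, which appears to be what `onex`, `twox` and `notx` select. All function names are made up for this sketch.

```python
def booth_digits(multiplier, bits=32):
    """Recode an integer into radix-4 Booth digits, each in {-2, -1, 0, 1, 2}."""
    digits = []
    prev = 0                                   # implicit bit to the right of bit 0
    for i in range(bits // 2 + 1):             # one extra digit so unsigned values recode correctly
        b0 = (multiplier >> (2 * i)) & 1
        b1 = (multiplier >> (2 * i + 1)) & 1   # Python sign-extends negative ints here
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(d, s, signed=False, bits=32):
    """Multiply two bits-wide operands and return the 2*bits-wide product."""
    mask = (1 << bits) - 1
    d &= mask
    s &= mask
    if signed:                                 # reinterpret as two's complement
        d = d - (1 << bits) if d >> (bits - 1) else d
        s = s - (1 << bits) if s >> (bits - 1) else s
    product = 0
    for i, digit in enumerate(booth_digits(d, bits)):
        product += (digit * s) << (2 * i)      # digit selects 0, +/-1x or +/-2x the multiplicand
    return product & ((1 << (2 * bits)) - 1)


if __name__ == "__main__":
    print(hex(booth_multiply(0xFFFFFFFF, 0xFFFFFFFF)))
    print(hex(booth_multiply(0xFFFFFFFF, 0xFFFFFFFF, signed=True)))
```

With 32-bit operands the recoding produces 17 digits, so two multiplier bits are consumed per step instead of one.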
    
  • Cluso99 Posts: 18,069
    edited 2014-03-11 03:03
    Thanks Chip.

    I was trying to keep the naming common, and perhaps some other things. Of course you will need to verify what I write anyway as I am such a novice here.
    Do you just call the Z & C flags z and c?

    I have posted a possible USB instruction over on the USB thread. It still has some bugs and I need to add a bit counter to count up an incoming byte.
  • cgracey Posts: 14,133
    edited 2014-03-11 03:26
    Cluso99 wrote: »
    Thanks Chip.

    I was trying to keep the naming common, and perhaps some other things. Of course you will need to verify what I write anyway as I am such a novice here.
    Do you just call the Z & C flags z and c?

    I have posted a possible USB instruction over on the USB thread. It still has some bugs and I need to add a bit counter to count up an incoming byte.


    Yep, they're just called 'z' and 'c'.

    That's great that you are coming up with an instruction!!! I've glanced through that thread and I can see that you guys are working out the parameters of what must be done. I think that is the hard part of anything - to qualify it with its parameters. The coding is always the easy part. Once you know WHAT to do, you're almost done.
  • Cluso99 Posts: 18,069
    edited 2014-03-11 03:36
    cgracey wrote: »
    Yep, they're just called 'z' and 'c'.

    That's great that you are coming up with an instruction!!! I've glanced through that thread and I can see that you guys are working out the parameters of what must be done. I think that is the hard part of anything - to qualify it with its parameters. The coding is always the easy part. Once you know WHAT to do, you're almost done.
    Thanks Chip.

    I am pretty sure about the CRC now. Looks like I can get the whole bit receive including crc and unstuffing to work in 1 instruction. Just need to verify the unstuffing and then finish the byte counter. That way, after calling the instruction 8+ times I will have a byte assembled, together with the crc accumulation. I am also using the z and c flags to return a completed byte flag, and the SE0/SE1 condition. This should really help shorten the USB receive routine.
    Hopefully the instruction can be used for transmit as well (after outputting the pins).
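
As a software reference for what a per-bit receive step has to do, here is a sketch of standard USB bit unstuffing and a bit-serial CRC16. It is not Cluso99's instruction (which is not shown in this thread page). It just applies the USB rules: a 0 is inserted after six consecutive 1s, and CRC16 uses the polynomial x^16 + x^15 + x^2 + 1 with an all-ones seed, bits processed LSB first, and the result inverted. Function names are made up, and NRZI decoding and SE0/SE1 detection are left out.

```python
def stuff(bits):
    """Transmit side: insert a 0 after every run of six consecutive 1s."""
    out, ones = [], 0
    for bit in bits:
        out.append(bit)
        ones = ones + 1 if bit else 0
        if ones == 6:
            out.append(0)
            ones = 0
    return out


def unstuff(bits):
    """Receive side: drop the 0 that follows six consecutive 1s."""
    out, ones = [], 0
    for bit in bits:
        if ones == 6:
            if bit:
                raise ValueError("bit-stuff error: seventh consecutive 1")
            ones = 0                   # stuffed bit is discarded, not data
            continue
        out.append(bit)
        ones = ones + 1 if bit else 0
    return out


def crc16_step(crc, bit):
    """Advance the CRC16 by one data bit (reflected form of polynomial 0x8005)."""
    if (crc ^ bit) & 1:
        return (crc >> 1) ^ 0xA001
    return crc >> 1


def crc16_usb(data):
    """CRC16 over a byte string: seed 0xFFFF, LSB-first bits, final inversion."""
    crc = 0xFFFF
    for byte in data:
        for i in range(8):
            crc = crc16_step(crc, (byte >> i) & 1)
    return crc ^ 0xFFFF


if __name__ == "__main__":
    print(stuff([1] * 8))
    print(hex(crc16_usb(b"123456789")))
```

Calling `crc16_step` once per unstuffed bit mirrors the "one call per bit, byte ready after 8+ calls" structure described above, where the "+" covers the discarded stuffed bits.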
  • cgracey Posts: 14,133
    edited 2014-03-11 05:07
    Cluso99 wrote: »
    Thanks Chip.

    I am pretty sure about the CRC now. Looks like I can get the whole bit receive including crc and unstuffing to work in 1 instruction. Just need to verify the unstuffing and then finish the byte counter. That way, after calling the instruction 8+ times I will have a byte assembled, together with the crc accumulation. I am also using the z and c flags to return a completed byte flag, and the SE0/SE1 condition. This should really help shorten the USB receive routine.
    Hopefully the instruction can be used for transmit as well (after outputting the pins).


    Wow! It sounds like you've really nailed what needs to be done. Good job! This is going to be something really valuable.