Multiple Cogs accessing the same object
jsaddiction
Posts: 84
OK, so I have done some searching and cannot find an understandable answer to my question.
I am working on an ethernet project using a WIZnet for quickstart board. I am currently working on an object called "ethernet". Ethernet has one reference to the object "W5200" which starts an spi interface to the chip.
In my "Ethernet" object I have three methods that have the ability to access the W5200 object and communicate with the WIZnet chip.
The first one is called get_data and is launched with cognew command and runs in a repeat loop which "mines" data from a database using http POST requests and a PHP script. This should give me quick access time to a global array in which the "mined" data resides.
The second one is called get_Internet_time and runs when a parent object calls it. It returns SNTP time stamp used to sync the RTC.
The third one is called get_Sunset and also runs when a parent object calls it. It makes another http POST request to another PHP script that calculates sunset for a given day.
Here is the problem, if you haven't figured it out already. With the first method running in a new cog and in a repeat loop, I was experiencing data collisions as multiple cogs tried to access the same SPI port at the same time. I tried adding some code to each method which checks the status of a global variable called WIZlocked. The code is a simple repeat while WIZlocked, then set WIZlocked to true, do the task, and set WIZlocked back to false when the task is done.
All of this, to me, should work, but for some reason it doesn't, and it has some weird symptoms. Firstly, all of the methods work independently as long as the cognew isn't issued. During boot I would like to get all of the data so that I have something to start with, so I make calls to both get_internet_time and get_sunset. After a short delay I launch get_data into a new cog. After that point I cannot successfully retrieve any data from the other two methods. My understanding is that, because of the repeat while WIZlocked code, I should be able to call the get_internet_time method and it should wait for WIZlocked to become false and then set WIZlocked to true, causing the get_data method to wait for get_internet_time to finish. Instead, get_internet_time returns some data immediately, but the data is incorrect.
Is there a better way? Should I pass WIZlocked, or the address of WIZlocked, as a parameter for the cognew command? Not sure where to start with this problem. I would just leave the cognew command out, but get_data takes too long to process (250ms).
Thanks in advance for any suggestions!
Comments
One way around these problems is to only have one cog access the ethernet object.
While I haven't experimented with ethernet much, I do use a serial object in most of my programs. I generally have the other cogs set flags that the cog monitoring the serial object checks. The other cogs set these flags when they want to send data or when they are requesting data.
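To make the single-owner idea concrete, here is a rough sketch in desktop Python rather than Spin. It assumes one thread plays the cog that owns the bus and everyone else just posts requests to it. It uses a queue where the post above describes flags, and `fake_spi_transfer`, `bus_owner` and `ask` are made-up names for illustration, not anything from the W5200 driver.

```python
import queue
import threading

def fake_spi_transfer(request):
    """Stand-in for a bus transaction; a real driver would talk to the chip here."""
    return f"reply to {request}"

def bus_owner(requests):
    """The only thread that ever touches the (fake) SPI bus."""
    while True:
        item = requests.get()
        if item is None:              # sentinel: shut down
            break
        request, reply_box = item
        reply_box.put(fake_spi_transfer(request))

def ask(requests, request):
    """Called from any other thread: post a request, block until it is answered."""
    reply_box = queue.Queue(maxsize=1)
    requests.put((request, reply_box))
    return reply_box.get()

requests = queue.Queue()
owner = threading.Thread(target=bus_owner, args=(requests,))
owner.start()

time_reply = ask(requests, "get_internet_time")
sunset_reply = ask(requests, "get_sunset")

requests.put(None)                    # tell the owner to stop
owner.join()
print(time_reply)
print(sunset_reply)
```

Because only `bus_owner` ever calls the transfer function, there is nothing to collide with and no lock is needed around the bus itself.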
The alternative is to use locks. Kye uses locks in several of his demo programs; I followed his example when using them myself.
You have a flag, WIZlocked, so you look at it, see it's not locked, and blithely go ahead, set the flag and use the object. Bzzzzt, FAIL!
Every cog is running in parallel and all looking at the flag - they all see its free, they then all set it, and all enter the critical
sections together - you haven't protected against simultaneous access at all, and you'll never achieve this with a flag.
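The failure described above can be replayed step by step. This is only a model in desktop Python, not Spin: each "cog" is a generator, and the interleaving is forced by hand so the outcome is repeatable. The names `claim_with_flag` and `shared` are invented for the illustration.

```python
def claim_with_flag(shared, name):
    """Check-then-set in two separate steps, as with `repeat while WIZlocked`."""
    while shared["locked"]:          # step 1: read the flag
        yield "waiting"
    yield "saw it free"              # another cog can run right here
    shared["locked"] = True          # step 2: write the flag
    shared["inside"].append(name)
    yield "entered"

shared = {"locked": False, "inside": []}
cog_a = claim_with_flag(shared, "A")
cog_b = claim_with_flag(shared, "B")

# Interleave the two cogs one step at a time: both read before either writes.
trace = [next(cog_a), next(cog_b), next(cog_a), next(cog_b)]
print(trace)             # ['saw it free', 'saw it free', 'entered', 'entered']
print(shared["inside"])  # ['A', 'B'] : both are in the critical section
```

The gap between reading the flag and writing it is the whole problem; no amount of extra checking on a plain variable closes it.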
You need either an atomic read-modify-write primitive such as a lock or semaphore,
or a synchronization primitive that works without atomic read-modify-write such
as an event-counter. Here the former is appropriate - when you initialize the object
use locknew to get a lock, make all the critical sections use this to prevent multiple
access using lockset and lockclr - I think this is explained in the Prop manual.
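For comparison, here is a rough desktop-Python model of the lock approach, assuming `threading.Lock` as a stand-in for a Propeller hardware lock. The non-blocking `acquire` in a busy-wait loop plays the role of spinning on lockset, and `release` plays the role of lockclr. The counters and the `use_bus` name are invented just to check that no two threads are ever inside at once.

```python
import threading

bus_lock = threading.Lock()      # stands in for the lock ID you would get from locknew
inside = 0                       # how many threads are in the critical section right now
max_inside = 0                   # the most that were ever in there together
transfers = 0

def use_bus():
    global inside, max_inside, transfers
    # Busy-wait until the atomic claim succeeds (the lockset step).
    while not bus_lock.acquire(blocking=False):
        pass
    try:
        inside += 1
        max_inside = max(max_inside, inside)
        transfers += 1           # pretend this is the SPI transaction
        inside -= 1
    finally:
        bus_lock.release()       # the lockclr step

def worker(count):
    for _ in range(count):
        use_bus()

threads = [threading.Thread(target=worker, args=(200,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(max_inside, transfers)
```

The difference from the flag version is that testing and claiming happen as one indivisible operation, so only one caller can ever see the lock as free.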
I generally run a main loop that checks the status of scheduled tasks.
Dealing with 3 top level locks is complicated and difficult to troubleshoot. Implement locks in the driver, SpiCouterPasm.spin, if you can't live without locks.
(probably just the SPI method), and at the start of the critical section you busy-wait to claim the lock. Once you are through
the critical section you free the lock for the next cog. Have you read the Prop manual documentation? It explains it with
Spin examples.
This method:
Doesn't appear to make sure you've got at least 56 bytes in the time server response. It just waits for at least 1 byte and then uses whatever it has to get the transmit time stamp. That would cause some problems with incorrect data for your RTC if the number of bytes waiting to be received were less than a full response from the server (56 bytes at least.) I think.
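The kind of check being suggested might look something like this. It is a desktop-Python sketch, not the Spin method in question: `bytes_available` is a hypothetical callable standing in for however the driver reports the received byte count, and the timeout values are arbitrary.

```python
import time

MIN_RESPONSE_BYTES = 56   # figure quoted in the post above

def wait_for_response(bytes_available, timeout_s=1.0, poll_s=0.01):
    """Return True once at least MIN_RESPONSE_BYTES are waiting, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if bytes_available() >= MIN_RESPONSE_BYTES:
            return True
        time.sleep(poll_s)
    return False

# Simulated socket that fills up gradually: 1, then 20, then 56 bytes waiting.
sizes = iter([1, 20, 56])
got_full = wait_for_response(lambda: next(sizes))

# Simulated socket that never gets past 1 byte: the wait gives up.
got_partial = wait_for_response(lambda: 1, timeout_s=0.05)

print(got_full, got_partial)
```

Waiting for "at least 1 byte" would have returned on the very first poll in the first case, with 55 bytes of the reply still missing.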
If I were you, I'd forget about locks and use a main loop to control buffer and SPI bus access. That's what I do and it works great.
This is a way more stable way of receiving the time stamps. Thanks for pointing that out. One question though: why the 56 bytes? My first server listed in the dat table only returns 56 bytes. Looking at the message formats, it seems to me that 56 bytes puts me 4 bytes into the message digest, just after the Key ID field. I don't really comprehend what the significance is, but it works???
The above is a comment from the RxUDP method.
This means the transmit timestamp will be located at bytes 48 thru 55. Anything after byte 55 is either a key identifier, kiss o' death or something else. Either way, as long as you have 56 bytes you can be pretty sure that you've got the data needed to figure out what time it is.
Without ensuring you've got at least 56 bytes, it means you "could" throw 0's into the humantime method, and that of course would get things all wonky and cattawampus and off-kilter and such in your calculations.
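Spelled out as code, the byte arithmetic looks roughly like this (desktop Python, not Spin). It assumes the receive buffer starts with an 8-byte header that the WIZnet chip puts in front of each UDP payload, which is my reading of why the offset is 48 rather than 40; inside the NTP message itself the transmit timestamp sits at bytes 40 thru 47. `transmit_seconds` is a made-up helper name.

```python
import struct

WIZNET_UDP_HEADER = 8          # assumed: peer IP (4) + port (2) + length (2)
NTP_TRANSMIT_OFFSET = 40       # transmit timestamp offset inside the NTP message
NTP_TO_UNIX = 2_208_988_800    # seconds between 1900-01-01 and 1970-01-01

def transmit_seconds(rx_buffer):
    """Return the transmit timestamp as Unix seconds, or None if the buffer is short."""
    start = WIZNET_UDP_HEADER + NTP_TRANSMIT_OFFSET   # byte 48
    if len(rx_buffer) < start + 8:                    # need bytes 48 thru 55
        return None
    seconds, _fraction = struct.unpack_from(">II", rx_buffer, start)
    return seconds - NTP_TO_UNIX

# A fake 56-byte buffer: 48 bytes of padding, then a timestamp we chose ourselves.
fake_reply = bytes(48) + struct.pack(">II", NTP_TO_UNIX + 1_000_000_000, 0)
full_result = transmit_seconds(fake_reply)
short_result = transmit_seconds(fake_reply[:50])

print(full_result)    # 1000000000
print(short_result)   # None
```

Returning nothing for a short buffer is what keeps the zeros out of the time calculation.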