But I already have a suspiction. I think the drive falls back from fully operational state to the "safe-op" state for some reason (watchdog timeout or some similar reason). ... In safe-op state the outputs and also motion are disabled...
I have to monitor state-change events and all of the status and error registers to catch such unexpected state changes.
Unfortunatelly, my assumption was false. I verified that the drive is in fully operational state and stays there as long as the power stage is enabled. So it must be something else.
I'll check the tuning parameters. Maybe there is a register that defaults to zero unless it is explicitely set.
It's been almost a month and I'm still struggeling. I have bought a brand new servo amplifier from Sanyo Denki in the hope that their implementation is better than the Chinese. Well, the good news is that, at least, it returns an error code where the JMC servo just doesn't work without error message. The bad news, however, is that it fails even earlier. It refuses to switch state from pre-op to safe-op and complains about "no valid outputs". The Beckhoff docs say that this error should occur only in the "O" state (operational) or at the transition from "S" to "O" (safe-op -> op). I have no idea why it occurs at the transition from "P" to "S" (pre-op -> safe-op). It is not possible to transmit output states (RxPDOs) before the state machine is in the safe-op state, so this makes no sense to me.
At the moment I feel totally lost. I don't want to give up as I have invested so much work. I feel that I'm only inches away from the ultimate solution but there is always one more bug...
At the moment I feel totally lost. I don't want to give up as I have invested so much work. I feel that I'm only inches away from the ultimate solution but there is always one more bug...
Heck, don't even think about giving up. You know how it goes, you'll be somewhere having a beer with friends and it suddenly hits you. Been happening to me since forever.
Today I had a video chat with the RTA support team (a dealer selling SanyoDenki servo drives). We didn't find a real solution, unfortunatelly. But we got one more clue that there seems to be a fundamental problem with my communication. Somehow the drive is responding to my commands and acknowledges it. I can even write to registers and read the correct values back. But those values seem to come from some sort of shadow memory or cache. It looks like they are written but they have no effect at all to the rest of the behaviour of the drive. It's like there is some sort of firewall and everything I do stays inside the EtherCat "sandbox".
For example I set PDO mappings. I don't have a real debugger to see what's going on inside the drive but at least the SANMOTION motor setup software displays the number of PDO mapped in its monitor display window. This number stays at zero the whole time while my software is running and communicates with the drive. The displayed number updates correctly if I use EC-Engineer (Ethercat diagnosis software running on the PC). It stays at the setting of EC-Engineer if I run my P2 software afterwards. There must be some secret switch or authorisation protocol that unlocks access.
I'll follow Craig's advice and have a beer while discussing this with my friend Ingolf, tonight.
To debug my communication I have to compare the Ethernet frames transmitted with WireShark. The problem is that with EtherCAT I can't use standard network switches so connect the PC to the network. EtherCat doesn't like that because it adds delay and can cause collisions. All frames use broadcast mode because there are no real MAC addresses. Adressing is done by the physical order the devices appear in the daisy chain.
To solve this I simply connected another Ethernet accessory board to the P2 KISS board, added a second driver object to the software and simply send all packets to both ports. I think this wouldn't have been possible with any other processor. It's just so cool.
With the Prop1 project I have I decided to move the 485 link to the other side of the chip to make the ground plane a little more chunky. That meant also shuffling a few of the other pins along to make room on that side. No problem, I was doing a new board layout anyway.
I didn't know EtherCAT couldn't use a generic Ethernet switch. It's barely valid to have "Ether" in its name then.
@evanh said:
I didn't know EtherCAT couldn't use a generic Ethernet switch. It's barely valid to have "Ether" in its name then.
I was told by a Beckhoff rep that this was possible (although maybe it doesn't work with every switch), and how he found out the hard way that plugging a fourth port into a corporate network was an excellent way to flood the entire network and make it useless for everyone else...
TherCat is relatively tolerant. The PC still finds all connected devices even if they are connected via switches. But I think some switches are confused by the high broadcast load and throttle the traffic. At least I get errors sometimes when trying to run full speed realtime. And I don't want unnecessary error sources when I debug problems, so I prefer to connect the EtherCAT devices directly. Some EtherCat vendors sell special switches optimized for EtherCat.
By debugging with WireShark I actually found another difference. The mailbox frame sequence counter of EC-Engineer starts at zero while mine has started at 1. I fixed that but it didn't change anything. There are some more minor differences, mainly the order in which actions are performed. But I don't see anything really relevant. I doubt that I can get it to work this way. So I have two options:
give up. No, that's not an option.
Buy the sample master source code from Beckhoff and try to port that to the P2.
Implement the slave device code. If I have a working slave I can look inside and monitor what's going on and why it doesn't behave like expected.
There is a thing called SSCT (slave stack code tool) that can generate C code from an XML file that describes the features of the EtrherCat slave device. I think nearly all EtherCat slave devices are based on code generated by this tool so it is the reason why they all behave (or misbehave) nearly the same way. I hope that if I create a slave device with the SSCT it will show the same problem of either not switching to safe-op state or not correctly updating the PDO table in the same way as the JMC and Sanyo servo drives. So I could be able to find the actual reason.
So this sounds like a valid plan. Unfortunatelly, my time budget is running low. I have to do some payed work, soon. I was trying to implement the master first so I can show the running motors to customers to justify investing more time and money. Now I'm stuck in that trap where something is 80 or 90% finished but will end up in a box in my stock with so many other unfinished projects if no miracle happens....
ManAtWork,
I expect you've at one time or another had confusing behaviour from SETSEn events when attempting to use level sensitive triggering. I've just worked out why sometimes they don't behave nicely, while other times they're fine.
Simply put, the SETSEn instructions fail to correctly clear the prior event trigger. In more detail, an actively triggering condition retriggers after it is cleared but before the new mode and source is switched over to. I believe even edge triggering can be glitched too, but this is only a guess.
I took a brief look at the code generated by the SSCT for an evaluation board with a PIC MCU. As expected the code is VERY complicated and contains a lot of macro definitions and switches to configure it to all kinds of uCs, big endian little endian etc... It will be very hard to port this to the P2 and Flexprop. I fear, many special features of C are not supported, especially pragmas for memory alignment and struct layout.
The good news, although, is that I get some insight to how the slaves handle the state transitions and what conditions are checked at which stages. This is nowhere to be found in the original protocol specification (or too cryptical to be human readable). So once again, the "quasi standard" is defined by the implementation rather than a real specification.
\brief This function is called in case of the state transition from PREOP to SAFEOP.
\brief the areas of the Sync Managers will be checked for overlapping,
\brief the synchronization mode (Free Run, Synchron, Distributed Clocks) is selected,
\brief the requested cycle time will be checked, the watchdog is started
\brief and the AL Event Mask register will be set
...
UINT16 StartInputHandler(void)
\brief This function is called in case of the state transition from SAFEOP to OP.
\brief It will be checked if outputs had to be received before switching to OP
\brief and the state transition would be refused if outputs are missing
...
UINT16 StartOutputHandler(void)
Unfortunatelly, no big miracle, so far. But at least a small miracle happened: The support of JMC has responded, eventually. They have sent me a WireShark recording of a working sample configuration. So I have some more homework to do, comparing the network communication against my own, figuring out their purpose and hopefully find the difference.
Comments
Oh, wow, so the CIA402 protocol is some sort of memory map for motion control that is common across all comms protocols. That's intriguing.
Unfortunatelly, my assumption was false. I verified that the drive is in fully operational state and stays there as long as the power stage is enabled. So it must be something else.
I'll check the tuning parameters. Maybe there is a register that defaults to zero unless it is explicitely set.
It's been almost a month and I'm still struggeling. I have bought a brand new servo amplifier from Sanyo Denki in the hope that their implementation is better than the Chinese. Well, the good news is that, at least, it returns an error code where the JMC servo just doesn't work without error message. The bad news, however, is that it fails even earlier. It refuses to switch state from pre-op to safe-op and complains about "no valid outputs". The Beckhoff docs say that this error should occur only in the "O" state (operational) or at the transition from "S" to "O" (safe-op -> op). I have no idea why it occurs at the transition from "P" to "S" (pre-op -> safe-op). It is not possible to transmit output states (RxPDOs) before the state machine is in the safe-op state, so this makes no sense to me.
At the moment I feel totally lost. I don't want to give up as I have invested so much work. I feel that I'm only inches away from the ultimate solution but there is always one more bug...
Heck, don't even think about giving up. You know how it goes, you'll be somewhere having a beer with friends and it suddenly hits you. Been happening to me since forever.
Craig
Today I had a video chat with the RTA support team (a dealer selling SanyoDenki servo drives). We didn't find a real solution, unfortunatelly. But we got one more clue that there seems to be a fundamental problem with my communication. Somehow the drive is responding to my commands and acknowledges it. I can even write to registers and read the correct values back. But those values seem to come from some sort of shadow memory or cache. It looks like they are written but they have no effect at all to the rest of the behaviour of the drive. It's like there is some sort of firewall and everything I do stays inside the EtherCat "sandbox".
For example I set PDO mappings. I don't have a real debugger to see what's going on inside the drive but at least the SANMOTION motor setup software displays the number of PDO mapped in its monitor display window. This number stays at zero the whole time while my software is running and communicates with the drive. The displayed number updates correctly if I use EC-Engineer (Ethercat diagnosis software running on the PC). It stays at the setting of EC-Engineer if I run my P2 software afterwards. There must be some secret switch or authorisation protocol that unlocks access.
I'll follow Craig's advice and have a beer while discussing this with my friend Ingolf, tonight.
To debug my communication I have to compare the Ethernet frames transmitted with WireShark. The problem is that with EtherCAT I can't use standard network switches so connect the PC to the network. EtherCat doesn't like that because it adds delay and can cause collisions. All frames use broadcast mode because there are no real MAC addresses. Adressing is done by the physical order the devices appear in the daisy chain.
To solve this I simply connected another Ethernet accessory board to the P2 KISS board, added a second driver object to the software and simply send all packets to both ports. I think this wouldn't have been possible with any other processor. It's just so cool.
With the Prop1 project I have I decided to move the 485 link to the other side of the chip to make the ground plane a little more chunky. That meant also shuffling a few of the other pins along to make room on that side. No problem, I was doing a new board layout anyway.
I didn't know EtherCAT couldn't use a generic Ethernet switch. It's barely valid to have "Ether" in its name then.
I was told by a Beckhoff rep that this was possible (although maybe it doesn't work with every switch), and how he found out the hard way that plugging a fourth port into a corporate network was an excellent way to flood the entire network and make it useless for everyone else...
TherCat is relatively tolerant. The PC still finds all connected devices even if they are connected via switches. But I think some switches are confused by the high broadcast load and throttle the traffic. At least I get errors sometimes when trying to run full speed realtime. And I don't want unnecessary error sources when I debug problems, so I prefer to connect the EtherCAT devices directly. Some EtherCat vendors sell special switches optimized for EtherCat.
By debugging with WireShark I actually found another difference. The mailbox frame sequence counter of EC-Engineer starts at zero while mine has started at 1. I fixed that but it didn't change anything. There are some more minor differences, mainly the order in which actions are performed. But I don't see anything really relevant. I doubt that I can get it to work this way. So I have two options:
There is a thing called SSCT (slave stack code tool) that can generate C code from an XML file that describes the features of the EtrherCat slave device. I think nearly all EtherCat slave devices are based on code generated by this tool so it is the reason why they all behave (or misbehave) nearly the same way. I hope that if I create a slave device with the SSCT it will show the same problem of either not switching to safe-op state or not correctly updating the PDO table in the same way as the JMC and Sanyo servo drives. So I could be able to find the actual reason.
So this sounds like a valid plan. Unfortunatelly, my time budget is running low. I have to do some payed work, soon. I was trying to implement the master first so I can show the running motors to customers to justify investing more time and money. Now I'm stuck in that trap where something is 80 or 90% finished but will end up in a box in my stock with so many other unfinished projects if no miracle happens....
ManAtWork,
I expect you've at one time or another had confusing behaviour from SETSEn events when attempting to use level sensitive triggering. I've just worked out why sometimes they don't behave nicely, while other times they're fine.
Simply put, the SETSEn instructions fail to correctly clear the prior event trigger. In more detail, an actively triggering condition retriggers after it is cleared but before the new mode and source is switched over to. I believe even edge triggering can be glitched too, but this is only a guess.
Further reading - https://forums.parallax.com/discussion/comment/1558762/#Comment_1558762
I took a brief look at the code generated by the SSCT for an evaluation board with a PIC MCU. As expected the code is VERY complicated and contains a lot of macro definitions and switches to configure it to all kinds of uCs, big endian little endian etc... It will be very hard to port this to the P2 and Flexprop. I fear, many special features of C are not supported, especially pragmas for memory alignment and struct layout.
The good news, although, is that I get some insight to how the slaves handle the state transitions and what conditions are checked at which stages. This is nowhere to be found in the original protocol specification (or too cryptical to be human readable). So once again, the "quasi standard" is defined by the implementation rather than a real specification.
Unfortunatelly, no big miracle, so far. But at least a small miracle happened: The support of JMC has responded, eventually. They have sent me a WireShark recording of a working sample configuration. So I have some more homework to do, comparing the network communication against my own, figuring out their purpose and hopefully find the difference.