Multi axis motion controller project

ManAtWork · 2019-12-09 17:12

I'm planning to start a new project which I think could be a real killer application: a complete CNC motion controller with multiple built-in servo axes for brushless motors. I already sell a P1-based motion controller and P1-based single-axis servos. But I don't sell many of the servos because wiring and parameter tuning are very complicated. Most users don't care much about safety and use cheap Chinese servos. I DO sell a lot of motion controllers, but technical support is painful because so many things can go wrong... For this and some other reasons I'd like to integrate everything into a single box.

The P1 was not powerful enough for this and the P2 has been unavailable for a long time. So I started a few experiments with ARM controllers. Those chips are quite powerful, but some details are implemented so badly that they render the rest pretty useless. First, you can't use every pin for any purpose. For most of the pins you have only two choices: a) use it as GPIO or b) assign it to one dedicated peripheral unit. So you run out of pins very quickly although you have 128 IOs or so. And the worst thing is that you have to decide which pin to use for what before doing the PCB layout. Changing a serial protocol from UART to SPI means you have to use different pins. Some peripherals have conflicting pinouts, so although the data sheet advertises 6 UARTs you can only use 3 of them because of pin and DMA channel conflicts, clock speed limitations or other restrictions.

And the timers are so complicated! The registers are only 16 bits wide, so you have to program different clocks and prescalers, because a 16-bit counter rolls over in less than 1 ms at 150 MHz. Why?! A god damn timer in a god damned 32-bit processor has to be 32 bits wide! All the gates saved are wasted again by all the workarounds you need.
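To put numbers on the rollover complaint: a 16-bit counter clocked at 150 MHz wraps every 65536 / 150 MHz, about 437 µs, while a 32-bit one would last about 28.6 s. The usual workaround is to count overflows in software. A minimal Python sketch of that idea follows; `ExtendedTimer` is a hypothetical helper, not any vendor's API, and it assumes the counter is polled at least once per wrap period.

```python
CLOCK_HZ = 150_000_000  # core clock quoted in the post


def rollover_time_s(bits: int, clock_hz: int = CLOCK_HZ) -> float:
    """Seconds until a free-running counter of the given width wraps around."""
    return (1 << bits) / clock_hz


class ExtendedTimer:
    """Extend a wrapping 16-bit hardware count to 32 bits in software.

    update() must be called at least once per wrap period, otherwise a
    rollover is missed and the high half falls behind.
    """

    def __init__(self) -> None:
        self.high = 0      # software-maintained upper 16 bits
        self.last_low = 0  # previous raw 16-bit reading

    def update(self, raw16: int) -> int:
        if raw16 < self.last_low:  # the counter wrapped since the last poll
            self.high = (self.high + 1) & 0xFFFF
        self.last_low = raw16
        return (self.high << 16) | raw16


print(f"16-bit wrap: {rollover_time_s(16) * 1e6:.0f} us")  # about 437 us
print(f"32-bit wrap: {rollover_time_s(32):.1f} s")          # about 28.6 s
```

This is exactly the kind of extra code (and extra interrupt or polling load) that a native 32-bit timer would make unnecessary.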

You can't even use an input pin for two different purposes at the same time. If you want to count pulses from an encoder and additionally trigger an interrupt for each pulse, you have to use two pins. In theory, the PIO controller could assign an interrupt to any pin, but not if that pin is assigned to a peripheral function (timer/counter). How brain damaged is this?! It actually takes more resources to limit the use of signals in this way than to route a signal source to all of the possible sinks. I mean, an output has to be multiplexed if there is more than one possible source. But an input can be connected to multiple loads that use that signal without extra cost. Those unnecessary and stupid limitations have caused me a lot of grey hairs.

To sum it up, I WANT THE PROPELLER BACK!

Now I have to find out if the P2 fits my needs. Unfortunately, I haven't followed all the discussions and I haven't found any good documentation (I have to admit I haven't searched very hard recently). Is there a data sheet that already covers smart pins? I hope some of you guys can help me find out if the P2 is the right choice for this project.

ManAtWork · 2019-12-09 17:32

Well, before anyone can answer this, I have to explain what I really need:

1. Ethernet interface. I used the ENC28J60 for this in the past. Although I don't really need the data rate, I'd prefer a 100 Mbit MAC these days. Current switches still support 10 Mbit, but possibly not for much longer.

2. five UARTs to talk to the encoders of the servo motors. I plan to use absolute encoders with a serial data interface instead of incremental/quadrature encoders.

3. one quadrature counter for speed measurement and indexing of the main spindle

4. either another 6 UARTs to talk to the power stages or one single really fast SPI interface (daisy chain them all at 10+MBd)

5. around 20 to 30 general purpose IOs. If the CPU doesn't have enough pins they could be connected through shift registers (as I did often with the P1).

6. two ADC inputs and two DAC outputs; these could be done with PWM, which is better for isolation anyway

7. an FPU or at least some fast software emulation. I don't want to code the main motion control algorithms and PID loops in assembler using fixed point math. It's so time consuming avoiding overflows and quantisation problems.

8. a good and convenient debugger. Breakpoints, single stepping and variable watches are mandatory.

9. enough memory and computing power to handle a 6-axis trajectory planner and PID control for position and velocity.
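To make points 7 and 9 more concrete, here is a rough Python sketch of a cascaded structure: a position loop producing a velocity command, feeding a velocity PI loop whose output is a current command for the power stage. It is written in floating point, which is exactly the convenience item 7 asks for. The gains, limits, the 4 kHz rate and the toy plant (ideal current loop, unit torque constant and inertia) are made-up assumptions for illustration, not values from this project.

```python
from dataclasses import dataclass


@dataclass
class PI:
    kp: float
    ki: float
    dt: float
    limit: float            # symmetric clamp for output and integrator
    integral: float = 0.0

    def step(self, error: float) -> float:
        self.integral += self.ki * error * self.dt
        # clamp the integrator as a simple anti-windup measure
        self.integral = max(-self.limit, min(self.limit, self.integral))
        out = self.kp * error + self.integral
        return max(-self.limit, min(self.limit, out))


def simulate(target: float, ticks: int = 8000) -> float:
    """Position loop (P only) feeding a velocity loop (PI) on a toy plant."""
    dt = 1 / 4000  # assumed 4 kHz loop rate
    pos_loop = PI(kp=10.0, ki=0.0, dt=dt, limit=100.0)       # -> velocity command
    vel_loop = PI(kp=200.0, ki=2000.0, dt=dt, limit=1000.0)  # -> current command
    pos = vel = 0.0
    for _ in range(ticks):
        vel_cmd = pos_loop.step(target - pos)
        cur_cmd = vel_loop.step(vel_cmd - vel)
        vel += cur_cmd * dt  # toy plant: current acts directly as acceleration
        pos += vel * dt
    return pos


print(round(simulate(1.0), 4))  # should settle very close to the 1.0 target
```

The inner velocity loop is tuned several times faster than the outer position loop, which is the usual rule for cascaded loops; real gains would of course come from the actual motor and load.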

Mickster · 2019-12-09 17:33

You're just the guy to bring BiSS to the P2 :cool:

ManAtWork · 2019-12-09 17:41

points 4 and 9 probably need some more detailed explanation...

Handling 6 power stages for brushless motors would require far too much IO capability for a single CPU. The power stages need to be galvanically isolated from the main CPU, so this would require a lot of expensive opto-isolators and some sophisticated and even more expensive isolated ADCs. Crosstalk and EMI would be a big issue and PCB routing would be difficult.

For those reasons I thought about putting a small ARM processor into each power stage. A SAMC21 or something similar costs less than $2 and can handle all the analogue stuff and PWM for a single motor. It could be connected to the main CPU over a single isolated RX/TX pair. It would also handle the current PID loop, so the main CPU would only send a nominal current vector command every PWM cycle.

ManAtWork · 2019-12-09 17:46

Mickster wrote: »

You're just the guy to bring BiSS to the P2 :cool:

No, too complicated and needs too many pins. I'd prefer encoders from Tamagawa or SanyoDenki. They communicate asynchronously over a single RS485 pair at 2.5 MBd. This makes it possible to use a single 8-pin connector for both the motor and the encoder. And the protocol is quite easy: the description fits on two pages instead of 200 for BiSS.

ManAtWork · 2019-12-09 18:00

And yes, I have read the thread "What is an ideal application for the P2". It is a great summary of what is possible with the P2. However, I fear my application is not so "ideal". Because of the need to "outsource" the analogue front end to some slave processors I don't really need the great ADC-at-any-pin feature of the P2.

However, 11 UARTs is also an interesting challenge that would be difficult for any processor other than the Propeller. And the Propeller is not limited to standard UART formats (8N1...); it can do any protocol you can imagine. I've used that for SanyoDenki encoders, which use a frame format of 1 start bit, 16 data bits and 1 stop bit.
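A frame of 1 start bit, 16 data bits and 1 stop bit is just an 18-bit UART frame, i.e. 7.2 µs per frame at 2.5 MBd. The Python sketch below shows the bit layout; LSB-first order and the idle-high/start-low convention are assumptions borrowed from standard UARTs, so check the encoder datasheet before relying on them.

```python
START_BIT, STOP_BIT = 0, 1  # UART convention: line idles high, start bit is low
DATA_BITS = 16


def encode_frame(word: int) -> list[int]:
    """Line levels for one frame: start bit, 16 data bits LSB first, stop bit."""
    if not 0 <= word < (1 << DATA_BITS):
        raise ValueError("word must fit in 16 bits")
    data = [(word >> i) & 1 for i in range(DATA_BITS)]
    return [START_BIT] + data + [STOP_BIT]


def decode_frame(bits: list[int]) -> int:
    """Inverse of encode_frame; raises ValueError on a framing error."""
    if len(bits) != DATA_BITS + 2 or bits[0] != START_BIT or bits[-1] != STOP_BIT:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:-1]))


frame = encode_frame(0xA5C3)
print(len(frame), hex(decode_frame(frame)))  # 18 0xa5c3
print(f"{len(frame) / 2.5e6 * 1e6:.1f} us per frame at 2.5 MBd")  # 7.2 us
```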

Mickster · 2019-12-09 18:19

Had a Baumer Profibus absolute encoder failure, recently. 21 days for a replacement. Absolute disaster for the client. Incremental encoder failure...I have them back in business in 24hrs.

The highest spec general purpose motion controller that I am aware of:

22MHz quadrature count
16bit motor command
30usec servo loop (default is 1ms)

$2,000 for 8 axes.

I believe that the P2 can eat this for breakfast

msrobots · 2019-12-10 01:38

I wrote a serial driver with buffers in LUT. It currently does two UARTs per COG, with a 512-byte buffer for each direction of each UART. It is self-contained in PASM and you call it via HUB mailboxes.

It supports an async calling interface and also string/block moves from serial to HUB and vice versa.

Codewise it might be possible to squeeze more UARTs in, but the buffers would get smaller, and since I use interrupts, 4 ports might be a small challenge.

I do not have settings for bit width, but the smart pins do, so it's a minimal change to use 16-bit data instead of 8-bit.

So currently you can have 12 UARTs with 512-byte buffers in each direction using 6 COGs, and still have two COGs left and most of the HUB, since the UARTs need just 2 longs of HUB each.
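For the buffer budget: each P2 cog has 512 longs (2 KB) of LUT RAM, and two UARTs with 512-byte RX and TX buffers need 2 × 2 × 512 = 2048 bytes, i.e. the whole LUT. Buffers like these are typically power-of-two rings, so indices wrap with a mask instead of a compare. The Python sketch below only illustrates that data structure; it is not msrobots' PASM code, and `RingBuffer` is a hypothetical name.

```python
from typing import Optional

LUT_BYTES = 512 * 4          # each P2 cog has 512 longs of LUT RAM
BUFFER_BYTES = 2 * 2 * 512   # 2 UARTs x (RX + TX) x 512 bytes


class RingBuffer:
    """Byte FIFO whose size is a power of two, so indices wrap with a mask."""

    def __init__(self, size: int = 512) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.buf = bytearray(size)
        self.mask = size - 1
        self.head = 0  # free-running write count
        self.tail = 0  # free-running read count

    def __len__(self) -> int:
        return self.head - self.tail

    def put(self, byte: int) -> bool:
        """Store one byte; return False (and drop it) if the buffer is full."""
        if len(self) > self.mask:
            return False
        self.buf[self.head & self.mask] = byte
        self.head += 1
        return True

    def get(self) -> Optional[int]:
        """Return the oldest byte, or None if the buffer is empty."""
        if self.head == self.tail:
            return None
        byte = self.buf[self.tail & self.mask]
        self.tail += 1
        return byte


print(BUFFER_BYTES == LUT_BYTES)  # True: four 512-byte buffers fill the LUT
```

Because head and tail are free-running counts, "full" and "empty" can be told apart without wasting a slot; on a 32-bit machine the same trick works as long as the subtraction wraps modulo 2^32.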

Since I am moving my office I have lots of stuff in boxes and am not sure where the latest version is. It is contained in FullDuplexSerial2.Spin, which provides the FastSpin-compatible routines around the PASM. Search the forum and take the latest you find.

@localroger wrote a driver for the ENC chip, but I think just for UDP, not sure.

FastSpin's C and BASIC support floats, so you might want to write your floating-point math in BASIC or C and use it from Spin. FastSpin compiles to PASM and allows you to mix objects from different languages in one project.

P2 supports debugging in HW; @ozpropdev has written a debugger, but I have not used it yet. I very seldom need debuggers.

Overall I think with ARMs in the power stage the P2 will be perfectly usable for that. There is also a Parallax HyperRAM/Flash P2 module (not sure about the current state of drivers for it) that might help if HUB memory gets too small.

Mike

cgracey · 2019-12-10 02:05

ManAtWork, I think the P2 will be quite adequate for your needs and I'm certain you'll have a lot more fun.

kbash · 2019-12-10 05:30

ManAtWork,

I have a P2 controlling one of my 5-axis (step and direction) machines right now (the code is written for 6).

My P2 code is upgraded from 6 axis (step and direction only) code I wrote for the P1. I've been using it to run several of my machines for years.

I have a simple encoder feedback loop driving a brushed DC motor on the P1. It was good for precision of about +-10 (4K) encoder counts. Good enough for the application so I didn't refine it any further.

I had intended to turn my P2-Eval board into a 6-axis, encoder-feedback DC drive system, but of course the encoder function didn't work in the first chips. I have 4 of the "production" chips now but haven't had time to verify my "Indy" board design and have them made yet.

I plan to couple my "Drive" code running in one cog to a second cog doing an encoder feedback loop with a +-10V drive signal to run up to 6 axes of DC drives (leaving 6 cogs and a lot of memory to do a LOT of other stuff). My step rate on the P2 is around 2 million steps per second for 6-axis linear interpolation moves at 160 MHz; you can nearly double this now. I'm not sure if you would have enough I/O left to do a transistor drive design for 6 AXIS BRUSHLESS, but I agree with Mike that cheap little ARMs in the power stage under the P2's guidance should work great.

ManAtWork · 2019-12-10 08:31

I don't know if it's a good idea to sacrifice 6 cogs for the UARTs alone. Let's estimate the required overall bandwidth. The current control loops run at 24 kHz. Each axis requires 12 bytes of nominal values every 42 µs and returns 12 bytes of status and diagnostic data. 6 axes means 72 bytes in each direction. If we do this either with time multiplexing (one UART, multiple pin pairs) or daisy chaining (SPI with a MOSI -> slave 1 -> slave 2 ... -> MISO loop), we would need something around 20 Mbit/s. If the P2 can handle this, one cog should be enough to handle all axes if it's done synchronously (SPI), maybe two for full duplex if done asynchronously.

Encoder communication could be multiplexed at a much lower data rate. The velocity control loops run at 4 kHz, so one position value per axis every 250 µs is enough. I think this should also be possible with one or two cogs.
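The two estimates above can be checked with a few lines of arithmetic. This Python sketch assumes 8 clocks per byte for SPI and 10 bit times per byte for an 8N1 UART, and, for the encoders, that a single UART is time-multiplexed over all six axes; those framing assumptions are mine, not from the post.

```python
CURRENT_LOOP_HZ = 24_000
BYTES_PER_AXIS = 12   # per direction, per current-loop cycle
AXES = 6

period_us = 1e6 / CURRENT_LOOP_HZ             # ~41.7 us ("42 us" above)
payload_bytes = AXES * BYTES_PER_AXIS         # 72 bytes per direction
spi_mbps = payload_bytes * 8 / period_us      # synchronous: 8 clocks per byte
uart_mbps = payload_bytes * 10 / period_us    # async 8N1: 10 bit times per byte

VELOCITY_LOOP_HZ = 4_000
ENCODER_BAUD = 2.5e6
slot_us = 1e6 / VELOCITY_LOOP_HZ / AXES       # per-axis slot if one UART polls all axes
bits_per_slot = slot_us * ENCODER_BAUD / 1e6

print(f"SPI {spi_mbps:.1f} Mbit/s, UART {uart_mbps:.1f} Mbit/s per direction")
print(f"encoder slot {slot_us:.1f} us = {bits_per_slot:.0f} bit times at 2.5 MBd")
```

That gives about 13.8 Mbit/s for SPI and 17.3 Mbit/s for an 8N1 UART per direction, so "around 20 Mbit/s" is a sensible target once gaps and protocol overhead are added. Each multiplexed encoder slot would be about 41.7 µs, roughly 104 bit times at 2.5 MBd.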

ManAtWork · 2019-12-10 09:01

The next question is, do I really need an FPU? Well not really, but it would make things a lot easier.

The trajectory planner needs to calculate position, velocity and acceleration. Velocity is the derivative of position, and acceleration is the derivative of velocity. So we have to accumulate an acceleration value to get velocity and accumulate velocity to get position. When calculating in small time steps in the sub-millisecond range, this means that the acceleration value must be in the order of one million times smaller than the position values.
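Why that magnitude gap matters for single-precision floats: a float32 has a 24-bit significand, so near 1e6 the spacing between representable values is 0.0625, and any per-step increment smaller than half of that simply disappears when added. The Python sketch below emulates float32 rounding with `struct` to show the effect; the numbers (position 1e6 counts, increment 0.01 counts per step) are illustrative assumptions.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate_f32(start: float, increment: float, steps: int) -> float:
    """Add `increment` to `start` `steps` times, rounding to float32 each time."""
    pos, inc = f32(start), f32(increment)
    for _ in range(steps):
        pos = f32(pos + inc)
    return pos


# 1e6 counts of position plus 1000 increments of 0.01 counts (10 counts total)
print(accumulate_f32(1e6, 0.01, 1000))  # 1000000.0: every increment was lost
print(1e6 + 1000 * 0.01)                # 1000010.0 expected
```

So a single-precision FPU alone does not remove the problem; the accumulators need more bits than the values being added.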

This is no problem if you have a known, simple machine configuration, for example a 3D printer: always nearly the same size of motors, same resolution, same speed range etc. Everything can be done with fixed-point math and some shifting.

Things become complicated when designing a universal motion controller. Plasma and wood cutting machines have resolutions in the 0.1 mm range. A watchmaker's lathe has sub-µm resolution. A pick&place machine or a large router runs >1 m/s, whereas a large Bridgeport mill has top speeds of only 3 m per minute.

Most formulas for motion control are classical Newtonian mechanics. If you have floating-point math, even the most complex algorithms can be coded in three lines. However, if you have to do this with fixed point, you have to be careful to avoid overflows and quantisation artefacts. Things become really tricky, and hidden bugs can show up years after you thought you had nailed everything down, just because a customer replaces a motor with one of slightly higher resolution...

So my wish for an FPU is actually more laziness than real technical requirement. But I think it would save a lot of development and debugging time which can be spent somewhere else to do something creative.

Maybe this could be handled in software. I don't need IEEE compatibility with those odd bit widths and special encoding cases. A simple format with a 32-bit mantissa and a 32-bit exponent should be a lot faster and easier than full-featured IEEE floating point.
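A sketch of what such a format might look like, written in Python for readability: the value is `m * 2**e` with a signed mantissa normalized to 31 magnitude bits plus sign (one 32-bit word) and a separate exponent word. Rounding is plain truncation and there are no NaNs, infinities or exponent range checks; all of that is a simplifying assumption, not a worked-out P2 library.

```python
import math

SoftFloat = tuple[int, int]  # (mantissa, exponent): value = m * 2**e
MANT_BITS = 31  # magnitude bits; the sign takes the 32nd bit of the mantissa word


def normalize(m: int, e: int) -> SoftFloat:
    """Return (m, e) with 2**30 <= |m| < 2**31, truncating low bits toward zero."""
    if m == 0:
        return 0, 0
    sign = -1 if m < 0 else 1
    mag = abs(m)
    shift = mag.bit_length() - MANT_BITS
    mag = mag >> shift if shift > 0 else mag << -shift
    return sign * mag, e + shift


def from_float(x: float) -> SoftFloat:
    frac, exp = math.frexp(x)  # x == frac * 2**exp with 0.5 <= |frac| < 1
    return normalize(int(frac * (1 << MANT_BITS)), exp - MANT_BITS)


def to_float(a: SoftFloat) -> float:
    return math.ldexp(a[0], a[1])


def fmul(a: SoftFloat, b: SoftFloat) -> SoftFloat:
    return normalize(a[0] * b[0], a[1] + b[1])  # 62-bit product, then truncate


def fdiv(a: SoftFloat, b: SoftFloat) -> SoftFloat:
    if b[0] == 0:
        raise ZeroDivisionError("soft-float division by zero")
    sign = -1 if (a[0] < 0) != (b[0] < 0) else 1
    mag = (abs(a[0]) << MANT_BITS) // abs(b[0])
    return normalize(sign * mag, a[1] - b[1] - MANT_BITS)


def fadd(a: SoftFloat, b: SoftFloat) -> SoftFloat:
    if a[0] == 0:
        return b
    if b[0] == 0:
        return a
    if a[1] < b[1]:
        a, b = b, a              # make `a` the operand with the larger exponent
    mb = abs(b[0]) >> (a[1] - b[1])  # align b, truncating toward zero
    mb = -mb if b[0] < 0 else mb
    return normalize(a[0] + mb, a[1])


print(to_float(fmul(from_float(3.0), from_float(5.0))))  # 15.0
```

With 31 mantissa bits, adding 0.01 to 1e6 keeps the increment to within about 0.0005, which is the case a float32 loses entirely.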

ManAtWork · 2019-12-10 10:24

cgracey wrote: »

ManAtWork, I think the P2 will be quite adequate for your needs and I'm certain you'll have a lot more fun.

Yes, that's surely one of the strongest arguments for the P2 and against an ARM solution.

However, everything also depends a little bit on money. The ARM can't do what I want alone, and even if I found a way it could, I don't know if I'd like doing it that way. Too many restrictions and limitations, as I said. But I think one possible competitive solution would be this: ARM + FPGA. The FPGA could handle everything the ARM can't do internally and I'd have the freedom to build timers, registers and COM ports the way I like. So the BOM would be

* ARM Cortex-M7 with FPU and built-in Ethernet, around $7..8
* external PHY + crystal ~$1
* FPGA, RAM based, 6k logic cells, ~$6
(I have to check if the ARM has enough flash memory to boot the FPGA)

That stands against

* P2, price unknown, I heard rumors $12..15?
* external ethernet controller + crystal $2.50
* external flash memory ~$1

Although the price of the P2 solution is a bit higher, I think that doesn't matter much for a controller that costs $200+ to produce and even more for the end user. So in the end it all depends on the quality of the available tools.

cgracey · 2019-12-10 11:13

ManAtWork wrote: »

Although the price of the P2 solution is a bit higher, I think that doesn't matter much for a controller that costs $200+ to produce and even more for the end user. So in the end it all depends on the quality of the available tools.

You can certainly achieve a lot with an FPGA coupled to an ARM, but the P2 would be 100x faster to work with during development, as it would let you try out more ideas more quickly, without losing focus as compilers churned, device programmers were waited on, and you obeyed lots of arbitrary strictures, along the way. With the P2, you'd have a single, much simpler, and more flexible system to work with.

Spin2 is going to turn code changes in under a second and ultimately provide thorough debugging. I think once people get used to that flow, these other solutions will seem quite drudgerous.

It's interesting to me to hear about your floating-point needs. That's the best case I've heard of it being really needed. You'd have to build it in software. Would maybe just using 64-bit fixed-point math solve the problem? Those calculations would be deterministic and fast, anyway.
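For comparison, a 64-bit fixed-point sketch along the lines cgracey suggests, using a Q32.32 layout (32 integer bits, 32 fraction bits; the split is my assumption). The integer part covers about ±2.1e9 counts and the fraction resolves about 2.3e-10 counts, so tiny per-step increments survive. Python integers are unbounded, so `_fit` emulates a 64-bit register by raising on overflow; a real implementation would need a 128-bit intermediate product for the multiply.

```python
FRAC_BITS = 32
ONE = 1 << FRAC_BITS
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1


def _fit(q: int) -> int:
    """Emulate a 64-bit register: refuse results that would overflow."""
    if not INT64_MIN <= q <= INT64_MAX:
        raise OverflowError("result does not fit in 64 bits")
    return q


def to_fix(x: float) -> int:
    return _fit(round(x * ONE))


def from_fix(q: int) -> float:
    return q / ONE


def fix_mul(a: int, b: int) -> int:
    # the full product needs up to 128 bits; drop 32 fractional bits (floor)
    return _fit((a * b) >> FRAC_BITS)


def fix_div(a: int, b: int) -> int:
    # Python's // floors; C's integer / truncates toward zero instead
    return _fit((a << FRAC_BITS) // b)


pos = to_fix(1e6)
step = to_fix(0.01)
for _ in range(1000):
    pos = _fit(pos + step)
print(from_fix(pos))  # within 1e-8 of 1000010
```

The results are deterministic and exact for addition, which is what cgracey points out; the price is the explicit overflow handling and the ugliness ManAtWork mentions in the next post.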

evanh · 2019-12-10 11:57

My own experience with servo loops was using an 80486 with FPU, and I did use the FPU, including inside the ISR. However, I feel the FPU use was more of a convenience than a necessity.

Maybe more importantly, I made sure all positional changes were handled as 32-bit integers. Only the velocities and accelerations were handled as floats. The main reason for this focus on integers for position was that it was a gear-locked profile between multiple motors in a continuously rotating machine that would eventually roll over. This meant that all rounding errors had to be carried into the future. Using integers to track this felt safest to me.
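A minimal Python sketch of that idea: velocities stay floating point, but the position is an integer, and the fraction left over each tick is carried into the next one instead of being rounded away. `integrate_position` is a hypothetical name and this is not evanh's actual code.

```python
import math
from typing import Iterable, Iterator


def integrate_position(velocities: Iterable[float], dt: float) -> Iterator[int]:
    """Turn float velocities into integer positions, carrying the remainder.

    `carry` stays between 0 and 1, so float precision never degrades no
    matter how far the integer position runs.
    """
    pos = 0       # integer position in counts
    carry = 0.0   # fraction of a count not yet committed to pos
    for v in velocities:
        carry += v * dt
        whole = math.floor(carry)
        pos += whole
        carry -= whole
        yield pos


print(list(integrate_position([0.25] * 8, 1.0)))  # [0, 0, 0, 1, 1, 1, 1, 2]
```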

ErNa · 2019-12-10 12:53

There is no need to do floating-point calculations, as in the end it comes down to µm to control a machine. 32 bits address 4*10^9 IIRC. Calculations are mostly adding increments. So you can get velocity by adding acceleration and shifting right by 10 bits, e.g. The same for position from velocity. Trigonometric functions come from the CORDIC.
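One reading of ErNa's scheme in Python: integer accumulators that are never truncated, with a 10-bit right shift applied only when reading velocity and position out. The shift amount comes from the post; the units (acceleration in 2^-20 counts per tick², for example) and the name `run_profile` are my assumptions.

```python
SHIFT = 10  # fractional bits per integration stage, as in ErNa's example


def run_profile(acc: int, ticks: int) -> tuple[int, int]:
    """Integrate a constant acceleration with integer accumulators.

    Units (an assumption): acc in 2**-20 counts/tick^2, vel_acc in
    2**-20 counts/tick, pos_acc in 2**-10 counts. Only read-outs are shifted.
    """
    vel_acc = pos_acc = 0
    vel = pos = 0
    for _ in range(ticks):
        vel_acc += acc
        vel = vel_acc >> SHIFT   # velocity in 2**-10 counts/tick
        pos_acc += vel
        pos = pos_acc >> SHIFT   # position in whole counts
    return pos, vel


print(run_profile(1 << 10, 1024))  # (512, 1024)
print(run_profile(1, 1024))        # (0, 1): tiny accelerations still accumulate
```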

evanh · 2019-12-10 13:08

The I/O and electrical considerations are valid, though. Over-engineering for rugged reliability (using separate controllers for each drive, for example) is worth investing in. But then the price of the key components should not be of significant concern.

Rayman · 2019-12-10 15:25

For ethernet, my plan is to attach Raspberry Pi Zero to P2 and then plug ethernet adapter into USB OTG port.
I think it just magically works at that point...

ManAtWork · 2019-12-10 16:13

cgracey wrote: »

You can certainly achieve a lot with an FPGA coupled to an ARM, but the P2 would be 100x faster to work with during development, as it would let you try out more ideas more quickly, without losing focus as compilers churned, device programmers were waited on, and you obeyed lots of arbitrary strictures, along the way. With the P2, you'd have a single, much simpler, and more flexible system to work with.

Spin2 is going to turn code changes in under a second and ultimately provide thorough debugging. I think once people get used to that flow, these other solutions will seem quite drudgerous.

Thanks, nobody could say it better. We all say "coding for the propeller is fun" but that doesn't convince managers and marketing people who decide what chips their company buys. But your statement could also make it to the title page of some glossy brochure.

...and I've learned two new words, "drudgerous" and "churned" :-)

cgracey wrote: »

It's interesting to me to hear about your floating-point needs. That's the best case I've heard of it being really needed. You'd have to build it in software. Would maybe just using 64-bit fixed-point math solve the problem? Those calculations would be deterministic and fast, anyway.

Yes, also true. I'm pretty sure 64-bit math could do the job, and if not, some fixed or variable shifts should fix it. However, this makes the code look ugly. I also used floating-point math (OBJ "FloatMath") in my P1-based servo controller. It's too slow for the real-time calculations, but it makes calculation of the PID coefficients much easier. There's nothing cooler or more elegant than having all tuning parameters in SI units instead of "motor steps per time slice" or something.

And it would be nice if the compiler could take care of 64 bit or floating point math.

  Kp:= fm.FMul (GetParameter (indexVKP), velLoopGain)
  Kp:= fm.FDiv (fm.FMul (Kp, J), Kt)

is better than assembler but not as elegant and clear as

  Kp:= GetParameter (indexVKP)*velLoopGain*J / Kt

I signed up for the P2 webinar. I hope I'll learn something about the P2 development tools.

ManAtWork · 2019-12-10 16:27

ErNa wrote: »

There is no need to do floating-point calculations, as in the end it comes down to µm to control a machine. 32 bits address 4*10^9 IIRC. Calculations are mostly adding increments. So you can get velocity by adding acceleration and shifting right by 10 bits, e.g. The same for position from velocity. Trigonometric functions come from the CORDIC.

Technically correct. But our math prof often said "a good mathematician has to be lazy", meaning every formula has to be reduced to the minimum possible length. Or "The eye is also calculating", in analogy to the German proverb "Das Auge isst mit", meaning a meal not only has to taste good but also has to look appetizing. Short code is better code.

Yes, the CORDIC unit can also help a lot. Some calculations (acceleration and braking distances) include square roots.
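Those square roots come straight from v² = 2·a·d: the braking distance from speed v is v²/(2a), and the highest speed from which you can still stop within a remaining distance d is √(2·a·d). A small integer-only Python sketch; `math.isqrt` stands in for whatever square-root routine (CORDIC or otherwise) the target would use, and the function names are hypothetical.

```python
import math


def braking_distance(v: int, a: int) -> int:
    """Counts needed to stop from speed v at deceleration a, rounded up.

    Units must match, e.g. v in counts/s and a in counts/s^2 (from v^2 = 2*a*d).
    """
    return -((-v * v) // (2 * a))  # ceiling division using integers only


def max_speed_to_stop(d: int, a: int) -> int:
    """Highest speed from which the axis can still stop within d counts."""
    return math.isqrt(2 * a * d)  # floor, so the result is never too fast


print(braking_distance(1001, 500), max_speed_to_stop(1003, 500))  # 1003 1001
```

One way a planner could use this is to cap the commanded speed every cycle at `max_speed_to_stop(remaining_distance, deceleration)`.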

evanh wrote: »

... I feel the FPU use was more of a convenience than a necessity.

Maybe more importantly, I made sure all positional changes were handled as 32-bit integers.

Yes of course, the final position output has to be integer to avoid summing up rounding errors.

Rayman · 2019-12-10 16:44

If you use FastSpin, you can add C code with floating point to your Spin code...

evanh · 2019-12-10 17:06

ManAtWork wrote: »

Yes of course, the final position output has to be integer to avoid summing up rounding errors.

A significant number of motion profile calculations were position only, and therefore integer only. Velocity would be derived just for the feed-forward component at the PID.

Mickster · 2019-12-10 17:36

This is what I use...

evanh · 2019-12-10 17:45

Mickster,
Good units those. I started on a Smartmove back well before Baldor bought them out. It taught me a lot about servo'ing. Used its cam'ing feature to solve the maths required before moving to a more custom solution a couple years later.

ManAtWork · 2019-12-10 18:08

evanh wrote: »

A significant number of motion profile calculations were position only, and therefore integer only. Velocity would be derived just for the feed-forward component at the PID.

Mickster wrote: »

The remote processors are kinda self interpolating. Let's say I want a vector position of x=2000, y=4000.

My trajectory generator is running at 2KHz. I issue a global command "go"

For each generator update, the X is incremented by 1 count and the Y is incremented by 2 counts. Voila, I have linear interpolation.

This is what most motion controllers do wrong! Bresenham interpolation is good for evenly divisible numbers like 4000/2000 (or for inertia-less things like raster pixels). But problems start when you have arbitrary slopes at any angle. There is always a speed ratio at which one axis becomes so slow that it only moves in discrete intervals of alternating 0 and 1 steps per time slice. If your slope ratio is 1/3 you have to do: no step, no step, one step, repeat... And it's even worse if the slope is 2/5. The pattern now is 01010 01010... The time between pulses of the step signal is no longer constant. This is called jitter. Jitter means your velocity command is discontinuous. Even if your step size is as low as 1 µm you can see that on the surface of the workpiece.

Of course you can let a slow PID loop or lots of inertia average that error out. But this would make it impossible to rapidly follow a complex path.

The motion controller has to calculate position with sub-step resolution to avoid velocity jitter.
Look at this workpiece: http://www.cncecke.de/forum/attachment.php?attachmentid=169032&d=1570311791
This was machined with my controller, not on a 100k+ servo machine but with good old stepper motor drives. And BTW, it was done with a P1 and without an FPU. But most of the code is assembler and is hard to maintain.
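The jitter argument can be reproduced in a few lines: issue a constant slope once as whole counts per time slice and once with 8 fractional bits of sub-step resolution. The Python sketch below uses exact fractions; the 8-bit fraction width and the name `increments` are arbitrary assumptions. With whole counts, the per-slice increments alternate between 0 and 1; with sub-step resolution, they vary by only 1/256 count.

```python
import math
from fractions import Fraction


def increments(rate: Fraction, slices: int, frac_bits: int = 0) -> list[int]:
    """Per-slice position increments for a constant rate (counts per slice).

    Positions are quantized to 2**-frac_bits counts; frac_bits=0 means
    whole counts only. Increments are returned in those quantized units.
    """
    scale = 1 << frac_bits
    return [
        math.floor(rate * (k + 1) * scale) - math.floor(rate * k * scale)
        for k in range(slices)
    ]


slope = Fraction(2, 5)
print(increments(slope, 5))               # [0, 0, 1, 0, 1]: whole counts jitter 0/1
print(increments(slope, 5, frac_bits=8))  # [102, 102, 103, 102, 103] in 1/256 counts
print(increments(Fraction(1, 3), 3))      # [0, 0, 1]: no step, no step, one step
```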

evanh · 2019-12-10 18:38

The encoder resolution is a real limit. Filtering will afford some relief, but it's certainly not a fix-all answer. I got away with doing an 8x box FIR (it had to be lossless) software resolution multiplier on the master reference simply for velocity smoothing purposes. We didn't need higher precision, just the reduced noise of a finer step size.
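An 8-tap box FIR that stays lossless is simply the sum of the last 8 samples without dividing by 8, so the result is the position in units of 1/8 count. A minimal Python sketch; `BoxFilter8` is a hypothetical name, and priming the window with zeros is an assumption about start-up.

```python
from collections import deque


class BoxFilter8:
    """Lossless box FIR: the sum of the last `taps` samples, never divided.

    With 8 taps the output is the moving-average position in 1/8 counts.
    """

    def __init__(self, taps: int = 8) -> None:
        self.window = deque([0] * taps, maxlen=taps)  # primed with zeros
        self.total = 0

    def update(self, sample: int) -> int:
        self.total += sample - self.window[0]  # drop the oldest, add the newest
        self.window.append(sample)             # maxlen discards window[0]
        return self.total


f = BoxFilter8()
print([f.update(1) for _ in range(10)])  # [1, 2, 3, 4, 5, 6, 7, 8, 8, 8]
```

A one-count step of the raw encoder now arrives as eight steps of 1/8 count, which is the finer step size evanh describes.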

Mickster · 2019-12-10 18:48

Your link didn't work but here's mine from when I supplied controls to a Spanish machine tool manufacturer.

They also had to offer a Fanuc alternative because it was the industry standard but the end result was the same.

This wasn't Windows based, as the name suggests. It was DOS and the program was 100% QuickBASIC (1993)

Electrodude · 2019-12-10 19:59

It isn't necessary to spend a whole cog on a UART TX/RX pair. The smartpins do the job that the UART cog did on the P1. Instead, you can use coroutines and interrupts to do bufferless TX and RX from the same cog that's using the UART. It shouldn't be hard to manage multiple UARTs from one cog that's also juggling other things, but I haven't had a need for this yet.

So e.g. if you use a cog per axis, you'd manage the UART for each axis from that axis's cog.

mrchillin · 2019-12-10 20:10

a 32 bit unsigned integer is 2,000,000 big...so a 31 signed is only 1,000,000

so using integer math and a unit of .0001 inches u can have a range of -100 inches to +100 inches

plenty big for almost all the cnc machines that are in existence

that means that linear interpolation can be in 1 cog using T-base positioning with changeable feedrate

trapezoidal accelerations are just a ramp up to the feedrate...and remember how long it took so u can ramp down

pasm does it NO PROBLEM

the follower is only 10 lines of code in another cog per axis

because it's t-base, axis abc are just slaved to interpolation

the problem is where the m/g code is to be retrieved from?
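mrchillin's "ramp up to the feedrate, remember how long it took, ramp down" can be sketched with integers only: record the per-tick velocities of the ramp-up, cruise, then replay the ramp in reverse. In this Python sketch velocities are in counts per tick, `trapezoid` is a hypothetical name, and the final partial cruise tick is a simplification of mine.

```python
def trapezoid(distance: int, feed: int, accel: int) -> list[int]:
    """Per-tick velocities (counts/tick) that sum exactly to `distance`.

    Ramp up by `accel` per tick until `feed` is reached or half the distance
    is used up, cruise, then replay the recorded ramp backwards.
    """
    ramp: list[int] = []
    up = 0  # distance covered by the ramp-up
    v = 0
    while v < feed:
        nxt = min(feed, v + accel)
        if 2 * (up + nxt) > distance:  # not enough room left to ramp down again
            break
        v = nxt
        ramp.append(v)
        up += v
    remaining = distance - 2 * up
    cruise: list[int] = []
    if v > 0:
        cruise = [v] * (remaining // v)
        remaining -= v * len(cruise)
    if remaining:
        cruise.append(remaining)  # one partial tick (simplification)
    return ramp + cruise + ramp[::-1]


profile = trapezoid(1000, feed=50, accel=10)
print(profile[:6], len(profile), sum(profile))  # [10, 20, 30, 40, 50, 50] 24 1000
```

Short moves automatically become triangular profiles, because the ramp-up stops as soon as there is no room left to brake.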

cgracey · 2019-12-10 20:21

mrchillin wrote: »

a 32 bit unsigned integer is 2,000,000 big...so a 31 signed is only 1,000,000

so using integer math and a unit of .0001 inches u can have a range of -100 inches to +100 inches

Unsigned 32 bits goes from 0 to 4,294,967,295, while signed goes from -2,147,483,648 to +2,147,483,647.

32 bits can resolve the earth's equator to units of ~0.37 inches.
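cgracey's figures are easy to check, and the same arithmetic shows what mrchillin's 0.0001 inch unit really allows: a signed 32-bit value covers about ±214,748 inches, not ±100 inches. The equatorial circumference (about 40,075,017 m, WGS-84) is the only outside number in this Python sketch.

```python
INT32_MAX = (1 << 31) - 1
UINT32_SPAN = 1 << 32

UNIT_IN = 0.0001                       # mrchillin's proposed unit, inches
signed_range_in = INT32_MAX * UNIT_IN  # reach in each direction, inches

EQUATOR_M = 40_075_017                 # equatorial circumference (WGS-84), metres
equator_in = EQUATOR_M / 0.0254
resolution_in = equator_in / UINT32_SPAN

print(f"+/-{signed_range_in:,.0f} inches at 0.0001 in per count")  # +/-214,748
print(f"equator / 2**32 = {resolution_in:.2f} inches")              # 0.37
```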

mrchillin · 2019-12-10 20:30

damn...i can get better resolution

when i get it going on prop2 i will give the linear interpolator to obex
