Interrupts And Porting Code To The P2

jmg · 2018-10-07 19:30

idbruce wrote: »

Can the P2 "easily duplicate" the behavior of the interrupts used in existing OpenSource C code?

For some values of 'easy', yes.

Let's take grbl and search for 'interrupt'.
That gives 60 hits across all files, which is not too bad, and many are comments.

Active files look to be:

INT use : Enable Pin Change Interrupt
File: (361 lines) limits.c - code pertaining to limit-switches and performing the homing cycle

INT use : enable rx, tx, and interrupt on complete reception of a byte
File: (205 lines) serial.c - Low level functions for sending and receiving bytes via the serial port

Pin change and serial are fairly generic and should port for 'better' values of easily.

INT use : searching for ISR() entries finds 3 HW vectors
File (1023 lines) stepper.c - stepper motor driver: executes motion plans using stepper motors

'interrupt' is mentioned 33 times in this file. The 3 ISR() entries are below.
'float' is mentioned 34 times, but none of the float code is inside the interrupts.

Some lines of code from stepper.c:

/* The Stepper Port Reset Interrupt: Timer0 OVF interrupt handles the falling edge of the step
pulse. This should always trigger before the next Timer1 COMPA interrupt and independently
finish, if Timer1 is disabled after completing a move.
NOTE: Interrupt collisions between the serial and stepper interrupts can cause delays by
a few microseconds, if they execute right before one another. Not a big deal, but can
cause issues at high step rates if another high frequency asynchronous interrupt is
added to Grbl.
*/
ISR(TIMER0_OVF_vect)
{
  // ... body (resets the step pins at the end of the pulse) not included in this excerpt ...
}
#ifdef STEP_PULSE_DELAY
// This interrupt is used only when STEP_PULSE_DELAY is enabled. Here, the step pulse is
// initiated after the STEP_PULSE_DELAY time period has elapsed. The ISR TIMER2_OVF interrupt
// will then trigger after the appropriate settings.pulse_microseconds, as in normal operation.
// The new timing between direction, step pulse, and step complete events are setup in the
// st_wake_up() routine.
ISR(TIMER0_COMPA_vect)
{
  STEP_PORT = st.step_bits; // Begin step pulse.
}
#endif
/* "The Stepper Driver Interrupt" - This timer interrupt is the workhorse of Grbl. Grbl employs
the venerable Bresenham line algorithm to manage and exactly synchronize multi-axis moves.
Unlike the popular DDA algorithm, the Bresenham algorithm is not susceptible to numerical
round-off errors and only requires fast integer counters, meaning low computational overhead
and maximizing the Arduino's capabilities. However, the downside of the Bresenham algorithm
is, for certain multi-axis motions, the non-dominant axes may suffer from un-smooth step
pulse trains, or aliasing, which can lead to strange audible noises or shaking. This is
particularly noticeable or may cause motion issues at low step frequencies (0-5kHz), but
is usually not a physical problem at higher frequencies, although audible.
To improve Bresenham multi-axis performance, Grbl uses what we call an Adaptive Multi-Axis
Step Smoothing (AMASS) algorithm, which does what the name implies. At lower step frequencies,
AMASS artificially increases the Bresenham resolution without affecting the algorithm's
innate exactness. AMASS adapts its resolution levels automatically depending on the step
frequency to be executed, meaning that for even lower step frequencies the step smoothing
level increases. Algorithmically, AMASS is achieved by a simple bit-shifting of the Bresenham
step count for each AMASS level. For example, for a Level 1 step smoothing, we bit shift
the Bresenham step event count, effectively multiplying it by 2, while the axis step counts
remain the same, and then double the stepper ISR frequency. In effect, we are allowing the
non-dominant Bresenham axes to step in the intermediate ISR tick, while the dominant axis is
stepping every two ISR ticks, rather than every ISR tick in the traditional sense. At AMASS
Level 2, we simply bit-shift again, so the non-dominant Bresenham axes can step within any
of the four ISR ticks, the dominant axis steps every four ISR ticks, and quadruple the
stepper ISR frequency. And so on. This, in effect, virtually eliminates multi-axis aliasing
issues with the Bresenham algorithm and does not significantly alter Grbl's performance, but
in fact, more efficiently utilizes unused CPU cycles overall throughout all configurations.
AMASS retains the Bresenham algorithm exactness by requiring that it always executes a full
Bresenham step, regardless of AMASS Level. Meaning that for an AMASS Level 2, all four
intermediate steps must be completed such that baseline Bresenham (Level 0) count is always
retained. Similarly, AMASS Level 3 means all eight intermediate steps must be executed.
Although the AMASS Levels are in reality arbitrary, where the baseline Bresenham counts can
be multiplied by any integer value, multiplication by powers of two is simply used to ease
CPU overhead with bitshift integer operations.
This interrupt is simple and dumb by design. All the computational heavy-lifting, as in
determining accelerations, is performed elsewhere. This interrupt pops pre-computed segments,
defined as constant velocity over n number of steps, from the step segment buffer and then
executes them by pulsing the stepper pins appropriately via the Bresenham algorithm. This
ISR is supported by The Stepper Port Reset Interrupt which it uses to reset the stepper port
after each pulse. The bresenham line tracer algorithm controls all stepper outputs
simultaneously with these two interrupts.
NOTE: This interrupt must be as efficient as possible and complete before the next ISR tick,
which for Grbl must be less than 33.3usec (@30kHz ISR rate). Oscilloscope measured time in
ISR is 5usec typical and 25usec maximum, well below requirement.
NOTE: This ISR expects at least one step to be executed per segment.
*/
// TODO: Replace direct updating of the int32 position counters in the ISR somehow. Perhaps use smaller
// int8 variables and update position counters only when a segment completes. This can get complicated
// with probing and homing cycles that require true real-time positions.
ISR(TIMER1_COMPA_vect)
{
  if (busy) { return; } // The busy-flag is used to avoid reentering this interrupt
  // Set the direction pins a couple of nanoseconds before we step the steppers
  DIRECTION_PORT = (DIRECTION_PORT & ~DIRECTION_MASK) | (st.dir_outbits & DIRECTION_MASK);
  // Then pulse the stepping pins
#ifdef STEP_PULSE_DELAY
  st.step_bits = (STEP_PORT & ~STEP_MASK) | st.step_outbits; // Store out_bits to prevent overwriting.
#else // Normal operation
  STEP_PORT = (STEP_PORT & ~STEP_MASK) | st.step_outbits;
#endif
  // Enable step pulse reset timer so that The Stepper Port Reset Interrupt can reset the signal after
  // exactly settings.pulse_microseconds microseconds, independent of the main Timer1 prescaler.
  TCNT0 = st.step_pulse_time; // Reload Timer0 counter
  TCCR0B = (1<<CS01); // Begin Timer0. Full speed, 1/8 prescaler
  // ... remainder of the ISR (Bresenham segment execution) not included in this excerpt ...
}

Addit: another hardware area that needs porting is eeprom.c; grbl uses the AVR EEPROM for config settings.

unsigned char eeprom_get_char(unsigned int addr);
void eeprom_put_char(unsigned int addr, unsigned char new_value);
// Extensions added as part of Grbl 
void memcpy_to_eeprom_with_checksum(unsigned int destination, char *source, unsigned int size);
int memcpy_from_eeprom_with_checksum(char *destination, unsigned int source, unsigned int size);

Mickster · 2018-10-07 19:42

idbruce wrote: »

Mickster

If they'd had cogs, they probably wouldn't have needed interrupts at all.

LOL... Yeah, but now all those man-hours have been spent on the wrong architecture; a bit too late now.

ErNa

There is no secret in GRBL. It's just a lot of code.

Shouldn't it be possible to code this, using TaQOz?

Provided that the issues pertaining to the use of timers and interrupts can be overcome, that would be "a lot of code" to translate.

There are at least two community members who regard the P2 as potentially the ultimate CNC-on-a-chip. I predict that one of these sharp cookies will soon have something to blow Arduino-GRBL away.

potatohead · 2018-10-07 20:22

I do too.

Seriously hoping we see some activity on the CNC front soon. In my opinion, the P2, with all its I/O, sensible interrupts, overall compute capability in the cogs, and math, is shaping up to be a CNC powerhouse.

And this time, unlike P1, which frankly has a lot of what good CNC takes, we've got RAM and enough I/O to put it all together in a more lean, potent way.

P1 could have had 64 I/Os, and would have driven them all just as well as it does today, and having more of all that would have made external RAM viable for more use cases.

RAM and, in general, program size really did keep P1 off the CNC table, again in my opinion.

P2 has more I/O, and more RAM. Enough that a solid g-code parser, along with the movement kernel, can be written, loaded and run, with plenty left for buffers and other routines to better exploit the I/O.

Tubular · 2018-10-07 20:53

Hoping to apply P2 to an 8x4' blackfoot sheet router before too long, along with some help from Ozpropdev.

potatohead · 2018-10-07 21:48

Cool.

I have a plan or two myself along similar lines.

Cluso99 · 2018-10-07 22:16

No conversion is easy. Not even porting P1 to P2 is easy. There are all kinds of gotchas.
I am porting my faster spin Interpreter for P1 to P2. My code is not as convoluted as the original Interpreter.

The big problem that I am facing is the JMPRET instruction, which has no exact equivalent in P2. This means I have to look closely at every use and decide each on its own merits. Sometimes I can use a call using the internal stack. Other times I have to work out how to do it, because the P2 can save and restore the condition codes and, IIRC, 10 address bits instead of 9. It's a shame I didn't find this earlier, so we could have had an extra P2 instruction that performed JMPRET.

Other problems are the few instructions not carried forward to the P2. These are rarely used and fairly easily replaced by two instructions. ABSNEG, ADDABS, SUBABS, REV, WAITPEQ, WAITPNE, WAITCNT.

So these are issues for a micro with very similar architectures.

Interrupts are a much bigger can of worms. If the program is well understood, it will be easier to carve out the interrupt code and place it into another core. But the interaction between the cores needs to be understood and fixed.

Don't underestimate the task, particularly if you don't understand the code being ported.

idbruce · 2018-10-07 22:40

Cluso99

Don't underestimate the task, particularly if you don't understand the code being ported.

It took me quite a while to get a good grasp on the Teacup code. I can only imagine how long it would take to get a good grasp on GRBL.

However, I agree wholeheartedly.

jmg · 2018-10-08 22:19

Mickster wrote: »

I was considering the 7183 and using the P1 counters but I think you mentioned that it was an older device and that EOL might be imminent.

I think the data date on the 7183 is new enough to consider it, but the LS7366R is a much smarter part, and is well suited to P1.
The LS7366R is only slightly more expensive.

One appeal of an interim Quad-to-UpDn fix chip for P2 is that the P2 code does not change much when it is fixed.
2 Smart pin cells are used in each case.

jmg wrote: »

I found some old code for a SPLD, but given the price/Icc/package of SPLDs, I might also explore using a CLU block in a MCU as a PLD, to copy a LS7183N operation.

I looked briefly at MCU-CLU, and they do not have enough product terms.

That leaves a MCU-SW solution for modest speeds (sub MHz), or a SPLD for higher clock speeds.

Re-scanning the PLD code I have here, it looks like 2 channels of Quad to UpDn can pack into a ATF22LV10CQZ PLD
- just needs a CLK of > 2x highest Quad rate.

The SO8 LS7183 looks like a good patch, for modest volumes of P2 R&D.

Mickster · 2018-10-08 23:00

Gonna grab a few 7183s to play with.

jmg · 2018-10-08 23:22

Mickster wrote: »

Gonna grab a few 7183s to play with.

Good idea. What are the quad edge rates you need to handle?

Mickster · 2018-10-09 02:23

We somehow made a ridiculous mistake many years ago, and the bottom line is that I have to be able to handle 6,000 RPM motors with 4096-line encoders (16,384 counts/rev), so a bit over 1.6M quad counts/sec. Nothing for these devices.

Apart from the above, I don't even get close to 500K counts/sec.
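For reference, the arithmetic behind that figure, and what it would imply for the PLD clock discussed earlier, assuming the "> 2x highest Quad rate" rule refers to the x4 count (edge) rate:

```latex
f_{\text{count}} = \frac{6000\ \text{rev/min}}{60\ \text{s/min}} \times (4 \times 4096)\ \frac{\text{counts}}{\text{rev}}
                 = 100 \times 16384 = 1\,638\,400\ \text{counts/s}

f_{\text{clk}} > 2 \times f_{\text{count}} = 3\,276\,800\ \text{Hz} \approx 3.28\ \text{MHz}
```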
