working with mpu9250 sparkfun board

iseries · 2017-01-14 13:13

I have been working with this board and have converted the sparkfun code over to the propeller.

What I have just noticed is that the Propeller chip is way slower than the STM32L4 chip that I am testing with.

Both chips say they are running at 80 MHz, and I used a test program where I capture the counter, pause for 1 second, and then subtract to get the elapsed count.

Propeller:

int I = CNT;
pause(1000);
I = CNT - I;
I = I / 80; // microseconds

Arduino code:

int I = micros();
delay(1000);
I = micros() - I;

In both cases they return 1,000,000 which is expected.
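The timing check above can be sketched as a small helper. This is only an illustration: `elapsed_us` and its parameters are hypothetical names, and it assumes a free-running 32-bit counter such as the Propeller's CNT. Doing the subtraction in unsigned arithmetic before dividing keeps the result correct even if the counter wraps once between the two readings (at 80 MHz a 32-bit counter wraps roughly every 53.7 seconds).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Elapsed microseconds between two readings of a free-running 32-bit
 * tick counter. Hypothetical helper, not part of any library.
 * Subtract first (unsigned, so one wrap is handled), then scale.
 */
uint32_t elapsed_us(uint32_t start_ticks, uint32_t end_ticks,
                    uint32_t ticks_per_us)
{
    assert(ticks_per_us != 0u);
    return (end_ticks - start_ticks) / ticks_per_us;
}
```

On the Propeller this might be used as `uint32_t t0 = CNT; /* work */ uint32_t us = elapsed_us(t0, CNT, 80);`. Dividing each reading by 80 before subtracting, by contrast, can give a wrong result when the counter wraps during the measurement.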

When I hook them up to the mpu9250 and read 6 contiguous registers, they take vastly different amounts of time to do it.

The Propeller takes 5,330 microseconds and the STM32L4 takes only 857 microseconds.

Why is the I2C protocol so much slower on the Propeller?

Mike

RS_Jim · 2017-01-14 14:47

Basically because Spin is so slow. If you code in PASM you will have a vastly different result. The Ard... is probably running compiled asm. So you are comparing a compiled language to an interpreted language.
Jim

iseries · 2017-01-14 14:55

Both are running in C and run about the same code.

      nt = CNT/80;
      readAccelData(&ax, &ay, &az);  // Read the x/y/z adc values
      nt = CNT/80 - nt;

internal code:

  void readAccelData(int16_t* ax, int16_t* ay, int16_t* az)
  {
    uint8_t rawData[6];  // x/y/z accel register data stored here
    readBytes(MPU9250_ADDRESS, ACCEL_XOUT_H, 6, &rawData[0]);  // Read the six raw data registers into data array
    *ax = ((int16_t)rawData[0] << 8) | rawData[1] ;  // Turn the MSB and LSB into a signed 16-bit value
    *ay = ((int16_t)rawData[2] << 8) | rawData[3] ;  
    *az = ((int16_t)rawData[4] << 8) | rawData[5] ; 
  }

void readBytes(uint8_t address, uint8_t subAddress, uint8_t cnt, uint8_t * dest)
{
  i2c_in(mpu, address, subAddress, 1, dest, cnt);
}
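As a side note on the byte handling in readAccelData, here is a minimal sketch of combining the big-endian register pairs into signed values. The names `be16_to_int16` and `unpack_xyz` are hypothetical. It does the shift in unsigned arithmetic and converts to signed at the end; that final conversion is implementation-defined in C for values above 32767, but it behaves as two's complement wrap on GCC-based toolchains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine a high byte and a low byte into one signed 16-bit value. */
int16_t be16_to_int16(uint8_t msb, uint8_t lsb)
{
    uint16_t u = (uint16_t)(((uint16_t)msb << 8) | lsb);
    return (int16_t)u;
}

/* Unpack six raw bytes (X high, X low, Y high, Y low, Z high, Z low). */
void unpack_xyz(const uint8_t raw[6], int16_t *x, int16_t *y, int16_t *z)
{
    assert(raw != NULL && x != NULL && y != NULL && z != NULL);
    *x = be16_to_int16(raw[0], raw[1]);
    *y = be16_to_int16(raw[2], raw[3]);
    *z = be16_to_int16(raw[4], raw[5]);
}
```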

Arduino read bytes

void readBytes(uint8_t address, uint8_t subAddress, uint8_t count, uint8_t * dest)
{  
  Wire.beginTransmission(address);   // Initialize the Tx buffer
  Wire.write(subAddress);            // Put slave register address in Tx buffer
  Wire.endTransmission(false);       // Send the Tx buffer, but send a restart to keep connection alive
  uint8_t i = 0;
  Wire.requestFrom(address, count);  // Read bytes from slave register address 
  while (Wire.available())           // Put read results in the Rx buffer
  {
    dest[i++] = Wire.read();
  }
}

Mike

iseries · 2017-01-14 15:36

I guess the simple answer is that most instructions on the STM32 chip run in one clock cycle where the propeller requires 4 or more clock cycles per instruction.

Mike

pmrobert · 2017-01-14 15:43

This thread may give you some insight. I'll offer a guess that the default I2C speed is slow in order that it works with more devices?

http://forums.parallax.com/discussion/158294/simple-ide-i2c-bus-speed-how-to-change

Mike II

DavidZemon · 2017-01-14 16:36

You may be interested in the I2C object in either PropWare or libpropeller (same assembly code in both, just a slightly different interface). They run at 400kHz (maybe faster? don't know)

JasonDorie · 2017-01-14 19:56

It partially depends on how you are compiling your C code. The STM32 tools compile to native assembly code. On the Prop, you can compile C to CMM (compact memory model), which uses an interpreter to decode compressed instructions, or you can compile to LMM (large memory model), where the instructions are compiled to native assembly and executed much quicker. Even in LMM mode the instructions have to be "fetched" into the cog and executed, which is about 1/4 the speed of actual native assembly, and the Prop only executes one assembly instruction for every four clock cycles, so your 80 MHz clock on the STM is probably 80M instructions/sec (MIPS) but on the Prop it's only 20. Native PASM executes at 20 MIPS per cog, and LMM executes at about 5 MIPS, though short loops can be optimized by the compiler to go even faster.

The short answer is that the STM32 is a much more powerful chip than the Prop. If you were doing the I2C read in native PASM you'd probably be getting the same speed because of limitations of I2C itself.
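To put a rough number on the limit imposed by I2C itself, here is a back-of-the-envelope sketch. The function name is hypothetical, and it assumes the usual register-read sequence: device address plus write bit, register address, repeated start, device address plus read bit, then the data bytes, with each byte costing nine clocks (eight data bits plus ACK/NACK). Start and stop conditions and any clock stretching are ignored, so it is a lower bound only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lower bound, in microseconds, for one I2C register read of
 * `data_bytes` bytes at a bus clock of `scl_hz`.
 * Three overhead bytes (addr+W, register, addr+R) plus the data,
 * nine clocks per byte.
 */
uint32_t i2c_reg_read_min_us(uint32_t data_bytes, uint32_t scl_hz)
{
    uint64_t clocks;

    assert(scl_hz > 0u);
    clocks = 9u * (3u + (uint64_t)data_bytes);
    return (uint32_t)((clocks * 1000000u) / scl_hz);
}
```

For a six-byte read this gives 810 microseconds at 100 kHz and about 202 microseconds at 400 kHz. The 857 microseconds measured on the STM32L4 would be consistent with a bus clock near 100 kHz, although neither bus speed is stated in the thread.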

iseries · 2017-01-14 20:22

That's some of the information I was looking for. To use the Propeller as a self balancing robot I need to read the MPU9250 fast enough to run a complementary filter to determine angle. I don't need to use quaternions as the robot is one dimensional in the pitch axis.

I think I can also get away without using floating point numbers as well.
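A minimal sketch of what an integer-only complementary filter could look like, under stated assumptions: the function name, the millidegree units, and the 0.98/0.02 weighting are all illustrative choices, not taken from the thread. It also assumes the accelerometer-derived pitch angle has already been computed elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Filter weight alpha = ALPHA_NUM / ALPHA_DEN (0.98 here, an assumed value). */
#define ALPHA_NUM 98
#define ALPHA_DEN 100

/*
 * One update step, all in integers.
 *   angle_mdeg       previous filtered angle, millidegrees
 *   gyro_mdps        gyro rate, millidegrees per second
 *   accel_angle_mdeg angle derived from the accelerometer, millidegrees
 *   dt_ms            time since the last step, milliseconds
 * The ranges are assumed small enough (e.g. +/-2000 deg/s, dt of 10 ms)
 * that the 32-bit intermediates do not overflow.
 */
int32_t comp_filter_step(int32_t angle_mdeg, int32_t gyro_mdps,
                         int32_t accel_angle_mdeg, int32_t dt_ms)
{
    int32_t gyro_angle;

    assert(dt_ms > 0);
    /* Integrate the gyro rate over the step. */
    gyro_angle = angle_mdeg + (gyro_mdps * dt_ms) / 1000;
    /* Blend: mostly gyro, with a small pull toward the accelerometer. */
    return (ALPHA_NUM * gyro_angle
            + (ALPHA_DEN - ALPHA_NUM) * accel_angle_mdeg) / ALPHA_DEN;
}
```

Calling this once per 10 ms loop with `dt_ms = 10` keeps the whole update to a handful of multiplies and divides, with no floating point.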

But the Propeller is not close to fast enough to even read the registers, let alone do the simple calculations, so that I can update the angle every 10 milliseconds.

I switched to the LMM memory model and got the read time down to 1,058 microseconds from the initial 5,316. A fivefold increase in speed that just might make it fast enough to do the work....

I have played a little bit with the assembly code and that does make a difference in some cases. It would be nice if the compiler output some of the assembly code so that one could see what it's doing and possibly rework the code into something faster.

C was invented to help developers write assembly code for phone systems at Bell Labs back in the day. I would hate to think it doesn't do that anymore.

Also I find it very difficult to write COG assembly routines as there doesn't seem to be a function to create a COG program. I guess the only way is to use SPIN.

Mike

JasonDorie · 2017-01-14 20:33

It might help you to look at the Elev8 flight controller firmware. It uses a combination of C in cmm mode and PASM written as Spin. I read all the sensors about 500 times/sec, perform the IMU calculations on quaternions in floating point, read the remote control, and more. The main loop updates 250 times/sec, more than enough to balance a bot.

https://github.com/parallaxinc/Flight-Controller

It's not simple code, but if you use the 8 cores on the Prop to do the work in parallel it's capable of doing a lot.

iseries · 2017-01-14 20:42

Yes, I have been watching your work with that. I would not have attempted to code that one.

I just got my head around quaternions and still have a headache.

While the code you have there is great, I believe that the point of understanding it has gone by the wayside.

It takes too many side tracks from just plain functions and loses some of the straightforwardness of the application.

I believe a self balancing robot may be more straightforward than that and lead the way to a better understanding of the math that makes it all work.

Mike

DavidZemon · 2017-01-14 23:19

@iseries, for performance-critical snippets of PropGCC code, you need to either code it as a driver cog (compiled down to raw PASM and then running in a dedicated cog as Jason does) or force it into fcache (as Jason might also do but hasn't specified in this thread).

PropGCC's fcache is a reserved 64-word (256-byte) section of cog RAM that can be used to execute user code at a high speed, without reading individual instructions or byte codes out of HUB RAM. Any code that is tagged as fcache in the source will, when invoked, be copied into cog RAM prior to execution, and then invoked directly from cog RAM. You end up with a small execution delay (often looks like a hiccup) to load the code into the cog, but then a VERY HIGH performance gain once the execution begins. This scheme is ideal for serial communication like UART, I2C, and SPI, and it is exactly what both PropWare and libpropeller use.
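As a rough illustration of tagging code for fcache, here is a sketch built on two assumptions that should be checked against the PropGCC documentation: that the compiler accepts `__attribute__((fcache))` on a small leaf function, and that it predefines `__PROPELLER__`. The macro `HOT_LOOP` and both function names are hypothetical. On any other compiler the macro expands to nothing, so the code still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: PropGCC defines __PROPELLER__ and supports the fcache attribute. */
#if defined(__PROPELLER__)
#define HOT_LOOP __attribute__((fcache))
#else
#define HOT_LOOP
#endif

/* Small leaf function (calls nothing else), a candidate for cog RAM. */
HOT_LOOP uint32_t sum_bytes(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Argument checking stays outside the hot loop. */
uint32_t checked_sum(const uint8_t *data, size_t len)
{
    assert(data != NULL);
    return sum_bytes(data, len);
}
```

The tight loop is the part that benefits; anything that calls other functions, such as the argument check, is kept in an ordinary wrapper.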

Using PropWare's I2C object isn't hard, either. Here's an example that writes a string to the Propeller's EEPROM and then reads it back and displays it on the standard serial bus (this example requires an EEPROM larger than 32kB):

#include <PropWare/serial/i2c/i2cmaster.h>
#include <PropWare/hmi/output/printer.h>
#include <simpletools.h>

static const uint8_t MAGIC_ARRAY_1[] = "DCBA0";
static const size_t  ARRAY_SIZE_1    = sizeof(MAGIC_ARRAY_1);
static const uint8_t  SHIFTED_DEVICE_ADDR = EEPROM_ADDR << 1;
static const uint16_t TEST_ADDRESS        = 32 * 1024; // Place the data immediately above the first 32k of data

int main () {
    const PropWare::I2CMaster pwI2C;
    pwOut << "EEPROM ack = " << pwI2C.ping(SHIFTED_DEVICE_ADDR) << '\n';

    bool success = pwI2C.put(SHIFTED_DEVICE_ADDR, TEST_ADDRESS, MAGIC_ARRAY_1, ARRAY_SIZE_1);
    pwOut << "Put status: " << success << '\n';

    // Wait for write to finish
    while (!pwI2C.ping(SHIFTED_DEVICE_ADDR));

    uint8_t buffer[ARRAY_SIZE_1];
    success &= pwI2C.get(SHIFTED_DEVICE_ADDR, TEST_ADDRESS, buffer, ARRAY_SIZE_1);
    pwOut << "Get status: " << success << '\n';

    pwOut << "Returned string = '" << (char *) buffer << "'\n";

    return 0;
}

The "#include <simpletools.h>" is only there because "EEPROM_ADDR" is defined there. The rest of the above code is all PropWare. If you want to give this a go for yourself and try it with your MPU-9250, check out PropWare's download/installation instructions here.

Here's the code you'd want (untested) to replace your call to the Simple library's I2C functions:

void readAccelData(int16_t* ax, int16_t* ay, int16_t* az) {
  const PropWare::I2CMaster mpu;
  uint8_t rawData[6];  // x/y/z accel register data stored here
  mpu.get(MPU9250_ADDRESS, ACCEL_XOUT_H, &rawData[0], 6); // Read the six raw data registers into data array
  *ax = ((int16_t)rawData[0] << 8) | rawData[1] ;  // Turn the MSB and LSB into a signed 16-bit value
  *ay = ((int16_t)rawData[2] << 8) | rawData[3] ;  
  *az = ((int16_t)rawData[4] << 8) | rawData[5] ;
}

Documentation for PropWare::I2C can be found here.

Aside from the EEPROM, I don't have many I2C devices around so I can't actually test the speed of this for you. If you're willing to give it a shot, I'm very curious to hear what your results are in both CMM and LMM.

JasonDorie · 2017-01-14 23:50

iseries wrote: »

It takes too many side tracks from just plain functions and loses some of the straightforwardness of the application.

I believe a self balancing robot may be more straightforward than that and lead the way to a better understanding of the math that makes it all work.

I agree with you - I had to do a lot of things to make it fit, and make it fast enough, that sacrifice readability. That said, the approach of having PASM drivers and C or Spin main code is a common practice on the Prop. The cogs on the Prop were largely intended to replace the peripherals you find on other chips. Instead of having 2 UARTs and a couple I2C or SPI engines on pre-set pins in an Atmel or STM chip, you implement them in PASM (or use code others have already written) and run them in a cog. This allows the parts that need to be fast to be fast, and still lets you have some flexibility with the way things are processed.

In the Sensors driver from the Elev8 code, I read a gyro, accelerometer, magnetometer, and barometer, compensate for temperature drift in the gyro readings, turn the barometer pressure reading into an altitude estimate, and send out values for a string of WS2812 LEDs. You could get away with much less - just reading the data from the IMU, and setting the Y-rotation in a variable for another cog to use, and that would be pretty simple to write in C.

On my Sideway project, I have a cog that reads the sensors, and one that runs the math (like on the Elev8), one to read the controller, one to run the motors (the Servo32 object from the Obex), and one to run the main loop and balance code. Splitting things into tasks like this means that they all run in parallel with each other, and all run quite fast. It started like this:
