The Teacup Port - A Work In Progress - 3D Printer Firmware

idbruce · 2015-03-21 09:48

I am almost ready to admit defeat and start my own firmware based upon this idea: http://forums.parallax.com/showthread.php/159926-Introducing-quot-SyncroStepper-quot-Syncronization-For-Multi-Axis-Machines

Dave Hein · 2015-03-21 09:56

I think the ATOMIC macros can be defined as

#define ATOMIC_START while (lockset(locknum));
#define ATOMIC_END lockclr(locknum);

You just need to allocate locknum at the beginning of the program.

Dave Hein · 2015-03-21 10:01

Or to be safe you could include the memory barrier stuff.

#define ATOMIC_START { \
                       while (lockset(locknum)); \
                       MEMORY_BARRIER();

#define ATOMIC_END   MEMORY_BARRIER(); \
                     lockclr(locknum); \
                   }

idbruce · 2015-03-21 10:03

This solution has already been implemented and still the grief remains, but I have not altered the code yet for the timer and queue_set_timer

idbruce · 2015-03-21 10:05

Sorry, that did not post...

#define ATOMIC_START while (lockset(locknum));
#define ATOMIC_END lockclr(locknum);You just need to allocate locknum at the beginning of the program.

has already been implemented.

or similar I should say.

All ATOMIC_STARTs have been replaced with while (lockset(locknum));
and all ATOMIC_ENDs have been replaced with lockclr(locknum);

idbruce · 2015-03-21 10:45

Dave

Here is a project build outlined as you described, however nothing has been done with cli() or sei() at this point. Still no go.

idbruce · 2015-03-21 11:00

I have a new hunch...

I do believe the true source of my grief is from the following source in timer.c. This timer as you know is running in a seperate cog and I believe the call to dda_clock is screwing things up. The second lock ID is used strictly in dda_clock() and dda_clock() is only called from the timer.

Okay so the timer cog is calling dda_clock, and during this call, variables are supposed to be modified or updated. Since the timer is in a seperate cog, wouldn't all these variables that are being modified or updated need to be "volatile"?

EDIT: Now lets say that most of these items being modified or updated are part of a DDA or TARGET struct/structure, could I just declare the whole struct to be volatile and would everything contained within become volatile???

idbruce · 2015-03-21 21:14

I have been thinking about this project all day long and the problems associated with it. Do I abandon a hopeless cause or do I cut a new path where no man has tread before? It is truly a difficult decision, because I have a lot of time wrapped up in this project. I never like to admit defeat, so I have decided to go another round, before throwing in the towel, however this time, I will be attacking my opponent with a different strategy.

It is my opinion that in it's current state, this firmware is like a stick, with the primary cog whittlling away on one end, while another cog whittles away on the other end, and they end up crashing into each other, or so it seems. I believe that the secondary cog is unable to access the data, because it is not classified as volatile and therefore the cog fails to do it's job correctly.

So the question of the day has been, "How do I make this firmware work with the Propeller chip. Well in it's current state or close to it, I don't believe that it can. However, I do believe that with some serious surgery and modification, it just might work. My theorhetical plan is to cut the firmware in two. The primary cog will operate within the first half and the secondary cog will operate in the other half, and the splitting point will be the dda queue. So the primary thread will read and process the gcode until it gets set in the queue, which will be volatile. And the secondary thread will pop items off the queue and process it after that point.

Of course there are other cogs involved, but they will have special tasks that are unassociated with movement or the processing of the gcode.

D.P · 2015-03-21 23:15

idbruce wrote: »

I have been thinking about this project all day long and the problems associated with it. Do I abandon a hopeless cause or do I cut a new path where no man has tread before? It is truly a difficult decision, because I have a lot of time wrapped up in this project. I never like to admit defeat, so I have decided to go another round, before throwing in the towel, however this time, I will be attacking my opponent with a different strategy.

It is my opinion that in it's current state, this firmware is like a stick, with the primary cog whittlling away on one end, while another cog whittles away on the other end, and they end up crashing into each other, or so it seems. I believe that the secondary cog is unable to access the data, because it is not classified as volatile and therefore the cog fails to do it's job correctly.

So the question of the day has been, "How do I make this firmware work with the Propeller chip. Well in it's current state or close to it, I don't believe that it can. However, I do believe that with some serious surgery and modification, it just might work. My theorhetical plan is to cut the firmware in two. The primary cog will operate within the first half and the secondary cog will operate in the other half, and the splitting point will be the dda queue. So the primary thread will read and process the gcode until it gets set in the queue, which will be volatile. And the secondary thread will pop items off the queue and process it after that point.

Of course there are other cogs involved, but they will have special tasks that are unassociated with movement or the processing of the gcode.

Are you able to produce a simple producer cog and consumer cog program in C that utilizes locks and a "volatile queue" so you thoroughly understand what you are trying to accomplish with respect to queing and dequeing with two cogs? Can you then add in reading from the SD card for queue elements? Can you then add some timer actions to the two cogs to replicated interrupts? When this "simple" program is running and understood you may find your opponent not so difficult.

Rooting for you.

idbruce · 2015-03-22 05:17

D.P

Rooting for you.

I can certainly appreciate that, because a little moral support can goes a long way.

Are you able to produce a simple producer cog and consumer cog program in C that utilizes locks and a "volatile queue" so you thoroughly understand what you are trying to accomplish with respect to queing and dequeing with two cogs? Can you then add in reading from the SD card for queue elements? Can you then add some timer actions to the two cogs to replicated interrupts? When this "simple" program is running and understood you may find your opponent not so difficult.

I am fully confident that I could simulate what I want to achieve, however, doing it with the actual code is quite a different story, but I do think it can be done and I think that forementioned plan is the way to do it. By splitting the program in two, locks will only be required when reading or writing to the queue, and all other locks currently employed, should become useless.

Of course there will be obstacles with my current strategy, but one thing is certain, I am currently at a deadlock with the current programming, in the way that it is written.

So first things first..... On my computer, I will be deleting all copies of my previous attempts, just to get those distractions out of the way and to eliminate any confusion. I will be starting this new strategy with a fresh download of the code in Post #139 and fresh thoughts.

EDIT: See below

I will be starting this new strategy with a fresh download of the code in Post #139

Instead of using the archive in Post #139, I will be using the archive of Post #162

Dave Hein · 2015-03-22 06:13

Bruce, why don't you add prints to determine where your code is hanging? Then you'll have a better idea of what it's waiting for. Also, the program size is something like 31,800 bytes, which leaves you with less than 1K for the stack and heap. That could be the cause of your problem. Try trimming away unnecessary stuff to reduce the program size a bit. Or add a routine like the following that you would call at different points to monitor the amount of stack/heap space.

void PrintSpace(void)
{
    char *ptr = malloc(1);

    if (ptr)
    {
        printf("Space left = %d bytes\n", (int)&ptr - (int)ptr);
        free(ptr);
    }
    else
        printf("No space left\n");
}

idbruce · 2015-03-22 07:27

Dave

Bruce, why don't you add prints to determine where your code is hanging? Then you'll have a better idea of what it's waiting for.

I have added and removed hundreds of print commands

I have a pretty good idea of what is occuring and there are basically two problems. Okay more than two.

1) Two cogs cannot battle for the terminal output without simpleterm_open() and simpleterm_close().
2) Two cogs are calling queue_step and this won't work. As mentioned, they are working against each other.
3) For one cog to be able to access another cogs data, it must be volatile.

Additionally, as mentioned, the addition of the function queue_set_timer within dda_queue is a major source of grief. For example, take the archive of Post #162 and first run it as is, to see the result. Then modify queue_set_timer as shown below and you will see the difference.

/// Wait specified time, then call queue_step()
void queue_set_timer(uint32_t delay)
{
//	waitcnt(CNT + delay);
//	queue_step();
}

Also, the program size is something like 31,800 bytes, which leaves you with less than 1K for the stack and heap.

Yea it's up there! The initial programmers did a real good job of trimming the fat. It will be very difficult to remove substantial sections of code. I don't know what happened, but upon going to the new set of Teacup files, with very few changes, the code size increased dramatically. If I go back to the old files, then I have plenty of room, but then I also have potential errors and I lose any benefit gained from the new files, whatever that may be

void PrintSpace(void)
{
    char *ptr = malloc(1);

    if (ptr)
    {
        printf("Space left = %d bytes\n", (int)&ptr - (int)ptr);
        free(ptr);
    }
    else
        printf("No space left\n");
}

Thanks for the snippet. Looks like that could be quite handy.

idbruce · 2015-03-22 09:46

Referring to my previous posts about splitting the firmware in two, I do believe the proper place for this split would be in the enqueue_home function of dda_queue. The image below indicates the exact position where I intend to make the split.

For those of you who have been following along and studying the source code, and agree that a split is necessary to achieve functionality, please speak your mind if you think I have picked the wrong location, but please be prepared to back up your theories.

attachment.php?attachmentid=113583&d=1427042791

attachment.php?attachmentid=113583&d=1427042791

Heater. · 2015-03-22 10:41

Bruce,

I have no idea not having studied the code so hard.

My approach might start something like this:

1) Draw up a map of the code. A call graph. Starting from main() or whatever it is in Arduino land, list all the functions that it calls, for each one of those list all the functions they call and so on down to the "leaves" of the tree that make no calls or perhaps only make OS/HAL calls.

There are even a munch of free tools that will extract this info and make call graphs for you. http://stackoverflow.com/questions/517589/tools-to-get-a-pictorial-function-call-graph-of-code

2) Find all the other entry points to the program. Interrupt handlers. Things run off timer ticks. Make similar call graphs for those.

Now you have 1 or 2 or 3 or more separate independent trees drawn up.These represent separate threads of execution. I would be looking at getting each of those running in it's own COG.

Clearly different "trees" here will be communicating through that queue mechanism at some point.

Before you start, sign up to github and fork the original project. Clone a copy of your fork to your local machine and work there.

idbruce · 2015-03-22 10:55

Heater

There are even a munch of free tools that will extract this info and make call graphs for you. http://stackoverflow.com/questions/5...-graph-of-code

Now that sounds cool

I was going to make a graph earlier, but decided it would be a huge pain, but a tool for the job would be nice. I am already well into it, but I will use that to check my work.

Before you start, sign up to github and fork the original project. Clone a copy of your fork to your local machine and work there.

Like I said, I am well into the project now and many things have been changed.

For those that may be interested, I was wrong about the split location, but I was only off by a couple lines. The following image shows the new location of the split of the enqueue_home function of dda_queue.

attachment.php?attachmentid=113585&d=1427046924

attachment.php?attachmentid=113585&d=1427046924

Heater. · 2015-03-22 11:01

Bruce,

I am well into the project now and many things have been changed.

I thought I just read you were deleting all that and starting again.

idbruce · 2015-03-22 11:13

I have been working at it.... I am freaky fast

The following code shows the altered enqueue_home function and the next move function has also been slightly altered to remove the while loop. The print codes will still either have to be removed or modified.

/// add a move to the movebuffer
/// \note this function waits for space to be available if necessary, check queue_full() first if waiting is a problem
/// This is the only function that modifies mb_head and it always called from outside an interrupt.
void enqueue_home(TARGET *t, uint8_t endstop_check, uint8_t endstop_stop_cond)
{
	// don't call this function when the queue is full, but just in case, wait for a move to complete and free up the space for the passed target
	while(queue_full())
	{
		waitcnt(us * 100 + CNT);
	}

	uint8_t h = mb_head + 1;
	h &= (MOVEBUFFER_SIZE - 1);

	DDA* new_movebuffer = &(movebuffer[h]);

	// Initialise queue entry to a known state. This also clears flags like
	// dda->live, dda->done and dda->wait_for_temp.
	new_movebuffer->allflags = 0;

	if(t != NULL)
	{
		new_movebuffer->endstop_check = endstop_check;
		new_movebuffer->endstop_stop_cond = endstop_stop_cond;
	}
	else
	{
		// it's a wait for temp
		new_movebuffer->waitfor_temp = 1;
	}

	dda_create(new_movebuffer, t);

	// make certain all writes to global memory
	// are flushed before modifying mb_head.

	MEMORY_BARRIER();

	mb_head = h;
}

/// go to the next move.
/// be aware that this is sometimes called from interrupt context, sometimes not.
/// Note that if it is called from outside an interrupt it must not/can not by
/// be interrupted such that it can be re-entered from within an interrupt.
/// The timer interrupt MUST be disabled on entry. This is ensured because
/// the timer was disabled at the start of the ISR or else because the current
/// move buffer was dead in the non-interrupt case (which indicates that the 
/// timer interrupt is disabled).
void next_move()
{
	// next item
	uint8_t t = mb_tail + 1;
	t &= (MOVEBUFFER_SIZE - 1);
	DDA* current_movebuffer = &movebuffer[t];

	// tail must be set before setTimer call as setTimer
	// reenables the timer interrupt, potentially exposing
	// mb_tail to the timer interrupt routine. 
	mb_tail = t;

	if(current_movebuffer->waitfor_temp)
	{
		print("Waiting for target temp\n");
		current_movebuffer->live = 1;
		queue_set_timer(HEATER_WAIT_TIMEOUT);
	}
	else
	{
		dda_start(current_movebuffer);
	}
}

Source: Teacup Firmware

And here is the code for the other part of the split.


/** \file
	\brief Timer management - system clock
*/

#include "queue_monitor.h"

/// cog variables
static int queue_monitor_cog = 0;
static int queue_monitor_stack[60];

void queue_monitor(void *par);

/// initialise timer and enable system clock interrupt.
/// step interrupt is enabled later when we start using it
void queue_monitor_init()
{
	if(queue_monitor_cog == 0)
	{
		queue_monitor_cog = 1 + cogstart(&queue_monitor, NULL,
			queue_monitor_stack, sizeof queue_monitor_stack);
	}
}

/// stop timers - emergency stop
void queue_monitor_stop()
{
	if(timer_cog > 0)
	{
		cogstop(queue_monitor_cog - 1);
		queue_monitor_cog = 0;
	}
}

/// start the queue monitor
void queue_monitor(void *par)
{
	while(queue_empty() == 0)
	{
		uint8_t isdead;

		ATOMIC_START
 
		isdead = (movebuffer[mb_tail].live == 0);

		ATOMIC_END
  
		if(isdead)
		{
			next_move();
		}
	}                            
}

Dave Hein · 2015-03-22 11:42

Bruce, I don't quite follow your dividing line idea. There is already a dividing line in the code. The main thread queues up moves, and the ISR dequeues it and drives the stepper motors. Maybe the problem is that the main thread and ISR functions are included in the same source files. It might make it easier if you split files that contain both main thread and ISR code into two files.

If I get a chance I'll try changing the simulator mode code so that it runs on two cogs. This might help you understand how to get your code to run. Or maybe you can just use the simulator code. I still don't understand why you don't try that. It will save you a lot of time.

idbruce · 2015-03-22 12:23

Dave

Heck I don't know, I am grasping at straws at this point. I will take a look at the simulator code later tonight and see if I can get a grasp on it, but odds are that I wasted a lot of time trying to get this to work on a Propeller.

Dave Hein · 2015-03-22 19:03

I tried the simulator with a second cog running the timers, and it runs for a few steps, but then it hangs. I added some debug prints, and there are ATOMIC_START/END scatter through the code that runs from the timer. This doesn't make sense to make since the timer ISR routines should have full control over the CPU on the AVR. It seems like the ATOMIC_START/END should only be necessary in the non-ISR code.

So I see two possible approaches. One is to identify the ATOMIC_START/ENDs that are called from the ISR code, and disable them. The second approach would be to run the timer as fast as possible on the same cog as the main code and put the stepper commands on another queue. Then a second cog would read the stepper queue and issue the actual stepper pulses at the correct rate.

idbruce · 2015-03-22 20:45

Dave

Here is the way I see it.....

Locks should only be required in a multi-threaded application, or at least that is what I have always believed, but I could be wrong about that. However, somehow or another, even though it is a single core processor, as you say, I think that the interrupt routines are run as seperate threads. Is this possible?

idbruce · 2015-03-22 21:03

it runs for a few steps, but then it hangs

If you are running into the same problems I have..... Then I believe it is because the queue is not being popped when an item has been processed.

As set in config.h, the queue has a maximum level of eight. In the main, processing only continues if the queue is not full. Just as soon as a move has been made, that move is removed from the queue, and the queue array moves forward, leaving an empty spot. If it is not removed, then the queue eventually fills and processing stops.

idbruce · 2015-03-22 21:42

Dave

After my last post, I toyed around with the "MOVEBUFFER_SIZE" in config.h. First I changed it to 30, but the hub overflowed, then I changed it to 6.

I do believe we have progress

Current position is changing, although not 100% accurate, but getting much closer.

The following build utilizes my dividing line theory, plus locks.

Keep in mind the terminal output limitation for cogs!!!! Only DEBUG_ECHO can be used for output, because other prints are in another cog. The command M114 in the GCODE file, prints the current position for the cog that has access (in our case Cog 0).

However there are a lot of build warnings and discrepancies that will be difficult to overcome.

USE THE ATTACHED DEBUG.TXT because it has the embedded M114 commands.

####WARNING######
If you decide to run this project, please ensure that you set the proper IO pins in config.h!!!

idbruce · 2015-03-22 22:05

Here is another build, with slightly different volatile settings....

I think this one is slightly more accurate.

In this and the previous build, please note the addition of queue_monitor.c + h. Additionally, note changes in enqueue_home and next_move.

EDIT: Additionally, time_clock.c + h have been removed from the project.
EDIT: Really need to get DEBUG_DDA back into play, because that is where the real data is!!!

Also use the same DEBUG.TXT file above.

####WARNING######
If you decide to run this project, please ensure that you set the proper IO pins in config.h!!!

idbruce · 2015-03-22 23:12

I don't know if there is a significant difference between the terminal output of the last build and this one, but after changing the movebuffer size from 6 to 7 for this build, I closely examined the output. In all reality, it is pretty darn close

REMAINING PROBLEMS:

1) The current position of F is not being reported.
2) The E value keeps growing

Besides that, I think we are very close to having a functional Teacup port.

Test with DEBUG.TXT attached in an above post.

EDIT: I got to thinking that DEBUG_DDA is in a different cog, but then I also thought that DEBUG_POSITION was in the primary cog, so I uncommented that debug and I now have a lot more interesting terminal output

Instead of me uploading another build, in the attached build of this post, make the following changes in the main function of Forger.c:

FROM:

//	debug_flags |= DEBUG_POSITION;//gcode_process-M114//

TO:

	debug_flags |= DEBUG_POSITION;//gcode_process-M114//

####WARNING######
If you decide to run this project, please ensure that you set the proper IO pins in config.h!!!

Heater. · 2015-03-22 23:48

Bruce,

However, somehow or another, even though it is a single core processor, as you say, I think that the interrupt routines are run as seperate threads. Is this possible?

I will reiterate.

A background loop and an interrupt are equivalent two two threads. When that interrupt fires off the execution of the background loop is halted, wherever it happens to be, and the interrupt routine is run instead. When the interrupt routine is done control returns to the background code.

Can you see that this is the same as having an operating system suddenly switch execution from one thread to another. And then back again some time later?

Can you see that this is the same as having those two threads run at the same time on two COGs/cores?

The difficulty in all cases is that detail about "wherever it happens to be". If code happens to be in the middle of updating/reading some shared data when control is suddenly transferred elsewhere, and that "elsewhere" happens to want to use the same shared data, then you have a race condition and corrupt data.

To overcome that race condition you need atomic access to the share data, "mutual exclusion". The mechanism to get mutual exclusion is LOCKS in the case of threads or simple enable/disable of interrupts in that case.

Heater. · 2015-03-22 23:59

Dave,

...there are ATOMIC_START/END scatter through the code that runs from the timer. This doesn't make sense to make since the timer ISR routines should have full control over the CPU on the AVR. It seems like the ATOMIC_START/END should only be necessary in the non-ISR code.

I have no idea about interrupts on AVR. Is it possible that interrupts are not automatically disabled on the AVR?

Of course, even if they are, is it possible that the timer ISR re-enables them immediately on entry?

After all mutual exclusion between between ISR code and backgound/other ISR code need only happen when accessing shared data. Interrupts need not be disabled through out the entirety of an interrupt routine.

Of course this approach is only of value if there are multiple interrupts that can fire off at the same time. Or God forbid a re-entrant interrupt handler.

Given the logical correspondence between interrupt handlers and threads it make sense to put ATOMICxx around critical regions in the code. You never know if that code gets used in an environment with threads instead of interrupts. What actually happens inside those ATOMICxxx functions/macros is another matter.

idbruce · 2015-03-23 00:03

Heater

Just wait until you see my next upload

idbruce · 2015-03-23 00:59

I now submit a build with significant progress!

I had forgotten that I commented out several of the "print" commands within dda_queue, therefore DEBUG_ECHO, DEBUG_POSITION, and DEBUG_DDA are all valid.

It is still not 100% functional, but hey, I am close

When testing this build, please use the new and altered DEBUG.TXT file below.

####WARNING######
If you decide to run this project, please ensure that you set the proper IO pins in config.h!!!

Dave Hein · 2015-03-23 06:34

I got the simulator mode to work with 2 cogs. The problem was that there are ATOMIC_START/END statements in the functions called by the timer, and this doesn't work with using locks. This is because I set the lock when calling the ISR functions from the timer code, and then the code tries to set the lock again. The solution is to only do the lockset/lockclr for the main thread, and not for the timer thread. I do this by checking the cog ID, and I only set and clear the lock for cog ID 0. So ATOMIC_START/END macros now look like:

  #define ATOMIC_START { \
                         MEMORY_BARRIER(); \
                         atomic_start();

  #define ATOMIC_END     MEMORY_BARRIER(); \
                         atomic_end(); \
                       }

and I created atomic_start and atomic_end functions as follows:

void atomic_start(void)
{
    if (!cogid())
    {
        while (lockset(locknum));
    }
}

void atomic_end(void)
{
    if (!cogid())
    {
        lockclr(locknum);
    }
}

Now the simulator mode works with 2 cogs. This can be easily adapted to driving a real 3D printer by changing the functions in the simulator directory.

The Teacup Port - A Work In Progress - 3D Printer Firmware

Comments