Kogna debug suggestions

Post by **SJHardy** » Wed Mar 06, 2024 9:39 pm

WNTS will cause too much delay, change execution order, and hide the bug I'm looking for. I just need to delay it for a few instructions so I can handle the circular buffer allocation atomically. E.g. have a function called log(data, datalen) where it allocates the next buffer of size datalen, and updates the circular buffer head and tail pointers.

Whatever the trick used for AtomicSet() etc., I'm going to have to use that, even if it means writing some in-line asm.

Post by **TomKerekes** » Wed Mar 06, 2024 10:52 pm

SJHardy wrote: handle the circular buffer allocation atomically

I'm not sure what that means. Does it only involve several assembly instructions?

I've attached the assembly code we use for Test and Set and so forth. Branches are delayed by 5 cycles and interrupts are disabled for those 5 cycles so if instructions are executed within those 5 cycles they are guaranteed to be uninterrupted.

Btw, disabling interrupts on this processor is somewhat tricky because the disable is pipelined, so it is possible to be interrupted after interrupts have nominally been turned off. For example, if one Thread attempts to turn off interrupts briefly but then gets interrupted, context might be switched to another Thread with no possibility of an interrupt switching back to the original Thread to turn interrupts back on. We never disable interrupts.

Post by **SJHardy** » Thu Mar 07, 2024 12:59 am

TomKerekes wrote: I'm not sure what that means. Does it only involve several assembly instructions?

Yes. I'm thinking along the lines of having a circular buffer, with head and tail pointers. To allocate space, the head is atomically incremented by the required space (modulo buffer size). If necessary, the oldest record(s) (at tail) are deleted. Records would be small compared with the buffer, so I wouldn't bother splitting the last record - each logging record would be contiguous. You've probably already got something like this to handle console printfs, so it's a solved problem! The difference is that I have a vast quantity of small records that I want to preserve over reboot.
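A minimal sketch of the allocation scheme described above, assuming a word-addressed buffer in which each record starts with a one-word length header. All names (`log_buf`, `log_alloc`, `WRAP_MARK` and so on) are hypothetical, and the code is plain single-threaded C: it shows the head/tail bookkeeping only, not the atomic update, so concurrent callers would still need some form of protection.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_WORDS 16u   /* tiny on purpose, so wrap-around is easy to follow */
#define WRAP_MARK 0u    /* header value meaning "next record is at index 0"  */

static uint32_t log_buf[LOG_WORDS];
static unsigned log_head;   /* index where the next record will start */
static unsigned log_tail;   /* index of the oldest surviving record   */
static unsigned log_count;  /* number of surviving records            */

/* Delete the oldest record by stepping the tail over it. */
static void drop_oldest(void)
{
    log_tail += log_buf[log_tail];   /* header holds the record size in words */
    log_count--;
    if (log_count > 0 &&
        (log_tail == LOG_WORDS || log_buf[log_tail] == WRAP_MARK))
        log_tail = 0;                /* remaining records continue at the start */
}

/* Reserve a contiguous record and return a pointer to its payload. */
uint32_t *log_alloc(unsigned payload_words)
{
    unsigned need = payload_words + 1;   /* one header word plus the payload */
    assert(need <= LOG_WORDS);

    if (log_head + need > LOG_WORDS) {
        /* Record would not fit before the end: never split it. Drop any old
           records still living above the head, leave a wrap marker, restart. */
        while (log_count > 0 && log_tail >= log_head)
            drop_oldest();
        if (log_head < LOG_WORDS)
            log_buf[log_head] = WRAP_MARK;
        log_head = 0;
    }

    /* Drop the oldest records that overlap the space about to be written. */
    while (log_count > 0 && log_tail >= log_head && log_tail < log_head + need)
        drop_oldest();
    if (log_count == 0)
        log_tail = log_head;

    log_buf[log_head] = need;            /* length header keeps the chain valid */
    uint32_t *payload = &log_buf[log_head + 1];
    log_head += need;
    log_count++;
    return payload;
}
```

With a 16-word buffer and 4-word records, the fifth allocation wraps to index 0 and evicts only the oldest record, so the chain from `log_tail` to `log_head` always consists of whole records.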

Maybe something like this (although I can see some issues already):

int * CompareAndModify(int ** ptr, int * oldptr, int * newptr, int data);

If and only if *ptr == oldptr, set **ptr = data, then set *ptr = newptr. In either case, return initial *ptr. Caller compares that with oldptr and retries if different.

data will be the length of the record allocated, so that the record chain is always valid.
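To pin down the intended semantics, here is a reference model of the proposed primitive in plain C, together with the caller-side retry loop. This is only a sketch: written like this it is not atomic, and the real thing would have to fit in the uninterruptible window of whatever trick AtomicSet() uses. The `reserve` helper is a hypothetical name, and buffer wrap-around is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Reference model only: plain C, NOT atomic.
   If *ptr still equals oldptr, store data at the location *ptr points to,
   then advance *ptr to newptr. Always return the value of *ptr seen on entry. */
int *CompareAndModify(int **ptr, int *oldptr, int *newptr, int data)
{
    int *seen = *ptr;
    if (seen == oldptr) {
        **ptr = data;    /* record length goes into the record's first word */
        *ptr = newptr;   /* head now points past the new record */
    }
    return seen;
}

/* Hypothetical caller: reserve len ints (header included) at the head,
   retrying if another writer moved the head in the meantime. */
int *reserve(int **head, int len)
{
    int *old;
    assert(len > 0);
    do {
        old = *head;
    } while (CompareAndModify(head, old, old + len, len) != old);
    return old;          /* start of the reserved record */
}
```

Because the length is written before the head moves, a reader walking the records never sees a head that points past a record without a valid length word.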

Of course, it's a big learning curve to be able to write pipelined and parallel C67xx code, but 5 delay slots is a lot of time so my gut feeling is that it's possible.

Post by **TomKerekes** » Thu Mar 07, 2024 1:38 am

That sounds like it would take more than 5 CPU Cycles.

But it's not clear what the problem is? Are you expecting multiple Threads to be logging?

In that case you might use a Mutex.

Code: Select all

int LogMutex=0;

	MutexLock(&LogMutex);
	.
	.
	(manipulate Log)
	.
	.
	MutexUnlock(&LogMutex);

Console Messages use an array of fixed-length character strings, so it's just a matter of incrementing the array index.
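For illustration, a queue of fixed-length strings along those lines might look like the sketch below. The names and sizes (`msgs`, `MSG_COUNT`, `MSG_LEN`, `post_message`) are invented for the example and are not the actual firmware code.

```c
#include <assert.h>
#include <string.h>

#define MSG_COUNT 4    /* number of slots (hypothetical size) */
#define MSG_LEN   16   /* bytes per slot, including the terminating NUL */

static char msgs[MSG_COUNT][MSG_LEN];
static unsigned next_msg;   /* total messages posted so far */

/* Copy a message into the next slot, truncating if needed, and advance the index. */
void post_message(const char *s)
{
    assert(s != NULL);
    char *slot = msgs[next_msg % MSG_COUNT];
    strncpy(slot, s, MSG_LEN - 1);
    slot[MSG_LEN - 1] = '\0';   /* strncpy does not terminate on truncation */
    next_msg++;
}
```

Every slot has the same size, so allocation is a single index increment. That is what makes it simpler than variable-length records, at the cost of wasted space for short messages.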

btw I don't see why you are concerned about small delays affecting the timing. Would a WNTS significantly affect things? Are you going to be logging thousands of messages per second? If the problem was perfectly consistent then maybe I'd be worried about changing the timing, but if it is more or less random why would timing make a difference?

Post by **SJHardy** » Thu Mar 07, 2024 2:13 am

TomKerekes wrote: btw I don't see why you are concerned about small delays affecting the timing. Would a WNTS significantly affect things? Are you going to be logging thousands of messages per second? If the problem was perfectly consistent then maybe I'd be worried about changing the timing, but if it is more or less random why would timing make a difference?

Well, a WNTS takes 180us. I'm trying to find close to the last thing that happened before the crash, which means using the compiler function call/return hook and maybe other explicit log calls, so yeah, tens of thousands per second. I want to add minimum overhead. I guess a mutex would be OK, so I'll probably use that rather than a more direct approach. Thanks for the heads-up, I'd forgotten about that.

If mutex uses WNTS under the covers, then that won't work.

It's not really that random. The messages usually stop at or near the same point, but there are many instructions executed between each message, so the message itself is not helping me that much. I don't even know for sure which thread cops it.

Normally, it's not hard to find the cause of a crash: you just ask yourself what changed since it was working. Git etc. help to bracket the point where the bug was introduced. In this case, since it's porting the entire code base to a new platform, there is effectively no "history" - backing out the change means not using the Kogna.

Post by **TomKerekes** » Thu Mar 07, 2024 6:05 pm

Here is the mutex logic:

Code: Select all

// used to allow mutual exclusive access to a resource
// (waits until resource is available, then locks it)
// if the thread that locked it is no longer active,
// release the lock

// mutex (32 bit int) consists of:
//     high 16 bits thread number who locked it
//     low 16 bits number of times locked by same Thread 

volatile int ThreadWaiting = -1;

void MutexLock(int *mutex)
{
	int result;
	
	// wait to give others a chance if we don't already own it, someone is waiting, it isn't us, and the one waiting is still running
	while ((*mutex == 0 || (*mutex>>16) != CurrentThread) && ThreadWaiting != -1 && ThreadWaiting != CurrentThread)
	{
		if ((ThreadActive & (1 << ThreadWaiting)) == 0) ThreadWaiting = -1; // no longer active, so no longer waiting
	}

	while (result = TestAndSet(mutex, (CurrentThread<<16) + 1))  // Set Thread  and count to 1 if not in use
	{
		// mutex is not available
		//
		// check if the thread that locked it
		// is still executing
		// or if it was this thread
		
		if (((ThreadActive & (1 << (result>>16))) == 0))  // Thread that locked it no longer executing?
		{
			*mutex = (CurrentThread << 16) + 1; // yes take it (same encoding as the TestAndSet above)
			return;
		}
		else if ((result>>16) == CurrentThread) // was it already locked by the same Thread?
		{
			(*mutex)++; // yes, increment count
			return;
		}
		
		if (ThreadWaiting == -1) ThreadWaiting = CurrentThread;
	}
}

// unlocks a resource
void MutexUnlock(int *mutex)
{
	if ((*mutex & 0xFFFF) == 0) // already unlocked?
	{
		printf("Mutex Error\n");
		return;
	}


	if ((*mutex >> 16) == CurrentThread) // was it already locked by the same Thread?
	{
		// yes
		if ((*mutex & 0xFFFF) == 1) // count of 1?
		{
			if (CurrentThread == ThreadWaiting) ThreadWaiting = -1;
			*mutex = 0;  // yes, release
		}
		else
		{
			(*mutex)--; // no, decrement count
		}
	}
	else
	{
		printf("Mutex Error\n");
	}
}
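The 32-bit mutex word layout described in the comments above (owning thread in the high 16 bits, lock count in the low 16 bits) can be made explicit with a few helpers. These helper names are hypothetical and do not appear in the posted code; they only restate the encoding.

```c
#include <assert.h>

/* Build a mutex word: owning thread in the high 16 bits, lock count in the low 16. */
int mutex_pack(int thread, int count)
{
    assert(thread >= 0 && thread < 0x8000);   /* keep the sign bit clear */
    assert(count >= 0 && count <= 0xFFFF);
    return (thread << 16) + count;
}

/* Thread number that holds the lock. */
int mutex_owner(int m) { return m >> 16; }

/* How many times the owner has locked it; 0 means the mutex is free. */
int mutex_count(int m) { return m & 0xFFFF; }
```

Packing both fields into one word is what lets a single TestAndSet claim the mutex and record its owner in one step. A recursive lock by the same thread then only has to increment the low half.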

btw can you get your code to run and crash without hardware? Can you delete sections to determine where is the cause?

If this becomes very involved you might try using CCS to debug. There is a low cost Emulator (TMS320-XDS100-V3) for this DSP. That would allow you to set hardware breakpoints and watchpoints.

Post by **SJHardy** » Fri Mar 08, 2024 5:25 pm

That mutex implementation looks like it would work. Only very rarely would it spin until the end of its allotted thread time.

I'm already running with minimal hardware. We have a "dummy machine" mode that runs with the bare breakout board and not much else - probably 90% of development is done that way, and the rest is fixing real-world bugs - I've only crashed $10k of machines instead of $1M. I have been trying to delete/simplify code sections, but the whole system is very interdependent so it's hard to isolate things that way.

Anyway, I discovered some code that I wrote many years ago and forgot about, which does some of the logging that I want. I'm currently in the process of bringing it up to date so it will work on the Kogna (bigger gather buffer etc.)

Post by **TomKerekes** » Fri Mar 08, 2024 8:35 pm

Btw KFLOP/Kogna temporarily uses the beginning of the gather_buffer (< 128KB) during bootup so don't expect that to be preserved after re-boot.

Post by **SJHardy** » Fri Mar 08, 2024 11:53 pm

On the KFLOP, I used to have a reset pushbutton between JP6 pins 4 (reset#) and 8 (gnd), which would allow me to reboot without losing memory contents. The same trick on the Kogna does not seem to work.

I do notice on the USB console that a message "No Ethernet Link" is issued, but it otherwise seems to ignore the hardware reset.

I now have a program I can run after reset (and before running any of my code) which will read out selected gather buffer contents, such as my large execution trace buffer. But that won't work unless I can reset without power cycling. Any ideas?

Post by **SJHardy** » Fri Mar 08, 2024 11:58 pm

Oops, I just saw your most recent reply. I do in fact use the first 64k for a major control block. I guess I can make a shadow copy of that, or even move it to a different location. Most of what I'm interested in is in the top half of the gather buffer.
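One way to tell after a reset whether the trace in the top half actually survived is to keep a small self-checking header there. The sketch below assumes the buffer can be addressed as 32-bit words. The header layout, the magic value and the function names are all made up for illustration, and how the pointer maps onto the real gather buffer is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_MAGIC 0x54524143u   /* arbitrary marker value ("TRAC") */

/* Word offsets within the hypothetical trace header. */
enum { HDR_MAGIC, HDR_HEAD, HDR_TAIL, HDR_CHECK, HDR_WORDS };

/* The trace area starts halfway up the buffer, clear of the region at the
   beginning that is reused during bootup. */
uint32_t *trace_area(uint32_t *base, size_t total_words)
{
    assert(total_words / 2 >= HDR_WORDS);
    return base + total_words / 2;
}

/* Record the current head/tail together with a simple XOR check word. */
void trace_commit(uint32_t *hdr, uint32_t head, uint32_t tail)
{
    hdr[HDR_MAGIC] = TRACE_MAGIC;
    hdr[HDR_HEAD]  = head;
    hdr[HDR_TAIL]  = tail;
    hdr[HDR_CHECK] = TRACE_MAGIC ^ head ^ tail;
}

/* After reset: nonzero only if the header looks intact. */
int trace_valid(const uint32_t *hdr)
{
    return hdr[HDR_MAGIC] == TRACE_MAGIC &&
           hdr[HDR_CHECK] == (hdr[HDR_MAGIC] ^ hdr[HDR_HEAD] ^ hdr[HDR_TAIL]);
}
```

A readout program run right after reset would call `trace_valid` first and dump the records only when it passes. After a power cycle the check is expected to fail, unless the memory happens to come up holding a consistent header.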

Dynomotion Forum
