
CLOO: Memory Leaking Troubles
Posted Tuesday, 2 February, 2010 - 13:01 by CodyIrons
Hi guys, hope all is going well,
I'm running into a bit of an issue when using Cloo (0.5.1 and 0.6.0) and attempting to have a program call a kernel an obscene number of times over the course of a benchmark. I'm just using the vectorAdd kernel, and basically what I'm doing is generating a set of random inputs each iteration, then calling the kernel to compute. I've tried to separate the setup and execution of the kernel as much as possible, as I would like this to become fairly modular in any future projects I decide to do. But I just can't seem to pinpoint what could be causing this leak.
The leak seems to revolve around the creating and disposing of the ComputeBuffers (but if it's some of my code, please feel free to say so). The project is only two classes and I tried to document them as much as possible.
Class one is Program.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Cloo;

namespace OpenCLTesting
{
    class Program
    {
        static void Main(string[] args)
        {
            // First, collect all the compute devices on all detected platforms.
            // This may be a little overkill, but I was experimenting with ideas at the time.
            Dictionary<ComputeContext, List<ComputeDevice>> thesystem = new Dictionary<ComputeContext, List<ComputeDevice>>();
            List<ComputeDevice> allDevices;
            ComputeContext context;
            foreach (ComputePlatform cp in ComputePlatform.Platforms)
            {
                ComputeContextPropertyList propertyList = new ComputeContextPropertyList(cp);
                context = new ComputeContext(ComputeDeviceTypes.All, propertyList, null, IntPtr.Zero);
                allDevices = new List<ComputeDevice>();
                foreach (ComputeDevice cd in context.Devices)
                {
                    allDevices.Add(cd);
                }
                thesystem.Add(context, allDevices);
            }

            string vectorAddKernel = @"kernel void vectorAdd(global read_only float * a, global read_only float * b, global write_only float * c)
{
    // Vector element index
    int nIndex = get_global_id(0);
    c[nIndex] = a[nIndex] + b[nIndex];
}";
            int testSize = 1024;
            int testLengthInSeconds = 60;

            // We will be testing each device in each platform for performance.
            // We will print off the data so it can be collected and observed.
            foreach (KeyValuePair<ComputeContext, List<ComputeDevice>> platform in thesystem)
            {
                ComputeContext localContext = platform.Key;
                List<ComputeDevice> devices = platform.Value;
                foreach (ComputeDevice device in devices)
                {
                    // Testing each individual device
                    Console.WriteLine("DevicePlatform = {0}", device.Platform.Name);
                    Console.WriteLine("DeviceName = {0}", device.Name);
                    Console.WriteLine("DeviceCUnits = {0}", device.MaxComputeUnits);
                    Console.WriteLine("DeviceSpeed = {0}", device.MaxClockFrequency);
                    float result = testDevice(localContext, device, testLengthInSeconds, testSize, vectorAddKernel, "vectorAdd");
                    Console.WriteLine("DevicePerformance = {0}", result);
                    //float result1 = testYetAgain(localContext, device, testLengthInSeconds, testSize, vectorAddKernel, "vectorAdd");
                    //Console.WriteLine("DevicePerformance1 = {0}", result1);
                }
            }
            Console.ReadLine();
        }

        /// <summary>
        /// Here we attempt to separate the construction of the values needed
        /// by the kernel during execution time. This is being done in an
        /// attempt to make execution of the kernel slightly more abstracted.
        /// </summary>
        /// <param name="localContext">the compute context</param>
        /// <param name="device">the device we are running on</param>
        /// <param name="lengthOfTestSeconds">how long in seconds we would like the test to run</param>
        /// <param name="testSize">the number of objects the kernel will be processing in one batch 'these are just floats for now'</param>
        /// <param name="kernelSource">the kernel's source</param>
        /// <param name="kernelName">the name of the kernel in the source</param>
        /// <returns>a float value representing how well this device performed</returns>
        private static float testDevice(
            ComputeContext localContext,
            ComputeDevice device,
            int lengthOfTestSeconds,
            int testSize,
            string kernelSource,
            string kernelName)
        {
            float result = 0.0f;

            // Create our test kernel instance
            Test2Kernel t2k = new Test2Kernel(localContext, device, kernelSource, kernelName);

            // The number of inputs to the kernel
            int inputCount = 2;
            List<float[]> inputs = new List<float[]>();
            for (int i = 0; i < inputCount; i++)
            {
                float[] arrI = new float[testSize];
                inputs.Add(arrI);
            }

            // An array for our outputs
            int outputCount = 1;
            List<float[]> outputs = new List<float[]>();
            for (int i = 0; i < outputCount; i++)
            {
                float[] arrC = new float[testSize];
                outputs.Add(arrC);
            }

            // Just setting up the random number generator, some timing stuff, and the
            // arrays list that will store all of our communication with the kernel
            Random rand = new Random();
            DateTime start = DateTime.Now;
            TimeSpan testLength = new TimeSpan(0, 0, lengthOfTestSeconds);
            List<float[]> arrays = new List<float[]>();

            // Just looping through till time is up
            while ((DateTime.Now - start) < testLength)
            {
                // Clear our array
                arrays.Clear();

                // For the size of the test (currently set to 1024)
                for (int i = 0; i < testSize; i++)
                {
                    // For each input buffer we will generate a random double
                    for (int l_inputs = 0; l_inputs < inputCount; l_inputs++)
                    {
                        inputs[l_inputs][i] = (float)(rand.NextDouble() * 100);
                    }
                    // For each output buffer we will initialize to zero
                    for (int l_outputs = 0; l_outputs < outputCount; l_outputs++)
                    {
                        outputs[l_outputs][i] = 0.0f;
                    }
                }

                // Now loop through our inputs and add them to arrays
                for (int i = 0; i < inputCount; i++)
                {
                    arrays.Add(inputs[i]);
                }
                // Do the same with outputs
                for (int i = 0; i < outputCount; i++)
                {
                    arrays.Add(outputs[i]);
                }

                // Now perform our calculation by letting the kernel know how long
                // the test is, where the arrays are, the number of inputs and the
                // number of outputs
                t2k.performCalculation(testSize, ref arrays, inputCount, outputCount);

                // Just some debugging to let us know it is running
                //Console.WriteLine("{0}){1} + {2} = {3}", result, arrays[0][0], arrays[1][0], arrays[2][0]);
                result++;

                // Desperate attempt to get the memory leak to go away
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }

            // We are calculating performance as the size of the test (so 1024)
            // multiplied by the number of times it got through the loop,
            // divided by the length of the test.
            // This should possibly give us something like calculations per second.
            return (float)testSize * result / lengthOfTestSeconds;
        }

        /// <summary>
        /// Flattened version of Test2Kernel to rule out anything weird happening
        /// in our Test2Kernel instance based test.
        /// </summary>
        /// <param name="localContext"></param>
        /// <param name="device"></param>
        /// <param name="lengthOfTestSeconds"></param>
        /// <param name="testSize"></param>
        /// <param name="kernelSource"></param>
        /// <param name="kernelName"></param>
        /// <returns></returns>
        public static float testYetAgain(ComputeContext localContext, ComputeDevice device, int lengthOfTestSeconds, int testSize, string kernelSource, string kernelName)
        {
            float result = 0.0f;
            ComputeProgram m_computeProgram;
            ComputeKernel m_computeKernel;
            ComputeBuffer<float> tempBuffer;
            ComputeCommandQueue m_queue;
            ComputeBuffer<float>[] m_buffers;

            m_computeProgram = new ComputeProgram(localContext, new string[] { kernelSource });
            m_computeProgram.Build(null, null, null, IntPtr.Zero);
            m_computeKernel = m_computeProgram.CreateKernel(kernelName);
            m_queue = new ComputeCommandQueue(localContext, device, ComputeCommandQueueFlags.None);
            m_buffers = new ComputeBuffer<float>[3];

            // The number of values we want to run through the kernel each pass
            //int count = 10;

            // The number of inputs to the kernel
            int inputCount = 2;
            List<float[]> inputs = new List<float[]>();
            for (int i = 0; i < inputCount; i++)
            {
                float[] arrI = new float[testSize];
                inputs.Add(arrI);
            }

            // An array for our outputs
            int outputCount = 1;
            List<float[]> outputs = new List<float[]>();
            for (int i = 0; i < outputCount; i++)
            {
                float[] arrC = new float[testSize];
                outputs.Add(arrC);
            }

            Random rand = new Random();
            DateTime start = DateTime.Now;
            TimeSpan testLength = new TimeSpan(0, 0, lengthOfTestSeconds);
            List<float[]> arrays = new List<float[]>();

            while ((DateTime.Now - start) < testLength)
            {
                arrays.Clear();
                for (int i = 0; i < testSize; i++)
                {
                    for (int l_inputs = 0; l_inputs < inputCount; l_inputs++)
                    {
                        inputs[l_inputs][i] = (float)(rand.NextDouble() * 100);
                    }
                    for (int l_outputs = 0; l_outputs < outputCount; l_outputs++)
                    {
                        outputs[l_outputs][i] = 0.0f;
                    }
                }
                for (int i = 0; i < inputCount; i++)
                {
                    arrays.Add(inputs[i]);
                }
                for (int i = 0; i < outputCount; i++)
                {
                    arrays.Add(outputs[i]);
                }

                //t2k.performCalculation(testSize, ref arrays, inputCount, outputCount);
                for (int i = 0; i < inputCount; i++)
                {
                    m_buffers[i] = new ComputeBuffer<float>(localContext, ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, arrays[i]);
                }
                for (int i = inputCount; i < inputCount + outputCount; i++)
                {
                    m_buffers[i] = new ComputeBuffer<float>(localContext, ComputeMemoryFlags.WriteOnly, arrays[i].Length);
                }
                for (int i = 0; i < inputCount + outputCount; i++)
                {
                    m_computeKernel.SetMemoryArgument(i, m_buffers[i]);
                }

                m_queue.Execute(m_computeKernel, null, new long[] { testSize }, null, null);

                for (int i = inputCount; i < inputCount + outputCount; i++)
                {
                    arrays[i] = m_queue.Read(m_buffers[i], true, 0, testSize, null);
                }
                for (int i = 0; i < inputCount + outputCount; i++)
                {
                    m_buffers[i].Dispose();
                }

                //Console.WriteLine("{0})", result);
                result++;
                //GC.Collect();
                //GC.WaitForPendingFinalizers();
            }
            return (float)testSize * result / lengthOfTestSeconds;
        }
    }
}
Class two is Test2Kernel.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Cloo;

namespace OpenCLTesting
{
    class Test2Kernel
    {
        ComputeContext m_context;
        ComputeDevice m_device;
        ComputeProgram m_computeProgram;
        ComputeKernel m_computeKernel;
        ComputeBuffer<float> tempBuffer;
        ComputeCommandQueue m_queue;
        //ComputeBuffer<float>[] m_buffers;
        List<ComputeBuffer<float>> m_buffers;
        string m_kernelSource;
        string m_kernelName;

        /// <summary>
        /// Constructor prepares a device given the kernel information
        /// </summary>
        /// <param name="context"></param>
        /// <param name="device"></param>
        /// <param name="kernelSource"></param>
        /// <param name="kernelName"></param>
        public Test2Kernel(ComputeContext context, ComputeDevice device, String kernelSource, String kernelName)
        {
            m_context = context;
            m_device = device;
            m_kernelSource = kernelSource;
            m_kernelName = kernelName;
            initialize();
        }

        /// <summary>
        /// Pull out the instantiation of everything.
        /// Originally thought to 're-initialize' everything if certain conditions
        /// arise during execution.
        /// </summary>
        private void initialize()
        {
            m_computeProgram = new ComputeProgram(m_context, new string[] { m_kernelSource });
            m_computeProgram.Build(null, null, null, IntPtr.Zero);
            m_computeKernel = m_computeProgram.CreateKernel(m_kernelName);
            m_queue = new ComputeCommandQueue(m_context, m_device, ComputeCommandQueueFlags.None);
            //m_buffers = new ComputeBuffer<float>[3];
            m_buffers = new List<ComputeBuffer<float>>();
        }

        /// <summary>
        /// This performs the actual construction of our compute buffers
        /// and then performs the calculation. The only Cloo items that
        /// are being reset each time are the ComputeBuffers.
        ///
        /// Have tried several ways of storing the buffers 'as array'
        /// 'as a list' but each way we still end up with a rather nasty
        /// memory leak.
        /// </summary>
        /// <param name="count"></param>
        /// <param name="arrays"></param>
        /// <param name="inputCount"></param>
        /// <param name="outputCount"></param>
        public void performCalculation(
            int count,
            ref List<float[]> arrays,
            int inputCount,
            int outputCount)
        {
            // Add our 'input' compute buffers to the compute buffer list
            for (int i = 0; i < inputCount; i++)
            {
                tempBuffer = new ComputeBuffer<float>(m_context, ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, arrays[i]);
                m_buffers.Add(tempBuffer);
                //m_buffers[i] = new ComputeBuffer<float>(m_context, ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, arrays[i]);
            }

            // Add our 'output' compute buffers to the main compute buffer list
            for (int i = inputCount; i < inputCount + outputCount; i++)
            {
                tempBuffer = new ComputeBuffer<float>(m_context, ComputeMemoryFlags.WriteOnly, arrays[i].Length);
                m_buffers.Add(tempBuffer);
                //m_buffers[i] = new ComputeBuffer<float>(m_context, ComputeMemoryFlags.WriteOnly, arrays[i].Length);
            }

            // Map our compute buffers to the kernel
            for (int i = 0; i < inputCount + outputCount; i++)
            {
                m_computeKernel.SetMemoryArgument(i, m_buffers[i]);
            }

            // Execute our kernel
            m_queue.Execute(m_computeKernel, null, new long[] { count }, null, null);

            // Read from each output buffer and save it into our array reference
            for (int i = inputCount; i < inputCount + outputCount; i++)
            {
                arrays[i] = m_queue.Read(m_buffers[i], true, 0, count, null);
            }

            // Dispose of each buffer
            for (int i = 0; i < inputCount + outputCount; i++)
            {
                m_buffers[i].Dispose();
            }
            m_buffers.Clear();
            //tempBuffer.Dispose();
        }
    }
}
Basically, when you run this it will iterate through your platforms (which is always just one as far as I know), collecting contexts and the devices for each context. It will then perform the benchmark on each device. The benchmark is set to take 60 seconds, and if you have Task Manager open you can see all your memory slowly creep away. I've also left in some commented code so you can see different things I've tried.
I just can't seem to find what could be leaking or a way to make it stop.
I guess a pertinent question would be: is there a way to clear the memory used by a device that I have not noticed yet?
Thanks for any help.
-Cody


Comments
Re: CLOO: Memory Leaking Troubles
So I've been messing with this tonight after work, and I went ahead and refined it a lot and cleaned up some of the unimportant code. I was able to move the buffer creation into the init function, and also the setMemArg into init. I also had to change my ComputeMemoryFlags.CopyHostPointer to ComputeMemoryFlags.UseHostPointer to get this thing working how I think it should, but in debugging I seem to have stumbled into something weird where I have no idea what is happening.
I currently have the program set up to run for 1 second and I have the buffer size set to 10. After each kernel call I print off index 0 for the inputs and outputs. The first few lines that are printed to the console appear to be correct, but after 10 to 15 prints it starts to repeat the result.
I've attached the project this time instead of pasting the code, as that may be more useful. Any ideas on what I'm doing incorrectly are appreciated.
Re: CLOO: Memory Leaking Troubles
So it seems that in the latest project I submitted, if you set the "testSize" to something really large (100000) and the time to 60 seconds, it performs exactly as desired with no memory leak.
Actually, I did a little more digging just now, and any value below 6755 will eventually return bad results (it starts returning zeros if left to run for a long time). Any value of 6756 and over returns proper results.
Re: CLOO: Memory Leaking Troubles
Other observations (sorry for using this as a blog of sorts):
I switched from my Intel processor based netbook (just an Atom processor) to my desktop (AMD quad core with 3 graphics adapters), and the only device that performed the benchmark properly was the cpu. Each of the video cards always returns the same value for every result, and there do not seem to be any exceptions being thrown (I'm not even attempting to catch any; has Cloo abstracted away the ability to check error codes after kernel compilation?).
Another interesting result, though, is that no more than one core of the cpu is ever utilized. I double checked this on my work pc as well (dual quad core Xeon) and the kernel only ever attempts to execute on one core. For some reason I recall another test of mine working across all cores.
Is anyone able to utilize Cloo to completely saturate a device? I would be interested in testing more complex kernels across my diverse set of platforms. I would love to use Cloo to do some AI/swarm based learning, but if I can't get the basic tests to completely utilize a device, I really am apprehensive about putting the time into converting my PSO/pathfinding algorithms to OpenCL using Cloo.
Re: CLOO: Memory Leaking Troubles
Hi,
sorry for being a bit unresponsive. My real life is a bit challenging at the moment so I'm not able to reply promptly all the time.
Some observations:
1) I did some testing using your original code and I'm pretty sure there are no leaks in the sense of OpenCL objects. Everything that's created is disposed of properly, either manually or automatically. I ran the program for more than 30 minutes with hundreds of thousands of objects getting created and destroyed, and the counts always matched. Having that out of the way, I can now focus on other things like GC handles and the like. No tests have been run on your other code yet. Hopefully, I'll be back with the results in a couple of hours.
2) Cloo introduces no restrictions over raw OpenCL. That is its main design goal, so if you're having problems then you've possibly hit a bug or a missing bit. In that case I'll do my best to fix it.
Re: CLOO: Memory Leaking Troubles
I'm also using Cloo to do AI and I have no problems utilizing both of my cores in my dual core AMD CPU to 95-100%. And this is for a very complex kernel with lots of function calls that can take 20+ minutes. The memory usage is stable since I only allocate memory once and it's a console program and exits at the end of the run.
I ran the code from the OP and I do appear to get an ever increasing usage of memory and there isn't anything wrong with the code that I can see. More debugging will be needed.......
Re: CLOO: Memory Leaking Troubles
I also had to change my ComputeMemoryFlags.CopyHostPointer to ComputeMemoryFlags.UseHostPointer to get this thing working how i think it should, but in debugging i seem to have stumbled into something weird where i have no idea what is happening.
UseHostPointer means OpenCL will operate in place (i.e. using the array you've provided). That said, the array should be pinned, otherwise you may experience unexpected behaviour and random access violations!
Re: CLOO: Memory Leaking Troubles
Interesting, I had not heard of such a thing in C# before (I had to google it to see what that was). But that would explain why, at some random point in the program, the values would all default to a bad value. So could something like GCHandle pinnedRawData = GCHandle.Alloc(foo, GCHandleType.Pinned); then be passed to the compute buffers, or is it just that at some point after setting up my List<float[]> foo; I need to pin it so that the GC doesn't move it around? I'll experiment with this in a little bit, still at work.
Re: CLOO: Memory Leaking Troubles
You need to pin the arrays, i.e. all the float[]s, because those are used as the buffers' content, not the list. Basically you need to prevent the GC from moving the data while OpenCL accesses it.
It's actually what I'm fighting with most of the time while putting together Cloo :)
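The pinning advice above can be sketched in plain .NET, with no Cloo calls involved. This is only a sketch under assumptions: PinningSketch and WithPinnedArrays are hypothetical names, and the callback stands in for whatever work hands the arrays to OpenCL. It pins every float[] in the list (not the list itself), exposes the fixed addresses, and always frees the handles afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of Cloo.
static class PinningSketch
{
    // Pins each float[] for the duration of `body`, then releases the pins.
    public static void WithPinnedArrays(List<float[]> arrays, Action<IntPtr[]> body)
    {
        var handles = new List<GCHandle>(arrays.Count);
        try
        {
            var pointers = new IntPtr[arrays.Count];
            for (int i = 0; i < arrays.Count; i++)
            {
                // A pinned handle keeps the GC from moving the array.
                GCHandle handle = GCHandle.Alloc(arrays[i], GCHandleType.Pinned);
                handles.Add(handle);
                pointers[i] = handle.AddrOfPinnedObject();
            }
            // While this runs, the addresses in `pointers` stay valid.
            body(pointers);
        }
        finally
        {
            // Handles that are never freed keep the arrays pinned forever.
            foreach (GCHandle handle in handles)
            {
                handle.Free();
            }
        }
    }
}
```

The pins have to outlive every native access to the memory, so with UseHostPointer the callback would need to cover the whole lifetime of the buffers that wrap these arrays.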
Re: CLOO: Memory Leaking Troubles
Okay, I've determined the cause of the bug using the code in the original post of this thread. It's a weird bug, that's for sure. In the original code, where the main while loop is:
If you instead change it to a for loop, like:
Then you will get no more memory leaks. The task manager shows that the memory use is now absolutely stable! Why this is so, I have absolutely no idea! In my OpenCL code, I've always used the for loop method instead of the while loop method so I haven't encountered this bug before.
It would certainly be interesting if someone could figure out the reason that a while loop causes a memory leak and a for loop doesn't.
Re: CLOO: Memory Leaking Troubles
After a little more tinkering, I've now determined that it isn't the while loop itself that is the cause of the bug, but rather the (DateTime.Now - start) part that is causing the memory leak. The exact reason for this is still unknown.
The (DateTime.Now - start) part will cause a memory leak if used either in
or
I think the lesson here is to avoid using DateTime and TimeSpan in the comparators. However, the following works fine:
Re: CLOO: Memory Leaking Troubles
DateTime is either using a method that allocates memory (either the subtraction operator or the comparison operator) or it is cast into some interface that causes boxing. Maybe .NET Reflector can provide a hint regarding the cause.
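One way to check the allocation hypothesis directly is to count the managed bytes allocated while the loop condition is evaluated. The sketch below is an assumption-laden illustration: AllocationProbe and its method names are hypothetical, and GC.GetAllocatedBytesForCurrentThread only exists on newer runtimes (.NET Framework 4.8 or .NET Core), so it would not have been available to the posters at the time.

```csharp
using System;

// Hypothetical helper for testing whether a piece of code allocates managed memory.
static class AllocationProbe
{
    // Returns the managed bytes the current thread allocated while `action` ran.
    public static long AllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Evaluates the loop condition from the original post `checks` times and
    // returns how often it was still true.
    public static int EvaluateCondition(int checks, TimeSpan testLength)
    {
        DateTime start = DateTime.Now;
        int stillRunning = 0;
        for (int i = 0; i < checks; i++)
        {
            if ((DateTime.Now - start) < testLength)
            {
                stillRunning++;
            }
        }
        return stillRunning;
    }
}
```

A call such as AllocationProbe.AllocatedBytes(() => AllocationProbe.EvaluateCondition(1000000, TimeSpan.FromSeconds(60))) reports how much the comparison allocates on a given runtime. A result near zero would point away from DateTime as the source of the growth.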
Re: CLOO: Memory Leaking Troubles
So going back to the OP. Were there possibly any other changes made to the code? I just commented out the while loop and replaced it with a for loop:
//while ((temp = DateTime.Now - start) < testLength)
for(int f = 0; f < 10000; f++)
And I still see the memory increase at a rapid rate.
Re: CLOO: Memory Leaking Troubles
Maybe I spoke too soon........ Ok, back to debugging.
Re: CLOO: Memory Leaking Troubles
I have just tried running the VectorAdd sample in Cloo 0.5.1 and looping that a million times, and I am also experiencing a memory leak there, so this appears to be a bug in Cloo or OpenCL. Looks like this won't be an easy bug to figure out.
Re: CLOO: Memory Leaking Troubles
A quick profiling run should identify the cause of the leak. I might be able to do that tomorrow.
Re: CLOO: Memory Leaking Troubles
So I'm seeing an interesting scenario. I think it might be beneficial to get everyone's computer setup to help debug this, as I'm seeing different results across all my platforms.
Just going to enumerate my PCs so I can reference them easily.
PC1: Vista 64bit AMD cpu ATI gpus
PC2: Windows 7pro 32bit Intel cpu
PC3: Windows XPpro 32bit Nvidia gpu
PC4: Windows HPC edition 64bit Intel cpu (but at work and don't have access right now)
I've attached the latest version of my little wannabe benchmark that uses the pinned memory. On PC1 the cpu device passes the tests while the gpus fail. During this test memory only increases when moving from device to device, as my Test2Kernel doesn't implement IDisposable and I don't clean up anything at the end of the test.
On PC2 there is no memory leak and the cpu calculates everything correctly.
On PC3 I can observe the memory leak and the gpu does not calculate anything properly.
Haven't checked PC4, but I imagine it will work as it's a similar OS and cpu only.
PC3 originally had an error when trying to run, and it was the testSize being set too big. Decreasing it from 100000 to 10000 allowed it to execute. If you want to see the output of addition tests that did not pass, uncomment line 138 of Program.cs and whenever there is an error in calculation it will tell you about it.
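The per-device growth mentioned above comes from a class that owns Cloo objects but never releases them. A minimal sketch of one way to close that gap follows, assuming the owned objects expose Dispose() the way ComputeBuffer does in the original code. ResourceOwner and NamedResource are hypothetical stand-ins so that the example runs without Cloo.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a class like Test2Kernel that owns disposable resources.
sealed class ResourceOwner : IDisposable
{
    private readonly List<IDisposable> owned = new List<IDisposable>();
    private bool disposed;

    // Registers a resource so that Dispose() releases it later.
    public T Track<T>(T resource) where T : IDisposable
    {
        if (disposed) throw new ObjectDisposedException("ResourceOwner");
        owned.Add(resource);
        return resource;
    }

    public void Dispose()
    {
        if (disposed) return;
        disposed = true;
        // Release in reverse creation order: queue, then kernel, then program.
        for (int i = owned.Count - 1; i >= 0; i--)
        {
            owned[i].Dispose();
        }
        owned.Clear();
    }
}

// Hypothetical resource that records when it is disposed, for demonstration only.
sealed class NamedResource : IDisposable
{
    private readonly string name;
    private readonly List<string> log;

    public NamedResource(string name, List<string> log)
    {
        this.name = name;
        this.log = log;
    }

    public void Dispose()
    {
        log.Add(name);
    }
}
```

With this shape, the benchmark could wrap each device's test in a using block so that nothing from one device is still alive when the next one starts.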
Re: CLOO: Memory Leaking Troubles
Cody, what did you change from the code in your original post to make the memory leak go away? This is baffling me.
Re: CLOO: Memory Leaking Troubles
There are a few differences between the OP and the previous attachment, but they revolve around moving the creation of the memory buffers from the loop to init in Test2Kernel.cs. Doing this also relies on pinning the memory so the reference given to the compute buffers doesn't change. However, I have yet to see any successful calculations with this version on a gpu.
Re: CLOO: Memory Leaking Troubles
Hmmm, something just isn't working here. I've taken the VectorAdd sample from Cloo 0.6 and just tried looping it 100000 times as follows:
This results in an ever increasing usage of memory, which it shouldn't be doing or I would run out of memory very quickly. Can someone try running the code attached and see why there appears to be a memory leak for what should be a straight forward demo? I'm beginning to doubt how I ever got my AI kernels to work since I use the same for loop technique with them.
Re: CLOO: Memory Leaking Troubles
Yeah, I'm stumped on that one. I've tried pinning the arrays and using UseHostPointer and I can't get them to stop leaking. I'm amazed that mine doesn't leak now... considering I based it off these examples. I think I need to learn to use one of these fancy profilers.
Re: CLOO: Memory Leaking Troubles
Test subject:
Maybe we should start with a simple example. Given that the following code also leaks it's a good candidate for the job.
Testing method:
Putting different pieces of the code inside a loop will do.
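A minimal sketch of that testing method, assuming nothing beyond the base class library: LeakProbe, Growth and Measure are hypothetical names. The idea is to run one piece of code in a loop and compare managed heap size with the process's private bytes before and after. Growth in private bytes without matching managed growth would suggest the memory is being held outside the .NET heap, for example by a driver.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness for isolating which call grows memory when looped.
static class LeakProbe
{
    public struct Growth
    {
        public long ManagedBytes;  // change in GC heap size
        public long PrivateBytes;  // change in process private memory
    }

    public static Growth Measure(int iterations, Action step)
    {
        step(); // warm-up so one-time initialisation is not counted

        long managedBefore = GC.GetTotalMemory(true);
        long privateBefore = ReadPrivateBytes();

        for (int i = 0; i < iterations; i++)
        {
            step();
        }

        // Give finalizers a chance to release native handles before measuring.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        return new Growth
        {
            ManagedBytes = GC.GetTotalMemory(true) - managedBefore,
            PrivateBytes = ReadPrivateBytes() - privateBefore
        };
    }

    private static long ReadPrivateBytes()
    {
        using (Process self = Process.GetCurrentProcess())
        {
            return self.PrivateMemorySize64;
        }
    }
}
```

Each candidate (building the program, creating a kernel, executing, reading) would be passed in as its own step delegate, one at a time, so the growth can be attributed to a single call.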
Testing shows there's nothing leaking until program.Build(...). Fiddling around with its internals shows that the cause of the leak is CL10.BuildProgram. Unless the DllImport is wrong (you never know), we've hit a bug inside OpenCL.
This doesn't mean the rest of the commands are ok. I'm just not there yet.
Edit:
program.CreateKernel() leaks even though every created kernel is properly GC collected.
queue.Execute() leaks.
queue.Read() looks like it is leaking for small buffers only. For huge ones the memory (obviously) fluctuates wildly but I cannot tell whether it is really leaking.
Re: CLOO: Memory Leaking Troubles
May I suggest a bit of out of the box thinking? The web is littered with threads like this one:
http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=122161
I'll have to dig into C/C++.
Re: CLOO: Memory Leaking Troubles
That thread hadn't been updated recently on the AMD board so I went ahead and resurrected it.
Re: CLOO: Memory Leaking Troubles
Hmmm, a memory leak like this should be a show stopper for OpenCL in the ATI Stream SDK. Can anyone confirm if this memory leak also occurs on Nvidia implementations?
@Nythrix, what tool are you using to detect the memory leaks?
Re: CLOO: Memory Leaking Troubles
I'm just using the task manager. I'm not sure we can use OpenCL profilers with managed apps. ATI's throwing some errors at me, and nVidia's didn't install at all.
After playing with some native code (see attachment) I can safely say that part of the leakage is caused by the drivers (program.Build() on both ATI and nVidia, program.CreateKernel() on nVidia only).
However, leaks at queue.Execute() and queue.Read() are not caused by the drivers. I get no errors, no exceptions and GCHandles are freed properly yet something is wrong with Cloo.
I need some tea. And lots of imagination...
Re: CLOO: Memory Leaking Troubles
(offtopic: fixed file upload permissions to allow c/c++ files).
Re: CLOO: Memory Leaking Troubles
Thanks. I was surprised it didn't let me :)
Re: CLOO: Memory Leaking Troubles
Just a little anecdote, but in my AI OpenCL code using Cloo, I get random crashes every so often during the middle of a run that can take minutes. It is weird because I can't reproduce it each time even though my code is totally deterministic and I use my own random number generators. Most of the time, the simulation will run fine and end correctly, but every so often, the code will crash and say it tried accessing memory it can't access or I get an OpenCL error message that doesn't point to anything in my code. I don't believe I'm trying to access memory out of bounds in my code so I'm wondering if this could be related to this memory leak problem. I'm going to be modifying my OpenCL kernel so I can run it in a C++ compiler to see if it is my code or something wrong with OpenCL\Cloo. It will take a little while to do...
Oh, and after looking closer at the memory usage of my AI code, it does exhibit the memory leak problem when the kernel is executed multiple times, but leaks much less than the VectorAdd demo I posted. It goes from about 20 megs of ram usage to about 100 megs after 10000 executions of my AI kernel.
Re: CLOO: Memory Leaking Troubles
Yeah, when I reported that the last program I posted did not leak, that turned out not to be true. Certain values for the size of the test don't seem to leak, but really large values and really small values do leak across all my platforms.
Re: CLOO: Memory Leaking Troubles
I've been able to trace the cause of the problem back to the flat CL bindings. And it's not only Cloo's bindings; OpenTK's suffer from the same problem. My gut tells me we're somehow messing up the marshaling and that is the root of all evil.
This is the C definition of a typical leaking function:
which translates to:
Is anyone able to see through this?
I've tried different signatures, especially replacing pointers with arrays, to no effect.