nythrix's picture

Cloo - Compute Language, Object Oriented

The first testing release is out! Grab a copy and test your OpenCL installation.

Please report any findings!

P.S: The support for images is a work in progress so any related API method will punch you with a NotImplementedException. You don't have to report those.


Comments

the Fiddler's picture

I hope to have the image-related wrappers fixed by tomorrow.

nythrix's picture

Thanks. No rush, though. I can get busy elsewhere in the code. As for you, I guess the priority ATM is OpenTK 1.0.

carga's picture

Hello,

I would like to use the kernel from the NBody demo, which has this signature:

kernel void nbody_sim(
    global float4* pos,
    global float4* vel,
 
    int numBodies,
    float deltaTime,
    float epsSqr,
 
    local float4* localPos)

Which C# type should be mapped to float4*? Is it possible to use float[4, SIZE]? What type should I pass to ComputeBuffer?

Is there a C# struct in Cloo that is designed to be mapped to vector types?

Best regards,
Anton.

nythrix's picture

You can use any struct that has exactly 4 float fields:

struct Vector4f
{
    float x;
    float y;
    float z;
    float w;
 
    // methods here ....
}

If you don't have such a structure in your project you can use OpenTK.Vector4 instead.
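A minimal sketch of how one might check that such a struct really matches the 16-byte layout of float4 before handing an array of it to a buffer. It uses only the standard System.Runtime.InteropServices namespace; the class name Float4LayoutCheck and the element count are made up for illustration and are not part of Cloo or OpenTK.

```csharp
using System;
using System.Runtime.InteropServices;

// Sequential layout keeps the four floats in declaration order,
// so each element lines up with one float4 on the device.
[StructLayout(LayoutKind.Sequential)]
struct Vector4f
{
    public float X, Y, Z, W;
}

// Hypothetical helper class, for illustration only.
static class Float4LayoutCheck
{
    // Unmanaged size of one element; a float4 is 4 * 4 = 16 bytes.
    public static int ElementSize()
    {
        return Marshal.SizeOf(typeof(Vector4f));
    }

    static void Main()
    {
        int count = 1024; // hypothetical number of bodies
        Console.WriteLine("element size: " + ElementSize());
        Console.WriteLine("buffer bytes: " + (long)count * ElementSize());
    }
}
```

If the reported element size is anything other than 16, the struct has extra fields or padding and should not be mapped to float4*.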

carga's picture

Thank you for the OpenTK.Vector4 idea!

Currently I do the following:

Vector4[] pos = new Vector4[count];
Vector4[] vel = new Vector4[count];
Vector4[] buf = new Vector4[count];
 
ComputeBuffer<Vector4> a = new ComputeBuffer<Vector4>(context, MemFlags.MemReadWrite | MemFlags.MemCopyHostPtr, pos);
ComputeBuffer<Vector4> b = new ComputeBuffer<Vector4>(context, MemFlags.MemReadWrite | MemFlags.MemCopyHostPtr, vel); 
ComputeBuffer<Vector4> c = new ComputeBuffer<Vector4>(context, MemFlags.MemReadWrite | MemFlags.MemCopyHostPtr, buf);
 
kernel.SetMemoryArg(0, a);
kernel.SetMemoryArg(1, b);
kernel.SetValueArg(2, count);
kernel.SetValueArg(3, 0.1f);
kernel.SetValueArg(4, 1e-6f);
 
kernel.SetMemoryArg(5, c);

When executing the last call (the argument with index 5), I receive a ComputeException with ErrorCode.InvalidArgValue.

If I do not set that argument at all, I receive a ComputeException with ErrorCode.InvalidKernelArgs.

How should I initialize the kernel argument declared as local float4* localPos?

Thank you in advance,
Anton.

nythrix's picture

I've never tried setting a local argument. Chapter 3.3 of the OpenCL specs:

Local Memory: A memory region local to a work-group. This memory region can be used to allocate variables that are shared by all work-items in that work-group. It may be implemented as dedicated regions of memory on the OpenCL device. Alternatively, the local memory region may be mapped onto sections of the global memory.

Table 3.1 states that the host cannot access (read or write) local memory. It can only allocate it. Try removing the MemFlags.MemReadWrite flag when you create c and see what happens.

Edit: You can also create buffers without specifying an array:
ComputeBuffer<float> c = new ComputeBuffer<float>( context, flags, count );

carga's picture

I've tried
ComputeBuffer<Vector4> c = new ComputeBuffer<Vector4>(context, MemFlags.MemUseHostPtr, buf);
but I still get ErrorCode.InvalidArgValue.

Then I replaced "local" with "global" in the kernel signature and it works now.

Thank you very much!
Anton.

viewon01's picture

Any news about the "local" problem?

Thx

nythrix's picture

This is what I've found in the OpenCL specs:

If the argument is declared with the __local qualifier, the entry arg_value must be null

However, the current implementation will probably crash if you call kernel.Set*Arg( index, null ). I'm working on a fix. I will also post a howto on setting kernel arguments. It's a dark area where I get lost too.
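For reference, a rough sketch of what the spec text means at the level of the raw C API, bypassing Cloo entirely: for a __local argument, clSetKernelArg takes the number of bytes to reserve as arg_size and a null pointer as arg_value. The library name "OpenCL" in the DllImport, the class name LocalArgSketch and the helper names are assumptions made for illustration. Main only prints the byte count, because calling into OpenCL needs a real kernel handle.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper class, not part of Cloo.
static class LocalArgSketch
{
    // Raw OpenCL entry point. The library name "OpenCL" is an assumption
    // (OpenCL.dll on Windows, libOpenCL.so on Linux).
    [DllImport("OpenCL")]
    static extern int clSetKernelArg(IntPtr kernel, uint argIndex, UIntPtr argSize, IntPtr argValue);

    // One float4 is four 32-bit floats.
    const int Float4Bytes = 4 * sizeof(float);

    // Bytes of local memory needed for the given number of float4 elements.
    public static long LocalBytes(int elements)
    {
        return (long)elements * Float4Bytes;
    }

    // For a __local argument: arg_size is the byte count, arg_value is null.
    // Returns the OpenCL error code (0 means CL_SUCCESS).
    public static int SetLocalFloat4Arg(IntPtr kernel, uint argIndex, int elements)
    {
        return clSetKernelArg(kernel, argIndex, new UIntPtr((ulong)LocalBytes(elements)), IntPtr.Zero);
    }

    static void Main()
    {
        // Hypothetical work-group size of 64 float4 elements.
        Console.WriteLine("local bytes: " + LocalBytes(64));
    }
}
```

For the nbody_sim kernel above this would apply to argument index 5, with the element count typically matching the work-group size. No ComputeBuffer is involved, since the host never reads or writes local memory.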

carga's picture

Thank you very much for the new release (0.3.1) and for the new test (KernelArgsTester).

In my environment (Intel CPU), compilation of the kernel fails with the following error messages:

------------------| Start Kernel args test |------------------
C:\Users\xxx\AppData\Local\Temp\OCL9414.tmp.cl(4): error: kernel pointer
          arguments must point to addrSpace global, local, or constant
      kernel void k03(          image3d_t img ) {}
                  ^
 
C:\Users\xxx\AppData\Local\Temp\OCL9414.tmp.cl(14): error: a parameter
          cannot be allocated in a named address space
      kernel void k11( global   image3d_t img ) {}
                       ^
 
C:\Users\xxx\AppData\Local\Temp\OCL9414.tmp.cl(14): error: kernel pointer
          arguments must point to addrSpace global, local, or constant
      kernel void k11( global   image3d_t img ) {}
                  ^
 
3 errors detected in the compilation of "C:\Users\xxx\AppData\Local\Temp\OCL9414.tmp.cl".
    image3d_t im
-------------------| End Kernel args test |-------------------

That's ok, but just for your info...

Are you going to implement some kind of automatic .NET-to-OpenCL kernel translation? There is the Brahma project, http://brahma.ananthonline.net/, which took some steps toward a similar goal, but it is completely stalled now. =((( It was an attempt to translate a general computational LINQ expression into its parallel equivalent and execute it on the GPU using a deprecated set of DirectX GPGPU libraries.

I do not feel comfortable enough writing parallel expressions in LINQ, but LINQ is still better than OpenCL. =) I mean, studying a new, deeply graphics-oriented C-like dialect is not my first dream.

Have a fast code!
Anton.
http://kyta.spb.ru

nythrix's picture

As I pointed out before, defining/setting kernel args is a bit obscure. It would have been great if Khronos had included a summary table with clSetKernelArg.

With KernelArgsTester I set out to try every possible combination of global/constant/local/none with simple type/image/sampler/buffer. Then I commented out the ones that don't compile. I'll recheck this example when I get home.

The LINQ to OpenCL conversion is quite an interesting idea. It is definitely worth considering. However it requires three things:
1) Me learning enough LINQ to tell whether this is possible at all. Probably yes, but you didn't hear me promise anything.
2) Cloo (and possibly the whole Xloo/OpenTK 2.0) will have to target C# 3.0. Which hasn't been discussed yet.
3) Enough time for me to actually implement the thing. Given my ongoing exams season, that's not to happen until February. Or even spring.

carga's picture
nythrix wrote:

With KernelArgsTester I set out to try every possible combination of global/constant/local/none with simple type/image/sampler/buffer. Then I commented out the ones that don't compile. I'll recheck this example when I get home.

No problem. Chances are the problem is on my side: you provide kernels for the NVIDIA implementation and I am trying to compile them with ATI's 2.0-beta4 driver.

nythrix wrote:

The LINQ to OpenCL conversion is quite an interesting idea. It is definitely worth considering. However it requires three things:
1) Me learning enough LINQ to tell whether this is possible at all. Probably yes, but you didn't hear me promise anything.

I am not advertising LINQ (I just mentioned the Brahma project as a reference): a) MS has already announced PLINQ; b) it is hard to write general computations in this syntax. We all like conditions and loops and all the other procedural benefits a C-like language gives us. =)

Hmm!.. I wonder about starting with an Expression-to-kernel conversion. Expression trees (System.Linq.Expressions) are a very general way to represent a program tree with all its conditional branching and loops. LINQ is no more than a short way to write some complicated expression tree...

But at the end of the day, my dream is a stand-alone .NET class written completely in C#, [probably] without any external dependencies and [probably] completely covered by ordinary unit tests. This class performs just one CPU-intensive task and it _IS_ able to do the job, but it is too slow. Then I dream of this class being able to analyze its own IL automatically and emit the corresponding OpenCL kernel. After that (thanks to Cloo) it takes just a few seconds to get a 10-100 times speedup with OpenCL-on-CPU or even a 100-1000 times speedup with OpenCL-on-GPU! Does anybody have a robust IL-to-OpenCL translator? =DDD

nythrix wrote:

2) Cloo (and possibly the whole Xloo/OpenTK 2.0) will have to target C# 3.0. Which hasn't been discussed yet.

It's a serious point. =|

nythrix wrote:

3) Enough time for me to actually implement the thing. Given my ongoing exams season, that's not to happen until February. Or even spring.

An even more serious point. But you are ready to show them your excellence, aren't you? ;-)

Have a fast code!
Anton.
http://kyta.spb.ru