Old School GPGPU is Frustrating

So I burned the evening trying to get a working GPGPU implementation for my desktop/workstation and laptop.

I started off with PyOpenCL, but wound up blocked on that due to libcffi failing with some missing headers, so I switched back to playing with raw GPGPU in GLSL. The actual setup/GL part went well, but things started going pear-shaped with my first spike test, a simple dot product of a 128-float input against a 128x128 matrix. I can sample the samplerBuffers for both the input and the matrix and everything works fine, but whenever I enable the for-loop to accumulate the result I get what looks like overflow or similar errors (basically the first product renders, but nothing else does). More puzzling, even with the accumulator loop enabled and the input set to all 1.0 values, the loop produces the same result. That is, with weights being a 128x128 identity matrix:

accumulator = 0.;
for (int i = 0; i < 128; i++) {
    accumulator += (
        texelFetch(weights, x_offset + i).r
        //*
        //texelFetch(inputs, i).r
    );
}

If I uncomment the multiply and the texelFetch for inputs (which is an array of 1.0 values), the output becomes 1 for the first output but 0 for the rest, while if I leave those two lines commented out I get the expected 1.0 values for all outputs. That's rather frustrating, mostly because it makes precisely no sense unless it's a very low-level effect (i.e. something like a floating-point accuracy issue).
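For comparison, here is a rough CPU-side sketch in plain Python of what the loop ought to produce with the multiply enabled. It is only a reference for checking the shader's output, not the shader itself. The function names are made up for illustration, and the row-major layout (x_offset = x * 128) is an assumption, since the snippet above doesn't show how x_offset is computed.

```python
# CPU reference for the shader's accumulator loop (an illustrative sketch).
N = 128  # input length and matrix dimension, as in the post


def make_identity_flat(n):
    """Row-major n*n identity matrix as a flat list, like a buffer texture."""
    return [1.0 if r == c else 0.0 for r in range(n) for c in range(n)]


def reference_outputs(weights, inputs, n):
    """Mirror the GLSL loop: one accumulator per output element."""
    outputs = []
    for x in range(n):
        # Assumed layout: output x reads n consecutive texels starting here.
        x_offset = x * n
        accumulator = 0.0
        for i in range(n):
            accumulator += weights[x_offset + i] * inputs[i]
        outputs.append(accumulator)
    return outputs


weights = make_identity_flat(N)
inputs = [1.0] * N  # the all-ones input described above
outputs = reference_outputs(weights, inputs, N)
print(outputs[:4], all(v == 1.0 for v in outputs))
```

With an identity matrix and an all-ones input, every term is exactly 0.0 or 1.0, so each output should come out as exactly 1.0, with or without the multiply.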

Anyway, I suppose I'll go back to it some other day. I was hoping to do some actual playing with networks, but it seems that won't happen with this approach any time soon.
