Just found a good tutorial on how to setup a Gauss filter kernel here:
OpenGL Bloom Tutorial
The interesting part is that he shows a way on how to generate the offset values and he also mentions a trick that I use for a long time. He reduces the filter kernel size by utilizing the hardware linear filtering. So he can go down from 5 to 3 taps. I usually use bilinear filtering to go down from 9 to 4 taps or 25 to 16 taps (with non-separable filter kernels) ... you got the idea.
Eric Haines just reminded me of the fact that this is also described in ShaderX2 - Tips and Tricks on page 451. You can find the -now free- book at
BTW: Eric Haines contacted all the authors of this book to get permission to make it "open source". I would like to thank him for this.
Check out his blog at