Comments: Diary of a Graphics Programmer (Wolfgang Engel)

Wolfgang Engel (2021-06-12):
Ha ha, that would be nice!

Anonymous (2021-04-27):
You might want to fix up your barycentric derivative computation:
https://github.com/ConfettiFX/The-Forge/blob/master/Examples_3/Visibility_Buffer/src/Shaders/D3D12/visibilityBuffer_shade.frag#L45

I saw the sample break the texturing many times with clipped triangles that intersected the camera's near plane or went behind the camera.

The reason is that once the post-transform W coordinate gets close to zero, you get huge projections with coordinates really far away, or worse, you get a negative W and the triangle can no longer project linearly (it breaks in half and folds over on itself).

The possible fixes are:
- clip the triangles against the near plane into [0,2] output triangles yourself, or
- if any of the W coordinates is not sufficiently positive, subgroup-vote and fall back to an expensive function that computes the derivatives from the 3D coordinates the long way.

It is also probably not the best for precision to interpolate everything from the difference between the projected provoking/first vertex and the pixel coordinate, mostly because of the W-related precision issues.

Furthermore, if you are interpolating more than one attribute, it would make sense to precompute the barycentrics from the derivatives once, rather than applying the derivatives to the attributes to get a derivative of the attribute every single time.

Santokes (2021-01-08):
Coming in 2021, Wolfgang starts a band!
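The near-plane hazard described in the barycentric-derivative comment above can be sketched in a few lines. This is an illustrative Python sketch, not The-Forge's actual shader code: it computes perspective-correct barycentrics from post-transform (clip-space) vertices, refuses to interpolate when any W is not sufficiently positive (the fold-over case), and computes the barycentrics once so they can be reused across attributes. The function names and the epsilon threshold are assumptions for illustration.

```python
# Hedged sketch: perspective-correct barycentrics from clip-space vertices,
# with the "W not positive enough" guard argued for above. Illustrative only.

def screen_barycentrics(proj, x, y):
    """Affine (screen-space) barycentrics of pixel (x, y) w.r.t. a projected triangle."""
    (x0, y0), (x1, y1), (x2, y2) = proj
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    l1 = ((x - x0) * (y2 - y0) - (x2 - x0) * (y - y0)) / det
    l2 = ((x1 - x0) * (y - y0) - (x - x0) * (y1 - y0)) / det
    return (1.0 - l1 - l2, l1, l2)

def perspective_barycentrics(clip, x, y, w_epsilon=1e-4):
    """clip = three post-transform (cx, cy, cw) vertices.
    Returns perspective-correct barycentrics at NDC pixel (x, y), or None
    when any W is too close to zero or negative -- the case where the
    projection folds over and a clipping or 3D fallback path is required."""
    if any(cw <= w_epsilon for (_, _, cw) in clip):
        return None  # triangle crosses w = 0: cannot interpolate linearly
    proj = [(cx / cw, cy / cw) for (cx, cy, cw) in clip]
    affine = screen_barycentrics(proj, x, y)
    # Perspective correction: weight each affine barycentric by 1/w, renormalize.
    corrected = [l / cw for l, (_, _, cw) in zip(affine, clip)]
    s = sum(corrected)
    return tuple(c / s for c in corrected)

def interpolate(clip, attrs, x, y):
    """Interpolate one per-vertex attribute; the barycentrics are computed
    once and reused per attribute, as the comment above suggests."""
    bary = perspective_barycentrics(clip, x, y)
    if bary is None:
        return None
    return sum(l * a for l, a in zip(bary, attrs))
```

Evaluating at a projected vertex recovers that vertex's attribute exactly, and a triangle with a vertex behind the camera (negative W) takes the `None` fallback path instead of producing a folded projection.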
Anonymous (2018-09-12):
I'd like to join MJP's question.

By "DXR performance", is the performance of Microsoft's (reference?) implementation meant? Or something else?

I did compare it to Microsoft's implementation, and it does indeed seem quite a bit faster. But to be frank, I'm more interested in vendor DXR implementations than in Microsoft's.

Wolfgang Engel (2018-09-09):
Yep, that is a bit confusing. As you say, DXR currently only runs on GeForce Volta/RTX; on everything else it is very slow.
I hinted at the fact that even if you compare DXR on an RTX GPU with our handwritten implementation, ours should look favorable.
Don't take my word for it; download the source code and try it ... you can also try it on many platforms :-)

Let me know how it goes.

MJP (2018-09-07):
"When using DXR, we expect Hybrid Ray Traced Shadows to run on all hardware apart from the GeForce RTX with subpar performance"

I'm a bit confused by this statement. At the moment, Nvidia RTX/Volta is the only hardware that actually supports DXR. Are you suggesting that the other vendors are going to release DXR implementations that are sub-optimal compared to what you have in Confetti?
Skyler (2018-06-05):
What are your thoughts about Apple's approach to ray tracing in the form of a Metal Performance Shader?

https://developer.apple.com/documentation/metalperformanceshaders/metal_for_accelerating_ray_tracing

Anonymous (2018-04-14):
Yeah, these are all fair points I hadn't thought about before, but I agree with them. I think the best solution would be exposing the hardware to low-level developers while keeping the more black-boxy options for lazy people like me, who would really like to have a standard, tested, fast BVH acceleration structure with all the necessary shader/ray sorting, fast rebuilding, and so on.

Ruijin Wu (2018-04-13):
Completely agree on the triangle filtering. I think the triangle filtering pass is very valuable. My question is mostly about the vis+shading pass after the triangle filter.
It seems to me that Forward+ should give you the same bandwidth savings while avoiding the manual derivatives, vertex fetches, etc. in the shading pass.

Wolfgang Engel (2018-04-12):
We do a Z pre-pass, but we also do a triangle pre-pass (filtering and then writing into the Visibility Buffer) at the same time. Having one layer of triangles and one layer of depth is optimal when it comes to Forward+ lighting ... it solves the problem of Forward+: having a lot of triangles.
Ruijin Wu (2018-04-11):
Great article on an interesting technique!

Just some thoughts. If the major motivation is to save the memory bandwidth of the G-Buffer, why not just use a Z pre-pass + forward rendering + clustered lighting after the triangle filtering? The Z pre-pass removes pixel overdraw and has the same cost as the visibility buffer generation, while the forward pass gives you hardware-supported derivatives, interpolation, vertex fetch, and reduced vertex processing. The downsides seem to be extra geometry killed by Z and worse light-list cache performance. But your triangle filtering should already remove a lot of triangles, and the shading pass should be pixel-shader bound. Have you done a comparison against this kind of forward path?

Self Resonating Hamster (2018-04-06):
Great post Wolfgang. I think I'm broadly in agreement. Personally I'd much rather see the hardware blocks that accelerate ray tracing exposed from DirectX and other similar APIs, and then have some wrapper APIs on top that you can opt into to help you manage the complexity (if you want that!).

On the hardware side I think there are two main things that would help. One is hardware to accelerate the BVH/triangle intersection. The other is something to help manage the large unsorted "bag" of shader threads that ray tracing tends to produce. In my head I see this as some sort of hardware sorter: your intersection hardware spits out "any hit" requests to invoke shaders, which get buffered up somewhere in hardware and periodically spat out as an ordered list, ready to be easily dispatched by the hardware.
The same thing happens again when you invoke the closest-hit shader.

This sort of hardware block for fast coalescing of micro-thread dispatches is not just useful for tracing rays; it's something that comes up a lot in GPGPU, so having something there that was open and designed to flexibly manage micro dispatches would be a big win all around. Ideally you want to be able to dispatch multiple threads with this if you so desire, not just a single thread "per ray". It would also be nice if this didn't have to feed straight into a dispatch if you didn't want it to. Seeing as ray tracing suffers from so many divergence issues once you're not dealing with primary or shadow rays, it would be good if the application had some hope of doing further processing/sorting/black magic on its secondary rays to reduce this. Then you might even be able to start using LDS and cross-lane fun to accelerate things (if it's not hidden from you by the API...).

Anonymous (2018-04-06):
Those are all reasonable concerns. The opaqueness of the acceleration structures is a little worrying. Hopefully more details will come with time.

That said, we all work with a black box with regard to rasterization and the shader pipeline on modern GPUs, and I don't see that as a negative. Sure, there are some problems, but they've abstracted away a lot of details that we don't need to worry about anymore, and I see that as a net benefit. I feel that DXR can possibly get us to that point, assuming the acceleration structures and traversal are something we can all agree on.

The question I have is: why do we need new hardware for this?
What specifically needs to be accelerated that couldn't already be done with existing compute hardware?

Ideally this ray tracing framework would have been implemented entirely as compute libraries that we could opt into, and if your application doesn't want to use any ray tracing, those compute resources could be used for other tasks. Then again, maybe I just want someone to do my work for me.

Unknown (2018-04-06):
Thanks for your opinion! When I read it I thought: wow, I couldn't have formulated my concerns any better! With all the hype around ray tracing, we really don't need a black-box API that hides the acceleration structures from the user. I wrote a path tracer in CUDA (for medical visualization), and what I really missed was a real cross-platform compute API on the level of CUDA (bindless texture objects, device pointers in structs, ...), not a ray tracing API.

Wolfgang Engel (2018-04-03):
Thank you! Fixed!

masterden (2018-04-02):
http://www.cs.unc.edu/~olano/papers/2dh-tri/2dh-tri.pdf - this link does not work.

Wolfgang Engel (2018-03-31):
Awesome!
I will check this out. Thanks for the good feedback.

MJP (2018-03-31):
This is a great write-up on this technique, thank you for sharing!

For decals, you can also bin the decals (in tiles, clusters, or some other data structure) and apply them during your shading pass the same way you would apply lights. Doing this requires either atlasing your decal textures or having bindless texture access, but that's already a requirement for the visibility buffer approach. On the positive side, it keeps you from having to write/blend your decal properties into a G-Buffer, since the blending is done in registers. You're also not restricted to fixed-function blending equations, so you can easily do neat things like height-based blending. DOOM and Call of Duty: IW have already shipped with similar approaches, and we do something similar in our engine. I also implemented it in this code sample: https://github.com/TheRealMJP/DeferredTexturing

masterden (2018-02-28):
How many hours does one class last?

Robin Green (2017-11-07):
For the last 20 years we've been living in a Rec.709 world where RGB "just works". You take a picture on your cellphone, move it to your PC, cast it to your TV screen, and it just works.
That's because all of those devices have been Rec.709 (OK, flat panels use BT.1886 to correct for gamma, but that's a small detail). Welcome to the bad old days of 1990s color displays and printing, the days before the HDTV standard. Remember picking your "color intent" before printing a document? They're back!

The problem is that the SDR signal is mapped according to the Rec.709 standard, where the maximum intensity of your (1,1,1) pixel is 100 nits. But almost no TV or monitor you use today follows that convention; on average they use 300 nits for a (1,1,1) pixel, and further "vividness" controls rotate the RGB colors out of the Rec.709 colorspace into something more saturated. You are used to over-sugared content.

The Rec.2100 standard was designed to convert SDR signals to Rec.709 values, so you have to add back the sugar you are used to. Scale your SDR content to 300 nits and rotate your Rec.709 colors into the Rec.2100 colorspace before encoding to PQ for output; that's the basic transform that will solve the "too dim" or "too unsaturated" problems.

After that you can start to tame overbright pixels or generate Wide Color Gamut values using WCG content, filmic curves, and all the bells and whistles.

Wolfgang Engel (2015-10-22):
Sorry for the late reply. You can find articles on "Visibility Buffer" based rendering from Intel and Christoph Schied. You might want to check them out.
We (Confetti) will open-source a solution very soon.

Артем (2015-08-18):
Excuse me, could you tell me what alternative approach you would recommend instead of a G-Buffer?

MikeR (2015-08-11):
Great article. Really interesting how you had to work around FP exponent special cases.

Stefan Dyulgerov (2015-07-30):
Images are missing.
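Robin Green's SDR-to-HDR10 recipe in the 2017-11-07 comment above (scale SDR white to roughly 300 nits, rotate Rec.709 primaries into the Rec.2020 primaries used by Rec.2100, then PQ-encode) can be sketched as follows. The matrix and constants are the standard Rec.709-to-Rec.2020 conversion and SMPTE ST 2084 (PQ) values; the function names and the 300-nit default are illustrative assumptions, not code from the post.

```python
# Hedged sketch of the SDR -> HDR10 transform described above. Illustrative only.

# Rec.709 -> Rec.2020 primaries conversion (linear light), standard 3x3 matrix.
REC709_TO_REC2020 = [
    [0.6274, 0.3293, 0.0433],
    [0.0691, 0.9195, 0.0114],
    [0.0164, 0.0880, 0.8956],
]

def rec709_to_rec2020(rgb):
    """Rotate a linear Rec.709 RGB triple into Rec.2020 primaries."""
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in REC709_TO_REC2020)

def pq_oetf(nits):
    """SMPTE ST 2084 inverse EOTF: absolute luminance in nits -> PQ code value in [0, 1]."""
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    y = max(nits, 0.0) / 10000.0  # PQ is defined over 0..10,000 nits
    yp = y ** m1
    return ((c1 + c2 * yp) / (1 + c3 * yp)) ** m2

def sdr_to_hdr10(linear_709_rgb, sdr_white_nits=300.0):
    """Scale linear Rec.709 content so (1,1,1) maps to ~300 nits (the SDR white
    level the comment suggests), rotate into Rec.2020, then PQ-encode per channel."""
    rgb2020 = rec709_to_rec2020(linear_709_rgb)
    return tuple(pq_oetf(c * sdr_white_nits) for c in rgb2020)
```

Note that white stays white under the primaries rotation (each matrix row sums to 1), and 100 nits lands near PQ code value 0.51, which is why naively mapping SDR white to 100 nits looks dim on an HDR display.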