Sunday, September 21, 2008

Shader Workflow - Why Shader Generators are Bad

[quote]As far as I can tell from this discussion, no one has really proposed an alternative to shader permutations, merely they've been proposing ways of managing those permutations.[/quote]

If you define shader permutations as having lots of small differences but using the same code, then you have to live with the fact that whatever is sent to the hardware is a full-blown shader, even if you have exactly the same skinning code in every other shader.
So the end result is always the same ... whatever you do on the level above that.
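
To make this concrete, here is a minimal Python sketch of what any permutation system ends up handing to the hardware. The feature names and code strings are hypothetical and chosen only for illustration. Every variant is a complete shader, and the shared skinning code is repeated in each one.

```python
from itertools import product

# Hypothetical shared code and optional features; names are illustrative only.
SKINNING = "float4 skin(float4 p) { /* same skinning code everywhere */ return p; }"
FEATURES = {
    "NORMAL_MAP": "// sample normal map",
    "SPECULAR": "// add specular term",
    "FOG": "// apply fog",
}


def build_permutations(features):
    """Return one full shader source string per on/off combination of features."""
    names = sorted(features)
    shaders = {}
    for flags in product((False, True), repeat=len(names)):
        enabled = [name for name, on in zip(names, flags) if on]
        body = "\n".join(features[name] for name in enabled)
        # Each variant is a complete shader: shared code plus enabled features.
        shaders[tuple(enabled)] = SKINNING + "\n" + body
    return shaders


shaders = build_permutations(FEATURES)
print(len(shaders))  # 2 ** 3 = 8 full shaders
```

How the variants are managed one level up (hand-written files, #ifdefs, a generator) does not change that count or the duplication.
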
What I describe is a practical approach to handle shaders with a high amount of material variety and a good workflow.
Shaders are some of the most expensive assets in terms of production value and time spent by the programming team. They need to be the most highly optimized pieces of code we have, because it is much harder to squeeze performance out of a GPU than out of a CPU.
Shader generators or material editors (or whatever you call them) are not an appropriate way to generate or handle shaders: they are hard to maintain, do not offer enough material variety, and are not very efficient, because it is hard to hand-optimize code that is generated on the fly.
This is why developers do not use them and do not want to use them. It is possible that they play a role in indie or non-profit development, because those teams are constrained in money and time and do not have to compete in the AAA sector.
In general, the basic mistake made by people who think that ueber-shaders, material editors or shader generators make sense is that they do not understand how to program a graphics card. They assume it is similar to programming a CPU and therefore think they can generate code for those cards.
It would make more sense to generate code on the fly for CPUs (which also happens in the graphics card drivers) and in other places (real-time assemblers) than for GPUs, because GPUs do not have anything close to linear performance behaviour. The difference between a performance hotspot and a point where you did something wrong can be 1:1000 in time (following a presentation from Matthias Wloka). You hand-optimize shaders to hit those hotspots, and the way you do it is to analyze the results provided by PIX and other tools to find out where the performance hotspot of the shader is.

21 comments:

  1. Why can't shader generators generate fast shaders?

    Compilers can generate fast code. The shader compilers especially work hard to optimize what you give them, and usually do a good job.

    Sure sometimes you have to go in and really dial a shader in performance-wise. It would be foolish to lock yourself into only using your ubermaterial or shader generator or what have you. But if you can write a system to deal with 90% of the work for you, isn't that worthwhile?

    It seems to me that shader generators are like scripting languages. You can get most of what you need done with them. The rest has to be done at a lower level, but that's ok, because you can then spend most of your time focusing on that since the other system deals with the rest.

  2. I think we're tiptoeing around the fact that shaders can't make vectored function calls. Cg's so-called interfaces are a big fake, in the sense that the function calls are JIT flattened.

    The lack of vectored execution leads to shader proliferation leads to unpleasant bandaids like shader generators and ubershaders.

    I think we should be mentioning to TPTB that "exciting" GPU shader features and new stages are orders of magnitude less exciting than solving this issue.

  3. This comment has been removed by the author.

  4. Hmm, your post sounds widely generalizing and assumptive. But it is an interesting topic.

    I would argue that shader permutation generation systems are really quite common for AAA next-gen titles. Just look at Unreal and the vast number of licensees using that engine together with the graph-based shaders and generated permutations of them (AFAIK).

    And regarding material variation / artist workflows and shader generators, having the ability to easily create domain-specific shading, lighting & compositing with rich high-level tools (whether graph-based or not) and freely both experiment and optimize content without having to be spoon-fed by a rendering programmer is something I know many value.

    In other words, having efficient & adaptable workflows is generally more important than pure performance, esp. as of this and the coming generations.


    Oh, and there are people proposing solutions to the "shader permutation hell" problem that pretty much everyone is facing, whether you create every single shader by hand or generate permutations.

    DX11 is a partial step in the right direction with static subroutines, but in my mind the full solution: dynamic subroutines (i.e. shader function pointers), will require more HW & API changes but is a logical & very plausible/practical extension.

  5. Hello,
    There're some interesting posts on shader generator systems around these days.

    See the following post:
    Graphical shader systems are bad - http://realtimecollisiondetection.net/blog/?p=73

    Unreal does provide such a system, but it is better suited to the film industry or other settings where performance is not a priority. (Yes, there may be some artists who are also talented at shader programming, but that is not the common case :-)

    Isn't it enough to have a simple system that combines manually written code fragments, as described here - http://www.talula.demon.co.uk/hlsl_fragments/hlsl_fragments.html

  6. Um, Wolfgang, I hate to point this out, but übershaders and graphical material systems (both of which I presume fall into your "shader generators" category as they generate shader permutations) are provably used by several AAA teams. When you then claim that "This is why developers do not use them" in reference to these systems -- which is trivially a false statement -- that makes your whole post rather vacuous.

  7. Having a well set-up shader workflow the way I describe it does not mean that the artists can't tune shaders. They can tune a lot. In Table-Tennis we had 200 sliders per character :-) ... with very well tailored shaders in the pipeline you end up with lots of sliders for the artists.
    Obviously the skin shader has different tuning values than the cloth shader etc.. because cloth uses for example Oren-Nayar or Minnaert lighting.

    Shader permutation hell ... never heard of that. This sounds like a strong lack of organization. My first question would be: did you guys have more than 20 different materials? I think the number 20 does not sound like hell, and it was good for all projects I have worked on so far.

    Ok Christer; it depends on where you start counting AAA ... a very narrow view would be more than 4 million copies :-) ... that would exclude then most titles I know with licensed engine technology.
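
The per-material tuning values mentioned in this comment can be illustrated with a small Python sketch. It uses one common formulation of Minnaert shading, (N.L)^k * (N.V)^(k-1), which is not necessarily the variant used in any shipped game. The slider name `minnaert_k` and its value are hypothetical.

```python
def minnaert(n_dot_l, n_dot_v, k):
    """One common form of Minnaert shading: (N.L)^k * (N.V)^(k-1)."""
    n_dot_l = max(n_dot_l, 0.0)   # light behind the surface contributes nothing
    n_dot_v = max(n_dot_v, 1e-4)  # avoid raising 0 to a negative exponent
    return (n_dot_l ** k) * (n_dot_v ** (k - 1.0))


# Hypothetical artist-facing slider for a cloth material.
CLOTH_SLIDERS = {"minnaert_k": 1.5}

value = minnaert(0.5, 0.3, CLOTH_SLIDERS["minnaert_k"])
```

With k = 1 this form reduces to plain Lambert (N.L), so a single slider moves the material between a standard diffuse look and a darker, velvet-like falloff.
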

  8. ... but seriously all games I know do not use shader generators. I know that Unreal 3 has a material editor which does not work out well because Artists can't optimize shaders and they tend to do redundant stuff with it.

  9. btw. Christer I saw now your post at

    http://realtimecollisiondetection.net/blog/?p=73

    It seems like we are not far apart here.

  10. I actually like this paragraph: "You rarely hear people say bad things about these graphical shader systems, but next time you’re at GDC and have the opportunity to meet up with a bunch of developers who’ve used a graphical shader system, ask them how much work they spent on cleaning up shaders at the end of the project. The stories I’ve heard haven’t been pretty." This is my experience. I didn't know that you had already written about it. So I will refer to your post next time.

  11. IMO, one of the reasons we have the combinatoric problem with shaders is that the language designers decided to leave out the linker stage. The only way to compose shader functions into the final result is to run the compiler front end over and over again on the same code fragments and produce binary shaders at the other end.

    If there were a way to precompile shader segments and compose them at "link time", a lot less time would be spent in the compiler front end. Instead we could compose fragments in intermediate representation (e.g. PS3.0) which then gets passed to the driver back end for the normal translation and peephole optimization stage.

    Is there any good reason why the shader compilers work contrary to most other high level languages, other than the fact that they use a lot of inlining? I just get the feeling that someone (*cough*NVidia*cough*) took a shortcut early in the dev. process and we're all paying for it years down the line.
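
The cost argument in this comment can be sketched in Python. The `FrontEnd` class is a hypothetical stand-in for the expensive compiler front end, and the fragment names are made up; no real shader compiler exposes this interface. The sketch only counts how often the front end runs with and without a "link" step.

```python
from itertools import product


class FrontEnd:
    """Hypothetical stand-in for the expensive compiler front end."""

    def __init__(self):
        self.runs = 0

    def compile(self, fragment):
        self.runs += 1
        return f"IR<{fragment}>"


def build_without_linker(slots, front_end):
    """Recompile every fragment for every permutation."""
    return [
        tuple(front_end.compile(fragment) for fragment in combo)
        for combo in product(*slots)
    ]


def build_with_linker(slots, front_end):
    """Compile each fragment once, then compose the cached IR at 'link time'."""
    cache = {f: front_end.compile(f) for slot in slots for f in slot}
    return [tuple(cache[f] for f in combo) for combo in product(*slots)]


# Hypothetical fragment choices: one per slot goes into each final shader.
SLOTS = [["skinned", "rigid"], ["lambert", "phong", "minnaert"], ["fog", "no_fog"]]

naive, linked = FrontEnd(), FrontEnd()
a = build_without_linker(SLOTS, naive)
b = build_with_linker(SLOTS, linked)
print(len(a), naive.runs, linked.runs)  # 12 36 7
```

Both builds produce the same 12 shaders, but the front end runs 36 times in the first case and 7 times in the second. The sketch ignores whatever cross-fragment optimization a whole-program compile might gain.
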

  12. You missed the bit where I mentioned we opted for an übershader-approach for God of War 3, after considering all the alternatives. And contrary to your statements I believe we're typically considered an AAA team and I like to think we do understand how to program graphics cards!

  13. >>> I know that Unreal 3 has a material editor which does not work out well because Artists can't optimize shaders and they tend to do redundant stuff with it. <<<

    It worked out really well for Epic last time I checked Gears of War's sales. And some of their licensees, e.g. 2k Boston/Australia. I'd certainly consider Bioshock a AAA title.

    What I can agree with is that such systems have their problems as I well know (our team is using such a system). Optimizing shaders is more difficult. Seeing some of the shaders that content creators generate will make you want to cry.

    I just think it's misleading to make such a blanket statement. Different tools make sense for different teams and for some teams a graphical shader editor makes sense -- even AAA teams.

  14. >>>
    You missed the bit where I mentioned we opted for an übershader-approach for God of War 3, after considering all the alternatives. And contrary to your statements I believe we're typically considered an AAA team and I like to think we do understand how to program graphics cards.
    <<<
    Yep my joke was not good. Let's say I owe you at least a beer next time we bump into each other :-) ... then I will be curious to hear how your ueber-shader approach went through (I think I can count three projects where I was involved with one)

  15. I definitely agree with the sentiment - the results of tools like Mental Mill are enough to make a shader programmer have a small fit.

    However, I do think ubershaders can be excluded from this, simply because they're written by a coder anyway, and the combinations are finite - you can profile all the various options you allow. Eventually of course, the combinations may get too unwieldy and you'll be forced to either stop adding options, or split the shader into more than one ubershader with a tighter focus. I think multiple categorised ubershaders are a good middle ground, since they allow you to include/exclude shader code, change semantic bindings and sizes etc. depending on requirements without as much duplication as the separate shader approach, but with a programmer-controlled 'break point' where you split things off for the sake of keeping a lid on the complexity & permutations.
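
A rough Python sketch of the arithmetic behind splitting one ubershader into several focused ones. It assumes every option is an independent on/off switch and that each category only needs its own options; the category names and option counts are hypothetical.

```python
def permutation_count(num_options):
    """Number of variants for independent boolean options."""
    return 2 ** num_options


# Hypothetical split of 10 options into three categorised ubershaders.
CATEGORIES = {"character": 4, "environment": 3, "effects": 3}

single = permutation_count(sum(CATEGORIES.values()))
split = sum(permutation_count(n) for n in CATEGORIES.values())
print(single, split)  # 1024 32
```

Under these assumptions the number of variants to profile drops from 1024 to 32. Real option sets are rarely fully independent, so actual counts will differ.
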

  16. This comment has been removed by the author.

  17. To take a totally different perspective:

    Shader Generators are currently a bad idea because little of the final look has anything at all to do with the writing of the shader but rather the artwork and parameters used to feed the shader. A good artist can wrangle something good out of almost any shading system.

    True, excellent engineers can give artists more capability, speed or simplicity, but 90%+ of the time actually spent making something look good comes from the tweaking of the parameter space rather than the crafting of the behavior.

    To speak to someone's earlier point, who is your audience for such a tool?

    The engineer? Why take away power from them when the language tools they have are appropriate to the detail they want to work with.

    The artist? They don't deal well with creating generalized behavior. They work very well using generalized behavior created for them.

    The technical artist? The rare technical artist that is both a good artist and a good coder would be the perfect candidate, but these people are very hard to find. In addition, once they're a good coder they can make do with what's there. They are also rarely the majority of the art staff on a project.

    The art teams I've seen on AAA products are usually much larger than the number of people authoring shaders. Wolfgang's estimate of 20 shaders rings very true.

    All this said, once we have so much power in hardware that it doesn't matter anymore (look at film), artists can make whatever they want in, say, XSI, and it'll "just go". But that's not for at least a few years.

  18. A graph-based shader tool is as good as the guy who wrote it, plain and simple...

    There have been many attempts, but not many graph-based shader tools exist or are being used effectively. I can agree that some graph-based tools are clunky, generate ugly and badly performing code, and make it harder for the developer to take full control of the output. This doesn't have to be the case!

    I've been developing a graph-based tool for 8 years now called ShaderWorks, which was pulled in by Activision a long time ago. I've been beating my head against a wall from day one. I knew this type of tool was very, very powerful, but was on the fence about how good a code generator and shader prototyping tool it could be.

    Only with version 4.0 did I jump completely to the "graph-based" side of the fence.

    The main things holding me back were that the code being generated was inherently horrid to look at, and that the template code which defined the graph/block/node and shader code was just too annoying to deal with. Even though not many have this figured out, I can honestly say it is possible to fix.

    In most generators, there tends to be a ton of variable indirection and redundancies which make the generated code ugly ( if not done right ) but regardless these things are simple for the compiler to optimize away.

    Another concern was whether or not to generate nested code ( all in one shader function ) from the graph blocks or to generate per-block functions. Either way seems to generate comparable performance, but I chose function-based as it keeps things very readable and clean and it lets the function inputs/outputs rename the variable names so everything ties in better with the templates.

    A well done code generator can actually do a “perfect” job at packing / unpacking interpolator registers and making that part SO SO much easier to manage and change. So with those things covered, what is left that could be slower compared with hand written shaders? Not much since the code written in the graph templates are written by us, the developer, the same people writing the hand coded shaders, so any per shader optimization can easily be done in template, afterwards manually in your very clean shader output…
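
As an illustration of the interpolator packing mentioned above, here is a minimal Python sketch. It is not ShaderWorks' algorithm, which is not described in this thread; it is a simple first-fit-decreasing packer that places whole varyings into 4-component registers. The varying names are hypothetical.

```python
def pack_interpolators(varyings, width=4):
    """Pack varyings (name -> component count) into registers of `width` floats.

    Returns a list of registers; each register is a list of
    (name, first_component, component_count) tuples.
    """
    registers = []  # each entry: {"used": int, "slots": [...]}
    # Largest first, then by name so the result is deterministic.
    for name, size in sorted(varyings.items(), key=lambda kv: (-kv[1], kv[0])):
        if not 1 <= size <= width:
            raise ValueError(f"{name}: size must be between 1 and {width}")
        for reg in registers:
            if reg["used"] + size <= width:
                reg["slots"].append((name, reg["used"], size))
                reg["used"] += size
                break
        else:
            # No existing register has room: open a new one.
            registers.append({"used": size, "slots": [(name, 0, size)]})
    return [reg["slots"] for reg in registers]


packed = pack_interpolators({"uv": 2, "normal": 3, "fog": 1, "lightmap_uv": 2})
for index, slots in enumerate(packed):
    print(index, slots)
```

Here four varyings fit into two registers instead of four. A generator that owns this step can redo the packing whenever a block is added or removed, which is the maintenance benefit the comment describes.
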

  19. (continues) - sorry, lots to say :)

    Some argue these applications are not flexible enough for many types of techniques? Well again, if you have a good code generator, the output is as flexible as your material graph / template system makes it. You can go complex with your system, or have whole shaders handled mostly within few blocks / templates.

    With performance dealt with, a graph-based shader tool can generate you MUCH better looking, organized and CONSISTENT shader code than what most coders churn out if given the time, and MUCH more understandable than any uber-shader since the option filtering is done in the code generation pass, not the pre-processor pass ( mostly ).

    Alright now on to the template language. Coding HLSL is one thing, having to deal with horrible template code is a whole new can of worms. Yes this could be the case ( was for earlier version of our tool ) but that doesn’t have to be how things work. A C/C++ like interpreter can make writing template code as easy as writing little bits of C code to manipulate shader string output and handle input/output ports, among many other useful utilities. I can assure you that having a flexible, error reporting, template language is possible and fun to use. It's like having VC++ built into our tool.

    A graph-based shader tool ( with static tweaks ) can allow a good shader permutation system to generate and manage far fewer shaders and make it easier to add/remove features, which in an if/def style ubershader would get out of hand VERY quickly. Remember, a graph-based shader tool is inherently a dynamic uber-shader generator.
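
The idea of filtering options in the code generation pass, combined with per-block functions, can be sketched as follows. This is a toy Python generator with hypothetical block templates and option names, not the tool described in these comments. Disabled blocks are never emitted, so the output contains no preprocessor conditionals.

```python
# Hypothetical block templates: (function name, HLSL-like source, required option).
BLOCKS = [
    ("base_color", "float3 base_color(float3 c) { return c; }", None),
    ("apply_spec", "float3 apply_spec(float3 c) { return c + 0.1; }", "SPECULAR"),
    ("apply_fog", "float3 apply_fog(float3 c) { return c * 0.5; }", "FOG"),
]


def generate(options):
    """Emit one function per active block plus a shade() that chains them."""
    active = [
        (name, source)
        for name, source, option in BLOCKS
        if option is None or option in options
    ]
    functions = "\n".join(source for _, source in active)
    calls = "\n".join(f"    c = {name}(c);" for name, _ in active)
    return f"{functions}\nfloat3 shade(float3 c) {{\n{calls}\n    return c;\n}}"


print(generate({"SPECULAR"}))
```

Calling `generate({"SPECULAR"})` yields a shader with `base_color` and `apply_spec` only; the fog block does not appear at all, rather than being compiled out by an #ifdef.
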

    Any shader tool ( like FX Composer ), because of its framework, easily improves on iteration time, and a well developed graph-based tool can even more drastically improve prototyping times AND lead to exciting new shader developments. For example, procedural effects are a bitch to envision, let alone develop blindly; with a graph-based shader tool, it's a snap and easy to visualize and debug.

    I'm not saying that this type of tool is for every studio, as it did take me 8 years to get to where I am with my tool, but I just wanted to say such a tool, if done right, would enhance a pipeline in MANY ways and lead to some amazing shader development / prototyping capabilities.

  20. Another misconception is that graphical shader tools are developed specifically for use by artists / technical artists and not coders. It's obviously a horrible idea to have artists develop shaders that go directly into the pipeline. Having them prototype new effects that can be handed off to a developer for later optimization is reasonable in certain cases. I mean, myself, I generate the craziest 500+ instruction, 2000-line shaders until I'm satisfied with the look and quality I was aiming for, then I hack away at it to get the instruction count down considerably for use in game.

    On the other hand, most low level shader developers I've met are purists and performance junkies and shy away from any clunky mechanism getting in their way, unless they fully understand the shader system and know exactly what to expect from the code generation. I mean, if the tool is good enough, it's like trying to give your grandmother a cell phone, she'll fight it to death, but once she uses it a bit, she's texting all her 90 year old boyfriends... ( dramatization )

    Ok, I'm done, I'm starting to sound like a crazed lobbyist or something! haha

  21. Hey Scott,
    nice to hear from you again. I believe I was one of your alpha testers 8 or 9 years ago with ShaderWorks. The tool looked very promising and I was quite excited about it.
    What you say here sounds very good. I would be very interested in checking out your tool but I guess I would have to join Activision to do this :-)

    I must say you describe shader/graphics programmers pretty well here :-).

    The process of optimizing shaders does not only consist of counting cycles; we also move, for example, parts of the lighting formula back and forth between the vertex and the pixel shader. We merge code pieces when we add, for example, a subsurface scattering model for skin or a Fresnel term etc. I usually prototype with RenderMonkey or FX Composer because I can do it at home or while traveling and do not need access to my machine at work ... I also prototype with a piece of paper and a pencil, by just writing a shader on whatever piece of paper is available.
    If you have a library of shader sub-routines, you have a good starting point for writing shaders.
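
A back-of-the-envelope Python sketch of why a term might be moved between the vertex and the pixel shader. All numbers are hypothetical, and the model is deliberately crude: it ignores interpolator pressure, the quality loss from interpolating the term, and everything a profiler such as PIX would actually show.

```python
def term_cost(ops_per_invocation, vertices, pixels, stage):
    """Total operations for one term, evaluated per vertex or per pixel."""
    if stage not in ("vertex", "pixel"):
        raise ValueError("stage must be 'vertex' or 'pixel'")
    invocations = vertices if stage == "vertex" else pixels
    return ops_per_invocation * invocations


# Hypothetical mesh: 5,000 vertices covering 200,000 pixels on screen.
per_vertex = term_cost(10, vertices=5_000, pixels=200_000, stage="vertex")
per_pixel = term_cost(10, vertices=5_000, pixels=200_000, stage="pixel")
print(per_vertex, per_pixel)  # 50000 2000000
```

For a distant, densely tessellated mesh the ratio can flip, which is one reason the decision is made per shader and checked against measurements rather than fixed in advance.
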

    It is a shame that we can't try your program. I can only imagine what kind of progress you made in those eight years ...
