
Hardware bugs??

August 13, 2013 | Dukus | 54 Comments

Sometimes fixing bugs makes you question why programming is your chosen profession. Or at least why you chose to make a game engine from scratch. Sometimes there are technical details you don't really care to know about.

This last week I've been attempting to build the largest possible city I can to determine reasonable population counts for achievements and scenarios. This takes a lot of time. The amount of time is compounded by me fixing any issues I run into. Because of bugs, I haven't gotten to max population.

If the bug is something minor, such as an icon having the wrong opacity, I just write it down in my bug list to be fixed later. If it's something that stops play, like a crash, I fix the issue immediately. In general, if the fix isn't too invasive, I can continue from an auto-save and lose at most a minute of gameplay time.

There are also balance issues. I ran into a situation where some fishing docks weren't producing enough fish for a beginning settlement to survive on. You can survive on gathering, farming, or hunting - so it doesn't make sense that you can't survive on fishing. Balancing bugs take a bit of play and testing to fix but in general are pretty easy to deal with.

Then there are the really bad issues. Problems that crop up only every few hours and are not reproducible using any simple steps. If you've got good testers, eventually they can figure out reproduction steps, but doing so in a debug build is usually painful, as the game runs so slowly and it takes a long time for the bug to occur.

I've got one bug like this that I've been ignoring for a while. Ever since I added DirectX 11 rendering I've occasionally seen the interface to the graphics hardware fail. All it reports is:

D3D11: Removing Device.

If you query the interface for more information the error code is

DXGI_ERROR_DEVICE_REMOVED

When this occurs, you can't continue rendering without restarting the graphics interface.
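For illustration only, here is a minimal sketch of what "querying the interface" can look like. In D3D11 the query is `ID3D11Device::GetDeviceRemovedReason`, which returns one of a small set of DXGI status codes. The helper `describeRemovedReason` is hypothetical (not from the game's code), and the constants are redefined locally with the values from the Windows SDK headers so the snippet builds without the SDK.

```cpp
#include <cstdint>
#include <string>

// HRESULT values as defined in the Windows SDK headers. They are redefined
// here (an assumption made for portability) so the sketch compiles anywhere;
// real code would include <dxgi.h> and use the SDK names directly.
constexpr std::uint32_t kOk                       = 0x00000000u;
constexpr std::uint32_t kDxgiErrorInvalidCall     = 0x887A0001u;
constexpr std::uint32_t kDxgiErrorDeviceRemoved   = 0x887A0005u;
constexpr std::uint32_t kDxgiErrorDeviceHung      = 0x887A0006u;
constexpr std::uint32_t kDxgiErrorDeviceReset     = 0x887A0007u;
constexpr std::uint32_t kDxgiErrorDriverInternal  = 0x887A0020u;

// Hypothetical helper: turns the code returned by GetDeviceRemovedReason
// into its symbolic name so it can be written to a log.
std::string describeRemovedReason(std::uint32_t hr) {
    switch (hr) {
        case kOk:                      return "S_OK";
        case kDxgiErrorInvalidCall:    return "DXGI_ERROR_INVALID_CALL";
        case kDxgiErrorDeviceRemoved:  return "DXGI_ERROR_DEVICE_REMOVED";
        case kDxgiErrorDeviceHung:     return "DXGI_ERROR_DEVICE_HUNG";
        case kDxgiErrorDeviceReset:    return "DXGI_ERROR_DEVICE_RESET";
        case kDxgiErrorDriverInternal: return "DXGI_ERROR_DRIVER_INTERNAL_ERROR";
        default:                       return "unknown HRESULT";
    }
}
```

In a real renderer the argument would be the result of `device->GetDeviceRemovedReason()`, typically checked after `IDXGISwapChain::Present` returns a failure code.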

It happens occasionally, usually when rendering large scenes, but not always. The D3D11 debug layer output shows nothing extra: no warnings, no errors. This is a non-debuggable error. Rendering the exact same scene that caused the issue doesn't make the issue occur again, so reproducing it is very hard. Only my main development machine does this. Other computers I have around do not. The DirectX 9 version of the game also has no problems and can run for days without the graphics card dying.

The documentation says if you get this error, either the driver for the video card has been updated, or the video card has been physically removed from the machine. Clearly, I'm not doing either of these things while the game is running. Since those two things are very rare, I had written the code to just throw an error and quit should it occur.

It's not comforting that it seems to happen anywhere from five times a minute to once an hour on my main development machine. But some days it doesn't happen at all.

Searching the internet for other developers with the same issue is very hard - there's a lot of noise with non-programmers talking about it. A lot of new games don't handle this error and simply quit as well. Doing a search for 'DXGI_ERROR_DEVICE_REMOVED' gives you a seemingly endless list of forums for Crysis 3, Arma, Civ 5, Conan, Hitman, Secret World, Battlefield 3, 3DMark, and more, that just report this error and quit. Apparently gamers are not happy about this. I wouldn't be either.

The list of possible fixes people recommend includes re-installing drivers, using beta drivers, re-installing Windows, removing dust from the video card and reseating it, lowering quality settings in the game, and even increasing the voltage sent to the video card.

These are not things I have ever had to do to fix a bug. I've never actually run into a driver malfunction either - it's always been something stupid I'm doing that makes the video card throw errors. After hours of debugging and trying different things to figure out what I was doing wrong, I eventually made a Google search that excluded all pages containing the names of games that have this error, and found a Microsoft blog talking about things that cause this error that aren't in the official documentation.

It could be driver bugs, hardware faults, overheating, GPU removed from system, or DirectX running out of memory. I can't tell which is happening. And since I can render the exact same scenes with DirectX 9 on the same video card, I have a hard time believing any of these things is actually occurring in a fatal way. Either way, it seems like this is a general 'something went wrong' error.

For all my searching, I've not seen one developer write about how they fixed this issue. It's annoying that it even happens, and more so since it can actually occur for reasons other than what the official documentation says.

I don't know how many gamers will have this error occur, but looking at the major releases that have it, I can imagine it won't be an isolated problem. Since it appears that I'm not doing anything wrong, I just need to handle this error as gracefully as possible. So what's the fix?

My first gut feeling is to just ignore DirectX 11 and only ship DirectX 9 since it works perfectly, but I did spend the time to make a DirectX 11 renderer and the performance win on newer graphics hardware is very hard to ignore.

To fix it properly, I have to release all graphics resources, shut down D3D11, restart it, and then recreate all the resources. All while the game is running. The problem with this is recreating the resources is somewhat painful. The resources are in video memory, but not in any application accessible memory. Once the device gets into the removed state, I can't access them.

I could possibly reload all meshes and textures from disk, but this is slow. Some resources, like the terrain, are only stored inside a save game and would be very hard to get at in this state. If a device removal condition occurs I really wouldn't want the player to see a hiccup of more than a second or two as things reload.

So instead I'll have to keep a memory backup of every texture and mesh so that I can restart the device at any time. This seems like a serious waste of memory, but there's no other real choice. D3D9 requires similar handling for device resets, but it isn't as severe, as it takes care of some of the legwork for you, and you don't have to actually destroy and recreate the interface to the graphics hardware.
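The recovery sequence described above (release everything, restart the device, recreate every resource from a system-memory backup) can be sketched roughly as follows. This is not the game's implementation: `FakeDevice`, `BackedResource` and `Renderer` are hypothetical names, and the "device" is a plain in-memory stand-in rather than D3D11, so only the bookkeeping is shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for the graphics device (assumption: a real version would wrap
// ID3D11Device). "Video memory" here is just a table of byte buffers.
class FakeDevice {
public:
    static constexpr std::size_t kInvalid = static_cast<std::size_t>(-1);

    // Copies the bytes into "video memory" and returns an opaque handle.
    std::size_t upload(const std::vector<std::uint8_t>& bytes) {
        if (removed_) return kInvalid;
        storage_.push_back(bytes);
        return storage_.size() - 1;
    }
    // Simulates the removed state: contents are gone and unreadable.
    void simulateRemoval() { removed_ = true; storage_.clear(); }
    bool removed() const { return removed_; }
    std::size_t liveResources() const { return storage_.size(); }

private:
    std::vector<std::vector<std::uint8_t>> storage_;
    bool removed_ = false;
};

// A texture or mesh that keeps a system-memory copy of its contents so it
// can be uploaded again after the device has been recreated.
class BackedResource {
public:
    explicit BackedResource(std::vector<std::uint8_t> bytes)
        : backup_(std::move(bytes)) {}
    void createOn(FakeDevice& device) { handle_ = device.upload(backup_); }
    void release() { handle_ = FakeDevice::kInvalid; }
    bool isLive() const { return handle_ != FakeDevice::kInvalid; }
    std::size_t backupBytes() const { return backup_.size(); }

private:
    std::vector<std::uint8_t> backup_;  // the extra memory cost of this scheme
    std::size_t handle_ = FakeDevice::kInvalid;
};

class Renderer {
public:
    Renderer() : device_(std::make_unique<FakeDevice>()) {}

    BackedResource& addResource(std::vector<std::uint8_t> bytes) {
        resources_.push_back(std::make_unique<BackedResource>(std::move(bytes)));
        resources_.back()->createOn(*device_);
        return *resources_.back();
    }

    // Release all resources, replace the device, then recreate every
    // resource from its backup copy.
    void recover() {
        for (auto& r : resources_) r->release();
        device_ = std::make_unique<FakeDevice>();
        for (auto& r : resources_) r->createOn(*device_);
    }

    FakeDevice& device() { return *device_; }

    // Total system memory spent on backups.
    std::size_t totalBackupBytes() const {
        std::size_t total = 0;
        for (const auto& r : resources_) total += r->backupBytes();
        return total;
    }

private:
    std::unique_ptr<FakeDevice> device_;
    std::vector<std::unique_ptr<BackedResource>> resources_;
};
```

The trade-off is visible in `totalBackupBytes`: every byte uploaded to the device is also held in system memory, which is exactly the memory cost being weighed against reloading from disk.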

I haven't made this fix yet, so I'm interested to see how often it occurs once the recovery process works properly. I'm also wondering if I should make the game quit if it's happening at some high frequency.
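One possible way to decide when removals are happening "at some high frequency" is a sliding-window counter. The sketch below is only an assumption about how that could be done; `RemovalTracker` is a hypothetical name, and the threshold and window length are placeholders to be tuned, not values from the game.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>

// Counts device-removal events inside a sliding time window and reports
// whether automatic recovery still seems worthwhile.
class RemovalTracker {
public:
    using Clock = std::chrono::steady_clock;

    RemovalTracker(std::size_t maxEvents, std::chrono::seconds window)
        : maxEvents_(maxEvents), window_(window) {}

    // Records a removal at time `now`. Returns true while the number of
    // removals inside the window is at most maxEvents, false once it is
    // exceeded (the point at which the game might quit or offer DX9).
    bool recordAndShouldRecover(Clock::time_point now) {
        events_.push_back(now);
        // Drop events that have fallen out of the window.
        while (!events_.empty() && now - events_.front() > window_) {
            events_.pop_front();
        }
        return events_.size() <= maxEvents_;
    }

private:
    std::size_t maxEvents_;
    std::chrono::seconds window_;
    std::deque<Clock::time_point> events_;
};
```

The time is passed in rather than read inside the class so the policy can be exercised without waiting in real time; in a game loop the caller would pass `RemovalTracker::Clock::now()`.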

If anyone has any additional information about this issue, I'd love to hear about it.

And here I thought making a video game was all about design, balancing, and fun. 🙂

54 comments on “Hardware bugs??”

  1. This sounds extraordinarily frustrating - you have my sympathies. Hopefully with perseverance the issue can be tackled. You've overcome a lot on this project, so I'm sure you can wrestle this bug eventually. Good luck and thanks for the interesting posts, they're a great insight into the development process.

  2. I agree with Ben. Totally have faith in you on this one, since it is not specific to this game and more likely an issue with the hardware/compatibility. Keep up the great work and the updates are fantastic! Again I would like to stress how much I would like to be a part of the Alpha/Beta testing.

  3. I've been reading your posts for a while - i love them! You are one of the few email subscriptions i haven't got rid of after a week or two.

    I'm a game designer and a big fan of strategy games so its fascinating to hear about your experiences - lots that i recognise 🙂

    I love the sound of the game, looking forward to trying it. Must be tough working solo for so long but also must be v rewarding - am equally impressed and jealous 🙂 Keep it up!

    PS that bug sounds a right pain to deal with - i'd leave it for now in case someone else fixes it and shares info then go back and fix at the end if no luck 🙁

  4. Hey worst case scenario you can just fall back on the old method of "What a horrible night for a curse" as you reset the graphics.

  5. This sounds like GPU overheating. Many games end with the same error when the GPU temperature gets too high. When that happens, the GPU stops, the driver controlling it stops too, and the user is notified with this error. IMHO it's a driver problem, not a game problem.

  6. Hey,

    I hope this helps you in some way. I used to have that exact error occur in DiRT 2/3 and Battlefield 3. It ended up being the fact that my SLI'd GTX570's were not running the same BIOS. Upon updating BIOS to F7 on both cards, the issue NEVER came back.

    While I know this is a solution that might only work for me, it is a solution. And it can help find a root cause as to why this happens.

    Currently, I run a single GTX680, and I have not seen this error since the days of my GTX570 SLI. Whether it's a single vs. dual GPU issue, or whether the root cause is different per case, that I cannot answer. But I hope me sharing this helps in some way.

    Cheers.

  7. Your graphics card may be consuming too much power when this happens. Set it to 80% or 90% power usage with MSI Afterburner.

  8. Personally, I would not mind waiting a few seconds for resources to reload if it meant preventing the game from crashing and having to restart the entire game. In fact, I would find it a very nice feature.

    Perhaps you can keep track of when the last time it happened was while the game is playing, and if it happens again within a certain time period (a minute, maybe 5?), let the user know something funky is happening, and ask them if they want to continue hobbling along, or if they want to exit the game and try again. Perhaps even give them an option to reboot the game in DX9 in this case.

    In the end though, it seems like a bug that is really out of your control. Best course of action is to handle it as elegantly as possible, and I think you already have an elegant way planned!

  9. I understand your dilemma.
    This may make your decision easier:

    Adding the recreation of resources from scratch is not such a wasteful effort as it seems at first sight.

    If you ever end up porting to Android: this is a hard requirement. If a simple pop up dialog appears over your game you need to relinquish all resources, and afterwards, your app gets control again: then you need to recreate all the graphics resources from scratch.

    It is a lot of work, i know, but you may end up needing this anyway, bug or no bug.

  10. A couple of folks have already beaten me to the suggestions I was going to have, but I want to reiterate them. You're stressing your video card to a point that it "hiccups". Whether that be by providing too much voltage to it or too much heat. If the inside of your case has developed that thick sheet of dust that some people allow then this could lead to both of those issues since dust insulates as well as conducts. Or if you've an older video card that's just having odd difficulties at irregular moments. There's nothing you can do via code to resolve this issue though, so don't expect to be able to. The best you can do, like you've already suggested, is to just hack (elegant work around) a reload fix to where the player has minimal interruptions at best. I wouldn't spend too much time on it though, not without first moving your dev box video card from that machine to another and testing there. That's not a guaranteed test as your motherboard could very easily be the culprit.

    In all honesty, you've got enough fans on here that if you asked for donations to replace your video card I'm pretty sure you'd come up with enough funds. I'd support you. Keep up the great work! I'm eager to try out your game.

  11. I'm assuming you've already checked the Windows event logs and Reliability Monitor for events, but double check Reliability Monitor to see if you have a listing for a video hardware recovery error with BCC 116 or 117. It's a common DirectX/video card issue.

    If these are being created at or near the time of the crash, the issue is likely power or overheating given your circumstances, though there are several other causes. If this is the case you should be able to manage the changes as Lukio recommended. Also you'll have a nice neat dump file from Windows driver recovery!

    I may be way off but it's something...

  12. Hello,

    I think it sounds like it will waste a lot of memory. Surely it would be easier to let it crash and tell the player if this happens please use DX9?

    Can't wait for this, will it be available on steam or is it disc? 🙂

  13. You have my support if you need a new video card. We are all here for you .... you are making the game that I have been looking forward to since stronghold crusades even if there is no combat it is still the best thing that is available .. make a paypal account put on your website lets help you my good sir

  14. This is the worst part of developing for PC.

    and even assuming you fix all the problems flawlessly. you'll still have goofs running 10+ year old hardware complaining that their integrated graphics card that only supports shader model 1.1 doesn't work.

    really hoping ps4+xboxone have a much easier entry point for development..

  15. does sound like a hardware/driver issue rather than an engine failure. I don't quite think it's best to have a failsafe that hogs that much resources on by default. it would be more elegant to offer this as a solution when the problem first appears on a user's system, with a few things they could try to fix it.

  16. >Sometimes fixing bugs makes you question why programming is your chosen profession.

    I know that feeling very well.

  17. Just curious, will the first release of the game be final, or will you do updates? No updates, bug updates, or adding things to the game? Also, is beta testing available?

  18. things like this are why if you're comfortable with where the game is at for now you should let people test for you. More people trying things, especially if you list the types of things you're looking for, would benefit you and you would have more time to work/debug instead of having to look for these problems yourself.

  19. This is why you need a small beta to see if anyone else even gets this error. Even without any dumps etc it's useful info. Do you play any games that have postings with this error? Perhaps you should, to see if it occurs there. If it does then it's your video card. Rather than dumping time into something even the big companies can't figure out, make your DX mode settable, then if this error is seen force the game to DX9 permanently and have them restart the game. You end up with a happy gamer (can still play the game) and a happy you (didn't sink time into something that will affect .001% of users). I am sort of a perfectionist, but sometimes you have to be pragmatic and cut your losses.

  20. I know the feeling for sure.

    The best fix I found for something like this is a cold beer. 🙂

  21. Personally, I would rather get a message saying that the game encountered an error and I should wait a while for the game to recover. Its a much better alternative to the game crashing and leaving me thinking that my pc is broken somehow. I've actually had this error before with BF3 and I would have appreciated it if the game just froze for a while and then continued working rather than losing my slot on the server, my position on the scoreboard etc.

  22. This was a really interesting post !

    Keep up the good work, we all have faith in you 🙂

  23. Before you spend hours coming up with a fix/recovery solution: Make sure your gtx 280 is stable. Does it survive a FurMark stresstest?

  24. Not exactly the same issue... But I frequently have video card lockups or crashes. But it only happens in certain programs. Today I decided to underclock my GPU... lo and behold the crash went away (was trying to play Dead Island) Either too much voltage causing issues, or overheating, hard to say which.

    Anyway, try underclocking your GPU and see if the errors persist. Some times the stock binning of your chip is stable enough to pass the binning inspection but not stable enough in all circumstances.

  25. Sounds really frustrating. Alas I have only experienced this problem from the consumer side of the street. Initially with Age of Conan, and a few other titles since then. It is rather maddening that the error messages from DX are not more helpful for diagnosing the exact problem from a development standpoint. Ahh well.

    If you can figure out a good way to resolve the problem with DX11, good! If not, DX9 is still a very good choice for a game of this sort IMNSHO.

  26. I have encountered such errors many times while playing many games.

    In most cases the cause is overheating of your graphics card. Note that this can happen even if it is your graphics card's memory chips or some other components that are overheating, and not necessarily only the GPU.
    So in order to rule out overheating I recommend opening your PC case and temporarily putting another fan blowing cool air directly onto your graphics card. If the problem goes away then the cause is overheating.

    Another thing that might be causing this is a poor power supply.
    With time PSUs slowly weaken, so when you heavily overload them (through high power usage) the output voltages of your PSU might start to fluctuate, which can cause lots of problems with your PC (crashing, poor performance and even data corruption).
    So replacing your PSU might solve these problems.

    And the third cause for this might be bad shaders. Yes, if for some reason your shaders are causing infinite loops, memory leaks, or even calling unsupported functions, this might result in the graphics driver resetting your graphics card to prevent a system hang.
    ATI/AMD calls this feature "VPU Recover". I don't know what nVidia calls it, but I do know they have such a feature.
    So I would recommend rechecking your shaders to make sure you don't have any bugs in them.
    But since this would take lots of time, I would first try to rule out the first two possible causes.

  27. Also I would strongly recommend writing a fail-safe routine, because there can be other perfectly valid scenarios which cause the graphics device instance to be destroyed, like switching between the graphics card and the APU's integrated GPU for power saving (most common on laptops when they run on battery).

  28. If it was a problem that was common, your solution would be a nice one. A lot more graceful than for any other game. But since it mostly sound like something very rare, bound to some very specific and uncommon hardware configuration/condition, I wouldn't waste time on building this solution. However, if you decide to implement it, please make it an option that has to be turned on by people with the specific problem rather than force it on most people.

    If I were you, I would simply display a message that explain to the user what to do (check for possible overheat, try DX9, etc), create an auto-save (if that is possible at that point) and then exit.

  29. Once upon a time, when hardware or driver failures occurred, the computer would just crash. Now things are sort of recovered on failure, and you think it is your problem to deal with. But it isn't. You cannot improve the error message. You cannot fix the hardware or driver issue. All you can do is mask this particular failure from the user. This isn't necessarily good, because when things are failing in this way there is something needing to be fixed and not fixing it can cause irreparable damage. Ignore heat problems or voltage problems long enough and you can fry all sorts of components in your PC.

  30. Congratulations! You got to the point where you have enough visitors to attract spammers' attention. Time to think of a filter!

  31. That's why I chose to stick to server-side development 🙂

    My suggestion would be to replace your graphics adapter with an identical (or very similar) model and see if the bug disappears.

  32. I'd put up a GPU monitor window over the game, and if the game crashes and it shows unusual heat or anything, then it would pinpoint some information (or not, if the problem is somewhere else), but I'd give it a try.
    Also I think if the problem occurs, you should put up a dialogue where you state the problem and when it occurs 2nd time I'd ask the user if he/she wants to change to DX9 or wants more RAM usage and faster recovery from the error 🙂
    Also good luck with the game!!! 🙂

  33. Random non-specific hardware crashes are the bane of my life with my current PC and solving them has proved impossible with my basic level of technical expertise. If you manage any kind of solution to that kind of problem you're streets ahead of any other software developer I've encountered. Good luck!

  34. I searched the issue and crossed feedback from several sources and it seems there is a common point.
    Symptoms so far:
    - buffer error (somehow lost the device)
    - dual card with different BIOS version
    - vSync error - disabling vSync stabilize
    - vSync disable - GPU overrun - sometimes activating vSync stabilize
    - GPU overheating (exceptionally big rendering / bad fan settings / ...) using a tweak tool (EVGA precision) may stabilize
    - driver failure when switching (Alt-Tab or back from screensaver)
    - Card BIOS options: overclocking / GPU turbo mode/...

    Not sure if you can limit GPU usage, but most issues seem particularly related to it. GPU usage does not necessarily depend on resolution; I experienced high GPU usage with low resolution and low textures, only with lights / shadows / HDR rendering for example. Switching those rendering options live may trigger the error as well.

    Some graphic cards drivers updates resolve the issue by adding more control to GPU handling. NVidia is regularly pointed out though.
    I also saw some posts regarding setting a bigger pagefile size... but this one seems out of context.

    Let us know about what you found out if you took time on this issue. You do a great work and I'm eager to being able to play your game.

  35. I think that I would prefer the "Please wait recovering from driver crash" solution.
    If it happens frequently either suggest to switch to DX9 mode or enable the resource hogging solution (if you get around to implementing it).
    In this way most of your users will get the best experience without the extra resource consuming solution. The players affected will have a choice in which compromise to make: 1. wait for each recovery 2. use extra memory 3. Change to DX9.
    Best of Luck.

  36. Dear creator of Banished, I have a pressing question. Why have you missed the boat in releasing your game, during the great outage of Simcity 2013? Did you not want to capitalize on the opportunity? Now that Simcity has reached v7.0 with continued support, I no longer feel the sudden absolute need to have Banished. I will still buy your game, in support, but I ask, what are your reasons for not having released in all this time?? Has EA paid you to keep quiet?

  37. @ Robert

    Maybe because the game isn't ready and unlike EA he doesn't want to release a piece of crap on to the market and ruin his rep and the good will of his customers?

  38. I may have to second Templar_X, seems to me that implementing an option flag that lets the user choose its poison would be not only reasonable but, indeed, pleasing.

    It shows the attention to technicalities the producer (you) has put in when the programmer (you) stumbled upon a problem the designer (you!) wouldn't even think about.

    Still mad props for a very interesting blog on an even more interesting project!

  39. Hi,

    Not only because of this incident but in general my advice to you is to consider a broader beta test. This would help you to find out how serious this problem really is or if it is a more or less isolated incident. Further it would help you to "make sure" that your software has a certain level of quality.

    I do have 25 years of experience as a MCSA, software developer and software development project lead and as such I can tell you that you can't even think of all the crazy things your future customers will do with your game and further a broader or maybe even an open beta test would help to test the software on a much broader hardware basis. For this you could even consider selling early access codes.

    I wish you the best of luck!

    Kind regards
    mac

  40. This can happen for a few reasons. Test the game with a better power supply; if the problem continues, try downgrading the graphics card driver. If you use nVidia, try driver 310 or an older one; if you use AMD, try version 12 or an older one, and see the results. Most people with this error are using nVidia cards, so if you can, test with an AMD card.
    hope it would be helpful

  41. Lots of games are released with bugs.

    I went off Total War games because of the number of bugs!

    If you try to get everything absolutely perfect; you will never get it finished!

    I hope you get it "finished" because I haven't bought a game in ages after having been stabbed in the back by the Total War series!

  42. XKCD is always relevant...
    http://xkcd.com/979/

    Have you tried testing the graphics card RAM? I think I found a similar issue with my Nvidia Geforce 260m.

    I say think because sometimes the ram checks out fine, but other times there's a storm of errors. Some games work great forever, others crash every few minutes, and in one case there's a game (Outerra's prealpha graphics/streaming terrain thing [http://www.outerra.com/]) which appears to handle it somewhat gracefully but has horrifying graphics glitches caused by corrupt color values.

    Anyway, assuming that the glitches are caused by my problem, that my problem is the same as your problem, and that the Outerra people have any idea what is going on in my computer, then hey, maybe they solved your problem.

    But I wouldn't bet on it.

    Incidentally, it wouldn't hurt to check out Outerra; it's a neat engine that's looking for a game to take advantage of it.

  43. Hey,

    Don't give up. Like people said, the best solution to solving problems is to have people test it. I know, I learned it the hard way. Maybe have people pre-order (is this the new hotness?) it and test it out for you. For all you know this is an isolated problem (your machine only?)

    We are in love with this game and wanna help, so can we?

    Hang in there!

  44. @Robert

    How has SimCity improved since the release 6 months ago? I am one of those who pre-ordered the game, realized the problems within a few days and stopped playing. At first I thought, oh well, they will fix these. But soon I realized those things I thought were problems were actually as intended. I removed the game from my computer within 10 days and will never return to it. Worst $80 I ever spent.

    Banished is not released yet, because it is not complete yet. This guy shows how much a game developer is supposed to care for its customers. Rushing a game out might possibly make you some money, but not in the long run.

    I will never, ever buy any other game by EA. If Banished ends up being as good as it looks, I will definitely buy it, and any other new game that will be released in the future.

  45. Here's one for the 'try this and see if it fixes things' box:

    It wouldn't be your graphics driver crashing, would it? I had that issue popping up on secret world (which I've now left for other reasons).

    Try increasing the voltage on your graphics card by 5-10 mV (but not much more, because you may damage it), OR underclocking it by about 5%, and see if the error persists. Increasing the voltage slightly made my card stable and stopped the driver from crashing, which in turn stopped the game from wetting itself.

  46. I don't see anything wrong with stopping things for a few seconds, if the error is as rare as all that. A small price to pay for saved memory, IMO. Just put up a 'Pardon our mess...' message and then let the player continue on. Back in a jingleheimer jiffy and all that.

© Copyright 2021 Shining Rock Software