## From an engineering point of view what are the downsides of a too accurate model?

22

4

I was wondering, from an engineering point of view, for what reasons can too much accuracy in modeling be detrimental?

From a scientific point of view it seems like it's almost always a benefit, apart from cases where you need to reduce computation time.

So, from an engineering point of view, besides time (or computing power), why should you avoid too much accuracy?

2 – Please define "accuracy" and "too much" here. You could have a model which predicts the uncertainty range to extremely high accuracy, or a model which reduces said uncertainty itself to a very small value. And so on. – Carl Witthoft 2017-11-29T18:48:04.043

1 – “Everything should be made as simple as possible, but no simpler.” (Einstein) – Eric Duminil 2017-11-30T11:22:04.033

1 – "besides time (or computing power)" It seems all the answers missed this point. – agentp 2017-11-30T13:44:21.200

1 – @agentp On the contrary, the question answers itself by trying to exclude that. It's a silly thing to be in the question in the first place. – jpmc26 2017-11-30T19:40:40.650

Accuracy != Precision. It's the first thing I was taught in physics class. 3 is a more accurate representation of Pi than 3.5794.

Given this differentiation, I don't think you are correct in assuming that an over accurate model is ever detrimental. Accurate means close to ground truth. – user247243 2017-12-01T07:49:58.107

@user247243 I don't think you are correct in assuming that an over accurate model is ever detrimental. If one statistical model tells us that we need an 11.5 cup coffee maker, and another takes ten times longer to tell us we need an 11.46124 cup coffee maker because our cups are slightly smaller than the norm, we've wasted a bunch of time coming to the same conclusion (that we will buy a 12 cup machine). – Myles 2017-12-01T13:39:23.947

@Myles The problem is, the detrimental case you have listed is purely a time/computing power issue. There is no other detriment to using a model like that. OP has also specifically said that time and computation aren't being considered here. – JMac 2017-12-01T19:42:50.810

@JMac Which is why it is a comment rather than an answer. – Myles 2017-12-01T19:51:03.763

2 – This is seriously the worst "highly upvoted" question I've ever seen. It is flat-out confusing. – agentp 2017-12-02T01:42:06.307

37

Beware of overfitting. A model that more accurately fits the data you have gathered from a system may be a worse predictor of that system's future behavior.

The above image shows two models of some data.

The linear model is somewhat accurate on the training data (the points on the graph), and one would expect it to be somewhat accurate on the testing data as well (where the points are likely to fall, for -5 < x < 5).

By contrast, the polynomial is 100% accurate on the training data, but (unless you have some physical reason to believe a 9th-degree polynomial is appropriate) you would expect it to be an extremely poor predictor for x > 5 and x < -5.

The linear model is 'less accurate' by any comparison of errors against the data we have gathered, but it is more generalisable.
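The effect is easy to reproduce. A minimal sketch, with made-up data (a noisy linear trend sampled at 10 points on [-5, 5]; the exact numbers are illustrative, not taken from the plot above):

```python
import numpy as np

# Hypothetical training data: a linear trend plus fixed "noise" values.
x = np.linspace(-5, 5, 10)
noise = np.array([0.5, -1.0, 0.3, 0.8, -0.6, 1.2, -0.4, 0.9, -1.1, 0.2])
y = 2 * x + noise  # the underlying relationship is linear

linear = np.polynomial.Polynomial.fit(x, y, deg=1)
wiggly = np.polynomial.Polynomial.fit(x, y, deg=9)  # interpolates all 10 points

# Training error: the degree-9 fit is "100% accurate", the linear fit is not.
print(np.max(np.abs(wiggly(x) - y)))  # essentially zero
print(np.max(np.abs(linear(x) - y)))  # on the order of the noise

# Extrapolation at x = 7: the linear model stays near the true trend (y = 14),
# while the degree-9 polynomial swings far away from it.
print(linear(7), wiggly(7))
```

The degree-9 fit has zero training error, yet once you leave the sampled range its predictions diverge wildly; the 'less accurate' linear fit stays near the underlying trend.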

Additionally, engineers have to worry less about their model itself, and more about what people will do with it.

If I tell you that we're going on a walk on a hot day and I expect it to last 426 minutes, you are likely to bring less water than if I tell you the walk will last 7 hours, and less still than if I say it will last 4-8 hours. This is because you respond to my implied level of confidence in the forecast, rather than to the midpoint of my stated times.

If you give people a very precise model, they will reduce their margin of error. This leads to bigger risks.

Taking the hot-day walk example: suppose I know the walk will take 4-8 hours in 95% of cases, with the uncertainty coming from navigation and walking speed. Knowing our walking speed perfectly will narrow the 4-8 hour range, but it won't significantly affect the chance of the walk taking so long that water becomes an issue, because that risk is driven almost entirely by the uncertain navigation, not the uncertain walking speed.
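A quick Monte Carlo sketch of that claim (all numbers are assumed for illustration: a 4-hour base walk, ±10% walking-speed uncertainty, up to 2x navigation overhead):

```python
import random

random.seed(1)

def walk_time(know_speed_exactly):
    # Total time = base time * walking-speed factor * navigation factor.
    speed = 1.0 if know_speed_exactly else random.uniform(0.9, 1.1)
    nav = random.uniform(1.0, 2.0)  # wrong turns, backtracking, rests
    return 4.0 * speed * nav        # hours

def p_runs_long(know_speed, threshold=7.5, n=100_000):
    return sum(walk_time(know_speed) > threshold for _ in range(n)) / n

p_uncertain = p_runs_long(know_speed=False)
p_known = p_runs_long(know_speed=True)
print(p_uncertain, p_known)  # nearly identical: navigation dominates the tail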

1 – Right, though I'd remark that a polynomial of degree $N$ is an example with unusually bad behaviour; one should definitely never use such a model. Sensible models, even when overfitted, should not explode like that unless you actually leave the range covered by the measurements. In fact, even a polynomial of degree 8 would already make for a much smoother fit, given those data. – leftaroundabout 2017-11-30T14:42:41.910

Key quote from the linked Wikipedia article: 'overfitting occurs when a model begins to "memorize" training data rather than "learning" to generalize from a trend.' – Emilio M Bumachar 2017-12-01T17:02:15.163

2 – Would we really consider overfitting to be "too much accuracy in the model"? That's not a downside of having "too accurate a model". That's a downside to having too many accurate points and modelling poorly. Making a bad model off accurate data isn't an accurate model. – JMac 2017-12-01T19:45:53.907

@JMac: Overfitting can occur naturally in machine learning contexts, without deliberately setting out to build a bad model, just by throwing too much data at the training set. I'm not sure "too accurate" is the right way to describe that kind of outcome, but neither is "simple modeling error." – Kevin 2017-12-02T22:19:29.443

26

The most obvious downside is cost: all engineering projects have a finite budget, and spending more money than you need to is clearly a bad thing, not to mention wasting time.

There can also be more subtle issues. Things like FE analysis are always approximations, and sometimes adding unnecessary detail can introduce artefacts and make a model more difficult to troubleshoot. For example, you can get discontinuities which result in stress raisers.

There is also the consideration that even if you have the computing power to comfortably handle a large chunk of data, suppliers and customers may not, and in many cases transferring big files is still a bit of a bottleneck.

Similarly, if you have more parameters than you need, you are potentially creating extra work down the line in managing and debugging files.

Again, even if you have abundant time and resources now, it may well be that someone further down the line needs to use that model without the same luxury, especially if it ends up being part of a product that you are selling to customers.

7 – Query: in the 2nd paragraph, should it read "... adding necessary detail ..." or "adding unnecessary detail"? – Fred 2017-11-29T23:58:43.807

Yeah, it should be unnecessary. – Chris Johns 2017-12-01T11:16:28.077

I'm not sure if the FE example works well here. In that case, the FE is the model. Using more accurate data could present problems; but if your FE model is accurate, then obviously you don't need to worry about the artefacts, because your model doesn't have them. We've already defined it as accurate. Maybe in the case of using a different model to plug into a FE analysis; but then that's mostly just the point of "someone further down the line" using the model. – JMac 2017-12-01T19:51:42.467

13

There are a few reasons.

From a purely pragmatic perspective, it's a matter of time constraints. The time required to solve a model increases far, far faster than the level of precision, and whichever level is adopted is subjective anyway.

This is compounded by the fact that excessive accuracy is mostly useless. Your model might be 99.999% accurate for the given input values, but the real world is imprecise. Steel's modulus of elasticity has a tolerance of $\pm5$-$15\%$, for example. So why bother with a super-accurate model if one of your key inputs can be off by 10%? (It goes without saying that the margins of error for other materials, such as concrete or soil, and for other variables, such as loading, are significantly higher.)

Due to this, there is no point in being too precise. But indeed, it may be beneficial not even to try. The reasons for this are mostly psychological: you don't want your model to be too precise, and you don't want to output your results with seven decimal places, because you don't want to evoke a false sense of confidence.

The human brain is hardwired to think that 1.2393532697 is a more accurate value than 1.2. But that's actually not the case. Due to all the real-world uncertainties your model cannot possibly take into consideration (especially given current hardware limitations), 1.2 is almost certainly just as valid a result as 1.2393532697. So don't delude yourself or whoever sees your model. Just output 1.2, which transparently indicates that you don't really know what's going on after that second digit.
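A toy illustration with hypothetical numbers: take a cantilever's tip deflection, delta = F·L³/(3·E·I), where the modulus E carries the roughly ±10% tolerance mentioned above. Propagating that input band shows how little the trailing digits mean:

```python
# Assumed values, purely for illustration.
F, L, I = 1000.0, 2.0, 8.3e-6   # load (N), length (m), second moment (m^4)
E_nom = 200e9                   # Pa, nominal modulus for steel

def deflection(E):
    return F * L**3 / (3 * E * I)

mid = deflection(E_nom)
lo = deflection(1.10 * E_nom)   # stiffest plausible value of E
hi = deflection(0.90 * E_nom)   # most compliant plausible value of E

print(f"{mid:.10f} m")            # ten decimals of false confidence
print(f"{lo:.2g} m .. {hi:.2g} m")  # the honest answer: a ~20% wide band
```

Every digit of `mid` past the second is swamped by the spread between `lo` and `hi`, so reporting more of them only manufactures confidence the model doesn't have.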

6

An extremely accurate model may require a prohibitive amount of input data. It might be possible to generate an excellent model of weather systems, for example, by taking as input the position and velocity of every gas molecule in the atmosphere. In practice, such a model would not be useful, since there's no realistic way to generate the proper input. A less accurate model that only requires limited input data would be preferable in this case.

1 – You've answered a different question, to wit: "how much input data is too much?" – Carl Witthoft 2017-11-29T18:46:59.840

I'd probably add a note here about how the question mentions "besides when you need less computation time", because that's also a good reason to have a less precise model; if your model is too precise, real-world cases might take longer than the heat death of the universe to calculate. – Delioth 2017-11-29T19:36:26.417

5

"Too accurate" is not monotonically better. It can actually create an illusion of fidelity which makes you think it's worth pumping more money into the simulation. This becomes very important when you're presenting data from mixed-fidelity models, where some parts are very detailed and other parts are very coarse.

A real life example I had involved sampling altitudes over terrain. The team had decided to sample the terrain in 1024 chunks to maximize fidelity. Our customer wanted a ReallyGood(tm) answer.

Now, I was bothered by the runtime hit this particular algorithm caused, and I wanted to understand how much fidelity I was actually paying for. I hadn't seen any terrain data, so I asked the team how they loaded it. The answer was: "Oh, we don't have terrain. It's just flat."

So it sounded like I had an awesome high-fidelity model which sampled 1024 points. What I actually had was a low-fidelity model which did no better than sampling 1 point 1024 times, but ran a whole ton slower and masqueraded as a higher-fidelity model!

In the real engineering world, leaders don't always have the opportunity to learn the entire architecture of a model. In fact, I'd say they never have the time. Our leadership was making decisions off the assumption that we had an awesome 1024-point model. Nobody was at fault; it's just what happens when you tune the fidelity up too high on one part of the model and leave it low on another. It's the nature of the beast with mixed fidelity.

A parable about how reducing to significant figures isn't always just about cutting off trailing zeros. – Eikre 2017-11-30T19:51:09.643

1

In reality, there is the data we have and the data we don't have. Almost always, the amount of data we don't have is far greater than we could ever hope to gather, for practical or economic reasons.

Trying to fit a model obnoxiously well to the few samples we do have therefore risks making it produce really bad estimates in areas where we honestly have no clue (due to lack of data). Our model will then give us a false sense of security.

1

So from an engineering point of view, besides time (or computing power) why should you avoid that?

Coming from a mechanical engineering perspective, the biggest reason is that you only commit to the additional effort if it produces significantly different results.

If the level of accuracy in your model is orders of magnitude higher than the level of accuracy you would be able to deliver in the execution of your design, you are wasting your effort. If the level of accuracy described in your model is higher than what is actually required, that has an impact for the client: you are wasting money. For example, if you are specifying higher precision than the design requires (e.g. +/- .00001mm in the length of a vent pipe), you are wasting your client's money, because a 350mm vent to atmosphere does about the same job as a 350.0005mm vent to atmosphere, but the latter is significantly more expensive to produce.

In university we all learned to model the physical world using Newtonian physics, even though it is well established that relativistic physics presents a more accurate model of physical behavior. In spite of this, I know of no mechanical engineering program that by default eschews Newtonian models as too inaccurate. If we use the more accurate model and come up with an answer that is 0.1% closer to the theoretical truth, that will not impact our final design in the vast majority of cases. If our yield stress is 0.1% different, that gives us an insignificant difference in required cross-section, which leads us to choose the exact same size of I-beam either way. In this circumstance the cost of the additional effort delivers no additional benefit.
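As a sketch of that I-beam argument (the section names and section moduli below are assumed values for illustration, as are the load and yield-stress figures), a 0.1% shift in yield stress does not change which standard section a simple selection routine picks:

```python
# Hypothetical table of standard sections: elastic section modulus W in cm^3.
sections = {"IPE 200": 194.0, "IPE 220": 252.0, "IPE 240": 324.0}

def pick_section(moment_kNm, yield_MPa, safety=1.5):
    # Required modulus in cm^3: W = safety * M / sigma (kN.m and MPa -> cm^3).
    required = safety * moment_kNm * 1e3 / yield_MPa
    for name, W in sorted(sections.items(), key=lambda kv: kv[1]):
        if W >= required:
            return name
    raise ValueError("no listed section is large enough")

print(pick_section(40.0, 355.0))          # nominal yield stress
print(pick_section(40.0, 355.0 * 1.001))  # "0.1% more accurate" value
```

Both calls land on the same section, because the section sizes are quantized far more coarsely than a 0.1% change in the input can resolve.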

Now, there are situations where precision is required to produce a workable design; for example, the modeling of some satellites requires relativistic physics. In these circumstances we need to find a model that delivers the level of accuracy required, and we need to design to the model. If we calculate dimensions to +/- 0.0001%, it is totally wasted effort when our part dimensions can only be held to +/- 0.1%. In real-world applications, the latter degree of accuracy is far more common than the former.

0

Cost: the cost of time, the cost of computing power, and the cost of accuracy itself. If other variables have a tolerance of 5%, for example, why compute results to 1%?

0

Previous answers mentioned input data and cost. If you want accuracy, e.g. in the optimization of production parameters, you probably need more measurements, and you first need to weigh how much you can reduce costs against the increased expense: the work hours spent increasing the frequency of measurements, or the cost of an automated system to replace manual data collection. Second, if you obtain very accurate results in which you have invested time and other resources, do you have adequate equipment for quality control, industrial measurements, etc., or even the technology to make use of them? If your results are in vain, then the time spent obtaining them is misspent.

0

Would you need a satellite image at centimeter resolution in order to identify forests by color? Surely not. It would be detrimental, as you'd have to decide about every non-green patch of 10 square centimeters. The same goes for modelling: the detail resolution should fit the resolution of your target features. If not, you'll lose time downsizing.

0

Most of the real answers are excluded by your artificial constraint that computing power and calculation times are not to be considered. A model that takes hours or days to evaluate does not allow rapid design iterations and will slow things down on a human scale, increasing cost and possibly leading to inferior results. Cleverly simplifying models without losing too much accuracy can be a very useful approach; the brute-force model can then be used to validate the final iteration.

It's also possible that overly complex models mask fundamental errors, or that the work required to gather the information needed to use the model to its full potential outweighs any benefit. For example, if you need to know the characteristics of a material to a greater degree of accuracy than the supplier can control them, you can either accept the error bands or test each batch of material to tweak the model.