Question:

Does a line of best fit split the points in half, or go through as many points as possible?

6 ANSWERS


  1. Splitting the points in half can't be the right idea on its own: a line at right angles to the main trend would also split them in half!

     The line of best fit minimizes the overall distance between the points and the line; it doesn't try to pass through as many points as possible.

     The least squares method (minimizing the sum of the squared vertical distances) is the usual way to do this.
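
     A quick sketch of this in Python with NumPy (tools and data of my own choosing, purely for illustration): the least squares line below passes through none of the points and doesn't split them in half either.

        import numpy as np

        # Made-up points, chosen so the effect is easy to see.
        x = np.array([1.0, 2.0, 3.0, 6.0])
        y = np.array([4.0, 6.0, 4.0, 14.0])

        # Least squares fit of y = slope * x + intercept.
        slope, intercept = np.polyfit(x, y, 1)
        fitted = slope * x + intercept   # works out to roughly y = 2x + 1 here

        # The fitted line passes through none of the four points exactly...
        print("points hit exactly:", int(np.sum(np.isclose(y, fitted))))    # 0
        # ...and it doesn't split them in half either: three above, one below.
        print("above:", int(np.sum(y > fitted)), "below:", int(np.sum(y < fitted)))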


  2. You can find the line of best fit by calculating the mean on each axis,

     then drawing the line through that mean point on the graph, trying to leave roughly the same number of points on each side of it.

     Or something like that; there's a rough sketch of this recipe just below.
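
     Filling that in a bit, in Python with NumPy (tools and data of my own choosing, purely for illustration): anchor the line at the mean point, then try several slopes and keep the one with the smallest total squared error, which matches the least squares slope.

        import numpy as np

        # Made-up data.
        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        y = np.array([1.8, 3.1, 3.9, 5.2, 5.8])

        # The mean on each axis gives the anchor point for the line.
        x_mean, y_mean = x.mean(), y.mean()

        # Try a range of slopes for a line through (x_mean, y_mean) and
        # keep the one whose total squared error is smallest.
        def total_squared_error(slope):
            fitted = y_mean + slope * (x - x_mean)
            return np.sum((y - fitted) ** 2)

        candidates = np.linspace(0.5, 1.5, 101)
        best = min(candidates, key=total_squared_error)
        print("best slope among candidates:", round(best, 2))

        # The exact least squares slope lands in the same place.
        print("least squares slope:", round(np.polyfit(x, y, 1)[0], 2))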

  3. Not necessarily either. It tries to come as close as possible to all of the points, but may not actually pass through any of them. It's the line (or curve) that best represents what the data is doing. In a way, it's sort of the average of the data. You wouldn't say an average of numbers splits those numbers, although sometimes it might. An average also doesn't have to be one of the numbers itself, although it could be.

  4. It won't necessarily do either.

     First of all, as others have noted, the "line of best fit" doesn't have a single agreed-upon definition.  However, we can say some things for specific cases.

     Consider simple linear regression (i.e., one dependent variable Y, one independent variable X, and the assumed form Y = a + bX + e, where e is a random variable with mean zero).  Then we can take the observed X_i's and Y_i's and choose a and b so as to minimize the sum of the squared deviations of the points from the fitted line.  (This is called least squares estimation; for details I refer you to Wikipedia or elsewhere, but it is what any statistics program does for simple linear regression.)  The fitted line, with the chosen a and b, is then "best" in the sense that it gives unbiased estimates (that is, we expect it to be correct on average) with the smallest variance (that is, it stays the most accurate) among all linear unbiased estimators; this is the Gauss-Markov theorem.

     As such, the "best fit" line has several properties.  First, it passes through the point whose coordinates are the average observed x-value and the average observed y-value.  Second, the residuals (observed minus fitted values) sum to zero, and of course the sum of their squares is as small as possible.  However, it is not necessarily true that the line splits the points in half, and it probably doesn't go through any of the points whatsoever.

     EDIT:  Oh, and one more thing.  If you'll allow something other than a straight line, and we have only finitely many data points (with distinct x-values), we can always find a polynomial of degree one less than the number of points that fits all of them exactly.  The problem is that the predicted values then behave rather strangely, because we've left no room for any deviation (the +e above, where e is a random variable).  Not many systems conform exactly to some nice functional form.  A quick sketch of both the regression properties and the exact-fit polynomial follows below.
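
     Here is that sketch, in Python with NumPy (my own illustration on made-up data, not part of the original argument):

        import numpy as np

        # Made-up observations (X_i, Y_i).
        X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        Y = np.array([2.2, 2.8, 4.5, 4.1, 6.3])

        # Simple linear regression Y = a + b*X by least squares.
        b, a = np.polyfit(X, Y, 1)

        # Property 1: the fitted line passes through (mean of X, mean of Y).
        print(np.isclose(a + b * X.mean(), Y.mean()))      # True

        # Property 2: the residuals (observed minus fitted) sum to zero.
        residuals = Y - (a + b * X)
        print(np.isclose(residuals.sum(), 0.0))            # True

        # The EDIT: with n points (distinct X values), a polynomial of
        # degree n - 1 passes through every point exactly, but it leaves
        # no room for the random error term e.
        coeffs = np.polyfit(X, Y, len(X) - 1)
        print(np.allclose(np.polyval(coeffs, X), Y))       # True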

  5. For a first approximation or rule of thumb, you should draw the line so that there are about as many points above it as below it.  One way of thinking about it is that the sum of the distances from the line of the points above it should equal the sum of the distances from the line of the points below it, to minimise the 'bias' in where you draw the line.

     For mathematical reasons, which you can check in a textbook for more depth, rather than adding up the plain distances of each point from the line (whether above or below), we add up the squares of the distances, a trick that stops the positive and negative deviations from cancelling each other out.  The 'best fit' line is then the one for which this sum of squared distances is smallest.  You can draw infinitely many lines through a set of points, but only one of them makes the 'sum of squares' as small as it can be.  It's often called the 'least squares regression' line.

     You can also see, intuitively, that the closer all the points are to the line, the 'better' the fit of the line to the points, and the smaller this least squares sum will be.  When all the points lie exactly on the line, the sum of squares is 0, and the line is a perfect fit.

     All this assumes that the underlying data points are really represented by a straight line.  However, the points could really lie on a curve of some kind, say a polynomial, exponential or logarithmic curve.  Usually, if your data points are experimental results of some kind, you have an idea or a suspicion of which type of curve it may be.

     Since drawing a straight line and estimating its position is usually easier than trying the same thing for a curve, one trick is to convert the curve to a straight line, and you can do this by plotting not the X or Y values themselves but the logarithm of the X or Y values, or both.  You can even buy pads of 'log-log' graph paper, where the X and Y axis scales are logarithmic rather than linear as normal.  With this kind of paper, your 'curve' data ends up as a straight line, which is easier to work with.  And happily enough, the mathematics of the 'least squares' method works in exactly the same way on the linear-log or log-log plot.  There's a short sketch of this log trick just below.
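
     A rough sketch of that trick in Python with NumPy (my own illustration; the power-law data is made up): take logarithms of both coordinates and the curved data becomes a straight line that ordinary least squares can fit.

        import numpy as np

        # Made-up data that roughly follows a power law y = c * x**p.
        x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
        y = np.array([3.1, 12.2, 47.5, 194.0, 770.0])

        # On a log-log plot these points lie close to a straight line:
        #   log(y) = log(c) + p * log(x)
        log_x, log_y = np.log(x), np.log(y)

        # Ordinary least squares on the transformed data.
        p, log_c = np.polyfit(log_x, log_y, 1)
        c = np.exp(log_c)

        print("exponent p ~", round(p, 2))      # close to 2
        print("coefficient c ~", round(c, 2))   # close to 3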

  6. It depends on how you define the line of best fit.  "Line of best fit" is just a general term for an approximating line; simply put, it can be whatever you define it to be.

     In Excel, I believe the built-in trendline is a least squares fit, i.e., it minimizes the total squared error rather than the error at each individual point.
