Best Fit Bezier Curve
During my grad school days I came a cross a pretty fun math problem.
I was working with digital ink data, that is, time-stamped x-y coordinates which represented handwritten pen strokes. We wanted to compute the best-fit Bezier curve for a portion of any arbitrary pen stroke. In my googling I only came across approximations, e.g., gradient descent, simulated annealing, etc. These types of solutions are typically non-deterministic, slow, and do not guarantee the best solution. For that reason, I worked out a best-fit solution to this problem, very similar to the least squares approach to linear regression. In this article, I will walk through all the math I did, but if you want to jump to the end, I put it all in a code snippet on Source Forge.
Lets start by defining a cubic Bezier curve:
Each c is a two dimensional Euclidean point and t ∈ [0,1] as illustrated in the figure below.
What is a simple/intuitive explanation of what this curve is doing? At t = 0, B(0) = p1 and the curve follows a trajectory towards the control point p2. As t increases though, the "influence" will shift, such that the curve follows a trajectory towards p4 coming from the control point p3. Finally, at t = 1, B(1) = p4.
To make it easier to do the least-squares derivation, we express the Bezier function in terms of matrix operations. Namely, given the matrices:
We can write the function for a Bezier curve as:
Next, lets define the input data we are trying to fit. Namely, we have a series of, n, Cartesian points comprising a segment of a handwritten stroke:
Such that:
If we consider the x values separately from the y values, we effectively have two vectors:
We need a way to synchronize the points in the stroke segment with points in the Bezier curve when computing the error of fit. Namely, we need an index into the Bezier curve for each point along the handwritten segment. To do this, we use the path length of the segment, defined at the ith point in the stroke as:
With d1 = 0. We can then define a vector that maps the index, i of each point in the stroke to a corresponding index, ti, in the Bezier curve.
We fit the y-values first and then repeat the process for the x-values.
As in any least squares problem, we need an equation that computes the error of fit for a given Bezier curve, defined by its control points P to the given y-values, E(P). I use the residual sum of squares, summing the squared distance from each handwritten point to its Bezier curve approximation:
If we define the matrix:
Then we can rewrite the error equation and find its maximum:
Taking the derivative and setting it equal to zero to find the maxima:
Finally, solving for Py:
And that's Howie do it. If you want to know the y-components of the best fit Bezier curve for our n, simply apply the transformations shown in the last equations. Similarly, to find the x-components, replace the y-vector with x:
I received several requests for code and have thus created a Source Forge project that hosts a JAVA implementation of this code. It requires the UJMP package to work.