# Difference: PilotEpisode (1 vs. 3)

#### Revision 32008-01-28 - DickFurnas

Line: 1 to 1

 META TOPICPARENT name="SeasonOne"

# Numb3rs 101: Pilot

 In this episode, a serial murder-rapist has made numerous
Changed:
<
<
attacks in the Los Angeles area.  Making some seemingly harmless
>
>
attacks in the Los Angeles area. Making some seemingly harmless
basic assumptions, Charlie builds a statistical model of the attacker's behavior which helps the FBI stop the murders.
Line: 12 to 12

## What does practical mean?

A model of the attacker's behavior could be a number of different

Changed:
<
<
things.  Don and the FBI want to know where the next attack will be.  Charlie points out that this might be the incorrect approach to the problem by making an analogy to a sprinkler.  He proposes
>
>
things. Don and the FBI want to know where the next attack will be. Charlie points out that this might be the incorrect approach to the problem by making an analogy to a sprinkler. He proposes
that finding the sprinkler would be easier than deducing the location
Changed:
<
<
of the next point where a drop will hit.  In many ways, Charlie is probably correct.  We can probably assume that the killer has an apartment where he or she spends quite a bit of time.  If we can
>
>
of the next point where a drop will hit. In many ways, Charlie is probably correct. We can probably assume that the killer has an apartment where he or she spends quite a bit of time. If we can
find the most likely neighborhood where the attacker resides, it should be easier on FBI resources than sending a dozen agents to patrol
Changed:
<
<
several neighborhoods searching for an attack in progress.  A
>
>
several neighborhoods searching for an attack in progress. A
second reason, going back to the analogy of the sprinkler, is that the physics of finding the next sprinkler drop are quite complicated; by this, Charlie means that regardless of how many drops we've seen hit the ground, the area where we should look for the next drop to hit will
Changed:
<
<
be quite large.  In the best possible model, more data should provide significantly better deductions.  Since the sprinkler is
>
>
be quite large. In the best possible model, more data should provide significantly better deductions. Since the sprinkler is
stationary, however, the seeming randomness of the action of physics on each droplet will affect Charlie's model less and less as the number of droplets are observed.

We will define a model of the attacker's behavior to be a function p(x) from the addresses in Los Angeles to the unit interval (the

Changed:
<
<
interval [0,1]).  That is to say that if x is a location in Los Angeles, then p(x) is a number between 0 and 1.  Not just any function will do, however.  We require the function p(x) to have
>
>
interval [0,1]). That is to say that if x is a location in Los Angeles, then p(x) is a number between 0 and 1. Not just any function will do, however. We require the function p(x) to have
the following property: when we sum the values p(x) over all addresses,
Changed:
<
<
the result is 1.  We then call p(x) the probability that x is the killer's address.  Where p(x) is higher, the assailant is more
>
>
the result is 1. We then call p(x) the probability that x is the killer's address. Where p(x) is higher, the assailant is more
likely to reside.

After thinking about the problem briefly, we realize that we have

Changed:
<
<
our hands full: there are infinitely many models to choose from.  We somehow need to find the right one.  However, we don't even have a concept of what right means.  In laymen's terms, we need the model to "fit the data."  How do we quantify this common notion so that we may make mathematical deductions?  Charlie makes
>
>
our hands full: there are infinitely many models to choose from. We somehow need to find the right one. However, we don't even have a concept of what right means. In laymen's terms, we need the model to "fit the data." How do we quantify this common notion so that we may make mathematical deductions? Charlie makes
a point that when a person tries to make a bunch of points on a plane appear randomly distributed, the result is that the points adjacent to any given point x are all approximately the same distance away from
Changed:
<
<
x.  Charlie uses this information to produce his model.  As
>
>
x. Charlie uses this information to produce his model. As
one can see from the map Charlie brings to the FBI, the "hot spot" - the most likely area where the perpetrator lives - which Charlie computes is in what we might imagine is the "center" of the attacks.
Changed:
<
<
Unfortunately, Charlie's model fails.  The FBI gets DNA samples
>
>
Unfortunately, Charlie's model fails. The FBI gets DNA samples
from every resident of the neighborhood Charlie says they should check,
Changed:
<
<
but none of the DNA matches the perpetrator's.  So, Charlie has to ask himself if the model he made was good.  He sees one data point which appears anomalous.  However, any model with the properties
>
>
but none of the DNA matches the perpetrator's. So, Charlie has to ask himself if the model he made was good. He sees one data point which appears anomalous. However, any model with the properties
we outlined above shouldn't be affected too much by a single data
Changed:
<
<
point.  Indeed, after fixing the error, Charlie's new model has a
>
>
point. Indeed, after fixing the error, Charlie's new model has a
smaller hot zone which lies completely within the one the FBI already
Changed:
<
<
checked.  He is stunned by the realization that his model is bad.
>
>
checked. He is stunned by the realization that his model is bad.

Changed:
<
<
Eventually, Charlie realizes that he has made a classic error.  It is sometimes quipped that the only difference between
>
>
Eventually, Charlie realizes that he has made a classic error. It is sometimes quipped that the only difference between
physicists and engineers is that physicists can be sloppy in their
Changed:
<
<
approximations.  A physicist wanting to produce a set of laws of
>
>
approximations. A physicist wanting to produce a set of laws of
physics which is as complete as possible will not choose the most
Changed:
<
<
complicated set of laws before trying out a simpler set first.  In
>
>
complicated set of laws before trying out a simpler set first. In
the same way, Charlie chooses the mathematically practical approach by choosing the simplest possible solution to his problem by assuming
Changed:
<
<
there would be exactly one hot zone.  It is mathematically
>
>
there would be exactly one hot zone. It is mathematically
practical in the sense that solving this complicated problem is a lot
Changed:
<
<
easier with one hot zone than two.  Generally this is a good way
>
>
easier with one hot zone than two. Generally this is a good way
to approach a problem; at worst one learns why the easy approach does not work which hopefully gives some clues as to what the more
Changed:
<
<
complicated approach should look like.  With two hot zones (one
>
>
complicated approach should look like. With two hot zones (one
representing home and the other representing work), Charlie's model gives more accurate results: one hot zone is in the same neighborhood as before, the other is in an industrial area, and the hottest parts of
Changed:
<
<
the hot zones are quite small.  After making the arrest, Don
>
>
the hot zones are quite small. After making the arrest, Don
notices that the perpetrator had moved from the original hot zone a few weeks ago, which is why the FBI hadn't found him in their original search.
Line: 91 to 91

## How did Charlie produce his model?

Of all the possible models, how did Charlie find that specific

Changed:
<
<
one?  Not many clues are given in the episode as to what method he uses.  So, let's consider the following more tractable problem.  Suppose we have done the following experiment: we hung a
>
>
one? Not many clues are given in the episode as to what method he uses. So, let's consider the following more tractable problem. Suppose we have done the following experiment: we hung a
spring from the ceiling and measured the lengths of the spring after
Changed:
<
<
attaching a weight.  After doing this for several different weights, we have collected a bunch of data.  After making a graph
>
>
attaching a weight. After doing this for several different weights, we have collected a bunch of data. After making a graph
of weight versus change in length, we might notice that the points form
Changed:
<
<
almost a line (assuming the weights aren't too heavy).  This means
>
>
almost a line (assuming the weights aren't too heavy). This means
that if the weight is W and the change in length is L, W=kL for some number k (this relationship is called Hooke's law after the British
Changed:
<
<
physicist born in the 1600s).  How do we compute k?  We could
>
>
physicist born in the 1600s). How do we compute k? We could
draw in a bunch of lines which seem to approximate our data well and
Changed:
<
<
pick the one which is the best.  This is called the linear regression problem.  But which one is the best?  There are
>
>
pick the one which is the best. This is called the linear regression problem. But which one is the best? There are
many different concepts of that, and the simplest one isn't the one we generally use.
Changed:
<
<
Let M denote the set of all lines through the origin.  M is our set of models.  Let S denote the set of points which we have computed experimentally.  Given any line W(L) in the set M and
>
>
Let M denote the set of all lines through the origin. M is our set of models. Let S denote the set of points which we have computed experimentally. Given any line W(L) in the set M and
data point (x,y), compute the quantity , the
Changed:
<
<
vertical distance between the line W and the point (x,y).  Add up
>
>
vertical distance between the line W and the point (x,y). Add up
the quantities for each point in
Changed:
<
<
S.  This gives us a mapping from the set of models to the non-negative real numbers.  Generally, a map from a set of
>
>
S. This gives us a mapping from the set of models to the non-negative real numbers. Generally, a map from a set of
functions (in this case, lines) to the real numbers is called a
Changed:
<
<
functional.  Denote our functional by A(W).  If we can find a line for which A(W) = 0, then our data points are all colinear.  This is generally not going to happen.  The next best thing would be to find a line W for which A(W) is as small as possible.  In
>
>
functional. Denote our functional by A(W). If we can find a line for which A(W) = 0, then our data points are all colinear. This is generally not going to happen. The next best thing would be to find a line W for which A(W) is as small as possible. In
this case, we say that such a line W minimizes A(W).
Changed:
<
<
So now we must find a line which minimizes A(W).  When one
>
>
So now we must find a line which minimizes A(W). When one
wishes to minimize a quantity, one generally uses differential
Changed:
<
<
calculus.  Unfortunately, the absolute value function has no derivative at zero.  So, square the distance first!  Instead
>
>
calculus. Unfortunately, the absolute value function has no derivative at zero. So, square the distance first! Instead
of adding up to create A(W), we add up Changed:
<
<
.  This is
>
>
. This is
called the method of least squares and is generally the accepted method
Changed:
<
<
of solving the linear regression problem.  To solve the problem requires a little calculus.  Wolfram's website has a
>
>
of solving the linear regression problem. To solve the problem requires a little calculus. Wolfram's website has a
relatively good explanation .
Changed:
<
<
Notice that there was nothing special about the line here.  Any
>
>
Notice that there was nothing special about the line here. Any
set of functions M and non-negative functional A(W) would have worked (although there will be technical problems if M and A are not chosen wisely, such as not being able to find a minimizing function inside our
Changed:
<
<
set M).  So we could have found the quadratic polynomial of best
>
>
set M). So we could have found the quadratic polynomial of best
fit or the exponential of best fit in a similar fashion, although solving such a problem will no doubt be vastly more complicated.

This method is a general approach used by mathematicians in a

Changed:
<
<
variety of situations.  The difficulty is generally in proving the
>
>
variety of situations. The difficulty is generally in proving the
existence of a function which minimizes whichever functional we have
Changed:
<
<
decided to work with.  In physics, one often hears that objects take the path of least action or least energy.  That is to say
>
>
decided to work with. In physics, one often hears that objects take the path of least action or least energy. That is to say
that rather than solving a very complicated differential equation, we
Changed:
<
<
could solve a variational problem instead.  This method is so useful that it is basis of
>
>
could solve a variational problem instead. This method is so useful that it is basis of
theoretical physics.

## Random sequences?

Charlie makes a comment that to

Changed:
<
<
consciously construct a random sequence is impossible.  This is true in many ways.  As an example, consider the following property of sequences due to Khinchin.  Given any real
>
>
consciously construct a random sequence is impossible. This is true in many ways. As an example, consider the following property of sequences due to Khinchin. Given any real
number x we can find a continued fraction expansion
Line: 183 to 183 Changed:
<
<
for almost every continued fraction.  Here almost every refers to a somewhat complicated notion from measure theory .  Luckily, in our case it means exactly what it sounds like.  What
>
>
for almost every continued fraction. Here almost every refers to a somewhat complicated notion from measure theory . Luckily, in our case it means exactly what it sounds like. What
is interesting is that no one has been able to demonstrate a continued
Changed:
<
<
fraction which has the property given above.  So, truly, we are
>
>
fraction which has the property given above. So, truly, we are
not very good at coming up with random sequences at all!

#### Revision 22008-01-24 - DickFurnas

Line: 1 to 1
Changed:
<
<
 META TOPICPARENT name="WebHome"
>
>
 META TOPICPARENT name="SeasonOne"

# Numb3rs 101: Pilot

>
>
In this episode, a serial murder-rapist has made numerous attacks in the Los Angeles area.  Making some seemingly harmless basic assumptions, Charlie builds a statistical model of the attacker's behavior which helps the FBI stop the murders.
>
>

## What does practical mean?

#### Revision 12008-01-24 - DickFurnas

Line: 1 to 1
>
>
 META TOPICPARENT name="WebHome"

# Numb3rs 101: Pilot

In this episode, a serial murder-rapist has made numerous attacks in the Los Angeles area.  Making some seemingly harmless basic assumptions, Charlie builds a statistical model of the attacker's behavior which helps the FBI stop the murders.

## What does practical mean?

A model of the attacker's behavior could be a number of different things.  Don and the FBI want to know where the next attack will be.  Charlie points out that this might be the incorrect approach to the problem by making an analogy to a sprinkler.  He proposes that finding the sprinkler would be easier than deducing the location of the next point where a drop will hit.  In many ways, Charlie is probably correct.  We can probably assume that the killer has an apartment where he or she spends quite a bit of time.  If we can find the most likely neighborhood where the attacker resides, it should be easier on FBI resources than sending a dozen agents to patrol several neighborhoods searching for an attack in progress.  A second reason, going back to the analogy of the sprinkler, is that the physics of finding the next sprinkler drop are quite complicated; by this, Charlie means that regardless of how many drops we've seen hit the ground, the area where we should look for the next drop to hit will be quite large.  In the best possible model, more data should provide significantly better deductions.  Since the sprinkler is stationary, however, the seeming randomness of the action of physics on each droplet will affect Charlie's model less and less as the number of droplets are observed.

We will define a model of the attacker's behavior to be a function p(x) from the addresses in Los Angeles to the unit interval (the interval [0,1]).  That is to say that if x is a location in Los Angeles, then p(x) is a number between 0 and 1.  Not just any function will do, however.  We require the function p(x) to have the following property: when we sum the values p(x) over all addresses, the result is 1.  We then call p(x) the probability that x is the killer's address.  Where p(x) is higher, the assailant is more likely to reside.

After thinking about the problem briefly, we realize that we have our hands full: there are infinitely many models to choose from.  We somehow need to find the right one.  However, we don't even have a concept of what right means.  In laymen's terms, we need the model to "fit the data."  How do we quantify this common notion so that we may make mathematical deductions?  Charlie makes a point that when a person tries to make a bunch of points on a plane appear randomly distributed, the result is that the points adjacent to any given point x are all approximately the same distance away from x.  Charlie uses this information to produce his model.  As one can see from the map Charlie brings to the FBI, the "hot spot" - the most likely area where the perpetrator lives - which Charlie computes is in what we might imagine is the "center" of the attacks.

Unfortunately, Charlie's model fails.  The FBI gets DNA samples from every resident of the neighborhood Charlie says they should check, but none of the DNA matches the perpetrator's.  So, Charlie has to ask himself if the model he made was good.  He sees one data point which appears anomalous.  However, any model with the properties we outlined above shouldn't be affected too much by a single data point.  Indeed, after fixing the error, Charlie's new model has a smaller hot zone which lies completely within the one the FBI already checked.  He is stunned by the realization that his model is bad.

Eventually, Charlie realizes that he has made a classic error.  It is sometimes quipped that the only difference between physicists and engineers is that physicists can be sloppy in their approximations.  A physicist wanting to produce a set of laws of physics which is as complete as possible will not choose the most complicated set of laws before trying out a simpler set first.  In the same way, Charlie chooses the mathematically practical approach by choosing the simplest possible solution to his problem by assuming there would be exactly one hot zone.  It is mathematically practical in the sense that solving this complicated problem is a lot easier with one hot zone than two.  Generally this is a good way to approach a problem; at worst one learns why the easy approach does not work which hopefully gives some clues as to what the more complicated approach should look like.  With two hot zones (one representing home and the other representing work), Charlie's model gives more accurate results: one hot zone is in the same neighborhood as before, the other is in an industrial area, and the hottest parts of the hot zones are quite small.  After making the arrest, Don notices that the perpetrator had moved from the original hot zone a few weeks ago, which is why the FBI hadn't found him in their original search.

## How did Charlie produce his model?

Of all the possible models, how did Charlie find that specific one?  Not many clues are given in the episode as to what method he uses.  So, let's consider the following more tractable problem.  Suppose we have done the following experiment: we hung a spring from the ceiling and measured the lengths of the spring after attaching a weight.  After doing this for several different weights, we have collected a bunch of data.  After making a graph of weight versus change in length, we might notice that the points form almost a line (assuming the weights aren't too heavy).  This means that if the weight is W and the change in length is L, W=kL for some number k (this relationship is called Hooke's law after the British physicist born in the 1600s).  How do we compute k?  We could draw in a bunch of lines which seem to approximate our data well and pick the one which is the best.  This is called the linear regression problem.  But which one is the best?  There are many different concepts of that, and the simplest one isn't the one we generally use.

Let M denote the set of all lines through the origin.  M is our set of models.  Let S denote the set of points which we have computed experimentally.  Given any line W(L) in the set M and data point (x,y), compute the quantity , the vertical distance between the line W and the point (x,y).  Add up the quantities for each point in S.  This gives us a mapping from the set of models to the non-negative real numbers.  Generally, a map from a set of functions (in this case, lines) to the real numbers is called a functional.  Denote our functional by A(W).  If we can find a line for which A(W) = 0, then our data points are all colinear.  This is generally not going to happen.  The next best thing would be to find a line W for which A(W) is as small as possible.  In this case, we say that such a line W minimizes A(W).

So now we must find a line which minimizes A(W).  When one wishes to minimize a quantity, one generally uses differential calculus.  Unfortunately, the absolute value function has no derivative at zero.  So, square the distance first!  Instead of adding up to create A(W), we add up .  This is called the method of least squares and is generally the accepted method of solving the linear regression problem.  To solve the problem requires a little calculus.  Wolfram's website has a relatively good explanation .

Notice that there was nothing special about the line here.  Any set of functions M and non-negative functional A(W) would have worked (although there will be technical problems if M and A are not chosen wisely, such as not being able to find a minimizing function inside our set M).  So we could have found the quadratic polynomial of best fit or the exponential of best fit in a similar fashion, although solving such a problem will no doubt be vastly more complicated.

This method is a general approach used by mathematicians in a variety of situations.  The difficulty is generally in proving the existence of a function which minimizes whichever functional we have decided to work with.  In physics, one often hears that objects take the path of least action or least energy.  That is to say that rather than solving a very complicated differential equation, we could solve a variational problem instead.  This method is so useful that it is basis of theoretical physics.

## Random sequences?

Charlie makes a comment that to consciously construct a random sequence is impossible.  This is true in many ways.  As an example, consider the following property of sequences due to Khinchin.  Given any real number x we can find a continued fraction expansion In fact, as long as x is irrational, the continued fraction expansion is unique. Khinchin's Theorem says that for almost every continued fraction.  Here almost every refers to a somewhat complicated notion from measure theory .  Luckily, in our case it means exactly what it sounds like.  What is interesting is that no one has been able to demonstrate a continued fraction which has the property given above.  So, truly, we are not very good at coming up with random sequences at all!

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.