I’ve just graduated high school, and I’m still grappling with the realization that I’m never going to have to take an extraneous course again. I can’t say I always enjoyed the coursework from classes I didn’t choose, but there was usually a silver lining. This is especially true of Physics, which is one of the courses I’m going to miss the most. Barring a drastic change of major, it will be difficult to seriously continue my physics education, so to cope with that fact I’ve directed my recent attention almost completely to a few good physics books.

One of my favorite series is The Theoretical Minimum, which I admire for its consistently well-organized explanations and exercises. I recently finished the second volume, “Quantum Mechanics,” and all my friends already know because I keep bothering them about the interesting things it says. Since the prominent ideas all rely on a prohibitive amount of background knowledge, this post is my attempt at an amateur’s explanation in the style of another great book I’ve read recently: QED, by Richard Feynman. I will imitate Feynman’s use of arrows, although I will try to add a bit more math so that readers can follow along if the abstraction fails them.

My goal is to explain how quantum states are measured and modeled, what it really means to be “entangled,” and significant arguments informing QM interpretations such as the Einstein-Podolsky-Rosen Paradox and Bell’s Theorem. $\newcommand{\bra}[1]{\langle #1 \vert} \newcommand{\ket}[1]{\vert #1 \rangle} \newcommand{\inner}[2]{\langle #1 \vert #2 \rangle} \newcommand{\sqmag}[1]{\vert #1 \vert ^2} \definecolor{lblue}{RGB}{74, 144, 226} \definecolor{lred}{RGB}{208, 2, 27} \newcommand{\rtext}[1]{ \color{lred} #1 \color{black} } \newcommand{\btext}[1]{ \color{lblue} #1 \color{black} } \newcommand{\rket}[1]{ \ket{ { \color{lred} #1 \color{black} } } } \newcommand{\bket}[1]{ \ket{ { \color{lblue} #1 \color{black} } } } \newcommand{\rbket}[2]{ \ket{ { \color{lred} #1 } { \color{lblue} #2 \color{black} } } }$

State of a Single System

The essential characteristic of a quantum system is that, in many cases, there is no way to determine the outcome of a measurement on that system in advance. It is likely that no such method even exists, so instead we consider what results one might expect to see.

At this point, I’d like to provide a visual abstraction that will help me throughout these explanations. Consider some sort of arrow placed perpendicular to a surface:

In this picture the arrow is fixed, but imagine an experiment in which it is allowed to fall to either the right or the left. You can recreate this experiment on the desk with a pencil. If you align your pencil perfectly, it will be difficult to predict which way the pencil will fall.

If, however, there is a draft in the room, or the table is slightly uneven, the arrow might be more likely to fall on a specific side. Regardless of its bias, once the experiment is completed and the arrow is lying on its side, there is no more chance, and the outcome is predictable. “Dropping” the arrow now isn’t going to affect its position because it has already fallen. These characteristics are analogous to the important characteristics of quantum measurements:

  • When a specific state is prepared, experimental outcomes may be impossible to determine in advance.
  • However, there may be bias so that the experiment is somewhat predictable.
  • Subsequent experiments will always have the same result.

For actual quantum systems, there is a mathematical framework that also captures these characteristics. We start by developing a representation of the arrow in terms of its unambiguous states, left and right. We define two vectors, $\ket{l}$ and $\ket{r}$, which will serve as stand-ins for the two states. Here they are written using “Dirac Notation,” which is the bar and angle bracket surrounding each vector’s identifier. When the bracket points to the right, the vectors are called “kets.” This notation lets us specify vectors without enumerating their components, which would tie us to one specific set of basis vectors even though separate bases are frequently used together. It also makes common operations like the inner product and complex conjugation very easy to write.

The point of starting with left and right states is that we can compose them to create representations of the other possible states. Ideally, these representations should allow us to predict the results of dropping the arrow. If we define $\ket{l}$ and $\ket{r}$ to be orthogonal, as if one points up and the other points $90$ degrees to the side, we can create other vectors as linear combinations of the two. We are still using them to represent parallel directions, but defining them to be orthogonal allows them to form a basis, like the familiar $\hat{i}$ and $\hat{j}$.

Using the new basis, we will add coefficients to represent the probabilities that the arrow will end up in each state. This is where our mixture comes in: if left and right are equally likely, our coefficients can indicate that the current state is half of one and half of the other. Instead of directly weighting the bases (i.e., $\frac{1}{2}$ left and $\frac{1}{2}$ right) we will make probabilities out of the product of each basis vector’s coefficient and that coefficient’s complex conjugate. This is equivalent to taking the magnitude (a real number) and squaring it, and it is helpful to think of multiplying by the complex conjugate as a more general version of squaring a number. Using complex numbers increases the number of unique states we can represent, which turns out to be necessary for predicting experimental outcomes. So, we are essentially multiplying each basis vector by the “square root” of its probability as an outcome and creating our state-mixture by summing the results. Even if complex conjugates weren’t required, using the “square root” of a probability instead of the probability itself is helpful because when the sum of the “squares” of the components is $1$, the state vector has magnitude $1$, so it is normalized.
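
As a quick sanity check of that “general squaring,” here is a tiny sketch (plain Python complex numbers, my own illustration rather than anything from the book):

```python
# A complex amplitude: the "square root" of a probability.
c = (1 + 1j) / 2                    # magnitude 1/sqrt(2)

# Multiplying by the complex conjugate generalizes squaring:
print((c * c.conjugate()).real)     # 0.5
print(abs(c) ** 2)                  # 0.5, the same squared magnitude
```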

As an example, let’s consider the original experiment:

  • Since $\ket{l}$ and $\ket{r}$ are equally likely, $P\left( l \right) = P\left( r \right) = \frac{1}{2}$.
  • Our probability is also equal to the square of our coefficients: $c^2 = \frac{1}{2}$, $c = \pm \frac{1}{\sqrt{2}}$.
  • This yields two possibilities: one vector whose coefficients have the same sign, and one whose coefficients have opposite signs.
  • We will call the former $\ket{u}$ (up) and the latter $\ket{d}$ (down).
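
To make this concrete, here is a short sketch (using NumPy, a tool of my choosing) that builds $\ket{u}$ and $\ket{d}$ as column vectors over the $\ket{l}$, $\ket{r}$ basis and verifies that they are normalized and orthogonal:

```python
import numpy as np

# Basis kets |l> and |r> as orthonormal column vectors.
l = np.array([1, 0], dtype=complex)
r = np.array([0, 1], dtype=complex)

# Same-sign and opposite-sign combinations from the list above.
u = (l + r) / np.sqrt(2)
d = (l - r) / np.sqrt(2)

print(np.vdot(u, u).real)  # 1.0 -- normalized
print(np.vdot(d, d).real)  # 1.0 -- normalized
print(np.vdot(u, d).real)  # 0.0 -- orthogonal to each other
```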


I had previously mentioned Dirac’s notation, which we now have the opportunity to apply. In this notation, inner products are written $\inner{u}{v}$ – very simple. $\bra{u}$ is new: this is the notation for a “bra,” the complex conjugate of a corresponding ket $\ket{u}$. Imagine a new vector where all coefficients have been replaced by their conjugates and all basis vectors have been switched from ket-version to bra-version. Inner products are only ever taken between a bra and a ket, even though the conversion leaves real coefficients untouched. While this distinction may seem unimportant, it is crucial for allowing more complex (this is a pun) calculations to have real outputs. I want to avoid giving so much detail that I could justify every conjugation, but generally they allow operations to take advantage of complex numbers while still producing real results.

For any normalized ket, the inner product with itself is equal to $1$, as usual. Again, multiplying something by its complex conjugate is like squaring the magnitude, and this is still true for bras and kets. For two different kets from our orthogonal basis, the inner product is $0$, which follows from the usual definition of the inner product. Lastly, the operation is also distributive, so that $\bra{a} \left( \ket{b} + \ket{c} \right) = \inner{a}{b} + \inner{a}{c}$. Let’s try to use these properties to simulate the running experiment. We will use our new notation to find the probability of the arrow falling to the right, given its initial state.

  1. Prepare a state $\ket{s} = \frac{1}{\sqrt{2}}\ket{l} + \frac{1}{\sqrt{2}}\ket{r}$.
  2. $P\left( r \right) $ is determined by the coefficient on $\ket{r}$, which we can select using the inner product: $\inner{r}{s} = \frac{1}{\sqrt{2}}$.
  3. Finally, we square the magnitude of this coefficient: $P \left( r \right) = \vert \inner{r}{s} \vert ^2$. This is equivalent to multiplying by the complex conjugate: $P \left( r \right) = \inner{s}{r} \inner{r}{s}$.
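
Those three steps translate directly into NumPy (again, my own sketch):

```python
import numpy as np

l = np.array([1, 0], dtype=complex)
r = np.array([0, 1], dtype=complex)

# 1. Prepare the state |s> = (|l> + |r>) / sqrt(2).
s = (l + r) / np.sqrt(2)

# 2. Select the coefficient on |r> with the inner product <r|s>.
coeff = np.vdot(r, s)

# 3. Square the magnitude of that coefficient.
print(abs(coeff) ** 2)  # ~0.5
```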

Another good experimental example is the calculation of probabilities for the outcomes of our unambiguous states: $\sqmag{\inner{r}{l}} = 0$, $\sqmag{\inner{r}{r}} = 1$, and so our original definition of the experiment is satisfied. Basis states will typically have numeric values associated with them as the outcomes of measurements. In this example, $\ket{l}$ will be assigned $1$ and $\ket{r}$ will be assigned $-1$. Assigning these values allows us to define the expected value for the experiment as the sum of all possible numeric outcomes weighted by their probabilities. In the absence of deterministic measurements, the expected value is the closest thing we can get to a specific value. For example, the expected value of this experiment is $\frac{1}{2} \left( 1 \right) + \frac{1}{2} \left( -1 \right) = 0$, which reflects the fact that the arrow isn’t biased to the left or right.
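
The expected value is just as quick to compute; here is a sketch that weights each assigned value by its probability:

```python
import numpy as np

l = np.array([1, 0], dtype=complex)
r = np.array([0, 1], dtype=complex)
s = (l + r) / np.sqrt(2)

# Numeric outcomes assigned to the basis states: |l> -> +1, |r> -> -1.
outcomes = [(+1, l), (-1, r)]

# Expected value: each outcome weighted by its probability.
expected = sum(v * abs(np.vdot(ket, s)) ** 2 for v, ket in outcomes)
print(expected)  # ~0.0, no bias to the left or right
```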

So far, we’ve developed the essential notation that will be used throughout the post. It is summarized below:

  • $\ket{u}$ represents a state as a combination of basis vectors.
  • The coefficients on these basis vectors determine the probabilities of experimental outcomes.
  • We extract those probabilities by taking the square magnitude of the inner product between the state and a basis vector.
  • Because of this, the basis vectors themselves represent “fallen arrows” with known outcomes.

Expanding Our Measurements

The arrow experiment is good for developing some basic concepts, but at this point it is still a bit limited. Right now, our measurement consists of determining the probability for the arrow to end up on either the left or the right, because these happen to be the two unambiguous states. These are not our only two options. Imagine taking the entire plane that the arrow is resting on and flipping it $90^\circ$. Now, without the state vector changing, the arrow has become aligned with the surface. The same dropping experiment is now unambiguously “up,” even though the state vector hasn’t moved.

In any experiment, there are two ways to change the outcome. You can either change the thing you’re measuring, or change the information detected by the experiment itself. This is an example of the latter. Experiments allow us to measure something’s characteristics, which is key throughout physics because the only way to determine something about a system without prior information is to measure it. In this example, there are some system states for which an up-down measurement is definite while a right-left measurement is completely uncertain. It is very common to see some measurement A that is certain while some other measurement B is uncertain, and this uncertainty is what makes quantum mechanics so difficult to digest.

Keep in mind that uncertainty in B doesn’t mean we can’t try to measure its quantity. Just like in the first iteration of the experiment, we will see one of two outcomes, and later attempts will repeat that outcome. If we then return to measurement A, its first result will again be uncertain, and subsequent A measurements will repeat whatever that result was. Our state vector seems to be jumping between positions that align with A and positions that align with B. This is crucial: the initial measurement of a quantum state will disrupt that state. When a state vector of combined basis states suddenly becomes only one upon measurement, it is said to have “collapsed.” This is what happens when we measure A after B, and it is also what happens when we measure the right-left orientation of an up state.

Not every experiment collapses the state vector into a state that creates uncertainty for other experiments. In fact, some experiments may have multiple equivalent unambiguous states, where these states turn out to be inequivalent unambiguous states for other experiments. Imagine all the different ways a falling arrow might land on the right half of the plane. While these are all equivalently “right,” they might not all be equivalent in other experiments. The picture below demonstrates this idea with $\sigma$ and $\gamma$ experiments, which both have positive and negative outcomes.

Mathematically, we have to introduce new ideas to represent these experiments. We want some “experiment operator” that will act on the state vector to tell us what results we could get, how likely they are, and which unambiguous states the system might collapse to. All the information about any possible experiment should be contained between the experiment operator and the state vector. That’s a lot of information, but luckily, it fits elegantly into a mathematical construct called a “linear operator.” Since it is likely that many students will not be familiar with matrices or linear operators, I will summarize their important characteristics and explain how they can be used here. As a point of technicality, I will actually be discussing matrix operators, which is a distinction you can read about here.

A matrix operator $\sigma_{n \times m}$ is essentially a mapping from an input vector of size $n$ to an output vector of size $m$. In our case, we could make $n$ equal to $m$ so that the matrix describes state transitions, but in most cases the state just collapses abruptly into one of a few special states. Instead, we’ll use the mapping to create a new “intermediate vector” for calculating experimental outcomes, as a shortcut over the original $\vert \inner{a}{b} \vert ^2$. Using the operator this way is a deliberate convenience that informs the structure of the matrix – it works because we design it that way. Specifically, we can set up our operator so that the expected value of a measurement on $\ket{s}$ is $\bra{s} \sigma \ket{s}$. The intermediate vector is the result of the operator’s action on one of the two vectors beside it.
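
Here is a minimal sketch of that recipe, using the diagonal matrix that the next paragraphs will justify as the horizontal plane operator:

```python
import numpy as np

# Experiment operator in the (|l>, |r>) basis; justified below.
sigma = np.array([[1, 0],
                  [0, -1]], dtype=complex)

l = np.array([1, 0], dtype=complex)
r = np.array([0, 1], dtype=complex)
s = (l + r) / np.sqrt(2)

# Intermediate vector: the operator acting on the ket.
intermediate = sigma @ s

# Expected value <s|sigma|s>: the bra against the intermediate vector.
print(np.vdot(s, intermediate).real)  # 0.0, matching the earlier result
```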

Representing the unambiguous states with a matrix operator comes down to specifying that state’s vector and its associated numeric value. We can pair both of these together through eigenvectors and eigenvalues (collectively, eigenpairs). An eigenvector is an operator input such that the operator’s output is the same vector, up to a multiplicative constant. This constant is the eigenvalue. It only makes sense to talk about specific eigenpairs because of the normalization constraint introduced during the discussion of complex components – otherwise any eigenvector could be rescaled infinitely many ways while still satisfying the definition.

For example, the original $\ket{l}$ would be an eigenvector of the horizontal plane operator with eigenvalue $1$. Our choice of $1$ is arbitrary, but we choose $\ket{l}$ because it is one of the unambiguous basis states that the experiment will collapse to. So, $\mathbb{HP} \ket{l} = \ket{l}$, where $\mathbb{HP}$ is our horizontal plane operator. Similarly, $\mathbb{HP} \ket{r} = - \ket{r}$. It is easy to see that because the inner product of a vector and itself is $1$, the expected value for any unambiguous state is just that state’s eigenvalue, which matches the idea that this state (and its numeric value) will always be realized by the experiment.
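
Writing $\mathbb{HP}$ out as a matrix in the $\ket{l}$, $\ket{r}$ basis makes the eigenpairs easy to check (another sketch of my own; np.linalg.eigh recovers the same pairs numerically):

```python
import numpy as np

# HP in the (|l>, |r>) basis: eigenvalues +1 and -1 on the diagonal.
HP = np.array([[1, 0],
               [0, -1]], dtype=complex)

l = np.array([1, 0], dtype=complex)
r = np.array([0, 1], dtype=complex)

print(np.allclose(HP @ l, l))   # True: HP|l> = +1 |l>
print(np.allclose(HP @ r, -r))  # True: HP|r> = -1 |r>

# eigh returns the eigenvalues (ascending) and their eigenvectors.
vals, vecs = np.linalg.eigh(HP)
print(vals)                     # [-1.  1.]
```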

The experiments whose results don’t interfere with each other have “simultaneous eigenvectors.” Since these measurements don’t disrupt a state for each other, they share it. For example, $\sigma \ket{\sigma^+ \gamma^-} = \lambda_1 \ket{\sigma^+ \gamma^-}$, $\gamma \ket{\sigma^+ \gamma^-} = \lambda_2 \ket{\sigma^+ \gamma^-}$, and $\sigma \gamma \ket{\sigma^+ \gamma^-} = \gamma \sigma \ket{\sigma^+ \gamma^-} = \lambda_1 \lambda_2 \ket{\sigma^+ \gamma^-}$, where $\lambda$ indicates whatever eigenvalues each operator has for the eigenvector. These equations also demonstrate why such experiments are said to “commute,” whereas non-commuting experiments would collapse the state vector differently and violate the last equality. You can think of non-commuting experiments in terms of the following image: each experiment collapses the state vector into a state that is uncertain for the other experiment.
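
A toy check of commuting operators (the $\sigma$ and $\gamma$ here are stand-in diagonal matrices of my own choosing, not those of any particular physical system):

```python
import numpy as np

# Two diagonal operators share eigenvectors, so they commute.
sigma = np.diag([1.0, -1.0]).astype(complex)
gamma = np.diag([-1.0, 1.0]).astype(complex)

print(np.allclose(sigma @ gamma, gamma @ sigma))  # True: they commute

# The basis vector (1, 0) plays the role of |sigma+ gamma->.
ket = np.array([1, 0], dtype=complex)
print(sigma @ ket)            # +1 * ket
print(gamma @ ket)            # -1 * ket
print(sigma @ (gamma @ ket))  # (+1)(-1) * ket, same in either order
```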

Thus, matrix operators allow us to encapsulate the three pieces of information that, along with a state vector, completely describe an experiment:

  1. The result will be one of the eigenvalues.
  2. Their likelihoods are described by way of the intermediate vector and the resulting expected value.
  3. The system will collapse into the corresponding eigenvector.

Measurements on Two Systems

Now that we’ve developed our experiments for single systems, we can continue to expand to “combined systems.” A combined system is exactly what it sounds like: multiple systems put together. These systems are treated as single systems with single outcomes, but this can be confusing because all results are combinations of results from the constituent systems.

Mathematically, we combine independent quantum systems in the same way we combine independent joint probabilities: by multiplying them together. Because we’re dealing with matrices, we have to specify the kind of multiplication, which is in this case the tensor product. This method is desirable because the tensor product of a $u$-dimensional vector and a $v$-dimensional vector has dimension $uv$. So, a tensor product of $2$-dimensional vectors $\ket{uv} = \ket{u} \otimes \ket{v}$ has $4$ dimensions, just like how a system of separate binary states yields four possibilities in the picture above. Our new four-dimensional basis vectors are $\rbket{l}{r}$, $\rbket{l}{l}$, $\rbket{r}{l}$, and $\rbket{r}{r}$. We can also combine operators by taking their tensor product. If we only want to measure $\rtext{HP}$, for example, we can create a combination operator $\rtext{HP} \otimes \btext{I}$, where $\btext{I}$ is the $2 \times 2$ identity matrix. In this case, $\rtext{HP} = \btext{HP} = HP$, which is to say that the colors’ only purpose is to help us keep track of which system things correspond to.
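
In NumPy the tensor product is np.kron, so a combined basis ket and the combined operator $\rtext{HP} \otimes \btext{I}$ come out as (another sketch of my own):

```python
import numpy as np

l = np.array([1, 0], dtype=complex)
r = np.array([0, 1], dtype=complex)

# A combined basis ket |lr> = |l> (x) |r>, now four-dimensional.
lr = np.kron(l, r)
print(lr.real)  # [0. 1. 0. 0.]

# Combine operators the same way: HP acting only on the red system.
HP = np.array([[1, 0],
               [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)
HP_red = np.kron(HP, I)  # a 4x4 operator
print(HP_red.shape)      # (4, 4)
```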

This operator example is special because it demonstrates an interesting characteristic of combined systems. Even though we consider them as a whole, it is possible for experiments to act only on a part of the system. Consider the hypothetical combination of $\rket{s}$ and $\bket{l}$, where $\rtext{s}$ is the half-left/half-right ket for the red experiment and $\btext{l}$ is the left ket for the blue experiment. $\rtext{HP} \otimes \btext{I}$ will only collapse the red half of the combination vector, so that the only possible results are $\rbket{r}{l}$ and $\rbket{l}{l}$. The state vector for the system as a whole would be $\ket{c} = \frac{1}{\sqrt{2}} \rbket{r}{l} + \frac{1}{\sqrt{2}} \rbket{l}{l}$. Up to this point, state combinations and measurements are fairly intuitive.

Quantum systems start to create paradoxes once you explore the different states a combined experiment can take on. For example, $\ket{e} = \frac{1}{\sqrt{2}} \rbket{l}{r} + \frac{1}{\sqrt{2}} \rbket{r}{l}$ is a perfectly valid combined state with two equally likely outcomes. Note, however, that you can infer the state of the overall system by testing any of the constituent systems, because each of the possibilities has a unique outcome for each constituent. Testing a single system is equivalent to testing the whole system, and will in fact collapse the combined state as a whole. So, regardless of whether you start with the red or the blue system, the outcome of the first test will be completely uncertain (because all possibilities are equally likely) but the outcome of the second test will be completely certain (because the combined system has collapsed). This is experimentally verifiable (and verifiably strange). Because measurements on one system affect the outcome of others, the two are said to be “entangled.” It seems that the first system’s outcome influences the second system to make it non-random, but this effect persists even over distances large enough that any direct influence would have to travel faster than light.
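
Here is a small simulation of that entangled state (my own sketch: measure the red half of $\ket{e}$, collapse, then note that the blue outcome is fixed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Amplitudes in the basis order |ll>, |lr>, |rl>, |rr>.
e = np.array([0, 1, 1, 0], dtype=complex) / np.sqrt(2)

# Measure the red system: P(red = l) sums the |ll> and |lr> terms.
p_red_l = abs(e[0])**2 + abs(e[1])**2
red = 'l' if rng.random() < p_red_l else 'r'  # uncertain: 50/50

# Collapse: zero out amplitudes inconsistent with the red outcome.
keep = np.array([red == 'l', red == 'l', red == 'r', red == 'r'])
collapsed = np.where(keep, e, 0)
collapsed /= np.linalg.norm(collapsed)

# The blue outcome is now certain: r if red was l, and vice versa.
p_blue_r = abs(collapsed[1])**2 + abs(collapsed[3])**2
print(red, p_blue_r)  # either ('l', 1.0) or ('r', 0.0)
```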

Entanglement is one of the most popularized examples of “quantum weirdness,” and now that we understand it we can investigate some of the different arguments about how to interpret these results.

Einstein-Podolsky-Rosen Paradox

The Einstein-Podolsky-Rosen (EPR) Paradox is an argument that attempts to show that entanglement implies quantum theory is incomplete. In this case, “incompleteness” would mean that there is information relevant to the systems quantum mechanics describes that cannot be expressed within the quantum mechanical framework. If the framework were incomplete, there would be some kind of “hidden variable” that just doesn’t fit into our state representation. Ideally, even if these hidden variables couldn’t be known, the shared information behind a composite system state might explain how entangled particles seem to coordinate their outcomes.

While quantum mechanics does technically predict experimental outcomes, many physicists viewed its probabilistic nature as a cop-out. Furthermore, classical theories generally yield results by providing insight into the mechanics of a system. Quantum mechanics cops out again: it makes no attempt to explain why systems behave this way. Proving its incompleteness made sense as a goal for physicists who weren’t satisfied with the theory, despite its agreement with evidence.

Given what we already know about entanglement, the EPR Paradox doesn’t rely on much extra information. To set up the argument, the authors introduce a criterion for measurements to correspond to real, physical states:

If, without in any way disturbing a system, we can predict with certainty (i.e., with probability equal to unity) the value of a physical quantity, then there exists an element of reality corresponding to that quantity.

Essentially, this criterion asserts that if a measurement is unambiguous, there is a definite physical state to which the measurement corresponds. This is as philosophical as it is pragmatic: they are asserting that for us to know a physical quantity, it must come from a real thing that exists. They assert the connection between our eigenvectors, which represent the state of a particle, and the eigenvalues, which are measurements resulting from that state. The nuance here is that they’ve also linked the “realness” of a physical state to the fact that it can be measured. They’re not only saying that measurements imply some abstract mathematical representation of the state, they’re saying that they imply real physical entities (elements of reality) too.

After this assertion, the first main part of the argument is that either quantum mechanics is incomplete or position and momentum cannot exist simultaneously. When, using the quantum mechanics framework, one defines experiment operators to measure a particle’s position and momentum, there is no possible set of simultaneous eigenvectors – like the earlier example of non-commuting experiments, these two measurements just don’t work out together. Because of the assertion, this conflict would mean that there also can’t be a real physical entity having both position and momentum. Naturally, this is either a fundamental aspect of the universe (position and momentum cannot exist simultaneously) or there is a flaw in our framework (quantum mechanics is incomplete).

The second part of the argument attempts to demonstrate that there can be a real physical entity having both position and momentum. Consider an entangled system of two particles. They are assumed to have their own independent states (real physical entities) and it is also assumed that if the particles are separated, they can’t instantly affect each other’s state in any way. So, when the two systems are apart, their states are completely separated. Like the previous entanglement example, these systems happen to be entangled such that measurements on one system determine the inevitable outcome of the same measurement for the other. So, if the two particles are separated so that they can’t affect each other, then measuring the position or momentum of the first gives a value for the second, indicating a corresponding real physical entity for that quantity. But, the experimenter could have just as easily measured the other quantity to get a value for the second system. In either case the second system’s state is the same, because it is a separate physical entity that can’t be affected by measurements on the first system. Since position and momentum can be determined from what must be the same state, the second system’s state is a real physical entity having both quantities.

The result of this argument is that only one of the two possibilities provided by the first part is valid: that quantum mechanics is incomplete. However, this result rests on the two crucial assumptions in the second part, realism and locality, which we will see become suspect as a result of Bell’s Theorem.

Bell’s Theorem

Twenty-nine years after the EPR paper was published, the physicist John Bell proposed a new experiment to further examine the possibility of quantum hidden variables. His results indicate that no local hidden variable theory can account for certain experimental observations that are predicted very accurately by our established probabilistic quantum framework. To show this, he introduces simple probabilistic rules that follow from locality as presented in EPR, and uses them to derive a simple probabilistic inequality that experimental observations violate. This is a proof by contradiction – if assuming realism and locality leads to a falsehood, one of them must not be valid.

There are many experiments suggesting a Bell-type inequality that observations would violate, and so there are many variants of this theorem with different contexts. Here, we will describe experiments on polarized light, which is one of the most common examples. Polarizing materials filter light passing through them on the basis of a “polarization axis,” so that all the photons that pass through a material are aligned to its axis. If you think of light as a wave, polarization seems to simply isolate the waves with a specific oscillation axis. So, for example, no light passes through two filters with orthogonal polarization axes because the first alignment removes all the perpendicular waves that would pass through the second filter. Light isn’t actually a standard wave, but the idea helps to explain how polarization works.

Experimentally, the percentage of light that passes through a second filter is always equal to $\cos^2 \left( \theta \right)$, where $\theta$ is the difference between the axis the light is aligned to and the axis of the filter it’s passing through. This formula is not actually unmotivated – a single photon passing through a filter is an experiment on a single system. Our formula for calculating the probability that a state vector $\ket{s}$ will collapse to a specific basis state $\ket{A}$ is $\vert \inner{s}{A} \vert ^2$, and since state vectors are normalized their inner product is equivalent to the cosine of the angle between them. This connection suggests that the photon’s state vector corresponds to a direction. What’s interesting here is that any photon passing through a filter will become strictly aligned to that filter’s axis, because it has collapsed into that basis state. Both the cosine rule and the photon’s collapse fit naturally into the probabilistic quantum framework, but as the theorem will indicate, they do not fit in nicely with local variables.
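
To see the $\cos^2$ rule fall out of the inner product, treat polarization states as normalized two-dimensional vectors (a sketch with real amplitudes only, which is all this example needs):

```python
import numpy as np

def polarization_ket(theta_deg):
    """A polarization state aligned theta degrees from vertical."""
    t = np.radians(theta_deg)
    return np.array([np.cos(t), np.sin(t)])

vertical = polarization_ket(0)
tilted = polarization_ket(22.5)

# P(pass) = |<filter|photon>|^2 = cos^2(theta).
p = abs(np.dot(tilted, vertical)) ** 2
print(p, np.cos(np.radians(22.5)) ** 2)  # both ~0.854
```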

So, we will prepare the Bell experiment as follows. Let there be three filters, $A$, $B$, and $C$, with polarization axes that are $0 ^\circ$, $22.5 ^\circ$, and $45 ^\circ$ away from the vertical axis respectively. The variable $e$ will serve as a stand-in for one of these filters. All photons will be initially aligned vertically, so that there is a $100 \%$ chance they pass through $A$. Bell’s “simple probabilistic assumptions” are based on an experimental setup with two entangled particles that will always either both pass or both be blocked by separated filters with the same orientation, even though the two outcomes are equally likely. Under the hidden variable theory, there is some explanatory variable $h$ that is shared when the particles become entangled and which cannot change after they are separated.

Now, we will try to define how we expect photons to act. Instead of considering how the entangled pairs act with the same filters, we’re going to look at how they act when they travel through different filters. So, we first designate the probability of outcome $\rtext{r}$ for the first entangled system given all the information about everything else in the system:

$$p_h \left( \rtext{r} \,\vert\, e_{\rtext{r}}, e_{\btext{b}}, \btext{b} \right)$$

It helps to read this equation as, “the probability of outcome $\rtext{r}$ given the shared hidden state $h$, knowledge of the chosen experiments, and the outcome for $\btext{b}$.” Keep in mind that our hidden variables assumption means that neither experiment actually has a probabilistic outcome. Instead, this distribution represents variability in the hidden states that determine the outcomes. In reality, our locality assumptions mean that neither $e_{ \btext{b} }$, the choice of experiment in the blue system, nor $\btext{b}$, that experiment’s outcome, can influence the red system. Since arguments for $\rtext{r}$ apply symmetrically to $\btext{b}$, the following should be true:

$$p_h \left( \rtext{r} \,\vert\, e_{\rtext{r}}, e_{\btext{b}}, \btext{b} \right) = p_h \left( \rtext{r} \,\vert\, e_{\rtext{r}} \right)$$

$$p_h \left( \btext{b} \,\vert\, e_{\btext{b}}, e_{\rtext{r}}, \rtext{r} \right) = p_h \left( \btext{b} \,\vert\, e_{\btext{b}} \right)$$

$$p_h \left( \rtext{r}, \btext{b} \,\vert\, e_{\rtext{r}}, e_{\btext{b}} \right) = p_h \left( \rtext{r} \,\vert\, e_{\rtext{r}} \right) p_h \left( \btext{b} \,\vert\, e_{\btext{b}} \right)$$

The last equation is especially important. In this system, the fact that outcomes are strictly determined by a local hidden variable means they are independent of each other, and the joint probability of two independent events is equal to the product of their individual probabilities. In the context of our experiment this means that the probability for the entangled photons to go through two separate filters must be equal to the probability that two non-entangled photons would go through their respective filters separately. This property is called Bell Locality, and it is considered the criterion for locality in this experiment.

Using the last equation, we can finally create a Bell inequality. Consider testing the photon pair by sending one through $A$ and one through $C$. Let’s ask: what is the probability of a hidden state that allows the two photons to pass through $A$ and $B$ as well as through $B$ and $C$? We know that because of the hidden variable, two particles will always behave the same way for the same filters, so the number of photon pairs that can pass through $A$ and $B$ as well as $B$ and $C$ should be strictly less than the number that can pass through $A$ and $C$. Think of it this way: all photon pairs that get through $A$, $B$, and $C$ can get through $A$ and $C$, but not the other way around. So, the following should be true:

$$p_h \left( \rtext{r}, \btext{b} \,\vert\, A, C \right) \ge p_h \left( \rtext{r}, \btext{b} \,\vert\, A, B \right) p_h \left( \rtext{r}, \btext{b} \,\vert\, B, C \right)$$

The crux of Bell’s Theorem is that this inequality is not true. Experimentally, we see $p_h \left( \rtext{r}, \btext{b} \vert A, C \right) = 0.500$ and $p_h \left( \rtext{r}, \btext{b} \vert A, B\right) p_h \left( \rtext{r}, \btext{b} \vert B, C \right) = 0.729$. It is more likely that a photon pair gets through $A$, $B$, and $C$ than it is that they get through only $A$ and $C$. Surprise! Here is the quantum explanation: since the difference between $A$ and $B$ is $22.5^\circ$, it is $\cos^2 \left( 22.5^\circ \right) = 85.4 \%$ likely for a photon passing through $A$ to pass through $B$. Similarly, if one particle passes through $B$, we know that the other would too, inducing collapse of the wave function into alignment with $B$. Then, because $B$ and $C$ are also $22.5^\circ$ apart, it is also $85.4 \%$ likely for a photon to pass $C$. $0.729 = (0.854)(0.854)$, which is where the value comes from. Entanglement in this experiment seems to create a situation that violates strikingly simple logic.
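
Those quantum-mechanical numbers come straight from the $\cos^2$ rule; here is the arithmetic as a short sketch:

```python
import numpy as np

def p_pass(theta_deg):
    """Probability of passing a filter theta degrees from alignment."""
    return np.cos(np.radians(theta_deg)) ** 2

# Left side: pass A (certain for vertical photons), partner passes C.
p_ac = 1.0 * p_pass(45)                # 0.500

# Right side: A to B, then B to C, each step 22.5 degrees apart.
p_ab_bc = p_pass(22.5) * p_pass(22.5)  # ~0.729

print(p_ac, p_ab_bc)  # 0.5 < 0.729: the inequality is violated
```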

Again, this is essentially a proof by contradiction. That Bell Inequalities do not hold implies the underlying assumptions about locality and realism must also not hold. While the theorem takes aim at hidden variable theories, it only eliminates those in which the hidden variables are local. Still, these results cast doubt on some of the main assumptions we hold about the way the universe works, and it is easy to see why many physicists regard this experiment as one of the most profound in history.

Conclusion

The most appropriate conclusion might just be that when things are tiny, physics is strange. I’m sure nobody started reading this expecting things to make sense, but I hope that these diagrams and explanations make the various phenomena, experiments, and surprises a bit easier to understand. While both the EPR Paradox and Bell’s Theorem are complex technical arguments about how the universe functions in extreme conditions, they arise essentially from measuring well-known characteristics of systems with only one or two particles. It is certainly impressive to gain so much insight from such simple circumstances. Since you’ve reached the end, thanks for your attention, and if you’d like to learn more I have some resources listed below.

  1. The Theoretical Minimum
  2. QED
  3. Stanford Encyclopedia of Philosophy, EPR Paradox
  4. Stanford Encyclopedia of Philosophy, Bell’s Theorem
  5. 3Blue1Brown and MinutePhysics tackle Bell’s Theorem