2  Mathematical Formalism

In the previous chapter, we gave a brief overview of the origins and development of quantum mechanics, aimed largely at providing meaningful historical context for how we ended up with such a theory. With that context clarified, we are now ready to explore the subject matter itself.

Of course, before we delve into our study of quantum mechanics, we will need to establish the fundamental framework underlying the subject. This will involve introducing a fair amount of mathematical formalism, focused on important elements of linear algebra, notably the study of spaces of wavefunctions (Hilbert spaces). We will introduce other mathematical principles later on as needed, but we summarize the key aspects of the relevant linear algebra here to give us a solid springboard.

There will be a lot of math in this chapter; indeed, it introduces more math than any other individual chapter, which may feel rather overwhelming. Do not fret; the goal is not to master all of the math we’re about to explain with the utmost rigor, but rather to obtain a basic framework so that the math we’ll use to develop the remaining theory of quantum mechanics will make sense.

Let’s now begin with the formalism. For the time being, we will assume some prior familiarity with linear algebra objects like vector spaces and linear maps, and a bit of knowledge about complex numbers, such as the definition of a complex conjugate. If you’re interested in learning more about these, feel free to check out the appendices at the end of this book, or consult a linear algebra textbook (such as Axler 2023).1 Now, on with the show!

2.1 The Wavefunction

The central focus of our analysis is the wavefunction of a system, which we’ll denote with a capital or lowercase \(\psi\). As the name suggests, this is a function that takes certain inputs (such as position, momentum, etc.) and can be used to ascertain other properties (such as energy). Solving for the wavefunction of a system is the central task of quantum mechanics, since the wavefunction fully represents a system/state; this will be expanded on further in [POSTULATES REF].

Now, since the wavefunction has to represent physically realizable states (e.g., an electron or a hydrogen atom), there are certain restrictions we must enforce on what type of function this can be. What restriction can we put on a function describing some sort of physical system? Answering this question involves understanding what the wavefunction is meant to encode in the first place. The key insight is that the wavefunction encodes probability: the squared magnitude \(|\psi(x)|^2\) is interpreted as the probability density for finding the particle at position \(x\). When we measure a particle’s position, we know that it must exist somewhere in space. Mathematically, this means that the total probability of finding the particle anywhere in space must be 1: \[ \int_{-\infty}^{\infty} |\psi(x)|^2 \ dx = 1 \]

Remark. The interpretation that the squared norm of the wavefunction encodes probability was a nontrivial leap that took physicists years to uncover. We applaud Max Born’s discovery, but will make no attempt to naturally arrive at this conclusion ourselves.

When \(\psi\) satisfies this property, we say that it is normalized. To make our lives a little bit simpler for now, we can loosen this restriction and only enforce that the function is normalizable: \[ \int_{-\infty}^{\infty} |\psi(x)|^2 \ dx < \infty \] This condition ends up being sufficient for our discussion, since any wavefunction satisfying it can be normalized by multiplying \(\psi\) by the reciprocal of the square root of the integral: \[ \int_{-\infty}^{\infty} |\psi(x)|^2 \ d x = A \implies \int_{-\infty}^{\infty} \left|\frac{1}{\sqrt{A}} \psi(x)\right|^2 \ d x = 1 \] Normalizability, together with \(\psi\) being complex-valued, is the only restriction on the kinds of \(\psi\) we are allowed to consider. The collection of such \(\psi\) forms a vector space, and since \(\psi\) is complex-valued, we call this a complex vector space. We can then take this space and attach an operation called the inner product to produce a Hilbert space.2 Specifically, the inner product we use is: \[ \langle \psi(x), \phi(x) \rangle = \int_{-\infty}^{\infty} \psi^{*}(x)\phi(x) \ d x \] where \(\psi, \phi\) are two wavefunctions and the \(*\) represents the complex conjugate. Combining this inner product with the set of \(\psi\)’s defines our Hilbert space. The reason we do this is because, as we will see, this framework ends up being a very natural (and elegant) way to talk about wavefunctions and the relationships between them.
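To make the normalization procedure concrete, here is a minimal numerical sketch (assuming Python with NumPy; the Gaussian wave packet and the grid are illustrative choices, not part of the formalism) that carries out the procedure just described:

```python
import numpy as np

# A sample (unnormalized) wavefunction: a complex Gaussian wave packet on a grid.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
psi = np.exp(-x**2 / 2) * np.exp(1j * 3 * x)

# A = integral of |psi|^2 dx, approximated by a simple Riemann sum.
A = np.sum(np.abs(psi)**2) * dx

# Normalize: multiply psi by 1/sqrt(A).
psi_normalized = psi / np.sqrt(A)
print(np.sum(np.abs(psi_normalized)**2) * dx)  # ~1.0: the state is now normalized
```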

2.1.1 Different Types of Wavefunctions

So far, we have mainly considered wavefunctions defined as functions of position \(\psi(x)\), but it’s entirely possible we could also care about wavefunctions as functions of momentum (or any other variable). In the particular case of momentum, we can convert between the two using what’s known as a Fourier transform: \[ \phi(p) = \frac{1}{\sqrt{2\pi\hbar}}\int \psi(x) e^{-ipx/\hbar} \ dx \] Unlike the positional wavefunctions \(\psi(x)\), which live in what we call configuration space, the Fourier-transformed momentum wavefunctions \(\phi(p)\) live in momentum space. You may question (and rightfully so) whether these wavefunctions are similar enough to be comparable. Fortunately, their norms in their respective spaces are equal (thanks to the Parseval identity, which we will not prove): \[ \int_{-\infty}^{\infty} |\psi(x)|^2 \ d x = \int_{-\infty}^{\infty} |\phi(p)|^2 \ dp \] Thus, these wavefunctions in momentum space form their own separate Hilbert space (in terms of the momentum \(p\), instead of the position \(x\)). It should be stressed that there is no one correct wavefunction: \(\psi(x)\) is just as valid a wavefunction as \(\phi(p)\), and the choice of which one to use will depend on the problem at hand. Given that we have these choices, though, this is a good point to take a step back and develop some general notation that is agnostic to the variable we ultimately choose.
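As a quick numerical sanity check of the Parseval identity, here is a sketch (again assuming Python with NumPy; the Gaussian state, the finite grids, and the choice \(\hbar = 1\) are illustrative assumptions) that computes \(\phi(p)\) by direct numerical integration and compares the two norms:

```python
import numpy as np

hbar = 1.0
x = np.linspace(-20, 20, 4001); dx = x[1] - x[0]
p = np.linspace(-10, 10, 2001); dp = p[1] - p[0]

# A normalized Gaussian wave packet in position space.
psi = (1 / np.pi) ** 0.25 * np.exp(-x**2 / 2)

# phi(p) = (2*pi*hbar)^(-1/2) * integral of psi(x) exp(-i p x / hbar) dx, as a direct sum.
phi = np.array([np.sum(psi * np.exp(-1j * pk * x / hbar)) * dx for pk in p])
phi /= np.sqrt(2 * np.pi * hbar)

# Parseval: the position- and momentum-space norms agree (both ~1 here).
print(np.sum(np.abs(psi)**2) * dx, np.sum(np.abs(phi)**2) * dp)
```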

2.2 Dirac Notation (Bras, Kets, etc.)

Most of our study will revolve around analyzing quantum states, which raises an important question: how do we properly represent states, and which representation should we choose?

Earlier, we noted that there are various Hilbert spaces of wavefunctions that may be of interest to us (there are even more, but we won’t cover all of them). So, how do we denote a state without giving undue prejudice to one representation over another? This is where we establish a notational formalism that abstracts away from any particular representation: Dirac notation.

2.2.1 Vector Representations

To represent a state of a given system, we will use a ket vector, denoted \(\ket{}\). So, instead of writing our state as a wavefunction of a particular variable, like \(\psi(x)\), we will abstract to the more general state \(\ket{\psi}\).

Remark. When solving physical problems, we will often need to decide on a space and revisit the traditional wavefunction as a function of a variable, but in the most general case, we will use ket vectors. In fact, we use this more abstract representation to later derive the more traditional wavefunctions.

To motivate why we want to do this: recall that in the last section we defined the set of allowed \(\psi\) as a vector space. So, it’s only natural that we think of each \(\psi\) as a vector and represent it using \(\ket{\psi}\). Since these are vectors, we expect them to behave as such, satisfying properties like: \[ c \ket{\psi} = \ket{\psi} c \] for any scalar \(c\). We will denote the complex vector space in which these \(\ket{\psi}\) live by \(\mathcal{E}\).

2.2.2 Dual Spaces

Linear algebra also tells us that vectors and their accompanying vector space have corresponding dual vectors that live in the dual space. These are our bras \(\bra{}\), which live in \(\mathcal{E}^{*}\). Put very simply, these dual vectors (otherwise known as covectors or linear functionals) are linear maps that act on vectors living in \(\mathcal{E}\): \[ \bra{\psi}: \mathcal{E} \to \mathbb{C} \] In other words, any linear operation that takes ket vectors as input and spits out a complex number belongs to our dual space. So, we can act a bra on a ket: \[ \bra{\alpha}(\ket{\beta}) = \braket*{\alpha}{\beta} \] For simplicity, we can drop the parentheses in these products. Given how the dual vector is defined, we then know that this quantity \(\braket*{\alpha}{\beta}\) will be a complex number.

When we want to emphasize that a bra is a map that takes in a ket and returns a complex number, we may reintroduce parentheses to accentuate this. Otherwise, we will drop them.

Bras are defined as linear maps, meaning we naturally expect them to satisfy the usual additivity and homogeneity conditions: \[ \bra{\alpha}(c_1\ket{\psi_1}+c_2\ket{\psi_2}) = c_1 \braket*{\alpha}{\psi_1}+ c_2\braket*{\alpha}{\psi_2} \] Additionally, scalar products and sums of bras work exactly as we would expect: \[ (c\bra{\alpha})(\ket{\psi}) = c\braket*{\alpha}{\psi}, \quad (\bra{\alpha_1} + \bra{\alpha_2})(\ket{\psi}) = \braket*{\alpha_1}{\psi} + \braket*{\alpha_2}{\psi} \]

Remark. If you’re struggling to remember which name refers to which state, think about bras and kets coming in pairs, together forming “brackets” \(\braket{}\). The “bra” comes first and points left, thereby representing the dual, whereas the “ket” comes second and points right, representing the traditional vector.

2.2.3 Connecting Bras and Kets

Even though we established what bras and kets are and the correspondence between the two may seem obvious, that’s not good enough for us mathematically: we need an explicit way to convert between bras and kets. For this, let’s consider some function \(g\) that takes in two kets and spits out a complex number: \[ g: \mathcal{E}\times\mathcal{E}\to\mathbb{C} \] This function is called a metric (or scalar product) on \(\mathcal{E}\). We want this function \(g\) to satisfy a few key properties:

  1. Linearity in second operand: \(g(\ket{\psi},c_1\ket{\phi_1}+c_2\ket{\phi_2}) = c_1g(\ket{\psi},\ket{\phi_1}) + c_2g(\ket{\psi},\ket{\phi_2})\)
  2. Antilinearity in first operand: \(g(c_1\ket{\psi_1}+c_2\ket{\psi_2},\ket{\phi}) = c_1^{*}g(\ket{\psi_1},\ket{\phi}) + c_2^{*}g(\ket{\psi_2},\ket{\phi})\)
  3. Symmetry (or Hermiticity): \(g(\ket{\psi},\ket{\phi}) = g(\ket{\phi},\ket{\psi})^{*}\)
  4. Positive-definiteness: \(g(\ket{\psi},\ket{\psi})\geq 0\), with equality only when \(\ket{\psi}=0\)

Here, the * refers to the complex conjugate of the number: \[ c = a+bi \implies c^{*} = a-bi \] With that metric established, we are able to properly define the dual correspondence that will convert between kets and bras. We’ll denote this correspondence \(D:\mathcal{E}\to\mathcal{E}^{*}\). This way, when we write a ket \(\ket{\psi}\), we know that the associated bra under this dual correspondence will be denoted \(\bra{\psi}\). Given this metric, we can then explicitly define the bra \(\bra{\psi}\) in terms of how it acts on an arbitrary ket \(\ket{\phi}\): \[\bra{\psi}(\ket{\phi}) = g(\ket{\psi},\ket{\phi})\]
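For readers who find concrete arrays helpful, here is a minimal finite-dimensional sketch (purely illustrative: the true Hilbert spaces of wavefunctions are infinite-dimensional, and the vectors below are arbitrary), written in Python with NumPy, in which kets are column vectors, the dual correspondence is the conjugate transpose, and \(g\) is the usual complex dot product:

```python
import numpy as np

# Kets as column vectors in a small complex vector space (a toy stand-in for the
# infinite-dimensional spaces of wavefunctions).
psi = np.array([[1 + 2j], [0.5j], [3.0]])
phi = np.array([[2.0], [1 - 1j], [1j]])

bra_psi = psi.conj().T                  # the dual correspondence: the conjugate transpose

g = (bra_psi @ phi).item()              # g(|psi>, |phi>) = <psi|phi>
g_swapped = (phi.conj().T @ psi).item()

print(np.isclose(g, np.conj(g_swapped)))       # Hermiticity: <psi|phi> = <phi|psi>*
print((psi.conj().T @ psi).item().real > 0)    # positive-definiteness of <psi|psi>
```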

We’ll denote the bra we get from a ket’s dual correspondence through a dagger \(\dagger\): \[\bra{\psi} = (\ket{\psi})^{\dagger}, \quad \ket{\psi} = (\bra{\psi})^{\dagger}\] This dagger means then that \(\bra{\psi}\) is the Hermitian conjugate of \(\ket{\psi}\) (and vice-versa).

Given how we defined our scalar product to be antilinear in the first operand, we expect the dual correspondence to also be antilinear: \[(c_1\ket{\psi_1} + c_2\ket{\psi_2})^{\dagger} = c_1^{*}\bra{\psi_1} + c_2^{*}\bra{\psi_2}\] From the way this correspondence is constructed, we naturally expect that repeating the operation returns us to the original state: \[(\ket{\psi})^{\dagger\dagger} = \ket{\psi}, \quad (\bra{\psi})^{\dagger\dagger} = \bra{\psi}\] With all of this clarified, we can simplify our notation for the action of bras on kets: \[\bra{\psi}(\ket{\phi}) = g(\ket{\psi},\ket{\phi}) = \braket*{\psi}{\phi}\] This means we can also rewrite our Hermiticity and positive-definiteness conditions: \[\braket*{\psi}{\phi} = \braket*{\phi}{\psi}^{*}, \quad \braket*{\psi}{\psi} \geq 0\] This has allowed us to completely rid ourselves of the \(g\)-function notation, meaning we can now deal exclusively with bras and kets. This operation of acting a bra on a ket has another name which may be familiar to those who have studied linear algebra: the inner product.

Remark. You may ask why we wrote a \(*\) over the combined bra-ket product above instead of a dagger. This is because we will typically reserve the dagger notation for Hermitian conjugates of vectors and operators, while the star refers to the complex conjugate of a scalar (number). Since a bra acted on a ket necessarily produces a scalar, its conjugate will also necessarily be a number.

2.2.4 Some Practice: The Schwarz Inequality

Let’s put our newly-developed notation to some use and prove a very important theorem in linear algebra called the Schwarz inequality.

Theorem 2.1 (Schwarz Inequality) \[ \left| \braket*{\psi}{\phi} \right|^2 \leq \braket*{\psi}{\psi} \braket*{\phi}{\phi} \quad \forall \ket{\psi}, \ket{\phi} \] Equality holds iff3 \(\ket{\psi}\) and \(\ket{\phi}\) are linearly dependent (ie, \(\ket{\psi}=c\ket{\phi}\) for some scalar \(c\)).

Proof. Set some new ket \(\ket{\alpha} = \ket{\psi} + \lambda\ket{\phi}\), where \(\lambda\in\mathbb{C}\), and take the inner product with itself: \[ \braket*{\alpha}{\alpha} = (\bra{\psi} + \lambda^{*} \bra{\phi}) (\ket{\psi} + \lambda \ket{\phi}) = \braket*{\psi}{\psi} + \lambda \braket*{\psi}{\phi} + \lambda^{*} \braket*{\phi}{\psi} + |\lambda|^2 \braket*{\phi}{\phi} \geq 0 \] We know by definition that \(\braket*{\alpha}{\alpha} \geq 0\) (property 4 of the metric \(g\)), with equality only when \(\ket{\alpha} = 0\), or equivalently when \(\ket{\psi} = -\lambda \ket{\phi}\), which is precisely our linear dependence condition. We’ll skip the trivial case where \(\ket{\phi} = 0\), and address the more interesting case when \(\ket{\phi} \neq 0\). Since the above inequality holds for all \(\ket{\psi}\), \(\ket{\phi}\) and \(\lambda\), let’s set \(\lambda=-\frac{\braket*{\phi}{\psi}}{\braket*{\phi}{\phi}}\): \[ \begin{multline*} \braket*{\psi}{\psi} - \frac{\braket*{\phi}{\psi}\braket*{\psi}{\phi}}{\braket*{\phi}{\phi}} - \frac{\braket*{\psi}{\phi} \braket*{\phi}{\psi}}{\braket*{\phi}{\phi}} + \frac{\braket*{\psi}{\phi}\braket*{\phi}{\psi}\braket*{\phi}{\phi}}{\braket*{\phi}{\phi}^2} \\= \braket*{\psi}{\psi} - 2 \frac{\braket*{\psi}{\phi} \braket*{\phi}{\psi}}{\braket*{\phi}{\phi}} + \frac{\braket*{\psi}{\phi}\braket*{\phi}{\psi}}{\braket*{\phi}{\phi}} = \braket*{\psi}{\psi} - \frac{\braket*{\psi}{\phi} \braket*{\phi}{\psi}}{\braket*{\phi}{\phi}} = \braket*{\psi}{\psi} - \frac{\left| \braket*{\psi}{\phi} \right|^2}{\braket*{\phi}{\phi}} \geq 0 \end{multline*} \] Moving the second term to the right and multiplying by \(\langle\phi|\phi\rangle\) gives us our original inequality, as desired. \(\blacksquare\)

As we can see, Dirac notation is compact, consistent, and comprehensible for problem-solving.4

Remark. We illustrated this mathematical proof outside of an appendix to better demonstrate the power of Dirac notation, as well as to illustrate an important theorem that we’ll make heavy use of later on (notably with proving the Heisenberg uncertainty relations).

2.2.5 Why Bother with Dirac Notation?

This is all well and good, but why are we bothering with Dirac notation in the first place? After all, many introductory courses in quantum mechanics simply use traditional wavefunction notation and don’t make use of this notation until much later, so why are we bringing it up now?

There are several reasons why we want to introduce this convention sooner rather than later. Chief among them is the notion of generality. Starting with more special cases like configuration and momentum-space representations and then trying to work backwards to generalizations later can often confuse the matter and make the abstraction seem pointless or needlessly convoluted. Starting with a more general formalism and then deriving special cases later can instead give us better intuition on the generalizations. This also allows us to be more rigorous in our development of the material, as everything will come naturally from this formalism, rather than forcing us to pull certain things out of a hat with minimal explanation as to how we got there.

This does come at a cost, though: it requires us to be much more mathematically technical at the start. We have already delved pretty deep into certain fundamental linear algebra concepts to make any of this make sense. Do not worry; with just a little bit of background, this will make our lives much easier in future discussions.

Soon, we will derive our usual wavefunction representations from Dirac notation, which will make things slightly easier to understand. In the meantime, do not fret; we have done all of this to make our foundation strong.

2.3 Operators and Commutators

Let’s now pivot our attention to operators, which are how we’ll extract meaningful information from our wavefunctions. For our purposes in nonrelativistic quantum mechanics, we will primarily be interested in two types of operators: linear operators and antilinear operators. These are both mappings from a ket space \(\mathcal{E}\) to itself (i.e. \(A: \mathcal{E} \to \mathcal{E}\)), but they behave slightly differently on linear combinations of kets:

\[ \begin{align*} A(c_1\ket{\psi_1}+c_2\ket{\psi_2}) &= c_1A\ket{\psi_1} + c_2A\ket{\psi_2} &&\text{(linear)}\\ A(c_1\ket{\psi_1}+c_2\ket{\psi_2}) &= c_1^{*}A\ket{\psi_1} + c_2^{*}A\ket{\psi_2} &&\text{(antilinear)} \end{align*} \]

In the spirit of total transparency, we will almost exclusively deal with linear operators, but mention antilinear operators as well because we will analyze one such case much later on.

Remark. The keen among you may recognize this one antilinear operator of interest as the time reversal operator. Antilinear operators will appear more frequently when we transition to relativistic quantum mechanics and quantum field theory, but we won’t do that for a very long time.

Given how operators are defined as a mapping from a ket space onto itself, we know that these operators will take in ket vectors as an input and then spit out some other ket vector as an output.

2.3.1 Commutators

From linear algebra, we know that these operators can be scaled by complex numbers and added to each other to produce new operators on the same ket spaces, meaning they also form their own vector space. What about multiplication? Well, we expect the multiplication to be associative for any combination of linear operators \(A\), \(B\), and \(C\): \[A(BC) = (AB)C\] However, it won’t necessarily be commutative: \[AB \neq BA\] Why not? Let’s illustrate with a real-world example.

Example 2.1 (Taking a Stroll) Imagine you are navigating some space and start out facing east. If you walk forward, then turn left, you’ll end up further east than when you started, facing north. What would happen if you instead turned left first, then walked forward? Turning left when facing east would make you now face north, meaning that you’d end up north of where you started, facing north. Despite only performing two steps, the order clearly matters.

[INSERT TIKZ HERE]

If you think of operators acting on ket vectors as instructions we give someone to perform, we see that swapping the order of the operators is equivalent to giving instructions in a different order, which can produce completely different results: \[\begin{align*} AB\ket{\psi} &= A(B\ket{\psi}) = A(B(\ket{\psi})) = A(\ket{\phi_B}) = \ket{\alpha}\\ BA\ket{\psi} &= B(A\ket{\psi}) = B(A(\ket{\psi})) = B(\ket{\phi_A}) = \ket{\beta} \end{align*}\] Thus, multiplication of operators will not act the same as standard scalar multiplication, meaning we’ll have to be careful when specifying order.

Keeping in mind that operators will not necessarily commute, we need a way to quantify exactly how close (or far) two operators are from commuting with each other. To help us with this, we define the commutator:

Definition 2.1 (Commutator) The commutator of two operators \(A\) and \(B\) is the difference between their ordered products: \[[A,B] = AB - BA\]

Similarly, we define the anticommutator:

Definition 2.2 (Anticommutator) The anticommutator of two operators is the sum of their ordered products: \[\{A,B\} = AB + BA\]
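Both definitions are easy to experiment with numerically. Here is a minimal sketch (assuming Python with NumPy; the Pauli matrices are a standard choice of non-commuting operators, used here purely for illustration):

```python
import numpy as np

def commutator(A, B):
    return A @ B - B @ A

def anticommutator(A, B):
    return A @ B + B @ A

# The Pauli matrices: a standard example of operators that do not commute.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

print(np.allclose(commutator(sx, sy), 2j * sz))   # [sx, sy] = 2i*sz, so sx and sy do not commute
print(np.allclose(anticommutator(sx, sy), 0))     # {sx, sy} = 0: they anticommute
```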

Naturally, the operators will commute when their commutator is \(0\); in that case, we can freely interchange their order. Some operators may have an associated inverse, which we’ll denote \(A^{-1}\). This inverse “undoes” the action of the operator:

\[AA^{-1} = A^{-1}A = 1\]

Above, the “1” denotes the identity operator, rather than the number one. In some instances, we will opt to write \(I\) for the identity, but \(1\) is also acceptable.

Notice the hesitant phrasing above: an operator may have an associated inverse. While it’s natural to assume that every operation can somehow be undone, this is unfortunately not always the case. Certain conditions must be met for an operator to have a valid inverse, which are addressed in more detail in our Linear Algebra appendix and other materials. Fortunately for us, many key operators will in fact be invertible.

When the inverses of both operators exist, we have the following relation for inverting their product:

\[(AB)^{-1} = B^{-1}A^{-1}\]

Remark. Another important consideration we glossed over here is the dimensionality of our vector spaces. As it turns out, finite and infinite-dimensional vector spaces behave differently, meaning there’s extra work to be done when trying to extend properties that work in finite-dimensional vector spaces to infinite-dimensional ones. One example is the notion of inverses. When our vector space is finite-dimensional, the inverse of an operator will be both a left-inverse and a right-inverse, meaning it will undo the operation regardless of whether we act it before or after the operator. This intuitively makes sense, since the inverse of an operation shouldn’t depend on order. However, in infinite-dimensional spaces, it’s entirely possible (for reasons which we’ll not explore) for an operator to possess only a left or right-inverse, but not necessarily both. This, among many other reasons, is why we want to be very careful when making the jump from finite to infinite-dimensional spaces (and relevant operators/kets).

A few basic properties arise purely from the definition of the commutator:

  1. Linearity in first operand: \([c_1A_1 + c_2A_2,B] = c_1[A_1,B] + c_2[A_2,B]\)
  2. Linearity in second operand: \([A,c_1B_1 + c_2B_2] = c_1[A,B_1] + c_2[A,B_2]\)
  3. Antisymmetry: \([A,B] = -[B,A]\)

There are two additional properties the commutator obeys that we want to address separately. First is the Jacobi identity:

Lemma 2.1 (Jacobi Identity) \[[A,[B,C]] + [B,[C,A]] + [C,[A,B]] = 0\]

Proof. Expand out each nested commutator: \[\begin{align*} [A,[B,C]] &= [A,(BC-CB)] = A(BC-CB) - (BC-CB)A = \\ &= ABC - ACB - BCA + CBA\\ [B,[C,A]] &= [B,(CA-AC)] = B(CA-AC) - (CA-AC)B = \\ &= BCA - BAC - CAB + ACB\\ [C,[A,B]] &= [C,(AB-BA)] = C(AB-BA) - (AB-BA)C = \\ &= CAB - CBA - ABC + BAC \end{align*}\] Now, add these expressions together and cancel negative terms: \[\begin{align*} [A,[B,C]] + [B,[C,A]] + [C,[A,B]] &= \cancel{ABC} - \cancel{ACB} - \cancel{BCA} + \cancel{CBA} + \\ &+ \cancel{BCA} - \cancel{BAC} - \cancel{CAB} + \cancel{ACB} + \\ &+ \cancel{CAB} - \cancel{CBA} - \cancel{ABC} + \cancel{BAC} = \boxed{0} &&\checkmark \end{align*}\] As desired. \(\blacksquare\)

One more set of basic properties the commutator satisfies are collectively called the derivation property (or Leibniz rule):

Lemma 2.2 (Leibniz Rule) \[[AB,C] = A[B,C] + [A,C]B, \quad [A,BC] = B[A,C] + [A,B]C\]

Proof. We’ll prove both of these by expanding the left-hand and right-hand sides to show they’re equal, starting with the first: \[\begin{align*} [AB,C] &= (AB)C - C(AB) = ABC - CAB = \\ &= ABC - CAB + (ACB - ACB) = \\ &= ABC - ACB + ACB - CAB = \\ &= A(BC-CB) + (AC-CA)B = A[B,C] + [A,C]B \quad\checkmark \end{align*}\] Here, we cleverly added \(0\) to avoid changing the value of the LHS, while also giving us extra terms to rewrite the RHS. We do the same for the second expression: \[\begin{align*} [A,BC] &= A(BC) - (BC)A = ABC - BCA = \\ &= ABC - BCA + (BAC - BAC) = \\ &= BAC - BCA + ABC - BAC = \\ &= B(AC - CA) + (AB-BA)C = B[A,C] + [A,B]C \quad\checkmark \end{align*}\] As desired. \(\blacksquare\)
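Both identities are also easy to check numerically for any concrete operators. Here is a minimal sketch (assuming Python with NumPy; the random \(4\times 4\) matrices and the helper name comm are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)) for _ in range(3))

def comm(X, Y):
    return X @ Y - Y @ X

# Jacobi identity: the cyclic sum of nested commutators vanishes.
jacobi = comm(A, comm(B, C)) + comm(B, comm(C, A)) + comm(C, comm(A, B))
print(np.allclose(jacobi, 0))

# Leibniz rule: [AB, C] = A[B, C] + [A, C]B and [A, BC] = B[A, C] + [A, B]C.
print(np.allclose(comm(A @ B, C), A @ comm(B, C) + comm(A, C) @ B))
print(np.allclose(comm(A, B @ C), B @ comm(A, C) + comm(A, B) @ C))
```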

We will deal with commutators a lot moving forward, and we will often encounter very long and messy expressions that we would greatly prefer to simplify. Some of these properties will help us achieve that. We will, of course, introduce more commutator relations in the future, but we have to start somewhere.

2.3.2 Operators on Dual Vectors

Given how we have defined operators, we expect them to act on a ket vector and output some new ket vector: \[A\ket{\psi} = \ket{\phi}\] In other words, this operator will perform some kind of operation on the ket vector and modify it. What about bras and dual vectors? What kind of action will an operator perform in that scenario?

We will use the following notation to denote the action of an operator on a bra: \[\bra{\psi}A = ?\] Colloquially, we will say that operators act on kets to the right and bras to the left.

Since operators acting on kets will produce other kets, we should expect operators acting on bras to produce other bras: \[\bra{\psi}A = \bra{\phi}\] Given how we defined operators and how bras and kets are related, it is not inconceivable for us to extend their definition to behave this way.5 Now, given this, we can view an operator sandwiched between a bra and a ket either as the resultant bra acting on the ket, or as the bra acting on the resultant ket: \[(\bra{\psi}A)\ket{\phi} = \bra{\psi}(A\ket{\phi})\] This means we can drop the parentheses and treat this as one unit: \[ (\bra{\psi}A)(\ket{\phi}) = \mel{\psi}{A}{\phi} \] We call this combined expression the matrix element.

Remark. In the context of linear algebra and linear maps, the term “matrix element” begins to make sense. Since a matrix is fundamentally a representation of a linear transformation with respect to a chosen basis, sandwiching an operator between a bra and a ket can be interpreted as extracting the relevant matrix element of that operator in that basis.

2.3.3 Outer Products

We previously defined the inner product as a bra acting on a ket from the left, which produces a scalar by the definitions of bras and kets. What if we instead place the bra to the right of the ket, forming \(\ket{\alpha}\bra{\beta}\)? Let’s act this new object on a ket and see what we should expect: \[(\ket{\alpha}\bra{\beta})\ket{\psi} = \ket{\alpha}(\braket*{\beta}{\psi}) = \ket{\alpha}c_1 = c_1\ket{\alpha} = \ket{\gamma_1}\] Thus, this new combination, which we’ll call the outer product, produces a resultant ket when acted on a ket. Similarly, when acted on a bra, we should expect a resultant bra: \[\bra{\psi}(\ket{\alpha}\bra{\beta}) = \braket*{\psi}{\alpha}\bra{\beta} = c_2\bra{\beta} = \bra{\gamma_2}\] Thus, this outer product must itself be a special type of linear operator. Note that the bra action is not the direct definition of the outer product; it is simply a consequence of the ket action, which itself is the proper definition.
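A small sketch (assuming Python with NumPy; the particular kets are arbitrary illustrative choices) showing that the outer product behaves as the operator described above:

```python
import numpy as np

alpha = np.array([[1.0], [1j]]) / np.sqrt(2)
beta = np.array([[1.0], [0.0]], dtype=complex)
psi = np.array([[2.0], [3j]])

outer = alpha @ beta.conj().T            # the outer product |alpha><beta|, a 2x2 matrix

# Acting on a ket: (|alpha><beta|)|psi> = <beta|psi> |alpha>
lhs = outer @ psi
rhs = (beta.conj().T @ psi).item() * alpha
print(np.allclose(lhs, rhs))
```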

2.3.4 Bases

Now, if we have a vector space, we should have a way to build this space out of a few fundamental vectors. A set of linearly independent vectors that can be combined to produce any arbitrary vector in the space is called a basis for the vector space. The number of vectors in a basis defines the dimension of the vector space.

Remark. This is another key place where infinite-dimensional vector spaces behave differently from their finite-dimensional counterparts. We should expect such a space to also possess a basis, since we need to be able to construct it from some combination of vectors, but how many vectors should we need? We know from set theory that there are different kinds of infinities, namely countable and uncountable ones (put simply, countable infinities are ones where we can enumerate each element, and uncountable ones are those where it’s impossible to do so). Fortunately for us, the Hilbert spaces we use in quantum mechanics (so-called separable Hilbert spaces) possess a countable basis, which is good enough for our purposes and ends our discussion of the matter.

We will often need to decompose a vector space into an appropriate basis, or use a basis to construct a relevant vector space. Since the only requirements for a set to be a basis are that its vectors be linearly independent and that they span the entire space, it is entirely possible to have multiple different bases generating a single vector space.

Let’s label a discrete basis of a ket space with numbered kets: \(\{\ket{n}, n=1,2,\dots\}\). If all vectors in the basis are mutually orthogonal (i.e. \(\braket*{n}{m} = 0\) when \(n \neq m\)), we call the basis an orthogonal basis. If, furthermore, all of the vectors of an orthogonal basis are normalized (each has magnitude \(1\)), we call the basis orthonormal. To write this compactly, let’s introduce a new function which we’ll see a lot later on, called the Kronecker delta6:

Definition 2.3 (Kronecker Delta) The Kronecker delta is defined like so: \[\delta_{mn} = \begin{cases} 1 & m=n \\ 0 & m \neq n \end{cases}\]

Thus, we can rewrite the orthonormality condition for our basis into the following really compact form: \[\braket*{n}{m} = \delta_{mn}\]

Remark. We will use index notation like the Kronecker delta above to simplify notation often throughout quantum mechanics (and in some instances in other fields of physics as well). Index notation will be clarified a bit more in some appendices, and will come more in handy when we start dealing with tensors, which will come a little later.

Given any basis, we can represent an arbitrary ket vector \(\ket{\psi}\) in that space as a linear combination of these basis vectors: \[\ket{\psi} = \sum_{n}c_n\ket{n} = \sum_{n}\ket{n}c_n\] If the basis is orthonormal, the expansion coefficients \(c_n\) are given by the following inner product: \[ c_n = \braket*{n}{\psi} \] This is pretty straightforward to justify given the definition of orthonormal vectors: \[ \braket*{n}{\psi} = \bra{n}\left( \sum_m \ket{m} c_m \right) = \sum_m \delta_{nm} c_m = 0 + 0 + \dots + 1 \cdot c_n = c_n \] This means we can rewrite our expansion as: \[ \ket{\psi} = \sum_n \ket{n}\braket*{n}{\psi} = \left( \sum_n \ket{n}\bra{n} \right) \ket{\psi} \] Since \(\ket{\psi}\) is an arbitrarily-chosen ket vector, the operator in parentheses must leave every ket unchanged, giving us a resolution of the identity: \[\boxed{1 = \sum_{n}\ket{n}\bra{n}}\]
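Here is a minimal numerical sketch of the resolution of the identity (assuming Python with NumPy; the particular orthonormal basis, built from the columns of a random unitary matrix, is an illustrative choice):

```python
import numpy as np

# Build an orthonormal basis of C^3: the columns of any unitary matrix will do.
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(M)
basis = [U[:, [n]] for n in range(3)]

# The sum of the outer products |n><n| reproduces the identity.
resolution = sum(ket @ ket.conj().T for ket in basis)
print(np.allclose(resolution, np.eye(3)))

# Expansion coefficients of an arbitrary ket are c_n = <n|psi>.
psi = np.array([[1.0], [2j], [-1.0]])
coeffs = [(ket.conj().T @ psi).item() for ket in basis]
print(np.allclose(sum(c * ket for c, ket in zip(coeffs, basis)), psi))
```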

2.3.5 A Special Case: Hermitian and Unitary Operators

We will see many kinds of operators, but in quantum mechanics, we are particularly interested in two special types of operators.

Before we discuss them, we need to extend the notion of Hermitian conjugation from kets to operators. Given some operator \(A: \mathcal{E}\to\mathcal{E}\), we expect the Hermitian conjugate \(A^{\dagger}: \mathcal{E}\to\mathcal{E}\) to be another linear operator with the following action on an arbitrary ket: \[A^{\dagger}\ket{\psi} = (\bra{\psi}A)^{\dagger}\] Several immediate consequences result directly from this definition, which we compile here:

Corollary 2.1 (Properties of the Hermitian Conjugate) The following follow from the definition of the Hermitian conjugate of an operator:

  1. \(\mel{\phi}{A^{\dagger}}{\psi} = \mel{\psi}{A}{\phi}^{*}\)
  2. \((A^{\dagger})^{\dagger} = A\)
  3. \((c_1 A_1 + c_2 A_2)^{\dagger} = c_1^{*}A_1^{\dagger} + c_2^{*}A_2^{\dagger}\)
  4. \((AB)^{\dagger} = B^{\dagger}A^{\dagger}\)
  5. \((\ket{\alpha}\bra{\beta})^{\dagger} = \ket{\beta}\bra{\alpha}\)

Proof. The proofs for each are quite straightforward, but let’s illustrate them here. Starting with the first, where we use the Hermiticity of the inner product applied to the ket \((\bra{\psi}A)^{\dagger}\): \[ \mel{\phi}{A^{\dagger}}{\psi} = \bra{\phi} (A^{\dagger} \ket{\psi}) = \bra{\phi} (\bra{\psi}A)^{\dagger} = \left( (\bra{\psi} A)\ket{\phi} \right)^{*} = \mel{\psi}{A}{\phi}^{*} \] Now, for the second, we can apply the above property twice: \[ \begin{align*} \mel{\psi}{(A^{\dagger})^{\dagger}}{\phi} &= \mel{\phi}{A^{\dagger}}{\psi}^{*} = (\mel{\psi}{A}{\phi}^{*})^{*} = \mel{\psi}{A}{\phi}\\ \therefore (A^{\dagger})^{\dagger} = A \quad \checkmark \end{align*} \] Next: \[\begin{align*} (c_1A_1 + c_2A_2)^{\dagger}\ket{\psi} &= (\bra{\psi}(c_1A_1 + c_2A_2))^{\dagger} = ((\bra{\psi}c_1A_1) + (\bra{\psi}c_2A_2))^{\dagger} = \\ &= (\bra{\psi}c_1A_1)^{\dagger} + (\bra{\psi}c_2A_2)^{\dagger} = A_1^{\dagger}(c_1^{*}\ket{\psi}) + A_2^{\dagger}(c_2^{*}\ket{\psi}) = \\ &= c_1^{*}A_1^{\dagger}\ket{\psi} + c_2^{*}A_2^{\dagger}\ket{\psi} = (c_1^{*}A_1^{\dagger}+c_2^{*}A_2^{\dagger})\ket{\psi}\\ \therefore (c_1A_1+c_2A_2)^{\dagger} &= c_1^{*}A_1^{\dagger}+c_2^{*}A_2^{\dagger}\quad\checkmark \end{align*}\] For the fourth, we use what we proved in the first, together with the fact that complex-conjugating an inner product swaps its two factors for their Hermitian conjugates in reverse order: \[ \begin{align*} \mel{\phi}{(AB)^{\dagger}}{\psi} &= \mel{\psi}{AB}{\phi}^{*} = ((\bra{\psi}A)(B \ket{\phi}))^{*} = (B \ket{\phi})^{\dagger}(\bra{\psi}A)^{\dagger} \\ &= (\bra{\phi}B^{\dagger}) (A^{\dagger} \ket{\psi}) = \mel{\phi}{B^{\dagger}A^{\dagger}}{\psi} \\ \therefore (AB)^{\dagger}&= B^{\dagger}A^{\dagger}\quad\checkmark \end{align*} \] Finally, for the last one, we directly invoke the previous part (as well as the dual correspondence between bras and kets): \[(\ket{\alpha}\bra{\beta})^{\dagger} = (\bra{\beta})^{\dagger}(\ket{\alpha})^{\dagger} = \ket{\beta}\bra{\alpha}\quad\checkmark\] We have shown all properties are true, as desired. \(\blacksquare\)

Now, we have properly defined the action of Hermitian conjugation on bras, kets, and operators. As a side note, it’s helpful to describe the action of Hermitian conjugation on scalars (numbers) as simply complex conjugation: \[c^{\dagger} = c^{*}\] With this in mind, we can combine all of these together and write that the Hermitian conjugate simply reverses the order of the quantities and swaps each factor for its Hermitian conjugate: \[\boxed{(c_1A_1\ket{\psi_1} + \bra{\psi_2}c_2A_2)^{\dagger} = \bra{\psi_1}A_1^{\dagger}c_1^{*} + c_2^{*}A_2^{\dagger}\ket{\psi_2}}\] We are now finally ready to discuss a special class of operators that we will see a lot of in quantum mechanics. We start with Hermitian operators:

Definition 2.4 (Hermitian Operator) An operator \(A\) is Hermitian if it is equal to its own Hermitian conjugate: \[A^{\dagger} = A\]

Using the first property of the Hermitian conjugate proved above, we can equivalently write this condition in terms of matrix elements: \[A\text{ Hermitian} \iff \mel{\psi}{A}{\phi} = \mel{\phi}{A}{\psi}^{*} \quad \forall \ket{\psi},\ket{\phi}\] Similarly, we have anti-Hermitian operators:

Definition 2.5 (Anti-Hermitian Operator) An operator \(A\) is anti-Hermitian if it is the additive inverse of its Hermitian conjugate: \[A^{\dagger} =-A\]

An operator being Hermitian or anti-Hermitian is a special case; in other words, if an operator isn’t Hermitian, it’s not necessarily guaranteed to be anti-Hermitian. In general, operators will be neither. That being said, it is always possible to uniquely decompose any operator \(A\) into a sum of a Hermitian and an anti-Hermitian operator.

Lemma 2.3 (Hermitian Decomposition of Operators) Any operator \(A\) can be uniquely decomposed into the sum of a Hermitian and anti-Hermitian operator: \[A = \frac{A + A^{\dagger}}{2} + \frac{A - A^{\dagger}}{2}\]

Proof. Writing the decomposition itself is trivial: the two fractions manifestly sum to \(A\). The more interesting part is showing that these newly-generated operators are Hermitian and anti-Hermitian, which we do by taking their respective Hermitian conjugates: \[\begin{align*} \left(\frac{A+A^{\dagger}}{2}\right)^{\dagger} &= \frac{A^{\dagger}+A^{\dagger\dagger}}{2} = \frac{A^{\dagger}+A}{2} = \frac{A+A^{\dagger}}{2} &&\text{(Hermitian)}\\ \left(\frac{A-A^{\dagger}}{2}\right)^{\dagger} &= \frac{A^{\dagger}-A^{\dagger\dagger}}{2} = \frac{A^{\dagger}-A}{2} = -\left(\frac{A-A^{\dagger}}{2}\right) &&\text{(anti-Hermitian)} \end{align*}\] Thus, the first term in the decomposition is a Hermitian operator, and the second is anti-Hermitian. For uniqueness, suppose \(A = H + K\) with \(H\) Hermitian and \(K\) anti-Hermitian; then \(A^{\dagger} = H - K\), which forces \(H = (A+A^{\dagger})/2\) and \(K = (A-A^{\dagger})/2\). Since this can be written for any operator \(A\), we can always uniquely decompose into a sum of Hermitian and anti-Hermitian operators, as desired. \(\blacksquare\)
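A quick numerical illustration of this decomposition (a sketch assuming Python with NumPy; the random matrix is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

H = (A + A.conj().T) / 2          # the Hermitian part of A
K = (A - A.conj().T) / 2          # the anti-Hermitian part of A

print(np.allclose(H, H.conj().T))      # H is Hermitian
print(np.allclose(K, -K.conj().T))     # K is anti-Hermitian
print(np.allclose(H + K, A))           # together they reconstruct A
```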

We will call a Hermitian operator \(A\) positive-definite if the following is true for all nonzero kets \(\ket{\psi}\): \[ \mel{\psi}{A}{\psi} > 0 \] If we loosen the restriction and allow this matrix element to be \(0\) by swapping in \(\geq\), we call that type of Hermitian operator positive semidefinite.

One more important theorem on Hermitian operators:

Theorem 2.2 (Product of Hermitian Operators) The product of two Hermitian operators \(A\), \(B\) is Hermitian if and only if \(A\) and \(B\) commute.

Proof. The proof is quite simple, but because we have a biconditional, we must prove both directions.

(\(\implies\)) Starting with the forward direction, we will prove that if the product of two Hermitian operators is Hermitian, then they must commute. If the product of \(A\) and \(B\) is Hermitian, then \((AB)^{\dagger} = AB\); at the same time, Hermiticity of \(A\) and \(B\) gives: \[(AB)^{\dagger} = B^{\dagger}A^{\dagger} = BA\] Equating the two expressions implies that they commute: \[[A,B] = AB - BA = 0 \quad\checkmark\] (\(\impliedby\)) Now, we prove the reverse direction, showing that if two Hermitian operators commute, their product must be Hermitian. If \(A\) and \(B\) commute, then we have: \[[A,B] = AB - BA = 0 \implies AB = BA\] Taking the Hermitian conjugate of \(AB\) shows us that the product is Hermitian: \[(AB)^{\dagger} = B^{\dagger}A^{\dagger} = BA = AB \quad\checkmark\] Having shown both directions, we are done. \(\blacksquare\)

Now, we move on to unitary operators.

Definition 2.6 (Unitary Operator) An operator \(U\) is unitary if its Hermitian conjugate is equal to its inverse: \[UU^{\dagger} = U^{\dagger}U = 1 \implies U^{-1} = U^{\dagger}\]

This definition has one immediate (and important) consequence:

Corollary 2.2 (Product of Unitary Operators) The product of two unitary operators \(U_1\), \(U_2\) will always be unitary.

Proof. The proof is quite simple. Let’s throw their product into the definition: \[\begin{align*} (U_1U_2)(U_1U_2)^{\dagger} &= U_1U_2(U_2^{\dagger}U_1^{\dagger}) = U_1U_2U_2^{\dagger}U_1^{\dagger} = U_1U_2U_2^{-1}U_1^{-1} = \\ &= U_1(U_2U_2^{-1})U_1^{-1} = U_1(1)U_1^{-1} = U_1U_1^{-1} = 1\\ (U_1U_2)^{\dagger}(U_1U_2) &= (U_2^{\dagger}U_1^{\dagger})U_1U_2 = U_2^{\dagger}U_1^{\dagger}U_1U_2 = U_2^{-1}U_1^{-1}U_1U_2 = \\ &= U_2^{-1}(U_1^{-1}U_1)U_2 = U_2^{-1}(1)U_2 = U_2^{-1}U_2 = 1 \end{align*}\] Thus, we have shown that \[(U_1U_2)(U_1U_2)^{\dagger} = (U_1U_2)^{\dagger}(U_1U_2) = 1,\] showing that the product of two unitary operators is unitary, as desired. \(\blacksquare\)
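A quick numerical check of this corollary (a sketch assuming Python with NumPy; the helper random_unitary and the QR-based construction are illustrative choices, not part of the formalism):

```python
import numpy as np

def random_unitary(n, seed):
    # The Q factor of a QR decomposition of a random complex matrix is unitary.
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(M)
    return Q

U1, U2 = random_unitary(3, 0), random_unitary(3, 1)
product = U1 @ U2

print(np.allclose(product @ product.conj().T, np.eye(3)))   # U1 U2 (U1 U2)^dagger = 1
print(np.allclose(product.conj().T @ product, np.eye(3)))   # (U1 U2)^dagger U1 U2 = 1
```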

2.3.6 Eigenvalues and Eigenkets

Operators will act on kets to produce other kets (and act on bras to produce other bras). In many instances, this new resultant ket will be completely new and not entirely related to the original input ket. Sometimes, however, acting an operator on a certain ket will simply scale the original ket by some factor: \[A\ket{u} = a\ket{u}\] Here, the ket vector \(\ket{u}\) is called an eigenket of \(A\) (or, equivalently in linear algebra, an eigenvector of the matrix represented by \(A\)), and the complex number \(a\) is the associated (right) eigenvalue of \(A\)7.

We also expect that there may exist a special bra that will only be scaled under the operator: \[\bra{\nu}A = b\bra{\nu}\] In this case, we call the bra \(\bra{\nu}\) an eigenbra of \(A\) and \(b\) a (left) eigenvalue.

Remark. Notice that we labeled eigenvalues associated with eigenkets and eigenbras with the (right) and (left) prefixes, respectively. We do this because we want to be as explicit as possible about whether the scaling factor is associated with the operator acting to the right on a ket or to the left on a bra. We will typically drop these prefixes when dealing with problems, because in finite dimensions, every right-eigenvalue will also be a left-eigenvalue (and vice versa), but this need not be the case in infinite dimensions.

Operators will typically have a bunch of associated eigenvalues. We call the set of all of these eigenvalues the spectrum of the operator. These spectra may be discrete (separate, individual points), continuous (an infinite line), or mixed (some combination of discrete and continuous). We will see examples of each of these as we develop more quantum mechanics.

Generally, there won’t really be a simple way to relate different eigenkets and eigenbras for a given operator; we will simply solve for them and make use of them. However, when we’re dealing with Hermitian operators specifically, we will get a few simplifications that actually will allow us to relate them.

We start with the first meaningful simplification:

Lemma 2.4 (Eigenvalues of Hermitian Operators) Hermitian operators possess real eigenvalues.

Proof. Let’s begin with our Hermitian operator \(A\) and one of its eigenkets \(\ket{u}\) and associated eigenvalue \(a\): \[A\ket{u} = a\ket{u}\] If we multiply both sides by \(\bra{u}\), we get: \[ \mel{u}{A}{u} = \mel{u}{a}{u} = a \braket*{u}{u} \] Let’s now also take the complex conjugate of this: \[ \mel{u}{A}{u}^{*} = \mel{u}{A}{u} = \left( a \braket*{u}{u} \right)^{*} = a^{*}\braket*{u}{u} \] Subtracting these two from each other gives: \[ (a - a^{*}) \braket*{u}{u} = 0 \] Since we know that \(\braket*{u}{u}\) is positive-definite and can only be \(0\) when \(\ket{u}=0\) (which is not a valid eigenket), this means that \(a-a^{*}=0\) for this to be true, meaning \(a^{*}=a\). If a number equals its own complex conjugate, its imaginary part must be \(0\), implying the number is real. Since we took an arbitrary eigenket/eigenvalue pair for an arbitrary Hermitian operator, this means that all Hermitian operators possess exclusively real eigenvalues, as desired. \(\blacksquare\)

Next, we have a way to relate eigenbras and eigenkets:

Lemma 2.5 (Equivalence of Left and Right-Eigenvalues) If a Hermitian operator \(A\) satisfies \(A\ket{u}=a\ket{u}\), with eigenket \(\ket{u}\) and right-eigenvalue \(a\), then \(\bra{u}A = a\bra{u}\): the eigenbra \(\bra{u}\) has the same left-eigenvalue \(a\).

Proof. Take the Hermitian conjugate of both sides of the original eigenvalue equation: \[ A\ket{u} = a \ket{u} \implies (A\ket{u})^{\dagger} = \bra{u}A^{\dagger} = (a\ket{u})^{\dagger} = a^{*}\bra{u} \] Since \(A\) is Hermitian, \(A^{\dagger} = A\), and by Lemma 2.4 its eigenvalue \(a\) is real, so \(a^{*} = a\). Therefore: \[\bra{u}A = a\bra{u}\] as desired. \(\blacksquare\)

Our final simplification is more significant:

Theorem 2.3 (Eigenspaces of Hermitian Operators) The eigenspaces of a Hermitian operator corresponding to distinct eigenvalues are orthogonal.

Proof. Let’s start with two eigenvalue equations for our operator \(A\): \[A\ket{u} = a\ket{u},\quad A\ket{u'} = a'\ket{u'}\] Multiplying the first equation on both sides by \(\bra{u'}\) and the second by \(\bra{u}\) gives: \[ \mel{u'}{A}{u} = a \braket*{u'}{u}, \quad \mel{u}{A}{u'} = a' \braket*{u}{u'} \] Taking the Hermitian conjugate of the second equation (using \(A^{\dagger}=A\) and the fact that \(a'\) is real) gives \(\mel{u'}{A}{u} = a' \braket*{u'}{u}\). Subtracting this from the first equation, we get: \[ \mel{u'}{A}{u} - \mel{u'}{A}{u} = a \braket*{u'}{u} - a' \braket*{u'}{u} = (a - a') \braket*{u'}{u} = 0 \] This therefore means that either \(a - a'=0\) or \(\braket*{u'}{u} = 0\). In the first case \(a'=a\), meaning we’re dealing with two eigenkets corresponding to the same eigenvalue (in which case we’re in one single eigenspace); otherwise \(\braket*{u'}{u} = 0\), which means that eigenkets corresponding to distinct eigenvalues (belonging to different eigenspaces) are orthogonal, as desired. \(\blacksquare\)
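The reality of the eigenvalues and the orthogonality of the eigenkets are both easy to see numerically. Here is a minimal sketch (assuming Python with NumPy; the random Hermitian matrix is an arbitrary example with, almost surely, distinct eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M + M.conj().T) / 2                        # a random Hermitian operator

eigenvalues, eigenvectors = np.linalg.eig(A)    # generic eigensolver, no Hermiticity assumed

print(np.allclose(eigenvalues.imag, 0))         # eigenvalues come out real (Lemma 2.4)

# Eigenkets belonging to distinct eigenvalues are orthogonal (Theorem 2.3); since the
# columns are also normalized, V^dagger V is (numerically) the identity.
V = eigenvectors
print(np.allclose(V.conj().T @ V, np.eye(4)))
```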

2.3.7 Observables

Will the eigenspaces of an operator always “fill up” the space upon which the operator acts? In other words, can we describe every ket vector exclusively in terms of some eigenkets? Generally, no, but there will be some special cases. An operator whose eigenspaces “fill up” the space on which it acts is called complete. This is another instance where we can make tremendous simplifications with Hermitian operators. As it turns out, in finite dimensions, every Hermitian operator is complete. In other words, the sum of the dimensions of the eigenspaces equals the dimension of the entire space on which the operator acts.

Now, extending to infinite dimensions, we unfortunately cannot make the same generalization about Hermitian operators. However, we will be much more interested in those Hermitian operators that will be complete in infinite dimensions. We will call such complete Hermitian operators observables8.

2.3.8 Projection Operators

We introduce one final operator which we’ll need in the next section, called a projector.

Definition 2.7 (Projection Operator) A projection operator (or projector) is an observable \(P\) that is idempotent: \[P^2 = P\]

It’s immediately clear from this definition what eigenvalues projection operators possess: \[P\ket{p} = p\ket{p} \implies P^2\ket{p} = p^2\ket{p} = P\ket{p} = p\ket{p} \implies p^2 = p \implies \boxed{p =0,1}\] This means that a projection operator separates a given Hilbert space \(\mathcal{E}\) into two orthogonal subspaces: every ket vector in one subspace is annihilated (associated with the eigenvalue \(0\)), while every ket vector in the other is unaffected (associated with eigenvalue \(1\)). The subspace associated with the eigenvalue \(1\) is the space onto which \(P\) projects.
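A minimal numerical sketch of a projector (assuming Python with NumPy; the particular ket \(\ket{n}\) is an arbitrary normalized example):

```python
import numpy as np

# A projector onto the one-dimensional subspace spanned by a normalized ket |n>.
n = np.array([[1.0], [1j], [0.0]]) / np.sqrt(2)
P = n @ n.conj().T                               # P = |n><n|

print(np.allclose(P @ P, P))                     # idempotent: P^2 = P
print(np.round(np.linalg.eigvalsh(P), 8))        # eigenvalues are 0 and 1
```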

Projection operators will be meaningful when it comes to measurements, which are crucial to our study of quantum mechanics; that is why we wanted to mention them here.

2.4 Relevant Mathematical Frameworks

You may have noticed that, throughout the introduction of the relevant formalism, we made references to certain mathematical fields without elaborating too much on the frameworks themselves. Physics is fundamentally built upon a mathematical foundation, with different fields/subdisciplines each using their own mathematical frameworks. We address the fundamental frameworks key to quantum mechanics below, indicating how much detail we plan to go into for each, as well as where you can find more information to become better acquainted with the mathematics, should you find that useful for the physics.

2.4.1 Linear Algebra

The entirety of quantum mechanics is fundamentally built on a linear algebra framework, with operators transforming vectors in particular ways to give us meaningful information about a system. As such, a solid foundation in linear algebra is quite helpful if one wants to develop a solid (and intuitive) understanding of quantum mechanics. True proficiency in linear algebra requires at the bare minimum a semester-long undergraduate course, and we have neither the bandwidth nor the desire to produce an entire linear algebra textbook within a quantum mechanics book. To help with some basic ideas, we provide a very succinct development of some of the key points relevant to our study in a Linear Algebra appendix, and recommend the reader peruse an external textbook (such as the previously-mentioned Sheldon Axler’s Linear Algebra Done Right) for a more in-depth treatment.

2.4.2 Fourier Analysis

As we will soon discover for ourselves (and as the reader may already be somewhat familiar with), quantum mechanics at its core uses a wavefunction to describe the behavior of a system. In fact, the Schrödinger equation is itself a wave equation, meaning we will come to care very deeply about periodic functions. A key insight made two centuries ago by Joseph Fourier was that a very broad class of functions can be decomposed into sums (or integrals) of periodic functions. This is accomplished through the Fourier transform that we briefly mentioned at the beginning. A proper treatment of Fourier analysis (and all of its powerful implications) also requires a semester-long undergraduate course at the bare minimum, which we (unsurprisingly) do not have time for, so we won’t attempt to develop the full theory either. That being said, there are only a few insights that are immediately relevant to us, which we will detail in an additional, shorter appendix.

2.4.3 Vector/Tensor Analysis

As we already saw throughout the development of the formalism, our fundamental unit is a vector on which we act some operator to yield another vector. Outer products and the closely-related tensor products will also be of great use to us when we eventually wish to couple ket spaces9. Tensors are even more abstract objects than the vectors and matrices we’ve already been dealing with, which will require us to develop new notation to navigate them.

Remark. We’ve already seen glimpses of this index notation in the Kronecker delta!

Since (as you can probably guess) we don’t have the time/need to thoroughly develop tensors, we offer a brief development of index notation, how to navigate tensors, and what they are, in yet another appendix. One may also wish to consult (INSERT GOOD TEXTBOOK I DON’T KNOW OF ANY ONES OFF THE TOP OF MY HEAD).

2.4.4 Complex Analysis

The reader may be familiar with some aspects of calculus (generalized formally to a field known as real analysis), but what happens when we start dealing with complex-valued functions? Can we perform calculus with them? As it turns out, we can, but the way we do it is not what we may be used to from real analysis. The study of how to deal with such complex functions, known as complex analysis, helps us understand how it works. We won’t need to worry about this until much later on, but when we do inevitably require some tools, we will develop them as necessary. We will add another short appendix to help expand on some of the underlying mathematics, but we strongly encourage the reader to consult a book such as Stein and Shakarchi’s Complex Analysis for a deeper understanding.

2.4.5 Final Note

As mentioned at the very beginning of this exceedingly long chapter, there was a lot of math involved. We have made our best attempt to balance outlining the motivation for studying certain mathematical concepts without being too technical, but someone unfamiliar with linear algebra may still have struggled to keep up. We understand; we threw a lot at you and promised that this setup will all have been worth it in the end. To an extent, we hope you will simply take our word for it; that being said, it is completely natural for some of this not to make sense upon first examination. As such, we cannot encourage independent study strongly enough. It is the responsibility of any aspiring physicist (or scientist in general) to close any knowledge gaps they feel they may have, and it’s unrealistic to expect any single source to do that for everything there is. Thus, if there is anything you feel we do not cover in sufficient detail (which there has to be, by the nature of the content we want to address and the manner in which we wish to present it), we encourage a consultation of the materials listed in our references. You can always come back to these notes after you feel more comfortable with the math; we’ll wait. That being said, we hope we have motivated the material well enough that the mathematics won’t get in the way of a more fundamentally intuitive understanding of what we are developing.

This concludes our development of the fundamental formalism, but not our development of mathematical tools. That said, we will not require nearly as much math development in as dense a space moving forward. Let’s finally get into quantum mechanics…


  1. We highly encourage you to give this book a read! It covers all the formalism with a focus on rigor (we will also try to be rigorous, but only to the degree that it serves to make our point), and while it is not an explicitly physics-centric text, it is a foundational one whose material will be expected of you in higher-level theoretical physics.↩︎

  2. A vector space equipped with a properly-defined inner product is called an inner product space. For a Hilbert space specifically, we enforce extra conditions on this inner product, but these won’t concern us for a while.↩︎

  3. This is a notational shorthand that means “if and only if”.↩︎

  4. You may think initially that this proof isn’t very comprehensible, but think about the alternative: if we didn’t use Dirac notation, we’d have to write every inner product as an integral, every \(\psi\) and \(\phi\) as \(\psi(x)\) and \(\phi(x)\), which would make the expression at least twice as long, and much more intimidating.↩︎

  5. Technically, one must distinguish between the operator that sends kets to kets from the dual operator \(A'\) that sends bras to bras. However, here we can extend the definition of an operator to act on bras as \(\bra{\psi}A = \bra{\phi}\) when \(A \ket{\psi} = \ket{\phi}\), without clashing with our original definition of \(A\), and as such we can use the same symbol \(A\) for the map and its dual.↩︎

  6. This is technically a function, but not really the type of function we are used to dealing with. It is more helpful to think of it as a notational convention.↩︎

  7. The terms “eigenvalue” and “eigenvector” are built from the German eigen, which means “own” or “characteristic”.↩︎

  8. The reasoning behind this nomenclature will become apparent in the next chapter, when we start developing physical interpretations for all of this while setting up the postulates underpinning all of quantum theory.↩︎

  9. Do not worry about what this means at the moment; we will learn about it much later.↩︎