If you’re a math educator of any “seasoning”, you know of the numerous misconceptions that exist when working with mathematical objects. Fractions are a disaster and I could probably write a book on that alone. In fact, this book that I’ve been writing for, oh now, several years, has a decent portion dedicated just to fraction misery. But then there’s Algebra and its misery.

I will admit that when I am teaching a Calculus course or a Probability / Statistics course or basically any course that’s *not* an Algebra course, I desperately just want to take Algebra as a given: that students by now know enough Algebra to properly take the next course. But, the reality is, students have holes in their knowledge. And that’s ok. I have holes in my knowledge. And so do you! But we patch holes over time. Sometimes, to extend this quilting analogy, the patchwork is noticeable because there is a definite break in mental continuity and fluidity in topics, and other times one would never know that certain knowledge came temporally out of context.

Thus, I believe that as educators we have to be mindful of the fact that as much as we want to just point to course prerequisites and not be part of the patching process, we really ought to think about our students’ knowledge continuity as we teach our specific category of courses. It is an extra burden for us, but when that load is shared by others I think we’d find a math-healthier student body in our courses. For example, reinforcing fractions and arithmetic in Algebra means that by the time they get to a Probability course, we can actually work on probability questions rather than fraction review — I spend a non-negligible amount of time reviewing fractions when I teach probability, then I work on Algebra, and before you know it I’m not really dedicating much time to Probability. On the flip side, if I just wanted to teach Probability and not address the prerequisite deficiencies then who am I teaching to and who is learning? Maybe 5 students out of 20?

This brings me to a specific example in an introductory Probability course. I want to use this example to speak to a larger narrative about our (math educators) role in addressing student knowledge gaps.

This example is about defining variance mathematically. There are ostensibly two equations for variance [not computational formulae of which there are a few more]. Namely, these are

$$\mbox{Var}[X] = \mathbb{E}[X^{2}] - \mathbb{E}^{2}[X]$$

and

$$\mbox{Var}[X] = \mathbb{E}[(X - \mathbb{E}[X])^{2}]$$

Additionally, and of course, there is an algebraic link between these.

Before we talk about the algebra, let’s look at the first formula. Even if these are college juniors and seniors taking this course, \(\mathbb{E}^{2}[X]\) is going to throw them off. So, the first thing I like to do is explain the notation.

$$\mathbb{E}^{2}[X] = \mathbb{E}[X]\cdot \mathbb{E}[X]$$
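To make the notation concrete, here is a minimal sketch using a fair six-sided die. The die is my assumed example and the variable names are purely illustrative. It shows that \(\mathbb{E}^{2}[X]\) (square the mean) and \(\mathbb{E}[X^{2}]\) (take the mean of the squares) are different numbers.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, so each face has probability 1/6.
support = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in support)               # E[X]
mean_squared = mean * mean                       # E^2[X] = E[X] * E[X]
mean_of_square = sum(x**2 * p for x in support)  # E[X^2]

print(mean, mean_squared, mean_of_square)  # 7/2 49/4 91/6
```

Working in exact fractions rather than decimals keeps the two quantities easy to compare by hand.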

Learning mathematics is as much about the concepts as it is about working with the mechanics and reading / parsing the symbolism. I believe that as math educators, we’re also part language instructors or at least we should act that way.

Reading, writing, speaking … the three fundamental aspects of engaging in a language. These three things are wrapped around in “grammar”. Mathematics has its own grammar. And its written grammar can be *very* confusing.

Notice also my use of the “blackboard E”: \(\mathbb{E}\), which is a not-so-uncommon typesetting used in math texts. That’s another hurdle. Why isn’t it just \(E\) or \(\mbox{E}\) or \(\mathbf{E}\)? It depends on the textbook and the author’s preferred style! I remember I saw three of these notational styles in one course!

As you continue to read through this, start keeping in mind all these little distractions that are piling up in your students’ minds. They are not distractions for us. We’re used to it and have, in effect, filtered them out. So far, the distractions are: \(\mathbb{E}\) and \(\mathbb{E}^{2}[X]\).

The next nuisance is the inconsistency in the use of brackets or parentheses. Some texts use \(\mathbb{E}[X]\) while others opt for \(E(X)\), but I don’t think I have seen \(\mathbb{E}(X)\). Both of the first two mean “expectation of \(X\)”. Recall that up until this point in a student’s math career, parentheses doubled as grouping symbols in Algebra and arithmetic and then later as part of the notation for functions, as in \(f(x)\) read as “\(f\) of \(x\)”. It’s important to explain here what “expectation” is as a mathematical object. Is \(\mathbb{E}\) an operator? Is it a value that multiplies with \(X\)? What’s the difference between \(\mathbb{E}\) and \(\mathbb{E}[X]\)?

We can tie a bunch of this stuff back to their knowledge and understanding (hopefully) of \(f\) vs \(f(x)\). And if it turns out that this is a confusing point or a knowledge gap, then great! It’s an opportunity to move forward with the course content while simultaneously closing a knowledge gap. For me, it was when I understood (finally) that one way to think of \(f(x)\) was “\(f\) applied at \(x\)” that so much of mathematics became more clear.

All of this matters deeply, because of some properties that we show about expectation. Namely things like

$$\mathbb{E}[aX + b] = a\mathbb{E}[X] + b \mbox{ where }a, b \mbox{ are constants}$$

$$\mathbb{E}[XY] = \mathbb{E}[X]\cdot \mathbb{E}[Y] \mbox{ when } X, Y \mbox{ are independent}$$

and other useful properties of “expectation as a *linear operator*”.
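Both properties can be checked numerically before they are proved. The sketch below assumes fair six-sided dice and arbitrary constants \(a = 3\), \(b = 5\); the helper name `expect` is my own and not from any library.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die (assumed example)


def expect(g):
    """Return E[g(X)] for one fair die (hypothetical helper)."""
    return sum(g(x) * p for x in faces)


# Linearity: E[aX + b] = a E[X] + b
a, b = 3, 5
lhs_linear = expect(lambda x: a * x + b)
rhs_linear = a * expect(lambda x: x) + b

# Product rule for two independent dice: the joint probability of (x, y) is p * p.
e_xy = sum(x * y * p * p for x, y in product(faces, repeat=2))
e_x_times_e_y = expect(lambda x: x) * expect(lambda y: y)

print(lhs_linear, rhs_linear)  # 31/2 31/2
print(e_xy, e_x_times_e_y)     # 49/4 49/4
```

The product check only works here because the joint probability was built as a product, which is exactly what independence means.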

So, we still haven’t gotten to variance. We’ve got a few more things to address. Arithmetic mean, aka, “average” is a gateway to introducing expectation. But this, while intended to ease the student into a new concept has a side effect of locking a student into only one understanding of expectation — that it is a mathematical formality for the grade school calculation, average. Yes and no.

The discrete version of expectation is basically “weighted average”. The continuous version involves an integral. If \(f\) is the probability density of \(X\) and we have some function \(g\) that we apply to \(X\), then $$\mathbb{E}[g(X)] = \int_{-\infty}^{\infty}g(x)f(x)\ dx$$ The observant student may see how this continuous version is similar to its discrete counterpart, where \(f\) is now the probability mass function, $$\mathbb{E}[g(X)] = \sum_{x \in \mathcal{X}}g(x)f(x)$$
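The parallel between the sum and the integral shows up clearly in code. This is a rough sketch under my own assumptions: \(g(x) = x^{2}\), a fair die for the discrete case, and a uniform density on \([0, 1]\) for the continuous case, with the integral approximated by a midpoint Riemann sum.

```python
from fractions import Fraction


def g(x):
    """The function applied to X (assumed example: squaring)."""
    return x * x


# Discrete case: weighted average of g(x) over the support of a fair die.
discrete = sum(g(x) * Fraction(1, 6) for x in range(1, 7))


def f(x):
    """Density of a uniform random variable on [0, 1] (assumed example)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


# Continuous case: midpoint Riemann sum of g(x) f(x) over [0, 1].
# The density is zero outside [0, 1], so that is all we need to integrate over.
n = 100_000
width = 1.0 / n
continuous = sum(
    g((i + 0.5) * width) * f((i + 0.5) * width) * width for i in range(n)
)

print(discrete)    # 91/6
print(continuous)  # approximately 1/3
```

In both cases the recipe is the same: a value of \(g\), times a weight from \(f\), added up.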

and … oops, what’s the difference between \(x\), \(X\), and \(\mathcal{X}\)? It’s not just a font change. Slow careful explanations of these things are vital. \(X\) is the random variable. \(\mathcal{X}\) is the support of \(X\). \(x\) is an element chosen from the support of \(X\). We can draw a collection of \(x\)s from a distribution, but \(x \in X\) is incorrect because \(X\) is not a set, even though lazily and abusing notation we may know what we “mean” (unintended expectation pun) by \(x \in X\).
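One way I find helpful to separate the three symbols is to give each its own kind of object in code. This is only an analogy under my own assumptions (a fair die, and a random variable modeled as a function you call to get a realization), not a formal definition.

```python
import random

support = [1, 2, 3, 4, 5, 6]       # script X: the set of values X can take
pmf = {v: 1 / 6 for v in support}  # f: the probability attached to each value


def X(rng):
    """Stand-in for the random variable: each call yields one realization."""
    weights = [pmf[v] for v in support]
    return rng.choices(support, weights=weights)[0]


rng = random.Random(0)                # seeded so the draws are repeatable
draws = [X(rng) for _ in range(10)]  # each element is a lowercase x

print(draws)
print(all(x in support for x in draws))  # True
```

Note that `x in support` makes sense while `x in X` does not, because `X` is not a collection.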

The mental burdens are piling up.

Ok, finally let’s talk about this algebraic link. We have, by definition,

$$\mbox{Var}[X] = \mathbb{E}[X^{2}] - \mathbb{E}^{2}[X] = \mathbb{E}[(X - \mathbb{E}[X])^{2}]$$

Let’s focus on the two expressions to the right of \(\mbox{Var}[X]\).

$$\mathbb{E}[X^{2}] - \mathbb{E}^{2}[X] = \mathbb{E}[(X - \mathbb{E}[X])^{2}]$$

If you squint a bit like your students might, this looks like the thing we kept harping on in Algebra class as a no-no …

$$(a - b)^{2} = a^{2} - b^{2} \mbox{ NO! Don’t do this!!}$$

It is useful and necessary to go through the algebraic steps that allow us to have

$$\mbox{Var}[X] = \mathbb{E}[X^{2}] - \mathbb{E}^{2}[X] = \mathbb{E}[(X - \mathbb{E}[X])^{2}]$$

I know that our textbooks will go through this. And given how tightly cramped we are for time in the classroom, we might want to leave that as “reading” material. But, for me, this is a perfectly wonderful example of addressing math misery while still moving forward with the subject matter.
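Before the algebra, a quick numeric check can reassure the squinting student that the two variance formulas really do agree, even though the forbidden Algebra move still fails. The sketch assumes a fair six-sided die and the arbitrary numbers \(a = 5\), \(b = 3\).

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # fair die (assumed example)

mu = sum(x * p for x in faces)  # E[X]

# Var[X] = E[X^2] - E^2[X]
var_shortcut = sum(x**2 * p for x in faces) - mu**2

# Var[X] = E[(X - E[X])^2]
var_definition = sum((x - mu) ** 2 * p for x in faces)

print(var_shortcut, var_definition)  # 35/12 35/12

# The Algebra no-no is still a no-no for plain numbers.
a, b = 5, 3
print((a - b) ** 2, a**2 - b**2)  # 4 16
```

The variance identity holds because of properties of expectation, not because squaring distributes over subtraction.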

Starting with

$$\mathbb{E}[(X - \mathbb{E}[X])^{2}]$$

expand the inner expression to

$$\mathbb{E}[X^{2} -2X\mathbb{E}[X] + \mathbb{E}^{2}[X]]$$

This expansion step is a great chance to show how Algebra concepts are being used. Most students are just used to seeing \(a,b,c,x,y,z\) as variables. But I think that by this point (or perhaps earlier) we should start using the more general notion of “object”. Drawing the parallel between \((X - \mathbb{E}[X])^{2}\) and \((a-b)^{2}\) either reinforces or introduces for the first time to some students that they can replace symbols with symbols!

Next, reinforce the “linear operator” properties to obtain

$$\mathbb{E}[X^{2}] -\mathbb{E}[2X\mathbb{E}[X]] + \mathbb{E}[\mathbb{E}^{2}[X]]$$

This is less about Algebra topics being reintroduced and more about “let’s think about the object we’re working with and what properties that operator is endowed with”.

At this point, or maybe earlier, you might have said “but wait, I don’t carry around all these \(\mathbb{E}[X]\) expressions, I replace them with \(\mu\).” No objections. I prefer to wait until this step to remind students what the heck \(\mathbb{E}[X]\) is versus \(\mathbb{E}\).

My rationale for this is I want to see if I will have a rebellion here:

$$\mathbb{E}[2X\mathbb{E}[X]] = 2\cdot\mathbb{E}[X]\cdot\mathbb{E}[\mathbb{E}[X]]$$

utilizing independence of \(X\) and \(\mathbb{E}[X]\). What rebellion? I want to see if anyone objects to \(X\) and \(\mathbb{E}[X]\) being independent. So far, no one has objected and it’s not because they know that these two objects are independent of each other, but more so that they have accepted on faith that this must be true. Case in point is when I ask “Why are \(X\) and \(\mathbb{E}[X]\) independent?” and I get crickets!

There is the colloquial use of a word, and then there is the mathematical term with its associated definition. There is a specific mathematical definition for independence in this probability context and then there is a vague everyday use of the word to mean “two things that are free of each other”. Students jump towards colloquial use of words first before they invoke mathematical definitions; this is natural. As soon as I ask “why” on independence I begin to get some objections. And this is a good thing because I can walk through all the steps of the derivation and never get a pulse on what is understood or not if I don’t converse with my classroom. It’s not enough to simply say “and this step here is a result of independence of \(X\) and \(\mathbb{E}[X]\)”; I have to pull teeth and ask why so that I get a response.

So why are they independent? Well, it’s because \(\mathbb{E}[X]\) is a constant. And ironically, the easiest way to show this is to appeal to the continuous version (but both versions are fine, it’s just that the discrete version can sometimes be a little harder to parse through for students) where we can show that our bounds of integration are from \(-\infty\) to \(\infty\) and the result must be a constant (“a value”) if it exists since we “integrate out \(x\)”. This acts as a Calculus refresher / reminder.
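The mathematical definition of independence can also be checked directly. In this sketch I assume a fair die for \(X\) and treat the constant \(\mathbb{E}[X]\) as a random variable \(C\) that takes one value with probability 1; the names `p_x`, `p_c`, and `joint` are illustrative only.

```python
from fractions import Fraction

faces = range(1, 7)
p_x = {x: Fraction(1, 6) for x in faces}  # P(X = x) for a fair die (assumed)

mu = sum(x * p for x, p in p_x.items())   # E[X] = 7/2, a single fixed number

# C is the "random variable" that always equals mu.
p_c = {mu: Fraction(1)}

# Since C never changes, the event {X = x and C = mu} is just the event {X = x}.
joint = {(x, mu): p_x[x] for x in faces}

# Definition of independence: P(X = x, C = c) = P(X = x) * P(C = c) for all x, c.
independent = all(
    joint[(x, c)] == p_x[x] * p_c[c] for x in faces for c in p_c
)

print(independent)  # True
```

Knowing the value of a constant tells you nothing new about \(X\), and the factorization above is how that intuition is written down.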

Once we’ve gotten past this, we’re now at

$$\mathbb{E}[X^{2}] -2\mathbb{E}[X]\cdot\mathbb{E}[\mathbb{E}[X]] + \mathbb{E}[\mathbb{E}^{2}[X]]$$

Heh heh. Now the evilness has just begun because everything that happens here is probability theory related. What is the expectation of an expectation? That’s the middle term. What is the expectation of the product of expectations? That’s the last term. And the first term? Why can’t \(\mathbb{E}[X^{2}] = \mathbb{E}[X\cdot X]\) be written as \(\mathbb{E}[X]\cdot\mathbb{E}[X]\)?

The “if-then” part of theorem reading is also important. “If \(X\) and \(Y\) are independent random variables, then \(\mathbb{E}[XY] = \mathbb{E}[X]\cdot\mathbb{E}[Y]\)”. Is it possible to choose \(X\) and \(Y\) that are not independent but for which the expectation of the product still equals the product of the expectations? Our theorem doesn’t say it’s not possible. The contrapositive, however, is true. “\(A \implies B\)” is logically equivalent to “\(\mbox{not } B \implies \mbox{ not } A\)”.
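It is in fact possible, and a small example settles the question. The choice below is my own assumed example: \(X\) is uniform on \(\{-1, 0, 1\}\) and \(Y = X^{2}\), so \(Y\) is completely determined by \(X\) and the two are clearly not independent.

```python
from fractions import Fraction

xs = [-1, 0, 1]
p = Fraction(1, 3)  # X is uniform on {-1, 0, 1} (assumed example)

# Y = X^2, so Y depends entirely on X.
e_x = sum(x * p for x in xs)           # E[X]  = 0
e_y = sum(x**2 * p for x in xs)        # E[Y]  = 2/3
e_xy = sum(x * x**2 * p for x in xs)   # E[XY] = E[X^3] = 0

print(e_xy == e_x * e_y)  # True: the product rule holds anyway

# Yet the definition of independence fails at (X = 0, Y = 0).
p_x0 = p     # P(X = 0)
p_y0 = p     # P(Y = 0), which happens only when X = 0
p_x0_y0 = p  # P(X = 0 and Y = 0)

print(p_x0_y0 == p_x0 * p_y0)  # False: 1/3 is not 1/9
```

So the theorem cannot be read backwards: equal expectations of products do not imply independence.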

Now, what of the nested \(\mathbb{E}\) expressions? It’s best to work from the inside, out. \(\mathbb{E}[X]\) is a constant if the expectation exists. So what’s the expectation of a constant? It has to be that same constant! And hence \(\mathbb{E}[\mathbb{E}[X]]\) must be \(\mathbb{E}[X]\).
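Putting these pieces together finishes the derivation. As a sketch, assuming \(\mathbb{E}[X]\) exists so that both \(\mathbb{E}[X]\) and \(\mathbb{E}^{2}[X]\) are constants, the nested expectations collapse:

$$\mathbb{E}[X^{2}] - 2\mathbb{E}[X]\cdot\mathbb{E}[\mathbb{E}[X]] + \mathbb{E}[\mathbb{E}^{2}[X]] = \mathbb{E}[X^{2}] - 2\mathbb{E}[X]\cdot\mathbb{E}[X] + \mathbb{E}^{2}[X]$$

$$= \mathbb{E}[X^{2}] - 2\mathbb{E}^{2}[X] + \mathbb{E}^{2}[X] = \mathbb{E}[X^{2}] - \mathbb{E}^{2}[X]$$

The last step is plain Algebra again: combining like terms, with \(\mathbb{E}^{2}[X]\) playing the role of the “object”.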

I love this little exercise and I don’t mind devoting nearly an entire lecture to this because I get so many teachable points and an incredible amount of information about my class’s math background (yes, I’m using apostrophe s for the possessive).

Once we go through this, I like to take a little more time out to talk about why we calculate average the way we do.

In summary, I think we should poke around the edges of our classes’ math background as we teach our regular content. If you are looking to bounce an idea off someone or want some ideas for a particular lesson, just get in touch!

### From the Twitterverse

The evolution of a more digital world!

you are asked "what time is it?" and you look at your "watch" (generally time-telling device, likely your cell phone) and see it is 12:07pm. But you don't know how many seconds have passed. How do you respond? #mathchat #timechat #FridayThoughts

— M Shah (@shahlock) July 12, 2019

I’m pretty sure fifty years ago, the results would have been different!

In joke land, only @RPhillipsMath got it! @mathtans came close!

here's a #mathriddle

When there were 9 of them, the landlord didn't mind, but as soon as there were 10, he started to charge rent. What were they?

Blog shoutout if you get it! DM me your answer 🙂 #mathsed #mtbos

— M Shah (@shahlock) July 3, 2019

The answer is “ten-ants”.

In riddle land, @ITR13 and @sxpmaths got it! Can you solve this?

Blog shoutout challenge! Give me math words (lower case) except for "integral", "derivative", or "calculus" and I'll give you a score A. Your tasks?

(1) Find words with A / word length >= 4

(2) Find A for the three excepted words #mtbos #mathrecreation 1/2 examples in next tweet

— M Shah (@shahlock) June 28, 2019
