Bigotry, Statistical Discrimination, or Just Our Mortality?

I recently got married. It was easily the happiest day of my life. Getting married, planning the wedding ceremony, the reception, and the honeymoon are all happy memories too. It wasn’t as stressful as the wedding planning books or other couples said it would be. In fact, one couple said that the month leading up to the wedding day would be really stressful and we should be prepared to get into yelling matches. That never came to pass, fortunately! But one thing I did learn in the months leading up to the wedding day was that people had made all sorts of assumptions about me, my wife-to-be, and us as soon-to-be husband-and-wife.

Let me share some vignettes — wedding and non-wedding related.


– Are you having a traditional wedding?
– Yes, we’re having a traditional Christian wedding.
– Oh, no, I meant are you doing an Indian ceremony?

Ah, “traditional” … here used as a synonym for “different” or “foreign”. Also interesting is the follow-up question to clarify. Because I am of “Indian” genetic lineage, the assumption by proxy is that I have some cultural affinity to Indian culture. This I broadly classify as generally innocuous statistical discrimination. I look Indian, plenty of Indian-looking people have an Indian ceremony regardless of the ethnicity of their partner, and so it’s a decent bet that I’d be having an Indian ceremony. Plus, it’s a simple conversation starter.

Cultural Associations

– Are you going to be raising your kids Christian?
– Probably. Who knows.
– Are you ok with that?

Because I couldn’t possibly be Christian … [the above is from people who are unaware of my religious beliefs]. Again statistical discrimination … odds are, if you are brown like me, you’re probably not Christian.

– Wait, you describe yourself as brown?! Aren’t you Indian?

I actually don’t know how to describe myself. My typical default answer is “I’m an American with an acute understanding of Indian culture.” My skin color is in the brown family, which has little to do with my cultural affinity, let alone nationality, especially in the US. I actually don’t like being described as Indian because with that word comes a broad cultural association which isn’t correct.

– But you are from India, right?

Meh … I don’t know what “from” means. Does it mean “of the land”? Does it mean “originally born there” with no further assumptions of cultural affinity? Does it mean “allegiance to”?

– Oh I heard you went on vacation! How was India?
– I went to Belgium. Why would I go to India?

I have nothing to say to that one.

“Different” Names

My brother’s name is Tejus. It’s a “foreign” looking name. Growing up, we played a variety of organized sports. In baseball [Little League], when he was at bat and the announcer had to say his name, the “j” in Tejus was pronounced the way it is in Spanish … Jose, Juan, fajitas, Baja California, Pijijiapan [my favorite city name: five dotted letters in a row!].

Or what of this one?

We are pattern-matching animals. And this pattern matching can get us into some awkward situations. More recently

Ah, but this is Levenshtein distance at play. Close enough and mostly harmless I say. Heck, we’ve probably all done something similar when being introduced to a large group of people and being told names.

– Was his name Mike or Mark? Was her name Jane or Jen?
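For the curious, here is a minimal sketch in Python of the Levenshtein distance mentioned above: the smallest number of single-character insertions, deletions, or substitutions needed to turn one string into another. The function name `levenshtein` is my own choice for illustration, and this is the textbook dynamic-programming version, not a claim about how our brains actually confuse names.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# The name pairs from the vignette above
mike_mark = levenshtein("Mike", "Mark")  # 3
jane_jen = levenshtein("Jane", "Jen")    # 2
```

Short names a couple of edits apart are easy to swap in memory, which is all the "close enough" claim amounts to.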

But interestingly enough, in this article I used “Jimmy”, “Jane”, and “Johnny” for kids’ names and was questioned on my use of non-ethnic names. Can somebody please define “ethnic” for me? Is “Jane” not an ethnic name? In a slight backpedal and an attempt to rephrase, I was then asked what inspired me to use “generic” names. But generic by whose standards?

This is more statistical discrimination: clustering. Sure, I chose those names because I am writing for an English-reading audience that lives predominantly in the US and Canada. I suppose I could write some type of machine learning / genetic algorithm to construct names at random from existing names: Jalpy, Jommy, Jernane, etc. But this would detract from the main point of the article. The names were immaterial, so I chose names common and familiar to an American / Canadian audience. A name like Jalpy or Jommy is almost surely unfamiliar, and hence the reader will likely end up focusing on something irrelevant. Maybe I could have used Jane, Juan, and Jignesh to be “balanced”. But this is weird just the same because of a convoluted notion of “balance”: that multiple cultural groups should be represented.

We can see that we are quickly getting into a circular mess. What culture is Juan now? Jignesh? Jane? Why are those names associated with a culture?

And using “acultural” names like Jalpy, Jommy, Jernane doesn’t solve the problem [what is the problem?]. Since we are pattern-matching animals and since these three names don’t fit a known pattern, what will we do? We’re not going to think of Roswell Greys. We’re going to think of “other” from our perspective of what is familiar. And so, Jalpy, Jommy, Jernane would end up being “foreigners” and without serious character development will remain unconnected to the reader in a not-good-writing-style kind of way.

There doesn’t seem to be any winning in this situation and hence, Jimmy, Jane, and Johnny.

Implicit Power

– Are you going to take his last name?
– No.
– Good for you!

Or, in other cases, the response to my wife from other women was one of disdain and disappointment. For the record, we toyed with joint last names: not Shah-Valentine or Valentine-Shah, but Shalentine or Valenshah. But these became more a point of amusement than anything else. In the end, neither of us felt there was any practical need for a name change, and we both felt that the custom was antiquated.

After getting married, my wife had this conversation repeatedly:

– What should we call you now?
– Mrs. Valentine
– Oh. You’re keeping your name, for teaching?
– No, I’m keeping my name for everything.
– Oh [with disdain or negative surprise]

More interesting to me, and I pray I won’t get crucified for writing it, is that not taking my last name was a point of power. This is interesting because retaining one’s name is a given for men and can be a point of contention for women. That’s not statistical discrimination, that’s something else. In previous generations, taking the husband’s name was the cultural norm. Today it’s more of a choice than it was before. And probably in the future, it’ll be an insult to make such a demand / request. If names are going to be changed, probably the fairest way would be for both to change their names.

But what of the kids? What will their last name be? Whose name lineage continues? A current cultural norm is that the kids take the father’s last name. Some families split the last names, with one child keeping the father’s and the other keeping the mother’s. Some give their kids hyphenated names. But what if our child with the last name Shah-Valentine marries someone with the last name Smith-Brown [too generic?]? What will the last names of our grandkids be? Shah-Valentine-Smith-Brown? This is a mess. Note that it is not a norm for all children to take the mother’s last name.

But then there are standard cultural jokes masking the insults that they are.

– [Referring to my wife and with a snicker] So when’s the boss going to be home?

Because it’s a “joke” that the woman is in charge of the relationship. Because we have phrases like “who’s wearing the pants in that relationship?”. Skirts can’t be in charge. It’s a joke. But it’s not a joke. It’s a deeply ingrained and accepted mockery. It seeps into our subconscious and gains foothold there. And from within the recesses of our minds spring forth the small discrepancies in how men and women are treated. And these grow larger and become cultural norms or institutional bigotry.

Institutional Bigotry

There’s always a need to filter. I filter. You filter. We all filter. We are not infinitely lived beings with the luxury of time to experience different lives. And even if we were infinitely lived, we couldn’t have all experiences simultaneously. That is, we don’t have a quantum existence in which we try all paths at once.

I filter who I listen to on Twitter, for example. I know that there are some people whose tweets are vapid 99% of the time. So, as a general measure I don’t follow them and I ignore everything that they have to say. It’s playing the odds and it’s smart filtering. Maybe that person will say something interesting 1% of the time, but the signal-to-noise ratio is too weak for me to waste my energy in the anticipation of some worthwhile tweet. My time is too valuable for that.

Institutions [abstract and real] filter similarly. Corporations do the same. One recent example of (unintended?) bigotry masquerading as data science is related to certain questions on employment applications, specifically those related to criminal history. A ban on asking about criminal history was intended to give those with a criminal history a fairer chance at restarting their lives. Unfortunately, this is something that employers want to filter on and in the absence of that question, race was used as a proxy. Here’s a link to the paper if you want to at least read the abstract. The authors call this “statistical discrimination”, but it’s of the worst kind. It’s of the kind where we take aggregate statistics about a class of people and then attribute to every individual characteristics implied by those statistics. Of course, this is how statistics is done. In the absence of any uniquely identifying information, the best guess (point estimate) for a complete stranger is the center of the distribution conditioned on the (generic) information provided. But is this what we should be doing?
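To make the point-estimate remark concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the group labels "A" and "B", the scores, and the function names are made up and do not come from the paper or from any real data. It shows only the mechanics: when the group label is all we know, every member of a group receives the same guess, namely the group mean.

```python
from statistics import mean

# Hypothetical, made-up scores for two groups (illustration only)
observations = [
    ("A", 2.0), ("A", 4.0), ("A", 9.0),
    ("B", 5.0), ("B", 6.0), ("B", 7.0),
]


def group_means(rows):
    """Return the mean value for each group label."""
    by_group = {}
    for label, value in rows:
        by_group.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_group.items()}


def point_estimate(label, means):
    """Best guess for a stranger when the group label is all we know."""
    return means[label]


means = group_means(observations)

# How far off the group-level guess is for each individual
errors = [
    (label, value, value - point_estimate(label, means))
    for label, value in observations
]
```

In this toy data, a stranger from group A is guessed to score 5.0 and a stranger from group B 6.0, even though one member of A scores 9.0, higher than anyone in B. The guess is the statistically reasonable one, and it is still wrong for that individual by a wide margin.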

Racism of the form “Hey, you. We don’t want your kind.” is poorly received. It’s arbitrary in its discrimination and is dehumanizing. But! If we attach data and filter as per the data, that is somehow seen as a more “objective” discrimination — statistical discrimination if you will.

Oh, you live in that neighborhood? Your car insurance rates are going to be higher since there are above “normal” accident rates there. But we’ve gotten to a point where the problem of discrimination feels chicken-and-egg like. It’s a poor neighborhood with a high crime rate. Insurance will necessarily cost more pro rata. It makes sense, statistically at least, and we generally accept it. But why does that neighborhood have a high crime rate? Why is it poor?

Virtually no town has socio-economic classes intermingled across the blocks. It’s not aesthetically pleasing for starters. There is almost always a “nice” section and a not-so-nice section. Individuals do this en masse. They huddle around those like them and push away those not like them.

How do you keep the “riff-raff” out of your town? High property taxes [though, to be cynical, you’re really just trading one type of riff-raff for another]. Where will the riff-raff go? To the poor(er) towns. And from here the economic feedback cycle begins, as does the self-sustained filtering. Some areas get revitalized, but all that means is that the poor have been shoved over somewhere else.

We do this at schools regularly when we filter students based on grades. You got a C in your course? Well, then you can’t take the honors class next semester / year. That C was supposed to be a proxy for the level of mastery of the subject by the course’s end. Aside from the general problems that exist with typical averaging of exams and quizzes [dependent, weighted time series data] to create a single number snapshot of a student’s performance, there’s what happens with that number. It’s used to filter — academically talented vs not talented. And repeated use of this leads to longer term effects. Can you be a musician? an artist? an engineer? a mathematician? Some disciplines require the student to show talent at a young age. Latent talent isn’t as readily supported as early talent. How many of us are dormant mathematicians? musicians? artists? but were never able to explore because of the systemic filtering that took place?
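As a small illustration of the single-number problem mentioned above, here is a sketch in Python. The two students, their scores, and the weights are hypothetical, and `weighted_average` is simply my name for the usual weighted mean. The point is that the snapshot throws away the order of the scores.

```python
def weighted_average(scores, weights):
    """Collapse a sequence of scores into one number using the given weights."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical weights for four assessments taken in order over a semester
weights = [2, 3, 3, 2]

# Two made-up students: one steadily improving, one steadily declining
improving = [55, 65, 85, 95]
declining = [95, 85, 65, 55]

grade_improving = weighted_average(improving, weights)  # 75.0
grade_declining = weighted_average(declining, weights)  # 75.0

# The change from first to last assessment, which the average cannot see
trend_improving = improving[-1] - improving[0]  # 40
trend_declining = declining[-1] - declining[0]  # -40
```

Both students land on the same grade, so a filter applied to that grade treats them identically, even though where each one stands at the course’s end is very different.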

Interestingly, schools don’t generally filter on socio-economic status — towns do that via property taxes. Or some private institutions filter for their own sake based on factors that are most conducive to producing good metrics for the school.

Some data-based segregation is probably good, some is a necessary evil, and some is purely evil. The general question is: do we know how we are using the data [formally collected or held within our own intuition] to make decisions?
