*In a camp, there are 79 kids; 27 of them are younger than twelve, 33 are girls, and 30 are boys that are twelve or older. Fill in this chart:*

| | Girls | Boys | All |
|---|---|---|---|
| Younger than twelve | | | |
| Twelve or older | | | |
| All | | | |

The real question was to demonstrate that **this chart can be filled in only one possible way**. So great, let's first enter what we know:

| | Girls | Boys | All |
|---|---|---|---|
| Younger than twelve | | | 27 |
| Twelve or older | | 30 | |
| All | 33 | | 79 |

Now, since each "All" must equal the sum of its parts, you have enough information to find two more cells, namely:

| | Girls | Boys | All |
|---|---|---|---|
| Younger than twelve | | | 27 |
| Twelve or older | | 30 | I must be 79-27=52 |
| All | 33 | I must be 79-33=46 | 79 |

Now we're here:

| | Girls | Boys | All |
|---|---|---|---|
| Younger than twelve | | | 27 |
| Twelve or older | | 30 | 52 |
| All | 33 | 46 | 79 |

Then there are two more cells we know:

| | Girls | Boys | All |
|---|---|---|---|
| Younger than twelve | | I must be 46-30=16 | 27 |
| Twelve or older | I must be 52-30=22 | 30 | 52 |
| All | 33 | 46 | 79 |

Now we're here, and you can finally work out the last cell:

| | Girls | Boys | All |
|---|---|---|---|
| Younger than twelve | | 16 | 27 |
| Twelve or older | 22 | 30 | 52 |
| All | 33 | 46 | 79 |

There are two ways in which we can fill in the last cell - either 33-22 or 27-16 - but either way it's 11! That's no coincidence. Now we have the full table:

| | Girls | Boys | All |
|---|---|---|---|
| Younger than twelve | 11 | 16 | 27 |
| Twelve or older | 22 | 30 | 52 |
| All | 33 | 46 | 79 |

Now let's go back to the original question: **why was there only one way to generate this table?** Remember at each step above there was no choice in what we were able to do. From the outset certain cells **were already forced to be a certain value**. As we calculated what those values were, new cells had the same property and we really had no **choice** at all while filling in the table.
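If you'd rather watch a computer get boxed in the same way, here is a minimal Python sketch of the walk-through. The function name `fill_table` and the grid layout (rows are younger, older, all; columns are girls, boys, all; `None` for an empty cell) are my own assumptions for illustration. It only ever fills a cell when a row or column has exactly one gap, so every move it makes is forced.

```python
# Fill a 3x3 totals table by repeatedly applying "All = sum of the parts".
# None marks an unknown cell. Rows: younger, older, all. Columns: girls, boys, all.
# (This layout and the name fill_table are assumptions made for this sketch.)

def fill_table(table):
    """Return a copy of table with every forced cell filled in."""
    grid = [row[:] for row in table]
    # Each line is three cells (a, b, total) that must satisfy a + b = total.
    lines = [[(r, 0), (r, 1), (r, 2)] for r in range(3)]
    lines += [[(0, c), (1, c), (2, c)] for c in range(3)]
    progress = True
    while progress:
        progress = False
        for line in lines:
            values = [grid[r][c] for r, c in line]
            if values.count(None) != 1:
                continue  # nothing is forced on this line (yet)
            a, b, total = values
            r, c = line[values.index(None)]
            if total is None:
                grid[r][c] = a + b
            elif a is None:
                grid[r][c] = total - b
            else:
                grid[r][c] = total - a
            progress = True
    return grid


camp = [
    [None, None, 27],   # younger than twelve
    [None, 30, None],   # twelve or older
    [33, None, 79],     # all
]
filled = fill_table(camp)
for row in filled:
    print(row)
```

The loop never has to guess or backtrack, which is the same "no choice at any step" argument in code form.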

So, from a fifth grader's perspective that's sufficient. But realize that this is, at the core, a question about degrees of freedom. We were given a matrix and the **constraint** that the last entry in each row must be the sum of the other elements, and the same situation holds for columns. And we showed that once you know 4 entries in the table, you know them all.

But not quite - you must be given 4 **independent** entries from the table. For example, you could start with this information and still generate the full table:

| | Girls | Boys | All |
|---|---|---|---|
| Younger than twelve | 11 | 16 | |
| Twelve or older | 22 | 30 | |
| All | | | |

However, you can't start here and fill out the whole table:

| | Girls | Boys | All |
|---|---|---|---|
| Younger than twelve | 11 | 16 | 27 |
| Twelve or older | 22 | | |
| All | | | |

What's the difference? Think about putting the entries in one at a time. Once you have 11 and 16 in the table above, **adding in 27 to the table doesn't count as adding in new information**. It doesn't count because 27 = 11 + 16 is already forced by the first two entries, so you really only have three independent entries.
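One way to make "independent" precise is to treat the nine cells as unknowns and count ranks. The sketch below is my own linear-algebra reformulation, not something the fifth-grade problem asks for, and the helper names (`rank`, `sum_constraints`, `determines_table`) are hypothetical. It writes the six "All = sum of parts" rules as rows of a matrix, adds one row per known cell, and checks whether the combined rank reaches 9.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    found = 0  # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(found, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[found], m[pivot] = m[pivot], m[found]
        for i in range(len(m)):
            if i != found and m[i][col] != 0:
                factor = m[i][col] / m[found][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[found])]
        found += 1
    return found


def cell(r, c):
    """Index of cell (row r, column c) among the 9 unknowns."""
    return 3 * r + c


def sum_constraints():
    """Six equations: part + part - total = 0 for each row and each column."""
    rows = []
    for r in range(3):
        eq = [0] * 9
        eq[cell(r, 0)] = eq[cell(r, 1)] = 1
        eq[cell(r, 2)] = -1
        rows.append(eq)
    for c in range(3):
        eq = [0] * 9
        eq[cell(0, c)] = eq[cell(1, c)] = 1
        eq[cell(2, c)] = -1
        rows.append(eq)
    return rows


def determines_table(known_cells):
    """True if knowing these cells pins down all nine entries."""
    rows = sum_constraints()
    for r, c in known_cells:
        eq = [0] * 9
        eq[cell(r, c)] = 1  # "this cell has a known value"
        rows.append(eq)
    return rank(rows) == 9


free = 9 - rank(sum_constraints())            # degrees of freedom
good_start = [(0, 0), (0, 1), (1, 0), (1, 1)]  # 11, 16, 22, 30
bad_start = [(0, 0), (0, 1), (0, 2), (1, 0)]   # 11, 16, 27, 22
print(free, determines_table(good_start), determines_table(bad_start))
# prints: 4 True False
```

The six sum rules only have rank 5 (the three row rules and the three column rules overlap by one), which is where the 4 degrees of freedom come from. The second starting point fails because the row for the 27 cell is a combination of rows already in the matrix.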

The talk is titled Fitting Square Pegs in Round Holes with Data. It covers just three of the myriad of situations in which we **want** data in a certain way, yet we can only **get** data in the format we get it.

Check out the deck and the corresponding R-based example material here.

With all the hotness around data science, it's inevitable that a bunch of schools are opening up special programs around data science, but the shit that most people forget to realize is that **data science is built upon mathematics**.

So if you don't know your fundamentals in math, then you're fucked. You can probably get away with producing some data porn for a while, but in whatever soon-to-be-failed startup you join, you will get some data and end up **describing** it, even though the real value is in drawing **useful conclusions** from it.

It's tough to do actual data analysis in an on-site interview (some firms give homework-like exercises; we don't, although I do like the idea), but let's go through some problems we ask and why we like them.

*A bus runs every 15 minutes outside my apartment. If I come down at some random time, how long, on average, will I have to wait before I catch a bus?*

Plenty of people I have interviewed can't even give me their intuition on what the answer is, which usually implies that the candidate is going home without a job offer.

Even for those that have the intuition for the answer, getting it rigorously is still problematic for most, because they can't comprehend **what it means**. What are we describing here? We have a certain amount of time we're waiting, and it's unknown. This is a random variable. In our scenario, this follows the uniform distribution $Unif(0, 15)$, and so once we are here we simply plug and play:

$$ E[Unif(0, 15)] = \int_0^{15} x\left(\frac{1}{15}\right)dx $$

Check it out on Wolfram Alpha in case you can't do it yourself, but math don't lie: it's 7.5.
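If you don't trust the integral either, a quick simulation is a decent sanity check. This is only a sketch under the problem's assumptions: a bus leaves exactly every 15 minutes and I show up at a uniformly random moment in the cycle. The function name `average_wait` and its parameters are made up for illustration.

```python
import random


def average_wait(interval=15.0, trials=100_000, seed=0):
    """Estimate the mean wait for a bus that leaves every `interval` minutes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(trials):
        arrival = rng.uniform(0.0, interval)  # when I come down within one cycle
        total += interval - arrival           # time left until the next bus
    return total / trials


estimate = average_wait()
print(round(estimate, 2))
```

With this many trials the estimate should land very close to 7.5, agreeing with the expected value above.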

Okay - so this was only the warm-up for the actual problem. Next time - the real problem.

But as the Ashley Madison hack reminded me, in this world nothing will ever be deleted anymore.

So all of my thoughts and works here will be my own, and will represent me. But these will be my main points:

- Math isn't the end-all. Specifically, to be a good mathematician in this day and age you need to be a decent computer programmer.
- Data science is a hot thing, but it's little more than mathematical modeling and a shit-ton of marketing.

But there's going to be a lot of cool math in this blog, too, so join along and learn something new every day.

Specifically, I'm going to be working through my blog during the semesters at the same pace as a normal course, so let's kick open the ol' calculus book and learn together.
