The birthday problem — Part 2: The statistical shortcut

2025-01-13 Written by Dimitris Kokoretsis

#birthday-problem #probability #simulation #Monte-Carlo #statistics #r-programming

In this post, we will answer a question that we previously failed at: What is the probability that two students in a classroom share the same birthday? This is described as a paradox because for a classroom of realistic size, the probability is surprisingly higher than intuitively expected. Previously, we could only solve this for very small classrooms of 2 or 3 students. The reason? Too much data. We will now get around this obstacle by the means of random sampling.

The birthday problem — Part 2: The statistical shortcut

Read more →

How to get data from websites (fast)

2024-01-05 Written by Dimitris Kokoretsis

#data-collection #web-scraping #HTML #r-programming

So far in this blog, we’ve tried to answer questions by creating “fake” artificial data, and finding patterns in it. That’s fun and all, but sometimes we just need data from the real world. Take for example the simple question: Which are the 10 most densely populated countries? In this post we’ll focus on data collection instead of analysis. As long as the data is accessible and somewhat structured in a webpage, we’ll see how to extract it for further processing and analysis - without tedious copy-pasting but with a code-based process called web scraping.

Read more →

The birthday problem — Part 1: When counting fails

2022-11-30 Written by Dimitris Kokoretsis

#birthday-problem #probability #counting #algorithmic-complexity #r-programming

After the “toy problem” of the previous post, I decided to take a chance at a more tangible, real-world question. Consider this one: What is the probability that two students in a classroom share the same birthday? Intuitively (and correctly), you might think it depends on the total number of students. But for a realistic classroom of around 20 students, the probability should be quite low - whatever “low” means. After all, how many times have you witnessed that?

The birthday problem — Part 1: When counting fails

Read more →

Drawing random biscuits: from probability to data

2022-08-28 Written by Dimitris Kokoretsis

#probability #frequency #statistics #Monte-Carlo #simulation #r-programming

I stumbled upon a probability exercise when I had just built enough confidence with R scripting, and I was looking for “toy problems” to play with. It was a simple brain teaser, barely more complicated than the well-known coin toss. I could solve it with pen and paper, but I wanted to squeeze out of it as much as I could. My central thesis in this post is that, if we formulate our assumptions well, we can turn questions about probability into questions about data.

Drawing random biscuits: from probability to data

Read more →

The birthday problem — Part 2: The statistical shortcut

How to get data from websites (fast)

The birthday problem — Part 1: When counting fails

Drawing random biscuits: from probability to data

Main posts featuring Functions