“We start by giving a definition of probability. It is easier to understand with an example. Imagine you have a fair coin. Obviously, the probability is 1/2. Now try some exercises.”

– 99% of books that have “introduction” and “probability” in the title
Probability is defined as a measure. Distribution functions, expectations, and many other basic objects are defined using the Lebesgue integral. Most people who read an introduction to probability do not know what a Lebesgue integral is, so textbook authors avoid talking about it. It’s stupid, because it guarantees that you will have holes in your understanding of the basics.
All of the following is true:

- In a uniform distribution over [0, 1], the realized outcome is guaranteed to have zero probability.
- A uniform distribution over all of ℝ is impossible.
- Some distributions are neither discrete nor continuous, nor a mixture of the two.
- A set of outcomes can have undefined probability, even though every individual outcome x (technically, every singleton {x}) has one.

All of the above was mind-melting to me in college, but it’s actually quite trivial. Here’s what helped me:
Pick up Jaynes’s Probability Theory: The Logic of Science and read the first few chapters. There are a lot of reviews complaining that this book is extremely opinionated. This is good. Opinionated is what you need.
Rather than relying on measure theory, Jaynes derives probability as the (only possible) extension of propositional logic that can handle uncertainty. He arrives at the Kolmogorov axioms, so his probability theory is the same; only the motivation is different. He is the only introductory author I know of who asks “does probability really have to sum to one? What if it didn’t?” He makes Bayes’ rule feel as inevitable as modus ponens, and helps you realize that, actually, things make sense. The book gets dense quickly, but reading just the first few chapters is enough.
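That inevitability is easier to feel once you notice that Bayes’ rule is, mechanically, one line of arithmetic. A minimal sketch (the `bayes` helper and the disease-testing numbers are mine, made up for illustration):

```python
# Bayes' rule: P(H|E) = P(E|H) P(H) / P(E), where
# P(E) = P(E|H) P(H) + P(E|~H) P(~H)  (law of total probability).
def bayes(prior, likelihood, likelihood_given_not):
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# A 1%-prevalence disease and a test with 90% sensitivity, 90% specificity:
posterior = bayes(prior=0.01, likelihood=0.9, likelihood_given_not=0.1)
print(posterior)  # ≈ 0.083: one positive test is far from conclusive
```

The surprising part is never the formula; it is how strongly the prior pulls on the answer.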
Possibly at the same time, start Marco Taboga’s Lectures on Probability Theory and Mathematical Statistics. It is a conventional introduction to probability, but it is clear, accessible, builds up the narrative starting from the axioms, and explicitly points out every time it simplifies things to avoid measure theory. When that happens, you can look up the full measure-theoretic definition on Wikipedia or ask o3, and get a grasp of it quickly. “A random variable is a measurable function from a probability measure space to a measurable space” isn’t difficult to understand once you get there, and it is infinitely better than the most common substitute: “a random variable is a value that has some randomness”. Taboga covers the law(s) of large numbers, the central limit theorem, and basic statistics (t-tests from first principles!). He also has a good explanation of frequentist vs. Bayesian statistics. The book is quite long, but simple, and you can read it cover to cover. If you’re like me, at this point your insecurities about the basics of probability will disappear.
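The law of large numbers in particular is easy to watch happening, not just prove. A toy sketch using only the standard library (the specific sample sizes are arbitrary):

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Law of large numbers: the sample mean of Uniform(0, 1) draws
# converges to the true mean, 1/2, as the sample grows.
for n in (10, 1_000, 100_000):
    sample_mean = statistics.fmean(random.random() for _ in range(n))
    print(f"n={n:>6}  mean={sample_mean:.4f}")
```

Running it, the small sample wobbles while the large one sits within a fraction of a percent of 1/2, which is the theorem doing its job.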
Then you can tackle a good second textbook, like Grimmett and Stirzaker’s Probability and Random Processes, which will lead you all the way to martingales and diffusions. It also has fun exercises.
Also, if you are curious about the actual Lebesgue integral, the first-principles way would be to pick up Terence Tao’s Analysis, an undergraduate course that starts at the Peano axioms, ends at the Lebesgue integral, and proves pretty much everything along the way. But honestly, Wikipedia is enough.
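For orientation, here is the definition everything above circles around, in compressed form for a measure space (X, Σ, μ): the Lebesgue integral is defined first for simple functions, then extended to arbitrary non-negative measurable functions by approximation from below.

```latex
% For a simple function s = \sum_i a_i \mathbf{1}_{A_i} with a_i \ge 0, A_i \in \Sigma:
\int_X s \, d\mu \;=\; \sum_{i=1}^{n} a_i \, \mu(A_i)

% For a non-negative measurable f, approximate from below:
\int_X f \, d\mu \;=\; \sup \Bigl\{ \int_X s \, d\mu \;:\; 0 \le s \le f,\ s \text{ simple} \Bigr\}
```

Two lines, and they are exactly what makes zero-probability outcomes and “neither discrete nor continuous” distributions unproblematic.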