
Back to Basics: Logistic Regression – Less Math, More Intuition

Hey there, data enthusiasts! Today we're going to explore logistic regression in a straightforward, intuitive way while consciously avoiding too much math.


What is Logistic Regression?

Logistic regression is a type of statistical model that helps us predict outcomes that fall into one of two categories. Imagine it like a decision-making tool that helps classify data into distinct groups, like yes/no, spam/not spam, or pass/fail.


Unlike linear regression, which is used to predict continuous values, logistic regression is all about classification.


The core of logistic regression is the logistic function (also called the sigmoid function), which takes any real-valued input and outputs a value between 0 and 1. This value represents the probability of a certain outcome. If the probability is closer to 0, one outcome is more likely; if it’s closer to 1, the other outcome is more likely.

For instance, imagine you’re trying to predict whether it will rain tomorrow—just a simple "yes" or "no." Logistic regression helps us do this by giving us a probability: how likely it is to rain. The result is a number between 0 and 1, where closer to 1 means "very likely" and closer to 0 means "not likely."


But probabilities are tricky to work with directly because they are confined between 0 and 1. Logistic regression gets around this with a few clever transformations that make the math work seamlessly; we will walk through them below.
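Before digging into those transformations, here is a minimal, illustrative sketch of logistic regression in practice. It is not from the article; it assumes scikit-learn and uses made-up toy data, just to show that the model's output is a probability:

```python
# Minimal sketch with scikit-learn on made-up toy data (illustrative only).
from sklearn.linear_model import LogisticRegression

X = [[0.5, 1.2], [1.0, 0.8], [2.3, 2.9], [3.1, 3.5]]  # toy feature rows
y = [0, 0, 1, 1]                                      # binary labels

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns [P(y=0), P(y=1)] for each new row
print(model.predict_proba([[1.5, 1.5]]))
# predict applies the default 0.5 threshold to P(y=1)
print(model.predict([[1.5, 1.5]]))
```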


Thresholds in Logistic Regression

Yes-or-no decisions need to be based on a probability cut-off, which by default is 0.5 (a 50-50 chance of the event happening).


However, in real-world scenarios, choosing the threshold for logistic regression depends on the specific problem and the cost or consequences of making incorrect classifications (false positives or false negatives). A default threshold of 0.5 is common, but it's not a strict rule. The threshold should be determined based on the business context, the data, and the priorities of the problem. Here are examples and scenarios to illustrate how thresholds can be decided, followed by a small code sketch of applying a custom threshold:


1. Customer Churn Prediction


Scenario: Predicting whether a customer will leave a subscription service.

Cost of Error:

  • False Negative: Predicting "No churn" when the customer will actually leave means losing revenue.

  • False Positive: Predicting "Yes churn" and taking retention actions unnecessarily incurs minor costs.


Threshold Decision: Lower the threshold (e.g., 0.4) to maximize recall and reduce churn risk.

2. Medical Diagnosis Example


Scenario: Predicting whether a patient has a disease (Yes/No).

Cost of Error:

  • False Negative (predicting "No" when the patient has the disease) is much worse than a False Positive (predicting "Yes" when they don't).

  • Missing a true case could lead to untreated illness, while a false positive might lead to unnecessary tests.


Threshold Decision: Lower the threshold (e.g., 0.3 or 0.4) to increase sensitivity (True Positives) and reduce the chance of missing cases.
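In code, changing the threshold simply means applying your own cut-off to the predicted probabilities instead of relying on the default. A rough sketch, assuming `model` is an already fitted scikit-learn LogisticRegression and `X_new` holds the rows to score (both placeholder names):

```python
import numpy as np

threshold = 0.4  # e.g. lowered for churn prediction to favour recall

# predict() always uses 0.5, so take P(Y=1) and apply the cut-off manually
proba_positive = model.predict_proba(X_new)[:, 1]
predictions = (proba_positive >= threshold).astype(int)
```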

Why is Logistic Regression Important in Binary Classification?


Logistic regression is crucial for binary classification due to its simplicity and interpretability. Here’s why:


  • Simple and Easy to Understand: quick to implement and straightforward to explain.

  • Interpretable: Helps identify the contribution of features to predictions.

  • Efficient: Works well with smaller, simpler datasets and linearly separable data.

  • Works Well with Linearly Separable Data: If your data can be divided with a straight line, logistic regression is perfect.

  • Probability Output: It gives you probabilities, so you understand the level of confidence in the decision.


Although the output of logistic regression is probabilistic, the underlying process involves a series of transformations of input values: from inputs to a linear combination, to log-odds, through the sigmoid function, and finally to probabilities. The transformed data is then used for training.


But why is such a series of transformations needed? Let’s dive into these steps (a small numeric sketch of the full pipeline follows the list below).


  • Input Features: The model takes input features X =[x1,x2,...,xn].

  • Weights: Each input is associated with a corresponding weight W=[w1,w2,...,wn].

  • Linear Combination: The model computes a linear combination of the input features plus a bias term, z = w1·x1 + w2·x2 + ... + wn·xn + b.

Here, z represents the raw score or logit.


  • Sigmoid Function:
    • The raw score z is the log-odds (logit) of the positive class.

    • Passing z through the sigmoid function σ(z) = 1 / (1 + e^(−z)) maps it to a probability.

    • The output is a value between 0 and 1, representing the probability of the positive class, P(Y=1∣X).

    • In other words, the linear combination models the log-odds, and the sigmoid converts that log-odds back into a probability.

  • Probability to Decision: Based on the threshold, classify the outcome as 1 (positive class) or 0 (negative class).
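Here is the small numeric sketch promised above: a single forward pass through the pipeline with made-up weights and features, just to make the steps concrete (the numbers are illustrative, not from the article):

```python
import numpy as np

x = np.array([2.0, 0.5])    # input features (made-up values)
w = np.array([1.2, -0.7])   # weights (made-up values)
b = -0.5                    # bias

z = np.dot(w, x) + b        # linear combination = log-odds (logit): 1.55
p = 1 / (1 + np.exp(-z))    # sigmoid: log-odds -> probability, ≈ 0.825
label = int(p >= 0.5)       # threshold -> class decision: 1

print(z, p, label)
```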

While the input features and linear combination seem pretty straightforward, why do we still need these multi-step transformations? Why can’t we skip them and just predict probabilities directly? To answer this, we need to understand the core differences between the components involved: odds, logits, and probability.


Odds vs. Probability


  • Probability is just "how likely is something to happen?" For example, if the chance of rain is 80%, the probability is 0.8.


  • Odds go a step further: they compare the likelihood of something happening to it not happening. If the probability of rain is 80%, the odds are 4-to-1 because it’s 4 times more likely to rain than not.


Odds give us a way to think about likelihoods in relative terms.
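In code, the relationship is a one-liner; using the rain example from above:

```python
p = 0.8                 # probability of rain
odds = p / (1 - p)      # 0.8 / 0.2 = 4.0, i.e. "4 to 1" in favour of rain
```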


What is the Log of Odds?


The log of odds (also called the logit) is simply a mathematical way to stretch and re-scale probabilities into a range where we can fit a straight line.


It is exactly what the linear combination of inputs models, and it maps directly to probabilities via the sigmoid function.


Probabilities near 0 or 1 behave non-linearly, making it harder for the model to fit them accurately. That’s why we transform them using the log-odds.


Think of the log of odds as a "middle step" that makes it easier to build a prediction model.
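Continuing the rain example, the logit is just the natural log of the odds. The snippet below also shows how probabilities near 0 or 1 get stretched out to large negative or positive values, which is what lets a straight line model them:

```python
import math

p = 0.8
odds = p / (1 - p)             # 4.0
log_odds = math.log(odds)      # ≈ 1.386, the logit

# The logit is unbounded: probabilities near 0 or 1 map to large values
print(math.log(0.01 / 0.99))   # ≈ -4.6
print(math.log(0.99 / 0.01))   # ≈ +4.6
```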


Why Not Use Probabilities Directly?

Probabilities are tricky to handle because they are bound between 0 and 1. A straight line (as in linear regression) could predict invalid probabilities, like -5 or 1.5, which don’t make sense. Using the log of odds allows us to avoid this issue because it can handle any number.
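To see the problem, here is a quick illustrative sketch with made-up numbers: fitting an ordinary straight line to 0/1 labels happily predicts values outside the valid probability range as soon as you move a little along the x-axis:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)   # binary labels

slope, intercept = np.polyfit(x, y, 1)          # plain linear fit
print(slope * 0 + intercept)                    # ≈ -0.4: below 0, invalid
print(slope * 8 + intercept)                    # ≈ 1.66: above 1, invalid
```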


What is the Sigmoid Function?

Once we’ve modelled the log of odds (a number that can go from -∞ to +∞), we need to bring it back to a probability so it makes sense to humans. That’s where the sigmoid function comes in. The sigmoid function takes this "stretched out" logit value and squashes it back into a range between 0 and 1.


Think of the sigmoid as the final step that turns the model’s output into something meaningful—a probability we can interpret, like "there’s a 75% chance of rain."
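A tiny sketch of the sigmoid itself, showing how it squashes any logit into the 0-to-1 range (the 75%-rain figure above corresponds to a logit of roughly 1.1):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

print(sigmoid(-5))    # ≈ 0.007  -> very unlikely
print(sigmoid(0))     # 0.5      -> even odds
print(sigmoid(1.1))   # ≈ 0.75   -> "75% chance of rain"
```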


Why Use Log of Odds Instead of Log of Probability?

The log of odds is special because it gives us a complete picture of the likelihood of an event happening relative to it not happening. It’s like saying, "It’s not just 80% likely to rain—it’s 4 times more likely to rain than not." This comparison is crucial for building a balanced and interpretable model.


The log of probability, on the other hand, ignores the "not happening" side of the equation and only runs from −∞ to 0, so it treats the two outcomes asymmetrically. That’s why the log of odds is the better choice.
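A quick numeric check of that symmetry, using the 80%-rain example:

```python
import math

p = 0.8
# Log-odds treats "rain" and "no rain" symmetrically around zero:
print(math.log(p / (1 - p)))     # ≈ +1.386 for p = 0.8
print(math.log((1 - p) / p))     # ≈ -1.386 for p = 0.2

# Log of probability loses that symmetry and is capped at 0:
print(math.log(p))               # ≈ -0.223
print(math.log(1 - p))           # ≈ -1.609
```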


Final Summary Flow


  1. Input Features (X) → Linear Combination (z = wᵀX + b).

  2. Log-Odds (z) → Sigmoid Function (σ(z)).

  3. Sigmoid Function (σ(z)) → Probability (P(Y=1∣X)).

  4. Training: Adjust weights and biases using gradient descent to optimize predictions (a minimal sketch follows below).
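For completeness, here is a minimal gradient-descent sketch on toy data, assuming nothing beyond NumPy; it is an illustrative version of step 4, not production code:

```python
import numpy as np

X = np.array([[0.5, 1.2], [1.0, 0.8], [2.3, 2.9], [3.1, 3.5]])  # toy features
y = np.array([0, 0, 1, 1])                                      # toy labels

w, b, lr = np.zeros(X.shape[1]), 0.0, 0.1

for _ in range(1000):
    z = X @ w + b                    # linear combination (log-odds)
    p = 1 / (1 + np.exp(-z))         # sigmoid -> predicted probabilities
    error = p - y                    # gradient of log-loss w.r.t. z
    w -= lr * X.T @ error / len(y)   # update weights
    b -= lr * error.mean()           # update bias

print(w, b, p.round(2))
```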

Logistic regression might not be the most sophisticated model, but when the task is straightforward and you need a fast, interpretable, and reliable answer, it’s often the best tool for the job. It’s like your dependable friend who gives you clear, no-nonsense advice when you need it!



