normal distribution:

a tale of two thoughts

What is “Normally Distributed Data”?

Normally distributed data forms a bell-shaped curve when plotted on a graph.

Most values cluster around the average, with fewer values appearing as you move further from the average in either direction.

The Purpose of Data Analysis

Data analysis helps filter out random noise to detect meaningful patterns or signals, similar to hearing a specific tune in a noisy room.

The Myth of Needing “Normally Distributed Data”

There’s a belief that for process behavior charts (tools to monitor process changes over time) to work, the data must be normally distributed.

This belief dates back to 1935 when E. S. Pearson misunderstood Walter Shewhart’s method of filtering noise.

Pearson’s Statistical Approach

Pearson’s method to filter noise involves:

    1. Choose a Proportion (P): Decide how much noise to filter out, commonly 95% (P=0.95) or 99% (P=0.99).
    2. Identify a Test Statistic (Y): Create a function based on the data.
    3. Find the Probability Model (f(y)): Determine the distribution that Y follows when the data satisfy the assumed conditions.
    4. Determine Critical Values (A and B): Use the area under f(y) to find the two points A and B that enclose the proportion P. Any Y value outside the range from A to B is considered a signal.
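
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test statistic Y is taken to be the mean of n = 4 observations, and the process is assumed to be normal with mean 10 and sigma 2. Those numbers, and the helper names `pearson_critical_values` and `is_signal`, are made up for the example and do not come from Pearson's work.

```python
from statistics import NormalDist


def pearson_critical_values(p, model):
    """Return (A, B) enclosing the central proportion p of the model f(y)."""
    tail = (1.0 - p) / 2.0
    return model.inv_cdf(tail), model.inv_cdf(1.0 - tail)


# Step 1: choose the proportion of noise to filter out.
P = 0.95

# Steps 2 and 3: let Y be the mean of n observations and ASSUME the process
# is normal with mean mu and standard deviation sigma (hypothetical values).
# Under that assumption Y follows Normal(mu, sigma / sqrt(n)).
n, mu, sigma = 4, 10.0, 2.0
f_y = NormalDist(mu, sigma / n ** 0.5)

# Step 4: critical values A and B from the area under f(y).
A, B = pearson_critical_values(P, f_y)


def is_signal(y):
    """A value of Y outside [A, B] is treated as a signal."""
    return y < A or y > B


print(f"A = {A:.2f}, B = {B:.2f}")
```

Everything here hinges on the assumed model: change `f_y` and the critical values move with it.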

This method works well if we know the correct probability model for our data.

However, we rarely have enough data to accurately determine this model.

Shewhart’s Alternative Approach

Shewhart tackled the problem differently:

    1. Fixed Critical Values: Instead of relying on specific probability models, Shewhart used fixed, generic critical values.
    2. Average ± 3 Sigma: He chose symmetric limits around the average, extending three standard deviations (sigma) in both directions.
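
As a rough sketch of how such limits might be computed for a chart of individual values: the text above does not say how sigma is estimated, so this example assumes the common XmR convention of dividing the average moving range by 1.128. The data and the function names `three_sigma_limits` and `find_signals` are hypothetical.

```python
from statistics import mean


def three_sigma_limits(values):
    """Return (lower, center, upper) limits for a chart of individual values."""
    center = mean(values)
    # Moving ranges: absolute differences between successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # Assumption: sigma is estimated as (average moving range) / 1.128,
    # the usual XmR convention. The source text does not name an estimator.
    sigma_hat = mean(moving_ranges) / 1.128
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def find_signals(values):
    """Return (index, value) pairs that fall outside the three-sigma limits."""
    lower, _, upper = three_sigma_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Made-up measurements with one unusually large value.
data = [10, 12, 11, 13, 10, 12, 11, 30, 12, 11]
print(three_sigma_limits(data))
print(find_signals(data))
```

No probability model appears anywhere in the calculation; the limits come from the data alone.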

This method is effective regardless of the data’s distribution. For a stable process, nearly all values fall within these limits: about 99.7% for a normal distribution, and still about 98% for a heavily skewed one such as the exponential.
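
To make the coverage claim concrete, the short sketch below works out the theoretical share of values inside Average ± 3 Sigma for three differently shaped distributions. The choice of these three models is an assumption made for illustration; it is not taken from Shewhart's work.

```python
import math
from statistics import NormalDist


def coverage_normal():
    """Share of a normal distribution within mean +/- 3 sigma."""
    z = NormalDist()
    return z.cdf(3) - z.cdf(-3)


def coverage_uniform():
    """Share of a Uniform(0, 1) distribution within mean +/- 3 sigma."""
    mu, sigma = 0.5, 1 / math.sqrt(12)
    # The limits extend past both ends of [0, 1], so clip them to the range.
    low = max(0.0, mu - 3 * sigma)
    high = min(1.0, mu + 3 * sigma)
    return high - low


def coverage_exponential():
    """Share of an Exponential(rate=1) distribution within mean +/- 3 sigma."""
    # Mean and sigma both equal 1, so the limits are -2 and 4.
    # No values lie below 0, so the coverage is P(X <= 4) = 1 - exp(-4).
    return 1 - math.exp(-4)


for name, fn in [("normal", coverage_normal),
                 ("uniform", coverage_uniform),
                 ("exponential", coverage_exponential)]:
    print(f"{name:12s} {fn():.4f}")
```

The three shapes are very different, yet the same fixed limits leave at most about 2% of routine values outside.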

Why Shewhart’s Method Works

Shewhart’s approach avoids the need to determine the exact probability model, making it simpler and more robust for practical use.

By using generic critical values, his method can detect signals effectively without being constrained by the data’s distribution.

While Pearson’s statistical approach is useful when the correct probability model is known, Shewhart’s method provides a practical and flexible alternative.

It simplifies the process of filtering out noise and detecting signals in data, making it highly valuable for real-world applications.

source: Here

Sincerely,

Lindsay Alston