Binning Methods for Data Smoothing

The binning method can be used for smoothing the data.

Most real-world data contains noise. Data smoothing is a data pre-processing technique that uses one of several kinds of algorithms to remove noise from a data set, allowing important patterns to stand out.

Unsorted data for price (in dollars):

Before sorting: 8, 16, 9, 15, 21, 21, 24, 30, 26, 27, 30, 34

First, sort the data:

After sorting: 8, 9, 15, 16, 21, 21, 24, 26, 27, 30, 30, 34

Partition the sorted data into equal-frequency bins (four values per bin):

Bin 1: 8, 9, 15, 16

Bin 2: 21, 21, 24, 26

Bin 3: 27, 30, 30, 34
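The sort-then-split procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name equal_frequency_bins is hypothetical, and the sketch assumes the number of values divides evenly by the number of bins (12 prices into 3 bins of 4).

```python
def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins holding the same number of values."""
    data = sorted(values)
    if len(data) % n_bins != 0:
        # Assumption of this sketch: every bin gets exactly len(values) / n_bins values
        raise ValueError("len(values) must be divisible by n_bins")
    size = len(data) // n_bins
    return [data[i * size:(i + 1) * size] for i in range(n_bins)]


prices = [8, 16, 9, 15, 21, 21, 24, 30, 26, 27, 30, 34]  # unsorted prices in dollars
bins = equal_frequency_bins(prices, 3)
print(bins)  # [[8, 9, 15, 16], [21, 21, 24, 26], [27, 30, 30, 34]]
```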

Smoothing by bin means

For Bin 1:

(8 + 9 + 15 + 16) / 4 = 48 / 4 = 12

(4 is the number of values in the bin: 8, 9, 15, and 16.)

Bin 1 = 12, 12, 12, 12

For Bin 2:

(21 + 21 + 24 + 26) / 4 = 92 / 4 = 23

Bin 2 = 23, 23, 23, 23

For Bin 3:

(27 + 30 + 30 + 34) / 4 = 121 / 4 = 30.25, which rounds to 30

Bin 3 = 30, 30, 30, 30
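The bin-means arithmetic above can be checked with a short Python sketch (the function name smooth_by_means is hypothetical). Like the text, it rounds each mean to the nearest whole number; note that Python's round() sends exact halves to the nearest even integer, which does not affect these bins.

```python
def smooth_by_means(bins):
    """Replace every value in each bin with that bin's mean, rounded to a whole number."""
    smoothed = []
    for b in bins:
        mean = sum(b) / len(b)  # e.g. (27 + 30 + 30 + 34) / 4 = 30.25
        smoothed.append([round(mean)] * len(b))
    return smoothed


bins = [[8, 9, 15, 16], [21, 21, 24, 26], [27, 30, 30, 34]]
print(smooth_by_means(bins))  # [[12, 12, 12, 12], [23, 23, 23, 23], [30, 30, 30, 30]]
```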

 

Smoothing by bin boundaries

Bin 1: 8, 8, 16, 16

Bin 2: 21, 21, 26, 26

Bin 3: 27, 27, 27, 34

How to smooth data by bin boundaries?

In each bin, the minimum and maximum values are the bin boundaries. The minimum stays on the left side and the maximum on the right side.

Now, what happens to the middle values?

Each middle value is replaced by whichever boundary is closer to it.
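The boundary rule can be sketched in Python as follows. The function name smooth_by_boundaries is hypothetical, and so is the tie rule: the text does not say what happens when a value is exactly halfway between the two boundaries, so this sketch assumes such a value goes to the minimum.

```python
def smooth_by_boundaries(bins):
    """Replace each value with the closer of its bin's minimum and maximum.

    Assumed tie rule: a value exactly halfway between the boundaries goes to the minimum.
    """
    smoothed = []
    for b in bins:
        low, high = min(b), max(b)
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


bins = [[8, 9, 15, 16], [21, 21, 24, 26], [27, 30, 30, 34]]
print(smooth_by_boundaries(bins))  # [[8, 8, 16, 16], [21, 21, 26, 26], [27, 27, 27, 34]]
```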

Smoothing each bin by its boundaries

Before smoothing: Bin 1: 8, 9, 15, 16

Here, 8 is the minimum value and 16 is the maximum value. 9 is closer to 8 (distance 1) than to 16 (distance 7), so 9 is replaced by 8. 15 is closer to 16 (distance 1) than to 8 (distance 7), so 15 is replaced by 16.

After smoothing: Bin 1: 8, 8, 16, 16

Before smoothing: Bin 2: 21, 21, 24, 26

24 is 3 away from 21 and 2 away from 26, so 24 is replaced by 26.

After smoothing: Bin 2: 21, 21, 26, 26

Before smoothing: Bin 3: 27, 30, 30, 34

Each 30 is 3 away from 27 and 4 away from 34, so both are replaced by 27.

After smoothing: Bin 3: 27, 27, 27, 34

Figure: Binning Methods for Data Smoothing

Advantages of data smoothing

Data smoothing makes important hidden patterns in the data set easier to see.

Data smoothing can be used to help predict trends. Prediction helps in making the right decisions at the right time.

By reducing noise, data smoothing can help produce more reliable results from the data.

Disadvantages of data smoothing

Data smoothing doesn’t always provide a clear explanation of the patterns in the data.

Some data points may be ignored or flattened out while the focus is on other data points.

Example of binning for data smoothing

Sorted data for Age: 3, 7, 8, 13, 22, 22, 22, 26, 26, 28, 30, 37

How to partition the data into equal-frequency bins?

  • Bin 1: 3, 7, 8, 13
  • Bin 2: 22, 22, 22, 26
  • Bin 3: 26, 28, 30, 37

How to smooth the data by bin means?

  • Bin 1: 8, 8, 8, 8 (mean 31 / 4 = 7.75, rounded to 8)
  • Bin 2: 23, 23, 23, 23 (mean 92 / 4 = 23)
  • Bin 3: 30, 30, 30, 30 (mean 121 / 4 = 30.25, rounded to 30)

How to smooth the data by bin boundaries?

  • Bin 1: 3, 3, 3, 13
  • Bin 2: 22, 22, 22, 26
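  • Bin 3: 26, 26, 26, 37

In Bin 1, 8 is equally far (distance 5) from both boundaries, 3 and 13; the result above assigns it to the lower boundary, 3.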
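The whole Age example can be reproduced in a few lines of Python. This is a sketch: rounding means to whole numbers and sending ties to the lower boundary are assumptions, chosen because they reproduce the results listed above.

```python
ages = [3, 7, 8, 13, 22, 22, 22, 26, 26, 28, 30, 37]  # already sorted

# Equal-frequency bins of 4 values each
size = 4
bins = [ages[i:i + size] for i in range(0, len(ages), size)]

# Smoothing by bin means, rounded to the nearest whole number
by_means = [[round(sum(b) / len(b))] * len(b) for b in bins]

# Smoothing by bin boundaries; ties (8 is 5 away from both 3 and 13) go to the lower boundary
by_bounds = []
for b in bins:
    low, high = b[0], b[-1]  # bins of sorted data: first value is the min, last is the max
    by_bounds.append([low if v - low <= high - v else high for v in b])

print(bins)       # [[3, 7, 8, 13], [22, 22, 22, 26], [26, 28, 30, 37]]
print(by_means)   # [[8, 8, 8, 8], [23, 23, 23, 23], [30, 30, 30, 30]]
print(by_bounds)  # [[3, 3, 3, 13], [22, 22, 22, 26], [26, 26, 26, 37]]
```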

Binning data in Excel

Step 1: Open Microsoft Excel.

Step 2: Select File -> Options.

Step 3: Select Add-ins, choose Excel Add-ins in the Manage box, and click Go.

Step 4: Select Analysis ToolPak and press OK.

Step 5: Select the data cells, then select Data Analysis on the Data tab. Select Histogram and press OK.

Step 6: Specify the input range and the bin range. For example, here I select cells A1:A13 as the input range and cells C4:C5 as the bin range. Select Chart Output and press OK.

Excel then outputs a table with the count of values in each bin, together with the histogram chart.
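Outside Excel, the same kind of bin counts can be sketched in Python. The sketch assumes the upper-inclusive rule used by Excel's FREQUENCY function: a value is counted in a bin if it is greater than the previous bin value and less than or equal to the current one, and anything above the last bin value is counted under "More". The function name histogram_counts and the bin values 16 and 26 are hypothetical; the article does not show what cells C4:C5 contain.

```python
from bisect import bisect_left


def histogram_counts(values, bin_edges):
    """Count values per bin; bin_edges must be sorted in ascending order.

    A value v lands in bin i when bin_edges[i - 1] < v <= bin_edges[i];
    the extra last count holds values above the final edge ("More").
    """
    counts = [0] * (len(bin_edges) + 1)
    for v in values:
        # bisect_left returns the index of the first edge >= v
        counts[bisect_left(bin_edges, v)] += 1
    return counts


prices = [8, 16, 9, 15, 21, 21, 24, 30, 26, 27, 30, 34]
print(histogram_counts(prices, [16, 26]))  # [4, 4, 4]: v <= 16, 16 < v <= 26, More
```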

Data Smoothing Commands

There are many other data smoothing techniques; exponential smoothing is one of them.

Commands and what they apply to the data set:

  • MovingMedian: moving medians
  • MovingStatistic: moving statistics
  • ExponentialSmoothing: exponential smoothing
  • LinearFilter: linear filter
  • MovingAverage: moving averages
  • WeightedMovingAverage: weighted moving averages
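The table does not say which software these command names belong to, so rather than guess at that tool's API, here is a plain-Python sketch of two of the listed operations, moving averages and moving medians. The function names are hypothetical, and the input series is made up for illustration, with one noisy spike.

```python
from statistics import median


def moving_average(data, r):
    """Simple r-term moving average: the mean of each run of r consecutive values."""
    return [sum(data[i:i + r]) / r for i in range(len(data) - r + 1)]


def moving_median(data, r):
    """r-term moving median: the median of each run of r consecutive values."""
    return [median(data[i:i + r]) for i in range(len(data) - r + 1)]


series = [10, 11, 95, 12, 13]  # made-up data; 95 is a noise spike
print(moving_average(series, 3))  # roughly [38.67, 39.33, 40.0]: the spike inflates every average
print(moving_median(series, 3))   # [11, 12, 13]: the spike is ignored
```

The median ignores the isolated spike, while the average spreads it across every window that contains it; this is why moving medians are often preferred for data with isolated outliers.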

Exponential smoothing

Exponential smoothing is a technique for smoothing time series data. It smooths the data using an exponential window function, so the weights given to past observations decrease exponentially with their age.
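In its simplest form, exponential smoothing computes each smoothed value as a blend of the newest observation and the previous smoothed value: s_t = alpha * x_t + (1 - alpha) * s_(t-1), with a smoothing factor 0 < alpha <= 1. Unrolling this recursion gives older observations weights that shrink by a factor of (1 - alpha) per step, which is the exponential window. The Python sketch below starts from s_0 = x_0, a common initialization that is an assumption here; the function name is hypothetical.

```python
def exponential_smoothing(data, alpha):
    """Simple exponential smoothing.

    s[0] = x[0] (assumed initialization); s[t] = alpha * x[t] + (1 - alpha) * s[t-1].
    A larger alpha gives more weight to recent observations.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [data[0]]
    for x in data[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


series = list(range(11))  # steadily rising series 0, 1, ..., 10
s = exponential_smoothing(series, 0.5)
print(s[-1])  # 9.0009765625, about 1 below the actual value 10
```

On a series rising by 1 per step, the smoothed values settle (1 - alpha) / alpha units behind the data, which is 1 unit for alpha = 0.5.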

Advantages of Exponential Smoothing

  1. Exponential smoothing is easy to learn and apply.
  2. It gives more significance to recent observations.
  3. It can lead to accurate predictions.

Disadvantages of Exponential Smoothing

  1. Predictions from exponential smoothing lag behind the actual trend in the data.
  2. Simple exponential smoothing does not handle data with trends well.

Some other data smoothing techniques are Moving Average Smoothing, Double Exponential Smoothing, and Holt-Winters Smoothing.

Which smoothing methods are used for forecasting?

  • Moving Averages
  • Exponential Smoothing
  • Double Exponential Smoothing
  • Triple Exponential Smoothing
  • Holt’s Linear Exponential Smoothing
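Holt's linear exponential smoothing, one form of double exponential smoothing, adds a second smoothed quantity for the trend so that forecasts no longer trail a steady trend. A minimal Python sketch follows; the function name, the parameter names alpha and beta, and the initialization level_0 = x_0, trend_0 = x_1 - x_0 are assumptions (other initializations are also common).

```python
def holt_linear(data, alpha, beta):
    """Holt's linear (double) exponential smoothing.

    level_t = alpha * x_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    Returns the smoothed levels and a function forecasting h steps past the data.
    """
    if len(data) < 2:
        raise ValueError("need at least two observations to initialize the trend")
    level, trend = data[0], data[1] - data[0]  # assumed initialization
    levels = [level]
    for x in data[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        levels.append(level)

    def forecast(h):
        # Extend the final level along the final trend for h steps
        return level + h * trend

    return levels, forecast


levels, forecast = holt_linear(list(range(11)), alpha=0.5, beta=0.5)
print(levels[-1], forecast(2))  # 10.0 12.0
```

On this perfectly linear series the level tracks the data exactly and the forecast continues the line, without the lag that simple exponential smoothing shows on a trend.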

Important topics to know:

  • Binning as a method to manage noisy data
  • Optimal binning in Python
  • Binning by clustering
  • Equal-width binning in Python
  • Equal-frequency binning in Python
  • Binning in machine learning
  • Equal-width binning in R
  • Discretization by binning
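For contrast with the equal-frequency bins used throughout this article, equal-width binning splits the range from minimum to maximum into intervals of the same width, so bins can hold different numbers of values. Below is a minimal Python sketch using the Age data; the function name is hypothetical, and placing the maximum value in the last bin is this sketch's own choice.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width.

    Width = (max - min) / n_bins; the maximum value is placed in the last bin.
    """
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    if width == 0:
        raise ValueError("all values are equal; equal-width bins are undefined")
    bins = [[] for _ in range(n_bins)]
    for v in sorted(values):
        index = min(int((v - low) / width), n_bins - 1)  # clamp the maximum into the last bin
        bins[index].append(v)
    return bins


ages = [3, 7, 8, 13, 22, 22, 22, 26, 26, 28, 30, 37]
print(equal_width_bins(ages, 3))  # [[3, 7, 8, 13], [22, 22, 22], [26, 26, 28, 30, 37]]
```

With a width of 34 / 3 (about 11.33), the bins hold 4, 3, and 5 values, unlike the 4, 4, 4 split produced by equal-frequency binning.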