Binning Methods for Data Smoothing
Binning is a method for smoothing data.
Real-world data is often noisy. Data smoothing is a data pre-processing technique that uses an algorithm to remove noise from a data set, which lets the important patterns stand out.
Unsorted data for price in dollars
Before sorting: 8, 16, 9, 15, 21, 21, 24, 30, 26, 27, 30, 34
First, sort the data.
After sorting: 8, 9, 15, 16, 21, 21, 24, 26, 27, 30, 30, 34
Partitioning the sorted data into equal-frequency bins (four values per bin)
Bin 1: 8, 9, 15, 16
Bin 2: 21, 21, 24, 26
Bin 3: 27, 30, 30, 34
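The sorting and partitioning steps can be sketched in Python. This is a minimal illustration, not a library routine: the function name `equal_frequency_bins` is hypothetical, and the sketch assumes the number of values divides evenly by the number of bins, as it does for the twelve prices here.

```python
def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of equal size."""
    ordered = sorted(values)
    if len(ordered) % n_bins != 0:
        # Assumption of this sketch: no leftover values to distribute.
        raise ValueError("values must divide evenly into the bins")
    size = len(ordered) // n_bins
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


prices = [8, 16, 9, 15, 21, 21, 24, 30, 26, 27, 30, 34]
bins = equal_frequency_bins(prices, 3)
for number, bin_values in enumerate(bins, start=1):
    print(f"Bin {number}: {bin_values}")
```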
Smoothing by bin means
For Bin 1:
(8 + 9 + 15 + 16) / 4 = 12
(4 is the number of values in the bin: 8, 9, 15, 16)
Bin 1 = 12, 12, 12, 12
For Bin 2:
(21 + 21 + 24 + 26) / 4 = 23
Bin 2 = 23, 23, 23, 23
For Bin 3:
(27 + 30 + 30 + 34) / 4 = 30.25, rounded to 30
Bin 3 = 30, 30, 30, 30
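Smoothing by bin means can be checked with a few lines of Python. This is a rough sketch with a hypothetical function name, `smooth_by_bin_means`; it keeps the exact mean, so Bin 3 comes out as 30.25 where the worked example shows the rounded value 30.

```python
def smooth_by_bin_means(bins):
    """Replace every value in each bin with that bin's arithmetic mean."""
    smoothed = []
    for bin_values in bins:
        mean = sum(bin_values) / len(bin_values)
        smoothed.append([mean] * len(bin_values))
    return smoothed


bins = [[8, 9, 15, 16], [21, 21, 24, 26], [27, 30, 30, 34]]
means = smooth_by_bin_means(bins)
for number, bin_values in enumerate(means, start=1):
    print(f"Bin {number} = {bin_values}")
```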
Smoothing by bin boundaries
Bin 1: 8, 8, 16, 16
Bin 2: 21, 21, 26, 26
Bin 3: 27, 27, 27, 34
How to smooth data by bin boundaries?
Take the minimum and maximum values of each bin as its boundaries. The minimum stays at the left end of the bin and the maximum at the right end.
Now, what happens to the middle values?
Each middle value is replaced by whichever boundary value is closer to it.
Smoothed data after applying bin boundaries
Before bin boundary: Bin 1: 8, 9, 15, 16
Here, 8 is the minimum value and 16 is the maximum value. 9 is closer to 8, so 9 is replaced by 8. 15 is closer to 16 than to 8, so 15 is replaced by 16.
After bin boundary: Bin 1: 8, 8, 16, 16
Before bin boundary: Bin 2: 21, 21, 24, 26
After bin boundary: Bin 2: 21, 21, 26, 26
Before bin boundary: Bin 3: 27, 30, 30, 34
After bin boundary: Bin 3: 27, 27, 27, 34
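The boundary rule can be sketched in Python as follows. The function name `smooth_by_bin_boundaries` is hypothetical. The text does not say what to do when a value is equally far from both boundaries, so this sketch assumes such ties go to the lower boundary.

```python
def smooth_by_bin_boundaries(bins):
    """Replace each value with the nearer of its bin's minimum and maximum."""
    smoothed = []
    for bin_values in bins:
        low, high = min(bin_values), max(bin_values)
        # Assumption: a value exactly halfway between the boundaries
        # is assigned to the lower one.
        smoothed.append(
            [low if value - low <= high - value else high for value in bin_values]
        )
    return smoothed


bins = [[8, 9, 15, 16], [21, 21, 24, 26], [27, 30, 30, 34]]
boundaries = smooth_by_bin_boundaries(bins)
for number, bin_values in enumerate(boundaries, start=1):
    print(f"Bin {number}: {bin_values}")
```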
Advantages (Pros) of data smoothing
Data smoothing makes important hidden patterns in the data set easier to see.
Data smoothing can help predict trends, which supports making the right decisions at the right time.
By reducing noise, data smoothing can help produce more accurate results from the data.
Cons of data smoothing
Data smoothing doesn’t always provide a clear explanation of the patterns in the data.
Some data points may be ignored when the method emphasizes other data points.
Example of binning for data smoothing
Sorted data for Age: 3, 7, 8, 13, 22, 22, 22, 26, 26, 28, 30, 37
How to smooth the data by equal frequency bins?
How to smooth the data by bin means?
How to smooth the data by bin boundaries?
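One way to work through these three questions for the age data is the short script below. It rests on assumptions the example does not state: three bins of four values each (as in the price example), and ties in the boundary method going to the lower boundary, which matters for the value 8 in the first bin. The function name `bin_and_smooth` is hypothetical.

```python
def bin_and_smooth(values, n_bins):
    """Return equal-frequency bins plus their mean- and boundary-smoothed forms."""
    ordered = sorted(values)
    size = len(ordered) // n_bins  # assumes an even split
    bins = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    by_means = [[sum(b) / len(b)] * len(b) for b in bins]
    by_boundaries = [
        # Assumption: ties go to the lower boundary.
        [min(b) if v - min(b) <= max(b) - v else max(b) for v in b]
        for b in bins
    ]
    return {"bins": bins, "means": by_means, "boundaries": by_boundaries}


ages = [3, 7, 8, 13, 22, 22, 22, 26, 26, 28, 30, 37]
result = bin_and_smooth(ages, 3)
for name, smoothed in result.items():
    print(name, smoothed)
```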
Binning data in Excel
Step 1: Open Microsoft Excel.
Step 2: Select File -> Options.
Step 3: Select Add-ins -> Manage -> Excel Add-ins -> Go.
Step 4: Select Analysis ToolPak and press OK.
Step 5: Select all the data cells and then select ‘Data Analysis’. Select Histogram and press OK.
Step 6: Enter the input range. For example, here I am selecting cells A1 to A13 as the input range and cells C4:C5 as the bin range. Select Chart Output and press OK.
Excel then shows the binned results.
Data Smoothing Commands
There are many other techniques of data smoothing. Exponential smoothing is one of them.
Data Smoothing Command | What will apply to the data set? |
MovingMedian | moving medians |
MovingStatistic | moving statistics |
ExponentialSmoothing | exponential smoothing |
LinearFilter | linear filter |
MovingAverage | moving averages |
WeightedMovingAverage | weighted moving averages |
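To show what a moving average and a moving median compute, here is a plain-Python sketch. It is not an implementation of the commands listed in the table; the helper name `rolling_apply` is hypothetical, and the window size of 3 is chosen only for illustration.

```python
from statistics import mean, median


def rolling_apply(values, window, statistic):
    """Apply `statistic` to every full window of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        statistic(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


prices = [8, 9, 15, 16, 21, 21, 24, 26, 27, 30, 30, 34]
moving_avg = rolling_apply(prices, 3, mean)    # moving averages
moving_med = rolling_apply(prices, 3, median)  # moving medians
print(moving_avg)
print(moving_med)
```

Each output list is shorter than the input by `window - 1` values, because only full windows are used.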
Exponential smoothing
Exponential smoothing is a technique for smoothing time series data. It uses an exponential window function, so the weight given to an observation decreases exponentially as the observation gets older.
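A minimal sketch of simple exponential smoothing in Python follows. It assumes the common recurrence s[0] = x[0] and s[t] = alpha * x[t] + (1 - alpha) * s[t-1]; the function name, the smoothing factor alpha = 0.5, and the short sample series are all chosen for illustration only.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing with the first value as the starting level."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for value in series[1:]:
        # New level = weighted blend of the new observation and the old level.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


observations = [8, 16, 9, 15, 21]  # hypothetical time-ordered values
smoothed = exponential_smoothing(observations, alpha=0.5)
print(smoothed)
```

A larger alpha follows recent observations more closely, while a smaller alpha smooths more heavily.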
Advantages of Exponential Smoothing
- Exponential Smoothing is easy to learn and apply.
- It gives more significance to recent observations.
- Exponential Smoothing can give accurate short-term predictions.
Disadvantages of Exponential Smoothing
- Predictions from Exponential Smoothing lag behind the actual trend in the data.
- Simple Exponential Smoothing does not handle data with a trend well.
Some other data smoothing techniques are Moving Average Smoothing, Double Exponential Smoothing, and Holt-Winters Smoothing.
Which smoothing methods are used for forecasting?
Moving Averages
Exponential Smoothing
Double Exponential Smoothing
Triple Exponential Smoothing
Holt’s Linear Exponential Smoothing