Data Stream Mining – Data Mining

In this tutorial, we will cover the basics of stream mining in data mining.

What is Streaming?

A stream is a continuous flow of data sent from a sender to a receiver. When media is streamed, it can start playing as the data arrives at the receiver.

Streaming is a technique in which the sender transmits data in compressed form over the Internet, and the receiver presents it to the user as soon as it arrives. With streaming, the full data does not have to be saved to a storage device (e.g., a hard drive) first.
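The idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names send_in_chunks and play_stream are hypothetical, the "network" is a plain generator, and compression is left out. It shows a receiver handling each chunk as it arrives and then discarding it, so the full payload is never stored on the receiving side.

```python
from typing import Iterable, Iterator


def send_in_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Hypothetical sender: yields the payload one chunk at a time."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def play_stream(chunks: Iterable[bytes]) -> int:
    """Hypothetical receiver: handles each chunk on arrival, then drops it."""
    total_bytes = 0
    for chunk in chunks:
        # A real player would decode and render the chunk here.
        total_bytes += len(chunk)
    return total_bytes


payload = b"x" * 10  # stand-in for media data
received = play_stream(send_in_chunks(payload, chunk_size=4))
```

Because send_in_chunks is a generator, each chunk is produced only when the receiver asks for the next one, which mirrors playback that begins before the whole file has been transferred.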

Advantages of data streaming

The user doesn’t need to wait for a whole file to download before playing it. Because the media is sent as a continuous stream of data, it can play as it arrives.

Examples of data streaming

  1. Real-time ATM transactions
  2. Sensor data
  3. Listening to music online (no file to download)
  4. Watching videos on YouTube, Dailymotion, or Netflix (no file to download)
  5. Live events (no file to download, just a continuous stream of data)
  6. Internet traffic (visiting a site uses the website's bandwidth, and the data is read online only)
  7. Telephone conversations
  8. Data generated by communication networks

Data stream mining can be considered a subset of the general fields of machine learning, knowledge discovery, and data mining.
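The text does not name a specific stream mining algorithm, so the following is only a minimal sketch of the general idea: learning something from a stream in a single pass, without storing the readings. It assumes a stream of numeric sensor readings and uses Welford's online method to keep a running mean and variance. The names RunningStats and summarize are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Running mean and population variance, updated one value at a time."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Welford's update: constant memory, one pass over the data.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.count if self.count else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    """Consume a stream of readings and return the final statistics."""
    stats = RunningStats()
    for reading in stream:
        stats.update(reading)
    return stats


# Made-up readings standing in for a sensor feed.
stats = summarize(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

Each reading is used once and then dropped, so memory use stays constant no matter how long the stream runs. Tools built for stream mining apply the same one-pass principle to full learning tasks such as classification and clustering.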

Software and Tools for Data Stream Mining

  1. RapidMiner
  2. MOA (Massive Online Analysis)

MOA (Massive Online Analysis) Stream Mining Tool

  • MOA is free, open-source software that provides several machine learning algorithms.
  • Its most widely used algorithms cover regression, classification, clustering, and outlier detection. MOA supports bi-directional interaction with Weka (a data mining tool).

RapidMiner Stream Mining Tool

RapidMiner is commercial software that is useful in domains such as the following:

    • Knowledge discovery
    • Data mining
    • Machine learning