Big Data is the buzzword you hear all around you. It has been one of the fastest-growing technologies of recent years and is set to reach great heights.
Today we will talk about the Big Data technologies that changed the world of information technology, and about a few emerging ones that are capable of taking the IT world to a new horizon.
We shall start from scratch: first we will understand what a Big Data technology is and why we need it, then we will look at the two main types of Big Data technologies. After that I will take you through the top Big Data technologies, covering the crucial ones, and finally we will get to the interesting part, where we look at a few upcoming Big Data technologies.
I hope the agenda is clear, so let us begin with the first topic.
What are Big Data technologies?
Big Data technologies can be defined as software tools that process and extract information from extremely large and complex data sets that traditional data processing software could never deal with.
Now that you understand the basic definition, let's see why we need them. We need Big Data technologies to analyze this data accurately and to generate conclusions and predictions that reduce risk in future work, in many cases in real time. We will go deeper into this shortly.
Now let's talk about the two major categories into which Big Data technologies are classified: Operational Big Data and Analytical Big Data.
Operational Big-Data
It is all about the normal day-to-day data that we generate: the data that organizations produce, such as online transactions, social media activity, or the records of a particular college or school. You can think of it as the raw data that feeds the Analytical Big Data technologies. A few examples: online ticket bookings, such as bus, train, flight, and movie tickets; online shopping on sites such as Amazon and Walmart; and social media, which hardly needs explanation, since the data from large platforms such as Facebook, Instagram, and WhatsApp all falls under Operational Big Data. One last, simple example is the internal information of an organization, for example the employee details of a multinational company.
Analytical Big-data
You can probably already guess what Analytical Big Data is, but let me explain it further. Analytical Big Data is a little more complex than Operational Big Data. To be more accurate, it is where the actual analysis happens and where crucial business decisions are made, based on analyzing the operational data. A few examples are stock marketing; space missions, where every single bit of information is crucial; weather forecasting, which makes civilians aware of natural disasters that may happen; and the medical field, where a patient's health status can be monitored and future decisions about maintaining their health can be taken.
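To make the relationship between the two types concrete, here is a minimal sketch in Python. The booking records and the `revenue_by_type` helper are made up for illustration; the point is only that operational data is the raw input and the analytical step turns it into something a decision can be based on.

```python
from collections import defaultdict

# Hypothetical operational records: one row per online ticket booking.
bookings = [
    {"type": "bus", "amount": 12.0},
    {"type": "train", "amount": 30.0},
    {"type": "flight", "amount": 220.0},
    {"type": "train", "amount": 45.0},
]

def revenue_by_type(records):
    """Analytical step: aggregate raw operational rows into a total per ticket type."""
    totals = defaultdict(float)
    for row in records:
        totals[row["type"]] += row["amount"]
    return dict(totals)

summary = revenue_by_type(bookings)
print(summary)  # {'bus': 12.0, 'train': 75.0, 'flight': 220.0}
```

A real analytical system would run this kind of aggregation over far more data and many more dimensions, but the shape of the work is the same.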
Top Big Data Technologies
The top Big Data technologies fall into four fields:
1-Storage
2-Analytics
3-Mining
4-Visualization
Now let us look at the technologies within these fields, their features and capabilities, and the companies using them.
Big Data Technologies in Data Storage
Below are the most important tools and technologies in the storage field:
Apache Hadoop
> Designed to work in a distributed data processing environment
> Uses commodity hardware
> Designed to process data across different machines and locations at high speed and low cost
> Developed by the Apache Software Foundation (initial release in 2006; version 1.0 in 2011)
> Written in Java
> Current stable version: Hadoop 3.1.1
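Hadoop's processing model is MapReduce: each machine works on its own chunk of the data (map), and the partial results are then merged (reduce). The sketch below imitates that idea in a single Python process. It does not use any Hadoop API, and the chunks and function names are assumptions made up for the example.

```python
from collections import Counter
from functools import reduce

def map_chunk(chunk):
    """Map step: count the words in one chunk (the work a single node would do)."""
    return Counter(chunk.lower().split())

def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right

# Each string stands in for a block of a large file stored on a different machine.
chunks = ["big data big", "data tools", "big tools"]

partials = [map_chunk(chunk) for chunk in chunks]
word_counts = reduce(reduce_counts, partials, Counter())
print(word_counts["big"], word_counts["data"], word_counts["tools"])  # 3 2 2
```

In a real cluster the map calls run in parallel on the nodes that hold the data, which is where the speed and low cost on commodity hardware come from.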
[Figure: companies using Hadoop]
MongoDB
NoSQL document databases like MongoDB offer a direct alternative to the rigid schema used in relational databases. This gives MongoDB great flexibility when dealing with large volumes of data across distributed architectures. Below are the highlights of this type of database:
> Developed by MongoDB in 2009
> Written in: C++, Go, JavaScript, and Python
> Current stable version: MongoDB 4.0.10
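The flexible-schema idea can be sketched with plain Python dictionaries. This is not MongoDB's API; the `customers` list and the `find` helper are hypothetical stand-ins that only show how documents in one collection can carry different fields and still be queried together.

```python
# Two documents in the same hypothetical "customers" collection.
# They do not share the same set of fields, and no schema forces them to.
customers = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ben", "phones": ["555-0100"], "address": {"city": "Leeds"}},
]

def find(collection, **criteria):
    """Return the documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

matches = find(customers, name="Ben")
print(matches[0]["address"]["city"])  # Leeds
```

In a relational table, adding the `phones` or `address` fields would mean changing the table definition first; in a document store each record simply carries what it has.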
[Figure: companies using MongoDB]
RainStor
RainStor is designed to manage and analyze large volumes of data for big enterprises. It uses de-duplication techniques to organize the storage of large amounts of data for reference.
Below is a list of features of this technology:
> Uses de-duplication techniques
> Originally developed for internal use by the UK Ministry of Defence
> Developed by the RainStor software company in 2004
> Works like SQL
> Current Stable version: RainStor 5.5
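To give a feel for what de-duplication means, here is a minimal sketch that stores each distinct record only once and keeps a list of references to rebuild the original sequence. This is a generic content-hash approach chosen for illustration, not RainStor's actual algorithm, and the sample records are made up.

```python
import hashlib

def deduplicate(records):
    """Store each distinct record once and keep one reference (hash) per incoming record."""
    store = {}       # hash -> record bytes, stored only once
    references = []  # one hash per incoming record, in the original order
    for record in records:
        digest = hashlib.sha256(record).hexdigest()
        store.setdefault(digest, record)
        references.append(digest)
    return store, references

def restore(store, references):
    """Rebuild the original sequence of records from the references."""
    return [store[digest] for digest in references]

records = [b"2019-06-01,OK", b"2019-06-02,OK", b"2019-06-01,OK"]
store, refs = deduplicate(records)
print(len(records), "records stored as", len(store), "unique values")
```

The more repetition there is in the data, the bigger the saving, which is why the technique suits large archives kept for reference.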
[Figure: companies using RainStor]
Splunk Hunk
Splunk Hunk combines several capabilities in one product. Hunk lets you access data on remote Hadoop clusters and use the Splunk Search Processing Language to analyze it. With Hunk you can report on and visualize large amounts of data from your Hadoop and NoSQL data stores.
Below is a summary of the features of Splunk Hunk:
> Accesses data from remote Hadoop clusters
> Developed by: Splunk Inc. in 2013
> Written in Java
> Current stable version: 6.2
Big-Data Technologies in Data Mining
Below is a list of the most important technologies in the data mining category:
Presto
Presto is an open-source distributed SQL query engine designed for running analytical queries against data sources of all sizes, from gigabytes to petabytes. Presto queries data where it lives, whether in standard or proprietary databases, and a single Presto query can combine data from multiple sources. Presto is aimed at analysts who expect response times ranging from sub-second to minutes.
Below is a summary of Presto:
> Open-source distributed SQL query engine
> Developed at Facebook and open-sourced in 2013
> Written in: Java
> Current stable version: Presto 0.22
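The idea of one query combining data from several sources can be sketched with Python's built-in `sqlite3` module as a stand-in. This is not Presto: the two attached in-memory databases (`crm` and `sales`) and their tables are assumptions invented for the example, and they simply play the role of two separate data sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two separate in-memory databases stand in for two independent data sources.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS sales")

conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
conn.executemany("INSERT INTO sales.orders VALUES (?, ?)", [(1, 20.0), (1, 5.0), (2, 7.5)])

# A single SQL statement joins rows that live in different sources.
rows = conn.execute(
    "SELECT c.name, SUM(o.total) "
    "FROM crm.customers AS c "
    "JOIN sales.orders AS o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(rows)  # [('Asha', 25.0), ('Ben', 7.5)]
```

Presto applies the same principle at a much larger scale, with connectors to the systems where the data already lives instead of attached files.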
[Figure: companies using Presto]
RapidMiner
It is a centralized solution with powerful features and a graphical user interface that enables users to create, deliver, and maintain predictive analytics.
Below is a list of common features:
> Powerful and robust Graphical User Interface
> Developed by RapidMiner in 2001
> Written in JAVA
> Current Stable Version: RapidMiner 9.2
[Figure: companies using RapidMiner]
ElasticSearch
Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
Below is a list of Elasticsearch features:
> Based on the Lucene library
> Developed by Elastic NV (first released in 2010; the company was founded in 2012)
> Written in Java
> Current stable version: Elasticsearch 7.1
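Full-text search engines built on Lucene rely on an inverted index, a map from each word to the documents that contain it. The sketch below builds a tiny one in plain Python over schema-free, JSON-like documents. It is a simplified illustration, not the Elasticsearch API, and the documents and helper names are made up.

```python
from collections import defaultdict

def build_index(docs):
    """Map every lowercase word in each document body to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["body"].lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Schema-free documents: the second one has an extra field the others lack.
docs = {
    1: {"body": "Big data storage"},
    2: {"body": "Data mining tools", "tags": ["mining"]},
    3: {"body": "Storage for big clusters"},
}

index = build_index(docs)
print(search(index, "big storage"))  # {1, 3}
```

Looking a word up in the index is much cheaper than scanning every document, which is what makes searching large collections fast.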
[Figure: companies using Elasticsearch]
Big-Data Technologies in Data Analytics
Below is a list of the most important technologies in the data analytics category:
Kafka
Kafka is a distributed streaming platform. What does that mean? A streaming platform has three key capabilities: publishing, subscribing to, and consuming streams of records.
Below is a list of this tool's features:
> Distributed streaming platform
> Originally developed at LinkedIn, open-sourced in 2011, and now maintained by the Apache Software Foundation
> Written in: Scala, Java
> Current stable version: Apache Kafka 2.2.0
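The publish, subscribe, and consume capabilities can be pictured with a small in-memory sketch. The `MiniBroker` class below is a hypothetical toy written for this article, not the Kafka client API: it keeps an append-only log per topic and remembers how far each consumer has read.

```python
from collections import defaultdict

class MiniBroker:
    """Toy single-process broker: one append-only log per topic, one read offset per consumer."""

    def __init__(self):
        self.logs = defaultdict(list)    # topic -> list of messages, in arrival order
        self.offsets = defaultdict(int)  # (topic, consumer) -> next position to read

    def publish(self, topic, message):
        self.logs[topic].append(message)

    def subscribe(self, topic, consumer):
        # A new subscriber starts reading from the beginning of the topic.
        self.offsets[(topic, consumer)] = 0

    def consume(self, topic, consumer):
        """Return the messages this consumer has not seen yet and advance its offset."""
        start = self.offsets[(topic, consumer)]
        messages = self.logs[topic][start:]
        self.offsets[(topic, consumer)] = len(self.logs[topic])
        return messages

broker = MiniBroker()
broker.subscribe("clicks", "dashboard")
broker.publish("clicks", "page=/home")
broker.publish("clicks", "page=/cart")
first_batch = broker.consume("clicks", "dashboard")   # both messages
broker.publish("clicks", "page=/pay")
second_batch = broker.consume("clicks", "dashboard")  # only the new message

broker.subscribe("clicks", "audit")
audit_batch = broker.consume("clicks", "audit")       # a second consumer sees everything
```

Because messages stay in the log after being read, several consumers can process the same stream independently, each at its own pace.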
[Figure: companies using Kafka]
Splunk
Splunk captures, indexes, and correlates real-time data in a searchable repository, from which it can generate graphs, reports, alerts, dashboards, and visualizations. Splunk is a horizontal technology used for application management, security, compliance, and business and web analytics.
Below is a summary of Splunk as used in data analytics:
> Used in application management, security, and web analytics
> Developed by Splunk Inc. (founded in 2003)
> Written in: AJAX, C++, Python, XML
> Current stable version: Splunk 7.3
[Figure: companies using Splunk]
KNIME
KNIME allows users to visually create data flows, selectively execute some or all analysis steps, and inspect the results and models in interactive views.
Below is a list of the features of this tool:
> Used to create data flows
> Uses Extension mechanism
> Developed by KNIME in year 2008
> Written in: JAVA
> Current Stable version: KNIME 3.7.2
[Figure: main companies using KNIME]
Apache Spark
Apache Spark is the well-known big data framework. Spark Core is the general execution engine on which the Spark platform and all of its functionality are built. It provides in-memory computing capabilities to deliver speed, a generalized execution model to support a wide variety of applications, and Java, Scala, and Python APIs for ease of development.
Features summary:
> Cluster computing tool
> Developed by Apache Software Foundation
> Written in: JAVA, Scala, Python, R
> Current Stable version: Apache Spark 2.4.3
[Figure: companies using Apache Spark]
R Language
R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation. It is widely used by statisticians and data miners.
Below is a list of common features:
> Statistical computing and graphics
> Developed by: R Foundation in 2000
> Written in: C, Fortran, and R
> Current stable version: R-3.6.0
[Figure: companies using the R programming language]
Blockchain
The major capabilities of Blockchain are the shared ledger, smart contracts, privacy, and consensus. The shared ledger is an append-only distributed system of record shared across a business network. With smart contracts, the business terms are embedded in the transaction database and executed with transactions. Privacy means ensuring appropriate visibility while keeping transactions secure, authenticated, and verifiable. Consensus means that transactions are verified by the network.
Below is a summary of the Blockchain technology:
> Append-only distributed system of records
> Business terms embedded in transactions
> Transaction authentication
> Network-verified transactions
> First developed for: Bitcoin
> Written in: JavaScript, C++, Python
> Current Stable version: 4.0
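The append-only and verifiable properties come from chaining records together with hashes: every block stores the hash of the one before it, so changing an old record breaks the chain. Here is a minimal single-machine sketch of that idea. It leaves out the distributed network, consensus, and smart contracts entirely, and the block layout and function names are assumptions made for the example.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the hash of the previous block."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, data):
    """Add a new record to the end of the chain; existing blocks are never modified."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })

def is_valid(chain):
    """Recompute every hash and check that each block points at the one before it."""
    expected_prev = GENESIS_PREV
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        expected_prev = block["hash"]
    return True

chain = []
append_block(chain, "A pays B 5")
append_block(chain, "B pays C 2")
print(is_valid(chain))  # True
```

If anyone later edits the data in the first block, its recomputed hash no longer matches the stored one, and `is_valid` returns False. In a real blockchain, many participants hold a copy of the chain and run this kind of check independently.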
[Figure: sample of the companies using Blockchain]
Big-Data Technologies in Data Visualization
This section covers the last category of Big Data technologies, along with its most used software and tools.
Tableau
The main features Tableau offers are mobile-ready dashboards, data notifications, dashboard commenting, no-code data queries, translation of queries into visualizations, interactive dashboards, and metadata management.
The features are listed below:
> Creates no-code data queries
> Can import data of all sizes
> Developed by: Tableau Software (founded in 2003)
> Written in: Java, C++, Python, C
> Current stable version: Tableau 8.2
[Figure: sample of the companies using Tableau]
Plotly
Plotly is mainly used to create graphs quickly and efficiently. Plotly's API libraries support Python, MATLAB, and Julia, among other languages.
Below is a list of Plotly features:
> Creates graphs faster and more efficiently
> Developed by: Plotly in 2012
> Written in: JavaScript
> Current stable version: Plotly 1.47.4
[Figure: main companies using Plotly]
Emerging Big-Data Technologies
In this section, we will discuss the upcoming Big Data technologies, along with their features and the companies depending on them.
TensorFlow
Below is the list of features of this tool:
> End-to-end open-source platform for machine learning
> Developed by: Google Brain team, first released in 2015
> Written in: Python, C++, CUDA
> Current stable version: TensorFlow 2.0 Beta
[Figure: major companies planning to use TensorFlow]
Apache Beam
Below is a list of the main features of this tool:
> Provides a portable API layer
> Can be executed on different execution engines
> Developed by: Apache Software Foundation in 2016
> Written in: Java and Python
> Current stable version: Apache Beam 0.1.0
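The "portable API layer" idea is that you describe a pipeline once and then hand it to whichever execution engine you like. The sketch below shows that separation in plain Python. It does not use the Apache Beam SDK: the `Pipeline` class and the two runner functions are hypothetical, and the "engines" here are just a sequential loop and a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

class Pipeline:
    """Engine-independent description of a pipeline: an ordered list of per-item steps."""

    def __init__(self):
        self.steps = []

    def map(self, fn):
        self.steps.append(fn)
        return self  # allow chaining

def run_sequential(pipeline, items):
    """Runner 1: apply each step to every item, one after another."""
    out = list(items)
    for fn in pipeline.steps:
        out = [fn(x) for x in out]
    return out

def run_threaded(pipeline, items):
    """Runner 2: apply each step with a thread pool; pool.map keeps the input order."""
    out = list(items)
    with ThreadPoolExecutor(max_workers=4) as pool:
        for fn in pipeline.steps:
            out = list(pool.map(fn, out))
    return out

# The pipeline is defined once...
pipeline = Pipeline().map(lambda x: x + 1).map(lambda x: x * 2)

# ...and executed by two different "engines" with the same result.
sequential_result = run_sequential(pipeline, [1, 2, 3])
threaded_result = run_threaded(pipeline, [1, 2, 3])
print(sequential_result, threaded_result)  # [4, 6, 8] [4, 6, 8]
```

The pipeline definition never mentions how it will be run, so swapping the engine does not require rewriting the pipeline.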
[Figure: companies planning to use Apache Beam]
Docker
The features of this tool are:
> Create, deploy, and run applications using containers
> Developed by: Docker Inc. in 2013
> Written in: Go
> Current stable version: Docker 18.09
[Figure: companies planning to depend on Docker]
Apache AirFlow
Here is the list of features of this technology:
> Workflow automation and scheduling system used to manage data pipelines
> Defines the particular tasks needed in a workflow, for easier maintenance and testing
> Originally developed at Airbnb in 2014; became a top-level Apache Software Foundation project in 2019
> Written in: Python
> Current stable version: Apache Airflow 1.10.3
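A workflow scheduler has to run tasks in an order that respects their dependencies. The sketch below shows that core idea with Python's standard `graphlib` module (Python 3.9 or later). It is not the Airflow API: the task names, the `dependencies` mapping, and the `run_workflow` helper are all made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical data pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

def run_workflow(dependencies, actions):
    """Run every task after all of its dependencies and return the execution order."""
    executed = []
    for task in TopologicalSorter(dependencies).static_order():
        actions[task]()
        executed.append(task)
    return executed

log = []
# Each action just records that it ran; a real task would move or transform data.
actions = {name: (lambda n=name: log.append(f"ran {n}")) for name in dependencies}

order = run_workflow(dependencies, actions)
print(order)  # ['extract', 'transform', 'load', 'report']
```

Keeping each task small and separately defined, as the feature list describes, is what makes individual steps easy to test and to rerun when one of them fails.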
[Figure: companies using Airflow]
Summary
I hope that by the end of this article you are familiar with what Big Data is, the fundamentals of Big Data technologies, the main types of Big Data, the technologies used in the Big Data field, and the main companies using them.