Hadoop Technology and NoSQL: An Outline

Hadoop is not a database. It is a software ecosystem that allows massively parallel computing. It acts as an enabler for certain kinds of NoSQL distributed databases, allowing data to be spread across thousands of servers with little reduction in performance.

A core component of the Hadoop ecosystem is MapReduce, a computational model that takes data-intensive processes and distributes the computation across a very large number of servers (generally referred to as a Hadoop cluster). It has changed the game for the large processing needs of big data: a data procedure that might take 20 hours of processing time on a centralized relational database system might take only three minutes when distributed across a large Hadoop cluster whose servers all process in parallel. That, in brief, is what Hadoop is.
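To make the MapReduce model more concrete, here is a minimal single-machine sketch of its three stages (map, shuffle, reduce) applied to a word count. It does not use Hadoop itself: the function names and the sample text are assumptions made for illustration, and a thread pool merely stands in for the servers of a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group the emitted values by key across all mappers' output."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into one result."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Each chunk stands in for a block of data held by one server.
    # The thread pool is only a stand-in for a cluster (an assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, split into three chunks.
chunks = [
    "big data needs big clusters",
    "hadoop splits big jobs",
    "clusters run jobs in parallel",
]
counts = word_count(chunks)
print(counts["big"], counts["jobs"], counts["clusters"])  # prints: 3 2 2
```

The point of the structure is that each call to map_phase touches only its own chunk, so the map work can run anywhere in parallel; only the shuffle step needs to bring related keys together.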

What is NoSQL

NoSQL (commonly expanded as "not only SQL") is a completely different database framework that allows high-performance, fast processing of information at large scale. Put another way, NoSQL is a database infrastructure that has been well adapted to the heavy demands that big data places on a database.

Unlike relational databases, which are highly structured, NoSQL databases are unstructured in nature, and much of their efficiency comes from that. NoSQL centers on the concept of distributed databases, where unstructured data may be stored across multiple processing nodes, and often across multiple servers. This distributed architecture lets a NoSQL database scale horizontally: as data continues to grow, you add more hardware to keep up, with little or no slowdown in performance. NoSQL has many times been the solution for handling some of the biggest data warehouses. That is a short introduction to what NoSQL is.
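One simple way to picture horizontal scaling is to hash each record's key and let the hash decide which node stores it. The sketch below is an illustration only: the node names, the key format, and the helper functions are all hypothetical, and it is not how any particular NoSQL product places data.

```python
import hashlib


def node_for(key, nodes):
    """Pick the node that owns a key by hashing the key (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Return a mapping of node name -> list of keys stored on that node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


# Hypothetical keys and node names.
keys = [f"user:{i}" for i in range(1000)]
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])

# Adding a node spreads the same keys over more machines.
print({node: len(owned) for node, owned in three_nodes.items()})
print({node: len(owned) for node, owned in four_nodes.items()})
```

Note that this plain modulo scheme reassigns many keys whenever the node count changes. Real systems commonly use techniques such as consistent hashing to limit that movement.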


Big Data Hadoop: Key points

Hadoop can only be used for basic ETL transformations. Real data analysis has to be done in Oracle and BI tools.


There is some truth to this: programming anything even slightly complicated on Hadoop is a real pain. There is also a large dose of marketing involved, since Oracle would like you to believe that Hadoop is just another way to get data into an Oracle database. However, there are now more and more easy-to-use languages and tools for working directly with Hadoop, and many smart people are already doing advanced processing on Hadoop, processing that cannot be done on Oracle or cannot be done as quickly. Categorizing Hadoop as ETL-only misses most of its value.

Hadoop is not precise or accurate
It is not clear what people mean by accuracy here.
If the idea is that Hadoop is normally used to store data that is dirtier than what normally goes into a DW system, then it is true but misleading. A DW by definition holds cleaned-up data. There is no magic involved: if you want clean data, you need to process it. It is also true that Hadoop often stores data that needs more processing than the data in your OLTP system. That is not because Hadoop is inaccurate, but because Hadoop is often the only practical way for a business to manage and process this type of data, such as application logs, social media, and images.
On the other hand, if you think Hadoop is inaccurate because some data gets lost, because calculations run on a sample of the data, because analysis is done over inconsistent data, or because data often gets corrupted, that is simply incorrect.

Hadoop is real time
It is not. Hadoop is built for batch processing. You load the data, generally a lot of it, since Hadoop works best on large chunks of data. Then you write MapReduce jobs to process the data and collect the results. Even the smallest, fastest job you can imagine takes several seconds to run. There are a few attempts to build real-time Hadoop, but it is not a mainstream use case or product.

Hadoop is high performance
Not exactly. Hadoop is an example of a system that scales well without delivering great performance on a per-node basis. The smallest usable cluster is around 20 nodes, and clusters grow very large indeed: 1,700 nodes at LinkedIn and about 20,000 at Yahoo. Compare this with Oracle, where one node often beats the performance of two and a 24-node RAC is considered large. Obviously, a lot of automation is required to manage a cluster of that size. Because Hadoop makes it so easy to add nodes and scales so well this way, little optimization is usually done on a single server. For a start, it is written in Java, and the code path from a MapReduce job to the disk controller is long, so it is really hard to know that what you expect to happen is what actually happens. MapR sells a faster file system for Hadoop, which shows there is still plenty of room for improvement. But I think much could be improved by paying attention to what the software does at the single-server level and tuning the hardware and OS to match. Of course, very few people are capable of tuning an entire storage stack from Java down to disk controllers.

What is SQL

SQL Database: SQL stands for Structured Query Language. Simply put, it is a language used for communicating with a database. SQL is a standard programming and interactive language for retrieving information from a database and updating it. A hosted SQL database service can offer predictable performance, data protection, and scalability with zero downtime, all with almost no administration. Such a service supports SQL Server tools, APIs, and libraries, which makes it easier to move existing applications to the cloud and extend them there. That is a brief description of SQL databases.
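As a small illustration of using SQL to retrieve and update information, the sketch below runs a few statements against an in-memory SQLite database through Python's standard sqlite3 module. The table name, columns, and rows are made up for this example, and SQLite is chosen only because it needs no server.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Define a table (hypothetical schema for illustration).
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# Insert a few rows using parameter placeholders.
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Asha", "Pune"), ("Ben", "Leeds"), ("Chen", "Pune")],
)

# Update information that is already stored.
conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Mumbai", "Chen"))

# Retrieve information with a query.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Pune",)
).fetchall()
cities = conn.execute(
    "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
).fetchall()

print(rows)    # [('Asha',)]
print(cities)  # [('Leeds', 1), ('Mumbai', 1), ('Pune', 1)]
conn.close()
```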

NoSQL Database: Also referred to as non-SQL, this is an alternative, non-relational kind of database. A NoSQL database provides a mechanism for storing and retrieving data that is modelled by means other than the tabular relations used in relational (SQL) databases. NoSQL covers a wide range of architectures and technologies that help solve the big data and scalability performance problems that relational databases do not address.

NoSQL is particularly useful when an enterprise has to access and analyze a huge amount of unstructured data that is stored remotely on multiple virtual servers in the cloud. There is also a myth that NoSQL prohibits SQL, which is not true. It is fairer to say that only some NoSQL systems are entirely non-relational.

Examples of NoSQL Databases

  • Riak: Riak is an open source key-value database written in Erlang. It offers fault-tolerant replication and automatic data distribution, which give it excellent performance.
  • MongoDB: One of the most popular non-relational database systems, especially among start-ups. It is largely open source and therefore free to use, with good customer service available.
  • Apache CouchDB: Billed as a true database for web-based services. It stores its documents in the JSON data exchange format and uses JavaScript for indexing, combining, and transforming those documents.
  • Oracle NoSQL: Oracle's entry into the NoSQL category.

Types of NoSQL Databases

 

Type             Performance   Scalability   Flexibility   Complexity
Key-Value Store  High          High          High          None
Column Store     High          High          Moderate      Low
Document         High          Variable      High          Low
Graph Database   Variable      Variable      High          High
  • Key Value Model: This is the least complex NoSQL option. It stores data without a fixed schema, as pairs of keys and values. Examples: Azure, Riak.

 

  • Column Store: Also known as a wide-column store. It stores data tables as columns rather than as rows. It is more than just an inverted table, because sectioning the data by column allows excellent scalability and high performance.

 

  • Document Database: It takes the key-value concept and adds more complexity. Each document holds its own data along with a unique key, which is used to retrieve it. It is a good option for storing, retrieving, and managing document-oriented data.

 

  • Graph Database: It holds data that is interconnected and is best represented as a graph.
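The four models above can be pictured with plain Python data structures. This is only a rough sketch of how the data is shaped in each case, not of how any real product stores it. All names and values are hypothetical.

```python
# Key-value: an opaque value looked up by a unique key.
kv_store = {"session:42": "alice"}

# Column store: the values of each column are kept together,
# so reading one column does not touch the others.
column_store = {
    "name": ["alice", "bob", "carol"],
    "age": [30, 45, 27],
}
average_age = sum(column_store["age"]) / len(column_store["age"])

# Document: each key maps to a self-contained, nested document.
documents = {
    "order:1001": {
        "customer": "alice",
        "items": [{"sku": "A1", "qty": 2}],
        "paid": True,
    },
}

# Graph: records are nodes, and relationships are edges between them.
graph = {"alice": ["bob"], "bob": ["carol"], "carol": []}


def reachable(graph, start):
    """Return every node that can be reached from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


print(kv_store["session:42"])                  # alice
print(average_age)                             # 34.0
print(documents["order:1001"]["items"][0])     # {'sku': 'A1', 'qty': 2}
print(sorted(reachable(graph, "alice")))       # ['alice', 'bob', 'carol']
```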

 

 

As big data continues down its path of growth, there is no doubt that these two innovative approaches, NoSQL database architecture and the Hadoop software ecosystem, will be central to allowing companies to reach their full potential with data. In addition, the rapid advancement of data technology has sparked rising demand to hire the next generation of technical talent who can build this powerful infrastructure. The technology and the talent may not be cheap, but given all the value big data can bring to the table, companies are finding it a worthy investment.

 
