Hadoop is not a database; it is a software ecosystem that enables massively parallel computing. It supports certain kinds of NoSQL distributed databases, which allow data to be spread across thousands of servers with little reduction in performance.
A core component of the Hadoop ecosystem is MapReduce, a computational model that takes data-intensive processing tasks and distributes the computation across a large number of servers (generally referred to as a Hadoop cluster). It has changed the game by supporting the large processing needs of big data: a job that might take 20 hours in a centralized relational database system may take only a few minutes when distributed across a large Hadoop cluster whose nodes all process in parallel. So now you have an idea of what Hadoop is.
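The MapReduce model described above can be sketched in miniature with plain Python: a map phase emits key-value pairs and a reduce phase aggregates them. This is a toy, single-process illustration of the idea, not the Hadoop API; in a real cluster the map and reduce tasks run in parallel on many nodes.

```python
from collections import defaultdict

def map_phase(lines):
    # Each "mapper" emits (word, 1) pairs from its slice of the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # The framework groups pairs by key; each "reducer" sums the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data needs big clusters", "data data everywhere"]
result = reduce_phase(map_phase(lines))
print(result["data"])  # 3
```

Because the map output for each word is independent, Hadoop can split the input across thousands of machines and merge the reducer outputs at the end.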
What is NoSQL
NoSQL (generally read as "not only SQL") provides a completely different database framework, one that allows higher performance and faster processing of information at large scale. Put another way, NoSQL is a database infrastructure that has been widely adopted to meet the heavy storage and processing demands of big data.
Unlike relational databases with their highly structured architecture, NoSQL databases are schema-less, which is where much of their efficiency comes from. NoSQL is built on the concept of distributed databases, where unstructured data may be stored across multiple processing nodes, and often across multiple servers. This distributed architecture allows a NoSQL database to scale horizontally as data continues to grow: you simply add more hardware to keep up, with no slowdown in performance. NoSQL has often been the solution for handling some of the biggest data warehouses. So now you have an introduction to what NoSQL is.
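Horizontal scaling of this kind is commonly implemented by hashing each key to a node. The sketch below is a deliberately naive illustration (the node names and hashing scheme are invented for the example, not any particular product's design); real NoSQL systems typically use consistent hashing so that adding a node relocates only a small fraction of the keys.

```python
import hashlib

def node_for(key, nodes):
    # Hash the key and pick a node by modulo - simple shard routing.
    # A given key always lands on the same node for a fixed node list.
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

nodes = ["server-1", "server-2", "server-3"]
print(node_for("user:1001", nodes))  # deterministic: same node every time
```

Because routing is pure computation on the key, no central lookup table is needed, which is part of why such systems scale out so easily.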
Big Data Hadoop: Key points
Hadoop can only be used for basic ETL transformations; real data analysis has to be done in Oracle and BI tools.
It is true that programming anything even slightly complicated on Hadoop is real trouble. There is also a heavy dose of marketing: Oracle would have you believe that Hadoop is just an alternate way to get data into an Oracle database. In reality, there are now more and more high-level languages and tools for working directly with Hadoop, and many smart people are already doing advanced processing on Hadoop, processing that cannot be done on Oracle, or cannot be done as quickly. Categorizing Hadoop as ETL-only misses most of its value.
Hadoop is not precise or accurate
It is not clear what is meant by "accuracy" here.
If the claim is that Hadoop is normally used to store data that is messier than what goes into a typical DW system, then it is true, but misleading. A DW by definition contains cleaned-up data. There is no magic going on: if you want clean data, you need to process it. It is also a fact that Hadoop often stores data that needs more processing than the data in your OLTP system. But that is not because Hadoop is inaccurate; it is because Hadoop is often the only practical way for businesses to manage and process this type of data, i.e. application logs, social media, and images.
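Cleaning such raw data is ordinary processing work. For example, an application-log line can be parsed into structured fields before analysis; the log format below is a made-up example, and malformed lines are simply rejected rather than silently kept.

```python
import re

# Hypothetical log format: "2021-03-01 12:00:05 ERROR payment timed out"
LOG_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)$")

def parse_log_line(line):
    # Returns structured fields, or None for lines that do not match.
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    date, time, level, message = m.groups()
    return {"date": date, "time": time, "level": level, "message": message}

rec = parse_log_line("2021-03-01 12:00:05 ERROR payment timed out")
print(rec["level"])  # ERROR
```

In practice this kind of parsing is exactly the per-record work that a map task would do across millions of log lines.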
If, on the other hand, the claim is that Hadoop is inaccurate because content gets lost, calculations run on a sample of the data, analysis is done over inconsistent data, or data often gets damaged, then it is simply incorrect.
Hadoop is real time
It is nothing of the kind; Hadoop is built for batch processing. You load the data, generally in bulk, since Hadoop works best on large chunks of data. Then you create MapReduce jobs to process the data and harvest the results. The smallest, fastest job you can imagine still takes several seconds to run. There have been a few attempts to build real-time Hadoop, but it is not a mainstream use case or product.
Hadoop is high performance
It is not. Hadoop is an example of a system that scales out without delivering better performance on a per-node basis. The smallest usable cluster is around 20 nodes, and clusters grow extremely large indeed: about 1,700 nodes at LinkedIn, about 20,000 at Yahoo. Compare that with Oracle, where a single node often performs very well and a 24-node RAC is considered large. Obviously, a lot of automation is required to manage a cluster of that size. Because Hadoop makes it easy to add nodes and scales so well that way, little optimization is usually done on a single server. For one thing, it is written in Java, with a long code path from the MapReduce job down to the disk controller, so it is hard to know whether what you want to happen is what actually happens. MapR sells a faster file system for Hadoop, which proves there is still plenty of room for improvement. Much could probably be gained by paying attention to what the software does at the single-server level and tuning the hardware and OS to match. Of course, very few people are capable of tuning an entire storage stack from Java down to the disk controllers.
What is SQL
SQL database: SQL stands for Structured Query Language. Simply put, it is the language used to communicate with a database. SQL is a standard programming and interactive language that helps in retrieving information from a database and updating it. A SQL database provides skilled, predictable performance, data protection, and scalability with near-zero downtime, all with almost no administration. SQL Database supports SQL Server tools, APIs, and several libraries, which makes it easier to access your data and to move and extend it to the cloud. This is a brief description of a SQL database.
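Communicating with a database in SQL looks like the sketch below, which uses Python's built-in sqlite3 module with an in-memory database; the table and the rows are invented purely for illustration.

```python
import sqlite3

# In-memory SQLite database: CREATE, INSERT, SELECT, UPDATE -
# the core SQL verbs for defining, loading, querying, and changing data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users (name, city) VALUES (?, ?)", ("Alice", "Pune"))
conn.execute("INSERT INTO users (name, city) VALUES (?, ?)", ("Bob", "Delhi"))

# Retrieve information from the database...
rows = conn.execute("SELECT name FROM users WHERE city = ?", ("Pune",)).fetchall()
print(rows)  # [('Alice',)]

# ...and update it.
conn.execute("UPDATE users SET city = ? WHERE name = ?", ("Mumbai", "Alice"))
conn.close()
```

The `?` placeholders are parameter binding, which keeps user-supplied values out of the SQL text itself.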
NoSQL database: it is referred to as non-SQL, or alternatively as a non-relational database. A NoSQL database provides a mechanism for data storage and retrieval that is modelled by means other than the classical tabular relations used in relational (SQL) databases. NoSQL covers a wide range of architectures and technologies that help solve the big-data and scalability/performance issues that relational databases struggle with.
NoSQL is very useful when an enterprise has to access and analyze a huge amount of unstructured data stored remotely on various virtual servers in the cloud. There is also a myth that NoSQL prohibits SQL, which is not true, although it can be said that few NoSQL systems are fully non-relational.
Examples of NoSQL Database
- Riak: an open-source key-value database written in Erlang. It has fault-tolerant replication and automatic data distribution, providing excellent performance.
- MongoDB: the most popular non-relational database system, especially among start-ups. It is open source, so it is free, and it comes with good support services.
- Oracle NoSQL: Oracle's entry into the NoSQL category.
Types of NoSQL Databases
| Data Model | Performance | Scalability | Flexibility | Complexity |
|---|---|---|---|---|
| Key-Value Store | High | High | High | None |
- Key-Value Model: the least complex NoSQL option, storing data in a schema-less fashion as keys and values. Examples: Azure, Riak.
- Column Store: also known as a wide-column store. It stores data tables as columns rather than rows. It is more than just an inverted table: sectioning out the columns allows for maximum scalability and high performance.
- Document Database: takes the key-value concept and adds even more complexity. Each document consists of its own data and a unique key, which is then used to retrieve the document. It is one of the best options for retrieving, managing, and storing document-oriented data.
- Graph Database: for data that is interconnected and best represented as a graph.
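The key-value and document models above can be illustrated with a tiny dictionary-backed store. This is purely illustrative (the class and its methods are invented for the example); real document databases add persistence, replication, and secondary indexes on top of this basic idea.

```python
class DocumentStore:
    """Toy document store: each document has a unique key and free-form fields."""

    def __init__(self):
        self._docs = {}  # key -> document (a dict, no fixed schema)

    def put(self, key, document):
        self._docs[key] = document

    def get(self, key):
        # Key-value access: retrieve a whole document by its unique key.
        return self._docs.get(key)

    def find(self, field, value):
        # Scan for documents whose field matches; real stores use indexes.
        return [d for d in self._docs.values() if d.get(field) == value]

store = DocumentStore()
store.put("u1", {"name": "Alice", "role": "admin"})
store.put("u2", {"name": "Bob", "role": "user", "team": "data"})  # extra field: no schema
print(store.get("u1")["name"])          # Alice
print(len(store.find("role", "user")))  # 1
```

Note that the two documents have different fields, which is exactly the schema flexibility the list above describes.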
As big data continues down its growth path, there is no doubt that these innovative approaches, utilizing NoSQL database architecture on the one hand and Hadoop software on the other, will be central to allowing companies to reach their full potential with data. In addition, the rapid advancement of data technology has sparked a rising demand to hire the next generation of technical talent who can build these powerful infrastructures. The technology and the talent may not come cheap, but for all the value big data can bring to the table, companies are finding it a worthy investment that will lead to successful results in the years ahead.