
Hadoop – Big Data overview

What is Big Data?

Big Data is a term used to describe large collections of data, or data sets, that may be structured, unstructured or semi-structured and that grow so large and so quickly that they are difficult to manage with regular database or analytics tools such as Cognos, Hyperion, SAP BusinessObjects and Informatica.

Driving growth of Big Data

Today, we have more generators of data than ever before. Data sources include mobile phones, laptops, tablets, desktop computers, sensors and more.

Some statistics that illustrate this data explosion:

  • There are more than 2 billion internet users in the world today.
  • There were 4.6 billion mobile phones in 2011.
  • Twitter processes 7 TB of data every day.
  • Facebook processes 10 TB of data every day.
  • Approximately 80% of this data is unstructured.
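To put the daily volumes above in perspective, the short Java sketch below converts them into the average rate at which data has to be processed. It assumes decimal units (1 TB = 10^12 bytes) and a load spread evenly over 24 hours. Real traffic is bursty, so peak rates would be higher than these averages. The class and method names are illustrative only.

```java
import java.util.Locale;

// Converts "terabytes per day" into an average sustained rate.
// Assumptions: 1 TB = 10^12 bytes (decimal units) and a uniform load over the day.
public class DailyVolumeRate {

    static final double SECONDS_PER_DAY = 24 * 60 * 60; // 86,400 seconds

    /** Average bytes per second needed to handle the given terabytes per day. */
    static double bytesPerSecond(double terabytesPerDay) {
        return terabytesPerDay * 1e12 / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        String[] sources = {"Twitter", "Facebook"};
        double[] tbPerDay = {7, 10}; // figures quoted in the list above

        for (int i = 0; i < sources.length; i++) {
            double mbPerSecond = bytesPerSecond(tbPerDay[i]) / 1e6; // bytes -> MB
            // Locale.ROOT keeps the decimal separator as '.' regardless of system locale
            System.out.printf(Locale.ROOT, "%s: %.0f TB/day ~ %.1f MB/s on average%n",
                    sources[i], tbPerDay[i], mbPerSecond);
        }
    }
}
```

Running it prints roughly 81.0 MB/s for Twitter and 115.7 MB/s for Facebook. That is a continuous average, around the clock, before any analysis work is done on the data.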

With this massive quantity of data, businesses need fast, reliable and deeper insight into their data.


Challenges of Big Data

  • Capturing and managing lots of information from various sources.
  • Working with different types of data.
  • Extracting value from the colossal amount of information.
  • Keeping up with data volumes that grow year after year.

Over a history that spans more than 25 years, SQL database servers have traditionally held several gigabytes of information, and reaching that milestone took a long time. In the past 10 to 15 years, data warehousing and analytics tools (Cognos, Hyperion, Teradata) made it possible to store up to terabytes. In the last 5 years, distributed and parallel computing has let us store several petabytes of information. This explosion of structured, unstructured and semi-structured data has put IT organizations under great pressure to extract value from it.

New Data Categories

Data in relational databases is structured, transactional information: the kind that fits in rows and columns. In the world of Big Data, however, sub-transactional data plays a big part, and about 80% of Big Data is semi-structured or unstructured. Here are a few examples:

  • Twitter tweets
  • Facebook messages
  • Photos / images
  • Video
  • Audio
  • Text messages
  • Shopping cart information
  • XML documents
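To make the difference between these categories concrete, here is a minimal Java sketch (Java 16+ for the `record`) that uses only the JDK. The `Order` record, the `<cart>`/`<item>` XML layout and the sample tweet are hypothetical illustrations, not data from any real system. Hashtag extraction is just one simple example of pulling structure out of free text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DataCategories {

    // Structured: fixed schema, one value per column, like a row in a relational table.
    // (Hypothetical example record.)
    record Order(int orderId, String customer, double amount) {}

    // Semi-structured: self-describing tags, but the elements present can vary per document.
    // Returns the text of every <item> element (hypothetical shopping-cart layout).
    static List<String> cartItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                names.add(items.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Unstructured: free text; any structure (here, hashtags) has to be extracted by us.
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    static List<String> hashtags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = HASHTAG.matcher(text);
        while (m.find()) {
            tags.add(m.group(1).toLowerCase(Locale.ROOT)); // normalize case
        }
        return tags;
    }

    public static void main(String[] args) {
        Order row = new Order(1001, "alice", 59.90);
        System.out.println("Structured row: " + row);

        String cart = "<cart user=\"alice\"><item>book</item><item>headphones</item></cart>";
        System.out.println("Semi-structured cart items: " + cartItems(cart));

        String tweet = "Loving the new #Hadoop release, time to crunch some #BigData!";
        System.out.println("Hashtags from free text: " + hashtags(tweet));
    }
}
```

The structured row maps directly onto a relational table. The XML document needs a parser but carries its own tags. The tweet yields structure only after custom extraction logic, which is part of why unstructured data is harder to analyze at scale.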

Why we need to extract value from Big Data

Big Data can provide new insights into every aspect of your enterprise. Results and reports generated from Big Data can help your enterprise grow its business. Here are a few examples:

  • Your organization's position relative to your competitors.
  • Strategies you can implement to increase profitability.
  • How prices of similar products compare across e-commerce organizations, so that each can adjust its business strategy accordingly.
  • The way you deliver products and services to the marketplace.
  • User trends, for example interests extracted from Facebook users' activity.
  • The way customers locate and interact with you.