Introduction To BIG DATA

Ayush Gupta
Sep 16, 2020

Hello, IT enthusiasts! A very warm welcome to this “world of information” → The Data.

In today’s technological world, we talk about AI, ML, cybersecurity, and much more.

These are just a few of the big terms we work with in today’s IT world, and for any of these technologies or sectors to work, we require “The Data”. I think you will all agree with this point.

Why is this data so important?

Data allows organizations to more effectively determine the cause of problems. Data allows organizations to visualize relationships between what is happening in different locations, departments, and systems.

  1. Improves people’s lives
  2. Helps make informed decisions
  3. Preserves analytics and research (medical or in any other field)
  4. Helps find effective solutions to problems
  5. Stops the guessing games
  6. Lets us avoid overly complex algorithms (by using advanced or upgraded versions of those algorithms)

The list of reasons is endless. Data needs to be preserved not only for research in the IT world, but also because the technologies themselves depend on it to work.

I know you all must have many questions: how is such a huge amount of data managed, stored, and used effectively?

Did you know that all of these questions have a single answer? But before getting to it, let me share some stats and facts from popular companies.

1. Every 2 days we create as much data as we did from the beginning of time until 2003

→ There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days — Eric Schmidt, CEO of Google at Google’s 2010 Atmosphere Summit.

2. Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide.

→ “When Google was founded in September 1998, it was serving ten thousand search queries per day.” Since 2006, it has served more search queries than that every second. If you go to [ ], you can see how many Google searches have been performed today. Fun fact: after Google’s initial 17,000% year-over-year growth, search query growth has leveled out to around 10% to 15% a year.

3. SNS Research estimated that by the end of 2017, as much as 30% of all Big Data workloads would be processed via cloud services, as enterprises sought to avoid large-scale infrastructure investments and the security issues associated with on-premises implementations.

4. Facebook users send on average 31.25 million messages and view 2.77 million videos every minute

5. Using Microsoft Azure together with the open-source frameworks Storm and Hadoop, an integrated cloud-based solution was built whose main objective is to use big data to improve fraud detection.
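The Google figures in item 2 can be sanity-checked with a little arithmetic. This sketch assumes a flat rate of exactly 40,000 queries per second and a 365-day year; since the article says "over 40,000", the results are lower bounds.

```python
# Back-of-the-envelope check of the Google search figures in item 2.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
DAYS_PER_YEAR = 365

# Assumption: a flat rate of exactly 40,000 queries per second.
queries_per_second = 40_000

queries_per_day = queries_per_second * SECONDS_PER_DAY
queries_per_year = queries_per_day * DAYS_PER_YEAR

print(f"{queries_per_day:,} queries per day")    # 3,456,000,000 queries per day
print(f"{queries_per_year:,} queries per year")  # 1,261,440,000,000 queries per year
```

That is about 3.46 billion searches a day and 1.26 trillion a year. The yearly total matches item 2, while the daily total falls just short of 3.5 billion, which fits the real rate being somewhat above 40,000 per second.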

Here are some key daily statistics highlighted in the infographic:

  • 500 million tweets are sent
  • 294 billion emails are sent
  • 4 petabytes of data are created on Facebook
  • 4 terabytes of data are created from each connected car
  • 65 billion messages are sent on WhatsApp
  • 5 billion searches are made

By 2025, it’s estimated that 463 exabytes of data will be created each day globally. That’s the equivalent of roughly 98.5 billion single-layer DVDs (4.7 GB each) per day!
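The DVD comparison is easy to check with a few lines of Python. This sketch assumes decimal units (1 EB = 10^18 bytes) and single-layer DVDs holding 4.7 GB each; it computes how many such DVDs 463 EB would fill, and how much data the 212,765,957 DVDs often quoted alongside this statistic would actually hold.

```python
EXABYTE = 10**18       # bytes (decimal units, an assumption)
DVD_CAPACITY = 4.7e9   # bytes on a single-layer DVD (assumption: 4.7 GB)

# How many DVDs would one day's 463 EB fill?
daily_data = 463 * EXABYTE
dvds_needed = daily_data / DVD_CAPACITY
print(f"463 EB fills about {dvds_needed / 1e9:.1f} billion DVDs")  # about 98.5 billion

# How much data do the often-quoted 212,765,957 DVDs actually hold?
quoted_dvds = 212_765_957
quoted_bytes = quoted_dvds * DVD_CAPACITY
print(f"212,765,957 DVDs hold about {quoted_bytes / EXABYTE:.2f} EB")  # about 1.00 EB
```

Under these assumptions, 212,765,957 DVDs add up to roughly one exabyte, so a full day's 463 EB would need close to 100 billion DVDs.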

So this big problem related to data is in front of us all, and combining these words gives us BIG + DATA = BIG DATA.

Now let me introduce you all to a great term and technology that I guess you might have heard of: “BIG DATA”.

→ For watching live Internet usage data, refer to:

1. [ ]

2. [ ]

3. [ ]

What is this “BIG DATA”?

Big data is a term that describes the large volume of data — both structured and unstructured — that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

Companies are increasingly using Big Data these days to outperform their peers. In most industries, existing competitors and new entrants alike will use the strategies resulting from the analyzed data to compete, innovate and capture value.

Big Data helps organizations create new growth opportunities and even entirely new categories of companies that combine and analyze industry data. These companies have ample information about products and services, buyers and suppliers, and consumer preferences, all of which can be captured and analyzed.

The best examples of big data can be found in both the public and private sectors: from targeted advertising, education, and massive industries such as healthcare, insurance, manufacturing, and banking, to real-life scenarios in guest services and entertainment. With 1.7 megabytes of data expected to be generated every second for every person on the planet by 2020, the potential for data-driven organizational growth in sectors such as hospitality is enormous.
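To get a feel for the 1.7 MB-per-second figure, it helps to scale it up. The sketch below assumes decimal units (1 MB = 10^6 bytes) and a constant rate around the clock, which is a simplification; it only multiplies out the figure quoted above.

```python
MEGABYTE = 10**6   # bytes (decimal units, an assumption)
GIGABYTE = 10**9
TERABYTE = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

# Rate quoted in the article: 1.7 MB per second for every person.
per_person_per_second = 1.7 * MEGABYTE

per_person_per_day = per_person_per_second * SECONDS_PER_DAY
per_person_per_year = per_person_per_day * 365

print(f"{per_person_per_day / GIGABYTE:.1f} GB per person per day")    # 146.9
print(f"{per_person_per_year / TERABYTE:.1f} TB per person per year")  # 53.6
```

In other words, the quoted rate works out to roughly 147 GB per person per day, or about 54 TB per person per year.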

Brief History of Big Data:

By the way, Big Data is not a new idea for the world.

  1. It all started around 18,000 BCE. The earliest examples we have of humans storing and analyzing data are tally sticks. The Ishango Bone was discovered in 1960 in what is now Uganda and is thought to be one of the earliest pieces of evidence of prehistoric data storage. Paleolithic tribespeople would mark notches into sticks or bones to keep track of trading activity or supplies. They would compare sticks and notches to carry out rudimentary calculations, enabling them to make predictions such as how long their food supplies would last.
  2. In 2400 BCE, the abacus, the first dedicated device constructed specifically for performing calculations, came into use in Babylon. The first libraries also appeared around this time, representing our first attempts at mass data storage.

Since the article is already too lengthy, I have kept this part short; you can refer to these links instead: [ ]

For the evolution of Big Data, refer to: [ ]

In 1663, John Graunt also dealt with “overwhelming amounts of information” while he studied the bubonic plague, which was then ravaging Europe. Graunt used statistics and is credited with being the first person to use statistical data analysis. In the early 1800s, the field of statistics expanded to include collecting and analyzing data.

From then on, the field of data management kept evolving. Jumping directly ahead: the term BIG DATA was introduced around 2005 by O’Reilly Media.

Since then, BIG DATA has continued to evolve. Many problems remain even after its introduction: the technology is superb, but it still needs many upgrades, and we may eventually need a better version of it or some newer technology that is more effective than BIG DATA.

Problems Faced In Big Data:

Facts regarding the BIG DATA problems faced by companies:

  1. The USA bears a cost of $600 billion annually due to bad data or poor data quality.
  2. A study revealed that more than 37.5% of enterprises consider analyzing big data their biggest challenge.
  3. It was expected that by 2020, one-third of all data would be stored in or pass through the cloud, and that the total would reach almost 35 zettabytes.
  4. Visualization is becoming a buzzword because it makes data analysis easier. According to an Information Week Business Survey, 45% of the 414 respondents named ease-of-use challenges with complicated software and less technical users as the second-biggest obstacle to adopting BI or analytics products.
  5. For a typical Fortune 1000 company, just a 10% increase in data accessibility will result in more than $65 million of additional net income.

Big Time Big Data Statistics

  • The big data analytics market is set to reach $103 billion by 2023.
  • Poor data quality costs the US economy up to $3.1 trillion yearly.
  • In 2020, every person will generate 1.7 megabytes of data every second.
  • Internet users generate about 2.5 quintillion bytes of data each day.
  • 95% of businesses cite the need to manage unstructured data as a problem for their business.
  • 97.2% of organizations are investing in big data and AI.
  • Using big data, Netflix saves $1 billion per year on customer retention.
  • WhatsApp users exchange up to 65 billion messages daily.
  • Nearly 90% of all data has been created in the last two years.
  • By 2023, the big data industry will be worth an estimated $77 billion.
  • Big data will entirely depend on automated analytics systems by 2020.

For reference: [ ]

Why is big data important?

Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers. In his report Big Data in Big Companies, IIA Director of Research Tom Davenport interviewed more than 50 businesses to understand how they used big data. He found they got value in the following ways:

  1. Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring significant cost advantages when it comes to storing large amounts of data — plus they can identify more efficient ways of doing business.
  2. Faster, better decision making. With the speed of Hadoop and in-memory analytics, combined with the ability to analyze new sources of data, businesses are able to analyze information immediately — and make decisions based on what they’ve learned.
  3. New products and services. With the ability to gauge customer needs and satisfaction through analytics comes the power to give customers what they want. Davenport points out that with big data analytics, more companies are creating new products to meet customers’ needs.

→ Once analyzed, this data helps in a multitude of ways.

  1. In healthcare, it helps avoid preventable diseases by detecting them in their early stages.
  2. It is immensely useful in the banking sector, where it aids in recognizing illegal activities such as money laundering.
  3. In meteorology, it helps study global warming.
  4. It also helps shape marketing strategies by analyzing public demand.
  5. It helps compile statistics in the field of cyber threats.
  6. It helps in devising more effective strategies.

For more articles on Big Data, refer to: [ ]


Best Big Data Tools:

  • Apache Hadoop
  • Apache Spark
  • Apache Flink
  • Apache Storm
  • Apache Cassandra
  • MongoDB
  • Apache Kafka
  • Tableau

A Brief Look at BIG DATA Using HADOOP

Hadoop is an open-source software framework used for storing and processing Big Data in a distributed manner on large clusters of commodity hardware. Hadoop is licensed under the Apache v2 license. It was developed based on the paper Google wrote on its MapReduce system, and it applies concepts from functional programming.
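Storing data "in a distributed manner" is handled by HDFS, Hadoop's distributed file system, which splits each file into fixed-size blocks and keeps several copies of every block on different machines. Below is a minimal sketch of that bookkeeping only. It assumes the common defaults of a 128 MiB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`), both configurable in a real cluster; the node names and the round-robin placement are hypothetical simplifications, not the rack-aware placement policy HDFS actually uses.

```python
from __future__ import annotations

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, the usual default for dfs.blocksize
REPLICATION = 3                 # the usual default for dfs.replication


def plan_blocks(file_size: int, nodes: list[str]) -> list[list[str]]:
    """Return, for each block of a file, the nodes that would hold its replicas.

    Placement is a simple round-robin chosen only for illustration; it is
    not the rack-aware placement policy that HDFS actually uses.
    """
    if len(nodes) < REPLICATION:
        raise ValueError("need at least as many nodes as replicas")
    # Integer ceiling division: a partial last block still counts as a block.
    num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
    return [
        [nodes[(block + r) % len(nodes)] for r in range(REPLICATION)]
        for block in range(num_blocks)
    ]


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3", "node4"]  # hypothetical node names
    one_gib = 1024 ** 3
    for i, replicas in enumerate(plan_blocks(one_gib + 1, cluster)):
        print(f"block {i}: {replicas}")
```

For a file of 1 GiB plus one byte, this yields nine blocks: eight full ones and a final block holding a single byte, since HDFS does not pad the last block to full size. Because each block's three copies sit on different nodes, losing any one node still leaves at least two copies of every block.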


Using the solution described by Google, Doug Cutting and his team developed an open-source project called HADOOP.

Hadoop runs applications using the MapReduce algorithm, in which the data is split into pieces that are processed in parallel on different nodes of the cluster. In short, Hadoop is used to develop applications that can perform complete statistical analysis on huge amounts of data.
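The MapReduce model can be illustrated with the classic word-count example. This is a single-machine Python sketch, not Hadoop's own (Java) API: it only mimics the map, shuffle, and reduce phases, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. A real Hadoop job would read its input splits from HDFS and run the tasks on many nodes.

```python
from __future__ import annotations

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle/sort: group all emitted values by key, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Map: one task per split. The thread pool only stands in for the
    # separate map tasks Hadoop would schedule on different nodes.
    with ThreadPoolExecutor() as pool:
        mapped = list(chain.from_iterable(pool.map(map_phase, splits)))
    # Shuffle: bring all values for the same word together.
    grouped = shuffle_phase(mapped)
    # Reduce: one reducer call per distinct word.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data needs big storage", "Hadoop stores big data"]
    print(word_count(splits))
    # {'big': 3, 'data': 2, 'hadoop': 1, 'needs': 1, 'storage': 1, 'stores': 1}
```

Because a mapper only ever sees one split and a reducer only ever sees one key, both can be spread across machines; the shuffle is the step where data moves between them.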

“Thanks for visiting, and stay tuned for more. I hope you liked the content.”

!!!! A very warm thanks to Vimal Data Sir for introducing me to such a great technology !!!!