Big Data Insights

Abstract: Steve Jobs, one of the greatest visionaries of our time, was quoted in 1996 as saying “a lot of times, people don’t know what they want until you show it to them” [38], indicating that he advocated developing products based on human intuition rather than research. With the advancement of mobile devices, social networks, and the Internet of Things (IoT), enormous amounts of complex data, both structured and unstructured, are being captured in the hope of allowing organizations to make better business decisions, as data is now vital to an organization’s success. These enormous amounts of data are referred to as Big Data, which provides a competitive advantage over rivals when processed and analyzed appropriately. However, Big Data analytics raises a few concerns, including management of the data life-cycle, privacy and security, and data representation. This paper reviews the fundamental concept of Big Data, the data storage domain, and the MapReduce programming paradigm used in processing these large datasets. It then focuses on two case studies showing the effectiveness of Big Data analytics and presents how it could serve the greater good in the future if handled appropriately.



  1. What is Big Data? (Section II)
  2. Why is the transformation from traditional analytics to Big Data analytics necessary? (Section II)
  3. How to meet demand for Computing Resources? (Section II)
  4. What implications does Big Data have on the evolution of Data Storage? (Section III)
  5. What are the inconsistencies of Big Data? (Section IV)
  6. How is Big Data mapped into the knowledge space? (Section V)

A. What is Big Data?

  • Volume: Existing data is already measured in petabytes, which is problematic in itself, and it is predicted to grow to zettabytes (ZB) within the next few years [39]. This is due primarily to the increased use of mobile devices and social networks.
  • Velocity: Refers to both the rate at which data is captured and the rate of data flow. Increased dependence on live data poses challenges for traditional analytics, as the data is too large and continuously in motion.
  • Variety: Because the data collected is not of a specific category or from a single source, there are numerous raw data formats, obtained from the web, texts, sensors, e-mails, etc., which may be structured or unstructured. This large number of formats causes traditional analytical methods to fail in managing big data.
  • Veracity: Ambiguity within data is the primary focus of this dimension, typically arising from noise and abnormalities within the data.

B. Big Data analytics transformation

Fig. 1. Collaborative Big Data platform concept for Big Data as a Service [34]
  • Centralizing all aspects of storage and processing procedures in big data onto one platform
  • Ensures easy and rapid access to other data, allowing developers to focus on their own work (algorithms/services)
  • Privacy coverage: Currently there are over 4.6 billion mobile phone subscribers. In addition, between 1 and 2 billion people are accessing the Internet at any given time [1]. Two of the most widely known social platforms illustrate this scale:
    • Facebook has over 1 billion monthly active users, accumulating over 30 billion pieces of shared content.
    • Twitter also handles massive amounts of data, serving as a platform for over 175 million tweets a day [1].

C. MapReduce

Fig. 2. Map function
Fig. 3. Reduce function
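
The map and reduce functions named in the two figures can be illustrated with the classic word-count example. The following is a minimal single-process sketch in Python, not a distributed implementation: the function names (`map_fn`, `shuffle`, `reduce_fn`, `word_count`) and the sample documents are assumptions made for illustration, and a real framework such as Hadoop would run the map and reduce steps in parallel across many machines.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(document: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values under their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_fn(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return (key, sum(values))


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = (pair for doc in documents for pair in map_fn(doc))
    grouped = shuffle(mapped)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input documents, chosen only to exercise the pipeline.
    docs = ["big data is big", "data storage"]
    print(word_count(docs))
```

Because each call to `map_fn` depends only on its own document and each call to `reduce_fn` only on its own key, both steps can in principle be spread across machines, which is what makes the paradigm suitable for large datasets.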
  1. Absence of a standardized SQL query language:
    i. Current solution, providing SQL on top of MapReduce [29]: Apache Hive provides an SQL-like language on top of Hadoop [31]
    ii. Deficiency in data management features such as advanced indexing and a complex optimizer [29]
    iii. NoSQL solutions such as MongoDB and Cassandra enable queries similar to SQL; HBase uses Hive [32]
  2. “Limited optimization of MapReduce jobs” [29]:
    i. “Integration among MapReduce, distributed file system, RDBMSs and NoSQL stores” [29]
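
To see why an SQL-like layer on top of MapReduce is useful, it helps to write out by hand what a simple aggregate query turns into. The sketch below is a toy illustration only: it does not reproduce how Hive actually plans or executes queries, and the table contents, column names, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table with columns (platform, items).
ROWS = [
    {"platform": "a", "items": 3},
    {"platform": "b", "items": 5},
    {"platform": "a", "items": 4},
]


def map_row(row):
    """Map step: emit the GROUP BY column as key and the summed column as value."""
    yield (row["platform"], row["items"])


def reduce_group(key, values):
    """Reduce step: apply the aggregate (here SUM) to one group."""
    return (key, sum(values))


def run_query(rows):
    """Hand-written counterpart of:
    SELECT platform, SUM(items) FROM rows GROUP BY platform
    """
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_row(row):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(reduce_group(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_query(ROWS))
```

A one-line declarative query expands into separate map, shuffle, and reduce code even in this small case, which suggests why the lack of a standardized query language is listed as a limitation.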


A. Evolution of Data Storage

  • Late 1920s: IBM successfully redesigns Basile Bouchon’s punch card invention, which goes on to generate 20% of its revenue in the 1950s [17]
  • 1952: IBM announces the first magnetic tape storage unit, the standard data storage technology of the 1950s and still in use in the entertainment industry [16]
  • 1956: IBM invents the first hard drive, capable of holding up to 5MB; capacity reaches 1GB in 1982 and a few TB currently
  • 1967: The first floppy disk is created, initially storing up to 360KB on a 5.25-inch disk and leading to a 3.5-inch disk capable of storing 1.44MB [18]
  • 1982: The concept of the compact disc (CD) is invented in Japan; the CD-ROM is later developed with a storage capacity of 650MB to 700MB, equivalent to 450 floppy disks [19]
  • 2000: USB flash drives debut; as with floppy disks, their storage capacity has improved over time and continues to improve [20]
  • 2010s: “The Cloud is estimated to contribute more than 1 Exabyte of data” [20]
  • “Storage infrastructure must accommodate information persistently and reliably”
  • “A scalable access interface to query and analyze a vast quantity of data”
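
The capacity figures in the timeline above can be cross-checked with a little arithmetic. The sketch below is only a sanity check under stated assumptions: it treats every figure as a plain number of megabytes using the decimal convention (1 GB = 1000 MB), and the constant and function names are introduced here for illustration.

```python
# Capacities copied from the timeline, all expressed in megabytes (MB).
# Assumption: decimal units, so 1 GB is taken as 1000 MB.
FLOPPY_MB = 1.44
CD_ROM_MB = (650, 700)
HARD_DRIVE_1956_MB = 5
HARD_DRIVE_1982_MB = 1000


def floppies_needed(capacity_mb: float) -> float:
    """Number of 1.44MB floppy disks needed to hold the given capacity."""
    return capacity_mb / FLOPPY_MB


def growth_factor(old_mb: float, new_mb: float) -> float:
    """How many times larger the new capacity is than the old one."""
    return new_mb / old_mb


if __name__ == "__main__":
    for cd_mb in CD_ROM_MB:
        print(f"{cd_mb}MB CD-ROM holds about {floppies_needed(cd_mb):.0f} floppy disks")
    factor = growth_factor(HARD_DRIVE_1956_MB, HARD_DRIVE_1982_MB)
    print(f"Hard drive capacity grew {factor:.0f}x between 1956 and 1982")
```

Under these assumptions the 650MB figure works out to roughly 451 floppy disks, which is consistent with the 450 quoted in the timeline.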


A. Management of Data life-cycle

B. Data Privacy & Security

C. Data Representation

  • Data presentation must be designed not merely to display individual data items but to reflect the “structure, hierarchy, and diversity of the data, and an integration technique should be designed to enable efficient operations across different datasets” [35]




