
Do you have it or not? How to tell if Big Data is something you need to worry about in your application


Part 1 in our Big Data Series

So you have an application with lots of data. Lots and lots of data. But is it Big Data? That’s Data with a capital D. You wouldn’t be the first person to have trouble knowing when you’ve crossed the threshold into the modern, sometimes intimidating world of Big Data. Does it start at 10,000 records? 10 million records? One terabyte of data?

The answer is more complicated than a number. Yes, the volume of data is a huge factor, but you also have to consider the complexity of the data, how quickly new records come in and how quickly the data needs to be accessible to users.

A simple definition of Big Data

Michael Driscoll, Founder and CEO of Metamarkets, has this simple definition of Big Data: data that is distributed. In a nutshell, he says that you cross the threshold when your data can no longer be stored on one computer. Here’s a chart he used to explain this:

 

Class   Size          Manage with                   How it fits                   Examples
Small   < 10 GB       Excel, R                      Fits in one machine’s memory  Thousands of sales figures
Medium  10 GB – 1 TB  Indexed files, monolithic DB  Fits on one machine’s disk    Millions of web pages
Big     > 1 TB        Hadoop, distributed DBs       Stored across many machines   Billions of web clicks
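As a rough illustration, the size classes in Driscoll’s chart can be expressed as a small function. This is only a sketch: the function name classify_by_size is hypothetical, the boundaries are taken from the chart, and the use of decimal units (1 GB = 10^9 bytes) is an assumption, since the chart does not say which convention it uses.

```python
GB = 10**9   # assumption: decimal gigabyte
TB = 10**12  # assumption: decimal terabyte


def classify_by_size(size_bytes: int) -> str:
    """Return the chart's size class for a dataset of size_bytes (hypothetical helper)."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < 10 * GB:
        return "Small"   # fits in one machine's memory
    if size_bytes <= 1 * TB:
        return "Medium"  # fits on one machine's disk
    return "Big"         # stored across many machines


if __name__ == "__main__":
    for size in (2 * GB, 500 * GB, 3 * TB):
        print(size, classify_by_size(size))
```

Size alone is only one factor, so a result of "Medium" here does not rule out Big Data problems caused by complexity or speed.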

 

Another common definition says that you cross the threshold into Big Data when the techniques and technology you use to manage your data are no longer good enough. Normal hard drives can’t store it all, processing slows down, searching or analysis takes too long, servers get too hot, or new records come in faster than you can ingest them. At that point you need more sophisticated techniques and technologies: open source products like Spark or Hadoop, new ways to do ETL processing, more sophisticated load balancing, smarter search tools, and so on.
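One of these symptoms, records arriving faster than they can be ingested, is easy to check with simple arithmetic. The sketch below is illustrative only: backlog_after is a hypothetical helper, and it assumes constant arrival and ingest rates and an empty queue at the start, which real systems rarely have.

```python
def backlog_after(arrival_per_sec: float, ingest_per_sec: float, seconds: float) -> float:
    """Records still waiting after `seconds`, assuming constant rates and an empty start."""
    growth_per_sec = arrival_per_sec - ingest_per_sec
    # If ingestion keeps up, the backlog stays at zero rather than going negative.
    return max(0.0, growth_per_sec * seconds)


if __name__ == "__main__":
    # Hypothetical rates: 1,200 records/s arriving, 1,000 records/s ingested, over one hour.
    print(backlog_after(1200, 1000, 3600))
```

If the result keeps growing as the time window gets longer, the current pipeline will never catch up without more capacity or a different approach.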

Both of these definitions give us a simple starting point, but the ubiquitous use of distributed, cloud-based architectures for convenience and cost has significantly blurred the line between medium and big. There used to be a clear break: your server could no longer store or process all of your records, you had to move to a distributed architecture, and someone at your company could say “we’ve definitely got Big Data now!” Today the growth can be incremental, from a few AWS servers to more and more computing power and storage space, until one day… surprise! You’ve transitioned to Big Data without even realizing it.


The 4 Vs of Big Data

Another way of defining Big Data is what geek scholars have called the 4 Vs of Big Data: Volume, Variety, Velocity and Veracity. IBM’s Big Data & Analytics Hub has this helpful infographic, which explains the Vs: https://www.ibmbigdatahub.com/infographic/four-vs-big-data

Volume refers to the amount of data, Variety refers to its type and structure, Velocity is how quickly the new data is coming in and needing to be used, and Veracity is a measure of how accurate or trustworthy it is. Any combination of the first three factors can make an application cross that threshold into the Big Data realm, as shown in this diagram from Data Science Central: https://www.datasciencecentral.com/forum/topics/the-3vs-that-define-big-data

So, what do we make of all of this? We’re collectively creating new information at a rate that this fascinating article and infographic from Cloudtweaks quotes at 2.5 quintillion bytes of data every day (that’s 25 followed by 17 zeroes!). At that rate, it won’t be long before every system or application we use will need to follow big data principles. Maybe trying to define where the Big Data threshold is, asking whether your data is just plain data… or Data, is asking the wrong question. Instead, you should be asking whether your systems, applications, technologies, search tools, and infrastructure are scaled properly for the amount of data you’re ingesting and the needs of your users. Or is your data just wasting space on your servers?
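To put the quoted daily figure in more familiar units, here is a quick back-of-the-envelope conversion. It is a sketch that assumes decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes) and takes the 2.5 quintillion bytes figure at face value.

```python
BYTES_PER_DAY = 25 * 10**17      # 2.5 quintillion bytes, the figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

# Assumption: decimal units (1 EB = 10**18 bytes, 1 TB = 10**12 bytes).
exabytes_per_day = BYTES_PER_DAY / 10**18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**12

print(f"{exabytes_per_day} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

In other words, the quoted figure works out to about 2.5 exabytes a day, or roughly 29 terabytes every second.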

In our upcoming articles in this Big Data series, we’ll explore this question more by looking at how to get the most value out of your Big Data, how reporting is impacted by the introduction of Big Data, how ERP systems help businesses deal with their data challenges, and some tools and techniques you can be using right now to improve your system’s performance. Stay tuned for Part 2, How Big Data Changes Reporting.
