This article is written by Swesh Saurabh, who is pursuing a Diploma in Technology Law, Fintech Regulations, and Technology Contracts. This article has been edited by Zigishu (Associate, Lawsikho).
This article has been published by Sneha Mahawar.
Before understanding what “Big Data” is, we first need to understand where the term came from and what it replaced. Earlier, we simply had the term “Data”, which we now call Small Data. To be clear, there is no formal term “Small Data”; it is just data. But now that we have the new term Big Data, we use the term Small Data so that the two can be compared by name.
Before big data, we generally talked about traditional data or traditional databases. The story of Big Data starts in the 1970s. In 1970, the computer scientist E. F. Codd proposed the relational model of data, which became the foundation of the relational database management system (RDBMS). In an RDBMS, data is stored as structured data, which means that a structure has to be defined before any data is stored. That structure is defined using tables: we first create the tables and then store the data in them.
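The idea of defining a structure before storing data can be illustrated with a minimal sketch using SQLite, a small relational database that ships with Python. The table name, column names and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

def build_attendance_db():
    """Create an in-memory relational database with a fixed table structure."""
    conn = sqlite3.connect(":memory:")
    # The structure (columns and their types) is declared before any data is stored.
    conn.execute(
        "CREATE TABLE attendance ("
        " student_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " days_present INTEGER NOT NULL)"
    )
    # Only rows that fit the declared structure can be inserted.
    conn.executemany(
        "INSERT INTO attendance (student_id, name, days_present) VALUES (?, ?, ?)",
        [(1, "Asha", 42), (2, "Ravi", 38)],  # hypothetical sample records
    )
    conn.commit()
    return conn

conn = build_attendance_db()
rows = conn.execute(
    "SELECT name, days_present FROM attendance ORDER BY student_id"
).fetchall()
print(rows)  # [('Asha', 42), ('Ravi', 38)]
```

A photo or a video does not fit naturally into such rows and columns, which is why it is treated as unstructured data.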
Structured data means data for which a proper or standard format is available. The problem is that most of the data in use today is not structured. Of all the data that currently exists in the world, only about 10% is structured and the remaining 90% is in an unstructured format. Unstructured data includes photos, videos, etc. For example, some websites contain only photos, some contain only videos, and some contain a mixture of both along with text. In other words, we have the data, but in unstructured form: no structure was defined for it, and it is no longer feasible to define one for the huge amount of data that now exists.
So we can say that the term we use for unstructured data is Big Data, and for structured data we have the term “data”, which we now call Small Data.
What is big data
In the term Big Data, the word “Big” refers to nothing but the size of the data.
So, Big Data means an amount of data so large that traditional data management tools cannot store or process it efficiently. And that is not the end of it: the data is growing exponentially with time.
3 Vs of big data
Volume refers to the size of the data. Structured data has generally been measured in KBs, MBs, and at most GBs or TBs. The data we have now is measured not in terabytes but in petabytes (1 PB = 1,000 TB) and exabytes (1 EB = 1,000 PB). We use these units because the amount of data is so huge.
Let’s understand this with a simple and relatable example.
Facebook generates 4 petabytes of data daily: billions of photographs, billions of likes, and more than 2 billion users. So how is it possible for all this data to be created every day? The answer is that Facebook has proper storage available to store it.
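To get a feel for these units, the figures above can be converted with a few lines of Python. This is only a rough sketch: it assumes decimal units (each unit is 1,000 times the previous one) and that the 4 petabytes per day figure stays constant for a year. The function name `convert` is made up for this example.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]  # each step is 1,000x the previous one

def convert(value, from_unit, to_unit):
    """Convert a data size between decimal units, e.g. PB to TB."""
    size_in_bytes = value * 1000 ** UNITS.index(from_unit)
    return size_in_bytes / 1000 ** UNITS.index(to_unit)

daily_pb = 4  # daily figure quoted in the text

print(convert(daily_pb, "PB", "TB"))        # 4000.0 terabytes per day
print(convert(daily_pb * 365, "PB", "EB"))  # 1.46 exabytes per year, if the rate were constant
```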
Velocity refers to the speed at which data is increasing. Small data, or structured data, increases gradually, whereas Big Data increases exponentially. This means that the amount of data generated each day is growing very fast.
If we look at Facebook alone, more than 350 million photos are uploaded daily, and the total number of photos so far is around 300 billion. The number of likes that Facebook receives in a day is more than 4 million. Handling data that is created at this velocity is itself a challenge.
Let’s understand this with a simpler example. Suppose the total amount of data present in the world till now is ‘X’. Then 90% of X has been created in the last 4-5 years. From this, you can easily see how fast data is being created.
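The "90% in the last 4-5 years" figure can be turned into an approximate yearly growth rate. The sketch below assumes, purely for illustration, that data grows by the same factor every year. The function name is hypothetical.

```python
def annual_growth_factor(recent_share, years):
    """Yearly growth factor implied if `recent_share` of all data was created in the last `years` years."""
    # If 90% of X is new, the older data is 10% of X, so the total grew by 1 / 0.10 = 10 times.
    total_growth = 1 / (1 - recent_share)
    # Spread that growth evenly (geometrically) over the given number of years.
    return total_growth ** (1 / years)

print(round(annual_growth_factor(0.90, 5), 2))  # 1.58
print(round(annual_growth_factor(0.90, 4), 2))  # 1.78
```

Under this assumption, the total amount of data grows by roughly 58% to 78% every year.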
Variety refers to the different types or nature of data that is present; it could be both structured and unstructured. In earlier days, the sources of data that were considered by most of the applications were in the form of spreadsheets and databases. Nowadays, data in the form of photos, monitoring devices, audio, emails, videos, PDFs, etc. are also being considered in the analysis applications. This variety of unstructured data poses certain issues for storage, mining, and analyzing data.
Storage of data
Small data is generally stored in a centralized manner. For example, the data of universities or colleges, such as attendance and library records, is mostly present in structured form. To store this data we can use a centralized environment, which means that the data is present locally.
But big data cannot be stored in one place. Facebook, for example, cannot store its data in one place. It has data centers all over the world and keeps everything distributed across them. The data is thus present globally. The reason is that if all the data were kept in one place, that place would become a single point of failure and all the traffic would be directed to it.
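The reasoning about a single point of failure can be sketched in a few lines of Python. This is a simplified illustration, not a description of how Facebook or any real system places its data: the data center names are hypothetical, and the placement rule (a hash of the key, with two copies) is an assumption made for the example.

```python
import hashlib

# Hypothetical data center names, used only for illustration.
DATA_CENTERS = ["dc-asia", "dc-europe", "dc-us-east", "dc-us-west"]

def replicas_for(key, centers=DATA_CENTERS, copies=2):
    """Pick `copies` distinct data centers for a key using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(centers)
    return [centers[(start + i) % len(centers)] for i in range(copies)]

def readable_from(key, failed, centers=DATA_CENTERS, copies=2):
    """Return the data centers that can still serve the key after some have failed."""
    return [dc for dc in replicas_for(key, centers, copies) if dc not in failed]

placement = replicas_for("photo-123")
print(placement)  # two distinct data centers
# Even if the first of them goes down, the photo is still available from the other.
print(readable_from("photo-123", failed={placement[0]}))
```

Because every item is kept in more than one place, the failure of a single data center does not make the data unavailable, and requests are spread across several locations.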
For structured data we used software such as Microsoft SQL Server and Oracle Database, which was designed to manage and store structured data. But once the data becomes Big Data, this software can no longer be used, because the requirements for processing power and storage capacity are far greater. That’s why we now use technologies like Hadoop, Spark, etc.
Big data and increasing cyber crimes
There has been an exponential increase in the frequency and types of cyber attacks with the rapid growth of the internet. Cybercriminals are exploring various new ways to commit cybercrimes and to gain illegal access to computer systems and networks. The question is whether Big Data plays any role in this.
Nowadays corporations are allowing their partners and customers to access data in different ways to facilitate collaborations but this is helping the cyber attackers by making the networks more vulnerable to cyber-attacks.
With the advent of Big Data, there has been a corresponding increase in the hacking skills of cyber attackers. Traditional security measures such as signature-based tools are now routinely evaded and are no longer sufficient on their own.
Global cybercrime continues to increase at a rapid pace and effective chief information officers need to get better at anticipating criminal behavior to provide effective and efficient risk management. As both information risk and cyber security threats increase, organizations need to move away from reacting to incidents and towards predicting and preventing them. While organizations don’t always need to understand how the attack works from an in-depth technical perspective, they do need to understand how the attacker gets past their defense wall.
At an information security forum held in 2016, big data was stated to be one of the five major global security threats.
If large data sets are aggregated, stored and processed without security measures, a huge amount of information could become vulnerable to cyber attacks.
Goodman (2015) stated that big data is more prone to cyber-attacks, and that such attacks are almost certain to cause greater damage to a large number of people at the same time, creating a chaotic situation.
Big data and India
On the other hand, if we look at the bright side, big data can also be used as a tool to strengthen cyber security. The same has been recognized by the Data Security Council of India by emphasizing the need for managing big data to develop a framework of cyber security at the national level. Big data has already been used in cyber security in terms of locating weak links in cyber security walls, real-time surveillance, fraud detection, and guarding vulnerable areas in social media.
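As a rough illustration of how data analysis can support fraud detection, the sketch below flags values that deviate sharply from past behaviour. It is a toy example under stated assumptions: the daily transaction counts are invented, the z-score rule and the threshold of 3 are choices made for this example, and real systems use far richer data and models.

```python
from statistics import mean, stdev

def flag_anomalies(history, recent, threshold=3.0):
    """Flag values in `recent` that lie more than `threshold` standard deviations from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the past data: anything different from the usual value is unusual.
        return [x for x in recent if x != mu]
    return [x for x in recent if abs(x - mu) / sigma > threshold]

# Invented daily transaction counts for one account.
history = [100, 98, 103, 101, 99, 102, 97, 100]
recent = [101, 250, 99]

print(flag_anomalies(history, recent))  # [250]
```

Unlike a signature-based tool, which only recognises attack patterns that are already known, this kind of check looks for behaviour that is unusual compared with the data collected so far.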
Laws on data privacy in India
After learning how your personal data could be at risk, you may be wondering or worried about the laws of our country that prevent such data leaks. Unfortunately, there isn’t any comprehensive and dedicated data protection legislation in India. However, we have the Information Technology Act, 2000, which contains some provisions related to data theft, etc.
Information Technology Act, 2000
Section 43A- Compensation for failure to protect data
This section states that if a body corporate possesses or handles any sensitive personal data of a person and, through negligence, fails to implement and maintain reasonable security practices and procedures, and the person suffers damage as a result, then the body corporate shall be liable to compensate that person for such damage, which may not exceed five crore rupees.
Section 66- Computer Related Offenses
This section prescribes the punishment for a person who dishonestly or fraudulently commits any act referred to in Section 43: imprisonment for up to three years, or a fine which may go up to five lakh rupees, or both.
Section 66C- Punishment for identity theft
Personal and unique information like digital signatures is what cyber attackers target most. This section can be seen as a protective step that deters offenders through the fear of punishment. According to this section, whoever fraudulently or dishonestly makes use of the unique identification details of another person, for example a digital signature or password, shall be punished with imprisonment for a term of up to three years and a fine which may go up to one lakh rupees.
Section 72- Breach of confidentiality and privacy
This section states that any person who has gained access to any electronic record, book, register, etc. and discloses its contents without the consent of the person concerned shall be punishable with imprisonment for a term of up to two years, or a fine of up to one lakh rupees, or both.
The Honorable Supreme Court in Puttaswamy (Retd.) & Anr Vs Union of India and Ors., held that the right to privacy is protected as a fundamental right under Articles 14, 19 and 21 of the Constitution of India.
The IT Act, 2000 is not enough to be treated as data privacy or information protection legislation. There is an urgent need for a specific data protection law in India that ensures the privacy of information and restricts the use of collected data and information for purposes other than those for which they were collected. The IT Act is focused on electronic signatures, e-governance, key infrastructure and cybercrimes.
Example of a cyberattack using big data
If we look at Big Data in cyber crimes, hackers are going after big targets with a bigger impact on common people’s lives. The size and frequency of cyberattacks are increasing. To understand the magnitude, here is an example: in November 2016, in one of the biggest ‘Denial of Service’ attacks in the US, hackers targeted databases with citizens’ valuable information and forced many websites (including Twitter and Netflix) to go offline. The attack led to 1.2 terabytes of data being transferred to the victim’s computer, forcing the server to go off-grid.
Prevention from data breaches
With the rise in instances of cyber crimes, the measures to tackle them effectively have also evolved. For example, Palantir is a cyber security mechanism used by the US Military against threats of cyber terrorism. This is an example of an organization building its own security mechanism; however, there are other organizations (usually private businesses) that hand over their data to a third party to store securely in ‘cloud storage’.
One such example of cloud storage-based data security is ‘Assure Cyber’ by BT Security, a British organization. It examines possibilities of cyber threats, scans for perceived attacks and prevents data loss due to cyber-attacks.
With the rapid growth of technology and data produced, big data analysis is going to be the next big thing.
Big data analytics needs to be carried out with a uniform strategy by all organizations to reach optimal levels of cyber security. The data platform needs to have proper administration over the security of the data. In addition, there has to be a scrutinizing mechanism that can process a huge amount of unstructured data as well.
It is also important that experts from the different departments in the company meet on a regular basis and update each other about ways to tackle possible breaches for emerging threats.
A study by KuppingerCole and BARC, based on purposive sampling from over 50 countries, found that 94% of companies are exposed to cyber threats and 62% of respondents believe that the number of cyber threats increased in the past year. In terms of information technology trends, the majority of companies (88%) see big data as an effective solution to cybercrime in the future. This shows that companies do understand the importance of big data analytics. However, at the moment, only 20% of companies are using big data analytics as a cyber security tool.
The large-scale implementation of big data analytics is yet to take place. It was also found that the reasons for this lack of implementation are poor awareness of or seriousness about data security, the high cost of the initial phase of implementing big data analytics, companies not collecting relevant data, and a lack of technical experts in the area of big data analytics.
Our lives are entangled in a matrix of big data which is being continuously created in cyberspace. The lives of citizens are stored in the form of information in cyberspace, and there is every possibility that this data could be attacked by cybercriminals if they breach the security wall. It is now evident that the best tool to protect big data would come from big data itself. Big data analytics is the need of the hour, and the sooner companies and the government realize it, work for it and implement it, the better it is for our security.