Big Data

Have you ever thought about how much data you produce per day? Gone are the days when “data” only meant the numbers in your Excel file. Nowadays, whether you are booking your flights online or sending your friend an e-cheque, you generate some sort of data. Needless to say, we need a mechanism to store and process all of this data, and this is where what we call “Big Data” comes in.

Big Data is a collective term for data sets that are literally “too big”: they exceed the capacity of traditional data processing applications. That said, we need newer applications that can meet this growing demand. Apache Hadoop is an open-source software framework that allows distributed storage and data processing on clusters of commodity hardware. It was created jointly by Mike Cafarella and Doug Cutting back in the mid-2000s (how scary is it that very few people know this?). The two major components of Hadoop are the Hadoop Distributed File System (HDFS) and the MapReduce processing framework, both of which are modelled on designs published by Google (the Google File System and MapReduce). Long story short, HDFS is a portable file system written in Java. It manages how data files are split into blocks and stored across the cluster. A “NameNode” keeps track of which blocks belong to which file and where they are stored, while a set of “DataNodes” holds the blocks themselves; together these nodes form the HDFS cluster.
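To make the idea of splitting and spreading files more concrete, here is a toy sketch in Python. It is not how HDFS is actually implemented: the function names, the round-robin placement and the node names are all made up for illustration, and the 128 MB block size and three copies per block are assumed values (they match the usual defaults in recent Hadoop releases, but both are configurable).

```python
BLOCK_SIZE_MB = 128  # assumed block size; configurable in a real cluster
REPLICATION = 3      # assumed number of copies kept of each block


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would be cut into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller
    return sizes


def place_blocks(num_blocks, datanodes, replication=REPLICATION):
    """Toy round-robin placement: block index -> nodes holding a copy.

    Real HDFS uses a rack-aware placement policy; this only illustrates
    that every block lives on several different machines.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300)  # a hypothetical 300 MB file
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical machine names
    print(blocks)
    print(place_blocks(len(blocks), nodes))
```

The dictionary returned by `place_blocks` plays the role of the bookkeeping that tells the system where each piece of a file can be found; because every block sits on several machines, losing one machine does not lose the data.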

Here comes the other major component: MapReduce. It makes parallel processing possible on distributed servers. It breaks larger data blocks down into smaller data sets called tuples. These tuples can then be transformed into the desired format by functions that the user defines. To put it another way, the “Map” function processes the input data into what we call “key-value pairs”, which are grouped by key and distributed to the appropriate “nodes” for the following “Reduce” function, which combines the values for each key into a result. When the process is over, HDFS manages the storage of the final output.
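The classic way to see “Map” and “Reduce” in action is counting words. The sketch below runs on a single machine in plain Python, so it only imitates what Hadoop does across many servers; the function names are my own and are not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) key-value pair for every word in one record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))  # on a cluster, these run in parallel
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
    # prints {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

On a real cluster each document (or block) would be mapped on a different machine, and each key would be reduced on whichever node it was sent to, which is what makes the approach scale.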

Hadoop is the “missing link” between the possibility of working with big data and the current limitations of traditional database management systems. But how exactly can Big Data assist the financial services industry? Firstly, data governance can be greatly improved by big data development. One of the major achievements can be observed on the institutional side of a bank, which can begin to adopt practices from the retail side of the business in terms of client targeting, goal setting and marketing. For instance, there are various B2B businesses that leverage big data in an attempt to build better client intelligence. Some larger mutual fund managers could also improve how they collect data from wealth advisor networks and from interactions between financial agents, leading to better market forecasts and product applications.

It is probably not closely related to Fintech, but here comes another interesting big data application: finding love! In recent years, mobile dating apps such as Tinder have become the most chic way of dating! These apps make use of different big data algorithms in order to generate your desired “match”. As big data becomes more common, it could benefit not only governments and large corporations, but also individuals like you and me. In essence, big data could well be the most sought-after market research strategy in the coming decades!

Wilfred Yau is a polyglot currently pursuing an Honours Degree at the University of Toronto. He received full scholarships from the University for his undergraduate studies, with a year abroad at University College London. He is doing a Double Major in Economics and Linguistics with a Minor in Statistical Science. He is a language enthusiast, with the hobby of picking up different local languages whilst travelling. His research interests for future studies lie in the fields of sociolinguistics and computational linguistics. He is interested in seeing how the development of Natural Language Processing is going to change the global economic and linguistic landscape.