Today, nearly every sector has integrated technology into its day-to-day business, and technology brings with it ever-growing data sets. Processing and analysing such data is therefore becoming more and more vital. But a key question needs answering: who will synthesise and give meaning to data scattered across so many different forms? Enter data science.
Harvard Business Review called Data Scientist “The Sexiest Job of the 21st Century”. A McKinsey report forecast that demand for data scientists would grow so rapidly that it would outstrip supply by 50%. In the coming years, this is likely to be one of the most sought-after fields for job seekers to build a career in.
(Did you know? Big companies such as Amazon, L’Oreal, Viacom 18, British American Tobacco, and many others are now hiring Data Scientists!)
Data: The Modern-Age Oil
Crude oil prices swing through dramatic highs and lows, taking a toll on oil-producing nations. Data, however, remains priceless, because it will form the crux of decision-making in the future.
An analogy can also be drawn between data and oil. Both are precious resources, yet neither yields value until it is processed and refined. Just as crude oil is extracted and then refined, data engineers extract data and data scientists refine it. The supply chain of data, however, is far less complex and cumbersome than that of oil. Once extracted, oil must be loaded onto tankers, routed through pipelines and finally held in storage facilities. With the tremendous rise of cloud computing services, transporting and storing data has never been easier!
What is Data Science?
Data science is, simply put, the science of making sense of data. It uses automated methods to analyse data and extract information or insights that would otherwise remain unknown. It produces data products that support decision-making, which in turn helps drive business value and build confidence in those decisions.
The three main components of data science
Data Scientist: At a Glance
Here is a snapshot of what you need to know about Data Scientists:
What skills do I need to become a Data Scientist?
Drew Conway’s Venn diagram illustrates the three skill sets a data scientist needs: Mathematics & Statistics, Programming & Database, and Domain Expertise.
Don’t think you have extensive knowledge or expertise in all of these areas? No problem. Data science is often described as a team sport, so a data scientist need not be strong in every one of these fields. You can still build a career in data science if you have either strong analytical or strong programming skills; either would make you a valuable player on a data science team!
Skills of a Data Scientist
I. Mathematics and Statistics
An understanding of basic mathematics and statistics is crucial for a data scientist. Statistical knowledge helps you interpret and analyse the data you collect. This is where data science can feel magical: applying the right mathematical concepts to data can yield unexpected insights! The basic concepts include descriptive and inferential statistics, linear algebra and graphing.
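To make the difference between descriptive and inferential statistics concrete, here is a minimal sketch using only Python's standard library. The session-length numbers are made up for illustration, and the confidence interval relies on a normal approximation (an assumption; small samples are usually better served by a t-distribution).

```python
import math
import statistics


def describe(sample):
    """Descriptive statistics: summarise the sample we actually have."""
    return {
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),  # sample standard deviation (divides by n - 1)
    }


def mean_confidence_interval(sample, confidence=0.95):
    """Inferential statistics: estimate a range for the *population* mean.

    Assumes a normal approximation; a t-distribution would give a
    slightly wider interval for small samples.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided z-value, roughly 1.96 for 95% confidence
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical daily session lengths (minutes) for a small sample of users
sessions = [12, 15, 11, 18, 14, 16, 13, 17]
print(describe(sessions))
print(mean_confidence_interval(sessions))
```

The first function only describes these eight users; the second makes a hedged claim about all users the sample is drawn from.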
II. Programming and Databases
This is the area that separates a data scientist from a statistician or an analyst. Whatever the data, one needs to write programs that query and retrieve it from databases or apply machine learning algorithms to it. A good grasp of data science libraries and modules is therefore essential.
Python and R both offer a wide range of prebuilt libraries that can be used simply by importing them into a program, which makes them good languages to start with. Microsoft Excel is a great tool for processing data, but it is only suitable for small or medium-sized data sets. When it comes to Big Data, Python and R are much better suited, and they also give the user greater flexibility and control.
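As a minimal sketch of querying and retrieving data from a program, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table and its rows are hypothetical; against PostgreSQL or MySQL you would use that database's own driver, but the SQL would look much the same.

```python
import sqlite3

# An in-memory database stands in for a real production database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 14.5), ("cy", 8.0)],  # hypothetical rows
)

# Query: total spend per customer, largest first
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()
conn.close()

for customer, total in rows:
    print(f"{customer}: {total:.2f}")
```

Pushing the grouping and sorting into SQL lets the database do the heavy lifting, so only the small summary travels back to Python.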
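One way to see the extra control a programming language gives over a spreadsheet (an Excel worksheet tops out at 1,048,576 rows) is to stream a CSV file row by row instead of loading it all at once. This sketch uses only Python's standard `csv` module; the `region` and `revenue` column names and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def revenue_by_region(lines):
    """Sum revenue per region, reading one row at a time.

    Because rows are streamed rather than loaded all at once, memory use
    stays small regardless of how many rows the file contains.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row["region"]] += float(row["revenue"])
    return dict(totals)


# Hypothetical sample; in practice you might pass open("sales.csv", newline="")
sample = io.StringIO(
    "region,revenue\n"
    "north,100.0\n"
    "south,250.5\n"
    "north,49.5\n"
)
print(revenue_by_region(sample))
```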
Common packages of Python and R
Database systems act as a central hub for storing information. They can be SQL-based (relational) or NoSQL-based. Relational databases include PostgreSQL, MySQL and Oracle, while MongoDB is a popular NoSQL database. Big Data frameworks such as Hadoop and Spark are also widely used for distributed storage and processing.
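To illustrate the difference between the two families, the sketch below stores the same hypothetical customer and orders first as relational rows in SQLite, then as a single nested JSON document, roughly the shape a document database such as MongoDB works with. It uses only Python's standard library and does not talk to a real MongoDB server.

```python
import json
import sqlite3

# Relational (SQL) model: data is split across tables and joined at query time.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'ana');
    INSERT INTO orders VALUES (1, 'book'), (1, 'lamp');
    """
)
joined = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY o.item"
).fetchall()
conn.close()

# Document (NoSQL-style) model: the same information kept together in one
# nested, JSON-like record, with no join needed to read it back.
document = {"_id": 1, "name": "ana", "orders": [{"item": "book"}, {"item": "lamp"}]}
serialised = json.dumps(document)

print(joined)
print([order["item"] for order in json.loads(serialised)["orders"]])
```

Neither model is strictly better: relational tables avoid duplicating data and suit ad-hoc queries, while documents keep related data together for fast reads of a whole record.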
III. Domain Expertise
Domain expertise means asking the right questions and, as a result, filtering the relevant data out of the entire data universe. Data scientists need to interpret data by understanding its structure, and they need to understand the problem(s) they are solving. For instance, when solving an online advertising problem, they should understand the types of customers visiting the website, how those customers interact with it, and what that interaction data actually means.
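Domain knowledge is what tells you which metric matters. In online advertising, one natural starting point is the click-through rate (clicks divided by impressions) for each type of visitor. The log format and segment names below are hypothetical; this is a minimal sketch, not a production analytics pipeline.

```python
from collections import defaultdict

# Hypothetical ad log: (visitor segment, whether the ad was clicked)
impressions = [
    ("new_visitor", False), ("new_visitor", True),
    ("new_visitor", False), ("new_visitor", False),
    ("returning", True), ("returning", False),
    ("returning", True), ("returning", False),
]


def click_through_rate_by_segment(log):
    """Click-through rate (clicks / impressions) for each customer segment."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for segment, was_clicked in log:
        shown[segment] += 1
        clicked[segment] += int(was_clicked)
    return {segment: clicked[segment] / shown[segment] for segment in shown}


print(click_through_rate_by_segment(impressions))
```

Knowing the business is what turns this number into a decision, for example whether returning visitors deserve a different ad than first-time ones.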
IV. Machine Learning
Machine learning is an integral part of data science, used to build predictive models. Many powerful, well-tested machine learning algorithms already exist, so there is rarely a need to invent a new one. That said, one should know the common families: supervised learning, unsupervised learning and dimensionality reduction.
Types of Machine Learning algorithms
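To give a feel for a supervised algorithm, here is a minimal k-nearest neighbours classifier in plain Python: it labels a new point by majority vote among the k closest labelled examples. The churn data set and feature choices are invented for illustration; in practice a library such as scikit-learn would normally be used.

```python
import math
from collections import Counter


def knn_predict(training_data, query, k=3):
    """Supervised learning: predict a label from the k nearest labelled examples.

    training_data is a list of (feature_vector, label) pairs.
    """
    # Sort labelled examples by Euclidean distance to the query point
    neighbours = sorted(training_data, key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote among the labels of the k nearest neighbours
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (visits per month, average basket in dollars) -> label
training_data = [
    ((1, 10.0), "churn"), ((2, 12.0), "churn"), ((1, 8.0), "churn"),
    ((8, 40.0), "stay"), ((10, 35.0), "stay"), ((9, 50.0), "stay"),
]
print(knn_predict(training_data, (2, 9.0)))
print(knn_predict(training_data, (9, 42.0)))
```

In real use, features on very different scales (visits versus dollars here) should usually be standardised first; otherwise the larger-scale feature dominates the distance.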
You probably use machine learning dozens of times a day without even realising it! Google Search, for example, works so well partly because machine learning helps it rank pages. Facebook’s automatic recognition and tagging of friends in photos is also driven by machine learning.
How can we solve real world problems with Data Science?
Tools and resources help solve problems, but the main aim should be to identify the right problem and then choose the right tool or model to solve it. Data science now has an impact on a myriad of areas.
Netflix uses collaborative filtering algorithms to recommend movies to its users based on the movies they have watched previously.
Social media sites and other online platforms also rely on data science, whether it is recommending new connections on LinkedIn or new products on Amazon, personalising your Facebook feed, or suggesting new people to follow on Twitter. Data science does it all!
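The sketch below shows the general idea of user-based collaborative filtering, not Netflix's actual system: find users whose ratings resemble yours (cosine similarity over co-rated movies), then predict your rating for each unseen movie as a similarity-weighted average of theirs. The users, movies and ratings are hypothetical.

```python
import math

# Hypothetical ratings (1-5 stars), keyed by user, then by movie
ratings = {
    "alice": {"Inception": 5, "Up": 3, "Heat": 4},
    "bob": {"Inception": 4, "Up": 2, "Heat": 5, "Alien": 4},
    "carol": {"Inception": 1, "Up": 5, "Coco": 3},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users, computed over movies both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Score each unseen movie by the similarity-weighted average of others' ratings."""
    weighted_sum, similarity_total = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        if sim <= 0:
            continue  # ignore users with no overlapping taste
        for movie, rating in their_ratings.items():
            if movie in ratings[user]:
                continue  # only recommend movies the user has not rated yet
            weighted_sum[movie] = weighted_sum.get(movie, 0.0) + sim * rating
            similarity_total[movie] = similarity_total.get(movie, 0.0) + sim
    predicted = {m: weighted_sum[m] / similarity_total[m] for m in weighted_sum}
    return sorted(predicted.items(), key=lambda item: item[1], reverse=True)[:top_n]


print(recommend("alice", ratings))
```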
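A simple, widely used heuristic for suggesting new connections is to count mutual connections ("friends of friends"). This is a minimal sketch of that idea on a hypothetical graph, not how LinkedIn or Twitter actually implement their recommendations.

```python
from collections import Counter

# Hypothetical undirected connection graph: person -> set of connections
connections = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "cy", "dee"},
    "cy": {"ana", "ben", "dee", "eli"},
    "dee": {"ben", "cy"},
    "eli": {"cy"},
}


def suggest_connections(person, graph, top_n=3):
    """Rank people who are not yet connected by their number of mutual connections."""
    mutual_counts = Counter()
    for friend in graph[person]:
        for candidate in graph[friend]:
            if candidate != person and candidate not in graph[person]:
                mutual_counts[candidate] += 1
    return mutual_counts.most_common(top_n)


print(suggest_connections("ana", connections))
```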
OTHER ONLINE SERVICES
Many online services, such as eHarmony (a dating site), Uber, Spotify, Stitch Fix, Hulu, Bombfell and Pandora, collect and use real-time data to customise and improve their users’ experience. Personalisation not only increases customer engagement and retention but also improves conversion rates, which in turn improves the organisation’s bottom line.
All of the above are tech domains, but data scientists also work in many other fields, for example:
Scientists are analysing genome sequences.
Physicists apply data science concepts while building and analysing 100 TB astronomical data sets. (Worth checking out: Interview: Kirk Borne, Data Scientist, GMU, on Big Data in Astrophysics.)
The Toronto Raptors, an NBA team, are installing cameras on basketball courts that collect huge amounts of data on players’ movements and playing styles. The team analyses game trends in this data to improve coaching decisions and team performance.
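As a small illustration of what movement-tracking data can yield, the sketch below turns a sequence of (x, y) court positions into total distance covered and average speed. The coordinates and the one-sample-per-second rate are assumptions made for the example, not details of the Raptors' system.

```python
import math


def distance_covered(positions):
    """Total distance travelled, summing straight-line steps between samples.

    positions: list of (x, y) court coordinates in time order.
    """
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def average_speed(positions, seconds_between_samples):
    """Average speed in distance units per second."""
    duration = seconds_between_samples * (len(positions) - 1)
    return distance_covered(positions) / duration if duration else 0.0


# Hypothetical tracking samples for one player (feet), one sample per second
track = [(0, 0), (3, 4), (6, 8), (6, 14)]
print(distance_covered(track))
print(average_speed(track, 1.0))
```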
Want to be a Data Scientist? A piece of advice
- Do you love data?
- Do you have an eye for identifying trends and patterns?
- Do you have strong foundations in statistics and programming?
- Are you comfortable dealing with unknowns?
- Can you deliver results to impatient stakeholders?
- Can you convince them of your findings?
If you answered yes to these questions, data science could be the right career for you!
Start by mastering at least one language or tool of your choice. Then get your hands dirty with projects or competitions! Do that, and there is a good chance you will make a talented data scientist!
(Note: some data science roles are advertised under the title “Consumer Insights Manager”, so search for this title too to improve your chances of landing a data science job!)
Data Science Learning Resources and Communities
These learning resources will put you in a better position to bag that dream data science job!
R Programming Language
Python Programming Language: Learn Python the hard way