Data Science Explained in Simple Terms for Everyone

in #stemgeeks, 3 years ago (edited)

Data is the new gold. Let me explain why.

The internet boom brought massive amounts of data with it. Since then, a lot has changed about how organizations conduct surveys to gather data and how they handle that data.

Every piece of information you have ever put out there about yourself is data. The famous example is Facebook, which has information about you, your friends and family, where you work, what you are up to, etc.

All of this is information (data) that can be analyzed for profit or to improve existing methods. All a startup needs to do is gain access to this data.
Then it can find out whether there is a market for the product or software it wants to launch, which age groups use certain products, what the reviews of existing products say, how it can make its own product better, etc.
Armed with this knowledge, organizations know better what to do and how to target their ads.
The result is better profit than producing and selling at random.

But just like gold, extracted data is usually impure and needs some refining. This is where Data Scientists come in.

You can think of Data Scientists as skilled gold miners.
The process is like this:
- Dig for gold (extract data using various methods)
- Refine gold (clean the data so that it actually makes sense)
- Mold gold (build data models that can be used to make a real impact)

Of course, the processes involved are intricate but you get the big picture.

Simply put, Data Science combines knowledge of statistics, math and programming to gather, analyze and interpret data. There is a clear difference between a Data Analyst and a Data Scientist.
A Data Analyst would tell you the basics, like what percentage of the population is unemployed, while a Data Scientist would build a model that predicts who is likely to be employed or unemployed based on certain factors.
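
To make the difference concrete, here is a minimal sketch in Python. The records, the "education" factor and the function names are all made up for illustration, and the "model" is just a per-group rate with a 50% cut-off, far simpler than anything used in practice.

```python
from collections import defaultdict

# Hypothetical toy records: (education_level, is_employed)
records = [
    ("degree", True), ("degree", True), ("degree", True), ("degree", False),
    ("no_degree", True), ("no_degree", False),
    ("no_degree", False), ("no_degree", False),
]

def unemployment_rate(rows):
    """Analyst-style summary: the share of people who are unemployed."""
    unemployed = sum(1 for _, employed in rows if not employed)
    return unemployed / len(rows)

def fit_rate_by_group(rows):
    """Scientist-style (heavily simplified) model: learn the employment rate per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [employed, total]
    for group, employed in rows:
        counts[group][0] += int(employed)
        counts[group][1] += 1
    return {group: emp / total for group, (emp, total) in counts.items()}

def predict_employed(model, group):
    """Predict 'employed' when the learned rate for the group is at least 50%."""
    return model[group] >= 0.5

overall = unemployment_rate(records)   # what the analyst reports
model = fit_rate_by_group(records)     # what the scientist builds
print(overall, model, predict_employed(model, "degree"))
```

The first function describes what already happened; the other two use the same data to guess the outcome for someone new.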

Skills Required from a Data Scientist

  • Data Collection
    You could pass questionnaires around, send out Google Forms for people to fill in, or extract data from an existing database. Data can also be collected from the internet through a process called web scraping. Since social media is one of the foremost go-to places for information, some people build bots that collect posts and apply sentiment analysis (measuring how people feel about a product or service) to get the information they need.
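
As a rough sketch of web scraping followed by sentiment analysis, the Python below pulls review text out of a small HTML snippet and scores it. The page content, the "review" class name and the word lists are invented for this example; a real scraper would download the page first, and real sentiment analysis uses far richer methods than counting words.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would fetch this over HTTP.
PAGE = """
<html><body>
  <p class="review">I love this phone, the camera is great</p>
  <p class="review">Terrible battery, I hate it</p>
  <p class="ad">Buy now!</p>
</body></html>
"""

class ReviewParser(HTMLParser):
    """Collects the text of every <p class="review"> element."""

    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "p" and ("class", "review") in attrs:
            self.in_review = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_review = False

    def handle_data(self, data):
        if self.in_review:
            self.reviews[-1] += data

# Tiny made-up word lists, for illustration only.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "terrible", "bad"}

def sentiment_score(text):
    """Positive words minus negative words; above zero means a happy customer."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

parser = ReviewParser()
parser.feed(PAGE)
parser.close()
scores = [sentiment_score(review) for review in parser.reviews]
print(list(zip(parser.reviews, scores)))
```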

  • Data Cleaning
    It is often said that Data Scientists spend 70-80% of their time cleaning data. Whatever collection method is used, there will always be some errors: duplicate records because someone pressed the Enter button twice, missing values, spelling typos, incorrect entries because some people didn’t want to disclose details, inconsistent data, irrelevant info, outliers, mixed data types (like numbers in a column that should only hold Yes/No), etc.
    Data cleaning is arguably the most important step because incorrect data leads to incorrect results and predictions. Imagine the losses an organization would incur if it invested based on a wrong prediction. In other words, garbage in, garbage out.
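
A few of the problems listed above can be handled with a short Python sketch like the one below. The rows, the column meanings and the age limit of 120 are assumptions made up for this example, and marking bad values as missing (None) is only one of several reasonable choices.

```python
# Hypothetical raw survey rows: (name, age, owns_product)
raw_rows = [
    ("Ada", "34", "Yes"),
    ("Ada", "34", "Yes"),   # duplicate: someone pressed Enter twice
    ("Ben", "", "no"),      # missing age, inconsistent casing
    ("Cy", "29", "7"),      # a number in a Yes/No column
    ("Dee", "430", "YES"),  # outlier: an impossible age
]

def clean(rows, max_age=120):
    """Drop exact duplicates and mark unusable values as None."""
    seen = set()
    cleaned = []
    for row in rows:
        if row in seen:
            continue  # skip exact duplicates
        seen.add(row)
        name, age, owns = row

        # Missing or non-numeric ages become None, and so do impossible ones.
        age_value = int(age) if age.isdigit() else None
        if age_value is not None and age_value > max_age:
            age_value = None

        # Normalise Yes/No answers; anything else is treated as missing.
        answer = owns.strip().lower()
        owns_value = {"yes": True, "no": False}.get(answer)

        cleaned.append({"name": name, "age": age_value, "owns_product": owns_value})
    return cleaned

cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Whether to drop a bad row, mark the value as missing, or try to repair it depends on the data and on what the results will be used for.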

  • Data Visualization
    Stakeholders in an organization are often not very tech-savvy, so explaining the data under consideration requires breaking things down for them. Charts and plots are the most effective way to do this. Popular tools include MS Excel, Power BI, Tableau and Plotly.

  • Database Handling
    Every organization keeps a record of the people and products that make it up. This record is usually referred to as a database. Knowing how to handle databases using popular software like MySQL and MongoDB (among many others) is an important skill in a Data Scientist's toolbox.
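
For a feel of what handling a database looks like, here is a small sketch using SQLite through Python's built-in sqlite3 module. SQLite stands in for MySQL here only because it needs no server, and the table and column names are hypothetical.

```python
import sqlite3

# An in-memory database that disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (name, age) VALUES (?, ?)",
    [("Ada", 34), ("Ben", 41), ("Cy", 29)],
)
conn.commit()

# Ask the database a question: which customers are older than 30?
rows = conn.execute(
    "SELECT name FROM customers WHERE age > ? ORDER BY name", (30,)
).fetchall()

# Let the database do the arithmetic: the average customer age.
average_age = conn.execute("SELECT AVG(age) FROM customers").fetchone()[0]
conn.close()

print(rows, average_age)
```

The SQL statements would look much the same on MySQL; mostly the connection code differs.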

  • Machine Learning
    This is the part where Data Analysts throw in the towel and make way for Data Scientists. Machine learning basically means training a computer to learn patterns from data so that it can make predictions or decisions without being explicitly programmed for each case. Machine learning is the reason your smartphone starts suggesting words for you after you have used it for a couple of days. Netflix movie recommendations are another machine learning result. Knowledge of Python and/or R is needed to achieve this.
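
The word suggestions mentioned above can be imitated with a very small Python sketch that counts which word usually follows which. The message history and function names are made up, and real keyboards use much more sophisticated models; this only shows the idea of learning from past patterns.

```python
from collections import Counter, defaultdict

def train(messages):
    """Count, for every word, which words were typed right after it."""
    following = defaultdict(Counter)
    for message in messages:
        words = message.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def suggest(model, word, k=1):
    """Return the k words that most often followed `word` in the history."""
    counts = model.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]

# Hypothetical typing history.
history = [
    "see you soon",
    "see you tomorrow",
    "see you soon",
    "thank you very much",
]
model = train(history)
print(suggest(model, "you"))
```

The more messages the model sees, the better its counts reflect how this particular person writes, which is why suggestions improve after a few days of use.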

PS. Most of my examples in this article revolve around data science for businesses, but it is worth noting that data science is also very important in medical care and is frequently used there. For example, there are data science algorithms that help simulate how drugs will act in the human body and cut down on long laboratory experiments.

TL;DR version:
The analogy I used for Data Science here is gold refining. Data comes unprocessed, and a Data Scientist combines knowledge of statistics, math and programming to turn it into beautiful and valuable information. Data Science is a very interesting field, and the skill is in very high demand with relatively low supply.
