What is Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge or insights from data in either structured or unstructured form.
From scientific discovery to business intelligence, data science is changing our world. The dissemination of nearly all information in digital form, the proliferation of sensors, breakthroughs in machine learning and visualization, and dramatic improvements in cost, bandwidth, and scalability are combining to create enormous opportunity.
The field also presents enormous challenges, thanks to the relentless increase in the volume, velocity, and variety of information ripe for mining and analysis.
It employs concepts and techniques from mathematics, statistics, information science, and computer science, in particular from machine learning, classification, cluster analysis, data mining, databases, and visualization.
“Data scientist” has become a popular occupation with the Harvard Business Review dubbing it “The Sexiest Job of the 21st Century” and McKinsey & Company projecting a global excess demand of 1.5 million new data scientists.
What do Data Scientists do?
Data scientists use their data and analytical ability to:
- find and interpret rich data sources
- manage large amounts of data despite hardware, software, and bandwidth constraints
- merge data sources
- ensure consistency of datasets
- create visualizations to aid in understanding data
- build mathematical models using the data
- present and communicate the data insights/findings.
They are often expected to produce answers in days rather than months, to work by exploratory analysis and rapid iteration, and to produce and present results with dashboards.
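To make two of the tasks above concrete (merging data sources and checking that datasets are consistent), here is a minimal Python sketch. The records, field names and the `merge_sources` helper are all made up for illustration; in practice you would more likely reach for a library such as pandas, but the idea is the same.

```python
from collections import defaultdict

# Hypothetical toy records standing in for two separate data sources.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cho"},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.5},
    {"customer_id": 3, "amount": 12.0},
    {"customer_id": 4, "amount": 99.0},  # refers to a customer we do not have
]


def merge_sources(customers, orders):
    """Attach total order value to each customer and collect orphan orders."""
    known_ids = {c["customer_id"] for c in customers}
    totals = defaultdict(float)
    orphans = []
    for order in orders:
        if order["customer_id"] in known_ids:
            totals[order["customer_id"]] += order["amount"]
        else:
            # Consistency check: an order whose customer is missing.
            orphans.append(order)
    merged = [{**c, "total_spent": totals[c["customer_id"]]} for c in customers]
    return merged, orphans


merged, orphans = merge_sources(customers, orders)
for row in merged:
    print(row)
print("orders with no matching customer:", orphans)
```

The orphan list is the useful part: an inconsistency between two sources is usually something to investigate before any modelling starts.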
How to Build Your Profile for MS in Data Science?
Thinking of pursuing an MS in Data Science (or Machine Learning)?
Head to the Home of Data Science and Machine Learning – Kaggle Competition!
Kaggle is a platform for predictive modelling and analytics competitions in which companies and researchers post data and statisticians and data miners compete to produce the best models for predicting and describing the data. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know at the outset which technique or analyst will be most effective.
Kagglers come from a wide variety of backgrounds, including fields such as computer science, computer vision, biology, medicine, and even glaciology. It also includes many of the world’s best-known researchers, including members of IBM Watson’s Jeopardy-winning team and the team working on Google’s DeepMind. Many of these researchers publish papers in peer-reviewed journals based on their performance in Kaggle competitions.
How do Kaggle Competitions Work?
- Companies and organizations prepare the data and a description of the problem. Kaggle helps frame the competition, anonymize the data, and integrate the winning model into the host’s operations.
- Participants, like you, experiment with different techniques and compete against each other to produce the best models. Work is shared publicly through Kaggle Scripts to achieve a better benchmark and to inspire new ideas. Submissions are made through Scripts or through private manual upload. For most competitions, submissions are scored immediately (based on their predictive accuracy relative to a hidden solution file) and summarized on a live leaderboard.
- After the deadline passes, the host company pays the prize money for the winning solution. Many companies recruit participants based on their place on the leaderboard, final score, and submitted scripts.
- Alongside its public competitions, Kaggle also offers private competitions limited to Kaggle’s top participants.
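The scoring step described above can be sketched in a few lines of Python. This is only an illustration, not Kaggle's actual code: it assumes plain accuracy as the metric (real competitions each define their own), and the team names, ids and labels are invented.

```python
def accuracy(submission, solution):
    """Fraction of rows in the hidden solution that the submission gets right."""
    correct = sum(
        1 for row_id, truth in solution.items() if submission.get(row_id) == truth
    )
    return correct / len(solution)


def leaderboard(submissions, solution):
    """Score every team and sort from best to worst."""
    scores = {team: accuracy(preds, solution) for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical hidden solution file: row id -> true label.
solution = {1: 0, 2: 1, 3: 1, 4: 0}

# Hypothetical submissions: team -> {row id -> predicted label}.
submissions = {
    "team_a": {1: 0, 2: 1, 3: 0, 4: 0},
    "team_b": {1: 0, 2: 1, 3: 1, 4: 0},
    "team_c": {1: 1, 2: 1, 3: 0, 4: 1},
}

board = leaderboard(submissions, solution)
for rank, (team, score) in enumerate(board, start=1):
    print(rank, team, score)
```

Participants never see `solution`; they only see their score, which is why the leaderboard is a useful but imperfect guide while a competition is running.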
What Kaggle competition should a beginner start with?
I’d start with the tutorials first just to make sure you have a good grasp of the primary tools and techniques that most people use: https://www.kaggle.com/wiki/Home
Afterwards, Titanic: Machine Learning from Disaster is a good competition to start. It will prep you with fundamentals of data science – the data size is manageable, the problem is interesting, and you need minimum overhead in terms of computational requirements.
If you aren’t decided on your weapon of choice, I would suggest that you start with R. The tutorial can be found at Titanic: Machine Learning from Disaster. Follow this up with Python, Titanic: Machine Learning from Disaster.
Since your objective is learning, the most important place for you is the Kaggle forum. There is just tons of valuable information buried in those posts. What worked, what didn’t work, the issues others are facing, interesting patterns and visualizations, and neat tricks. I find it to be the best “practical” data science guide out there.
Once you have a sound footing, maybe in a couple of weeks, the next step would be to try something with text data like Sentiment Analysis on Movie Reviews.
Add to that a competition that uses audio and/or video data. There could be a few running, or you can always dig up old ones like Challenges in Representation Learning: Facial Expression Recognition Challenge and The Marinexplore and Cornell University Whale Detection Challenge.
Career in Data Science
A career in Data Science involves statistics, mathematics, business, economics and Computer Science.
After a Master’s in Data Science, you can work in various sectors such as finance, healthcare, consulting, retail or consumer products – basically any field where there is lots of data and there is a requirement to analyze large data sets to develop custom models and algorithms to drive business solutions.
With regard to Data Science, the primary focus is on applications rather than research. You use some knowledge from Computer Science (data structures, deep learning, computer vision, natural language processing, machine learning) in your data science role.
Typical employers include Walmart, Tesla, Intuit, Collective Health and numerous financial/trading companies on Wall Street.
The average salary for a job in Data Science in the US is about $113,000 as per Glassdoor. Another source – Payscale – puts the median salary at about $93,000.
Let’s have a look at the application of data science in different fields.
#1 Data Science in Retail
With online commerce, retail data is increasing exponentially in volume, in the velocity at which it is generated, and in its value for the insights and profit it could offer. As per McKinsey’s report on Big Data, retailers using big data analytics could raise their operating margins by as much as 60 percent.
The following points are a few of the applications of big data in retail:
- Customer Experience: Personalized recommendation based on purchase history, sentiment analysis, predictive analytics for improving customer experience across all channels and devices
- Merchandising: Improving layout, product placement and promotional display, identify cross-selling opportunities
- Marketing: Location-based personalized offers on mobile phones, real-time pricing, better targeted campaigns
- Supply chain logistics: real-time inventory tracking and management, demand-driven forecasting, route optimization and efficient GPS-enabled transportation
#2 Data Science in Health Care
In the US, health care spending was $2.6 trillion in 2013, or 17.6% of GDP. Out of this, $600 billion was consumed by waste and fraud. By 2020, health care’s share of GDP is estimated to rise to nearly 20%.
Big Data has the potential to help physicians make better decisions across the board – from personalized treatments to preventive care, while, at the same time, slashing the cost of providing health care services.
The following list details some of the applications of big data in health care:
- Personalized medicine: Create a personalized treatment plan based on individual biology using data from various sources including clinical trials, electronic medical records, online patient network, genomics research etc
- Genomics: Inexpensive DNA sequencing and next-generation genomic technologies are changing the way health care providers do business. They are getting better understanding of the genetic bases of drug response and disease by combining genomic data with other data in disease research.
- Predictive analytics and preventive measures: Some examples are: Mount Sinai Medical Center reduced its readmission rate, Texas Health identified high-risk patients to offer them customized interventions and Methodist Health System predicted patients who will need high cost care in future.
- Patient monitoring and home devices: Wearable body sensors – sensors tracking everything from heart rate to testosterone to body water – can take vital stats of the patients every minute of the day. Personal ECG heart monitor, medical monitoring devices and mobile applications are cropping up daily.
#3 Data Science in Finance
There has been a flood of financial data in recent times from various sources such as social media activity, mobile interactions, server logs, real-time market feeds, customer service records, transaction details and, of course, information from existing databases.
The following list details some of the applications of big data in finance:
- Sentiment analysis: Use natural-language processing, text analysis and computational linguistics to discover what people really think.
- Automated risk credit management: Alibaba has successfully used big data to offer loans to entrepreneurial online vendors without any collateral by using their transaction records, customer ratings, shipping records and a host of other info.
- Real-time analytics: helps in fighting financial fraud, improving credit ratings and providing more accurate pricing
- Predictive analytics: For example, whether certain customers are likely to pay off their credit cards using the demographic characteristics of customers’ neighborhoods and making calculated predictions.
#4 Big Data in Telecom
Mind Commerce, a market research firm, predicts that the big-data-driven telecom analytics market will grow by nearly 50 percent from 2014 to 2019 and forecasts that by the end of 2019, the market will be up to $5.4 billion in annual revenue.
Here are some applications of big data in telecom:
- Personalized services: applications include determining a subscriber’s lifetime value, revealing cross-channel insights and avoiding customer churn
- Network optimization: using real-time and predictive analytics, analyze subscriber behavior and create individual network usage policies
- Location-based initiatives: using geo-fencing and sensor technology, data scientists can predict a subscriber’s location and specific data needs with stunning accuracy, for example to create targeted offers when a subscriber is in a supermarket
- Churn prevention: combine variables such as calls made, minutes used, number of texts sent and average bill amount with behavior such as visiting a competitor’s website to predict the likelihood of a subscriber switching to a competitor for bargains
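As a rough illustration of the churn idea in the last bullet, the sketch below combines those variables into a single churn probability using a logistic function. The weights and bias are hand-picked, hypothetical numbers chosen only to make the example run; a real model would learn them from historical subscriber data, and the feature names are assumptions too.

```python
import math

# Hypothetical, hand-picked weights for illustration only (not fitted to data).
WEIGHTS = {
    "calls_made": -0.02,
    "minutes_used": -0.005,
    "texts_sent": -0.01,
    "avg_bill": 0.03,
    "visited_competitor_site": 1.5,
}
BIAS = -1.0


def churn_probability(subscriber):
    """Map a subscriber's features to a value between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * subscriber[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up subscribers: a heavy user, and a light user with a high bill
# who has been looking at a competitor's website.
loyal = {
    "calls_made": 100, "minutes_used": 400, "texts_sent": 50,
    "avg_bill": 30, "visited_competitor_site": 0,
}
at_risk = {
    "calls_made": 10, "minutes_used": 40, "texts_sent": 5,
    "avg_bill": 80, "visited_competitor_site": 1,
}

p_loyal = churn_probability(loyal)
p_at_risk = churn_probability(at_risk)
print(round(p_loyal, 3), round(p_at_risk, 3))
```

With these made-up weights the second subscriber scores far higher than the first, which is the kind of signal a retention team would act on.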
There are similar applications of big data in other domains such as Utilities, Travel and Transportation, Insurance, Pharmaceutical, Manufacturing, Gaming, Hospitality, Biotech and Energy.
Let’s quickly compare a career in Data Science with a career in Machine Learning.
Career in Machine Learning
Machine learning is the study of how computers can learn complex concepts from data and experience, and seeks to answer the fundamental research questions underpinning the challenges outlined above.
The field of machine learning crosses a wide variety of disciplines that use data to find patterns in the ways both living systems, such as the human body, and artificial systems, such as robots, are constructed and perform.
Whether it’s being applied to analyze and learn from medical data, or to model financial markets, or to create autonomous vehicles, machine learning builds and learns from both algorithm and theory to understand the world around us and create the tools we need and want.
In a Machine Learning job, you are expected to solve new and emerging technical challenges related to human-machine interactions.
In your role, you will utilize core computer science and engineering skills like high-performance computing, distributed systems and applied math.
You are expected to have 5+ years of experience in programming parallel and distributed systems, debugging low-level problems, performance analysis and optimizations, and numerical methods.
Employers also look for experience in using machine learning techniques for classification, regression, or ranking problems; experience in building predictive models for recommendations or personalization; and experience in the design and implementation of shipping, innovative consumer products.
Typical employers include Facebook, Amazon, Apple, Google and Microsoft.
Check out this tip to learn more about MS in Machine Learning.
How to shortlist universities for MS in Data Science?
Factors important to identify best universities in machine learning / data science
1) University reputation (rankings)
This factor is important in general but more so for data science programs. This is because most of them are relatively new, i.e. around 2-4 years old, and it’s difficult to establish credibility in the industry in such a short duration. Thus, the university brand name plays a key role in how your candidature will be perceived in the industry after completing the degree. No doubt, your knowledge would always matter more, but university reputation plays a crucial role for new courses.
2) Location
Location plays a pivotal role in practical learning opportunities outside the campus. Practical training typically comes in the form of internships, capstone projects, weekend hackathons, etc. Given that data science is a highly application-oriented domain, practical training would play a crucial role in your overall development.
- While you are in the program, its location can have quite an impact on your profile in terms of getting good internship opportunities. Also, a strong data science community gives access to specialized skill meetups and hackathons. For instance, the data science communities in places like New York or Silicon Valley will be much stronger than those in suburban locations.
- After the program, a good location definitely helps with the job search as there will be ample employment opportunities.
3) Curriculum
I believe this is the most important aspect and the first thing which you should check out. The curriculum actually tells you what subjects you’ll be studying and gives an impression about the relevance of the program for you. Typically, coursework is divided into core courses (compulsory courses) and electives. You should also check out the list of courses from which you can choose the electives.
Curriculum flexibility, i.e. the ratio of elective courses, is another important factor. It can vary from as high as 60-70% in some programs to almost none in others.
4) Industry Collaborations
Since most of the programs in data science related courses are professional, industry collaborations will play a key role in your experience through the program. You should check out the particular companies, which domain they belong to, what sort of activities are conducted like technical talks, research collaboration, capstone projects, etc.
5) First Hand Experience
The first step is to visit the university’s website and have a look at the details of the program. You can do a first-level filtering based on the information evident on the website. But an equally important aspect is to talk to people who are already studying there as well as the university’s alumni. You can definitely apply to all the colleges you like, but for making the final choice, I can’t over-emphasize the importance of this step, which will give you a true picture of the college administration and its recognition in the industry. These factors are really hard to judge from any university’s website. Also, given that these programs are mostly new, the amount of discussion on third-party websites like Quora is also limited. If you’re wondering how to find these people, again LinkedIn and Facebook are your best friends!
6) Program name is not so important!
The traditional philosophy – ‘Don’t judge a book by its cover’ works in this case as well. Since Data Science (and Machine Learning) is a non-traditional program, you’ll find all sorts of names like Masters in Analytics, Masters in Business Analytics, Masters in Data Science, Masters in Predictive Analytics, Masters in Marketing Analytics, Masters in Information Systems, etc. Trust me, names can be very misleading. Although they do give you an idea of what the program is all about, the name of the program should definitely be your last concern, if at all!
13 Schools for MS in Data Science that you can consider
The following schools are some of the best schools that offer programs in Data Science, and you can consider these for your reach, match and safe shortlist.
#1 University of Southern California
Program: MS Data Science
The MS in Computer Science – Data Science provides students with a core background in Computer Science and specialized algorithmic, statistical, and systems expertise in acquiring, storing, accessing, analyzing, and visualizing large, heterogeneous and real-time data associated with diverse real-world domains including energy, the environment, health, media, medicine, and transportation.
Courses: Foundation of artificial intelligence, Analysis of algorithms, Databases
Electives: Information retrieval and web search engines, high performance computing and simulations, advanced topics in database systems, foundation of data management, machine learning, probabilistic reasoning, advanced big data analytics, foundation and application of data mining, optimization: theory and algorithms, information visualization, building knowledge graphs, numerical analysis, applied probability
#2 Columbia University
Program: MS Data Science
The Master of Science in Data Science allows students to apply data science techniques to their field of interest. Our students have the opportunity to conduct original research, included in a capstone project, and interact with our industry partners and faculty. Students may also choose an elective track focused on entrepreneurship or a subject area covered by one of our seven centers.
Who should apply – Individuals looking to strengthen their career prospects or make a career change by developing in-depth expertise in data science. Candidates for the Master of Science in Data Science are required to complete a minimum of 30 credits, including 21 credits of required/core courses and 9 credits of electives.
Required Courses: Probability theory, algorithms for data science, statistical inference and modeling, computer systems for data science, exploratory data analysis and visualization
#3 University of Rochester
Program: MS Data Science
The Goergen Institute for Data Science offers a STEM-accredited MS program in data science. This program allows students to study the broad area of data science or to concentrate their studies in one of the following areas:
- Computational and statistical methods
- Health and biomedical sciences
- Business and social science
The program can be completed in either one year or one and a half years of full-time study. Each graduate will receive a degree conferred by the University’s School of Arts and Sciences.
#4 University of Massachusetts Amherst
Program: MS CS (With concentration in Data Science)
The Computer Science Masters with a Concentration in Data Science was created to help meet the need for expanded and enhanced training in the area of data science. It requires coursework in Theory for Data Science, Systems for Data Science, Data Analysis and Statistics.
The Masters Concentration in Data Science teaches you to develop and apply methods to collect, curate, and analyze large-scale data, and to make discoveries and decisions using those analyses.
#5 University of Washington Seattle
Program: MS Data Science
The Master of Science in Data Science at the University of Washington gives you the technical skills to extract knowledge from large, noisy, and heterogeneous datasets — big data — to provide insights that people and organizations can use.
Our interdisciplinary curriculum was developed by leading faculty from six top-ranked departments and schools at the UW, with input from top companies looking to hire data science professionals. In this program, you’ll build deep expertise in managing, modeling and visualizing big data to meet the growing needs of industry, government, nonprofit and research organizations today.
#6 University of California Irvine
Program: Master of Science in Business Analytics
The MSBA program at The Paul Merage School of Business at UC Irvine is a STEM-designated, intensive one-year, full-time degree program for students with or without work experience. The program is taught by world-class faculty and leading researchers in the field, and its graduates will be eminently qualified for big data and analytics careers.
The MSBA offers flexible curricular tracks to align with your desired career path:
- Data Analytics – actuaries, database administrators, financial analysts, information security analysts, management analysts, survey researchers
- Marketing Analytics – market research analysts, marketing specialists, marketing managers, sales managers, survey researchers
- Operations Analytics – analytics specialist, financial analysts, management analysts, operations research analysts, statisticians, transportation, storage and distribution managers
#7 New York University
Program: MS in Data Science
The Master of Science in Data Science is a highly-selective program for students with a strong background in mathematics, computer science, and applied statistics. The degree focuses on the development of new methods for data science.
We live in the “Age of the Petabyte,” soon to become “The Age of the Exabyte.” Our networked world is generating a deluge of data that no human, or group of humans, can process fast enough. A new discipline has emerged to address the need for professionals and researchers to deal with the “data tidal wave.” Its object is to provide the underlying theory and methods of the data revolution. This emergent discipline is known by several names. We call it “data science,” and we have created the world’s first MS degree program devoted to it.
The curriculum is 36 credits, and offers two ways to structure the graduate program that gives students the opportunity to pursue a specialization through tracks. NYU will offer a limited number of tuition scholarships to selected students admitted to the program. All applicants for admission will be considered for these awards on a competitive basis.
#8 University of Texas Dallas
Program: MS CS (Data Science Track)
Strategically located in the middle of the Telecom Corridor, which is home to hundreds of high-tech companies, the Computer Science Department is in the midst of a growth phase that includes the addition of new programs in cybersecurity, information assurance, data sciences and interactive computing, the hiring of a large number of new faculty, and a steep increase in external research funding.
Courses in Data Science Track: Statistical Methods for Data Sciences, Big Data Management and Analytics, Design and Analysis of Computer Algorithms, Machine Learning
#9 Michigan Technological University
Program: MS Data Science
Our degree will provide you with a broad-based education in data mining, predictive analytics, cloud computing, data-science fundamentals, communication, and business acumen. Additionally, you will gain a competitive edge through domain-specific specialization in disciplines of science and engineering. You will have the freedom to explore and develop your own interests in one or more domains.
#10 Illinois Institute of Technology
Program: MS Data Science
In Illinois Tech’s Master of Data Science program, you learn to explore data using high-level mathematics, statistics, and computer science. In particular, you learn how to analyze data, visualize your results, and articulate your discoveries. You will leave the program with the ability to think about the real problems that need to be solved, not to simply find technical solutions.
You will learn to question underlying premises and reformulate issues, explore and improve the structure of available data, create and evaluate models, construct and test hypotheses, draw conclusions, and determine if the results make sense in the real world. You will then learn to communicate these results to specialists and non-specialists alike.
The program is offered to full-time students on the Mies Campus, just minutes south of the Chicago Loop, a global finance center, whose businesses rely on amassing data in finance, healthcare, retail, manufacturing, consumer services, tourism, professional sports, and cultural activities. In this international city on the shores of Lake Michigan, data science students have the opportunity to engage in Chicago’s thriving tech community. Moreover, the City government allows and encourages access to its publicly available wealth of municipal data.
#11 Cleveland State University
Program: MS Computer & Information Science
The Information Systems (IS) track in the Master of Computer and Information Science (MCIS) program at Cleveland State University is a specialized degree program designed to prepare students for careers as information professionals. The IS track is housed within the Monte Ahuja College of Business. The coursework combines a blend of technology and management-oriented courses designed to prepare the next generation of technology managers to lead enterprises in innovative ways. Specializations within the program allow students to build specific skill sets in areas such as: information security, information technology management and business analytics.
#12 Rochester Institute of Technology
Data science, a term first coined in 2008 by data analytics leaders at Facebook and Google, is a new and inherently multidisciplinary field – combining computing, mathematics, statistics, and the sciences – devoted to the management and analysis of massive, mostly unstructured data. RIT’s approach to data science is distinctly different from the existing programs. Firstly, this degree is career focused, aiming to equip students with practical skills to handle large-scale data management and analysis challenges that arise in their daily work. The career-focused degree will also significantly benefit from one of the world’s largest co-op programs at RIT, which brings in practical problems, real-world data, and software tools commonly adopted in industry to enrich our curriculum.
Secondly, the program is highly interdisciplinary and domain driven, focusing on domain-specific problems and solutions. It also provides students with opportunities for interdisciplinary study. Important domains (e.g., biology, physics, and statistics) are judiciously selected and integrated as part of the curriculum to provide customized, domain-specific training to next-generation data scientists.
#13 Wayne State University
Program: MS Data Science and Business Analytics
The Mike Ilitch School of Business and College of Engineering have developed a novel Interdisciplinary Master of Science in Data Science and Business Analytics program, which is designed to help students excel in both industry and academia.
This novel Master’s program is designed to provide students with a broad range of data science and business analytics knowledge and skills. Each student will need to select one of the three major concentrations to provide a specialized track to give them more in depth knowledge and skills in that area of specialization.
When applying to the program, applicants will have to select which Program and Major Concentration they want to apply for:
- Data Science and Business Analytics – MS Business Program – Data-Driven “Business” Concentration
- Data Science and Business Analytics – MS in Engineering Program – Advanced “Analytics” Concentration
- Data Science and Business Analytics – MS in Engineering Program – Computational “Engineering” Concentration
At the following schools, you will also find Data Science, Business Analytics and related programs:
- Carnegie Mellon University: MS in Computational Data Science
- Stanford University: MS in Statistics: Data Science
- Georgia Institute of Technology: MS in Analytics
- MIT Sloan: Master of Business Analytics
- Columbia University: Master in Data Science
- Michigan State University: MS in Business Analytics
- University of Washington: MS in Data Science
- University of Southern California: MS in Data Informatics
- University of Chicago: MS in Analytics
- New York University: MS in Data Science
- Northwestern University: MS in Analytics
- North Carolina State University: MS in Analytics
- Texas A&M University: Masters in Analytics
- University of Cincinnati: MS in Business Analytics
- Arizona State University: Master in Business Analytics
- Illinois Institute of Technology: Master of Data Science