7 Skills You Need to Succeed in Data Science (and How to Show Them Off in a Job Search)

This article was originally published on The Muse, a great place to research companies and careers.

Nearly a decade after Harvard Business Review called data science the sexiest job of the 21st century, becoming a data scientist is still a great career choice. The median salary for a data scientist is $122,840, according to the Bureau of Labor Statistics, and BLS expects the field to grow much faster than average over the next decade.

As a content creator at Dataquest, an online data science education platform, I’ve seen firsthand how becoming a data scientist can be absolutely life-changing. And in the many hours I’ve spent talking to learners, data scientists, and hiring managers, I’ve also come to understand what it takes to work in data science and what skills you’ll need to make yourself a strong candidate.

Before we dive into what those skills are, it’s important to mention two things about applying for data science jobs.

First, “data science” is a vaguely defined term, and the requirements for a “data scientist” or “data analyst” can vary from job to job. Different companies also give these jobs different titles; a “data scientist” at one firm might be a “machine learning engineer” at another, so be sure you cast a wide net when looking for jobs and read each job description carefully. It can help to search for job postings that list your skills rather than just searching for “data scientist” jobs.

Second, the cardinal rule for any job hunt is that employers are looking for proof you can do the job they need done. This rule is particularly crucial in a data science job hunt because data science is a relatively new field and there are no universally trusted education credentials. If you don’t have experience working in the field, you’ll need to showcase projects you’ve built that prove your skills in a portfolio.

These can be data science projects you built for school, independent projects you created on your own, or some mix of the two. Highlight the best and most job-relevant ones on your resume, and make all of them accessible via a clickable GitHub link. (Don’t know what GitHub is yet? Don’t worry! We’re getting to that.)

If you want to land a data science role, you’ll need to make sure you’re developing these seven skills recruiters and hiring managers are looking for—and showing them off in your applications.

1. Programming

There’s no getting around it—if you want to work in data science, you need to learn to code. Specifically, you need to learn to code in either Python or R, the two programming languages used regularly in the data science world.

The good news? You don’t need to learn both. Either one is fine, although one or the other might be a better choice depending on your goals. Python is the more popular choice in the business world; R is more widely used in academia and research.

You really can’t go wrong with either one, but if you choose Python, be careful when you’re selecting learning resources. While R is focused on working with data and performing statistical analysis, Python is much more versatile. That can be a good thing, but if you take a generic Python course, you may end up wasting time learning things you don’t really need for data science work.

Once you’ve picked a language, you’ll also need to learn the key libraries used for data science work. Libraries are like tools that supplement the base programming language, and they’re there to make your life easier. For example, they contain pre-written functions, allowing you to perform common data tasks with just a line or two of code. Writing the same functionality from scratch in the base language would take you much longer. In Python, common data-focused libraries include numpy, pandas, matplotlib, and scikit-learn. In R, you’ll probably want to learn the popular libraries of the tidyverse.
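To make the “line or two of code” point concrete, here is a minimal sketch that computes a median by hand and then with a library call. It uses Python’s built-in `statistics` module as a stand-in so it runs without installing anything; numpy and pandas offer analogous one-liners. The sale prices are made-up illustration data, and `median_from_scratch` is a hypothetical helper name.

```python
import statistics

# Hypothetical sample data, for illustration only.
prices = [250_000, 310_000, 198_000, 420_000, 275_000, 330_000]


def median_from_scratch(values):
    """Compute the median by hand: sort, then pick or average the middle."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hand-written version: several lines of logic to get right.
manual_median = median_from_scratch(prices)

# Library version: a single call.
library_median = statistics.median(prices)

print(manual_median, library_median)
```

Both approaches give the same answer; the library version is simply shorter and already tested by other people.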

Additionally, it’s helpful to pick up some code-related workflow skills that’ll help you function more effectively in the real world. An understanding of Git and GitHub is mandatory—these are tools that help you store and manage different versions of code and collaborate with other programmers. A solid command of the UNIX command line (also called terminal, bash, etc.) isn’t strictly required, but it can help you work more efficiently by speeding up tasks like text file processing. Command line skills are also sometimes needed for working with cloud data, and they can make it easy to automate otherwise time-consuming processes like setting up a new teammate’s system with all of the tools and access they need.

How to Develop This Skill

There are dozens of lecture-based video courses on platforms such as Coursera and EdX and thousands of YouTube videos on a wide variety of data science topics. You may want to search for “data science courses” rather than “programming courses” to ensure what you end up learning is actually relevant to data science. Many data science textbooks are also worthwhile, and some are available online for free.

I’m biased, but in my opinion, the best bang-for-your-buck option is an interactive online platform that lets you write and run code as you’re learning. Dataquest (where I work) and Codecademy are examples of this kind of platform, and the primary advantage they offer is getting you immediately hands-on and writing code to apply everything you learn.

You can learn data science programming from a wide variety of sources. Just remember that watching someone else code isn’t the same thing as knowing how to write the code for yourself—if you’re taking a video-based course, be sure to set lots of time aside to apply what you’re learning by actually writing and running code.

How to Demonstrate It in Your Job Search

Your programming skills should be listed in the skills section of your resume, but they must also be evident in the data science projects you’re showcasing on both your resume and your GitHub page. Be sure that the code on your GitHub is clean, clear, and commented—you want any employer who opens up a project to be able to tell immediately that you’ve got solid programming chops.

Your programming skills are also likely to be tested during the interview process. Every job is different, but it’s likely you’ll encounter one of the following:

  • A technical interview that tests your coding skills (which may entail answering questions verbally, or writing real or pseudocode on a whiteboard).
  • An on-site project where you’re asked to complete a coding task in a set period of time.
  • A take-home project where you’re asked to complete a coding task and return it by a specified date.

2. SQL

Regardless of which programming language you choose, you also need to learn SQL. SQL, which can be pronounced “S.Q.L.” or “sequel,” is what’s called a query language. Essentially, it’s a specialized kind of programming language that you use to request and filter information from a database.

SQL often gets overlooked by aspiring data scientists. It’s a very old language and it’s kind of boring when compared to something like deep learning. But make no mistake, SQL is an essential skill for data science work because most companies store their data in some form of SQL-based database. In fact, even in 2020, data scientists and data analysts used SQL more than either Python or R!

How to Develop This Skill

Just as with programming, there are a wide variety of online options for learning SQL, including video courses, texts, and interactive platforms. Mode Analytics has a free SQL tutorial that is well-liked and doesn’t require any prior experience. Most online platforms that teach data science programming and other data science skills also have courses that cover SQL.

How to Demonstrate It in Your Job Search

Include projects with SQL work on your resume and in your GitHub. And if you get an interview, be sure you spend some time brushing up—employers know how important SQL is, so it often comes up in technical interviews or on take-home projects during the interview process. You may be quizzed on SQL basics like the syntax for an inner or outer join, or you may be asked to write and run real queries or sketch them out on a whiteboard.
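If the inner versus outer join distinction is new to you, the following sketch may help. It uses Python’s built-in `sqlite3` module with an in-memory database; the `customers` and `orders` tables and their rows are invented for illustration. An inner join keeps only customers that have a matching order, while a left outer join keeps every customer and fills in `NULL` (shown as `None` in Python) where no order exists.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chloe');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# INNER JOIN: only customers with at least one order appear.
inner_rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name, o.total
""").fetchall()

# LEFT OUTER JOIN: every customer appears; missing orders become NULL.
left_rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name, o.total
""").fetchall()

conn.close()

print(inner_rows)
print(left_rows)
```

Chloe has no orders, so she is absent from the inner join result and appears with a `None` total in the left outer join result.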

3. Handling Messy Data

This is really an umbrella term that covers a couple of different, but closely related, skills.

The first is data cleaning, a critical skill for anyone who aspires to work with data. Data cleaning is everything that you have to do to an existing data set to get it ready for analysis, including tasks such as fixing formatting, cleaning up typos, and dropping duplicate entries. Data cleaning isn’t most people’s favorite part of the job, but it is an essential one. And don’t worry! You’ll be doing all of this cleaning using your programming skills, not combing through spreadsheets by hand.
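As a rough illustration of those three tasks (fixing formatting, correcting typos, and dropping duplicates), here is a small sketch in plain Python. The records, the `CITY_FIXES` lookup, and the helper names are all hypothetical; in real projects you would more likely do this with a library such as pandas.

```python
# Hypothetical messy records, for illustration only.
raw_rows = [
    {"name": "  Alice Smith ", "city": "new york", "signup": "2021-03-04"},
    {"name": "alice smith", "city": "New York", "signup": "2021-03-04"},
    {"name": "Bob Jones", "city": "Bostn", "signup": "2021-05-17"},
    {"name": "Cara Lee", "city": "boston", "signup": "2021-06-01"},
]

# Known typos mapped to their corrections (an assumed, hand-built lookup).
CITY_FIXES = {"Bostn": "Boston"}


def clean_row(row):
    """Normalize whitespace and capitalization, then fix known typos."""
    name = " ".join(row["name"].split()).title()
    city = row["city"].strip().title()
    city = CITY_FIXES.get(city, city)
    return {"name": name, "city": city, "signup": row["signup"]}


def drop_duplicates(rows):
    """Keep the first occurrence of each (name, city, signup) combination."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["name"], row["city"], row["signup"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


cleaned = drop_duplicates([clean_row(r) for r in raw_rows])

for row in cleaned:
    print(row)
```

Note that the two Alice Smith rows only become recognizable as duplicates after the formatting is normalized, which is why cleaning usually comes before deduplication.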

The second skill is working with unstructured data. Unstructured data really refers to any data that isn’t coming to you as a pre-existing data set, and thus isn’t clearly structured. Streaming data from social media, for example—a raw, real-time feed of everything posted to a platform—is unstructured data. You have to write the code that filters, sorts, and categorizes it to create the data set you want to analyze, and that’s a skill that employers value.

How to Develop This Skill

Practice working with dirty data sets and try a few projects where you collect your own. For a first project, try working with something like Twitter streaming data. It’s unstructured, but it’s analyzed frequently, so you’ll be able to find lots of tutorials and code examples to help you get the hang of it.

How to Demonstrate It in Your Job Search

Again, this is something that you show in the projects on your resume and your GitHub. On your resume, in the bullet points under your highlighted projects, include a line or two detailing what you had to do to clean and structure the data. For example, you might say something like:

  • Filtered streaming tweet data via the Twitter API, cleaned tweets with regex, and tokenized them for VADER sentiment analysis.
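The regex cleaning and tokenizing steps named in that bullet could look something like the sketch below. The sample tweet, the function names, and the specific cleaning rules are assumptions chosen for illustration. The sketch stops before any sentiment scoring and does not call the Twitter API or VADER.

```python
import re


def clean_tweet(text):
    """Strip URLs, @mentions, and hashtag symbols, then tidy whitespace."""
    text = re.sub(r"https?://\S+", "", text)  # drop URLs
    text = re.sub(r"@\w+", "", text)          # drop @mentions
    text = text.replace("#", "")              # keep hashtag words, drop the symbol
    text = re.sub(r"\s+", " ", text)          # collapse runs of whitespace
    return text.strip()


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


# Hypothetical tweet, for illustration only.
sample = "@dataquest Loving this #Python course!!  https://example.com/abc"

cleaned_tweet = clean_tweet(sample)
tokens = tokenize(cleaned_tweet)

print(cleaned_tweet)
print(tokens)
```

Being able to walk through choices like these (why drop mentions, why keep hashtag words) is exactly the kind of detail worth mentioning in an interview.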

You should also be ready to talk about how you approach handling messy data in your interview, whether you get asked about it directly or not. You will certainly be asked about your projects more broadly. In your answers, you can weave in the context of how you collected and cleaned the data prior to your analysis.

4. Machine Learning/“AI”

This is the part of data science that many aspiring learners get excited about, and with good reason! Machine learning is incredibly cool, but it can also start to feel pretty daunting when you look into it because it is a large and complex field.

The good news is that you don’t have to know everything! To get a foothold in the industry, you’ll just need a solid grasp of the most popular algorithms. For example, you’ll want to be sure that you can implement and explain popular model types including linear and logistic regression, Naive Bayes, classification and regression trees (CART), k-nearest neighbors (KNN), k-means, principal component analysis (PCA), and random forests. (If that all sounds like gibberish, don’t worry! It’s not as bad as it sounds, and you don’t need to know it all right now. You’ll get there eventually!)
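To give a sense of what “implement and explain” can mean, here is a bare-bones k-nearest neighbors classifier written in plain Python. It is a teaching sketch under simple assumptions (Euclidean distance, majority vote, tiny made-up data); in real work you would normally reach for a library implementation such as scikit-learn’s `KNeighborsClassifier`. The function name `knn_predict` and the data points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature data: three "small" and three "large" examples.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 7.0), (6.5, 8.0)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (2.0, 2.0)))
print(knn_predict(points, labels, (6.0, 7.0)))
```

The explanation is as short as the code: to label a new point, find the k training points closest to it and take a vote.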

If you aspire to work in a specific field within data science, or at a particular company, then you may need to develop more experience in a specific area of machine learning. For example, gaining a deep understanding of Natural Language Processing (NLP) algorithms and techniques isn’t necessary for a generalist data science role, but it would be necessary to get a job on a team that’s working on something related to NLP, like speech recognition.

How to Develop This Skill

There are many online courses and tutorials that teach machine learning. But by the time you reach this stage of your studies, you may find it’s best to focus on learning by doing, taking on personal projects that force you to work with different machine learning models as a way to challenge yourself. Competition sites like Kaggle can also be a great source of both learning and motivation when it comes to machine learning techniques.

How to Demonstrate It in Your Job Search

Projects! Do I sound like a broken record yet? Without prior work experience, projects are really the only way you can demonstrate this skill on a job application. You can talk about your Kaggle competitions, for example, and point to notebooks on the site as part of your portfolio. In any case, be sure that you have at least one or two machine learning projects listed on your resume, and don’t forget to describe the model you built in the bullet points under the project title. For example, you might write something like:

  • Predicted home sale prices with 97% accuracy by building a multivariate KNN model.

5. Communication

When people talk about data science skills, soft skills like communication are often overlooked. But this actually might be the most important skill for data work. After all, the best analysis in the world is still only useful if you can get people to understand it and convince them to act on it.

“You need to be able to interact and explain things,” says Edouard Harris, cofounder of SharpestMinds, which connects aspiring data scientists with mentors to help them land jobs in the field. “The job isn’t all about working with data, it’s working with people, too.”

Data visualization is a key skill here, because while non-technical colleagues aren’t going to be able to understand your code, everyone can understand a bar graph. But if it’s not presented clearly, visualized data can mislead or confuse. As you learn how to create charts with your code, it’s worth taking a little time to study data viz design. Design skills can make your work more attractive, but more importantly, they’ll help you highlight the most important parts of your results and avoid confusing your audience with superfluous information.

Written and spoken communication skills are important, too. Data scientists are often asked to share reports about or present their work. They also often have to collaborate with colleagues who work in both technical and non-technical roles. So you’ll need to be able to present your conclusions in a way that makes sense to everyone and you’ll also need to be able to understand what non-technical colleagues need from you.

How to Develop This Skill

Practice, practice, practice. Form good habits by writing reports and explaining what’s happening in your code in your notebooks as you build projects. (In data science, your programming work often happens inside “notebook” software that lets you mix explanatory text, live code snippets that actually run, and charts and images. You get a working programming environment, plus the surrounding context that makes your work easier for other people to understand at a glance.)

Try describing one of your projects to a non-technical friend or relative. Can you explain it? Are they drawing the conclusions you want them to? Can you answer any questions they have about what things mean or how you arrived at certain insights?

Of course, there are also courses and tutorials available, particularly in the realm of data visualization and design.

How to Demonstrate It in Your Job Search

This should be evident in the projects showcased on your GitHub—they should feature clean, clear, attractive charts, and they should be presented in a notebook format, with layman-friendly explanations interspersed with the code, covering the process, analysis, and conclusions.

And of course, your application materials and interview are also opportunities to demonstrate clear communication skills. Your resume should be well-written and results-oriented—this communicates that you understand what matters to your audience. In the interview itself, you’ll need to be ready to talk about your projects on both technical and non-technical levels. Be able to explain what you did, technically, and also be ready to explain why you made the decisions you did, what your results mean, etc. Being able to speak clearly to both technical and non-technical interviewers about your project work is one of the best ways to demonstrate your communication skills.

6. Critical Thinking and Problem Solving

In a data analyst role, you will typically be given problems to solve; as a data scientist, you’ll often be expected to find insights on your own. Curiosity, critical thinking, and problem solving are key.

It’s important to remember that in most jobs, the right questions are the ones that impact the company’s bottom line. Not every analysis you could do is going to be worth your time. To be able to tell the difference, you’ll need critical thinking skills and a solid understanding of your company, your competitors, and your industry.

How to Develop This Skill

This is a skill you’ll continue to develop on the job, but it’s definitely something you can practice and work on even before you get hired. Building data science projects specific to a company can be a great way to make yourself stand out when you apply for a job there, and they’re also a great way to practice this kind of thinking.

You can even practice this without taking the time to actually build the project, since what’s really important here are the questions and thought processes. Do a little research on a company and then ask yourself: What kinds of things might positively impact their bottom line? What kinds of data would you need to investigate them? What types of analyses would you perform? How would you make the case that this data and these analyses are important to the business?

How to Demonstrate It in Your Job Search

This can be demonstrated via the projects you choose to highlight on your resume. The more relevant these projects are to the real job, the more likely it is that your application will stand out.

“The thing that gets you the best outcomes tends to be aiming yourself as focused as you reasonably can be toward a particular domain,” Harris says. In fact, it’s often worth spending the time to pick out a few jobs that you’re very interested in and build projects specifically for them, working with data that addresses a real business question they’re facing. More than one employer in data science has told me about a candidate who shot to the top of their list by building a company-specific project and reaching out directly. It shows that you have the technical skills to build the project and that you’ve already started thinking about and engaging with business questions relevant to the company. Employers like that.

Keep in mind, however, that this time-consuming approach is high-risk, and only worth it if you have a human contact at the company who you’re fairly sure will actually look at your submission.

Once you reach the interview stage, demonstrating critical thinking and problem-solving skills will be even more important. Interviewers will test your technical skills, but they’ll also be probing to see if you’ve spent time thinking about their business and the things that matter to them. Most interviewers will give you an opportunity to ask them questions. That’s a great time to highlight that you’ve really considered their business problems by asking very pointed, company-specific questions—about their data, their business or market, etc.

Just be careful that you’re not asking a question with an answer that is, for example, easily available on the company’s website! There are no shortcuts here. You have to do your research beforehand and spend time thinking of incisive, meaningful questions.

7. Statistics

Statisticians sometimes joke that data science is just a hyped-up version of statistics, a profession that’s been around for many decades. There’s some real truth to that, too. Data scientists may be using coding languages and machine learning models that statisticians in the past could only dream of, but under the hood, it’s all statistics.

You don’t need a PhD in math to become a data scientist, but you do need a solid understanding of probability and statistics. This will help you identify what types of analysis are appropriate and assess your results to be sure they’re accurate and meaningful. In other words, statistics knowledge is the difference between knowing your conclusion is valid and merely hoping it is.

How to Develop This Skill

If you’re still in school, you can probably take a class. Most colleges and many high schools offer probability and statistics courses. If you’re not in school anymore, the usual options apply: There are many online courses (Khan Academy is particularly well-liked for teaching math topics) and a variety of other stats resources out there. Most data science–focused learning platforms also teach statistics, and these courses may help you learn specifically how to apply statistical methods in your programming language of choice.

How to Demonstrate It in Your Job Search

Statistics skills won’t be the flashiest thing on your resume. In fact, they don’t really need to be listed on your resume at all, although you could demonstrate your knowledge by mentioning which statistical methods you used when you write the bullet points that describe each of your projects.

As with all of these skills, this knowledge should be woven throughout your projects; this is a skill that’s demonstrated in the doing. But it can be a good idea to name the types of analyses you’re performing and, in the project notebook, explain what’s happening under the hood.

You may also be quizzed about why you chose the statistical methods you used in a particular project during a job interview, so it’s important to be able to explain what you did and why! For example, in your notebook (and in an interview), you could explain that in your population analysis project, you chose stratified sampling rather than simple random sampling or cluster sampling to reduce the risk of sampling bias and ensure that every subgroup is represented in the sample. Being able to back up your choices by talking about the math that underlies them demonstrates that you know what your code is doing and you know your data well enough to have chosen a statistically valid way to analyze it.
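Here is a minimal sketch of the stratified sampling idea, using only Python’s standard library. The population (90 urban and 10 rural records), the 10% sampling fraction, and the `stratified_sample` helper are all assumptions made up for illustration. The point is that sampling within each subgroup guarantees the small group shows up, whereas a simple random sample of the same size could miss it by chance.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Sample the same fraction from each subgroup (stratum) of the records."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)

    sample = []
    for members in groups.values():
        # Take at least one record per stratum so no subgroup is left out.
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical population: 90 urban records and 10 rural records.
population = (
    [{"id": i, "region": "urban"} for i in range(90)]
    + [{"id": i, "region": "rural"} for i in range(90, 100)]
)

stratified = stratified_sample(population, key=lambda r: r["region"], fraction=0.1)
counts = Counter(r["region"] for r in stratified)

print(counts)
```

With these numbers, the stratified sample always contains nine urban records and one rural record, which mirrors the population’s proportions.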

This may all sound like an awful lot, but don’t worry! You can take it step by step, and you really don’t need any experience to get started. I’ve seen many learners go from total beginners to full-time data scientists. You could be the next one. All it takes is the courage to try. Good luck!

By Charlie Custer - The Muse