If you read through job postings for data scientists and data engineers, you will find overlap in the required knowledge, skills, and education. In fact, a company’s goals for the two positions can sound similar. At the end of the day, though, a data scientist is different from a data engineer. A data scientist cleans and analyzes data, answers questions, and provides metrics to solve business problems. A data engineer, on the other hand, develops, tests, and maintains data pipelines and architectures, which the data scientist uses for analysis. The data engineer does the legwork to help the data scientist provide accurate metrics.
“On some job platforms, you will see more data engineering jobs posted than data science jobs,” said Adam Tashman, Co-Director of the M.S. in Data Science (MSDS) Online program. “We recognize that our master’s students may have an interest in data engineering, which is why we are launching a new course this fall.” The three-credit Data Engineering class teaches students the essential environments and tools for data engineering. Topics include Linux, software development and testing, database design and construction, container creation and deployment, and data extract/transform/load (ETL). According to Tashman, “the course will be very applied.”
Whether a student is interested in data engineering or data science, the Data Engineering course is expected to be a popular elective. Students start by learning Linux, an operating system that many companies use at the enterprise level. “It’s important they are really solid with Linux and are taught best practices,” said Tashman. “Many will need to work with Linux in a future role, particularly for cloud computing.”
Another component of the class is teaching students how to test their own work. Tashman explains why this is important. “What often happens is you hand off your work to someone else to write the tests and that introduces problems. You know your work, but the second person may not know exactly how it works and something is lost in translation.” Tashman believes it’s better to know how to write your own tests before passing those tasks on to others.
Students in the MSDS Online program receive a foundation in databases, but the Data Engineering course will take it further and teach students how to build them. Student exercises mirror how a data engineer would interact with a customer. What do they need to ask to spec out a job, what is the end goal, and how much load does the database need to support? Tashman believes these applied exercises will help students build better databases and prepare them for future work.
“Big tech companies like Facebook will include interview questions that include a coding exam and expect you to write SQL queries,” said Tashman, who before joining the School of Data Science worked as a data scientist for startups and as an analytics consultant in finance and technology. “If you don’t have the practice, you’ll be totally blindsided and drop out early. But if you know and have practiced it, then you should do well.”