My First Six Months as a Data Scientist
MSDS ’21 alum Amanda West wrote a blog post in Towards Data Science about her first six months on the job, what she has learned, and her advice for current and future data scientists.
The technical and non-technical lessons I’ve learned
Data science is cool, and yet I don’t think anyone who was a kid in the 2000s or earlier dreamt of doing it when they grew up. For me, I first wanted to be a veterinarian, then a park ranger, a dog trainer, a writer, and finally an economist. I still kind of want to acquire those last two job roles like some kids dream of being a lawyer-doctor-astronaut (or lawyer-doctor-mermaid, but let’s be realistic, mermaids are too fabulous to study something as dry as the law).
Of course, goals change over time, and I obtained my (master’s) degree in data science from the University of Virginia in May. A few months after graduation, I landed my first full-time position as a data scientist. I was ecstatic to be able to put my skills to use and prove myself to my new peers. I’ve also felt in over my head about 426 times in the 135 days since I started.
This is a shortlist of the technical and non-technical lessons I’ve learned since jumping from the classroom to the field of data science. As a disclaimer, this is of course just my experience and everyone’s will be a little different.
1. You’ll Use <20% of the Tools Learned in School
I really enjoy programming in R. I even did my coding interview for said job in it when given the choice. Nonetheless, I use a mix of Python and the terminal (in VS Code) for 95% of my tasks at work, SQL about 5% of the time, and R exactly 0% since I started. As a result, the classes I took in R (about half the curriculum) have become exponentially less useful for me than those taken in Python. Many of the assignments too — such as web scraping, NLP, Apache Spark, or Tableau — I just haven’t employed at all. And that’s just par for the course, because it’s hard to guess what exactly you’ll do in the job you take.
If you are someone who is really committed to coding in specific language(s), I’d recommend asking your recruiters early on what the team uses. Even if you could “technically” code in the language of your choosing, if the team uses something else it’s going to make code reviews and integrations that much harder. For me, I actually do enjoy getting better at Python, but if it had been in something else like Scala I don’t think I would have liked it as much.
2. AWS (& Cloud Services) is King
AWS and I had not been much more acquainted than a firm handshake when I began, but now I use it on a daily basis. AWS is also notoriously bloated with a million and one offerings that make it hard to know what service you need for a given task. To make matters worse, searching for how to do something in AWS will often point you to 5 services that all sound basically the same.
To combat this, I’ve been casually studying for the AWS Cloud Practitioner certification using ExamPro. There’s just so much to learn, but as I go through it I find myself making connections and learning about services that might be handy to my job in the future. Eventually I’ll also take the exam, which will hopefully be a good resume-booster, too. I don’t think I’ll ever be an AWS wizard, but if I can become a relatively competent AWS paladin then I could live with that.
Services like Google Cloud and Azure are also popular, but if you don’t know which one you might use, I’d honestly still recommend AWS; TechJury.net found that AWS had a 76% share of new enterprise cloud adoption in 2020. Either way, most companies employ cloud services to some extent, and it’s good to know the basics of how they work and what they offer.
3. Understanding Hardware is Important to Effectively Troubleshoot Software
In school we were often given clean, toy examples in order to home in on a specific high-level problem. At companies with smaller data science teams, you’re often going to be the one muddling through the real, messy, ugly data yourself. If it’s big data, trying to mold it to run in your pipeline can feel like trying to fit a rhino into your mother’s old prom dress.
Understanding what your computer is doing behind the scenes will make all the difference. For data that’s having trouble processing, I use commands such as watch -d -n 0.5 nvidia-smi and htop to track things like GPU/CPU usage and memory, and df -h to monitor how much free disk space is left on each filesystem in case I run out of room. I also use tmux sessions in order to have multiple terminal windows open at once and to keep my work running if my connection drops while ssh’ed into a remote machine. Finally, when I find a solution that works, I’ll still look online for an alternative with better time complexity (Big O), which can save minutes to days of processing time when working with big datasets.
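To make the complexity point concrete, here is a minimal Python sketch. The task (finding duplicate IDs in a list) and the function names are illustrative assumptions, not taken from a real project. Both functions return the same answer, but the first compares every pair of items, so its work grows roughly with n², while the second counts each item once and so scales roughly with n.

```python
from collections import Counter


def find_duplicates_quadratic(values):
    """Hypothetical first attempt: compare every pair, O(n^2) time."""
    duplicates = set()
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                duplicates.add(values[i])
    return duplicates


def find_duplicates_linear(values):
    """Hypothetical rewrite: count each item once, O(n) time on average."""
    counts = Counter(values)
    return {value for value, count in counts.items() if count > 1}


# Made-up sample data, only for illustration.
record_ids = [3, 7, 3, 9, 7, 1]

# Both approaches agree on the result; only the running time differs.
print(find_duplicates_quadratic(record_ids) == find_duplicates_linear(record_ids))
```

On a handful of records the difference is invisible, but on millions of rows the pairwise version quickly becomes impractical, which is where swapping in a better algorithm pays off.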
These are just a few ways I combat data that tests the upper limits of my machine — I’d love to hear your own tips and tricks!
4. Googling Everything, All the Time
For a freshly minted data scientist, the learning curve is constant and relentless. The tasks you will be asked to do are going to feel completely out of left field a lot of the time, which means you’ll be scrambling to figure out a solution in the moment and solve bugs you didn’t even know existed. You’ll learn to navigate the inner complexities of the most random things on a daily basis, and while you won’t necessarily map A → C → Q ever again, you will start getting better at programming and the codebase will start making more sense with every iteration.
All in all, “data scientist”, “problem-solver”, and “professional Googler” are pretty much the same thing. For me at least, I feel like I’m learning at least as much at my first job as I did while I was in college, which came as a bit of a surprise (after all, I was paying someone to teach me things there).
Companies that use popular data science programming languages are great, because heavily-used languages also conveniently have the best Stack Overflow posts (which have saved my life on countless occasions).
5. Your Bad Habits Don’t Just Suddenly End When School Does
This isn’t data science-specific, but I put a lot of pressure on myself while in school and didn’t let loose as often as my peers did (although I lived on frat row, so that could also be why). I’d study for the entire week (or sometimes month — calculus is hard) before a midterm, barely leaving my room except to print out more practice tests or refill my coffee. On non-midterm weeks I’d push myself to study late into the mornings, then jostle myself awake, throw on sweatpants, sandals, and an old pullover and hustle over to the 8 or 9am I had that day. From the amount I didn’t get out, I’m surprised I didn’t scare everyone at graduation by looking like a gangly, pale vampire.
All the while I assumed that once I got my degree I would live like a normal person; for one, I’d wake up every morning after 8+ hours of sleep (lol). I’d journal, read, meditate, exercise, and probably eat acai-kombucha-avocado-bowls before the clock even struck 7:30. Also, work wouldn’t feel like work because I love coding and therefore everything in my life would be amazing all the time — The End.
Yeah, so….that didn’t happen.
While working does add some additional structure, I have some bad news; if you’re a workaholic in college, you’ll likely be a workaholic…..at work. If you work remotely (like I do) and don’t need to turn your camera on a lot, then sweatpants, raggedy sweatshirts, and slept-on hair are also fair game. So basically college….without the friends or darties blasting Mr. Brightside outside your window.
In essence, you might have a little more disposable income, but it can feel like you’re living in the twilight zone when you’re working until 1am for no other reason than “I think I’m close to fixing this bug!”, only to wake up a few minutes before a meeting the next day and repeat the cycle all over again. Be kind to yourself: burnout is bad and balance is key, something I’m still learning.
6. Live Well While Changing the World
I don’t agree with the naysayers who suggest data science will be dead in 10 years, because companies will always want information to increase profits (i.e. data) and people to analyze it quantitatively (i.e. data scientists, or engineers, or analysts; the name doesn’t really matter). In my job, I’ve already been able to directly impact big initiatives and make a positive difference with my data science, on projects that will live on for many years after I’ve left them.
I’m also extremely financially-minded, and wish to work towards a point where I could start my own ventures or retire early if I so chose. With data science, there isn’t as much of the typical trade-off between doing something impactful and being financially well-off; you can have both without compromise. Combined with the fact that programming is fun, data science really is the perfect trifecta.
Conclusion
Obviously I’m a bit biased, but I am happy to have become a data scientist and still believe that the bouts of inadequacy and self-doubt that I have are worth it for the bigger picture. For those in college, what are you most nervous about when it comes to becoming a professional? And for those who are working now, what was your biggest culture shock when you made the transition yourself?
As always, thanks for reading! If you’ll excuse me, I’m off to my part-time job consulting on an ML algorithm for some hermit crabs….