21 things I learned about data science after two years in the field
Plus one common-sense tip at the end.
I've been in the data science game for two years now, and here's my current take on what this field is really like.
I'm planning to make this a yearly thing - you know, like an annual checkup but for my data science thoughts. Consider this post the kickoff of the series.
Okay, here we go. 🤞
1. I have no idea what data science is.
There are as many definitions as companies.
I talked to senior data scientists and read countless job descriptions to fathom the role.
The closest definition I came up with?
A data scientist is most probably someone who builds machine learning models and knows what a p-value is.
2. Data science is not science.
Some say it is. Some say it is not.
I’m leaning towards the latter. Knowing about the p-value and hypothesis testing, or the difference between mean, mode and median, doesn’t make one a scientist.
The job title sounds cool, though. 😎
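If the mean, mode and median difference has gone fuzzy, here is a tiny refresher using Python's built-in `statistics` module. The salary numbers are made up for illustration; the only point is how one outlier moves the mean but not the other two.

```python
from statistics import mean, median, mode

# Made-up monthly salaries (in thousands), with one outlier at the end.
salaries = [30, 32, 32, 35, 38, 40, 150]

average = mean(salaries)      # arithmetic average, pulled up by the outlier
middle = median(salaries)     # middle value of the sorted list
most_common = mode(salaries)  # value that occurs most often

print(f"mean={average}, median={middle}, mode={most_common}")
```

The mean lands well above what most people in the list earn, while the median and mode stay close to the typical values.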
3. It’s not sexy.
You know the article I’m referring to.
Now let me show you sexy:
Try to come up with helpful insights meeting the tight deadline,
when a dashboard that has worked perfectly so far breaks, so you have to fix that immediately,
and while you’re fixing it, do a “quick”, but “urgent” analysis to support a decision,
for which you have to do a lot of data cleaning,
and by lot, I mean a LOT,
and what about those insights you promised to deliver by the deadline?
Jesus.
Okay, it’s not always like that.
But still, Jesus.
4. You don’t have to be a Math genius.
Okay. If you want to be a researcher or come up with a new, top-notch model, you’ll need Math.
But most of us are not like that.
High school Math will serve you just fine.
5. Do you like machine learning? You won’t be doing that a lot.
You won't spend 100% of your time on ML.
Not 50%.
Not 20%, either.
Learn to love dashboards – you’ll probably build a lot of them.
And they will probably be more useful to the business than the ML models you sooo want to build. 😅
6. Don’t learn every ML model.
Linear regression, logistic regression, random forest, K-means and XGBoost will get you far.
The rest you can learn on the go.
7. Data prep and cleaning really is 80% of the work.
Yep.
8. Tools don’t matter, results do.
Managers don’t care what library or tool you use.
They want your insights.
Example: I don’t like matplotlib and tend to visualize in Google Sheets.
Surprise, surprise, my bar charts show the same numbers.
9. No fancy visualization is needed.
Line charts, bar charts, scatterplots and histograms will cover 95% of your visualization needs.
So don’t stress about it.
10. Python or R? Python.
You probably won’t need R, ever.
11. Data scientists are not great coders.
Your code runs without an error locally?
Nice job, champ! 👍
Now put it into production. Or show it to a software engineer.
It’ll be a humbling experience.
12. Don’t get too fixated on Jupyter notebooks.
Production-grade is the name of the game. Jupyter notebooks won’t get you there.
13. Quick solutions are dirty.
You can make quick solutions. They will probably backfire once you need to run that code regularly or put it into production.
Keep that in mind and make writing good code a habit. ✅
14. Domain knowledge > coding.
Domain knowledge takes your analyses to the next level. I truly believe it’s harder to master than coding.
15. Coding is necessary, but communication skills get you promoted.
The more senior you get, the less important coding will get.
Brush up on them soft skills and learn to speak and present to people!
16. It involves a lot more communication than you’d expect.
Meetings, daily stand-ups, presentations, discussions with stakeholders, summaries.
Oh boy! 🫥
17. With great data comes great responsibility.
If you work in a truly data-driven company, your insights and analyses will have consequences, because decisions will be made based on them.
Act accordingly, and take your job seriously.
Also, it’s a rewarding feeling.
18. Impostor syndrome hits you hard.
There’s an infinite amount of stuff that you could learn.
Unfortunately, you can’t learn it all.
Prioritize and learn stuff that you need at the moment.
Don’t stress about how much you don’t know. And prepare to forget even basic things like join!
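And if join is the thing you've just forgotten, here is a quick refresher, assuming the SQL kind of join is what's meant. It's a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their rows are made up for illustration.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Anna'), (2, 'Ben'), (3, 'Cleo');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0);
""")

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name, o.amount
""").fetchall()

# LEFT JOIN: every customer; the amount is NULL (None) when there is no order.
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name, o.amount
""").fetchall()

conn.close()

print(inner_rows)
print(left_rows)
```

Cleo has no orders, so she drops out of the inner join but shows up in the left join with `None` as the amount.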
19. Doing > consuming.
If you want to improve, do stuff and stop watching the 1000th YouTube video about {{insert topic that you don’t really have to know right now}}.
Future you will thank you. 👌
20. AI engineer is the new sexy.
But I guess you already know this.
21. I love it.
I can’t imagine doing anything else besides data as a profession.
+1. Family is number one.
Don’t be a workaholic.
Spend as much time with your family as you can.
Hug them & love them. Be there for them.
That’s what matters. ❤️
That was spot on, Tamás!
Even after 6 years of doing this, I strongly agree with the vast majority of points you made. You certainly won't be doing much ML (if any) and results matter much more than the tools you use.
Thanks for sharing an honest take about what you do as a data scientist! I’m currently learning Python and I was glad to read #10😅