Depending on who you ask, the personality traits of top data scientists can apparently span a huge range. However, when looking at these opinions in aggregate, some commonalities start to emerge in terms of what really matters and separates great data scientists from average ones. From my perspective, here are the three key categories:
Curiosity
In my opinion, this is both the most important factor and the hardest one to learn. At their core, what data scientists want to do is take a lot of data and find the relationships that no one else has yet found.
This is largely where the term ‘data scientist’ originated – scientists fundamentally try to be on the active frontier of knowledge and find out things that haven’t yet been discovered. Without this innate desire and trait, it’s hard for someone to make a career as a scientist, let alone as a data scientist. When someone talks about a data scientist having ‘passion,’ this is what they’re talking about.
We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress. -Richard Feynman
One signal to roughly estimate someone’s innate curiosity is to discuss what they do in their free time. Although there are certainly many exceptions, it could generally be considered a promising sign when their hobbies involve something science-related (that is unrelated to their day-to-day job).
A subset that I believe falls within this category is statistical thinking. Essentially, knowing whether you’ve made a true discovery is intimately linked to understanding the significance of your finding – especially when you’re dealing with messy data, as most of us are.
The good news is that there are solid training options available for those wishing to concretely improve the quality of their statistical mindset.
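To make the idea of 'significance' a bit more concrete, here is a minimal sketch of a permutation test, one common way to ask whether a difference between two groups could plausibly be noise. The function name, parameters, and sample numbers are all hypothetical and chosen purely for illustration; this is not a recommendation of any particular method for your data.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper for illustration: it shuffles the pooled values
    many times and counts how often a random split produces a difference
    at least as large as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up numbers, purely for demonstration.
    control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
    variant = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]
    print(permutation_p_value(control, variant))
```

A small p-value suggests the observed gap would rarely arise from random relabelling alone; with messy real-world data, you would still want to check how the samples were collected before calling it a discovery.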
Communication
Unless a data scientist never has to convince anyone of the value of their work, communication is also a critical skill.
It’s especially tricky for data scientists to be good presenters, as they have to simultaneously be the expert in the technical minutiae of their project while also being able to quickly and efficiently explain what their findings actually mean to a potentially very non-technical audience.
This dichotomy of required skills (great technically, great communicator) is one of the driving factors behind why data science is one of the more highly paid professions. It’s relatively easy to find someone who is either great at digging into the data or great at presenting findings, but the difficulty of finding someone who is good at both remains a key factor in the huge, unmet demand for top data scientists.
Now that we have huge machines to absorb huge amounts of data, the value of big data is clearer. -Andrew Ng
Fortunately, unlike curiosity, technical communication is a skill that most people can improve. Results will vary, but there are at least concrete frameworks for becoming a better presenter.
Technical Acumen
I use this term broadly here, and I rank this trait below the first two for this reason: it’s the most straightforward to learn – at least to a passable degree.
Whether it’s training provided by the company behind a specific technology framework or a hands-on crash course from a colleague, learning how to be at least moderately effective in a new technology framework is generally one of the less-rare skills amongst data scientists.
Of course, what falls under ‘technical acumen’ and the levels of talent span huge ranges, and all else equal, the more technologically proficient someone is, the better. The most straightforward way to improve this trait is what we’d expect – write code every day, experiment, fail, fix, and repeat.
Be on the lookout for tips and shortcuts from the relevant community, and don’t be shy about at least trying their suggestions in your own work.