How to think like a data scientist

I’ve been doing data science for a few years now, and recently moved into a kind of unofficial ‘team lead’ role.  This means recruiting, interviewing, and all that fun stuff.

In the informal discussions I have with my data science industry buddies, this topic has been coming up more frequently (in not so many words): how exactly do data scientists think?

As more of us move into positions where we help interview data scientists, this is becoming a more relevant concern.

With that in mind, here are some notes about what the group was thinking:

 

The mental cost of context-switching is really, really high

The mindset of trying to find new, valuable insights in the data is completely different from the mindset needed for routine administrative tasks.

Another way to look at it: if you’re a manager and want to kill a data scientist’s productivity for a day, schedule administrative/planning/check-in meetings – and spread them throughout the day.

In talking to my little industry group, we’ve found it can be a struggle to effectively communicate this concept to managers.

The mindset of making sure your paperwork/bureaucracy is just right to satisfy the unofficial ‘auditors’ is completely different from the mindset of being curious and finding a new correlation that no one else has ever even thought of.

To the manager, it’s not ‘administrative burden’ or ‘bureaucracy’ – it’s just ‘stuff we have to get done,’ which is both true and completely understandable.  It’s just really mentally expensive to keep switching from one form of thinking to another, each with a completely different way of looking at the world.

 

It’s more about managing your energy than managing your time

This is a subtly tough one to fully appreciate, especially for newer data scientists.  Long story short: as a data scientist, you’ll be doing work that requires inspiration and creative thinking, so your productivity really doesn’t scale well with time.

In other words, you could have a really productive half-hour (kind of a mini-breakthrough), and what you accomplish in that half-hour could far exceed everything else you produce that day – or over an even longer stretch.

This is also a difficult concept for some managers to fully appreciate – especially if their background isn’t data science.  To some of them, data scientists are doing nothing more than what a standard data analyst would do: implementing tightly defined, precise business requirements.

In that case, time spent would indeed be much more highly correlated with productivity – but as pretty much anyone who has done data science (for more than a year) can attest, that’s not at all what a data scientist actually does.

 

There can be a big difference between having fancy analysis vs actually finding useful insights

This also ties into a concept I have strong opinions about: namely, that there’s a ton of excessive hype (and buzzwords) in the industry right now.

My little group sees this trickling down into some real-world internal team meetings, where some data scientists spend most of their time implementing the coolest new ML algo, the fanciest graphing library, or using the most hip buzzwords – and then are quite confused when their end-user or client essentially says their work is garbage.

The problem in these situations is that there’s more focus on the ‘flashy’ part of data science – trying to be impressive – vs actually sitting down and taking the time to understand what the user wants, the struggles they’re having, what their rough hypotheses might be, what they’re looking for, what the data actually means.

Some data scientists think, ‘Hey, I’ll just throw the coolest stuff I have at this problem’…and then wonder what happened when they get quietly taken off the project.

 

Long-term, it’s better to not get excited about early, premature ‘discoveries’

We’ve probably all run into this – digging through some data, running some analysis, stepping through the output, and then…wait, that’s odd.  And then: ok, this could be useful, it’s probably not a data issue – hmm, this could be huge.

The problem is…when you’re hanging out on the cutting edge, when you’re looking at data sets (or combinations of data sets) that not many other people have ever really looked at, it’s still highly likely that what you’re seeing is not a true insight.  And that’s expected; it’s something all of us would probably logically acknowledge.
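To make the point above a bit more concrete, here is a minimal sketch (my own illustration, not something from the group's discussion) of why early 'discoveries' so often evaporate.  It fills a table with pure random noise and then goes looking for the strongest correlation between any two columns.  The column count, row count, seed, and function names are all arbitrary assumptions chosen for the demo.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_columns=40, n_rows=30, seed=0):
    """Largest |correlation| between any two columns of pure noise.

    Every column is independent Gaussian noise, so any 'relationship'
    found here is a coincidence by construction.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0, 1) for _ in range(n_rows)]
        for _ in range(n_columns)
    ]
    # 40 columns means 780 pairs to compare: plenty of chances to get lucky.
    return max(abs(pearson(a, b)) for a, b in combinations(columns, 2))


if __name__ == "__main__":
    best = strongest_chance_correlation()
    print(f"strongest correlation among pure-noise columns: {best:.2f}")
```

With a few dozen rows and several hundred column pairs, the 'best' pair tends to look surprisingly strong even though nothing real is there.  The more combinations you look at, the more of these you should expect to trip over.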

However, if you’re like me and you get excited when you think you’ve found something cool, you run into a long-term energy problem: the emotional trip down is bigger in magnitude, and way less fun, than the trip up.

And if you’re looking for insights all the time, while getting excited and disappointed in rapid succession – that’s a tried-and-true recipe for burnout.

 

Note: the views expressed on this site are my own and do not represent the views of any current or former employer or client.
