Data science: Managing time vs managing energy

Sometimes, I know I’m going to have to work on a relatively difficult technical problem.  The kind of task where a few minutes of inspired work could easily be more productive than 8+ hours of mindlessly grinding at the same problem, getting frustrated that I’m not getting anywhere.

Not all of my tasks are like that, but for the tasks that I know will stretch my abilities to their max, my success or failure has very little to do with time spent, and much more to do with energy available.

Over the course of a few years, performance on these types of deep work tasks is also what separates the upper-tier from the replacement-level data scientists.

On the other hand, for the more mundane tasks, the time management perspective is much more relevant.  If you have five half-hour meetings tomorrow, you’re going to need 2.5 free hours to attend those meetings.  If you have to read some requirements or sanity check someone else’s code, that’s more time dependent and it doesn’t really matter if you’re feeling especially inspired.

For a data scientist working in a realistic environment, how do you find the right balance?

This is going to be a multi-part article series on time management vs energy management; this first part covers what I think is the first step: getting on the same page with your manager.

 

Clearly communicate with your manager (and their manager if necessary)

Especially for data science managers who have never actually been a data scientist themselves, it’s likely that they will not be thinking about your calendar and workload from an energy management perspective.

They probably don’t see much of a problem with multiple intermittent meetings showing up on your Tuesday calendar.  They might think it’s ‘not ideal,’ but they also probably won’t appreciate how much of a productivity killer this is for a data scientist trying to do deep work.

Put another way, some managers don’t think interruptions are especially disruptive, and assume you can get right back into a near-optimal productive mindset within three minutes.  For a manager who is multi-tasking all day, that is their reality.  Unfortunately, data scientists have a very different type of challenge to work on, and not all managers appreciate this yet.

It’s not just about having a block of time – if my energy has already been drained by a burdensome administrative or bureaucratic meeting, I’m going to have a hard time being highly productive on the deep work task that immediately follows.  That requires switching between two very different mindsets, and the switch is neither quick nor efficient.

I’ve discussed this with my data science industry group, and we generally agree that a good (if somewhat uncomfortable) approach is to have a very frank, upfront conversation with your manager about this.

 

High urgency vs high priority items

Conversely, there are maintenance and administrative items we all need to take care of as part of the job, and some of these require meetings.  And sometimes, the meetings are urgent.

Realistically, from a manager’s perspective, of course they want you to be productive and spend most of your energy working on the team’s strategic goals.  However, unless you clearly communicate with your manager, they might not be aware that you’re getting overwhelmed with high-urgency items, and have little energy left for the actual high-priority items.

Depending on your manager and the team’s current workload, they might be able to immediately help take the lower-priority productivity killers off your plate – or, if the team is too strapped for resources right now, at least recognize where you’re coming from and mention that they’ll try to assist in the future.

However, if your manager doesn’t at least theoretically recognize that you being interrupted all the time might not be the best data science environment – it might be time to look for a new job.

 

Wrapping up

In a future article, I’ll discuss some more thoughts about data scientist energy management (beyond just getting on the same page with your manager).

Below are some examples of related concepts that I’m currently thinking about, and will (hopefully) have some useful thoughts to share on later:

  1. Diplomatically saying no, and getting your manager to step in if necessary
  2. Delegating tasks and efficiently onboarding team members
  3. Shamelessly blocking off extended blocks of time on your calendar
  4. Recognizing early-stage burnout and taking aggressive action
  5. Forcing yourself to take a vacation that’s actually relaxing (and not a death march of fun)

What do you think?  Am I right, wrong, way off?  Let me know – feel free to email me or connect on LinkedIn.

 

 

The views expressed on this site are my own and do not represent the views of any current or former employer or client. 

Preparing for a data science interview in 2020

There’s a ton of variance in how data science interviews are designed – sometimes even within the same company.

Different groups from the same company could be looking for completely different things – and that variance could be due to more than just managerial-style differences.

On the one hand, one group’s biggest bottleneck could be more on the data engineering side, where just getting reliable and efficient access to relatively clean data is a substantial challenge at the moment – so pretty much any analysis of the data, even rudimentary analysis, is quite valuable.

On the other hand, you could have groups who are already world-class at how they handle the ingestion of and access to their specific dataset – and the data scientist is essentially a pure consumer of the data (with minimal data wrangling).  In that case, what passes for ‘state-of-the-art’ analytics is way different for this group – relative to other groups that are struggling to even get clean access to their data.

With that in mind, below are a couple thoughts I’ve had recently about preparing for a data science interview – regardless of the type of data science work the particular group is doing:

 

Be ready to explain why you’re interested in data science

I’m not (yet) talking about why you’re interested in the particular company – I’m talking about why you even wanted to be a data scientist at all.

I know my opinion might differ here from some others, but to me, this is one of the most critical aspects of getting hired as a data scientist – especially if it’s your first professional role.  So much of data science is being passionate, having that deep driving desire to keep poking around in the data, sometimes for no other reason than you’re curious and you just need to know.

If you have this intrinsic motivation, where you’re always looking to learn more and improve your knowledge of the world – I think that makes up for so many other potential weaknesses.

You can teach someone technical skills; heck, the market for educating data scientists is soon to be a multi-billion dollar industry.

However…teaching someone to care about finding new insights?  To be motivated to keep learning about the small details of the business domain that are probably 90% likely* to be ultimately irrelevant?  I think that’s much harder to teach.

 

Learn what you can about their business domain

Again, I probably differ here from others giving advice on data science interviews.  It’s not that I’m the only one recommending that it’s good to learn about the company’s business – I know other people recommend that too.

Where I differ is in how relatively important I think this point is.  There is such a huge variance between industries and datasets about what the specific, particular challenges are for the data science team – and these challenges themselves change all the time.

If you’re looking at a biotech startup using AI to help simulate protein folding, vs a company doing autonomous driving with computer vision, vs a company trying to model and prevent credit card fraud – these companies will have completely different perspectives on data science.

As in, when you ask one director from company X to define data science, you’d possibly get a completely different answer from a comparable director at company Y (and company Z).  It’s not that their definitions would necessarily conflict; it’s more that these people could be focusing on completely different aspects of data science.

This relates back a bit to the previous point; different companies are at different stages in even getting their data ready to be hardcore analyzed.  Whether it’s a governance or security issue (like with financial data) or just insanely huge data sets and computational bottlenecks (protein folding), the real-world pain points could be completely different.

If you go into an interview with a company and demonstrate at least somewhat of an understanding of their current struggles, you’re ahead of probably 80% of data science candidates.  Put another way: if you can have a somewhat intelligible discussion about an industry-specific, real-world struggle the group is currently having – that could be quite impressive.

It’s much more common to see a data scientist who essentially thinks their technical skillset is highly generalizable and immediately applicable across different industries.  While this is generally true…when you’re doing cutting edge stuff, that generalizability starts mattering a lot less.  The actual time-consuming struggles you run into require a more nuanced, customized approach.

If the company (or anyone associated with their data) has given a public talk or presentation, it would probably be worth your time to check it out.   You’ll get a feel pretty quickly for what their industry-specific struggles are, and then you’d be able to much more directly speak to how you could help.

 

What about technical skills?

I haven’t talked much about that yet.  For one, there’s a ton of resources out there already discussing the technical aspects of data science interviews.  Also, this article is getting pretty long, so I’ll probably write a different article about this later.

 

Wrapping up

What do you think?  Am I right, wrong, way off?  Let me know – feel free to email me or connect on LinkedIn.

 

*Note (from the comment above about how most of the little details you learn about an industry are probably irrelevant): Some might say, well if 90% of these details about the business domain are irrelevant, why learn about them at all?  Or, why not just focus on the 10% that matter?

Great point.  However, I would say that it’s nearly impossible to know beforehand which little details will turn out later to actually matter.  Especially when you’re hanging out at the cutting edge, and no one really knows what’s going on.  If you’re a student of the domain and just keep picking at it and learning more, over time you’ll probably become the go-to person who just seems to ‘know’ the right questions to ask.

 

 

The views expressed on this site are my own and do not represent the views of any current or former employer or client. 

Some data scientist personality traits

With all the hype currently surrounding the data science industry, we’re increasingly seeing people from a wider range of backgrounds thinking about whether they should themselves become a data scientist.

A challenge for many new potential entrants is that they haven’t really considered whether they’re comfortable with the mindset required to be at least a somewhat-successful data scientist.

Below are my thoughts on personality traits that you’ll generally see from high-performing data scientists.  It’s neither an exhaustive nor an authoritative list; just something that my little industry group and I generally agree upon:

 

Relentlessly curious

If you were to take a random sample of ten high-performing data scientists, you’d probably find that at least seven of them strike you as notably curious.

You kind of have to be curious to be a data scientist.  Nearly your entire job is to find insights in the data that probably no one else has ever found, and then make a credible and engaging presentation of why the insights actually matter.

If you don’t really care about fundamentally learning more from your data set, you’re probably not going to have an especially fulfilling data science career.

 

Humility + never stops learning

Like probably many data scientists, I have a lot of respect for the late physicist Richard Feynman.

There’s a lot to like about him, but something I deeply appreciated was his innate humility and philosophy on learning.  Specifically, always acknowledging that, pretty much no matter what you know, there’s almost certainly a ton more that you don’t know:

The first principle is that you must not fool yourself and you are the easiest person to fool.

-Richard Feynman

When you think you have all the answers, you’ll stop looking for new knowledge; you’ll stop caring about learning.  And looking for new knowledge…is kind of the whole point of being a data scientist.

 

Openness to new ideas 

I’m a pretty big believer that the data science industry is going through some rapid, fundamental changes.

Even if you were superhuman and were somehow able to know everything there is to know about data science right now…a good part of your knowledge will probably become functionally obsolete within three years.  Or at least require major modification.

Something I struggle with is getting too comfortable with the current technology and tools I use on a daily basis – and unwittingly becoming that old curmudgeon who puts up a lot of resistance when new tools are (justifiably) suggested.  It’s something I’m working on; I’m currently trying to get myself to transition from R to Python.

However, I think an even bigger issue is when we sometimes start thinking that we ourselves have become ‘the’ expert within a certain subfield within data science.  That is the day we stop learning, stop listening to new ideas, and stop searching for new knowledge.  That’s also how you alienate an entire generation of younger data scientists.

To the person who thinks they’re king, new ideas are nothing more than a challenge to their ‘authority’ – and not even worth really considering.  More of a nuisance to wave away.

This isn’t a purely theoretical concern; it’s an increasingly discussed problem within the scientific and research community.  People start thinking they have all the answers, and not only stunt their own personal growth, but actively stunt the growth of others.

 

Not afraid of asking ‘dumb’ questions

The more you’re hanging out on the cutting edge, the less likely any question you ask will actually be ‘dumb.’

Put another way, if you’re not asking a ton of questions, you’re probably either (a) not trying or (b) not really hanging out at the cutting edge.

I was talking to someone from my little data science industry group about this, and one thing we agreed on is how you’ll almost never see an experienced, high-performing data scientist berate someone for asking a ‘dumb’ question.

The philosophical message you send when you start being the dumb-question police: we hate learning, everyone here is perfect, reputation is way more important than learning, get back in line, we embrace mediocrity.

If you show me a data scientist who has stopped asking questions, I’ll show you a data scientist who is about to exit the industry.

 

Wrapping up

There’s a ton of ways to become successful in data science, and I don’t at all claim I know the path to get there.  Quite the opposite – I know very little, but at least am actively trying to improve.

However, if you’re looking to get into data science, and none of the above personality traits describe you…then you might want to consider asking around a bit more before committing to becoming a data scientist.

 

 

The views expressed on this site are my own and do not represent the views of any current or former employer or client. 

Getting your first job in data science

If you’ve never worked as a data scientist before, it’s relatively hard to get that first data science job.  Probably way harder than it should be; more on that below.

Put another way, it’s way easier to get data science job offers once you’re already working as a data scientist.  The classic chicken-and-egg problem.

Some personal background: prior to 2014, I had done some independent data science work – but nothing too formalized.  I also had no advanced degree (just a bachelor’s), so it was going to be relatively harder for me to get that first ‘official’ data science job.

At the end of 2014, I was hired as a senior data science consultant within the fintech industry.  I’m now an employee and team lead.

Some thoughts on getting that first data science job:

 

Contractor vs. employee

It’s way easier to get initially hired as a contractor vs. as an employee – especially at larger companies.  Of course, the flip side is that you can get fired way more easily as a contractor, and you essentially have none of the protections that employees generally have.

I think there’s a lot of misinformation out there about working as a contractor – which is a broader topic for probably a later article.

However, to summarize: I think there are some potential advantages to initially working as a data science contractor (vs employee).  And, especially if you’re just looking to get hired for that first data science job – there’s way less friction involved in getting hired as a contractor.

For example, depending on the company, getting hired as an employee could involve: a formal resume submission, prescreen conversation, screening conversation, on-site interview, HR ‘fit’ conversation, aptitude test, coding test, full background check.

Conversely, for getting hired as a contractor, it could be as little as two phone calls – and that’s it, you’re hired.

 

Having business domain experience is a major plus

For me, I think this was the main factor in getting initially hired into my first ‘official’ data scientist role.

I had some relatively deep domain knowledge from my previous work in industry as a quant/market maker for stocks, bonds, options, ETFs, and futures.  At the time, this market knowledge was relatively useful for the role I was being recruited for.

 

…or at least having a strong interest in learning about the business domain

However, even if I hadn’t had that domain experience, I still don’t think it would have been impossible to get hired for that role eventually.  I would probably have had to take at least three or so dedicated months to prepare, though.

Now that I’m somewhat in a position to hire people, what I look for is less about “how experienced are you in this business domain,” and more about “how interested and motivated are you to learn more about this business domain.”

We’re all learning (myself certainly included), and I’ve found it a bit difficult to find people who are genuinely interested in actively improving.

 

Technical vs non-technical skills

Somewhat generally, I think it can be easier to acquire new technical data science skills than to acquire some of the most critical non-technical data science skills.

Of course there are exceptions, and my sample set is biased because I generally only get to talk to people who already have some decent baseline of technical data science acumen.

However, I was discussing this concept with someone from my little data science industry group, and we generally agree.  In other words, we think it’s harder to teach someone to be truly passionate about learning and improving, vs maybe teaching them the latest Scala optimization within Databricks.

For how fast the field of data science is changing, I think being actively adaptable is much more important than being an expert in any one technical skill.

 

Master’s degree programs, online courses, etc.

In my opinion, I think there’s a bit of a bubble in data science education.  Everyone and their dog seems to be running their own data science bootcamp, online course, certificate program, whatever – and I can’t stop seeing ads for data science master’s degrees from all sorts of universities (of varying repute).

Not that these programs are bad – it’s just, there are way too many of them (the subpar ones), and hence way too many fresh data science graduates with skillsets that kind of just blend together.

At least for me, I’m more interested in someone who has produced something (even if it was just a personal project) vs someone who has all the polished ‘credentials’ – but can’t explain why they’re interested in doing data science in the first place.

It’s easy to get a credential – especially if you’re paying someone for it.  It’s hard to actually produce something, and have something to show for it.

 

Conclusion

Of course, with these thoughts I don’t speak for anyone except myself…but I’m guessing more data scientists than not would agree with me here.  Mostly.

It’s hard getting that first data science job – but if you’re willing to think more independently about the whole hiring process…it might go way faster (and easier) than you might think.

 

The views expressed on this site are my own and do not represent the views of any current or former employer or client. 

Some traits of the best data science communicators

Something I’ve been thinking about lately is how many data scientists seem to be underemphasizing the importance of non-technical skills.

Specifically, if you can’t effectively convey the value of your work to a non-technical audience…it’s going to be hard to be successful.  Your client or user needs to understand why your analysis is valuable, and oftentimes the work might not clearly speak for itself.

Below are some thoughts about a few common traits I’ve seen from good data science communicators.

 

Begin discussions and presentations by clearly stating the context

Although pretty much every data scientist would agree with the above statement, many seem to have difficulty with consistently putting it into practice.

I can certainly understand why: in the time leading up to a discussion, you were probably deep in the weeds on a highly technical concept within the analysis.  Big picture thinking wasn’t exactly a priority at that point – the details were.

However, one of the quickest ways to lose your audience is to start by immediately throwing technical concepts at them without first providing context – especially if they are non-technical.

For example, a data scientist might just immediately jump into the deep end: “Here’s a correlation for variable X vs variable Y, within context Z and assumptions W and V.”

This sounds great to someone who was just previously immersed in this problem space – but there’s a good chance your audience was not, and they’ll have no idea what you’re talking about or why they should care.

Taking thirty seconds to clearly establish why this discussion is happening is a relatively easy step, but in practice many data scientists don’t do it.

 

Openly address limitations and key assumptions

The more complex your analysis is, the more likely it is you had to make multiple critical assumptions.  Depending on your client’s preferences, some of these assumptions have probably been made without explicit approval from the client or user.

This itself isn’t a problem, as in practice it would be way too much overhead to immediately check every ongoing assumption with the client, and you’d probably start annoying them.

However, when you are presenting your analysis (even if it’s just an iteration and not the final product), you now need to clearly state what these assumptions and limitations are.

At this point of the discussion (or presentation), this is where things can get a bit tense.  If you’re presenting highly complex analysis, it’s likely that you’ve made at least one key assumption that the client either disagrees with or wasn’t aware of.

And even if you don’t think the assumption is that impactful (“it’s just an edge case”), there’s a good chance that the client, who probably knows way more about the business domain than you, would feel otherwise.  And maybe strongly.

If you’ve previously worked towards building up the relationship with your client, this is where it could start to really pay off.  If you have a good relationship with your client, this is a smooth conversation where limitations are openly discussed, as among respected peers.  If not – this could be a tough conversation.

 

Actively encourage questions and discussion

This is a point that newer data scientists might not appreciate as much: you want your client to be actively engaged.

Put another way, if you throw some sweet analysis at them, and they don’t have much to say…90%+ of the time, that’s a really bad sign for you.

You want them to be asking questions, asking why certain decisions were made, commenting on a somewhat obscure aspect of one of your graphs, telling you that they have a slightly different interpretation.  If they’re not really saying anything, it’s probably because they’re about to essentially throw your analysis into the trash.

If the discussion or presentation is in front of fewer than ten people, and you’re the only one talking for more than five minutes straight…something’s probably wrong.

It’s way too easy to lose your audience – and without little checkpoints to make sure they’re still engaged, you run a strong risk of them not understanding your analysis at all.  And it can be hard for data scientists to remember that it’s their responsibility to keep the audience engaged.

 

Wrapping up

It’s easy for a data scientist (especially me) to just kind of passively assume their analytics work will stand on its own – and that the perceived value won’t be very dependent on the quality of how it’s discussed or presented.

In my opinion, you have to keep actively reminding yourself of how that probably isn’t true, and continue to make the effort to build skill in better selling your work.

 

The views expressed on this site are my own and do not represent the views of any current or former employer or client. 

Data science education: not enough emphasis on non-technical skills

I’m starting to get the feeling that most data science education programs are underemphasizing the role of non-technical skills – and sometimes by a significant margin.

I was talking to my little industry group about this – and we didn’t quite agree on what I say below.  However, we did generally agree that, especially for newer data scientists, there’s currently a lot of room for improvement in the land of data science education.

 

It doesn’t matter how great your technical skills are if you don’t know how to use them

It seems there’s maybe a growing group of people who (on paper) are highly technical, with all the right words on their resume, and maybe even some interesting early-stage data science experience.  However, when it comes to even having the mindset for actually producing something that a client or user might value – there might be a steep drop-off.

It’s not that they don’t know the latest ML algorithms, or they’re not proficient in R/Python – rather, they just essentially freeze when faced with even the prospect of a professional real-world problem.

Of course, they might be quite effective at doing the preliminary data exploration, maybe some clustering or regression to get a feel for what’s going on amongst some ‘key’ variables.  But when it comes to producing something that’s remotely beneficial to a real-world client (or user)…there could be a bit of a disconnect.

 

Obsessed with the data, but blind to the client

There seems to be a (maybe growing) disconnect between what data science boot camps, online courses, and master’s programs are emphasizing – vs what actually matters when making a client happy.

Again, this isn’t a technical skill problem.  It’s more, I’m sensing that these programs are slowly forgetting the real-world aspect that none of your world-class technical skill matters if you immediately alienate your client or user.

Especially with the growing trend of automating away the more technical aspects of a data scientist’s job, it’s becoming much more important to be able to instill trust that you actually understand what the client wants, and are interested in requesting their ongoing feedback.

 

You need to convince the client you’re interested in their business domain

The client does not care what fancy algorithms you’re using.  They do not care if you produce some fancy graphs.  They care if you help them find relevant, credible insights in their data that they were previously unaware of.

I’m pretty sure most data scientists wouldn’t disagree with that sentiment – but I think many of us are losing sight of this critical piece.

In your initial meetings with your client or user, if they get the impression that you (a) don’t really understand their business domain, and (b) you’re not apparently that interested in learning…it’s highly likely that you will lose that client – and maybe quickly.

In other words, if they get the impression that your specific methodology for finding insights is invariant to whatever domain (e.g. fintech, healthcare, biotech, manufacturing, social media) you’re looking at, you could be in trouble.

 

It’s easy to find ‘anomalies.’  It’s hard to find valuable anomalies.

If you’re not familiar with the domain, and you have no persistent desire to learn, you’re simply not going to be able to know whether the ‘anomaly’ or correlation you found is at all useful – or rather has been plainly known to the industry for the past five years.
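
To underline how mechanically easy the ‘finding’ part is, here’s a minimal, purely illustrative sketch in Python – the data, the threshold, and the daily_volume metric are all made up – that flags statistical outliers with a simple z-score.  Nothing in it can tell you whether the flagged points are genuinely valuable, or old news to anyone in the industry:

```python
# Purely illustrative: mechanically flagging 'anomalies' in a made-up series
# with a simple z-score. Nothing here says whether the flagged points are
# valuable insights or facts the industry has known about for years.
import numpy as np

rng = np.random.default_rng(0)
daily_volume = rng.normal(loc=1_000, scale=50, size=365)   # hypothetical metric
daily_volume[[40, 200]] += 400                             # injected 'anomalies'

z_scores = (daily_volume - daily_volume.mean()) / daily_volume.std()
flagged_days = np.where(np.abs(z_scores) > 3)[0]

print(f"'Anomalous' days: {flagged_days}")  # finding them was the easy part
```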

It doesn’t matter how fancy your framework and statistical inference was to make this ‘discovery.’  If you present ‘findings’ like this to a client and feel that you’ve done something substantial and useful – you will probably lose that client.

 

Wrapping up

Especially amongst newer data scientists, I feel like there is an over-reliance on technical skill – to the extent that it doesn’t even really matter if you understand the data you’re looking at…let alone the business.

In educational settings, that’s maybe fine.  In the real world, that’s how you lose a client.

You don’t have to be an expert on the domain to get a client, or to generate business.  However, if you continually show a lack of at least trying to become more of an expert – you’ll probably struggle in your data science career.

 

The views expressed on this site are my own and do not represent the views of any current or former employer or client. 

Data science: talent vs. skill

Probably the best way to ensure you have a mediocre data science career is to believe that your personal abilities are largely static, and high performers were more or less just born that way.

In other words, thinking it’s all about just kind of having talent – vs the acquisition and development of skill.

 

The greatest excuse in the world: “I’m not talented enough”

I love this excuse – it’s one I often struggle with myself.  It’s just so natural, so comforting.

It means, there’s no use in trying, no use in exerting yourself.  No use in trying to build skill, because…well, skill doesn’t matter.  No use in grinding to get better at something, because it’s pretty much all about TALENT.

Someone better at programming than you?  They’re more talented.  Someone makes better graphs and visualizations than you?  They’re more talented.  Someone is a better communicator of technical concepts to non-technical audiences?  Obviously more talented.

It’s so easy to fall into this mode of thinking – as I often do.  The problem is, it can quickly become a depressing one-way street to effectively deciding to just never get better.

When you tell yourself you’re not talented enough…it feels quite liberating.  It just completely removes struggle from the equation – the struggle that will pretty much always accompany any efforts to build skill.

 

It can be really, really hard to persevere through the early stages of building a new skill

For example: I’m trying to become a better writer and communicator of complex ideas, so I try to consistently write articles here about the data science industry.  I know that my initial articles will suck.  And I hate sucking.

I’m finding out that I’m all-pro at finding excuses to quit, with the most persistent one being essentially “you’re not good at this, the effort isn’t worth it, so you should stop.”  Again, a very alluring argument.

Except, it’s completely false.  Not the part about how I’m currently bad; that is pretty accurate.  More so, the part about how I can’t get better.

I’ll spare you the long philosophical and mental arguments of why this is, but here are some articles to check out if you’re interested in why I strongly believe that skill development is way undervalued.

 

Data science skill example: making better graphs

By “better” graphs, I roughly mean: compelling to a non-technical audience, accurate and compelling to a technical audience, not overwhelming, non-inducement of eye-glaze.

I think data science graphs are an interesting example to talk about, because they kind of fit into the more subjective “art” category – where of course, the more subjective something is, the more it’s about talent and the less you can actually build skill and make personal improvements (…right??).

But then, when you start to break it down, you start to think of some potential areas where you could maybe make some incremental improvements:

  1. Paying more attention to whether the color scheme roughly looks good or not
  2. Getting more comfortable asking for iterative client/user feedback – or from some other accessible domain expert
  3. Thinking about whether the graph is overwhelming at first glance – and brainstorming a couple potential reasons why
  4. Asking if the various graphical elements make sense on one graph, or if it should be split into multiple graphs
  5. Thinking about what steps could be taken to de-clutter the graph
  6. Focusing more on thinking like the client – relentlessly asking yourself, what exactly do they care about?

None of these considerations is a silver bullet – and depending on the graph type, some might not even be relevant.  However, when you start listing out some potential ‘little’ things that could maybe make your graph less bad – that is the process of you building skill.  Now talent starts mattering way less.
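
To make a couple of the items above a bit more concrete, here’s a minimal matplotlib sketch – purely illustrative, with made-up data and hypothetical labels, and certainly not ‘the’ right way to style a graph.  Each line just maps to one of the small, learnable improvements listed above:

```python
# A minimal sketch (not a prescription) of a few of the 'little' graph
# improvements listed above. The data and labels are made up for illustration.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
months = np.arange(1, 13)
revenue = 100 + months * 3 + rng.normal(0, 5, size=12)   # hypothetical series

fig, ax = plt.subplots(figsize=(7, 4))

# One series, one color -- resist plotting everything on a single graph (items 4, 5)
ax.plot(months, revenue, color="tab:blue", linewidth=2)

# De-clutter: drop the top/right spines and soften the gridlines (item 5)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(axis="y", alpha=0.3)

# Lead with what the client actually cares about, in their language (item 6)
ax.set_title("Monthly revenue is trending up (illustrative data)")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue ($k)")

fig.tight_layout()
plt.show()
```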

When you break down a “talent”-based concept into the much more mundane “here’s how I can maybe make this suck a bit less” concept – that is how you build skill.  The alternative being: nah bro it’s just innate talent, there’s nothing worth me trying to improve here, move along.

 

Wrapping up

When you implicitly believe that you can’t really change, that’s a great way to strategically demotivate yourself.

Telling yourself that you’re just “not great” at learning or understanding advanced ML algorithms?  Now you’re guaranteeing you’ll never be good at that, because you’re telling yourself to never even try.

Alternatively, if you just keep picking at something, you will get better.

 

The views expressed on this site are my own and do not represent the views of any current or former employer or client. 

 

The role of passion in data science is extremely undervalued

As the title implies, I have a pretty strong opinion on this one.  And it’s not because companies aren’t placing enough emphasis on hiring data scientists with high passion (vs just great technical skills).

No, I’m more saying passion is undervalued because I’m hearing more about companies who are actively working to kill the passion of their data scientists – turning them into corporate drones.

It’s probably not intentional…or at least, I don’t think it is.  It’s more, companies aren’t realizing just how unique of a mindset the best data scientists must have to be world-class…and how fragile that mindset is to getting overwhelmed with bureaucracy and corporate politics.

It’s suffocating, and I don’t think many companies appreciate this.  The best data scientists look at the world completely differently – and without even realizing it, many companies are stomping that out.

 

The most valuable trait a data scientist can have is passion

I was talking to one of my industry buddies about this, and he’s currently going through a tough time.

Long story short, he’s having to philosophically choose between (a) maintaining the passionate, curious mindset that has made him a high performer, really caring, really digging into data to find true insights vs (b) becoming a mindless corporate drone, with the passion squeezed out due to corporate and bureaucratic pressures.

He really, really cares about the work he’s doing, but his job is just doing all it can to kill that inner flame.  And it’s not in one fell swoop – from what I can tell, he’s just being pulled in a million different directions and getting slowly suffocated by internal corporate politics and mindless, legacy bureaucracy (‘well we’ve always done it this way’).

What makes a data scientist valuable is not wearing the most polished suits, saying the fanciest buzzwords, having the firmest handshake, maintaining steadiest eye contact – it’s finding insights in that data that no one else has ever found.

To be world class at that, you need passion.  You need energy, you need the inherent motivation to keep digging – long after everyone else would have quit.  You need to appreciate how ‘good enough’ analysis will almost never result in true, valuable insights that actually matter.

You have to spend some of your spare time reading about the industry, lurking on message boards to find out whatever additional little nugget of information that may help lead to a breakthrough in your next study.  You need to commit to becoming an expert in your business domain, and never stop asking questions.

You need to have passion that transcends what you’re actually getting paid for – not because you’re looking to give away free work, but because you think this stuff is really cool.

That mindset, where you always want to keep improving, keep learning, becoming a world-class domain expert – it’s so valuable, and so rare.  I just don’t know if you can teach passion, and convince someone to care.

This mindset takes a ton of energy.  You never stop questioning yourself, never stop digging, always looking to deliver that world-class analysis.  And yet – some companies don’t seem to recognize this.

They don’t understand just how easy it is to kill passion.  How just a few extra meetings, just a couple extra budget spreadsheets, just a few meetings with other managers to fight over timelines, just a little internal audit, just a bit of non-critical bureaucracy, just a bit of corporate politics…how all of that adds up and will kill the passion of a data scientist.

And you know what – that’s fine.  The incentive in corporate America right now is not to be valuable, it’s to be a drone.  It’s not to be passionate and deliver value, it’s to stay in line and keep your head down.

As an employee, it’s very easy to lose when things go wrong – but employee home runs pretty much aren’t rewarded.  It sucks, but it’s the message that a lot of corporate America is sending right now.

I think companies are going to have to choose.  They can either treat data scientists as, well, scientists – where curiosity and passion are absolutely the currency of what makes them valuable…

…or, companies can kind of just keep going and assume that forcing data scientists to be drones is well of course the right move – my leadership book said so!  Company culture!  Process!

 

So what’s the solution?

…I don’t know.  I’m guessing my buddy is about to quit, even without having another job lined up.  It’s a bad situation for everyone involved.

Are some data scientists (including me) just too ‘sensitive?’  Sure, and I would readily admit that.  It’s something I’m working on.  However…where’s the line between ‘sensitivity’ and ‘passion?’  For me…there isn’t really one.  It’s kind of a package deal, if you truly care about what you’re working on.

If you’re somehow still reading, one potential recommendation:

Let’s say a data scientist comes to you and says they’re feeling a bit overwhelmed with any work-related items that (a) aren’t data science, and (b) are non-essential.  I would take that conversation very seriously.

In not so many words, they might be trying to tell you that they’re being turned into someone they don’t want to be.  That could very well be the last ‘open’ conversation you have with them, before they quit.

 

The views expressed on this site are my own and do not represent the views of any current or former employer or client. 

 

Using AI to automate political fact-checking

In an age where we can’t even agree on what facts are anymore, it can seem like a lost cause to even attempt political fact-checking.

And to be fair, right now things don’t look great.  However – and this is a topic that I have a history with – I think it’s a cause absolutely worth supporting.

Back in the day, I was a cofounder of a startup called fuseGap.  It was a political/educational social app with the goal to make it cool to be informed about politically-relevant facts.  So instead of who can bring the spiciest political opinions…it would be more about who could bring the best facts.

Given that I’m here writing this…yeah it didn’t work out.  We learned a lot, but ultimately it’s really, really hard to get people motivated to learn facts in a golden age of the political hot take.

Lesson learned: people are way more interested in providing and consuming opinions than in actually looking to see if those opinions have any basis in reality.  Of course, few people would admit that they feel this way personally…but I’m pretty sure it’s true.

Anyways, all that doesn’t mean that it’s not important to have the general population informed about political matters – if anything, it’s more important than ever.

Which brings me to a cool startup I’ve been looking into: Full Fact.  They’re a startup based in the UK, and they’re using AI to automate political fact checking.

 

Why does it matter?

Given that most news today is more about ‘engagement’ and basically getting the viewers riled up (as opposed to…reporting on reality) – it seems that fact-checking is unfortunately becoming a relic of the past.

This has been going on for a while now: flashback to the 2012 presidential campaign, when Mitt Romney’s campaign hit us with the “We’re not going to let our campaign be dictated by fact-checkers.”  Fun times.

Things haven’t gotten much better since then.  Depending on who you ask, some might say that the UK and USA are starting to get torn apart by rampant polarization.

When many media outlets are more interested in making you think your fellow citizens are the enemy, vs actually reporting the facts…I’m not sure that will end well.  But hey, at least the audience will be ‘engaged!’

So getting back to Full Fact.  While most politicians are out there throwing hot takes left and right, it’d be nice to have someone checking whether their spiciest statements have any basis in reality.

The problem is, fact-checking is hard – while also being not exactly the most fun activity, and not the most profitable either.

And with more and more people getting politically outraged and outspoken about all sorts of things, there are more and more impactful hot takes to sift through.  It’s becoming more of a scalability problem…

…which sounds like a great use case for artificial intelligence!

 

Full Fact’s automated fact-checking project: some traction and momentum

Full Fact has some decent momentum, along with some big-name players supporting them in a few different ways.

In 2016 they announced a partnership with Google’s Digital News Initiative, and their automated AI fact-checking approach has been featured on the BBC.  They’ve also been covered by TechCrunch, Wired, and The Guardian, among others.

More recently, in May 2019 they were co-winners of the Google AI Impact Challenge.

Again, fact-checking is hard.  It’s generally a thankless endeavor, and pretty much no one gets passionate about supporting the fact-checkers.  It’s only when the facts happen to support someone’s opinion that people will generally even acknowledge the effort.

 

Conclusion

I think it’s cool to see people using AI to attack very impactful, very immediate real-world problems.

I do have some longer-term questions about how well Full Fact’s approach can handle the complexities and subtleties of fact-checking, but so far they certainly are off to a good start.

Now if only we could get people to care more about facts instead of opinions…

 

The views expressed on this site are my own and do not represent the views of any current or former employer or client. 

Soft skills and emotional intelligence for data scientists

If you’re like me, you’d probably like to believe that the quality of your data science work will stand on its own, and speak for itself.

Whether people like you as a person…that’s maybe a secondary concern, at best.

I personally really want to believe this because (a) I’m heavily introverted, and (b) that would free me up to focus more on developing technical skills – skills that I can see being more concretely relevant to data science or machine learning.

However, I do think the role of emotional intelligence (or soft skills) is becoming increasingly important in the long term if you want to do well as a data scientist…regardless of whether you’re a manager or an individual contributor.

I really want to resist this…but I’m slowly coming around to accepting it.  Reluctantly.

 

It’s way easier to move past a ‘bad’ mistake if people like you

If you’re doing hardcore data science, there’s a good chance you’re hanging out at the cutting edge – whether you’re the first to find a precursor of a new contextual insight, or maybe you’re the first one to even look at these datasets in this combination.

In any case, you’re probably spending a lot of your time in uncharted territory.

With that in mind, you will eventually make a mistake in front of the end-user: maybe you find out an ‘insight’ was actually a correlation that’s already well-known to the business; maybe you spend a week digging into an ‘interesting anomaly’ that ended up being nothing more than a pretty basic data error; or maybe you present a graph that implies an obviously impossible situation.

This stuff will happen if you’re hanging out at the cutting edge.

Put another way: if you’re not making mistakes, then you’re probably not trying.

Some of your mistakes will be blatantly obvious to end-users – maybe more than they’re letting on.  When that happens, they’ll have a split-second decision to make – is this person incompetent…or is this a really difficult problem and growing pains were always expected?

You might be shocked at how starkly different an end-user’s reaction is to a data science error – with the variance based largely on whether they essentially liked the data scientist or not.

Of course, being likable doesn’t mean you can deliver garbage and expect the user to be happy.  I’m more saying that gray areas are way more common than it might seem, and there’s often a thin line between analysis that comes across as solid and analysis that comes across as substandard.

 

People will give you more benefit of the doubt

Similar to above, chances are you’re never quite getting crystal clear direction from your client or user about what exactly they want to accomplish.

They probably have an idea (of varying vagueness) of what they’re looking for, or what rough hypotheses they’d like you to check – but unless you’re a very junior data scientist (so pretty much a data analyst at that point), you are going to be receiving ambiguous direction.

With ambiguous direction comes a lot of responsibility – and sometimes ambiguous results.

More specifically, it’s more likely that you’ll decide upon an approach that the end-user probably wouldn’t have been 100% onboard with…but again, they have a ton of wiggle room here for how they react and value your work.

If you have previously invested in building trust and establishing clear communication with them, in addition to showing some occasional uncertainty, they’re way more likely to trust you when you make an informal recommendation of what approach to take.

If they don’t like you, for whatever reason, and then you make a mistake…there’s a good chance they’ll be getting a second opinion, and maybe permanently.

 

Iterative user feedback is becoming more important – and friction is a killer

A great way to avoid catastrophic mistakes with your end-users: catch the mistake while it’s tiny, and very little time (or reputation) has been implicitly staked on a mistaken belief.

If you’re operating from a mindset of “leave me alone, I’ll do this hardcore analysis and then get back to you,” you run a high risk of incurring large, unforgivable mistakes – that were entirely preventable.

Alternatively, if you make it a point to get frequent, iterative feedback from your end-users, it’s way less likely that small mistakes would ever get the chance to morph into big ones.

Another thing about getting useful iterative feedback: your users have to like you if you want quality feedback.

You’re asking them for multiple informal meetings (when they could be spending time on other things), and if the user knows you get defensive about every little perceived slight – they’re just not going to tell you when you’re maybe straying off the path.  But they’ll tell someone else, and you could be off the project.

 

Conclusion

It’s way easier being a data scientist if your users like you.  Now, how to actually get them to like you is a topic for another article (or series)…

 

The views expressed on this site are my own and do not represent the views of any current or former employer or client.