Will automation eliminate or reduce the need for data scientists? Whether it comes from aspiring practitioners mulling an entry into the field or from employers hoping to reap the benefits of data science at a lower cost, this is a question that frequently rears its head at conferences (or webinars these days).
Observers typically point out that automation in areas such as data processing or data visualization will only make it increasingly easy for business experts to get what they need without human input. Indeed, Gartner previously predicted that 40 percent of data science tasks would be automated by 2020. Detractors hence reason that demand for data scientists can only go down.
Data scientists have nothing to fear, however. Here are three reasons why automation is unlikely to kill data science roles.
Automation is simply a way to do things faster
Alexander Gray, vice president of AI at IBM Research, described automation in AI as the mechanization of tedious activities. In an interview published on an IBM site last year, he called automation tools “a timesaving benefit that data scientists embrace because they seemingly enjoy thinking more than tedium”.
To Gray, automation is akin to giving data scientists smarter and more powerful tools to assist them in their jobs. Of course, much like how the proliferation of digital technology has changed what office workers do, more capable data science tools will inevitably change how data scientists work.
Not only will automation empower data scientists to do more, but the impact and value of their work to the business will also increase. Assuming data scientists keep up and do not entrench themselves in the past, this transition will only make them more valuable than ever.
Overcoming automated errors
Another reason why humans in the loop are not going away soon is the inability of automated tools to realize that they might be going off on a tangent. “While automation offers the potential to do things better and faster, it also has the potential to propagate human errors if there is poor science underneath them. It is much easier for this to occur than most people think,” said Gray.
Citing personal experience, Gray noted how even teams of PhDs from top schools make errors in statistical nuances that result in poorer data models. “The need for data scientists to have a strong understanding of the underlying principles will not go away, because human oversight will always be needed for the most important applications.”
In a nutshell, data scientists are required to verify the correctness of the results coming out of automated tools and to make sure that the models are operating optimally. Efficiency might seem unimportant for tasks that run infrequently, but as algorithms and AI get incorporated into every facet of our lives, any bloat will directly impact the bottom line.
The role of human judgment
This brings us to the next role that only humans can fill: understanding the business problem. Any data scientist will attest that the challenge is not always technical. Aside from deciding on the right algorithms to use, or writing a script to prepare a data source, the data scientist must also understand the business problem well enough to select the right data source and interpret the results correctly.
Data scientist Michael Li, founder of The Data Incubator, summed it up this way in a contributed piece he wrote recently: “Real-world data are notoriously dirty and many assumptions have to be made to bridge the gap between the data we have and the business or policy questions we are seeking to address. These assumptions [are] highly dependent on real-world knowledge and business context.”
This means that data scientists play an indispensable role when it comes to formulating various assumptions: from choosing the proxy variables they need and setting a realistic time frame for analyses, to defining appropriate control groups for accurate comparison. This requires human judgment, which is something no automated tool can offer.
When all is said and done, data science cannot be automated away. Highly trained and experienced data scientists will always be needed for their expertise in crafting data-handling code, picking the right data source, and designing the optimal algorithms to extract the insights the organization needs.
And as automation increases, expect productivity to go up. This can lower costs, bringing data science within reach of more organizations and pushing up demand in turn. Ironically, this means that automation will ultimately increase demand for data scientists instead of decreasing it.
Photo credit: iStockphoto/Nattakorn Maneerat