The Data Dilemma
Hello Data Lovers👋
In this article, we talk about Myths and Dark Secrets of Data Science 😱
Are you ready? Let's go! 🚀
Data scientist has been hailed as the sexiest job of the 21st century, and many people want to become one. Yet although data science is a buzzword, very few people know the field to its core or understand it in its true sense.
With so many people showing interest in data science, it is important to understand all of its sides. Its benefits, advantages, and capabilities are discussed in many forums and blogs, but it also has some dark sides that you should know and understand before taking your next step.
Dark Secrets of Data Science
Just as every coin has two sides, data science raises some eyebrow-raising questions; in other words, it has some not-so-good sides. Below are some of these dark sides, which are enough to temper the hype about its usefulness for a better world.
Many data science discoveries are obvious
When a bank looked for a way to predict loan defaults, it found that people with no savings were more likely to stop paying their debts. When hospitals looked for the causes of doctor error, they found that lack of sleep was a big indicator. Tall people hit their heads more often. Bicyclists die from head injuries more often than couch potatoes.
Many of the problems we study have obvious answers that dominate the analysis. If the goal is to look for causes, well, the results are going to produce a mathematical confirmation of what we already know but with more significant digits. Is that worth the effort?
Statisticians have techniques for controlling for these dominant effects so that smaller effects can be examined, but finding subtle causes can require dramatically more data and study.
Is the answer going to be valuable enough to justify this?
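To make "controlling for" a dominant effect concrete, here is a minimal sketch using simulated data and residualization (the Frisch-Waugh-Lovell idea behind multiple regression): remove the dominant factor from both the outcome and the subtle factor, then look at what is left. The variable names, coefficients, and data are hypothetical, chosen only for illustration.

```python
import random

def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (b, a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return b, my - b * mx

def residuals(x, y):
    """The part of y left over after removing its linear dependence on x."""
    b, a = ols_slope_intercept(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

rng = random.Random(42)
n = 5000
# Hypothetical setup: "dominant" drives the outcome strongly (think: no savings),
# "subtle" has a small real effect and is partly correlated with "dominant".
dominant = [rng.gauss(0, 1) for _ in range(n)]
subtle = [0.8 * d + rng.gauss(0, 0.6) for d in dominant]
outcome = [5.0 * d + 0.3 * s + rng.gauss(0, 1) for d, s in zip(dominant, subtle)]

# Naive: the subtle factor looks important because it borrows the dominant effect.
naive_slope, _ = ols_slope_intercept(subtle, outcome)

# Controlled: strip the dominant factor out of both variables first.
controlled_slope, _ = ols_slope_intercept(residuals(dominant, subtle),
                                          residuals(dominant, outcome))

print(f"naive slope:      {naive_slope:.2f}")
print(f"controlled slope: {controlled_slope:.2f}")
```

The naive estimate mostly reflects the dominant factor, while the controlled one lands near the small true effect of 0.3. The uncertainty of that controlled estimate shrinks only with the square root of the sample size, which is why subtle effects need so much more data to pin down.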
Sometimes there is nothing to find
Sometimes, despite all the hard work, data scientists find nothing: no meaningful insights or patterns, even after visualizing the data. Human minds are biologically wired to find patterns, even where none are present. In data science, many of the questions that pop up in data scientists' minds are meant to validate connections noticed by the human brain. Sometimes they find something, and sometimes they don't.
Although no result, or a negative result, is still a valuable outcome of any work, it is often unsatisfying for the people who did all the hard work. They often conclude that they must have missed something and remain skeptical of having come up empty-handed.
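One common way to check whether a pattern someone noticed is real, or just the brain finding structure in noise, is a permutation test: shuffle one variable to break any genuine link, and see how often chance alone produces a relationship as strong as the observed one. A minimal sketch, assuming the pattern of interest is a linear correlation; the data are simulated, and `x` and `y` are unrelated by construction.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value for |r| under the null hypothesis of no association."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement as one of the permutations.
    return (hits + 1) / (n_permutations + 1)

rng = random.Random(7)
x = [rng.random() for _ in range(40)]
y = [rng.random() for _ in range(40)]  # independent of x by construction
p = permutation_p_value(x, y)
print(f"p = {p:.3f}")
```

A large p-value here is a legitimate negative result: the "pattern" is no stronger than what shuffled data produce. A genuine relationship, by contrast, would almost never be matched by a shuffle.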
The Past & The Algorithms
All data science algorithms are built on data gathered in the past. This lets them imitate the past, but not necessarily predict the future. Several fields are changing at such a rapid pace that it is difficult for data scientists to predict what comes next.
As a result, the data can only summarize the past. In such cases, no data scientist can predict the exact future; they can only reveal what happened before, and it is up to us to judge whether it will happen again.
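A small simulation of this point: a model fit only on past data can look accurate on that past and still miss badly once the underlying relationship shifts. The linear model and the size of the shift are hypothetical, chosen only to make the effect visible.

```python
import random

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def mean_abs_error(slope, intercept, x, y):
    """Average absolute gap between the line's predictions and the actual values."""
    return sum(abs(b - (intercept + slope * a)) for a, b in zip(x, y)) / len(x)

rng = random.Random(1)
past_x = [rng.uniform(0, 10) for _ in range(200)]
past_y = [2.0 * v + rng.gauss(0, 1) for v in past_x]

# Hypothetical regime change: the same inputs now produce outcomes 10 units higher.
future_x = [rng.uniform(0, 10) for _ in range(200)]
future_y = [2.0 * v + 10.0 + rng.gauss(0, 1) for v in future_x]

slope, intercept = fit_line(past_x, past_y)
past_error = mean_abs_error(slope, intercept, past_x, past_y)
future_error = mean_abs_error(slope, intercept, future_x, future_y)
print(f"error on past data:   {past_error:.2f}")
print(f"error on future data: {future_error:.2f}")
```

Nothing in the past data hints at the shift, so the model has no way to anticipate it; only new data can reveal it.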
Expensive human filters
Using human intelligence to filter data and build training sets for artificial intelligence and machine learning algorithms is generally expensive. Humans can easily classify images, read documents, or listen to audio recordings and fill out forms by checking the right boxes in a consistent way.
Many people in numerous countries do this work to build artificial intelligence training sets. It is also an essential process: without this preliminary work, data science can't begin.
This can cost a good amount of money, but it is generally completed in a manageable amount of time.
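Whether human labelers really check the right boxes consistently is usually measured rather than assumed. One standard measure is Cohen's kappa, which compares how often two annotators agree with how often they would agree by chance alone. A minimal sketch; the example labels are made up.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (1 = perfect, 0 = chance level)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Probability that both annotators pick the same category independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:  # both annotators used one identical category throughout
        return 1.0
    return (observed - expected) / (1 - expected)

annotator_1 = ["cat", "cat", "dog", "dog"]
annotator_2 = ["cat", "dog", "dog", "dog"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

The two annotators agree on 75% of items, but because half of that agreement would be expected by chance, kappa is only 0.5. Low kappa is a signal that labeling instructions need tightening before the labels are used for training.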
Hidden biases
In the world of data science, hidden biases are present everywhere. Finding and eliminating them is difficult and is often considered an art rather than a routine task.
If it were easy, anyone would already have found and removed them.
Some statistical techniques can help find and remove such biases, but none of them can promise a hundred percent success, and they are not as automatic as we would like.
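As one example of such a statistical check, a common first step is to compare outcome rates across groups. The sketch below computes each group's positive-outcome rate and the ratio of the lowest rate to the highest; a ratio below 0.8 is the "four-fifths" rule of thumb from US employment-selection guidelines. It can flag a disparity, but it cannot say whether the disparity is unjustified, which is part of why bias detection is not automatic. The group names and records are hypothetical.

```python
from collections import defaultdict

def selection_rates(records):
    """records: iterable of (group, approved) pairs -> {group: approval rate}."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Lowest group rate divided by the highest group rate."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any positive outcomes")
    return min(rates.values()) / highest

# Hypothetical loan decisions for two groups of 100 applicants each.
records = ([("A", True)] * 60 + [("A", False)] * 40
           + [("B", True)] * 30 + [("B", False)] * 70)
rates = selection_rates(records)
ratio = disparate_impact_ratio(rates)
verdict = "flag for review" if ratio < 0.8 else "within the four-fifths guideline"
print(rates)  # {'A': 0.6, 'B': 0.3}
print(f"ratio = {ratio:.2f}: {verdict}")
```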
The Data: certain data are impossible to get
Although a huge amount of data is available today, a lot of data is still annoyingly difficult to find.
Much of the data from surveys and other sources is not good enough to draw correct insights and can point in the wrong direction. Moreover, apart from tools and data scientists, data science relies heavily on the available data, which often means data scientists spend most of their time gathering accurate data in the first place.
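Because so much effort goes into getting accurate data, a quick audit before any modeling can show how much of a dataset is actually usable. A minimal sketch over plain Python records; the field names and the valid age range are hypothetical.

```python
def audit(rows, required_fields, valid_ranges):
    """Count missing values, out-of-range values, and exact duplicate rows."""
    report = {"rows": len(rows), "missing": 0, "out_of_range": 0, "duplicates": 0}
    seen = set()
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
        for field in required_fields:
            if row.get(field) is None:
                report["missing"] += 1
        for field, (lo, hi) in valid_ranges.items():
            value = row.get(field)
            if value is not None and not lo <= value <= hi:
                report["out_of_range"] += 1
    return report

survey = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},  # missing answer
    {"id": 3, "age": 212, "income": 61000},   # likely a typo
    {"id": 1, "age": 34, "income": 52000},    # exact duplicate of the first row
]
print(audit(survey, ["age", "income"], {"age": (0, 120)}))
```

Even in four rows, three are problematic in some way; on real survey data, counts like these decide how much cleaning has to happen before any insight is trustworthy.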
Many data science projects produce thousands of graphs, patterns, and charts after examining all the combinations and sub-combinations of the data.
Sometimes this results in correct and precise predictions, and sometimes it is of little help to businesses and organizations looking for precise patterns to guide their decision-making.
Most of them invest in data science hoping to find answers that will help their business grow, but in some cases they end up exploring something that was not needed in the first place.
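Examining every combination also has a statistical cost: test enough patterns and some will look significant purely by chance. The sketch below correlates a random target with many random, unrelated "features" and counts how many clear a typical significance cutoff. The cutoff |r| > 0.28 (roughly p < 0.05 for 50 observations) and the simulated data are illustrative assumptions.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(123)
n_obs, n_features = 50, 1000
target = [rng.gauss(0, 1) for _ in range(n_obs)]

false_hits = 0
for _ in range(n_features):
    feature = [rng.gauss(0, 1) for _ in range(n_obs)]  # pure noise, unrelated to target
    if abs(pearson_r(feature, target)) > 0.28:
        false_hits += 1

print(f"{false_hits} of {n_features} noise features look 'significant'")
```

Roughly one in twenty noise features passes the cutoff, so a project that charts a thousand combinations should expect dozens of convincing-looking but meaningless patterns. Corrections such as Bonferroni (dividing the significance level by the number of tests) are one standard way to keep these false discoveries in check.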
Myths About Data Science
Buzzwords that are totally unrelated to the topic are often used to attract people to (mis)information, which usually ends up confusing readers. This confusion, and the misrepresentation of information under the cover of buzzwords, distorts readers' decision-making and takes them down a route altogether different from the one they intended.
This trend is highly visible in the field of Data Science as well.
Myth #1: Data Science is Just a Buzzword
Business leaders, journalists, and industry analysts are quick to use the latest jargon. The resulting noise can make it difficult to discern between industry hype and technologies or processes that can stand the test of time. Given the extreme hype about Data Science these days, it’s not surprising that some consider it just another buzzword or fad.
Data Science isn’t a buzzword or fad, however. It’s a confluence of time-tested disciplines, including statistics, math, and computer science, that have existed in some form for many years. A few things that distinguish Data Science from its predecessors, including actuarial science and statistics, are access to massive amounts of data that can be stored cheaply, robust computing power, and quick access to predefined models.
Myth #2: Data science is exclusively for experts in Statistics and Mathematics
Data Science is not the property of a few disciplines. It can be seen as a huge square in the middle of a crowded city, through which paths from many disciplines pass: Mathematics, Statistics, Computer Science and Programming, Data Modeling, Visualization, Technology, Domain knowledge, and more.
While an expert in statistics or mathematics may get a good head start, cross-disciplinary experts bring the advantage of moving across different topics in parallel, thanks to their past experience.
Myth #3: Complex Models are Better Than Simple Models
Decision trees, statistical regression, and linear regression are not new, so the media pays less attention to them than deep learning and neural networks. Deep Learning and Neural Networks use complex models that are considerably more sophisticated than the models used to solve simpler problems because they are attempting to emulate arbitrarily complex functions.
Complex models are not necessarily better than simpler models for a few reasons. First, a complex model can be less efficient than a simpler model if the problem is relatively simple. Second, complex models can be costly in terms of processing power. Finally, complex models can lead to black-box approaches that are difficult or impossible to explain. While the results of a black-box solution may be “good,” black-box solutions don’t allow users to explore how a result was derived.
If users can’t explore how a result was derived, they can’t understand what went into it. If they can’t understand what led to the result, they can’t explain the details, which is not good, particularly in an audit scenario. Simpler models are easier to understand and explain.
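To make the explainability point concrete: a very simple model, here a one-split decision tree (a "decision stump"), can be printed as a single human-readable rule that an auditor can check by hand. A minimal sketch; the feature (months of savings), the labels, and the exhaustive threshold search are hypothetical illustrations, not a production learner.

```python
def fit_stump(xs, ys):
    """Find the threshold t that best fits the rule 'predict 1 if x < t, else 0'."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        preds = [1 if x < t else 0 for x in xs]
        acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Hypothetical data: months of savings per borrower, and whether they defaulted (1) or not (0).
savings_months = [0, 0, 1, 1, 2, 3, 4, 6, 8, 12]
defaulted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

threshold, accuracy = fit_stump(savings_months, defaulted)
print(f"Rule: predict default if savings < {threshold} months "
      f"(training accuracy {accuracy:.0%})")
# Rule: predict default if savings < 3 months (training accuracy 90%)
```

Every prediction this model makes can be traced back to one comparison, which is exactly what an audit needs; a deep network trained on the same data might score similarly but could not be summarized this way.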
Myth #4: Data Science Requires Massive Computing Power
The Big Data and AI hype has created the impression that Data Science requires massively parallel GPU-accelerated machines or huge clusters. While large Deep Learning and Neural Networks do sometimes require that kind of computing power, many use cases do not.
Problems that can be solved with simple models may only require a PC with 64 GB or 128 GB of RAM. If that’s not enough, two or three hours spent in the cloud may be all that’s necessary to build and test a model.
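A back-of-the-envelope estimate helps decide whether a dataset fits on a single PC before reaching for a cluster. The sketch below assumes every value is stored as an 8-byte 64-bit float, which is a simplification (text columns take more, smaller numeric types less), and it ignores the working copies that processing can create.

```python
def estimated_gb(rows, columns, bytes_per_value=8):
    """Approximate in-memory size of a dense numeric table, in gigabytes (10**9 bytes)."""
    return rows * columns * bytes_per_value / 10**9

# Example table width of 20 numeric columns; the row counts are illustrative.
for rows in (10_000_000, 100_000_000, 500_000_000):
    print(f"{rows:>13,} rows x 20 columns ~ {estimated_gb(rows, 20):6.1f} GB")
```

With 20 numeric columns, 100 million rows come to about 16 GB, comfortably within a 64 GB machine, while 500 million rows (about 80 GB) already call for the 128 GB machine, and intermediate copies during processing could push that past a single node.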
A Cloud environment, such as AWS, Microsoft Azure or Google Cloud, may also be necessary if the data processing or data cleansing requirements exceed the capacity of a single node.
Essentially, it’s more cost-effective to scale computing resources as necessary than to over-engineer a computing environment that is more complex and costly than the problem requires.
Myth #5: It’s Hard to Find Data Scientists
To overcome the difficulty of finding data scientists, some organizations are trying to develop a Data Science practice that combines the expertise of several people.
A common mistake is to hire specialized expertise, such as a Ph.D.-level statistician or data scientist, before it’s necessary. Company decision-makers believe the company needs such a person to gain a competitive advantage, but it is unclear what that person should do and for whom. Lacking a mission and purpose, a statistician or data scientist who longs to make a positive impact on the business, but can’t, will likely resign with a better offer in hand from another employer. That’s why it’s often easier to hire specialized talent than to keep it.
Most organizations can start reaping the benefits of Data Science without highly specialized expertise or expensive software, but quite often they don’t know where to start.
Thanks for reading! If it was useful to you, please Like/Share so that it reaches others as well.