This blog post was written by Ginny Gao, Senior Data Analyst at Indellient. On May 4, 2017, Ginny presented some key concepts in Data Science at our monthly Lunch & Learn and shared how to use Data Science to uncover hidden patterns and solve business challenges.
Data is everywhere. Today’s data and its characteristics were the focal point of my Lunch & Learn presentation. Data Science is a field that greatly interests me, and it was my pleasure to explain some key terms and highlight the potential for us as a company to tackle one of the hottest subjects in the technology world today.
So, what exactly is ‘Data Science’?
I don’t think there’s one answer to this question, as every Data Science practitioner has their own definition. It certainly is a very broad field. A common definition is that it’s a multidisciplinary field where scientific methods, processes and systems come together to extract knowledge from data in various forms. But a simpler way to describe it would be “insight discovery”. This involves not only technical specialties such as machine learning, statistics, databases and data processing, but also business strategy, domain knowledge, and communication to keep Data Science projects objective-oriented and applicable.
Data comes fast (Velocity), in different forms (Variety), in huge amounts (Volume) and varies in quality (Veracity).
Data Science Can:
- Uncover patterns;
- Identify opportunities/risks;
- Increase efficiencies; and
- Produce reliable and repeatable decisions and results.
The breadth of Data Science might make people think Data Scientists need to know everything. But what’s more important is to have “an intense curiosity”, according to Dr. DJ Patil, former US Chief Data Scientist, and “a desire to go beneath the surface of a problem to find the questions at its heart”.
Practical Examples of Data Science
Before transitioning to Data Science in practice, I went over some key terms that help connect classical theory to real-world application, such as hypotheses, models, machine learning and algorithms, and gave some examples. We then dove into a use case analysis using a Decision Tree – a graphical (flowchart-like) representation of decisions and their possible outcomes. The algorithm is relatively simple to use, works well for smaller trees, can handle numerical and categorical input and output data, and mirrors human decision-making. After walking through the steps involved in a typical Data Science project, I emphasized the importance of model testing to catch overfitting (a model that memorizes its training data) and underfitting (a model too simple to capture the pattern). The team had fun understanding graphical definitions of bias and variance, and their relationship to total error.
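To make the Decision Tree idea concrete, here is a minimal sketch in plain Python. It is not the code from the presentation: the customer data, the feature meanings and the function names (`gini`, `best_split`, `build_tree`, `predict`) are all made up for illustration, and it only handles numerical inputs. It assumes Gini impurity as the splitting criterion, which is one common choice among several. The tree is built on a training set and then scored on separate held-out rows, which is the basic check for overfitting.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, max_depth=3):
    """Build a tree of nested (feature, threshold, left, right) tuples; leaves are labels."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
        build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1),
    )

def predict(tree, row):
    """Follow the flowchart from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted label matches the actual label."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

# Hypothetical data: [monthly visits, support tickets] -> customer outcome.
train_rows = [[1, 5], [2, 4], [3, 6], [8, 1], [9, 0], [10, 2]]
train_labels = ["churn", "churn", "churn", "stay", "stay", "stay"]
test_rows = [[2, 5], [9, 1]]  # held out: never seen while building the tree
test_labels = ["churn", "stay"]

tree = build_tree(train_rows, train_labels, max_depth=2)
print(tree)  # (0, 3, 'churn', 'stay'): "visits <= 3 -> churn, otherwise stay"
print(accuracy(tree, train_rows, train_labels), accuracy(tree, test_rows, test_labels))
```

If training accuracy is high but held-out accuracy is much lower, the tree is likely overfitting; lowering `max_depth` is one simple way to rein it in. If both are low, the tree is probably underfitting.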
I also touched on Random Forest, an ensemble method that combines many decision trees to improve model accuracy and performance. We learned the necessity of tracking model performance continuously, as models may become ineffective over time.
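Continuous tracking can start out very simply. The sketch below is only an illustration under assumed numbers: the monthly batches, the baseline accuracy of 0.9, the tolerance of 0.2 and the name `flag_degraded_periods` are all hypothetical. It compares each period's accuracy against the accuracy measured when the model was first tested and flags the periods that fall too far below it.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual outcomes."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)

def flag_degraded_periods(batches, baseline, tolerance):
    """Return (period, accuracy) for batches scoring below baseline - tolerance."""
    flagged = []
    for period, predictions, actuals in batches:
        acc = accuracy(predictions, actuals)
        if acc < baseline - tolerance:
            flagged.append((period, acc))
    return flagged

# Hypothetical monthly batches: (period, model predictions, what actually happened).
batches = [
    ("2017-01", ["stay", "churn", "stay", "stay"], ["stay", "churn", "stay", "stay"]),
    ("2017-02", ["stay", "churn", "stay", "stay"], ["stay", "churn", "churn", "stay"]),
    ("2017-03", ["stay", "stay", "stay", "stay"], ["churn", "churn", "stay", "stay"]),
]

# Assumed baseline: accuracy measured on held-out data when the model was built.
degraded = flag_degraded_periods(batches, baseline=0.9, tolerance=0.2)
print(degraded)  # [('2017-03', 0.5)]
```

A flagged period is a prompt to investigate and possibly retrain the model on more recent data, not proof on its own that the model is broken.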
The presentation finished with a look at where analytics can be applied. It was a pleasure to guide the team along the Data Science journey and I’m glad that Indellient encourages knowledge sharing across teams – which enables us to collaborate and serve our clients better!
Have a question about Data Science, Big Data, or analytics? We love to share our knowledge and hear from others! Contact us today.