"...data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining...."— Wikipedia
Data science (related: data curation, e-Science and big data) is an information-based discipline that requires specific skills to accumulate, manage and manipulate large data sets. Practitioners apply statistics, mathematics, text retrieval and natural language processing to analyze data in robust ways and to interpret the results accordingly. The field has ties to both the enterprise and open source movements. The goal of data science is to make it easier to find and use relevant data (and datasets) in order to identify patterns and conduct more precise calculations. Data science plays a critical role in how we access and manipulate data and how we conduct research in the sciences. From intelligent searching that integrates an understanding of text and of users' intentions, to integrating multiple ways of accessing information, data science is the current fad.
In health care, data science aims to help us collect data about medical treatments and use it to predict which treatments will lead to beneficial outcomes for patients. Personalized care is one of the areas that can be aided by better data; better data can also make hospitals more efficient and help them address preventable patient complications such as blood clots and hospital readmissions.
Features of data science
Data science is at the centre of an interdisciplinary, emerging field
The word "data" comes from the Latin "datum", meaning a "thing given". Although the term has been used since the 1500s, modern usage began in the 1940s and 1950s, as practical electronic computers started to input, process and output data. The term data science (originally "datalogy") was coined by Peter Naur in 1960. In 1974, Naur published "Concise Survey of Computer Methods", in which he used the phrase data science in his overview of data processing methods.
Data science is a field that lies at the intersection of statistics, computer science, applied mathematics, data visualization and information science
As Stanton (2012) says, "...data science refers to an emerging area of work concerned with the collection, preparation, analysis, visualization, management and preservation of large collections of information. Although the name seems to connect most strongly with areas such as databases and computer science, many different kinds of skills - including non-mathematical skills - are needed."
Data science comprises at least three major elements or activities:
statistical modeling and mathematical reasoning,
data pipelines, programming languages and “big data” tools, and
real-world topics and case studies.
Related techniques and tools include logistic regression analysis, predictive modeling, clustering algorithms, decision trees, Hadoop, data pipelines, data visualization, R and Python
Data science is linked to concepts such as Web 2.0, "collective intelligence", crowdsourcing and smart mobs, since users add data to applications and thereby make them more useful
Data is the future because it can be turned into data products
Google, Amazon, Facebook and LinkedIn represent a first wave of data science; they extract data from databases, systems and other sources, including social networks
The “data” in data science is often heterogeneous and unstructured (text, images and video) and comes from networks with complex relationships among entities
Data is cleansed, deduplicated and made ready for meaningful analysis
Data scientists perform interactive visual analysis and produce visual narratives, data visualizations and infographics
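The cleansing and deduplication step mentioned in the list above can be illustrated with a small Python sketch. It is only one possible approach, not a standard method: the record fields (name, email), the rule that a record without a name or email is dropped, and the choice of the normalized email as the duplicate key are all assumptions made for this example.

```python
def cleanse_and_deduplicate(records):
    """Return cleaned records, keeping the first record seen for each email.

    Assumptions for this sketch: each record is a dict with "name" and
    "email" keys, and two records are duplicates when their normalized
    emails match.
    """
    seen_emails = set()
    cleaned = []
    for raw in records:
        # Normalize: collapse repeated whitespace, fix letter case.
        name = " ".join((raw.get("name") or "").split()).title()
        email = (raw.get("email") or "").strip().lower()

        # Cleanse: drop records that lack a required field.
        if not name or not email:
            continue

        # Deduplicate: skip records whose email has already been seen.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Hypothetical sample data: one duplicate and one incomplete record.
raw_records = [
    {"name": "  ada   lovelace ", "email": "Ada@Example.org "},
    {"name": "Ada Lovelace", "email": "ada@example.org"},
    {"name": "Alan Turing", "email": ""},
    {"name": "grace hopper", "email": "grace@example.org"},
]

cleaned_records = cleanse_and_deduplicate(raw_records)
for record in cleaned_records:
    print(record["name"], record["email"])
```

In practice, deciding when two records describe the same entity is often harder than an exact match on one field; fuzzy matching or several combined fields may be needed, depending on the data.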