Data science

From HLWIKI Canada
Figure: Traditional data model vs. data science model. On the left, aggregate data over a six-year patient trial; on the right, aggregated data over eight months in a large data initiative.

Last update: 24 June 2017


See also Big data | Bioinformatics | Clinical surveillance technologies & mashups in public health | Data management | Data management portal | e-Science | ImpactStory | Open data | Semantic web

"Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining...." - Wikipedia

Data science (related: data curation, e-Science and big data) describes an information-based discipline that requires specific skills to accumulate, manage and manipulate large data sets. Practitioners apply statistics, mathematics, text retrieval and natural language processing to analyze data in robust ways and to interpret the results accordingly. The field has ties to both the enterprise and the open source movements. The goal of data science is to make it easier to find and use relevant data (and datasets) in order to identify patterns and conduct more precise calculations. Data science plays a critical role in how we access and massage data and in how we conduct research in the sciences. From intelligent searching that integrates an understanding of text and of users' intentions, to integrating multiple ways of accessing information, data science is the current fad.

In health care, data science aims to help us collect data about medical treatments and use it to predict beneficial treatment outcomes for patients. Personalized care is one of the areas that can be aided by better data, which can also make hospitals more efficient and help them address preventable patient complications such as blood clots and hospital re-admissions.
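As a rough illustration of the predictive use described above, the following sketch fits a one-variable logistic regression to estimate the chance of a hospital re-admission. It is a minimal example under stated assumptions, not a clinical method: the records, the feature (number of prior admissions) and the function names are all hypothetical and invented for this example.

```python
import math

# Hypothetical toy data, invented for illustration only:
# (number of prior admissions, 1 if re-admitted else 0).
RECORDS = [
    (0, 0), (1, 0), (1, 0), (2, 0), (2, 1),
    (3, 0), (3, 1), (4, 1), (4, 1), (5, 1),
]


def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(records, learning_rate=0.1, epochs=5000):
    """Fit p(re-admitted) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(records)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in records:
            # Derivative of the log loss with respect to (w * x + b).
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict(w, b, x):
    """Return the estimated re-admission probability for feature value x."""
    return sigmoid(w * x + b)


w, b = fit_logistic(RECORDS)
low_risk = predict(w, b, 0)   # patient with no prior admissions
high_risk = predict(w, b, 5)  # patient with five prior admissions
print(f"0 prior admissions: {low_risk:.2f}")
print(f"5 prior admissions: {high_risk:.2f}")
```

In practice an analysis like this would use many more variables and a tested statistical library; the hand-written gradient descent is kept here only so that the example has no dependencies.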

Features of data science

Data science is at the centre of an interdisciplinary, emerging field

The word "data" comes from the Latin "datum", meaning "a thing given". Although the term has been used since the 1500s, modern usage started in the 1940s and 1950s as practical electronic computers began to input, process and output data. The term data science (originally "datalogy") was coined by Peter Naur in 1960. In 1974, Naur published "A concise survey of computer methods", in which he used the phrase data science in his overview of data processing methods.

  • Data science is a field that lies at the intersection of statistics, computer science, applied mathematics, data visualization and information science
  • As Stanton (2012) says, "Data science refers to an emerging area of work concerned with the collection, preparation, analysis, visualization, management and preservation of large collections of information. Although the name seems to connect most strongly with areas such as databases and computer science, many different kinds of skills - including non-mathematical skills - are needed."
  • Data science comprises at least three major elements or activities:
      • statistical modeling and mathematical reasoning
      • data pipelines, programming languages and “big data” tools, and
      • real-world topics and case studies
  • Other areas include logistic regression analysis, predictive modeling, clustering algorithms, decision trees, Hadoop, data pipelines, data visualization, R and Python
  • Data science is linked to concepts such as web 2.0, "collective intelligence", crowdsourcing and smart mobs since users add data to applications, making them more useful
  • Data is the future because it can be turned into data products
  • Google, Amazon, Facebook and LinkedIn represent a first wave of data science; data is extracted from databases, systems and other sources via social networks
  • The “data” in data science is often heterogeneous and unstructured (text, images and video) and comes from networks with complex relationships among entities
  • Data is cleansed, deduplicated and made ready for meaningful analysis
  • Data scientists perform interactive visual analysis and produce visual narratives, data visualizations and infographics
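The cleansing and deduplication step mentioned in the list above can be sketched in a few lines of Python. This is only an illustrative example: the records, the field names (`patient_id`, `ward`, `age`) and the helper functions are hypothetical, and real pipelines usually need fuzzier matching than an exact key comparison.

```python
# Hypothetical raw records, invented for illustration only.
RAW = [
    {"patient_id": " 001", "ward": " cardiology", "age": "36"},
    {"patient_id": "001", "ward": "Cardiology ", "age": "36"},  # duplicate
    {"patient_id": "002", "ward": "oncology", "age": ""},       # missing age
    {"patient_id": "003", "ward": "Oncology", "age": "85"},
]


def clean(record):
    """Trim whitespace, normalize capitalization and parse the age field."""
    age = record["age"].strip()
    return {
        "patient_id": record["patient_id"].strip(),
        "ward": " ".join(record["ward"].split()).title(),
        "age": int(age) if age.isdigit() else None,  # None marks a missing value
    }


def deduplicate(records, key="patient_id"):
    """Keep the first record seen for each value of `key`."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


cleaned = deduplicate([clean(r) for r in RAW])

# A simple analysis on the prepared data: mean age, ignoring missing values.
ages = [r["age"] for r in cleaned if r["age"] is not None]
mean_age = sum(ages) / len(ages)
print(len(cleaned), "unique records; mean age:", mean_age)
```

Cleaning before deduplicating matters here: the first two records only become recognizable as duplicates once the stray whitespace in `patient_id` has been removed.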


