"Big data refers to the massive amounts of data around us, which can be aggregated and measured by technological advances in micro- and nano-electronics, nano materials, interconnectivity telecommunication infrastructure, massive network-attached storage capabilities, and commodity-based high-performance computing. All credit card transactions, cell phone traffic, e-mail traffic, video and images from networks of surveillance devices, satellite and ground sensing data for weather and climate, now generate massive data and information. Personal health information related to genome sequencing and extensive imaging in medicine has driven a revolution in data analytics and predictive models that inform decision making whether identifying security threats or making diagnoses and treatment decisions for patients. — Schadt, 2012
“Big data” (also enterprise big data, smart data and even data science) is the buzzword of 2015-2016, and for good reason. Nearly everything is being digitized, so huge data sets are available to researchers and data scientists. How do researchers use this data? Having data just a few clicks away is appealing, but when the data is not organized so that it is easily searchable or extractable, access remains problematic. There are also questions of ownership, management, preservation, and the access rights that the library offering the data may or may not hold.
In simple terms, big data refers to the tools, processes and procedures that permit the creation, manipulation and management of large data sets. A new data paradigm has emerged with broad application in research areas such as medical informatics, bioinformatics, genome searching and other data-driven research, where large volumes of data are transformed into knowledge. The term is contested in some circles, though it is generally used to refer broadly to the availability and use of data in structured and unstructured formats. At least one expert prefers other terms, such as data curation and data science. The "big data" era is a result of search and discovery technologies that can extract value from massive amounts of information. "Big data" is connected to health care, Silicon Valley, e-commerce and the private sector, where it is used to gain a competitive edge in predicting market growth. The prediction is that we create and replicate 2.8 ZB of data (zettabytes; 2.8 ZB is 2.8 million million gigabytes) and that the ‘digital universe’ will reach 40 ZB by 2020.
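The unit arithmetic behind these figures can be checked with a short sketch. It assumes decimal (SI) units, where 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes; the function and variable names are illustrative only and do not come from any cited source.

```python
# Assumption: decimal (SI) units, so 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes.
GB_PER_ZB = 10**21 // 10**9  # 10**12, i.e. a million million gigabytes per zettabyte


def zb_to_gb(zettabytes: float) -> float:
    """Convert zettabytes to gigabytes using decimal (SI) prefixes."""
    return zettabytes * GB_PER_ZB


current_zb = 2.8     # figure quoted in the text
projected_zb = 40.0  # projection for 2020 quoted in the text

current_gb = zb_to_gb(current_zb)
growth_factor = projected_zb / current_zb

print(f"{current_zb} ZB = {current_gb:.3e} GB")
print(f"Projected growth: about {growth_factor:.1f}x")
```

Under these assumptions, the projection amounts to roughly a fourteen-fold increase in the size of the ‘digital universe’.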
The debate about big data and its value for innovation and growth is prominent in the pharmaceutical and medical industries. Big data is associated with open data and text-mining, and linked to clinical and patient data harvestable from patient records and related health systems. It goes beyond the published literature and refers to the vast stores of data in databases, especially clinical and research data from clinical trials, most of which is waiting to be mined. According to the McKinsey Global Institute, several major domains are affected by big data: 1) healthcare in the United States, 2) the public sector in Europe, 3) the retail sector in the United States, and 4) manufacturing and personal-location data globally. The Harvard Business Review has said that data scientist is an emerging profession; McAfee says that "...big data is far more powerful than the analytics of the past. Executives can measure and therefore manage more precisely than ever before. They can make better predictions and smarter decisions." One area related to big data in medicine is translational medicine; another is data fabrication. Ross & Krumholz argued in 2013 for sharing clinical trial data more openly.
The Data Citation Index (DCI) from Thomson Reuters is the first source of data discovery for the sciences, social sciences and arts and humanities; DCI indexes leading data repositories of interest to the scientific community, including two million data studies and datasets.
CSAIL treats big data as "fundamentally multi-disciplinary"; the MIT team includes faculty and researchers across technology areas, including algorithms, architecture, data management, machine learning, privacy and security, user interfaces, and visualization, as well as domain experts in finance, medicine, smart infrastructure, education and science.
Videos about IBM's Big Data Initiative: demos, interviews, presentations, tutorials and more. According to IBM, big data spans four dimensions: volume, velocity, variety and veracity; see http://www.ibm.com/bigdata