Real World Data

In health care, Real World Data (RWD) are defined as data derived from sources associated with the outcomes of a heterogeneous population of patients in real-world settings.

Analysis of such data can generate Real World Evidence (RWE), which in turn can reveal significant insights into unmet needs, intervention pathways, and the clinical and economic impact on patients and health systems.


An alternative definition links the origin of the data to their purpose: RWD are data collected outside the controlled constraints of randomized clinical trials (RCTs), in order to assess what actually happens in routine clinical practice.

Pharmacoeconomics is the field that has driven the use of RWD most strongly. In the pre-marketing study of drugs, randomized clinical trials are the reference method, but the generalizability of their results is limited by:

  • Heterogeneity in drug response
  • Variability in adherence to treatments
  • Use of the drug in different populations and types of patients

RWD are therefore an important complement to the results of clinical trials. Their analysis, RWE, can report on the effectiveness and safety of health interventions in patients, accurately identify the risk-benefit balance, effectively demonstrate the value of a product for its economic evaluation and maximize the return on investment. For these reasons RWE is increasingly sought by decision makers at every level of the health system, from the clinic to government.

The benefits of using RWD include (García López, J.L. et al., 2014):

  • Estimates of effectiveness in different clinical settings
  • Comparison with alternative interventions or clinical strategies, beyond placebo as a comparator, to inform optimal therapeutic decisions
  • Estimation of the risks and benefits of a new intervention, including long-term benefits and harms
  • Obtaining clinical results in a diverse population that reflects the range and distribution of patients observed in clinical practice
  • Results obtained from a broader perspective than in traditional RCTs (patient-reported outcomes, quality of life and symptoms)
  • Data usable for calculating the costs of health services and for economic evaluation
  • Information on how products are used and prescribed in clinical practice, and on adherence to them
  • Data in situations where it is not possible to perform an RCT
  • Rationale for collecting data at more than one location

Where the information comes from

Real World Data are collected mainly from:

  1. Observational studies based on disease and treatment registries* (also called “naturalistic studies”)
  2. Data from electronic medical records
  3. Health Surveys
  4. Pragmatic clinical trials **
  5. Patient records and routine administrative data
  6. Measures of morbidity and mortality and other clinical outcomes
  7. Prescription and treatment patterns
  8. Natural history of disease progression
  9. Patient Experience
  10. Safety studies
  11. Studies on health-related quality of life (HRQL)

* Includes post-authorization observational studies (EPA, from the Spanish “estudios posautorización”)
** A clinical trial performed in a large number of patients with broadly common characteristics to assess the effectiveness of a treatment under conditions similar to those in which it will be used in clinical practice.

These data are, in turn, part of the wider body of Big Data, which includes data and information from:

  • Transactions
  • Records data
  • Events
  • Emails
  • Social media, websites, social networks, mobile phone applications
  • Wearable technologies and sensors
  • External feeds (web channels, web sources or web information)
  • RFID (radio-frequency identification) or POS (point of sale) data
  • Geospatial data
  • Biometric data, digital recordings, audio, images and video

The use of Big Data requires analytical skills in various fields of knowledge, such as epidemiology, research methodology, health informatics, health economics and the measurement of patient outcomes (Patient Reported Outcome Measures, PROMs).

Each of these disciplines, in turn, requires analysis tools such as data mining, data visualization, predictive modeling, simulation and flow analysis, as well as the ability to analyze natural-language text and work with unstructured data.

The 5 V’s of Big Data

The five main characteristics of Big Data are known as the ‘5 Vs’: volume, velocity, variety, veracity and value.

The defining feature of Big Data is volume, the large amount of information it manages (Ishwarappa and Anuradha, 2015). Today, massive databases are measured in petabytes (10¹⁵ bytes) or exabytes (10¹⁸ bytes).

Another essential feature of Big Data is velocity: the enormous speed at which information is generated, collected and processed. Analysis must be correspondingly fast, cutting the processing times of traditional analysis tools.

The third “V”, variety, is the ability to aggregate information from a wide variety of independent sources, such as social networks, sensors, machines or individuals. Analyzing this kind of data requires many new technologies, and mastering them can provide a competitive advantage. Big Data systems therefore allow quantitative data to be integrated with naturally unstructured data such as graphics, text, sound or images.

The fourth “V” is veracity. Big Data must be able to analyze large volumes of data intelligently in order to obtain accurate and useful information that improves decision-making. It is not enough for the data to be plentiful, analyzed and exploited at high speed and drawn from various sources; they must also be truthful and therefore reliable.
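To make veracity more concrete, the following is a minimal Python sketch of the kind of plausibility checks that can be run on real-world records before analysis. It assumes hypothetical records with the fields patient_id, age and systolic_bp; the ranges are illustrative, not clinical reference values.

```python
"""Minimal veracity checks on hypothetical real-world health records.

Assumption: records are dicts with the hypothetical fields
'patient_id', 'age' and 'systolic_bp'; the plausibility ranges
below are illustrative, not clinical standards.
"""

# Illustrative plausibility ranges (hypothetical, for demonstration only).
PLAUSIBLE_RANGES = {
    "age": (0, 120),
    "systolic_bp": (50, 260),
}


def veracity_issues(record):
    """Return a list of human-readable problems found in one record."""
    issues = []
    if not record.get("patient_id"):
        issues.append("missing patient_id")
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is None:
            issues.append(f"missing {field}")
        elif not low <= value <= high:
            issues.append(f"implausible {field}: {value}")
    return issues


def split_by_veracity(records):
    """Separate records that pass every check from those that do not."""
    clean, flagged = [], []
    for record in records:
        problems = veracity_issues(record)
        if problems:
            flagged.append((record, problems))
        else:
            clean.append(record)
    return clean, flagged


if __name__ == "__main__":
    sample = [
        {"patient_id": "A1", "age": 67, "systolic_bp": 142},
        {"patient_id": "A2", "age": 230, "systolic_bp": 130},  # impossible age
        {"patient_id": "", "age": 54, "systolic_bp": None},    # two gaps
    ]
    clean, flagged = split_by_veracity(sample)
    print(len(clean), "clean record(s)")
    for record, problems in flagged:
        print(record.get("patient_id") or "<no id>", problems)
```

Flagged records are kept alongside their problems rather than silently dropped, so that they can be reviewed or corrected at the source.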

The fifth “V”, value, is the creation of a distinctive competitive advantage, which presupposes a good understanding of the customer’s expectations and needs. To achieve it, the key data must be identified and processed, making it possible to:

  • Monetize data
  • Get new customers
  • Generate loyalty
  • Reduce costs
  • Improve brand image


A good example of this application is the Mini-Sentinel program of the US Food and Drug Administration (FDA). By applying algorithms to large databases of real-world information, sometimes unstructured, this program has made it possible to detect new interactions, adverse drug effects and other safety problems that have led to the withdrawal of drugs or to changes in their indications.

The Institute of Knowledge Engineering (IIC) took part in the 20th National Congress of Health Informatics, Inforsalud 2017, presenting the innovative project “Integrated vision of traumatology in relation to the knee replacement waiting list”, which brought together experts from the IIC and from the Health Service of Castilla-La Mancha (SESCAM). The project uses the information collected by SESCAM’s computer systems in both primary and specialized care. It followed a standard five-step methodology for Big Data and machine learning projects:

  1. Specification of requirements, the associated data and the criteria for validating them
  2. Review of the state of the art and related prior projects
  3. Compilation of relevant information provided by SESCAM
  4. Construction of models
  5. Analysis of the results to choose the best models

Descriptive and predictive analytical techniques were applied, together with natural language processing of the free text in the electronic clinical records.
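As an illustration only, steps 4 and 5 of such a methodology (build candidate models, then choose the best) can be sketched in Python. The sketch assumes synthetic (x, y) data, two hypothetical candidate models (a mean baseline and a least-squares line) and mean absolute error on a held-out split as the validation criterion; none of this is taken from the IIC/SESCAM project itself.

```python
"""Sketch of steps 4 and 5: build candidate models, then keep the best.

Assumptions: the data are synthetic (x, y) pairs invented for this
illustration, the candidate models are a mean baseline and a simple
least-squares line, and the validation criterion fixed in step 1 is
mean absolute error (MAE) on a held-out split.
"""


def fit_mean(train):
    """Baseline model: always predict the mean outcome of the training set."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Ordinary least-squares fit of y = a + b * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def mae(model, data):
    """Mean absolute error of a model on a list of (x, y) pairs."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)


def select_best(train, valid, candidates):
    """Step 4: fit each candidate; step 5: keep the lowest validation MAE."""
    scores = {name: mae(fit(train), valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Synthetic data: roughly y = 5 + 2x with small deterministic noise.
    noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, 0.0]
    data = [(x, 5 + 2 * x + e) for x, e in enumerate(noise)]
    train, valid = data[:7], data[7:]  # simple hold-out split
    best, scores = select_best(
        train, valid, {"mean baseline": fit_mean, "linear": fit_line}
    )
    print("validation MAE:", {k: round(v, 2) for k, v in scores.items()})
    print("selected model:", best)
```

With these synthetic data the linear model is selected, as expected for data generated from a straight line. In a real project the candidate models and the validation criterion would be those specified in step 1.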

The Departament de Salut de la Generalitat de Catalunya launched PADRIS, a public program to carry out more and better research through the reuse of data.

PADRIS (Public Data Analysis Program for Health Research and Innovation) will be managed by the Agency for Health Quality and Assessment of Catalonia (AQuAS) of the Department of Health. The new program aims to improve Catalonia’s international standing in scientific research, complementing the Strategic Plan for Research and Innovation in Health (PERIS) in promoting people-centred health research.

The reuse and cross-referencing of large-scale health data can facilitate studies such as the monitoring and surveillance of newly introduced drugs, devices or implants; the detection of interactions and adverse events not revealed by standard clinical studies; comparative effectiveness research; the follow-up of patient cohorts; and improved knowledge of rare diseases.
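A minimal Python sketch of what cross-referencing can mean in practice: two hypothetical sources, prescriptions and adverse-event reports, are linked through a shared pseudonymised patient identifier, and events occurring within a time window after a newly introduced drug is started are counted. The drug codes, events, dates and the 90-day window are invented for illustration.

```python
"""Cross-referencing two hypothetical real-world sources for drug surveillance.

Assumptions: both sources share a pseudonymised 'patient_id'; the drug
code 'NEWDRUG', the event names and the 90-day window are invented for
illustration. This only counts candidate signals; it is not a validated
pharmacovigilance method.
"""
from collections import Counter
from datetime import date, timedelta

prescriptions = [
    {"patient_id": "p1", "drug": "NEWDRUG", "start": date(2017, 1, 10)},
    {"patient_id": "p2", "drug": "NEWDRUG", "start": date(2017, 2, 1)},
    {"patient_id": "p3", "drug": "OLDDRUG", "start": date(2017, 1, 15)},
]
adverse_events = [
    {"patient_id": "p1", "event": "rash", "date": date(2017, 2, 1)},
    {"patient_id": "p2", "event": "rash", "date": date(2017, 9, 1)},
    {"patient_id": "p3", "event": "rash", "date": date(2017, 1, 20)},
]


def events_after_exposure(prescriptions, events, drug, window_days=90):
    """Count events recorded within `window_days` after starting `drug`."""
    # Index the first start date of the drug per patient (the linkage key).
    starts = {}
    for rx in prescriptions:
        if rx["drug"] == drug:
            pid = rx["patient_id"]
            starts[pid] = min(rx["start"], starts.get(pid, rx["start"]))
    window = timedelta(days=window_days)
    counts = Counter()
    for ev in events:
        start = starts.get(ev["patient_id"])
        if start is not None and start <= ev["date"] <= start + window:
            counts[ev["event"]] += 1
    return counts


if __name__ == "__main__":
    print(events_after_exposure(prescriptions, adverse_events, "NEWDRUG"))
```

With the sample data it prints `Counter({'rash': 1})`: only patient p1 has a rash within 90 days of starting NEWDRUG. A real surveillance system would also need a comparison group and statistical testing before treating such counts as a safety signal.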


If you want to know a little bit more …


García López, J.L. et al. (2014) Aportación de los “Real World Data (RWD)” a la mejora de la práctica clínica y del consumo de recursos de los pacientes. Fundación Gaspar Casal.

IBM Global Services (2013) Analítica de datos: El uso en el mundo real de Big Data en sanidad y ciencias de la vida. Cómo las organizaciones más innovadoras en sanidad y ciencias de la salud extraen valor de datos inciertos.

Del Llano, J.E. et al. (Eds.) (2016) Datos de la vida real en el Sistema Sanitario Español. Fundación Gaspar Casal.

Olshannikova et al. (2017) Conceptualizing Big Social Data. Journal of Big Data 2017, 4:3.

BDV (Big Data Value Association) TF7 Healthcare Group (2016) Big Data Technologies in Healthcare. Needs, opportunities and challenges. 12/21/2016.

Henke, N. et al. (2016) The Age of Analytics: Competing in a Data-driven world. McKinsey Global Institute. December 2016.

Ishwarappa and Anuradha, J. (2015) A Brief Introduction on Big Data 5Vs Characteristics and Hadoop Technology. International Conference on Intelligent Computing, Communication & Convergence. Procedia Computer Science 48 (2015).


Other websites of interest on the subject: