Storing methods in the same format within an open-source database makes experiments more reproducible. The Pistoia Alliance has created the Methods Hub [3], which enables analytical methods to transition from text‑based information to fully digitalised, machine-readable instruction sets. Implemented widely, this approach can improve the integrity of methodology data by making it more available, accurate and complete.
IDMP compliance
The language used in the regulatory, supply chain, pharmacovigilance and clinical sectors is hugely inconsistent; the same drug or active substance can be referred to by several different names. This makes it difficult to share information across bodies, from drug development to distribution and, crucially, regulators. As of 2023, however, new regulation means all new drug submissions to the European Medicines Agency (EMA) must comply with select Identification of Medicinal Products (IDMP) standards. By improving data integrity through a structured approach to IDMP, companies can ensure that their data is attributable and accessible internally, as well as by other bodies and regulators.
To support the R&D sector in achieving this aim, the Pistoia Alliance has launched its IDMP Common Core Ontology project [4] to help organisations semi-automate IDMP processes with an artificial intelligence (AI)-powered ontology. Once developed, the ontology will be provided as an open-source tool to ensure data integrity in IDMP compliance.
Mapping new areas of research
Data integrity is key to unlocking promising new areas of research, for example, the microbiome. Scientists are uncovering the bidirectional relationship between the microbiome and many different medical conditions [5]. Additionally, emerging patterns reveal the effect of the microbiome upon various pharmaceutical therapies – and vice versa [6].
The industry has spent over $1.7 billion [7] on researching the microbiome over the past decade, yet microbiome data is scattered around different organisations and bodies, and in varying formats. We can only realise the true value of this data if it is attributable, legible, contemporaneous, enduring and available.
The Pistoia Alliance has launched a project [8] to develop an atlas for microbiome data, mapping it into a comprehensive framework. This will eventually allow AI and machine learning (ML) models to be applied in predicting patterns in microbiome interactions by condition, active substance, or even geographical location. This is only one example of the type of framework we can use to map new areas of research and accelerate R&D.
Semantic enrichment
Different organisations use different ELN vendors, and data stored in an ELN typically varies in format. This causes difficulties when searching, sharing and retrieving information. To make data accessible and searchable, it must be made FAIR (findable, accessible, interoperable and reusable). We can achieve this through semantic enrichment – adding a layer of metadata to keywords. This makes entries more likely to appear when scientists and researchers search for experiments or active substances by other names.
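As a rough illustration of the idea, the sketch below tags free-text ELN entries with a preferred term so that a search under any known synonym finds them. It is a minimal example under stated assumptions, not the approach of any particular ELN vendor or Pistoia Alliance project: the `SYNONYMS` table, the `ElnEntry` class and the entry IDs are all hypothetical, and a real system would draw its terms from a curated ontology rather than a hard-coded dictionary.

```python
import re
from dataclasses import dataclass, field

# Hypothetical synonym table: preferred term -> alternative names.
# A real deployment would load this from a curated ontology.
SYNONYMS = {
    "paracetamol": {"acetaminophen", "apap"},
    "acetylsalicylic acid": {"aspirin", "asa"},
}


@dataclass
class ElnEntry:
    """A hypothetical ELN record with a metadata layer of concepts."""
    entry_id: str
    text: str
    concepts: set = field(default_factory=set)


def build_lookup(synonyms):
    """Map every name (preferred or alternative) to its preferred term."""
    lookup = {}
    for preferred, alternatives in synonyms.items():
        lookup[preferred] = preferred
        for alt in alternatives:
            lookup[alt] = preferred
    return lookup


def enrich(entry, lookup):
    """Add the preferred term for each known name found in the entry text."""
    lowered = entry.text.lower()
    for name, preferred in lookup.items():
        # Whole-word match so short names do not fire inside other words.
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            entry.concepts.add(preferred)
    return entry


def search(entries, query, lookup):
    """Return IDs of entries tagged with the concept behind the query."""
    concept = lookup.get(query.lower())
    if concept is None:
        return []
    return [e.entry_id for e in entries if concept in e.concepts]


lookup = build_lookup(SYNONYMS)
entries = [
    enrich(ElnEntry("ELN-001", "Dissolved 500 mg acetaminophen in water"), lookup),
    enrich(ElnEntry("ELN-002", "Aspirin stability study at 40 C"), lookup),
]

print(search(entries, "paracetamol", lookup))
```

Here a search for "paracetamol" retrieves an entry that only ever mentions "acetaminophen", because both names resolve to the same concept in the metadata layer.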
By making ELN data FAIR, we can also strengthen its provenance, meaning data becomes attributable to its original source. This enables scientists to see whether data has been peer-reviewed, or if the source is otherwise trustworthy. The Pistoia Alliance’s Semantic Enrichment of ELN Data project [9] encourages computer-readable, standardised data, allowing researchers to access the provenance of information, resulting in higher quality experiments, reduced rework, and better decision making.
Artificial intelligence and machine learning
Life sciences organisations are increasingly embracing AI and ML to analyse data. In combination with natural language processing, AI and ML will make it possible for algorithms to ‘learn’; these advances will enable companies to develop treatments faster and identify more drug repurposing opportunities. AI can reduce the time it takes to synthesise new drugs by 40 to 50 percent and can save pharmaceutical companies approximately $54 billion in R&D costs each year [10].
However, AI and ML can only transform R&D when underpinned by accurate data, with integrity assured. Without consistent standards, AI and ML cannot perform effectively or produce quality results. The ‘garbage in, garbage out’ principle still stands. The Pistoia Alliance’s AI/ML community [11] aims to define best practices for the use of such technologies in life sciences R&D.
Ensuring data is fit for the future
Universal data integrity is crucial to the life sciences industries. We can encourage and maintain integrity by ensuring that data meets FAIR principles and is consistent with universal standards. Data integrity will only become more important as organisations begin to adopt technologies like AI and ML, which have enormous potential to transform areas like drug development, diagnostics and clinical trials, but only if the data on which they are built meets integrity standards.
About the author
Anca Ciobanu is strategic theme lead at not‑for‑profit organisation Pistoia Alliance, which has a diverse global membership including large pharmaceutical companies and leading technology providers. Anca’s focus is on improving the efficiency and effectiveness of R&D by solving the common pain points scientists regularly encounter. These include data governance issues, company reluctance to adopt new technologies and difficulty providing use cases that prove value. Anca leads numerous initiatives to lower the barriers to R&D innovation through pre-competitive collaboration, including the Pistoia Alliance’s Diversity and Inclusion Women in STEM leadership programme.
For further information about the Pistoia Alliance or to get involved in its projects, visit www.PistoiaAlliance.org or email [email protected]
References
1. Harvard Business Review. Bad data costs the U.S. $3 trillion per year [Internet]. [cited 2022 April 30]. Available from: https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year
2. Baker M. 1,500 scientists lift the lid on reproducibility. Nature. 2016. 533: 452-454. Available from: https://www.nature.com/articles/533452a
3. https://www.pistoiaalliance.org/projects/current-projects/methods-hub/
4. https://www.pistoiaalliance.org/projects/current-projects/idmp-common-core-ontology/
5. Durack J, Lynch SV. The gut microbiome: Relationships with disease and opportunities for therapy. Journal of Experimental Medicine. 2019. 216(1): 20-40. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6314516/
6. McCoubrey EL. Addressing drug–microbiome interactions: the role of healthcare professionals. Pharmaceutical Journal. 2022. Available from: https://pharmaceutical-journal.com/article/research/addressing-emerging…
7. News Medical. Microbiome research has surpassed $1.7 billion in the past decade [Internet]. [cited 2022 April 30]. Available from: https://www.news-medical.net/news/20200929/…
8. https://www.pistoiaalliance.org/projects/current-projects/microbiome/
9. https://www.pistoiaalliance.org/projects/current-projects/semantic-enrichment-of-eln-data/
10. Innovation Files. Fact of the Week: Artificial Intelligence Can Save Pharmaceutical Companies Almost $54 Billion in R&D Costs Each Year [Internet]. [cited 2022 April 30]. Available from: https://itif.org/publications/2020/12/07/fact-week-artificial-intelligence…
11. https://www.pistoiaalliance.org/community/artificial-intelligence-machine-learning/