Information automation in R&D laboratories: past and future
Posted: 2 August 2008 | Dr. Charlie Sodano, Founder, eOrganizedWorld
What were the drivers that helped launch the laboratory information management system (LIMS) and electronic laboratory notebook (ELN)? This article will trace the history of LIMS and ELN from their emergence into the future. Technology development did play a big role to be sure, but the desire of scientists to minimise time performing and analysing experiments provided the biggest push. By looking at the past, one can quite often get a better perspective of what the future will bring. There will be many issues to consider, primarily how to manage and preserve the vast amount of scientific electronic information that is being generated daily.
Beginnings
Back in the 1960s, before the advent of lab computer devices, a single spectral or chromatographic analysis consumed a good portion of a day. For spectral analysis it was necessary to manually vary the wavelength of light impinging on a sample, and record the strength of absorption at each wavelength. Gas chromatography output was a tracing on a sheet of paper. Measurements such as retention time and quantitation were computed with a pencil, ruler, scissors and balance. The efficiency of these operations has improved by more than a factor of ten thanks to modern computer software and hardware. A driving force for this revolution was the desire to use a scientist’s time more efficiently, and have them focus on science, not mindless manual procedures.
Thanks to IBM and others, many companies in the early 1960s had already undergone a revolutionary change in their business practices. Many traditional paper-based transactions, primarily in finance, were beginning to be handled by mainframe computers. This not only sped up many processes, but made them much more accurate and precise. The resulting change in business culture was enormous. Hordes of ‘paper-pushers’ were eliminated and replaced to some extent by the formidable Information Services department.
Before the introduction of the microprocessor in the early 1970s, computers were generally large, costly systems owned by large institutions. It was rarely possible to allow a single user exclusive access to the machine. Time-sharing was possible because computers in interactive use often spend much of their time waiting for user input. Some of the first computers that might be called personal were early minicomputers such as the PDP-8, and later on VAX and larger minicomputers from Digital Equipment Corporation (DEC), and others. They originated as peripheral processors for mainframe computers, taking on some routine tasks and freeing the processor for computation.
Scientists accessed IBM mainframe systems, and those developed by DEC with an eye toward scientific computing, via a dumb terminal. A dumb terminal is an electronic or electromechanical hardware device that is used for entering data into, and displaying data from, a computing system. All interactions were performed with ‘command line’ instructions. The graphical user interface (GUI) was still off in the future.
The first LIMS were built on these mainframes in the 1970s. Workflow automation covered the process from electronic analysis request forms through to the viewing of results. Searches and statistics could be performed more readily than with the paper systems they replaced. The first electronic notebook was born, consisting of a simple template that contained analytical results and the author’s comments and interpretations.
With the rise of microcomputing in the early 1980s, time-sharing faded into the background because individual microprocessors were sufficiently inexpensive that a single person could have all the CPU time dedicated solely to their needs, even when idle.
Interestingly enough, the Internet has brought the general concept of time-sharing back into popularity. Expensive server farms costing millions can host thousands of users all sharing the same common resources. As with the early serial terminals, websites operate primarily in bursts of activity followed by periods of idle time.
The first Apple™ computer was introduced in April of 1976 at the Homebrew Computer Club in Palo Alto and it wasn’t until 1983 that the Lisa office machine with a GUI and mouse was introduced. The scientific community became enamoured with the Apple™ and began to explore applications within their laboratories. Apple™ is still prevalent in academic laboratories and small biotechs. On 12 August 1981 the IBM PC was released, and Microsoft™ announced its first Windows program shortly afterward in 1983, although it did not ship until 1985. The reputation of IBM for business computing, and the large number of compatible computers and third-party plug-compatible peripherals, allowed the PC and compatibles to make substantial sales in business and scientific applications.
RS-232 (Recommended Standard 232) is a standard for serial binary data signals connecting a DTE (Data Terminal Equipment) and a DCE (Data Circuit-terminating Equipment). Since application to devices such as computers, printers, test instruments, and so on was not considered by the standard, designers implementing an RS-232 compatible interface on their equipment often interpreted the requirements idiosyncratically. Common problems were non-standard pin assignment of circuits on connectors, and incorrect or missing control signals. The lack of adherence to the standard produced a thriving industry of breakout boxes, patch boxes, test equipment, books, and other aids for the connection of disparate equipment. Later, personal computers and other devices started to make use of the standard so that they could connect to existing equipment. For many years, an RS-232-compatible port was a standard feature for serial communications, such as modem connections, on many computers. It remained in widespread use into the late 1990s and continues to be seen on many analytical instruments.
Scientific instruments gradually built in capabilities to electronically capture information and subsequently crunch it into displays and reports. The innovations that really blew out the doors were local area network wiring and networking protocols. Initially all data processing was done on the instrument processor or linked server. Typically, these devices were only capable of handling one or two jobs at one time. Processing speeds were low and access to results was slow. Despite these constraints, scientists had a lot more time to spend planning and executing their next experiments.
The first commercial LIMS appeared in the 1980s. These were hardware/software combinations that facilitated the transfer of information from scientific instruments via RS-232 port connections, performed typical calculations, and displayed and organised sample results in a flexible format. Many early LIMS were developed by the instrument manufacturers to help promote sales.
Techniques for analysing samples in an automated manner began to appear in the early 1960s. The Technicon AutoAnalyzer™ used a special flow technique named continuous flow analysis. The first applications were for clinical analysis, but methods for industrial analysis soon followed. The AutoAnalyzer profoundly changed the character of the chemical testing laboratory by allowing significant increases in the numbers of samples that could be processed. The novel design, based on separating a continuously flowing stream with air bubbles, all but eliminated slow, clumsy, and error-prone manual methods of analysis. This instrument single-handedly changed the concept of days per sample to a mindset that hundreds, or even thousands, of tests are possible per day.
In a similar manner, autosamplers hooked up to chromatography instruments (gas and liquid) made it possible to run pre-programmed multiple analyses without human intervention.
Lab robotics via articulated arms and hands was introduced in 1981 by Zymark, followed quickly by PerkinElmer and others. These robotic chemists worked tirelessly for days and were the forerunners of the automated chemistry systems used today by many pharmaceutical companies for high-throughput screening and the synthesis of families of drug candidate compounds.
There have been many refinements in lab automation systems since these early days. Most LIMS vendors today have an open architecture that permits the facile capture and integration of information from almost any instrument or robotic system. Reports are readily exported into other information management systems to meet compliance or intellectual property requirements.
Electronic laboratory notebooks
For centuries scientists have utilised a handwritten diary to record their thoughts and experiments. This practice has continued up to the present despite the fact that the majority of information today relating to a scientific experiment is generated in a digital format. One of the reasons why paper notebooks have persisted is that many scientists consider them to be an intimate personal record not to be shared with others. This situation has developed because of indoctrination from the prior generation of pre-computer scientists. They teach that your lab notebook is the primary protection to fend off rivals from usurping your discoveries and inventions. Notebooks support claims associated with a patentable invention or discovery, or promote preferable author ranking on a scientific publication. Students are taught that notebook information should only be shared with the most intimate members of a research team, if at all.
Drug discovery research is a very expensive process. It has become increasingly complex because of the emergence of new scientific disciplines and methodologies. Experts in these disciplines and methodologies must fully cooperate in order to complete a successful project. The team approach has slowly lifted the veil of secrecy surrounding a personal lab notebook. Beyond the psychological barriers, paper notebooks have many disadvantages. Some key items are:
- Output from instruments must be printed and pasted into a notebook
- Table of contents is usually inadequate for searching
- Many authors have poor handwriting skills resulting in illegible entries
- Paper notebooks can be damaged in a laboratory environment, resulting in the permanent loss of information
- Pertinent electronic information is often not managed
The first ELNs emerged in the mid-1990s. These proprietary in-house programs were (and still are) utilised by hundreds of scientists in companies like Kodak, Berlex (now Bayer), Merck KGaA and Boehringer Ingelheim. In 1995, the Collaborative Electronic Notebook Systems Association (CENSA, www.censa.org) was formed. The goal of the consortium was to influence the creation and design of software and standards for R&D and testing lab applications and, specifically, electronic lab notebooks. This organisation of primarily major pharmaceutical companies had an influence on the commercial products that began to appear around 2000.
The impetus for ELN adoption was, and continues to be, to gain more efficiency in data entry and analysis. However, the capabilities to improve information sharing and intellectual property protection are also strong factors. Most scientists today produce virtually no handwritten documents, except for lab notebooks. Paper lab notebook pages consist partially or completely of glued-in items that have been printed and resized. It has been estimated that a scientist typically spends about half of their time on data and information tasks. Selection and implementation of an appropriate ELN product will reduce this time substantially.
Most of the early commercial ELNs, such as the one from CambridgeSoft, were geared to chemists’ needs. The ability to easily map reactions, enter and search chemical structures and make routine calculations made the software very attractive. Many of the early software vendors were start-ups funded by venture capital and did not survive for long. As the market grew, there was a period of acquisition and consolidation. Niche players have since appeared who specialise in either basic packages or packages that are geared to specific scientific disciplines. The vendor market today consists of more than 30 offerings. Software licenses are expensive. Implementation costs for a medium-sized company can run well over a million dollars. These high initial and continuing costs are driving a portion of the market to more basic solutions. There are several enterprises today that are authoring in Microsoft Word and/or Excel combined with document sharing or management software like Microsoft SharePoint and EMC Documentum.
ELN systems can be thought of as being composed of three components. The authoring module captures experimental information using whatever tool is most appropriate for the technical discipline. It could be Microsoft Word, a report from an analytical instrument or an output from chemical drawing software. The document management module provides version control, review and approval, and security controls. Completed experiments are usually made non-editable. The third module is concerned with life cycle management of the records. This module has not been rigorously addressed by most vendors.
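The three components described above can be illustrated with a minimal Python sketch. It is not taken from any vendor's product: the class name `ExperimentRecord`, its fields and its methods are all hypothetical, and the 50-year default is simply the commonly accepted retention time for notebook experiments. Authoring is reduced to saving text versions, document management to version numbering plus an approval that locks the record, and life cycle management to a computed retention date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ExperimentRecord:
    """Hypothetical ELN record; every name here is illustrative only."""
    author: str
    title: str
    created: date
    versions: List[str] = field(default_factory=list)  # authoring output, one entry per save
    approved_by: Optional[str] = None                   # set once the record is reviewed
    retention_years: int = 50                           # assumed retention period

    def save_version(self, content: str) -> int:
        """Authoring plus version control: append content, return the version number."""
        if self.approved_by is not None:
            raise PermissionError("record is approved and no longer editable")
        self.versions.append(content)
        return len(self.versions)

    def approve(self, reviewer: str) -> None:
        """Document management: approval makes the completed record non-editable."""
        if not self.versions:
            raise ValueError("nothing to approve")
        self.approved_by = reviewer

    def retain_until(self) -> date:
        """Life cycle management: date until which the record must be kept."""
        target_year = self.created.year + self.retention_years
        try:
            return self.created.replace(year=target_year)
        except ValueError:
            # Created on 29 February and the target year is not a leap year.
            return self.created.replace(year=target_year, day=28)
```

A real system would also need access control and an audit trail; the sketch only shows how the three responsibilities separate.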
Preservation
The most commonly accepted retention time for laboratory notebook experiments is 50 years. This was never an issue for paper notebooks. However, it is wise to audit paper notebooks that have been kept in storage for more than 20 years. The quality of printer paper and printer inks in the 1980s was often poor. The papers were very acidic and tended to yellow, become brittle with age and degrade printer inks. As a result, many notebook pages that contain pasted-in output from instruments or specialised software have become illegible. The quality of printer paper has improved greatly and degradation is not as big a problem as it used to be. However, paper should be stored in an environment that provides temperature and humidity control.
Research models have been developed that quantify the effects of temperature and relative humidity (RH) on paper deterioration. The models show that paper stored at 72°F (22.2°C) and 50% RH would have an approximate lifetime of 33 years, but if the temperature were lowered to 62°F (16.7°C) and the humidity to 40%, such materials would have a lifetime of 88 years. The models also show that if materials are subjected to high temperatures and humidities (such as 82°F (27.8°C) and 75% RH), noticeable deterioration would occur in nine years or less. Very low humidities will dry out paper and make it brittle. It has been projected that microfilm can be preserved for as long as 500 years at temperatures around 68°F (20°C).
Electronic records have not been around long enough to make educated guesses about how long they will survive. We do know that computer hardware, software and media are on an ever-accelerating pathway of change and replacement. No one can place confidence in predictions more than 10 years into the future. Even if we assume that the information on a DVD will survive for 30 years, most likely there will be no hardware or software available to view this information. All hardware and media degrade with age.
There have been numerous accelerated shelf life studies performed on storage media by standards organisations such as the National Institute of Standards and Technology (NIST). All studies show that degradation of optical and magnetic media accelerates with increasing temperature and humidity. The disk bearing in a hard drive wears out with continued use. Most experts agree that tape is the best choice of storage media for long-term preservation. However, even tapes must be refreshed every 2-5 years to counter aging and changes in hardware and software.
Finally there is the matter of software format. For years the format of choice was the TIFF image. This is gradually being replaced by PDF. There is a PDF archive standard (PDF/A) that was developed and adopted by ISO. This is slowly being implemented but there is some resistance due to the limited functionality of this format.
If records must be kept in electronic format for more than 10 years, the best strategy is to plan for continuous migration. Metadata must also be captured, preserved and migrated along with the record content.
The line between LIMS and ELNs is blurring. LIMS vendors are offering ELN features and vice versa. Very large vendors will eventually acquire most of the independent survivors and integrate these features into their offerings. However, there will always be a small market segment of specialists focusing on the needs of a particular scientific discipline.
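One migration step can be sketched in Python as follows. This is only an illustration under stated assumptions, not a prescribed archival procedure: the function names `sha256_of` and `migrate_record`, the `.metadata.json` sidecar naming and the metadata fields are all hypothetical. The sketch copies a record to new storage, confirms via a SHA-256 checksum that the content arrived unchanged, and writes the metadata next to the copy so that it travels with the record.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_record(source: Path, target_dir: Path, metadata: dict) -> Path:
    """Copy one record to new storage, verify it, and write a metadata sidecar.

    The sidecar file name and the metadata keys are illustrative assumptions.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    checksum = sha256_of(source)
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(target) != checksum:
        raise IOError(f"checksum mismatch after copying {source.name}")
    sidecar = target.with_name(target.name + ".metadata.json")
    sidecar.write_text(
        json.dumps({**metadata, "sha256": checksum}, indent=2, sort_keys=True)
    )
    return target
```

Storing the checksum in the sidecar means the next migration, years later, can verify the record against the value recorded at the previous one.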
The future
What’s next? As mentioned previously, technology advances are impossible to predict past 10 years. However, here’s my image of the LIMS/ELN of the future. The inter-relationship between print, audio and video will continue to improve. The lab experiments of the future will probably be largely orally dictated, and linked to instrument data via hyperlink. All information upon creation will be in some sort of tagged format (like XML) to facilitate interchange between various devices and formats. The encryption algorithms we use now to bind author to record will be replaced by some type of registered bio-signature. Original records will be housed in a globally accessible (government?) warehouse. Individuals and enterprises will lease portions of this space. Information that is beneficial to humanity (patents) and relevant personal history (medical records) will be housed in a separate repository. People will continue to read and write on paper for a long time, until there is some revolution in technology that will produce a suitable substitute.
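As a rough illustration of what a tagged record with hyperlinked instrument data might look like, here is a short Python sketch using the standard library's `xml.etree.ElementTree`. The element names (`experiment`, `observation`, `instrumentData`) and the function `record_to_xml` are invented for this example and do not follow any published schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(author, title, observations, instrument_links):
    """Serialise a notebook entry to XML; tag names are hypothetical."""
    root = ET.Element("experiment")
    ET.SubElement(root, "author").text = author
    ET.SubElement(root, "title").text = title
    for text in observations:
        ET.SubElement(root, "observation").text = text
    for href in instrument_links:
        # Instrument data stays where it is and is referenced by hyperlink.
        ET.SubElement(root, "instrumentData", {"href": href})
    return ET.tostring(root, encoding="unicode")
```

Because the output is plain tagged text, any device with an XML parser can read the entry back without the software that wrote it.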
References
IBM history – http://en.wikipedia.org/wiki/Ibm#History
Digital Equipment Company history – http://en.wikipedia.org/wiki/Digital_Equipment_Company
Technicon history – http://www.case.edu/artsci/dittrick/site2/museum/artifacts/group-d/auto-a.htm
Zymark history – http://pubs.acs.org/subscribe/journals/mdd/v07/i06/html/604feature_cleaves.html
Australian government e-records program – http://www.prov.vic.gov.au/vers/standard/
PDF archive format – http://www.digitalpreservation.gov/formats/fdd/fdd000125.shtml
Stability testing optical media – http://www.uni-muenster.de/Forum-Bestandserhaltung/downloads/iraci.pdf
Data storage technology assessment – http://www.imation.com/government/nml/pdfs/AP_NMLdoc_DSTAssessment.pdf
Predicting the Life Expectancy of Modern Tape and Optical Media – http://digitalarchive.oclc.org/da/ViewObjectMain.jsp;jsessionid=84ae0c5f8240144d303351b54a4e8beeaf808354f7a3?fileid=0000070519:000006289185&reqid=766