Technologies and Solutions for Trend Detection in Public Literature for Biomarker Discovery

Bernd Wachmann
Siemens Corporate Research, NJ, USA

Research into new biomarkers usually begins with a literature review to identify the mechanisms of action and to define a set of biomarkers that can jointly be used as a panel to characterize the type and stage of a disease. However, the manual search for biomarkers is becoming increasingly difficult, since the body of publications is steadily growing in volume, complexity, and diversity. The PubMed database of publications in biomedical science lists more than 6 million articles from the last 10 years. Currently more than 600,000 publications are added to the knowledge base every year, making a manual search for information a time-consuming task. Even for a single disease such as lung cancer, several thousand related publications appear every year (in 2007, for example, more than 300 per month on average).

To address this challenge, we have developed a system that identifies structural and longitudinal patterns in biomedical literature data that support the understanding of trends and relationships between diseases and biomarkers over time. We believe that the time dimension is important, since it helps in tracking
o when a biomarker was discovered and how important it has become for the understanding of the disease over time,
o whether a biomarker has been "replaced" or complemented by another, more informative biomarker,
o at what time an emerging biomarker can be seen that will become relevant for a disease on a broader basis.

The solution addresses the above challenges and covers several functionalities that automate a large part of the previously manual search process. It can therefore significantly reduce the time spent searching for biomarkers and increase the quality of the results.

The system uses biomedical entity recognition and dictionaries of medical terms to emphasize the key concepts and their relations in a document, but in principle all words and phrases are considered as possible trends. This way, even very recent biomarkers that the recognition modules do not yet recognize can be discovered. A hybrid text clustering and time series analysis engine detects topics evolving over time and ranks trends by their past and predicted future relevance to a certain disease. To make the retrieved information conveniently accessible to the user, the system visualizes the publication trends with respect to diseases and biomarkers. Narrowing the results down to specific elements allows the user to understand the relationships between them both statically and over time.
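
The idea of ranking terms by past and predicted future relevance can be illustrated with a small sketch. It is not the system's hybrid clustering and time series engine, whose details are not given here. It substitutes a much simpler technique: for each word, the fraction of documents per year that mention it is computed, a least-squares line is fitted to that yearly series, and terms are ranked by the slope of the line, with a linear extrapolation to the following year as the "predicted" relevance. The function names (term_frequencies, linear_fit, rank_trends), the whitespace tokenization, and the (year, text) input format are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def term_frequencies(docs):
    """docs: iterable of (year, text) pairs (hypothetical input format).

    Returns the sorted list of years and, for every term, the fraction of
    that year's documents mentioning the term at least once.
    """
    docs_per_year = Counter()
    hits = defaultdict(Counter)
    for year, text in docs:
        docs_per_year[year] += 1
        # Naive whitespace tokenization; every word is a candidate trend.
        for term in set(text.lower().split()):
            hits[term][year] += 1
    years = sorted(docs_per_year)
    series = {
        term: [hits[term][y] / docs_per_year[y] for y in years]
        for term in hits
    }
    return years, series


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # only one distinct year: no trend can be estimated
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rank_trends(docs, top=5):
    """Rank terms by the slope of their yearly relative frequency.

    Returns up to `top` tuples (term, slope, predicted), where `predicted`
    is the linear extrapolation to the year after the last one observed,
    clipped at zero.
    """
    years, series = term_frequencies(docs)
    next_year = years[-1] + 1
    scored = []
    for term, values in series.items():
        slope, intercept = linear_fit(years, values)
        predicted = max(0.0, slope * next_year + intercept)
        scored.append((term, slope, predicted))
    # Steepest rising terms first; ties broken by prediction, then by name.
    scored.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return scored[:top]
```

Calling rank_trends with a list of (publication year, abstract text) pairs returns the terms whose share of the literature is rising fastest. A realistic version would restrict the documents to one disease, treat multi-word phrases and recognized entities as terms, and replace the straight-line fit with a proper time series model.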