CPM 2018

29th Annual Symposium on Combinatorial Pattern Matching

Qingdao, China, July 2-4, 2018

Keynote speakers


Ming Li (University of Waterloo, Canada)

Ming Li is a Canada Research Chair in Bioinformatics and Professor of Computer Science at the University of Waterloo. Together with Paul Vitanyi he pioneered applications of Kolmogorov complexity. He is also a co-founder of RSVP Technologies Inc., an artificial intelligence startup company in Waterloo. His research interests include bioinformatics tools (protein structures, genome mapping, homology search); protein structure prediction and automated NMR protein structure determination; stem cell image recognition; and deep learning, natural language processing, automated conversation, and AI. Ming Li has published over 250 scientific papers and 4 books, and was a recipient of the Outstanding Contribution Award in 2010. He has held a Canada Research Chair since 2002.

Title: Challenges from Cancer Immunotherapy

Abstract: There are currently two revolutions happening in the scientific world: deep learning and cancer immunotherapy. We have all heard of the former, but I believe it is the latter [1-4] that is more closely related to the CPM/COCOON community and personally to each of us. In principle, cancer immunotherapy activates our own defense system to kill cancer cells. When a cell in our bodies (as in all vertebrates) becomes sick beyond repair, the MHC complex brings fragments of 8-15 amino acids, or (neo)antigens, from the foreign invader or cancerous proteins to the surface of the cell, inviting the white blood cells to kill that cell. Short peptide immunotherapy uses these short sequences (of 8-15 amino acids) as the vaccine. One key obstacle to this treatment becoming a clinical reality is how to identify and validate, for each individual, the somatic-mutation-loaded neoantigens (peptides of 8-15 amino acids) that are capable of eliciting effective antitumor T-cell responses. Currently, to treat a patient, we take a biopsy, do exome sequencing, and perform somatic mutation analysis and MHC binding prediction. This process is a long, unreliable, and very expensive detour to predicting the neoantigens that are brought to the cancer cell surface [3, 4]. It can potentially be validated by mass spectrometry (MS) [3-5], or even replaced by MS altogether if MS has sufficient sensitivity to capture the low-abundance neoantigens on the cancer cell surface. There is a promising MS technology called Data-Independent Acquisition (DIA) [6, 7] that provides unbiased fragmentation of all precursor ions within a certain range of m/z. In this talk we will present our preliminary work [8] on how to find these mutated peptide sequences (de novo sequencing) from the cancer cell surface using deep learning and DIA data. We will discuss major open problems. This is joint work with N. H. Tran, R. Qiao, L. Xin, X. Chen, C. Liu, X. Zhang, and B. Shan.
This work is partially supported by China’s National Key R&D Program under grants 2018YFB1003202 and 2016YFB1000902, Canada’s NSERC OGP0046506, Canada Research Chair Program, MITACS, and BSI.


Michael Segal (Ben-Gurion University of the Negev, Israel)

Michael Segal is a Professor of Communication Systems Engineering at Ben-Gurion University of the Negev, known for his work in ad hoc and sensor networks. After completing his undergraduate studies at Ben-Gurion University in 1994, Segal received a Ph.D. in Mathematics and Computer Science from Ben-Gurion University in 2000 under the supervision of Klara Kedem. The topic of his Ph.D. dissertation was "Covering point sets and accompanying problems". After continuing his studies with David G. Kirkpatrick at the University of British Columbia and the Pacific Institute for the Mathematical Sciences, he joined the faculty at Ben-Gurion University in 2000, where he also served as the head of the Communication Systems Engineering department from 2005 to 2010. He is known, together with his coauthors, for being the first to analyze the performance of the well-known Least Cluster Change (LCC) algorithm, which is widely used in ad hoc networks for re-clustering in order to reduce the number of modifications. He was also one of the first to introduce and analyze the construction of multi-criteria spanners for ad hoc networks. Segal has published over 140 scientific papers and was a recipient of the Toronto Prize for Research in 2010. He is serving as the Editor-in-Chief of the Journal of Computer and System Sciences. Along with his Ben-Gurion University professorship, he is also a visiting professor at Cambridge University.

Title: Privacy Aspects in Data Querying

Abstract: Vast amounts of information of all types are collected daily about people by governments, corporations and individuals. The information is collected, for example, when users register for or use online applications, receive health-related services, use their mobile phones, utilize search engines, or perform common daily activities. As a result, there is an enormous quantity of privately owned records that describe individuals' finances, interests, activities, and demographics. These records often include sensitive data and may violate the privacy of the users if published. The common approach to safeguarding user information, or data in general, is to limit access to the storage (usually a database) by using an authentication and authorization protocol. This way, only users with legitimate permissions can access the user data. However, even in these cases some of the data is required to stay hidden or accessible only to a specific subset of authorized users. Our talk focuses on possible malicious behavior by users with both partial and full access to queries over data. We look at privacy attacks that are meant to gather hidden information and show methods that rely mainly on the underlying data structure, query types and behavior, and data format of the database. The underlying data structure may vary between graphs, trees, lists, queues, and so on. Each of these behaves differently with regard to data storage and querying, allows for different types of attacks, and requires different methods of defense. The data stored in databases can be just about anything, and may be a combination of many different data types such as text, discrete numeric values, coordinates, continuous numeric values, timestamps, and others. We will show how to identify the potential weaknesses and attack vectors for each of these combinations of data structures and data types, and offer defenses against them. This is joint work with Eyal Nussbaum.


Russell Schwartz (Carnegie Mellon University, USA)

Russell Schwartz is a Professor of Biological Sciences and Computational Biology at Carnegie Mellon University, where he has worked on a variety of problems in computational genetics and biophysics. He completed his B.S. in Computer Science and Engineering and M.S. in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology in 1996, followed by a Ph.D. in Computer Science at MIT in 2000 pursuing work in computational biophysics under the supervision of Prof. Bonnie Berger. Following his doctoral studies, he did postdoctoral work with Prof. Jonathan King of the MIT Department of Biology before joining Celera Genomics, where he worked on some of the first whole-genome genetic variation analyses. Most recently, his laboratory's work has focused on cancer genomics, with contributions including the introduction of single-cell phylogenetics of tumor evolution and the use of deconvolution methods for tumor evolution studies on bulk genomic data. In addition to his research contributions, he has been active in international efforts at bioinformatics education reform. He has published over 100 peer-reviewed scientific papers and the textbook Biological Modeling and Simulation.

Title: Reconstructing tumor evolution and progression in structurally variant cancer cells

Abstract: Cancer is a disease governed by evolution, in which accelerated genomic diversification and selection lead to the formation of tumors and to generally increasing aggressiveness over time. As a result, computational algorithms for reconstructing evolution have become a crucial tool for making sense of the immense complexity of tumor genomic data and the molecular mechanisms that produce them. While cancers are evolutionary systems, though, they follow very different rules than standard species evolution. A large body of research known as cancer phylogenetics has arisen to develop evolutionary tree reconstructions adapted to the peculiar mechanisms of tumor evolution and the limitations of the data sources available for studying it. Here, we will explore computational challenges in developing phylogenetic methods for reconstructing the evolution of tumors by copy number variations (CNVs) and structural variations (SVs). CNVs and SVs are the primary mechanisms by which tumors functionally adapt during their evolution, but they require very different models and algorithms than those used in traditional species phylogenetics. We will examine variants of this problem for handling several forms of tumor genomic data, including the particular challenges of working with various bulk genomic and single-cell technologies for profiling tumor genetic variation. We will further see how the resulting models can help us develop new insight into how tumors develop and progress and how we can predict their future behavior.