General Information
There are plenty of opportunities for Bioinformatics research projects at UCLA. This program is designed to help interested students find research projects related to Bioinformatics across campus. Typically, these projects are for credit; in exceptional circumstances they may offer funding. Participation in research projects can both significantly improve your chances of admission to top graduate programs and make you a much more competitive employment candidate. Even better, it gives you something to talk about during an interview. Feel free to contact us even if you are not sure whether you want to work on a research project or which field you wish to research. Please remember that every undergraduate and master's student is welcome to participate in research, regardless of background or year in the program. Undergraduates are STRONGLY encouraged to participate in research as early as possible in their careers. Ideally, you should start a research project during your sophomore year, but it is never too late or too early to start! Undergraduate students may receive up to 8 units of credit toward the minor with enrollment in Computer Science 194/199 or Bioinformatics 194/199.
General Procedure
If you are reasonably sure which project you would like to work on, use the contact information listed under the project to contact the person responsible for it directly and set up a meeting. If you are not sure, but you are even slightly interested in research, feel free to email us or drop in, and we will help you choose an appropriate project. Most students take a project for course credit, although funding may be available in some cases. You can contact Eleazar Eskin (eeskin [at] cs [dot] ucla [dot] edu) if you have any questions.
Research Projects
Below is a list of research projects that are accepting undergraduate researchers.
Machine learning models to understand gene regulation in cellular quiescence |
||
Project Description Cells can reversibly exit from the cell cycle to enter a non-dividing quiescent state. This process is tightly regulated, and misregulation can lead to diseased states such as cancer and chronic wound healing. The factors contributing to gene regulation in quiescence have not been completely described. Leveraging genome-wide sequencing datasets (such as bulk and single-cell RNA-seq and ATAC-seq) to build machine learning models could provide valuable information about the coordinated action of transcription factors that regulate gene expression. We are recruiting students to help us develop machine learning models for quiescence gene regulation. The selected students will primarily use next-generation sequencing datasets that we have generated, as well as publicly available data, for model development. Previous experience with Python, R, shell scripting, RNA-seq analysis, deep learning workflows, handling big genomic datasets, and the UCLA Hoffman2 cluster is strongly desired. |
||
Requirements Coursework in programming or bioinformatics, basic statistics, machine learning. |
||
|
Imaging transcriptomics across developmental stages of early psychotic illness |
||
Project Description There is increasing evidence that the progression to psychosis is dynamic and protracted, that working memory dysfunction is a core feature of schizophrenia with typical maturation throughout adolescence, and that working memory depends partly on widely distributed cortical glutamate- and GABA-mediated neural circuitry. However, connecting the molecular underpinnings of disruptions in this circuitry to in vivo human brain development has been elusive. This integrative project utilizes novel neuroimaging methods and publicly available brain-wide transcriptomic atlases to investigate neurodevelopmental mechanisms of schizophrenia risk in a prospective longitudinal cohort of youth at clinical high risk for psychosis. Example opportunities include learning how to analyze structural MRI in novel ways using the Hoffman2 Linux compute cluster, analyze gene expression data from large publicly available brain datasets, integrate the two datasets, and compute and use polygenic risk scores to better understand clinical high risk for schizophrenia. You might also have an opportunity to work on 22q11.2 copy number variant studies within Dr. Carrie Bearden’s group. The deletion syndrome is a highly pathogenic genetic disorder that greatly increases the risk for autism spectrum disorder, ADHD and schizophrenia. Mentorship will be provided by Dr. Gil Hoftman in collaboration with Dr. Carrie Bearden. |
||
Requirements One year of programming coursework (such as PIC 10C or CS 32) and one bioinformatics core course (such as CM121, CM122, or CM124). As things evolve with the pandemic, we can also discuss working remotely, on campus and/or off campus. |
||
|
Uncovering Genetic Regulation in Rare Genetic Disease |
||
Project Description Our lab studies rare pediatric syndromes and leverages genomic data sets and machine learning to improve diagnosis of these rare disorders. Additional projects explore integration of different data sets to uncover novel mechanisms and drug targets. |
||
Requirements One year of programming coursework (such as PIC 10C or CS 32) and one bioinformatics core course (such as CM121, CM122, or CM124). |
||
|
Genomic studies of psychiatric disorders |
||
Project Description We use genomic data to study the genetic architecture of psychiatric disorders such as schizophrenia and bipolar disorder. Bioinformatic tools are used to decipher clinical features as well as genetic susceptibility, epigenetic features and regulation of gene expression. Student projects are tailored to the interest and skill set of the student. |
||
Requirements One year of programming coursework (such as PIC 10C or CS 32). |
||
|
Computational Ontology of the Multicellular Brain Networks |
||
Project Description Our nervous system is a highly parallel high-performance computing network with trillions of connections called synapses and billions of processing elements called neurons supported by trillions of glial cells. In this project, we seek to tackle this complexity and develop computational ontologies of brain cells. Students will have an opportunity to learn advanced data science approaches, statistical analyses, dynamic modeling, cloud computing and machine learning. They will also gain a deeper understanding of the underlying neurobiology, and how stressors such as aging and diseases alter the brain and its function. We merge these approaches with experimental work and computational students can benefit through direct interactions with experimentalists and learning about experimental tools and approaches. |
||
Requirements One course in programming (such as PIC 10A or CS 31), one year of programming coursework (such as PIC 10C or CS 32), one bioinformatics core course (such as CM121, CM122, or CM124). |
||
|
O-PTM in Cardiovascular Biology and Medicine |
||
Project Description In a cardiac cell, the proteome consists of more than 200,000 proteins. Multiple proteins interact with each other to form a biological pathway. Each pathway performs a function and supports a cellular process. Changing the function of an individual protein may alter the function of the entire pathway. Post-translational modification (PTM) is a common mechanism regulating protein structure and function. Oxidative stress is a redox imbalance that arises when the generation and accumulation of reactive oxygen species (ROS) exceed the endogenous antioxidant capacity of living organisms. It is often involved in the progression of cardiovascular diseases (CVD). Oxidative stress sensitive post-translational modifications (O-PTMs) are typical features of proteins in human hearts; these O-PTMs are associated with healthy and/or diseased conditions. Project leaders: Dr. Ding Wang (dingwang [at] g [dot] ucla [dot] edu), Dr. Dominic Ng (dominicng [at] g [dot] ucla [dot] edu), Dr. Howard Choi (cjh9595 [at] g [dot] ucla [dot] edu) Education goals: Oxidative stress biology: get familiar with common reactive oxygen species (ROS), ROS-generating enzymes, and antioxidants. O-PTMs: get familiar with 15 types of O-PTMs, and know their amino acid (AA) targets and changes in m/z value. Extract O-PTM signatures of proteins: get components associated with a CV-relevant biological pathway; get their identification, subcellular distribution, and O-PTMs (e.g., modification type, modification site, occupancy). Scientific goals: Identify O-PTM changes unique to health and disease conditions of human hearts. The similarities and differences between human and mouse protein homologues will be compared. These findings may offer opportunities to interpret phenotypic observations in human heart failure (HF) and mouse models under stress. |
||
Requirements No required experience. |
||
|
Bioinformatics Pipelines for Proteomics Data Analyses |
||
Project Description Bioinformatics tools, including the Integrated Proteomics Pipeline (IP2) and in-house software packages, are employed to characterize properties of individual proteins at the proteome level in a high-throughput fashion. Publicly available knowledgebases (e.g., UniProt and Reactome) support proteomics data analyses and enable further data interpretation. Project leader: Dr. Howard Choi (cjh9595 [at] g [dot] ucla [dot] edu) Education goals: Students will be introduced to several bioinformatics tools essential for proteomics data analyses. After the training, they will be able to independently use these resources to characterize biological variables of interest (e.g., proteins, O-PTMs) from raw proteomics datasets. Scientific goals: Understand the fundamental concepts and/or algorithms of these bioinformatics resources. Get comfortable applying bioinformatics tools to better characterize biological systems. Students should develop a data-driven mindset distinct from the conventional hypothesis-driven approaches that once dominated biomedical investigations. |
||
Requirements No required experience. |
||
|
Mass Spectrometry (MS)-based Proteomics in Cardiovascular Research |
||
Project Description Proteomics is the large-scale study of proteomes within a biological system. Building on advances in mass spectrometry and data science, proteomics approaches have offered powerful means of understanding cardiovascular diseases. Massive mass spectrometry datasets are the intersection between proteomics and data science. In this project, students will learn proteomics sample processing techniques and gain the knowledge of mass spectrometry needed to apply downstream data analysis to the study of cardiovascular diseases. Project leaders: Dr. Ding Wang (dingwang [at] g [dot] ucla [dot] edu), Dr. Dominic Ng (dominicng [at] g [dot] ucla [dot] edu), Dr. Howard Choi (cjh9595 [at] g [dot] ucla [dot] edu) Education goals: Students will learn the fundamental concepts of mass spectrometry, get familiar with sample preparation protocols and the data acquisition workflow for MS-based proteomics, and learn how to extract the MS data for downstream data analysis. Scientific goals: Introduce fundamental concepts of mass spectrometry and proteomics to students. After the training, the students will be able to tell the differences between top-down and bottom-up approaches, understand standard proteomic applications in biomedical research, and know what information can be retrieved from proteomic datasets. |
||
Requirements No required experience. |
||
|
Knowledge Graph construction and analysis to support heart failure classification |
||
Project Description New cases of heart failure, or HF, are diagnosed by the millions each year. Not all hearts fail in the same manner, however: HF cases may be categorized by their percentage of healthy ejection fraction, or EF. An EF below 40% is considered HF with reduced EF (HFrEF) while HF with an EF greater than 50% – while often physiologically normal outside the context of disease – constitutes HF with preserved ejection fraction, or HFpEF. HFpEF is increasingly common and is distinguished from HFrEF by a variety of presentation factors, patient traits, comorbidities, and other factors such as systemic inflammation. How may we organize these varied factors in a consistent manner? If clinical and biomolecular correlates with HFrEF or HFpEF are structured as relationships, may we assemble them into a knowledge graph? What may this knowledge graph allow us to infer regarding HF classification? Project leader: Harry Caufield (jcaufield [at] mednet [dot] ucla [dot] edu) Education goals: An understanding of the technical methods required to integrate heterogeneous biomedical relationships described in text and knowledge bases. Skills to gain familiarity with include: data retrieval through APIs, text data analysis and natural language processing with Python, and data management in Neo4j. The ability to analyze knowledge graphs (and, by extension, other networks of biomedical relationships) to identify relationships supporting conclusions about cardiovascular disease. Students will also gain knowledge of the symptomology of heart disease. Scientific goals: Identify specific patterns of biomedical relationships associated with specific subtypes of heart failure, such that text describing heart failure may be classified without explicit definitions being present (e.g., HFpEF may be described implicitly). |
||
Requirements No required experience. |
||
|
Constructing an Integrated Cardiovascular Knowledge Graph to Discover Disease Phenotype Relationships |
||
Project Description Modern bioinformatics and biomedical informatics projects rely upon well-curated knowledge bases and data repositories. These resources contain structured information describing proteins (e.g., UniProtKB), biomolecular interactions (e.g., IntAct), or genotype-phenotype relationships (e.g., OMIM), among numerous other topics. Similarly, carefully engineered ontologies and coding systems define relationships between diseases (e.g., Disease Ontology; ICD) or broader sets of biomedical concepts (e.g., MeSH). Though each of these resources is data-rich and highly valuable, we rarely need to use any one of them in its entirety – and we would like to use knowledge curated from multiple sources, even when their structures present obstacles to data integration. By exploring the subset of each knowledge base and ontology through the perspective of cardiovascular disease research, we may identify the most relevant elements and unify them within a single graph structure. The resulting knowledge graph supports asking complex questions about cardiovascular phenomena. With some additional engineering, higher-level representations of these knowledge graphs can drive machine learning approaches for understanding cardiovascular disease. Project leader: Harry Caufield (jcaufield [at] mednet [dot] ucla [dot] edu) Education goals: An understanding of the technical methods required to integrate heterogeneous biomedical relationships described in text and knowledge bases. Skills to gain familiarity with include: data retrieval through APIs, text data analysis and natural language processing with Python, and data management in Neo4j. Experience with the data formats and structures used to store biomolecular data and metadata, as well as ontologies (e.g., OBO or OWL formats) and other data (e.g., JSON).
Scientific goals: Assemble a consistently-structured knowledge resource optimized for phenomena relevant to cardiovascular disease, including relationships between disease phenotypes, biomolecules, biomolecular pathways, symptoms, and therapeutics. Identify best practices for merging specific knowledge sources. Develop reusable code for obtaining and integrating knowledge base contents. |
||
Requirements No required experience. |
||
|
Mapping Collective Knowledge of the Cardiac Proteome |
||
Project Description By definition, we expect that a proteome lists each protein within a particular tissue or organ. A cardiac proteome, for example, should include identities and amounts of each protein in the heart. This definition becomes clouded once we begin considering specific conditions: how does an unhealthy (e.g., hypertrophic or failing) heart’s proteome differ from that of a healthy one? Does the proteome change over time? How may the proteome vary between hearts from male or female individuals? Our ability to address these questions may be limited by the samples used to define each proteome as well as by inherent experimental variability. We may search across current and past literature to rigorously define and merge differing (and in some cases, conflicting) observations of cardiac protein expression, with the goal of assembling an updated proteome of the human heart. This process requires intensive application of text mining coupled with an understanding of cardiac-specific biological pathways. This project will place particular focus on three types of proteins: contractile proteins, proteins impacted by oxidative stress, and proteins with metabolic functions (especially those involved in branched chain amino acid, or BCAA, metabolism) as these topics are foci of other lab efforts. Assembly of an updated cardiac proteome will produce a crucial reference for classification of a peptide’s relevance to the heart. Project leader: Harry Caufield (jcaufield [at] mednet [dot] ucla [dot] edu) Education goals: An understanding of PubMed and the language used in biomedical research literature. Experience with obtaining text data through an API. Familiarity with computational methods for bibliometrics, text mining, information extraction, and natural language processing. Knowledge of biomolecular pathways in cardiac function. 
Scientific goals: Construction of a literature-derived cardiac proteome, serving as a comprehensive resource for identification of proteins most relevant to healthy and diseased cardiac phenotypes. |
||
Requirements No required experience. |
||
|
A study of Covid-19 Knowledge Graphs for different Age Groups and CVD Cases |
||
Project Description Covid-19 is caused by a coronavirus called SARS-CoV-2 and often presents with symptoms of high fever, cough and shortness of breath. In severe cases, Covid-19 may lead to acute respiratory distress syndrome (ARDS) and multiple organ dysfunction, and eventually to death. It is clear that the severity and mortality of Covid-19 are much higher than those of any other known coronavirus. New data from Covid-19 cases have indicated that the severity and mortality of this disease are significantly higher in elderly patients and patients with a history of CVD. Applying a text mining approach, the students will explore the role of risk factors such as aging and several cardiovascular diseases (e.g., coronary artery disease) in the severity of Covid-19, and unravel possible underlying mechanisms. Project leaders: David Liem (dliem [at] mednet [dot] ucla [dot] edu), Dibakar Sigdel (sigdeldkr [at] gmail [dot] com) Education goals: Students will learn how to apply innovative tools in text mining and knowledge graphs (e.g., Neo4j and Spark) for data exploration and for the development of search algorithms with specific tasks in biomedical scenarios. Scientific goals: Students will learn how to formulate meaningful biomedical questions from available tools and databases in CVD and Covid-19 (e.g., which age groups and pre-existing CVD significantly increase the risk of mortality in Covid-19, and what are the underlying mechanisms?). The search results can be further explored to investigate the underlying age-based mechanisms. |
||
Requirements No required experience. |
||
|
A study of Drug to Cardiovascular Disease (CVD) Associations with SemRep and Deep Learning |
||
Project Description Starting with well-defined oxidative stress categories (e.g., Initiation, Regulation and Outcome of Oxidative Stress) and a list of drugs in cardiovascular disease (CVD), we will explore SemRep to extract all relevant SPO triplets. We will then build knowledge graphs with these triplets and prepare a multi-order association matrix to represent the graph data structure. Using this graph structure, we will build a sequence prediction model for drug to CVD association. This project will provide a detailed analysis of drug to CVD associations with both qualitative evidence and quantitative scores. Project leaders: David Liem (dliem [at] mednet [dot] ucla [dot] edu), Dibakar Sigdel (sigdeldkr [at] gmail [dot] com) Education goals: The students will learn how to work with innovative text mining tools (e.g., SemRep, CaseOLAP, Neo4j) for biomedical documents and machine learning approaches (RNN, LSTM) for model development and implementation to answer important biomedical questions. Scientific goals: The students will explore knowledge graphs for drug and CVD associations with a focus on oxidative stress categories (e.g., Initiation, Regulation and Outcome) and underlying molecular mechanisms. |
||
Requirements No required experience. |
||
|
Analysis of complex behavior in mice |
||
Project Description Social interactions between individuals and among groups are a hallmark of human society as we know it and are critical to the physical and mental health of a wide variety of species including humans. Our lab studies how animal social behavior is regulated in the brain. This project involves analysis of complex behavior during animal social interaction. Work can be done remotely. |
||
Requirements No required experience, but basic MATLAB skills would be helpful. |
||
|
Genetic architecture of neuropsychiatric traits |
||
Project Description We use genomic data to study the genetic architecture of psychiatric disorders such as schizophrenia and bipolar disorder. Bioinformatic tools are used to decipher clinical features as well as genetic susceptibility, epigenetic features and regulation of gene expression. Student projects are tailored to the interest and skill set of the student. |
||
Requirements One year of programming coursework (such as PIC 10C or CS 32). |
||
|
Structure-function relationship of chromatin architecture in cell quiescence |
||
Project Description We are recruiting a student to work on a project that aims to study the changes in global chromatin accessibility and structure when proliferating cells enter and exit a non-dividing quiescent state. We have previously observed widespread gene expression changes between proliferating and quiescent cells, and one of the goals of the project is to understand the link between gene expression and chromatin architecture changes during quiescence entry and exit. The student will primarily work on analyzing next-generation sequencing datasets that we have generated as well as publicly available data. Previous experience with R, Python, shell scripting, RNA-seq analysis, handling big genomic datasets, and the UCLA Hoffman2 cluster is strongly desired. |
||
Requirements Coursework in programming or bioinformatics, basic statistics and linear algebra |
||
|
Comparative epigenetic studies of aging in 150 mammalian species |
||
Project Description Help us find the secret behind differences in maximum lifespan across mammalian species. Why does a shrew live for less than 2 years while a bowhead whale can live for more than 200 years? Why do rats live for less than 7 years while the naked mole rat can live for more than 30 years? Help us to annotate genomic locations and chromatin states in many species. What can we learn from DNA methylation sites that correlate with maximum lifespan and age in different species? |
||
Requirements One Bioinformatics core course such as CM121, CM122 or CM124 |
||
|
Epigenetic Biomarkers |
||
Project Description Our lab is interested in the development of DNA methylation biomarkers for health and disease. This includes biomarkers for aging from saliva and blood as well as biomarkers for organ specific diseases from plasma. We develop tools to analyze DNA methylation data and develop biomarkers using machine learning. |
||
Requirements One year of programming coursework such as PIC 10C or CS 32 |
||
|
Development of Statistical and Computational Methods for Single Cell Genomics |
||
Project Description The recent experimental advances in single cell biology have allowed us to learn highly granular biological information of disease at single cell resolution. Despite the availability of a plethora of methods to better tease apart true biological signals from technical noise, there are numerous challenges in addressing complex biological questions. We strive to develop computational, mathematical, and statistical models to better harness single cell genomics to advance our understanding of disease mechanisms using our in-house data sets as well as provide tools to the larger biological community for robust data analysis. Projects include batch correction, extraction of gene regulatory networks, prediction of transcriptional response to perturbations, and others! |
||
Requirements One year of programming coursework such as PIC 10C or CS 32 |
||
|
Harmonizing Tissue and Single Cell Multi-omics to Elucidate Regulatory Networks in Disease and Treatment |
||
Project Description More and more evidence suggests that most diseases are the culmination of complex molecular interactions in the form of regulatory networks within select cell types and between cell types. We use both tissue and single cell level multi-omics to understand the tissue- and cell-type-specific mechanisms behind diseases as well as potential therapeutic treatments that aim to reverse these disease networks. Our research involves the investigation of a broad range of complex diseases encompassing cardiometabolic diseases (heart disease, diabetes, obesity, fatty liver disease) and brain disorders (Alzheimer’s disease, traumatic brain injury, and neuropsychiatric disorders), to identify the underlying molecular networks within and between diseases. |
||
Requirements One course in programming such as PIC 10A or CS 31 |
||
|
Evolution in the Microbiome |
||
Project Description While the taxonomic composition of the human microbiome has been extensively studied, little is known about how these microbes evolve. In the Garud lab (garud.eeb.ucla.edu), we are studying the evolutionary forces within and between hosts that shape microbiome genetic diversity (recombination, drift, selection) (e.g., see Garud et al. 2019 PLoS Biology). The lab develops statistical and computational methods to gain insight into evolutionary processes from population genomic data. A variety of projects are available and can be tailored to the student’s interest. A few of them include: 1. Quantifying selective sweeps across human hosts using linkage disequilibrium statistics 2. Estimating the distribution of fitness effects across hosts using site frequency spectrum statistics 3. Quantifying adaptation within a host using spatial metagenomic data collected along the mouse gut. Projects will include a combination of data analysis, simulations, and literature search. The lab is situated in the Ecology and Evolutionary Biology Department and has close interactions, including joint lab meetings, with Dr. Kirk Lohmueller’s lab (there may be options for pursuing a project that is co-advised by Dr. Garud and Dr. Lohmueller). The lab is affiliated with the Microbiome Center at UCLA, the Institute for Quantitative and Computational Biology, and the California NanoSystems Institute at UCLA. |
||
Requirements One year of programming coursework such as PIC 10C or CS 32 |
||
|
Epigenetic Biomarkers of metabolic health |
||
Project Description Our lab is interested in the development of DNA methylation biomarkers. These biomarkers can be used to study aging and human health. We have developed experimental approaches to carry out targeted bisulfite sequencing to measure the DNA methylation of specific sites of interest. We have also developed computational methods to model the epigenetic state of a sample based on its methylation level. The combination of these techniques can be used to estimate the age and health of individuals from blood or saliva samples. We would also like to develop approaches that combine genetic and epigenetic data to better model the epigenetic changes in an individual. |
||
Requirements One course in programming such as PIC 10A or CS 31 |
||
|
Projects in Cancer Data Science |
||
Project Description We study cancer, trying to understand how it originates and what makes it lethal. We use data arising from DNA and RNA sequencing, mass spectrometry, clinical records and images. To analyze them, we develop and apply biostatistical and machine-learning approaches. We try to generate clinically useful tools, while simultaneously discovering new areas of cancer biology. The team is a multi-disciplinary group of computer scientists, software engineers, statisticians, biologists, chemists and clinicians. People come to the team with all levels of experience in programming, statistics and cancer biology. We’re used to training people in the areas they don’t know, and have projects suited to all levels of experience. For software-engineering-focused students, typical projects will involve creating DevOps infrastructure (e.g. CI/CD), optimizing high-performance code, containerizing software for cloud-based deployment or developing web services. For data-science-focused students, projects will involve optimizing ML-based workflows (e.g. hyper-parameter tuning), applied ML on high-dimensional datasets, or developing new algorithms for quantifying specific features of cancer. For biology-focused students, projects will involve pre-processing and analyzing high-throughput experimental data, and linking it to fundamental aspects of cancer biology like hypoxia or cell proliferation. Recent publications from undergrad or medical students in our team: |
||
Requirements Projects available at all levels |
||
|
Discovery of cell-type-specific expression signals contributing to cardiometabolic disorders in humans |
||
Project Description Cardiometabolic disorders, such as type 2 diabetes and non-alcoholic fatty liver disease, are major causes of morbidity and mortality world-wide. We are developing and applying integrative genomics approaches utilizing genome-wide variant and single cell RNA-sequencing data from metabolic tissues to decompose cell-type proportions and cell-type specific expression of genes and their connections to cardiometabolic traits. |
||
Requirements One year of programming coursework such as PIC 10C or CS 32 |
||
|
Decoding Neural Signals for Brain-Computer Interface Communication |
||
Project Description Patients with neuromuscular disorders such as ALS lose the ability to communicate. The goal of this project is to restore this ability by translating neural signals recorded by EEG into computer commands. Several projects are ongoing, involving programming (C++, MATLAB, and Python), machine learning, natural language processing, and experimental design. |
||
Requirements One year of programming coursework such as PIC 10C or CS 32 |
||
|
Analysis of Whole Exome Sequencing Data of Patients with Undiagnosed Neurological Disorders |
||
Project Description Our lab is interested in improving genomic testing methods to improve the diagnosis of rare neurogenetic conditions in patients presenting with neurodegenerative diseases, specifically cerebellar ataxia. Among our projects is the development of a large searchable data repository where we can re-evaluate previously performed exome sequencing with the latest analysis and annotation pipelines periodically to identify rare diseases as well as create large datasets to evaluate risk alleles and genetic modifiers in this patient population. |
||
Requirements One Bioinformatics core course such as CM121, CM122 or CM124 |
||
|
Using Machine Learning to Integrate RNA-Seq and Lipidomics Datasets to Discover Novel Gene Regulation |
||
Project Description We have gathered a wealth of RNA-seq and lipidomics data from the livers of mice exhibiting early-stage phenotypes of non-alcoholic fatty liver disease (NAFLD). The aim of the project is to identify a novel set of transcriptomic and lipidomic biomarkers in NAFLD regulation. We are looking for motivated individuals who are interested in analyzing RNA-seq data and applying machine learning approaches towards deciphering biological processes. |
||
Requirements One course in programming such as PIC 10A or CS 31 |
||
|
Methods for Analyzing the Non-Coding Human Genome |
||
Project Description We are interested in developing computational methods to better annotate and understand the non-coding human genome, and more specifically in applying these methods to analyze rare non-coding variation in whole genome sequencing data from studies of psychiatric disorders and other traits. Potential projects could involve integrating large-scale epigenomic data, comparative genomic data, and/or high-throughput functional testing data with whole genome sequencing data. |
||
Requirements One year of programming coursework such as PIC 10C or CS 32, plus one bioinformatics core |
||
|
Identifying loci for regulation of RNA splicing in mice |
||
Project Description We have obtained deep RNA sequencing data from a panel of inbred mouse strains. The genomes of these strains are well characterized, allowing fine mapping of loci involved in regulating gene expression. The project is to identify loci involved in splice site selection. |
||
Requirements One course in programming such as PIC 10A or CS 31 |
||
|
Investigating Differential Isoform Expression with Cell Cycle Exit |
||
Project Description We generated next-generation sequencing datasets that provide information on the expression of different isoforms of genes in cells that are cycling and cells that have exited the proliferative cell cycle. We are recruiting a student to assist with analyzing these datasets and determining the biological importance of changes in isoform expression. The following skills would help the student be most successful in the project: familiarity with programming in R, basic statistics, RNA-seq analysis, and motif searching. |
||
Requirements One year of programming coursework such as PIC 10C or CS 32 |
||
|
Human microbiome data analysis |
||
Project Description 16S and metagenomic data analysis of the human microbiome |
||
Requirements One year of programming coursework such as PIC 10C or CS 32 |
||
|
Genomic studies of psychiatric disorders |
||
Project Description We use genomic data to study the genetic architecture of psychiatric disorders such as schizophrenia and bipolar disorder. Bioinformatic tools are used to decipher clinical features as well as genetic susceptibility, epigenetic features and regulation of gene expression. Student projects are tailored to the interest and skill set of the student. |
||
Requirements One year of programming coursework such as PIC 10C or CS 32 |
||
|
The evolutionary dynamics of cephalopods |
||
Project Description Living cephalopods (octopuses, squid, and nautiluses) comprise over 700 species, but their evolution is thought to reflect a series of “arms races” with other marine predators, including sharks, marine reptiles, and ancient and modern fishes, that has led to the waxing and waning of species richness through time. I am seeking an undergraduate student with some programming experience to compile occurrence data from fossil databases and conduct comparative evolutionary analyses that will measure changing rates of speciation and extinction and test arms race hypotheses. |
||
Requirements One year of programming coursework such as PIC 10C or CS 32 |
||
|
Building and analyzing the fish tree of life |
||
Project Description We are currently assembling the largest phylogenetic tree of vertebrates based upon published gene sequences and seek one or more students to assist with scripting and analysis. This project involves creating multi-gene alignments from genetic databases, reconciling GenBank taxonomy with published classifications, phylogenetic reconstruction, and macroevolutionary analyses. |
||
Requirements One course in programming such as PIC 10A or CS 31 |
||
|
Crowdsourcing of phenotypic data |
||
Project Description We are developing software tools through Amazon Mechanical Turk to enable crowdsourced collection of shape data on a massive scale. This project will involve development of software protocols for data collection and analysis of geometric morphometric data. |
||
Requirements One course in programming such as PIC 10A or CS 31 |
||
|
Application of integrative omics analysis pipelines for cancer systems biology and immunity studies |
||
Project Description Recent advances in cancer biology have shown massive changes in the transcriptome, proteome and metabolome of tumor specimens in response to drug treatment and acquired resistance. Our lab studies the complexity of mis-wired cancer cells, and the elegance of systems programs enacted by immune cells to accomplish their specialized anti-tumor functions. We aim to understand the governing principles that result in global changes during tumorigenesis and therapy resistance acquisition, with the end goal of identifying new therapeutic vulnerabilities in evolving cancers. To this end, we are conducting multi-omics experimentation for systems biology analysis. This includes NGS approaches for transcriptomics, DNA mutation profiling, DNA copy number alteration (CNA) profiling, DNA methylation, and chromatin accessibility (ATAC-seq), as well as in-lab metabolomics and proteomics analyses of cancer cell lines and tumors using top-of-the-line mass spectrometry equipment. This project will develop custom bioinformatic analysis pipelines to address clinic-linked cancer biology questions. The project includes creation of bioinformatic algorithms and pipelines for analyzing multi-omic data, and collaboration with biologists in the analysis and interpretation of data. |
||
Requirements One course in programming such as PIC 10A or CS 31 |
||
|