bioinformatics minor courses

General Information
There are plenty of opportunities for Bioinformatics research projects at UCLA. This program is designed to help interested students find research projects related to Bioinformatics across campus. Typically, these projects are for credit; in exceptional circumstances they may offer funding. Participation in research projects can both significantly improve your chances of admission to top graduate programs and make you a much more competitive employment candidate. Even better, it gives you something to talk about during an interview. Feel free to contact us even if you are not sure whether you want to work on a research project or which field you wish to research. Please remember that every undergraduate and master's student is welcome to participate in research, regardless of background or year in the program. Undergraduates are STRONGLY encouraged to participate in research as early as possible in their careers. Ideally, you should start a research project during your sophomore year, but it is never too late or too early to start! Undergraduate students may receive up to 8 units of credit toward the minor with enrollment in Computer Science 194/199 or Bioinformatics 194/199.

General Procedure
If you are reasonably sure which project you would like to work on, use the contact information listed under the project to contact the person responsible directly and set up a meeting. If you are not sure, but you are even slightly interested in research, feel free to email us or drop in so we can help you choose an appropriate project. Most students take a project for course credit, although funding may be available in some cases. You can contact Eleazar Eskin (eeskin [at] cs [dot] ucla [dot] edu) if you have any questions.

Research Projects
Below is a list of research projects that are accepting undergraduate researchers.

Developing methods to study the composition of microbial communities

Project Description
Next-generation sequencing allows us to study the genetic material of microbial communities recovered directly from environmental samples. This process is known as metagenomics. We are interested in developing computational methods to study the composition of microbial communities, including bacteria, viruses, and eukaryotic pathogens. Traditional approaches rely on microbial marker genes, which cover only portions of the genome. We plan to use coverage of entire microbial genomes to determine presence-absence and relative abundance of specific taxa in a given community. In particular, we will focus on improving the accuracy and speed of this novel method. The method will be applied to data from the Human Microbiome Project (HMP) and multi-tissue data from Genotype-Tissue Expression (GTEx).
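To give a feel for the coverage idea, here is a minimal sketch, not the lab's actual method. It assumes mean sequencing depth (reads × read length / genome length) as the coverage measure, a hypothetical `min_depth` cutoff for calling a taxon present, and made-up taxon names and counts.

```python
def mean_depth(read_count, read_length, genome_length):
    """Average number of times each base of the genome is covered."""
    return read_count * read_length / genome_length


def relative_abundance(read_counts, genome_lengths, read_length=100, min_depth=1.0):
    """Call taxa present when depth >= min_depth (an assumed cutoff),
    then report each present taxon's share of the total depth."""
    depths = {
        taxon: mean_depth(count, read_length, genome_lengths[taxon])
        for taxon, count in read_counts.items()
    }
    present = {taxon: d for taxon, d in depths.items() if d >= min_depth}
    total = sum(present.values())
    if total == 0:
        return {}
    return {taxon: d / total for taxon, d in present.items()}


# Hypothetical example: three reference genomes of equal length.
counts = {"taxon_A": 1000, "taxon_B": 3000, "taxon_C": 1}
lengths = {"taxon_A": 10_000, "taxon_B": 10_000, "taxon_C": 10_000}
abundance = relative_abundance(counts, lengths)
print(abundance)
```

Dividing by genome length matters because a longer genome attracts more reads at the same abundance; raw read counts alone would overstate taxa with large genomes.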
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Serghei Mangul
smangul [at] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of whole-genome sequencing data of neuropsychiatric disorders

Project Description
We are interested in identifying the genetic basis of several neuropsychiatric disorders such as bipolar disorder, Tourette Syndrome, and schizophrenia. We are currently analyzing several whole-genome sequencing (WGS) datasets for those disorders and also developing new computational and statistical approaches for large-scale WGS data. This project involves applying existing software tools to large-scale WGS data using the high-performance cluster at UCLA and also improving the existing tools.
Requirements
One core course in Bioinformatics such as CS 121, CS 122, or CS 124
Contact
Jae Hoon Sul
jaehoonsul [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Genomic approaches to cell cycle

Project Description
We are studying the difference between cells that are dividing and cells that have stopped dividing. We are interested in working with an undergraduate student with strong computer skills and an interest in bioinformatics to understand the changes in isoform expression upon cell cycle exit. The student will assist us with analysis of next-generation sequencing data, identification of splicing variants and changes in isoform use, and text-based analysis of sequence motifs surrounding differentially present 5’ UTRs, exons, and 3’ UTRs.
Requirements
One year of programming coursework such as PIC 10C or CS 32, plus one Bioinformatics core course
Contact
Hilary Coller
hcoller [at] ucla [dot] edu
Possibility of Funding?
Yes

Genomic Analysis using Statistical Methods

Project Description
The research in the “Junction of Statistics and Biology” group (http://www.stat.ucla.edu/~jingyi.li/) focuses on developing statistical methods to address important biological and biomedical questions from high-throughput genetic and genomic data. This project aims to quantify full-length mRNA transcripts across various tissues and cells of different species.
Required Experience:
· completed coursework in bioinformatics, genomics, and programming
· basic knowledge of RNA-seq data analysis
· programming skills: Linux, Bash, R, Python, Perl
· able to work at least 10-15 hours per week

Requirements
One year of programming coursework such as PIC 10C or CS 32, plus one Bioinformatics core course
Contact
Jingyi (Jessica) Li
jli [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Machine Learning and Text-Mining Applications for Indexing Biomedical Research Software

Project Description
Advances in bioinformatics, especially in the field of genomics, have been greatly accelerated by the progress in more powerful computational systems that enable larger and larger amounts of data to be quickly processed. This has resulted in a rapid increase in the number of software tools, databases, and knowledge bases for biology publicly available. Unfortunately, the lack of systems for assisting users to search and find those most suited for their needs is becoming a significant obstacle. Our lab aims to develop a computational platform (https://aztec.bio) that will aggregate, index, and integrate all biomedical research software. We are developing methods for classifying biomedical software and extracting relevant metadata from scientific publications.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Peipei Ping
admin [at] heartbd2k [dot] org
Possibility of Funding?
Yes

Machine Learning in Textual Data of Cardiovascular Disease

Project Description
Currently, over 2.2 million cardiovascular-related scientific articles are available online, but are largely unstructured, making it a formidable challenge to identify datasets and to comprehend information. We aim to address this big data challenge by developing text-mining and machine learning methods to discover new insights from clinical data and scientific literature.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Peipei Ping
admin [at] heartbd2k [dot] org
Possibility of Funding?
Yes

Regulation of Gene Expression during Development and Reprogramming

Project Description
Female placental mammals such as humans and mice have two X chromosomes (XX), but one of them becomes transcriptionally inactivated early in development. Recent evidence, largely based on single-cell studies of pre-implantation human embryos, suggests the existence of key differences between early human and mouse development. Prior to the onset of X-inactivation, both human X chromosomes are active, but not to the full extent. This newly observed human-specific X-to-autosome dosage compensation is termed X chromosome dampening, but its extent and mechanism remain to be explored. This project will analyze single-cell RNA-sequencing data from human pre-implantation blastocysts (published datasets) and naive human embryonic stem cells (hESCs, from our laboratory) to understand exactly which genes on the X chromosome are affected by dampening, determine whether both X chromosomes are dampened to the same extent, and reveal the role of long noncoding RNAs in these processes.
Requirements
One core course in Bioinformatics such as CS 121, CS 122, or CS 124
Contact
Kathrin Plath
kplath [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of transcriptome complexity using deep RNA sequencing

Project Description
Deep RNA sequencing has emerged as a powerful technology for transcriptome analysis. By generating massive numbers of short sequence reads from a given RNA sample, one can use RNA-Seq to define and quantify patterns of gene expression and RNA processing on a genomic scale. This project will develop methods for analysis of transcriptome complexity (gene expression, RNA processing, non-coding RNA) using RNA-Seq data, and apply these methods to study transcriptome regulation and mRNA isoform expression in development and disease.
Requirements
One year of programming coursework such as PIC 10C or CS 32, plus one Bioinformatics core course
Contact
Yi Xing
yxing [at] ucla [dot] edu
Possibility of Funding?
Yes

Identifying loci for regulation of RNA splicing in mice

Project Description
We have obtained deep RNA sequencing data from a panel of inbred mouse strains. The genomes of these strains are well characterized, allowing fine mapping of loci involved in regulating gene expression. The project is to identify loci involved in splice site selection.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Des Smith
DSmith [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Fine mapping genes for drug action

Project Description
Although a staggering array of drugs is available for disorders such as cancer and autoimmunity, much remains unknown about their genetic targets. The project repurposes the technology of radiation hybrid (RH) panels to identify genes for drug action. RH cells contain extra copies of randomly selected genes and offer the opportunity to pinpoint functional drug/gene interactions with high precision. The project involves analyzing the data from these RH mapping experiments to identify gene targets for drugs of medical relevance.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Des Smith
DSmith [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Development of a metabolomics and proteomics pipeline for high-throughput analysis of cancer biology and immunity samples

Project Description
Recent advances in cancer biology have shown massive changes in the transcriptome, proteome, and metabolome of tumor specimens in response to drug treatment. Our lab aims to understand the governing principles that cause these global changes. To this end, we are conducting metabolomics and proteomics analyses of cancer cell lines and xenograft tumors using top-of-the-line mass spectrometry equipment.

This project will develop the bioinformatic tools necessary to establish a high-throughput pipeline for the mass spectrometry-based analysis of biological samples. The project includes creation of bioinformatic algorithms and pipelines for analyzing multi-omic data, and collaboration with biologists in the analysis and interpretation of the data.

Requirements
One course in programming such as PIC 10A or CS 31
Contact
Thomas Graeber
tgraeber [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

The evolutionary dynamics of cephalopods

Project Description
Living cephalopods (octopuses, squid, and nautiluses) comprise over 700 species, but their evolution is thought to reflect a series of “arms races” with other marine predators, including sharks, marine reptiles, and ancient and modern fishes, that has led to the waxing and waning of species richness through time. I am seeking an undergraduate student with some programming experience to compile occurrence data from fossil databases and conduct comparative evolutionary analyses that will measure changing rates of speciation and extinction and test arms race hypotheses.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Michael Alfaro
michaelalfaro [at] ucla [dot] edu
Possibility of Funding?
No

Building and analyzing the fish tree of life

Project Description
We are currently assembling the largest phylogenetic tree of vertebrates based upon published gene sequences and seek one or more students to assist with scripting and analysis. This project involves creating multi-gene alignments from genetic databases, reconciling GenBank taxonomy with published classifications, phylogenetic reconstruction, and macroevolutionary analyses.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Michael Alfaro
michaelalfaro [at] ucla [dot] edu
Possibility of Funding?
No

Crowdsourcing of phenotypic data

Project Description
We are developing software tools through Amazon Mechanical Turk to enable crowdsourced collection of shape data on a massive scale. This project will involve development of software protocols for data collection and analysis of geometric morphometric data.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Michael Alfaro
michaelalfaro [at] ucla [dot] edu
Possibility of Funding?
No

Database construction for skin metagenomic data

Project Description
The human microbiome plays important roles in human physiology and has become a new exciting research field in recent years. Our group studies the human skin microbiome and oral microbiome and their associations with diseases. This project will develop a database for the metagenomic data and genome data that we obtained to study the disease associations.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Huiying Li
huiying [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Internet Services for Collaborative Data Sharing (ISCDS)

Project Description
The ISCDS project aims to develop a system for examining research grant proposal data, specifically the tendency of reviewers and study sections (panels) to deliver a constructive (positive, negative, or neutral) evaluation of grant applications. The project will collect and mine all publicly available information and construct a set of metrics characterizing research grants based on applicant and application topic. Specifically, ISCDS will provide a web service enabling: Community Entry of Data, Community Export of Data, Exploratory Data Analysis/Graphics, Model Estimation/Model Fitting, and Outcome Prediction.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Ivo Dinov
dinov [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Graphical Pipeline Workflow Environment for Visual Informatics and Genomics

Project Description
Informatics and genomics research require efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed protocol provenance.

This project will extend the Pipeline Environment (http://pipeline.loni.ucla.edu), a client-server distributed computational infrastructure, to enable the visual graphical construction, execution, monitoring, validation and dissemination of advanced informatics and genomics data analysis protocols. Examples of diverse genomics tools and the interoperability of informatics tools include EMBOSS, mrFAST, GWASS, PLINK, MAQ, SAMtools, Bowtie, CNVer, etc.

Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Ivo Dinov
dinov [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Statistics Online Computational Resource (SOCR)

Project Description
SOCR R&D efforts revolve around developing HTML5/JavaScript routines, software, and interfaces for data science, predictive analytics, statistical computing, and visualization. Review the following materials:
SOCR Resource: www.SOCR.ucla.edu
SOCR source archive: https://github.com/SOCR
SOCR projects: http://wiki.stat.ucla.edu/socr/index.php/Available_SOCR_Development_Projects

Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Ivo Dinov
dinov [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of Large-Scale Epigenomic Data Sets

Project Description
Advances in sequencing technology have enabled an unprecedented ability to experimentally map genome-wide epigenetic features such as histone modifications, DNA methylation, and regions of open chromatin in a large number of cell types and conditions. Potential projects include analysis and/or method development that leverages large-scale epigenomic datasets to address problems related to stem cell reprogramming, splicing, cancer progression, and neuropsychiatric diseases.
Requirements
One year of programming coursework such as PIC 10C or CS 32, plus one Bioinformatics core course
Contact
Jason Ernst
jason [dot] ernst [at] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of Single Nucleotide Variants in RNA-Seq Data

Project Description
Our lab handles a large amount of high-throughput sequencing data of different types from the ENCODE and other large projects. With these data, we routinely analyze gene expression, alternative splicing, expressed polymorphisms, RNA editing and protein-RNA interaction. This project will develop methods for analyzing RNA-Seq data and measuring expression of single nucleotide variants.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Xinshu (Grace) Xiao
gxxiao [at] ucla [dot] edu
Possibility of Funding?
Yes

Integrative OMICs methods in neurodegenerative dementia

Project Description
The long-term goal of our group is to advance our understanding of the genetic architecture of neuropsychiatric disorders. We have collected genetic, genomic (gene expression, methylation), imaging, and phenotypic data in a large series of patients with neurodegenerative conditions, including Alzheimer’s Disease (AD) and Frontotemporal Dementia (FTD). In collaboration with Dr. Horvath, a biostatistician at UCLA, and Dr. Paul Thompson (Department of Neurology), we are developing network-based methods to integrate multiple layers of information, including genetic, genomic, epigenetic, and neuroimaging data.
Requirements
One core course in Bioinformatics such as CS 121, CS 122, or CS 124
Contact
Giovanni Coppola
gcoppola [at] ucla [dot] edu
Possibility of Funding?
Yes

A multidimensional, web-based database for OMICs data

Project Description
We have created and developed three web-based databases to host high-throughput data, which are becoming major tools for collaboration and data mining within our lab and among our collaborators. We are now working to combine them into a single suite of tools for web-based mining of OMICs data.
Requirements
One core course in Bioinformatics such as CS 121, CS 122, or CS 124
Contact
Giovanni Coppola
gcoppola [at] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of Variant-specific Gene and Isoform Expression using RNA-seq Data

Project Description
Current massively parallel sequencing technologies allow us to investigate the human transcriptome for changes conferring susceptibility to obesity and high serum cholesterol and triglyceride levels. We hypothesize that there are DNA sequence variants influencing allele-specific expression of nearby genes that in turn increase the risk of obesity, dyslipidemia, and cardiovascular disease. This project will develop and employ approaches for elucidating these expression changes using human adipose RNA-seq data.
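As a rough illustration of what testing for allele-specific expression can look like, the sketch below applies an exact two-sided binomial test to reads covering a heterozygous site, under the null hypothesis that both alleles are expressed equally. The choice of a binomial test and the read counts are assumptions made for illustration; the project description does not name a specific method.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for H0: each read comes from
    the reference or alternate allele with probability 0.5."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    k = min(ref_reads, alt_reads)
    # Probability of seeing k or fewer reads from the minor allele.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail and cap at 1.
    return min(1.0, 2 * lower_tail)


# Hypothetical counts at two heterozygous sites.
balanced = allelic_imbalance_p(5, 5)
skewed = allelic_imbalance_p(10, 0)
print(balanced, skewed)
```

In practice, real analyses also have to account for reference mapping bias and overdispersion, which this sketch ignores.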
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Paivi Pajukanta
ppajukanta [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Effect of DNA methylation on genomic stability

Project Description
Dysregulated DNA methylation has been associated with many diseases, including cancers, but for reasons that are not well understood. Our lab has developed a model system to assess genomic stability based on different levels of DNA methylation. This project will examine a variety of model systems, ranging from yeast to embryonic stem cells and cancer cells, to precisely quantify how DNA methylation contributes to genome stability.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Guoping Fan
gfan [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Cell type deconvolution of clinical samples

Project Description
Our lab is interested in developing tools to analyze human biopsies. These tools come in multiple forms. We are developing web interfaces that allow us to interpret gene expression datasets from biopsies using gene sets that measure the cell types and inflammatory states of the samples. We are also developing methods to quantitatively estimate the amount of reference cell types within a sample using gene expression data. Finally, we are also developing assays to use DNA methylation to estimate the cell types found within samples. The rotation projects will involve the further development of these tools and their application to datasets that we will generate in collaboration with clinicians. The projects will span a broad range of clinical applications such as infectious diseases, cancer, and neurodegenerative diseases.
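One common way to frame the estimation of cell-type amounts from gene expression is as a non-negative least-squares problem: the bulk profile is modeled as a weighted sum of reference cell-type profiles. The following is a minimal sketch under that assumption, solved with projected gradient descent in plain Python. It is not the lab's tool, and the cell-type names and expression values are invented.

```python
def deconvolve(mixture, signatures, iterations=2000):
    """Estimate cell-type fractions so that the weighted sum of the
    reference profiles approximates the mixture, with weights >= 0."""
    names = list(signatures)
    profiles = [signatures[name] for name in names]
    # 1 / (sum of squared entries) is a safe step size for this objective.
    step = 1.0 / sum(x * x for profile in profiles for x in profile)
    weights = [0.0] * len(profiles)
    for _ in range(iterations):
        predicted = [
            sum(w * profile[g] for w, profile in zip(weights, profiles))
            for g in range(len(mixture))
        ]
        residual = [p - m for p, m in zip(predicted, mixture)]
        for j, profile in enumerate(profiles):
            gradient = sum(r * x for r, x in zip(residual, profile))
            # Gradient step, then clip to keep the weight non-negative.
            weights[j] = max(0.0, weights[j] - step * gradient)
    total = sum(weights)
    return {n: (w / total if total else 0.0) for n, w in zip(names, weights)}


# Hypothetical reference profiles over four genes.
signatures = {
    "cell_type_A": [10.0, 1.0, 0.0, 5.0],
    "cell_type_B": [0.0, 8.0, 6.0, 5.0],
}
# A synthetic biopsy made of 30% A and 70% B.
mixture = [0.3 * a + 0.7 * b
           for a, b in zip(signatures["cell_type_A"], signatures["cell_type_B"])]
fractions = deconvolve(mixture, signatures)
print(fractions)
```

The non-negativity constraint is what makes the result interpretable as fractions; an unconstrained fit can return negative weights for noisy data.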
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Matteo Pellegrini
matteop [at] mcdb [dot] ucla [dot] edu
Possibility of Funding?
No

Genetic and Genomic studies of Neuropsychiatric disorders

Project Description
Neuropsychiatric disorders are common, complex and polygenic traits. Despite high heritability estimates, only a small part of the genetic basis of these disorders has been identified. Our lab focuses mostly on human genetic studies of schizophrenia, bipolar disorder and amyotrophic lateral sclerosis (ALS).
We apply genetic and genomic tools such as RNA sequencing and whole genome sequencing to decipher the genetic architecture of these disorders in available cohorts as well as in in vitro model systems. These large datasets offer the opportunity to apply either existing methods or develop novel methods and strategies.
Requirements
No programming experience
Contact
Roel Ophoff
rophoff [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Computational Challenges in Cardiovascular Systems Biology

Project Description
Our group uses epigenomics to study complex phenotypes in cardiovascular disease. Our goals include understanding basic principles of chromatin biology and using epigenomics to operationalize precision health in human populations. Available projects involve analyses of proteomics and next generation nucleotide sequencing data with the goals of annotation, hierarchical comparison and discovery of emergent properties.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Tom Vondriska
tvondriska [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Population genetic simulations of natural selection in the human genome

Project Description
We are developing and applying mathematical models of how natural selection affects patterns of genetic variation across regions of the human genome. This particular project will involve performing population genetic simulations using existing software to 1) assess the accuracy of our theoretical predictions, and 2) help interpret signals seen in actual genetic variation data. I am looking for a motivated and talented student to play a prominent role in this project.
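For readers new to population genetic simulation, here is a minimal sketch of a Wright-Fisher model with selection. It is a textbook toy, not the existing software the project uses. It assumes a simple genic selection model, and the function name and parameter values are illustrative.

```python
import random


def wright_fisher(pop_size, start_freq, selection, generations, seed=0):
    """Simulate the frequency of one allele in a diploid population
    of pop_size individuals (2 * pop_size gene copies)."""
    rng = random.Random(seed)
    copies = 2 * pop_size
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection shifts the expected frequency (assumed genic model).
        expected = freq * (1 + selection) / (1 + selection * freq)
        # Genetic drift: binomial sampling of the next generation.
        count = sum(1 for _ in range(copies) if rng.random() < expected)
        freq = count / copies
        trajectory.append(freq)
    return trajectory


# Hypothetical run: a mildly beneficial allele starting at 10%.
trajectory = wright_fisher(pop_size=100, start_freq=0.1,
                           selection=0.05, generations=50, seed=42)
print(trajectory[-1])
```

Repeating such runs with different seeds gives a distribution of outcomes that theoretical predictions can be checked against.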
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Kirk Lohmueller
klohmueller [at] ucla [dot] edu
Possibility of Funding?
Yes

Defining human cardiac protein half-life in healthy individuals and heart failure patients

Project Description
Cells synthesize and degrade proteins; the half-life of proteins is key to the biological function of all cells. Understanding protein turnover and protein dynamics is essential to developing novel therapies for diseases, including heart failure. Using experimental data, this project will develop computational models for predicting the half-life of proteins in healthy individuals and how these protein half-lives may be impacted in patients suffering from heart failure.
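As a simple illustration of the half-life concept, the sketch below assumes first-order decay, where the remaining fraction of a protein follows f(t) = exp(-k t) and the half-life is ln(2) / k. The decay model, the log-linear fit, and the time-course values are assumptions for illustration; the project's actual models may differ.

```python
import math


def fit_decay_rate(times, fractions):
    """Least-squares fit of ln(fraction) = -k * t through the origin."""
    numerator = sum(t * math.log(f) for t, f in zip(times, fractions))
    denominator = sum(t * t for t in times)
    return -numerator / denominator


def half_life(rate):
    """Time for the remaining fraction to drop to one half."""
    return math.log(2) / rate


# Hypothetical time course (days) generated with k = 0.1 per day.
times = [1.0, 2.0, 4.0]
fractions = [math.exp(-0.1 * t) for t in times]
rate = fit_decay_rate(times, fractions)
print(rate, half_life(rate))
```

Comparing fitted rates between healthy and disease samples is one straightforward way to express a change in turnover.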
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Peipei Ping
pping [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of mitochondrial protein expression profiles using four large protein data sets

Project Description
Mitochondria produce ATP and are the energy source of all cells. A mitochondrial proteome may contain up to 2000 distinct proteins of varying abundance, forming up to 100 networks/pathways. We have obtained four large protein datasets on four types of mitochondria: human heart mitochondria, mouse heart mitochondria, mouse liver mitochondria, and fly muscle mitochondria. The analyses of these four datasets will show which core proteins are essential to all mitochondrial proteomes, which proteins are unique and contribute to the functional specificities of heart, liver, and muscle, and which proteins are fundamental to the human heart mitochondrial proteome. This information will be essential for our understanding of human cardiac mitochondrial function.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Peipei Ping
pping [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes