bioinformatics minor courses

General Information
There are plenty of opportunities for Bioinformatics research projects at UCLA. This program is designed to help interested students find research projects related to Bioinformatics across campus. Typically, these projects are for credit; in exceptional circumstances they may offer funding. Participation in research projects can both significantly improve your chances of admission into top graduate programs and make you a much more competitive employment candidate. Even better, it gives you something to talk about during an interview. Feel free to contact us even if you are not yet sure whether you want to work on a research project or which field you wish to research. Please remember that every undergraduate and master's student is welcome to participate in research, regardless of background or year in the program. Undergraduates are STRONGLY encouraged to participate in research as early as possible in their careers. Ideally, you should start a research project during your sophomore year, but it is never too late or too early to start! Undergraduate students may receive up to 8 units of credit toward the minor with enrollment in Computer Science 194/199 or Bioinformatics 194/199.

General Procedure
If you are reasonably sure which project you would like to work on, use the contact information listed under the project to contact the person responsible directly and set up a meeting. If you are not sure, but you are even slightly interested in research, feel free to email us or drop in so we can help you choose an appropriate project. Most students take a project for course credit, although funding may be available in some cases. You can contact Eleazar Eskin (eeskin [at] cs [dot] ucla [dot] edu) if you have any questions.

Research Projects
Below is a list of research projects that are accepting undergraduate researchers.

Identifying loci for regulation of RNA splicing in mice

Project Description
We have obtained deep RNA sequencing data from a panel of inbred mouse strains. The genomes of these strains are well characterized, allowing fine mapping of loci involved in regulating gene expression. The project is to identify loci involved in splice site selection.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Des Smith
DSmith [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Fine mapping genes for drug action

Project Description
Although a staggering array of drugs is available for disorders such as cancer and autoimmunity, much remains unknown about their genetic targets. The project repurposes the technology of radiation hybrid (RH) panels to identify genes for drug action. RH cells contain extra copies of randomly selected genes and offer the opportunity to pinpoint functional drug/gene interactions with high precision. The project involves analyzing the data from these RH mapping experiments to identify gene targets for drugs of medical relevance.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Des Smith
DSmith [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Effect of DNA methylation on genomic stability

Project Description
Dysregulated DNA methylation has been associated with many diseases including cancers, but for reasons that are not well understood. Our lab has developed a model system to assess genomic stability based on different levels of DNA methylation. This project will examine a variety of model systems ranging from yeast to embryonic stem cells and cancer cells to precisely quantify how DNA methylation contributes to genome stability.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Guoping Fan
gfan [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Investigating Histone Modifications with Cell Cycle Exit

Project Description
We generated ChIP-Seq datasets that provide information about the genome-wide location of specific histone modifications in cells that are actively cycling and in cells that have exited the proliferative cell cycle. We are interested in recruiting a student to assist with the analysis of these datasets and determining the biological importance of changes in genome-wide histone modification localization. The following skills would help the student to be most successful in the project: familiarity with programming in R, basic statistics, ChIP-Seq analysis, motif searching.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Hilary Coller
hcoller [at] ucla [dot] edu
Possibility of Funding?
No

Investigating Differential Isoform Expression with Cell Cycle Exit

Project Description
We generated next-generation sequencing datasets that provide information on the expression of different isoforms of genes in cells that are cycling and cells that have exited the proliferative cell cycle. We are recruiting a student to assist with the analysis of these datasets and determining the biological importance of changes in isoform expression. The following skills would help the student to be most successful in the project: familiarity with programming in R, basic statistics, RNA-seq analysis, motif searching.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Hilary Coller
hcoller [at] ucla [dot] edu
Possibility of Funding?
No

Database construction for skin metagenomic data

Project Description
The human microbiome plays important roles in human physiology and has become an exciting new research field in recent years. Our group studies the human skin microbiome and oral microbiome and their associations with diseases. This project will develop a database for the metagenomic data and genome data that we have obtained to study these disease associations.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Huiying Li
huiying [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Internet Services for Collaborative Data Sharing (ISCDS)

Project Description
The ISCDS project aims to develop a system for examining research grant proposal data, specifically the tendency of reviewers and study sections (panels) to deliver a positive, negative, or neutral evaluation of grant applications. The project will collect and mine all publicly available information and construct a set of metrics characterizing research grants by applicant and application topic. Specifically, ISCDS will provide a web service enabling: Community Entry of Data, Community Export of Data, Exploratory Data Analysis/Graphics, Model Estimation/Model Fitting, and Outcome Prediction.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Ivo Dinov
dinov [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Graphical Pipeline Workflow Environment for Visual Informatics and Genomics

Project Description
Informatics and genomics research require efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed protocol provenance.
This project will extend the Pipeline Environment (http://pipeline.loni.ucla.edu), a client-server distributed computational infrastructure, to enable the visual graphical construction, execution, monitoring, validation and dissemination of advanced informatics and genomics data analysis protocols. Examples of diverse genomics tools and the interoperability of informatics tools include EMBOSS, mrFAST, GWASS, PLINK, MAQ, SAMtools, Bowtie, CNVer, etc.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Ivo Dinov
dinov [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Statistics Online Computational Resource (SOCR)

Project Description
SOCR R&D efforts revolve around developing HTML5/JavaScript routines, software and interfaces for data science, predictive analytics, statistical computing and visualization. Review the following materials:
SOCR Resource: www.SOCR.ucla.edu
SOCR source archive: https://github.com/SOCR
SOCR projects: http://wiki.stat.ucla.edu/socr/index.php/Available_SOCR_Development_Projects
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Ivo Dinov
dinov [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of whole-genome sequencing data of neuropsychiatric disorders

Project Description
We are interested in identifying the genetic basis of several neuropsychiatric disorders such as bipolar disorder, Tourette Syndrome, and schizophrenia. We are currently analyzing several whole-genome sequencing (WGS) datasets for those disorders and also developing new computational and statistical approaches for large-scale WGS data. This project involves applying existing software tools to the large-scale WGS data using the high performance cluster at UCLA and also improving the existing tools.
Requirements
One core course in Bioinformatics such as CS 121, CS 122, or CS 124
Contact
Jae Hoon Sul
jaehoonsul [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of Large-Scale Epigenomic Data Sets

Project Description
Advances in sequencing technology have enabled an unprecedented ability to experimentally map genome-wide epigenetic features such as histone modifications, DNA methylation, and regions of open chromatin in a large number of cell types and conditions. Potential projects include analysis and/or method development in the context of leveraging large-scale epigenomic datasets to address problems related to stem cell reprogramming, splicing, cancer progression, and neuropsychiatric diseases.
Requirements
One year of programming coursework such as PIC 10C or CS 32, plus one bioinformatics core
Contact
Jason Ernst
jason [dot] ernst [at] ucla [dot] edu
Possibility of Funding?
Yes

Genomic Analysis using Statistical Methods

Project Description
The research in the “Junction of Statistics and Biology” group (http://jsb.ucla.edu) focuses on developing statistical methods to address important biological and biomedical questions from high-throughput genetic and genomic data. This project aims to quantify full-length mRNA transcripts across various tissues and cells of different species. Required experience: completion of some courses in bioinformatics, genomics, and programming; basic knowledge of RNA-seq data analysis; programming skills in Linux, Bash, R, Python, and Perl; and availability to work more than 10-15 hours per week.
Requirements
One course in programming such as PIC 10A or CS 31, plus one bioinformatics core
Contact
Jingyi (Jessica) Li
jli [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Population genetic simulations of natural selection in the human genome

Project Description
We are developing and applying mathematical models of how natural selection affects patterns of genetic variation across regions of the human genome. This particular project will involve performing population genetic simulations using existing software to 1) assess the accuracy of our theoretical predictions, and 2) help interpret signals seen in actual genetic variation data. I am looking for a motivated and talented student to play a prominent role in this project.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Kirk Lohmueller
klohmueller [at] ucla [dot] edu
Possibility of Funding?
Yes, especially if working over the summer or for longer periods of time.

Cell type deconvolution of clinical samples

Project Description
Our lab is interested in developing tools to analyze human biopsies. These tools come in multiple forms. We are developing web interfaces that allow us to interpret gene expression datasets from biopsies using gene sets that measure the cell types and inflammatory states of the samples. We are also developing methods to quantitatively estimate the amount of reference cell types within a sample using gene expression data. Finally, we are also developing assays to use DNA methylation to estimate the cell types found within samples. The rotation projects will involve the further development of these tools and their application to datasets that we will generate in collaboration with clinicians. The projects will span a broad range of clinical applications such as infectious diseases, cancer, and neurodegenerative diseases.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Matteo Pellegrini
matteop [at] mcdb [dot] ucla [dot] edu
Possibility of Funding?
No

The evolutionary dynamics of cephalopods

Project Description
Living cephalopods (octopuses, squid, and nautiluses) comprise over 700 species, and their evolution is thought to reflect a series of “arms races” with other marine predators, including sharks, marine reptiles, and ancient and modern fishes, that has led to the waxing and waning of species richness through time. I am seeking an undergraduate student with some programming experience to compile occurrence data from fossil databases and conduct comparative evolutionary analyses that will measure changing rates of speciation and extinction and test arms race hypotheses.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Michael Alfaro
michaelalfaro [at] ucla [dot] edu
Possibility of Funding?
No

Building and analyzing the fish tree of life

Project Description
We are currently assembling the largest phylogenetic tree of vertebrates based upon published gene sequences and seek one or more students to assist with scripting and analysis. This project involves creating multi-gene alignments from genetic databases, reconciling GenBank taxonomy with published classifications, phylogenetic reconstruction, and macroevolutionary analyses.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Michael Alfaro
michaelalfaro [at] ucla [dot] edu
Possibility of Funding?
No

Crowdsourcing of phenotypic data

Project Description
We are developing software tools through Amazon Mechanical Turk to enable crowdsourced collection of shape data on a massive scale. This project will involve development of software protocols for data collection and analysis of geometric morphometric data.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Michael Alfaro
michaelalfaro [at] ucla [dot] edu
Possibility of Funding?
No

Analysis of Variant-specific Gene and Isoform Expression using RNA-seq Data

Project Description
Current massively parallel sequencing technologies allow us to investigate the human transcriptome for changes conferring susceptibility to obesity and high serum cholesterol and triglyceride levels. We hypothesize that there are DNA sequence variants influencing allele-specific expression of nearby genes, which in turn increases the risk of obesity, dyslipidemia, and cardiovascular disease. This project will develop and employ approaches for elucidating these expression changes using human adipose RNA-seq data.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Paivi Pajukanta
ppajukanta [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Machine Learning and Text-Mining Applications for Indexing Biomedical Research Software

Project Description
Advances in bioinformatics, especially in the field of genomics, have been greatly accelerated by the progress in more powerful computational systems that enable larger and larger amounts of data to be quickly processed. This has resulted in a rapid increase in the number of software tools, databases, and knowledge bases for biology publicly available. Unfortunately, the lack of systems for assisting users to search and find those most suited for their needs is becoming a significant obstacle. Our lab aims to develop a computational platform (https://aztec.bio) that will aggregate, index, and integrate all biomedical research software. We are developing methods for classifying biomedical software and extracting relevant metadata from scientific publications.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Peipei Ping
admin [at] heartbd2k [dot] org
Possibility of Funding?
Yes

Machine Learning in Textual Data of Cardiovascular Disease

Project Description
Currently, over 2.2 million cardiovascular-related scientific articles are available online, but are largely unstructured, making it a formidable challenge to identify datasets and to comprehend information. We aim to address this big data challenge by developing text-mining and machine learning methods to discover new insights from clinical data and scientific literature. One subset of this work concerns developing systems to analyse and parse clinical case reports. As a source of biomedical evidence, case reports offer observations of cardiovascular symptoms, disease, and prognoses not seen in other resources, though extracting relevant features from these documents requires development of new medical language processing methods. Such methods may then be used as part of machine learning pipelines to discover properties common to cardiovascular disease as it appears in clinical environments. Additionally, the resulting models of text features may be applied to tools for parsing other medical text, including electronic health records. The results of this project will therefore enable both researchers and clinicians to more rapidly interpret medical text relevant to cardiovascular disease.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Peipei Ping
admin [at] heartbd2k [dot] org
Possibility of Funding?
Yes

Analysis of mitochondrial protein expression profiles using four large protein datasets

Project Description
The mitochondrion produces ATP and is the energy source of all cells. A mitochondrial proteome may contain up to 2000 distinct proteins of varying abundance, and they form up to 100 networks/pathways. We have obtained four large protein datasets on four types of mitochondria: the human heart mitochondria; the mouse heart mitochondria; the mouse liver mitochondria; and the fly muscle mitochondria. The analyses of these four datasets will show which core proteins are essential to all mitochondrial proteomes, which proteins are unique and contribute to the specificities in function for heart, liver, and muscle, and which proteins are fundamental to the human heart mitochondrial proteome. This information will be essential for our understanding of human cardiac mitochondrial function.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Peipei Ping
admin [at] heartbd2k [dot] org
Possibility of Funding?
Yes

Building Metadata for TOPMed Datasets, including Machine Learning Auto Extraction

Project Description
The TOPMed database contains several high-value human cohorts which are quite diverse in nature, ranging from cardiovascular disease to chronic lung disease cohorts. Bioinformatics investigators may wish to identify molecular signatures that are predictive of clinical outcomes and determine phenotype-genotype associations using machine learning algorithms. Moreover, disease phenotypes among different human cohorts may be interrelated. For example, a high fraction of patients from a chronic lung disease cohort can also suffer from congestive heart failure. The integration of datasets across cohorts will allow the quantification of lung and cardiovascular disease clinical data under different environmental conditions. Therefore, TOPMed investigators require a platform to query and integrate the datasets available in these longitudinal cohorts in order to link progression and outcomes with omics signatures. We want to define metadata standards and standardize datasets among human cohorts within TOPMed. Furthermore, we want to develop tools for automatic metadata extraction from different types of datasets, supported by a consistent ontology.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Peipei Ping
admin [at] heartbd2k [dot] org
Possibility of Funding?
Yes

Protein Interaction Networks and Integration with Novel Graphs

Project Description
Protein-protein interactions (PPI) can reveal protein functions, especially when viewed in the context of protein interaction networks. Within a network, changes to a target of interest can reveal impacts relevant to its interactions: loss of a protein within a cell due to mutation, for example, may impact all potentially interacting proteins, as well as the proteins those proteins interact with, and so on, resulting in network perturbations. Examining the complex nature of changes within protein interaction networks often requires comprehensive experimental data sets such as proteomes. Our lab has produced proteomes of the mammalian heart, including experimental quantification of proteins under conditions mimicking heart disease and measurements of changing amounts of these proteins over time. We are now developing methods to combine these and other proteome data sets with protein interaction networks. Integration of these data will likely reveal the protein interactions most likely to be impacted by heart disease, providing evidence for further studies or for the development of novel therapies.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Peipei Ping
admin [at] heartbd2k [dot] org
Possibility of Funding?
Yes

Genetic and Genomic studies of Neuropsychiatric disorders

Project Description
Neuropsychiatric disorders are common, complex and polygenic traits. Despite high heritability estimates, only a small part of the genetic basis of these disorders has been identified. Our lab focuses mostly on human genetic studies of schizophrenia, bipolar disorder and amyotrophic lateral sclerosis (ALS).
We apply genetic and genomic tools such as RNA sequencing and whole genome sequencing to decipher the genetic architecture of these disorders in available cohorts as well as in in vitro model systems. These large datasets offer the opportunity to apply either existing methods or develop novel methods and strategies.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Roel Ophoff
rophoff [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Developing methods to study the composition of microbial communities

Project Description
Next-generation sequencing allows us to directly study the genetic material of microbial communities recovered from environmental samples. This process is known as metagenomics. We are interested in developing computational methods to study the composition of microbial communities, including bacteria, viruses, and eukaryotic pathogens. Traditional approaches rely on microbial marker genes, which are only portions of the genome. We plan to use coverage of entire microbial genomes to determine presence-absence and relative abundance of specific taxa in a given community. In particular, we will focus on improving the accuracy and speed of the novel method. The method will be applied to data from the Human Microbiome Project (HMP) and multi-tissue data from Genotype-Tissue Expression (GTEx).
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Serghei Mangul
smangul [at] ucla [dot] edu
Possibility of Funding?
Yes

Application of integrative omics analysis pipelines for cancer systems biology and immunity studies

Project Description
Recent advances in cancer biology have shown massive changes in the transcriptome, proteome and metabolome of tumor specimens in response to drug treatment and acquired resistance. Our lab studies the complexity of mis-wired cancer cells, and the elegance of systems programs enacted by immune cells to accomplish their specialized anti-tumor functions. We aim to understand the governing principles that result in global changes during tumorigenesis and therapy resistance acquisition, with the end goal of identifying new therapeutic vulnerabilities in the evolving cancers.
To this end, we are conducting multi-omics experimentation for systems biology analysis. This includes NGS approaches for transcriptomics, DNA mutation profiling, DNA copy number alteration (CNA) profiling, DNA methylation, and chromatin accessibility (ATAC-seq), as well as in-lab metabolomics and proteomics analyses of cancer cell lines and tumors using top-of-the-line mass spectrometry equipment.
This project will develop custom bioinformatic analysis pipelines to address clinic-linked cancer biology questions. The project includes creation of bioinformatic algorithms and pipelines for analyzing multi-omic data, and collaboration with biologists in the analysis and interpretation of data.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Thomas Graeber
tgraeber [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Can evolve to a paid position.

Integration of Epigenetic Datasets to Discover Novel Gene Regulation and Function

Project Description
There are a large number of publicly available datasets for transcription factor binding sites (ChIP-Seq) which are untapped treasure troves of data for discovery biology. Our lab is interested in using these resources to help us identify functions of novel genes. We want to integrate ChIP, RNA and ATAC-Seq datasets using a combination of publicly available data as well as data generated in our lab.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Thomas Vallim
tvallim [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Computational Challenges in Cardiovascular Systems Biology

Project Description
Our group uses epigenomics to study complex phenotypes in cardiovascular disease. Our goals include understanding basic principles of chromatin biology and using epigenomics to operationalize precision health in human populations. Available projects involve analyses of proteomics and next generation nucleotide sequencing data with the goals of annotation, hierarchical comparison and discovery of emergent properties. Visit our lab website: www.vondriskalab.org.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Tom Vondriska
tvondriska [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes; to be determined by level of student commitment; course credit also available

Analysis of Single Nucleotide Variants in RNA-Seq Data

Project Description
Our lab handles a large amount of high-throughput sequencing data of different types from the ENCODE and other large projects. With these data, we routinely analyze gene expression, alternative splicing, expressed polymorphisms, RNA editing and protein-RNA interaction. This project will develop methods for analyzing RNA-Seq data and measuring expression of single nucleotide variants.
Requirements
One year of programming coursework such as PIC 10C or CS 32, plus one bioinformatics core
Contact
Xinshu (Grace) Xiao
gxxiao [at] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of transcriptome complexity using deep RNA sequencing

Project Description
Deep RNA sequencing has emerged as a powerful technology for transcriptome analysis. By generating a massive number of short sequence reads from a given RNA sample, one can use RNA-Seq to define and quantify patterns of gene expression and RNA processing on a genomic scale. This project will develop methods for analysis of transcriptome complexity (gene expression, RNA processing, non-coding RNA) using RNA-Seq data, and apply these methods to study transcriptome regulation and mRNA isoform expression in development and disease.
Requirements
One year of programming coursework such as PIC 10C or CS 32, plus one bioinformatics core
Contact
Yi Xing
yxing [at] ucla [dot] edu
Possibility of Funding?
Yes