bioinformatics minor courses

General Information
There are plenty of opportunities for Bioinformatics research projects at UCLA. This program is designed to help interested students find research projects related to Bioinformatics across campus. Typically, these projects are for credit; in exceptional circumstances they may offer funding. Participating in research can significantly improve your chances of admission to top graduate programs and make you a much more competitive candidate for employment. Even better, it gives you something to talk about during an interview. Feel free to contact us even if you are not sure whether you want to work on a research project, or which field you would like to research. Please remember that every undergraduate and master's student is welcome to participate in research, regardless of background or year in the program. Undergraduates are STRONGLY encouraged to participate in research as early as possible in their careers. Ideally, you should start a research project during your sophomore year, but it is never too late or too early to start! Undergraduate students may receive up to 8 units of credit toward the minor by enrolling in Computer Science 194/199 or Bioinformatics 194/199.

General Procedure
If you are reasonably sure which project you would like to work on, use the contact information listed under the project to contact the person responsible for it directly and set up a meeting. If you are not sure, but you are even slightly interested in research, feel free to email us or drop in, and we will help you choose an appropriate project. Most students take a project for course credit, although funding may be available in some cases. If you have any questions, contact Eleazar Eskin (eeskin [at] cs [dot] ucla [dot] edu).

Research Projects
Below is a list of research projects that are accepting undergraduate researchers.

Integration of Epigenetic Datasets to Discover Novel Gene Regulation and Function

Project Description
There are a large number of publicly available datasets of transcription factor binding sites (ChIP-Seq), which are untapped treasure troves of data for discovery biology. Our lab is interested in using these resources to help identify the functions of novel genes. We want to integrate ChIP-Seq, RNA-Seq, and ATAC-Seq datasets, using a combination of publicly available data and data generated in our lab.

Requirements
One year of programming coursework such as PIC 10C or CS 32

Contact
Thomas Vallim
tvallim [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Diet networks for expression-based classification

Project Description
Neural networks are computationally tractable when data is thin (more samples than features) or when it contains highly correlated local structure (such as images or sound). By contrast, genetic data tends to be fat (millions of features, thousands of samples) and to lack exploitable local structure. As a result, deep learning approaches are not typically applied to large genomic datasets. However, a new network architecture, termed a diet network, was recently proposed to overcome this challenge and was demonstrated on DNA-sequencing data.

Diet networks work by exploiting good dual features: information about the features themselves (such as class histograms) that reveals structure in the feature space relevant to the classification task. For the task of population classification from genotype data, the within-population averages were used as dual features and outperformed current (linear-model-based) approaches.
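The dual-feature idea can be illustrated with a small Python sketch. It assumes genotypes are coded as per-SNP allele counts and computes, for each feature, its mean within each population (the within-population averages described above). A hypothetical auxiliary layer, `predict_first_layer`, then maps each feature's dual-feature vector to that feature's row of first-layer weights. The function names and the single untrained linear auxiliary layer are illustrative assumptions, not the published diet network implementation, which trains its auxiliary network jointly with the classifier.

```python
from collections import defaultdict


def class_mean_embedding(genotypes, labels):
    """Compute per-feature, per-class means to use as dual features.

    genotypes: list of samples, each a list of per-SNP values (e.g. 0/1/2 allele counts).
    labels: population label for each sample.
    Returns (classes, embedding), where embedding[j][k] is the mean value of
    feature j among samples of class classes[k].
    """
    classes = sorted(set(labels))
    n_features = len(genotypes[0])
    sums = {c: [0.0] * n_features for c in classes}
    counts = defaultdict(int)
    for row, label in zip(genotypes, labels):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    embedding = [[sums[c][j] / counts[c] for c in classes] for j in range(n_features)]
    return classes, embedding


def predict_first_layer(embedding, aux_weights):
    """Hypothetical auxiliary layer: map each feature's dual-feature vector
    (length n_classes) to its row of first-layer weights (length n_hidden).

    aux_weights is n_classes x n_hidden and is shared across all features,
    so the number of free parameters does not grow with the number of SNPs.
    """
    n_hidden = len(aux_weights[0])
    return [
        [sum(e_k * aux_weights[k][h] for k, e_k in enumerate(e_row)) for h in range(n_hidden)]
        for e_row in embedding
    ]


if __name__ == "__main__":
    genotypes = [[0, 2, 1], [0, 2, 2], [2, 0, 1], [1, 0, 1]]
    labels = ["A", "A", "B", "B"]
    classes, emb = class_mean_embedding(genotypes, labels)
    print(classes, emb)  # ['A', 'B'] [[0.0, 1.5], [2.0, 0.0], [1.5, 1.0]]
```

Because `aux_weights` is shared across features, the size of the first layer's free parameters depends on the number of classes and hidden units rather than on millions of SNPs, which is what makes this kind of architecture tractable on fat genomic data.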

While class means provide good dual features for classification from genotype data, comparable features are not known for classification or regression tasks on RNA-seq data. However, several candidate features, such as observed co-expression and shared transcriptional regulation, could prove very effective for predicting likely disease-associated genes.

This methods-development project proposes to apply diet networks – incorporating biological features such as co-expression and predicted transcription factor binding – to gene expression data for the purpose of predicting disease status, cell-type specificity, or protein-protein interactions.

Requirements: Familiarity with the Linux shell and Python programming (at the level of CS121/122). Pre-existing implementations of diet networks exist in both Theano+Lasagne and TensorFlow; however, experience with these frameworks is not required.

Requirements
One course in programming such as PIC 10A or CS 31

Contact
Chris Hartl
chartl [at] ucla [dot] edu
Possibility of Funding?
No

Adversarial networks for confounder and nuisance variables

Project Description
Datasets produced by biochemical assays face unique challenges compared with textual, image, and sensor data. In particular, assay noise is partly measurable: there are technical features that are correlated with the error term. In RNA-seq data, the fraction of duplicated molecules, the integrity of the input sample, and the ID of the sequencing machine each explain a significant fraction of the observed expression levels.

When these technical covariates are unmatched between classes (the typical case), they act as confounding variables, producing classifiers that fail to generalize beyond the initial sample. At the same time, even when technical features are matched, their effects can mask true signal and eliminate good features from consideration; in this role they act as nuisance variables.

Because technical factors can easily be incorporated as control variables, linear models are the most widely used method for both statistical inference and prediction. More sophisticated machine learning methods do not, in general, account for confounding or nuisance variables, which renders them inapplicable to the biological setting.

This methods-development project proposes to use adversarial neural networks to control for both confounding and nuisance variables.

Requirements: Familiarity with the Linux shell and Python programming (at the level of CS121/122), and the ability to quickly learn library APIs. Prior experience with a deep learning library (Lasagne, Keras, TensorFlow, …) will be extremely helpful but is not required.

Requirements
One course in programming such as PIC 10A or CS 31

Contact
Chris Hartl
chartl [at] ucla [dot] edu
Possibility of Funding?
No

Identifying systematic biochemical signatures of batch effects

Project Description
The most pernicious source of error in RNA-sequencing is inter-laboratory error (typically referred to as a batch effect). Slight differences in reagent concentration or quality, pH levels, temperature, input RNA concentrations, or trace contamination can result in large differences in measured RNA levels across laboratories – even when the very same RNA is analyzed.

These differences, which can account for as much as 80% of the total observed variance in gene expression, present a challenge when combining data from multiple studies analyzed at different sites. Current methods for addressing this source of bias make clearly invalid assumptions, in particular that these differences are independent across genes.

However, while many uncontrollable factors vary between labs, their effect on measured RNA expression levels is mediated by a single physical property: the shape of the mRNA itself. This intuition suggests that, by looking across many sequencing experiments from many sites, it should be possible to identify gene-specific factors that reproducibly explain inter-laboratory differences. For example, sequence GC content has long been established as a major driver of inter-site differences in gene expression.

This analysis project will take a gene-centric view of batch effects and use gene-specific technical factors (such as coverage bias, sequence content, mappability, and observed error rates) to identify features that are highly correlated with gene expression differences across studies. These features, if found, will become a key component of properly correcting for noise and/or bias introduced during library preparation, and will enable cleaner and more accurate analyses of combined expression data.
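As a minimal illustration of this gene-centric idea, the Python sketch below computes one gene-specific technical factor (sequence GC content) and its Pearson correlation with the per-gene log2 expression difference between two sites. The toy sequences and counts, the pseudocount of 1, and the function names are assumptions made for illustration; a real analysis would span many studies and many technical factors, and would use models such as the mixed linear models mentioned below.

```python
import math


def gc_content(seq):
    """Fraction of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def log2_site_difference(expr_a, expr_b, pseudocount=1.0):
    """Per-gene log2 ratio of expression measured at site A versus site B."""
    return [math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(expr_a, expr_b)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy data: four genes whose site-A excess grows with GC content.
    sequences = ["ATATATAT", "ATGCATAT", "GCGCATAT", "GCGCGCAT"]
    site_a = [10, 20, 40, 80]
    site_b = [10, 10, 10, 10]
    gc = [gc_content(s) for s in sequences]
    diff = log2_site_difference(site_a, site_b)
    print(gc, round(pearson(gc, diff), 3))
```

The pseudocount keeps the log ratio defined for genes with zero counts at one site; its value is an arbitrary choice here.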

Requirements: Familiarity with the Linux shell, R programming, and Python programming. Research assistants will build deep familiarity with RNA-seq data and with algorithms for alignment, transcript-level quantification, splicing estimation, assembly, and QC, as well as with mixed linear models, factor analysis, and dimensionality-reduction techniques (such as kernel PCA or t-SNE).

Requirements
One course in programming such as PIC 10A or CS 31

Contact
Chris Hartl
chartl [at] ucla [dot] edu
Possibility of Funding?
No

Cell type deconvolution in brain tissue

Project Description
A major open challenge in neuropsychiatry is understanding the extent to which disease is driven by alterations in cell types, and the extent to which disease is driven by alterations in cell functions. Neurodegenerative diseases, such as Alzheimer’s and Parkinson’s diseases, are characterized by progressive loss of certain cell types – and therefore disease severity can be established by measuring cell loss. For other neuropsychiatric diseases, such as bipolar disorder or autism, roles for both disruption of cell populations and dysfunction of specific cell types have been proposed.

Heterogeneity of brain tissue makes it difficult to provide direct evidence for either of these hypotheses. It is hard to understand whether a difference in gene expression arises from changes in gene expression within a cell type, changes across multiple cell types, changes in the cellular makeup of the tissue, or all of these effects in combination.

Single-cell sequencing measures RNA from individual cells, enabling researchers to probe expression differences at a cellular level. However, this approach comes with a loss of resolution: many transiently expressed mRNA species are not observed in single cells, yet are routinely measured in whole tissue. Furthermore, existing case/control datasets consist almost entirely of whole-tissue sequencing and cannot be used directly in single-cell comparisons.

Cell type deconvolution is a bioinformatics approach that uses existing single-cell sequencing data to estimate cell type proportions from whole-tissue RNA expression. Several methods have been published, but they have not been systematically tested in brain tissue, and they are necessarily subject to inter-laboratory biases.
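Many deconvolution methods share a common core: bulk expression is modeled as a non-negative mixture of reference cell-type expression profiles derived from single-cell data. The Python sketch below solves that generic formulation with projected gradient descent on non-negative least squares and then rescales the weights into proportions. The toy signature matrix, the function name, and the iteration count are assumptions for illustration; published methods differ in their regression models, marker-gene selection, and normalization.

```python
def deconvolve(signature, bulk, n_iter=5000):
    """Estimate cell type proportions by non-negative least squares.

    signature: genes x cell types matrix of reference expression (list of rows).
    bulk: whole-tissue expression for the same genes.
    Minimizes 0.5 * ||signature @ p - bulk||^2 subject to p >= 0 using projected
    gradient descent, then rescales p to sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # 1 / (squared Frobenius norm) is a safe step size: it never exceeds
    # 1 / (largest eigenvalue of signature^T signature).
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        grad = [sum(signature[g][k] * residual[g] for g in range(n_genes)) for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    return [x / total for x in p] if total > 0 else p


if __name__ == "__main__":
    # Toy reference: gene 1 marks cell type A, gene 2 marks type B, gene 3 is shared.
    signature = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
    bulk = [3.0, 7.0, 5.0]  # a 30% A / 70% B mixture of the reference profiles
    print([round(x, 3) for x in deconvolve(signature, bulk)])  # [0.3, 0.7]
```

A cell type that is absent from the reference cannot appear in the output, so its contribution is absorbed into the residual or misattributed to other types; this is one reason robustness to "missing" cell types is worth evaluating.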

This analysis project aims to evaluate cell-type deconvolution methods on human and mouse RNA-seq data, with particular focus on how well they recapitulate known changes in cell type proportion in neurodegenerative disease, how sensitive they are to inter-laboratory effects, and how robust they are to “missing” (i.e. absent from single-cell sequencing) cell types.

Requirements: Familiarity with the Linux shell, R programming, and Python programming. Research assistants will build familiarity with single-cell and whole-tissue sequencing data, the application of deconvolution methods, RNA-seq alignment and quantification, machine learning (linear models, SVMs, dynamic programming), and association analysis.

Requirements
One course in programming such as PIC 10A or CS 31

Contact
Chris Hartl
chartl [at] ucla [dot] edu
Possibility of Funding?
No

Regulation of brain gene expression across multiple species and regions

Project Description
Humans, great apes, and mice share similar numbers of genes and similar gross brain structure, yet their apparent behaviors can differ drastically. Understanding how evolutionary changes in gene regulation relate to differences in neurological phenotypes may reveal novel neurological roles for many genes, shed light on the development of brain structures and behaviors, and provide guidance for the relevance of model organisms to particular aspects of human disease.

Previous studies have focused on differences in gene expression within the cortex, a brain region showing very pronounced differences between humans and rodents. It is likely that evolutionary alterations in cortical expression patterns are accompanied by changes in other brain regions. Gene expression data are now available from multiple brain regions in human, great ape, and mouse samples, enabling direct molecular comparison of multiple brain regions across species.

This analysis project aims to characterize genes that are unique to each species, or uniquely regulated in each species, in terms of their role in brain patterning. Using gene networks, we will place these genes in the context of whole-brain, tissue-specific, and cell-type expression to determine their potential roles in brain function.

Requirements: Familiarity with the Linux shell, R programming, and (optionally) Python programming. A basic understanding of RNA transcription, orthologous genes, and synteny will be helpful for this project. Research assistants will build experience with RNA-seq data processing, coexpression network construction, comparative analysis in R, evolutionary conservation analysis, and use of numerous bioinformatics libraries and databases.

Requirements
One course in programming such as PIC 10A or CS 31

Contact
Chris Hartl
chartl [at] ucla [dot] edu
Possibility of Funding?
No

Evolutionary Forces in the Bengalese Finch Song: Parallels and Implications for the Study of Human Language Evolution

Project Description
Songbirds have provided a highly successful animal model for the study of certain aspects of human speech, including its production, perception, and evolution. Among existing songbird models, the Bengalese Finch (BF) (Lonchura striata domestica) has become increasingly popular because of its remarkably flexible vocal behavior, which evolved during the BF’s domestication from the white-backed munia (WBM) (Lonchura striata). Different hypotheses have been proposed to explain how the BF evolved a more flexible vocal behavior than its wild ancestor. One hypothesis argues for a major role of positive selection (PS), i.e., female choice for more complex songs. An alternative hypothesis argues for a major role of relaxation of purifying selection (RS), i.e., the ongoing relaxation of sources of purifying selection that are common in the wild but absent in the domesticated setting, such as pressure to avoid confusion with other cohabiting finch species.

A more flexible vocal behavior is also assumed to distinguish current human vocal behavior from its ancestral state, and the roles of PS and RS in human language evolution have also been intensely debated. Given the parallels between the changes in vocal behavior in BF relative to WBM and the proposed changes in current human vocal behavior relative to its ancestral state, along with the striking analogies between birdsong and human speech, we consider the WBM/BF songbird system a suitable model for providing insight into the evolution of human speech.

We have begun an investigation of the evolutionary forces underlying the changes in vocal behavior in BF relative to WBM. This study involves whole-genome sequencing of individuals from the two bird strains, followed by scans for signatures of PS or RS, which will allow us to identify genes that have undergone PS or RS in BF relative to WBM. The final steps of this study involve searching for evolutionary convergence between analogous genes or biological pathways in BF relative to WBM and in humans relative to other primates.

This work will empirically identify evolutionary and genetic processes that correlate with increased social learning and flexibility of vocal behavior in an animal model that parallels key aspects of human language evolution. Furthermore, because BF and WBM are strains of the same species, comparisons between them involve fewer phylogenetic confounds than comparisons between different songbird species. Birdsong is a paradigmatic model for the study of speech disorders in translational medicine, and we expect this work to provide insight into such disorders.

Expected time commitment – A minimum of 8 h/week or 30 h/month, which can be spread across weekdays and includes attending lab meetings. Availability to work later hours (5-9 pm), and possibly on weekends, is also expected.

Lab responsibilities – Work on the processing and analysis of next-generation whole-genome sequencing data from two songbird strains. Although close mentoring will be provided by Professor White, Madza Farias-Virgens, and collaborators at the White Lab, the student is expected to carry out these responsibilities as autonomously as possible and to take the initiative in developing new skills. Because the student will be accessing unique, unpublished data, any data manipulation or sharing must be explicitly communicated to both direct mentors, Professor White and Madza Farias-Virgens. Other responsibilities include participating in lab meetings and other meetings (TBD) and following the established lab practices at the White Lab.

Qualifications – Experience with running programs from the command line, shell scripting, using a cluster, writing programs/scripts in C/C++, Perl, or Python to parse large files, and using R. Some background in genetic variation and an interest in population genetics are a major plus. Particularly well-suited majors are Computational and Systems Biology (B.S.), Ecology, Behavior, and Evolution (B.S.), Microbiology, Immunology, and Molecular Genetics (B.S.), Molecular, Cell, and Developmental Biology (B.S.), Neuroscience (B.S.), Physiological Science (B.S.), Psychobiology (B.S.), Psychology (B.A.), and Statistics (B.S.), but suitable majors could also include Biochemistry (B.S.), Biology (B.S.), Biophysics (B.S.), Chemistry (B.S.), Chemistry/Materials Science (B.S.), Cognitive Science (B.S.), Human Biology and Society (B.A.), Linguistics (B.A.), Linguistics and Computer Science (B.A.), Linguistics and Psychology (B.A.), Marine Biology (B.S.), Mathematics (B.S.), Mathematics, Applied (B.S.), Mathematics/Applied Science (B.S.), Mathematics of Computation

Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Madza Farias-Virgens
madzayasodara [at] ucla [dot] edu
Possibility of Funding?
Yes

Developing methods to study the composition of microbial communities

Project Description
Next-generation sequencing allows us to study the genetic material of microbial communities recovered directly from environmental samples. This process is known as metagenomics. We are interested in developing computational methods to study the composition of microbial communities, including bacteria, viruses, and eukaryotic pathogens. Traditional approaches rely on microbial marker genes, which cover only portions of the genome. We plan to use coverage of entire microbial genomes to determine the presence or absence and the relative abundance of specific taxa in a given community. In particular, we will focus on improving the accuracy and speed of this new method. The method will be applied to data from the Human Microbiome Project (HMP) and to multi-tissue data from the Genotype-Tissue Expression (GTEx) project.
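One simple way to turn whole-genome coverage into a community profile is sketched below in Python: compute the breadth (fraction of the genome covered by at least one read) and the mean depth for each reference genome, call a taxon present when its breadth passes a threshold, and report relative abundance as each present taxon's share of the total mean depth. The input format (read alignments as 0-based, half-open intervals), the 0.5 breadth cutoff, and the function names are assumptions for illustration, not the lab's method.

```python
def genome_coverage(read_intervals, genome_length):
    """Return (breadth, mean_depth) for reads aligned to one reference genome.

    read_intervals: list of (start, end) alignments, 0-based and half-open.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    delta = [0] * (genome_length + 1)
    for start, end in read_intervals:
        start, end = max(0, start), min(end, genome_length)
        if start < end:
            delta[start] += 1
            delta[end] -= 1
    depth, covered, depth_sum = 0, 0, 0
    for pos in range(genome_length):
        depth += delta[pos]
        depth_sum += depth
        if depth > 0:
            covered += 1
    return covered / genome_length, depth_sum / genome_length


def profile_community(alignments, genome_lengths, min_breadth=0.5):
    """Call taxa present by coverage breadth and estimate relative abundance
    from mean depth. alignments maps taxon -> list of read intervals."""
    depths = {}
    for taxon, length in genome_lengths.items():
        breadth, mean_depth = genome_coverage(alignments.get(taxon, []), length)
        if breadth >= min_breadth:
            depths[taxon] = mean_depth
    total = sum(depths.values())
    return {taxon: d / total for taxon, d in depths.items()} if total > 0 else {}


if __name__ == "__main__":
    lengths = {"taxonA": 100, "taxonB": 100, "taxonC": 100}
    alignments = {
        "taxonA": [(0, 50), (50, 100), (0, 100)],  # 2x depth across the genome
        "taxonB": [(0, 100)],                       # 1x depth across the genome
        "taxonC": [(0, 10)],                        # only 10% covered: called absent
    }
    print(profile_community(alignments, lengths))
```

Requiring broad coverage, rather than just a read count, is what distinguishes this from marker-gene approaches: a few reads piling up on one conserved region of a genome do not, on their own, make a taxon count as present.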
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Serghei Mangul
smangul [at] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of whole-genome sequencing data of neuropsychiatric disorders

Project Description
We are interested in identifying the genetic basis of several neuropsychiatric disorders, such as bipolar disorder, Tourette Syndrome, and schizophrenia. We are currently analyzing several whole-genome sequencing (WGS) datasets for these disorders and developing new computational and statistical approaches for large-scale WGS data. This project involves applying existing software tools to large-scale WGS data on UCLA's high-performance computing cluster, as well as improving those tools.
Requirements
One core course in Bioinformatics such as CS 121, CS 122, or CS 124
Contact
Jae Hoon Sul
jaehoonsul [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Genomic approaches to cell cycle

Project Description
We are studying the differences between cells that are dividing and cells that have stopped dividing. We are interested in working with an undergraduate student with strong computer skills and an interest in bioinformatics to understand the changes in isoform expression upon cell cycle exit. The student will assist us with analysis of next-generation sequencing data, identification of splicing variants and changes in isoform use, and text-based analysis of sequence motifs surrounding differentially present 5’ UTRs, exons, and 3’ UTRs.
Requirements
One year of programming coursework such as PIC 10C or CS 32, plus one Bioinformatics core course
Contact
Hilary Coller
hcoller [at] ucla [dot] edu
Possibility of Funding?
Yes

Genomic Analysis using Statistical Methods

Project Description
The research in the “Junction of Statistics and Biology” group (http://www.stat.ucla.edu/~jingyi.li/) focuses on developing statistical methods to address important biological and biomedical questions using high-throughput genetic and genomic data. This project aims to quantify full-length mRNA transcripts across various tissues and cells of different species. Required experience:

· completion of some courses in bioinformatics, genomics, and programming

· basic knowledge of RNA-seq data analysis

· programming skills: Linux, Bash, R, Python, Perl

· availability to work more than 10-15 hours per week

Requirements
One year of programming coursework such as PIC 10C or CS 32, plus one Bioinformatics core course
Contact
Jingyi (Jessica) Li
jli [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Machine Learning and Text-Mining Applications for Indexing Biomedical Research Software

Project Description
Advances in bioinformatics, especially in genomics, have been greatly accelerated by progress toward more powerful computational systems that allow ever-larger amounts of data to be processed quickly. This has resulted in a rapid increase in the number of publicly available software tools, databases, and knowledge bases for biology. Unfortunately, the lack of systems that help users search for and find the resources best suited to their needs is becoming a significant obstacle. Our lab aims to develop a computational platform (https://aztec.bio) that will aggregate, index, and integrate all biomedical research software. We are developing methods for classifying biomedical software and extracting relevant metadata from scientific publications.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Peipei Ping
admin [at] heartbd2k [dot] org
Possibility of Funding?
Yes

Machine Learning in Textual Data of Cardiovascular Disease

Project Description
Currently, over 2.2 million cardiovascular-related scientific articles are available online, but they are largely unstructured, making it a formidable challenge to identify datasets and make sense of the information they contain. We aim to address this big data challenge by developing text-mining and machine learning methods to discover new insights from clinical data and scientific literature.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Peipei Ping
admin [at] heartbd2k [dot] org
Possibility of Funding?
Yes

Regulation of Gene Expression during Development and Reprogramming

Project Description
Placental female mammals such as humans and mice have two X chromosomes (XX), but one of them becomes transcriptionally inactivated early in development. Recent evidence, largely based on single-cell studies of pre-implantation human embryos, suggests that there are key differences between early human and mouse development. Prior to the onset of X-inactivation, both human X chromosomes are active, but not to their full extent. This newly observed, human-specific form of X-to-autosome dosage compensation is termed X chromosome dampening, but its extent and mechanism remain to be explored. This project will analyze single-cell RNA-sequencing data from human pre-implantation blastocysts (published datasets) and naive human embryonic stem cells (hESCs, from our laboratory) to determine exactly which X-chromosome genes are affected by dampening and whether both X chromosomes are dampened to the same extent, and to reveal the role of long non-coding RNAs in these processes.
Requirements
One core course in Bioinformatics such as CS 121, CS 122, or CS 124
Contact
Kathrin Plath
kplath [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of transcriptome complexity using deep RNA sequencing

Project Description
Deep RNA sequencing has emerged as a powerful technology for transcriptome analysis. By generating massive numbers of short sequence reads from a given RNA sample, RNA-Seq can be used to define and quantify patterns of gene expression and RNA processing on a genomic scale. This project will develop methods for analyzing transcriptome complexity (gene expression, RNA processing, non-coding RNA) using RNA-Seq data, and apply these methods to study transcriptome regulation and mRNA isoform expression in development and disease.
Requirements
One year of programming coursework such as PIC 10C or CS 32, plus one Bioinformatics core course
Contact
Yi Xing
yxing [at] ucla [dot] edu
Possibility of Funding?
Yes

Identifying loci for regulation of RNA splicing in mice

Project Description
We have obtained deep RNA sequencing data from a panel of inbred mouse strains. The genomes of these strains are well characterized, allowing fine mapping of loci involved in regulating gene expression. The goal of this project is to identify loci involved in splice site selection.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Des Smith
DSmith [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Fine mapping genes for drug action

Project Description
Although a staggering array of drugs is available for disorders such as cancer and autoimmunity, much remains unknown about their genetic targets. The project repurposes the technology of radiation hybrid (RH) panels to identify genes for drug action. RH cells contain extra copies of randomly selected genes and offer the opportunity to pinpoint functional drug/gene interactions with high precision. The project involves analyzing the data from these RH mapping experiments to identify gene targets for drugs of medical relevance.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Des Smith
DSmith [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Development of a metabolomics and proteomics pipeline for high-throughput analysis of cancer biology and immunity samples

Project Description
Recent advances in cancer biology have shown massive changes in the transcriptome, proteome, and metabolome of tumor specimens in response to drug treatment. Our lab aims to understand the governing principles that cause these global changes. To this end, we are conducting metabolomics and proteomics analyses of cancer cell lines and xenograft tumors using top-of-the-line mass spectrometry equipment.

This project will develop the bioinformatic tools necessary to establish a high-throughput pipeline for the mass spectrometry-based analysis of biological samples. The project includes creation of bioinformatic algorithms and pipelines for analyzing multi-omic data, and collaboration with biologists in the analysis and interpretation of the data.

Requirements
One course in programming such as PIC 10A or CS 31
Contact
Thomas Graeber
tgraeber [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

The evolutionary dynamics of cephalopods

Project Description
Living cephalopods (octopuses, squid, and nautiluses) comprise over 700 species, but their evolution is thought to reflect a series of “arms races” with other marine predators, including sharks, marine reptiles, and ancient and modern fishes, that have led to the waxing and waning of species richness through time. I am seeking an undergraduate student with some programming experience to compile occurrence data from fossil databases and conduct comparative evolutionary analyses that will measure changing rates of speciation and extinction and test arms race hypotheses.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Michael Alfaro
michaelalfaro [at] ucla [dot] edu
Possibility of Funding?
No

Building and analyzing the fish tree of life

Project Description
We are currently assembling the largest phylogenetic tree of vertebrates based on published gene sequences and seek one or more students to assist with scripting and analysis. This project involves creating multi-gene alignments from genetic databases, reconciling GenBank taxonomy with published classifications, phylogenetic reconstruction, and macroevolutionary analyses.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Michael Alfaro
michaelalfaro [at] ucla [dot] edu
Possibility of Funding?
No

Crowdsourcing of phenotypic data

Project Description
We are developing software tools using Amazon Mechanical Turk to enable crowdsourced collection of shape data on a massive scale. This project will involve developing software protocols for data collection and analysis of geometric morphometric data.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Michael Alfaro
michaelalfaro [at] ucla [dot] edu
Possibility of Funding?
No

Database construction for skin metagenomic data

Project Description
The human microbiome plays important roles in human physiology and has become an exciting new research field in recent years. Our group studies the human skin and oral microbiomes and their associations with diseases. This project will develop a database for the metagenomic and genome data that we have obtained to study these disease associations.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Huiying Li
huiying [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Internet Services for Collaborative Data Sharing (ISCDS)

Project Description
The ISCDS project aims to develop a system for examining research grant proposal data, specifically the tendency of reviewers and study sections (panels) to deliver a constructive (positive, negative, or neutral) evaluation of grant applications. The project will collect and mine all publicly available information and construct a set of metrics characterizing research grants by applicant and application topic. Specifically, ISCDS will provide a web service enabling: Community Entry of Data, Community Export of Data, Exploratory Data Analysis/Graphics, Model Estimation/Model Fitting, and Outcome Prediction.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Ivo Dinov
dinov [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Graphical Pipeline Workflow Environment for Visual Informatics and Genomics

Project Description
“Informatics and genomics research require efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed protocol provenance.

This project will extend the Pipeline Environment (http://pipeline.loni.ucla.edu), a client-server distributed computational infrastructure, to enable the visual graphical construction, execution, monitoring, validation and dissemination of advanced informatics and genomics data analysis protocols. Examples of diverse genomics tools and the interoperability of informatics tools include EMBOSS, mrFAST, GWASS, PLINK, MAQ, SAMtools, Bowtie, CNVer, etc.”

Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Ivo Dinov
dinov [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Statistics Online Computational Resource (SOCR)

Project Description
SOCR R&D efforts revolve around developing HTML5/JavaScript routines, software, and interfaces for data science, predictive analytics, statistical computing, and visualization. Interested students should review the following materials:
SOCR resource: www.SOCR.ucla.edu
SOCR source archive: https://github.com/SOCR
SOCR projects: http://wiki.stat.ucla.edu/socr/index.php/Available_SOCR_Development_Projects

Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Ivo Dinov
dinov [at] stat [dot] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of Large-Scale Epigenomic Data Sets

Project Description
Advances in sequencing technology have made it possible to experimentally map genome-wide epigenetic features, such as histone modifications, DNA methylation, and regions of open chromatin, across a large number of cell types and conditions at an unprecedented scale. Potential projects include data analysis and/or method development that leverage large-scale epigenomic datasets to address problems related to stem cell reprogramming, splicing, cancer progression, and neuropsychiatric diseases.
Requirements
One year of programming coursework such as PIC 10C or CS 32, plus one Bioinformatics core course
Contact
Jason Ernst
jason [dot] ernst [at] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of Single Nucleotide Variants in RNA-Seq Data

Project Description
Our lab handles a large amount of high-throughput sequencing data of different types from ENCODE and other large-scale projects. With these data, we routinely analyze gene expression, alternative splicing, expressed polymorphisms, RNA editing and protein-RNA interactions. This project will develop methods for analyzing RNA-Seq data and measuring the expression of single nucleotide variants.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Xinshu (Grace) Xiao
gxxiao [at] ucla [dot] edu
Possibility of Funding?
Yes

Integrative OMICs methods in neurodegenerative dementia

Project Description
The long-term goal of our group is to advance our understanding of the genetic architecture of neuropsychiatric disorders. We have collected genetic, genomic (gene expression, methylation), imaging and phenotypic data in a large series of patients with neurodegenerative conditions, including Alzheimer’s Disease (AD) and Frontotemporal Dementia (FTD). In collaboration with Dr. Horvath, a biostatistician at UCLA, and Dr. Paul Thompson (Department of Neurology), we are developing network-based methods to integrate multiple layers of information, including genetic, genomic, epigenetic, and neuroimaging data.
Requirements
One core course in Bioinformatics such as CS 121, CS 122, or CS 124
Contact
Giovanni Coppola
gcoppola [at] ucla [dot] edu
Possibility of Funding?
Yes

A multidimensional, web-based database for OMICs data

Project Description
We have developed three web-based databases to host high-throughput data, and these have become major tools for collaboration and data mining within our lab and among our collaborators. We are now working to integrate them into a single suite of tools for web-based mining of OMICs data.
Requirements
One core course in Bioinformatics such as CS 121, CS 122, or CS 124
Contact
Giovanni Coppola
gcoppola [at] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of Variant-specific Gene and Isoform Expression using RNA-seq Data

Project Description
Current massively parallel sequencing technologies allow us to investigate the human transcriptome for changes conferring susceptibility to obesity and to high serum cholesterol and triglyceride levels. We hypothesize that there are DNA sequence variants influencing allele-specific expression of nearby genes that in turn increase the risk of obesity, dyslipidemia, and cardiovascular disease. This project will develop and apply approaches to elucidate these expression changes using human adipose RNA-seq data.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Paivi Pajukanta
ppajukanta [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Effect of DNA methylation on genomic stability

Project Description
Dysregulated DNA methylation has been associated with many diseases, including cancers, for reasons that are not well understood. Our lab has developed a model system to assess genomic stability at different levels of DNA methylation. This project will examine a variety of model systems, ranging from yeast to embryonic stem cells and cancer cells, to precisely quantify how DNA methylation contributes to genome stability.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Guoping Fan
gfan [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Cell type deconvolution of clinical samples

Project Description
Our lab is interested in developing tools to analyze human biopsies. These tools come in multiple forms. We are developing web interfaces that allow us to interpret gene expression datasets from biopsies using gene sets that measure the cell types and inflammatory states of the samples. We are also developing methods to quantitatively estimate the proportions of reference cell types within a sample using gene expression data. Finally, we are developing assays that use DNA methylation to estimate the cell types found within samples. These projects will involve the further development of these tools and their application to datasets that we will generate in collaboration with clinicians. The projects will span a broad range of clinical applications such as infectious diseases, cancer, and neurodegenerative diseases.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Matteo Pellegrini
matteop [at] mcdb [dot] ucla [dot] edu
Possibility of Funding?
No

Genetic and Genomic studies of Neuropsychiatric disorders

Project Description
Neuropsychiatric disorders are common, complex and polygenic traits. Despite high heritability estimates, only a small part of the genetic basis of these disorders has been identified. Our lab focuses mostly on human genetic studies of schizophrenia, bipolar disorder and amyotrophic lateral sclerosis (ALS).
We apply genetic and genomic tools such as RNA sequencing and whole genome sequencing to decipher the genetic architecture of these disorders in available cohorts as well as in in vitro model systems. These large datasets offer the opportunity to apply existing methods or to develop novel methods and strategies.
Requirements
No programming experience required
Contact
Roel Ophoff
rophoff [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
No

Computational Challenges in Cardiovascular Systems Biology

Project Description
Our group uses epigenomics to study complex phenotypes in cardiovascular disease. Our goals include understanding basic principles of chromatin biology and using epigenomics to operationalize precision health in human populations. Available projects involve analyses of proteomics and next generation nucleotide sequencing data with the goals of annotation, hierarchical comparison and discovery of emergent properties.
Requirements
One year of programming coursework such as PIC 10C or CS 32
Contact
Tom Vondriska
tvondriska [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Population genetic simulations of natural selection in the human genome

Project Description
We are developing and applying mathematical models of how natural selection affects patterns of genetic variation across regions of the human genome. This particular project will involve performing population genetic simulations using existing software to 1) assess the accuracy of our theoretical predictions, and 2) help interpret signals seen in actual genetic variation data. I am looking for a motivated and talented student to play a prominent role in this project.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Kirk Lohmueller
klohmueller [at] ucla [dot] edu
Possibility of Funding?
Yes

Defining human cardiac protein half-life in healthy individuals and heart failure patients

Project Description
Cells continually synthesize and degrade proteins, and protein half-life is key to the biological function of all cells. Understanding protein turnover and protein dynamics is essential to developing novel therapies for diseases, including heart failure. Using experimental data, this project will develop computational models for predicting the half-life of proteins in healthy individuals and how these protein half-lives may be affected in patients suffering from heart failure.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Peipei Ping
pping [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes

Analysis of mitochondrial protein expression profiles using four large protein data sets

Project Description
Mitochondria produce ATP and are the main source of cellular energy. A mitochondrial proteome may contain up to 2000 distinct proteins at varying abundances, which form up to 100 networks and pathways. We have obtained four large protein datasets on four types of mitochondria: human heart, mouse heart, mouse liver, and fly muscle mitochondria. Analyses of these four datasets will reveal which core proteins are essential to all mitochondrial proteomes, which proteins are unique and contribute to the functional specificities of heart, liver, and muscle, and which proteins are fundamental to the human heart mitochondrial proteome. This information will be essential for our understanding of human cardiac mitochondrial function.
Requirements
One course in programming such as PIC 10A or CS 31
Contact
Peipei Ping
pping [at] mednet [dot] ucla [dot] edu
Possibility of Funding?
Yes