
Publications & Projects

Publications

Bheemanahalli, R., Knight, M., Quinones, C., Doherty, C. J., & Jagadish, S. K. (2021). Genome-wide association study and gene network analyses reveal potential candidate genes for high night temperature tolerance in rice. Scientific Reports, 11(1), 1-17.

Projects

Investigating Cis-Regulatory Elements in Differential Expression | March 2021 - Present

  • Performing standard RNA-Sequencing analysis on multiple publicly available datasets from GEO.

  • Utilizing machine learning techniques to investigate how features in the cis-regulatory regions of genes contribute to their differential expression.

  • Investigating different machine learning algorithms for accuracy and performance.

Studying the Impact of Spaceflight on Sequence Variants | Jan 2020 - Present

  • Exploring NASA’s GeneLab (an omics database) for relevant A. thaliana transcriptome data from spaceflight experiments.

  • Creating a pipeline to analyze RNA-Sequencing data for sequence variants associated with spaceflight conditions.

    • Took into account the nature of plant expression data, the highly inbred nature of the species, and the characteristics of RNA-Seq data so that the pipeline would give appropriate variant counts.

    • The pipeline encompasses many bioinformatics tools, including FastQC, STAR, and GATK’s Mutect2 and VariantFiltration.

Rice Candidate Gene Project | Aug 2019 - Oct 2020

  • Utilized a gene-regulatory network built from Oryza sativa ChIP-sequencing data to prioritize potential candidate genes identified in a GWAS.

  • Created interactive network visuals using R so future readers could investigate the potential candidate genes.

  • Performed differential expression analysis on Oryza sativa count data using the R package edgeR.

  • Used a Linux environment to run QTG-Finder2 in order to rank Oryza sativa candidate genes based on their orthology to known causal genes.

  • Associated publication linked above (Bheemanahalli et al.).

Wheat Phenotypic Data Project | Summer 2019

  • Formulated wheat data into clear phenotypes for over 300 varieties of wheat.

  • Transformed phenotypic wheat data using R so that it could be read by GWAS software (e.g., TASSEL).

Coffee Aroma Project | Spring 2019

  • Analyzed tetraploid coffee data to find potential proteins associated with aroma.

  • Used whole-genome sequencing data from Coffea arabica (tetraploid) to determine which chromosomes derived from which parental genome.

  • Utilized a Linux environment to run NCBI’s BLAST and MUMmer.

Phytophthora Phylogeny Project | Spring 2019

  • Used gene sequencing data of seven genes from over forty species of Phytophthora to investigate the evolutionary relationships within the genus.

  • Ran RAxML in Geneious to create the phylogenies.

Oral Presentations

Genetics and Genomics Initiative 4th Annual Retreat & Semester Seminar Series | Fall 2021

Abstract: Space is an exciting frontier that presents unique environmental stressors like microgravity and space radiation. It is difficult to study the impact of spaceflight on terrestrial life due to the limited resources of the International Space Station (ISS). NASA created GeneLab, a public omics database for spaceflight-relevant data, which gives scientists access to data without needing a new experiment on the ISS. Transcription profiles are among the most common data types available on GeneLab, and in the age of next-generation sequencing, this commonly means RNA-Sequencing data. A pipeline that extracts new information from the public RNA-Seq data would be useful, especially in a case like this where the available data is extremely limited. We developed and used a pipeline that analyzes GeneLab’s RNA-Sequencing data from Arabidopsis thaliana for sequence variants. RNA-Sequencing data is not the preferred source for calling variants due to an associated high false discovery rate; however, recent studies show it can be done with appropriate precautions. Our pipeline incorporates steps to combat factors leading to RNA-Seq’s high false variant discovery rate, including 2-pass mapping methods and stringent filters. Our hypothesis is that the space environment will cause a higher number of variants to be called in the spaceflight A. thaliana samples compared to those on the ground. Preliminary results show A. thaliana samples from space tend to have higher variant counts than those from the ground, showing the damage spaceflight can cause at the nucleotide level. Further, this demonstrates that our analysis pipeline can use RNA-Seq to acquire additional information on nucleotide sequence variation from abiotic stressors like microgravity and space radiation. Findings from this research can lead to a better understanding of what future precautions are needed to ensure the safety of those in space.

82nd Meeting of Southern Section of the American Society of Plant Biologists | Spring 2021

Abstract: Space is an exciting frontier that presents unique environmental stressors like microgravity and space radiation. It is difficult to study the impact of spaceflight on terrestrial life due to the limited resources of the International Space Station (ISS). NASA created GeneLab, a public omics database for spaceflight-relevant data, which gives scientists access to data without needing a new experiment on the ISS. Transcription profiles are among the most common data types available on GeneLab, and in the age of next-generation sequencing, this commonly means RNA-Sequencing data. A pipeline that extracts new information from the public RNA-Seq data would be useful, especially in a case like this where the available data is extremely limited. We developed and used a pipeline that analyzes GeneLab’s RNA-Sequencing data from Arabidopsis thaliana for sequence variants. RNA-Sequencing data is not the preferred source for calling variants due to an associated high false discovery rate; however, recent studies show it can be done with appropriate precautions. Our pipeline incorporates steps to combat factors leading to RNA-Seq’s high false variant discovery rate, including 2-pass mapping methods and stringent filters. Our hypothesis is that the space environment will cause a higher number of variants to be called in the spaceflight A. thaliana samples compared to those on the ground. Preliminary results show A. thaliana samples from space tend to have higher variant counts than those from the ground, showing the damage spaceflight can cause at the nucleotide level. Further, this demonstrates that our analysis pipeline can use RNA-Seq to acquire additional information on nucleotide sequence variation from abiotic stressors like microgravity and space radiation. Findings from this research can lead to a better understanding of what future precautions are needed to ensure the safety of those in space.

NCSU Genomic Sciences Symposium | Spring 2021

Abstract: Space is an exciting frontier, but it presents unique environmental stressors like microgravity and space radiation. It is difficult to study the impact of spaceflight on terrestrial life due to the limited resources of the International Space Station (ISS). NASA created GeneLab, a public omics database for spaceflight-relevant data, to give scientists access to data without needing a new experiment on the ISS. Transcription profiles are among the most common data types available on GeneLab, and in the age of next-generation sequencing this commonly means RNA-Sequencing data. A pipeline that could extract new information from the public RNA-Seq data would be useful, especially in a case like this where the available data is extremely limited. This study aims to develop and use a pipeline that analyzes GeneLab’s RNA-Sequencing data from Arabidopsis thaliana for sequence variants. The hypothesis is that the space environment will cause more variants to be called in the spaceflight A. thaliana samples than in the ground controls. RNA-Sequencing data is not the preferred source for calling variants due to an associated high false discovery rate; however, recent studies show it can be done with appropriate precautions. The pipeline incorporates steps to combat factors leading to RNA-Seq’s high false variant discovery rate, including 2-pass mapping methods and stringent filters. Preliminary results show A. thaliana samples from space tend to have higher variant counts than those from the ground. The development of this pipeline would demonstrate that RNA-Seq can be analyzed beyond gene expression to see the impact of abiotic stressors like microgravity and space radiation. Further, if the results continue trending towards spaceflight A. thaliana having a higher number of variants, this will show the damage spaceflight can cause at the nucleotide level. This may lead to a better understanding of what future precautions may need to be taken to improve the safety of those in space.

Poster Presentations

College of Sciences Research Symposium | Fall 2021

Abstract: Cis-regulatory elements are key components in regulating gene expression patterns. These elements often come in the form of sequence motifs located within a nearby regulatory region and aid in recruiting transcription factors. In plants, these motifs are necessary to activate everything from photosynthetic machinery to stress response pathways. Linking motifs to a particular gene expression pattern allows for the creation of synthetic promoters that can fine-tune any gene expression response of interest in a number of conditions [1]. However, identifying motifs is not an easy task. Traditionally, to find essential motifs, regulatory regions would be extracted from genes of interest and examined for k-mer patterns among them. Popular tools may use methods like expectation maximization or greedy algorithms to find the best solution in relatively low computational time [2]. In this study, a machine learning method is employed to identify motifs linked to genes responding to cold stress. XGBoost, a gradient boosting algorithm, was the machine learning method of choice due to its speed and accuracy [3]. The model was built using the number of times a motif was present within a gene’s 1 kb upstream regulatory region as the predictor variable and the corresponding gene’s differential expression status in cold stress as the response. Individual motif importance was examined using the R package xgboostExplainer [4], and top essential motifs potentially linked to gene response in cold stress were identified. The development of this XGBoost model demonstrates a new approach to identifying links to gene expression patterns, which can lead to more responsive synthetic promoters in any desired stress condition.
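
The predictor described in this abstract, the number of times each motif occurs in a gene’s 1 kb upstream region, can be sketched as a small count matrix. This is only a minimal illustration written in Python, not the R workflow used in the study; the function names, toy sequences, and motifs are hypothetical placeholders, and the model-fitting step is omitted.

```python
from typing import Dict, List


def count_motif(sequence: str, motif: str) -> int:
    """Count occurrences of `motif` in `sequence`, allowing overlaps."""
    sequence, motif = sequence.upper(), motif.upper()
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so overlapping matches are also counted.
        start = sequence.find(motif, start + 1)
    return count


def build_feature_matrix(
    upstream: Dict[str, str],
    motifs: List[str],
    region_len: int = 1000,
) -> Dict[str, List[int]]:
    """Return one row per gene and one column per motif.

    Assumes each sequence ends at the gene start, so only the last
    `region_len` bases (1 kb by default) are scanned.
    """
    return {
        gene: [count_motif(seq[-region_len:], motif) for motif in motifs]
        for gene, seq in upstream.items()
    }


# Hypothetical toy input: short stand-ins for real upstream regions.
upstream = {
    "geneA": "TTCCGACATACCGACGG",
    "geneB": "AAAAACACGTGTTT",
}
motifs = ["CCGAC", "ACGTG"]  # placeholder motifs, not taken from the study

features = build_feature_matrix(upstream, motifs)
for gene, row in features.items():
    print(gene, row)
```

Each row of such a matrix would be paired with the gene’s differential expression label (for example, 1 for responsive and 0 for not) before being passed to a gradient-boosting classifier.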

NCSU Genomic Sciences Symposium | Spring 2020

Abstract: Hands-on science education can have tremendous benefits for early childhood development and influence lifelong scientific interest. According to a recent study, more time spent studying and doing science in those early years is associated with later scientific achievement (Curran & Kitchin). As part of North Carolina State University’s science community, it is up to us to inform and excite our local area about our research. We did this in the Fall of 2019 at Exploris Elementary School. Related to our current research, we taught second and third graders about growing plants in space. We delved into the scientific method, conditions astronauts face in orbit, and what plants need to survive. Students were divided into three groups and grew a variety of plants in different light conditions. Throughout the five-week program, they examined their experiment and recorded their findings while learning about space, plants, and scientific methodology. At the end of the program, students created blueprints for their own lunar growth chamber. They incorporated everything they had learned about what plants would need while in space. For us, the project was a fun way to reach out to the community and show young students the kind of exciting research done at NCSU. For the students of Exploris, the outreach we did planted a seed for scientific interest that they can grow throughout their education.

FGCU Spring Senior Research Symposium | Spring 2018

Abstract: DNA from 86 natural nasal isolates, previously identified as methicillin-resistant staphylococci (aureus and non-aureus) by 16S rRNA sequencing, was assayed for the presence of the phage-encoded Panton-Valentine Leukocidin (PVL) gene. The PVL gene is carried by one of 8 or 9 lysogenic phages known to infect and insert into the SCCmec (Staphylococcal Cassette Chromosome mec) of methicillin-resistant Staphylococcus aureus. SCCmec is a highly mobile pathogenicity island that is transferred between different species of staphylococci during promiscuous conjugation. For epidemiological purposes, PVL is thought to be a molecular marker of community-acquired methicillin-resistant Staphylococcus aureus (CA-MRS) versus hospital-acquired methicillin-resistant Staphylococcus aureus (HA-MRS). This designation may have merit given the increasing prevalence of methicillin-resistant staphylococci (MRS) that are PVL+ but not (yet) implicated in overt disease. In this study we show that 85 out of 86 natural isolates, obtained from a population of healthy carriers, possess the PVL gene. In addition, 20 of the PVL+ MRS are non-aureus species that are not endemic to humans. Thus, this study may provide evidence strengthening the link between PVL and CA-MRS.
