Bioinformatics @ NYGC

Our team of bioinformaticians develops, maintains, and improves our analysis pipelines by leveraging the large amounts of sequencing data we produce. We estimate the sources of error and variability in the data and define methods to correct them, both computationally and in the lab. We also continually evaluate and benchmark available tools, refine best practices for analyzing and combining results, and develop novel tools and methods.

We also support our CLEP lab by providing expertise in the clinical interpretation of constitutional and cancer genomics.


Our diverse team of bioinformatics scientists has expertise in:

  • Statistical and population genetics
  • Cancer genomics
  • Expression analysis
  • Epigenomics and functional genomics
  • de novo genome assembly
  • Metagenomics
  • Clinical interpretation


A typical project is initiated with one of the sequencing project managers. Our bioinformatics scientists are consulted to further refine the experimental design, analytic plan, and project deliverables.

The bioinformatics team performs standard and project-specific quality control and analysis of sequencing data (e.g., differential expression and functional enrichment for RNA-Seq, variant annotation and interpretation for genome and exome sequencing, and somatic variant calling, both SNV and structural variant, for cancer). Results are delivered via our web interface or APIs and are stored and accessible for a period of time as part of NYGC’s Integrated Genomics.

Clinical Interpretation

As exome and genome sequencing data are processed and genomic variation between the sample and a reference is defined, annotated, and compared to existing databases, our bioinformatics scientists contribute to the last step of the analysis: clinical interpretation.

This usually requires ranking and filtering of putative candidates, manual curation, and functional validation (when possible) of our findings. NYGC’s analysis alleviates the need for the investigator to perform the standard computationally intensive analysis steps, thus freeing up time to focus on the biology.

Michael Zody

Scientific Director, Computational Biology

Nicolas Robine

Director, Computational Biology

Giuseppe Narzisi

Associate Director, Computational Biology

Will Liao

Lead Bioinformatics Scientist

Anna Basile

Senior Bioinformatics Scientist

Marta Byrska-Bishop

Senior Bioinformatics Scientist

Andre Corvelo

Lead Bioinformatics Scientist

Rui Fu

Senior Bioinformatics Scientist

Uday Evani

Senior Data Scientist

Wayne Clarke

Bioinformatics Scientist

Will Hooper

Bioinformatics Scientist

Tim Chu

Senior Bioinformatics Programmer

Rajeeva Musunuri

Bioinformatics Data Scientist

Zalman Vaksman

Senior Bioinformatics Scientist

Heather Geiger

Bioinformatics Scientist

Jennifer Shelton

Senior Bioinformatics Programmer

Brian Zhu

Bioinformatics Programmer

Kshithija Nagulapalli

Senior Bioinformatics Analyst

Zoe Goldstein

Bioinformatics Analyst

Anand Kumaraguru

Senior Associate Scientist, Analytics

Jim Roche

Bioinformatics Programmer

Ali Oku

Bioinformatics Analyst

Evolution of structural rearrangements in prostate cancer intracranial metastases

Intracranial metastases in prostate cancer are uncommon but clinically aggressive. A detailed molecular characterization of prostate cancer intracranial metastases would improve our understanding of their pathogenesis and the search for new treatment strategies. We evaluated the clinical and molecular characteristics...

Authors:  Nicolas Robine   Will Hooper   Tim Chu  

Molecular and Clinical Epidemiology of SARS-CoV-2 Infection among Vaccinated and Unvaccinated Individuals in a Large Healthcare Organization from New Jersey

New Jersey was among the first states impacted by the COVID-19 pandemic, with one of the highest overall death rates in the nation. Nevertheless, relatively few reports have been published focusing specifically on New Jersey. Here we report on molecular,...

Authors:  Michael Zody   Andre Corvelo  

Polyclonal lymphoid expansion drives paraneoplastic autoimmunity in neuroblastoma

Neuroblastoma is a lethal childhood solid tumor of developing peripheral nerves. Two percent of children with neuroblastoma develop opsoclonus myoclonus ataxia syndrome (OMAS), a paraneoplastic disease characterized by cerebellar and brainstem-directed autoimmunity but typically with outstanding cancer-related outcomes. We compared...

Authors:  Zalman Vaksman  

DETECT: Feature extraction method for disease trajectory modeling in electronic health records

Modeling with longitudinal electronic health record (EHR) data proves challenging given the high dimensionality, redundancy, and noise captured in EHR. In order to improve precision medicine strategies and identify predictors of disease risk in advance, evaluating meaningful patient disease trajectories...

Authors:  Marta Byrska-Bishop  

Unexpected frequency of the pathogenic AR CAG repeat expansion in the general population

CAG repeat expansions in exon 1 of the AR gene on the X chromosome cause spinal and bulbar muscular atrophy, a male-specific progressive neuromuscular disorder associated with a variety of extra-neurological symptoms. The disease has a reported male prevalence of...

Authors:  Giuseppe Narzisi  

Genetic Predisposition to Neurological Complications in Patients with COVID-19

Several studies have identified rare and common genetic variants associated with severe COVID-19, but no study has reported genetic determinants as predisposition factors for neurological complications. In this report, we identified rare/unique structural variants (SVs) implicated in neurological functions in...

Authors:  Michael Zody  


Lancet is a somatic variant caller (SNVs and indels) for short read data. Lancet uses a localized micro-assembly strategy to detect somatic mutation with high sensitivity and accuracy on a tumor/normal pair. Lancet is based on the colored de Bruijn...

Contributors:  Michael Zody   Nicolas Robine   Giuseppe Narzisi   Andre Corvelo   Rajeeva Musunuri  
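
The colored de Bruijn graph idea mentioned above can be illustrated with a toy sketch. This is not Lancet's implementation: the function names, the "tumor"/"normal" labels, and the choice to flag every tumor-only k-mer are assumptions made for illustration, and the sketch omits read-support thresholds, graph traversal, and variant reconstruction.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_kmers(reads_by_sample: dict, k: int) -> dict:
    """Map each k-mer (a graph node) to the set of samples ("colors") containing it.

    Edges are implicit: two k-mers are connected when they overlap by k-1 bases.
    """
    colors = defaultdict(set)
    for sample, reads in reads_by_sample.items():
        for read in reads:
            for kmer in kmers(read, k):
                colors[kmer].add(sample)
    return dict(colors)


def tumor_only_kmers(colors: dict) -> set:
    """Return k-mers seen only in the (hypothetical) "tumor" sample."""
    return {kmer for kmer, seen_in in colors.items() if seen_in == {"tumor"}}
```

In this toy setting, k-mers carrying only the tumor color mark the stretch of the graph where the tumor diverges from the normal, which is where a somatic candidate would be sought.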


nygc-short-alignment-marking is a tool to mark short alignments in a bam file. It parses the bam and marks as unmapped a read with alignment length below a user-defined threshold. Reads are not filtered from the bam file but kept as...

Contributors:  Andre Corvelo  
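
The marking rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the tool's code: the function names are hypothetical, and treating "alignment length" as the number of read bases in M, = and X CIGAR operations is an assumption, since the description does not define it.

```python
import re

# CIGAR operations counted toward alignment length (an assumption of this sketch).
ALIGNED_OPS = {"M", "=", "X"}
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

FLAG_UNMAPPED = 0x4  # SAM flag bit meaning "segment unmapped"


def aligned_length(cigar: str) -> int:
    """Sum the lengths of match/mismatch operations in a CIGAR string."""
    if cigar == "*":  # no alignment information available
        return 0
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in ALIGNED_OPS)


def mark_short_alignment(flag: int, cigar: str, min_length: int) -> int:
    """Return the SAM flag with the unmapped bit set if the alignment is too short.

    The read is kept either way; only its flag changes.
    """
    if aligned_length(cigar) < min_length:
        return flag | FLAG_UNMAPPED
    return flag
```

A real implementation would read and write BAM records through a library such as pysam or htslib; the sketch only shows the per-read decision.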


Conpair is a fast and robust method dedicated for human tumor-normal studies to perform concordance verification (i.e. samples coming from the same individual), as well as cross-individual contamination level estimation in whole-genome and whole-exome sequencing experiments. Importantly, our method of...

Contributors:  Michael Zody  


taxMaps is an ultra-efficient, customizable and fully scalable taxonomic classification tool for short-read data designed to deal with large DNA/RNA metagenomics samples. Its performance and comprehensiveness make it highly suitable for unbiased contamination detection in large-scale sequencing operations, microbiome studies...

Contributors:  Michael Zody   Nicolas Robine   Andre Corvelo   Wayne Clarke  


SCANVIS is a set of tools for SCoring, ANnotating and VISualizing splice junctions using gencode annotation. It scores splice junctions by using a Relative Read Support (RRS) measure that relates the reads supporting a query junction to reads supporting nearby annotated splice junctions....

Contributors:  Nicolas Robine   Heather Geiger  

NYGC Cancer Pipeline

NYGC's cancer pipeline identifies somatic and germline variants from whole genome sequencing (WGS), whole exome sequencing (WES) or targeted panel tumor and normal data. The pipeline can be run on sequencing data from human, mouse and patient-derived xenograft (PDX) models. Additionally,...

NYGC Somatic WGS v6 pipeline

Contributors:  Nicolas Robine   Tim Chu   Jennifer Shelton