Hi, this is Hirak.
[NEW] I am actively looking for research internships in Data Analysis, Applied ML, and Bioinformatics. Scroll down to know more about me.


I am a computer science Ph.D. student at Stony Brook University. In collaboration with my advisor, Rob Patro, I primarily work on two broad areas of computational biology. On one hand, I design ultra-fast mapping algorithms and compression schemes for raw sequencing data. On the other hand, I apply machine learning algorithms to analyse heterogeneous, large-scale public datasets.


[one page], [Multi-Page]

Brief Bio

I received my master's degree in Computer Science from the Indian Statistical Institute (ISI) in 2013. Before that, I completed my bachelor's degree (B.Tech) in Computer Science at Kalyani Govt. Engineering College in 2011. At ISI, I defended my master's thesis in graph theory under the supervision of Prof. Bhargab Bhattacharyya and Prof. Sandip Das. Before joining Stony Brook, I was a researcher for some time at the National University of Singapore. I also worked as a research fellow at the Indian Institute of Technology (IIT), Kharagpur.

For a brief period I was a visiting researcher at the Indian Statistical Institute; at the same time, I served as a guest lecturer in the computer science department of Vivekananda University. I had the great fortune to briefly work with Swami Sarvottamanandaji, fondly known as Sreesh Maharaj. During those days, I somehow acquired an Erdős number of 3, of which I am still extremely proud.


Papers (pre-prints + journals + conferences)

  • A space and time-efficient index for the compacted colored de Bruijn graph, by Fatemeh Almodaresi*, Hirak Sarkar*, Rob Patro. [bioRxiv’17]

  • Towards selective-alignment: Bridging the accuracy gap between alignment-based and alignment-free transcript quantification, by Hirak Sarkar*, Mohsen Zakeri*, Laraib Malik, Rob Patro. Submitted to Bioinformatics, 2017. [bioRxiv’17]

  • Quark enables semi-reference-based compression of RNA-seq data, by Hirak Sarkar and Rob Patro. [ Bioinformatics’17, bioRxiv ]

  • RapMap: A Rapid, Sensitive and Accurate Tool for Mapping RNA-seq Reads to Transcriptomes, by Avi Srivastava, Hirak Sarkar, Nitish Gupta and Rob Patro. [ ISMB’16, Bioinformatics’16, bioRxiv ]

  • Fast, Lightweight Clustering of de novo Transcriptomes using Fragment Equivalence Classes, by Avi Srivastava*, Hirak Sarkar*, Laraib Malik and Robert Patro. [ RECOMB-seq’16, arXiv ]

  • Voronoi Game on Graphs (extended version), by Sayan Bandyapadhyay, Aritra Banik, Sandip Das, Hirak Sarkar. [ TCS’15, WALCOM’13, arXiv ]


  • Pufferfish: A fast graph-based indexing and query strategy for large genomic sequences, by Fatemeh Almodaresi*, Hirak Sarkar*, and Rob Patro. Poster presented at [WABI’17].

  • Joint probabilistic model for multiple steps of gene regulation, by Hirak Sarkar, Yi-Fei Huang and Adam Siepel. Poster presented at [BioData’16].

(* authors with equal contribution)


  • June 2017 - Present

    Besides working on cool data structures for Pufferfish and de Bruijn graph-based mappers, I got really interested in publicly available big data. With Rob, I am now figuring out the maze of the Sequence Read Archive and trying to find meaningful insights in this exponentially growing public database. Keep checking for exciting upcoming news. Released [bioRxiv].

  • March 2017 - May 2017

    We worked on a new way of solving the abundance calculation problem for RNA-seq read mapping. Kallisto and RapMap are not as accurate as you might think in special cases, such as those discussed here. We propose Selective Alignment. Preprinted [bioRxiv].

  • Oct 2016 - Feb 2017

    We worked on an RNA-seq compression algorithm and developed a state-of-the-art RNA-seq compression tool that introduces the idea of semi-reference-based compression, now published in Bioinformatics. I also worked on a poster with Prof. Siepel, presented at BioData.

  • May 2016 - July 2016

    I collaborated with Prof. Adam Siepel and learned a lot about probabilistic graphical models. I developed a model that incorporates information from GRO-seq and RNA-seq data. Such multimodal analysis has not been explored before.

  • May 2016

    Yay, I passed my Research Proficiency Examination [slides]. Honored to have Prof. Mike Schatz (JHU), Prof. Adam Siepel (CSHL) and my supervisor Prof. Robert Patro on my committee.

  • May 2015 - May 2016

    I was excited to start working with Prof. Robert Patro. By the end of almost a year, my labmates and I had developed two tools, RapMap and RapClust.

  • Aug 2014 - May 2015

    I joined Stony Brook University and received the prestigious CS Chair Fellowship of $10,000. I met my supervisor, Rob Patro, and cleared all 5 qualifier courses :).


Feel free to contact me for more information!