I will post here about my ventures into Data Science and Machine Learning. The posts will mostly cover papers from Bioinformatics, since it is the area of my Ph.D. and the domain I follow most closely. Unlike computer vision, which has long received a lot of attention from the ML community, Bioinformatics has only recently become an application area for various deep learning techniques.
I am a computer science Ph.D. student at Stony Brook University. In collaboration with my advisor Rob Patro, I primarily work on two broad areas of computational biology. On one hand, I design ultra-fast mapping algorithms and compression schemes for raw sequencing data. On the other hand, I apply machine learning algorithms to analyse heterogeneous, large-scale public datasets.
I received my master's degree in Computer Science from the Indian Statistical Institute (ISI) in 2013. Before that, I completed my bachelor's degree (B.Tech) in Computer Science at Kalyani Govt. Engineering College in 2011. At ISI, I defended my master's thesis in graph theory under the supervision of Prof. Bhargab Bhattacharyya and Prof. Sandip Das. Before joining Stony Brook, I spent some time as a researcher at the National University of Singapore. I also worked as a research fellow at the Indian Institute of Technology (IIT), Kharagpur.
For a brief period I was a visiting researcher at the Indian Statistical Institute, while also serving as a guest lecturer in the Computer Science Department at Vivekananda University. I had the great fortune to briefly work with Swami Sarvottamanandaji, fondly known as Sreesh Maharaj. During those days, I somehow acquired an Erdős number of 3, of which I am still extremely proud.
A space and time-efficient index for the compacted colored de Bruijn graph, by Fatemeh Almodaresi*, Hirak Sarkar*, and Rob Patro. [bioRxiv’17]
Towards selective-alignment: Bridging the accuracy gap between alignment-based and alignment-free transcript quantification, by Hirak Sarkar*, Mohsen Zakeri*, Laraib Malik, and Rob Patro. Submitted to Bioinformatics, 2017. [bioRxiv’17]
Pufferfish: A fast graph-based indexing and query strategy for large genomic sequences, by Fatemeh Almodaresi*, Hirak Sarkar*, and Rob Patro. Poster presented at [WABI’17].
Joint probabilistic model for multiple steps of gene regulation, by Hirak Sarkar, Yi-Fei Huang, and Adam Siepel. Poster presented at [BioData’16].
(* authors with equal contribution)
I worked on three different problems at Facebook, Menlo Park, ranging from optimizing scalable data pipelines to applying transfer learning to satellite images. I gained expertise in large-scale query optimization, spatial clustering, and deep learning-based image classification techniques.
Besides working on cool data structures for Pufferfish and de Bruijn graph-based mappers, I got really interested in publicly available big data. With Rob, I am now finding my way through the maze of the Sequence Read Archive, trying to extract meaningful insights from this exponentially growing public database. Keep checking for exciting upcoming news. Update: released on [bioRxiv].
I collaborated with Prof. Adam Siepel and learned a lot about probabilistic graphical models. I developed a model that incorporates information from both GRO-seq and RNA-seq data; such multimodal analysis had not been explored before.
I joined Stony Brook University and received the prestigious CS Chair Fellowship of $10,000. I met my supervisor, Rob Patro, and cleared all five qualifier courses :).
Feel free to contact me for more information!