Inside Phosphorus: Meet Roman Shraga, Data Scientist!

Hey Roman, tell us a little bit about yourself!
My family immigrated to Brooklyn from Uzbekistan when I was two years old, and I grew up here in New York City. I majored in Computer Science at Brown University, and my first job out of college was at Microsoft, where I was on a team that built machine learning models to enhance the experience on Bing, Microsoft’s search engine. After that, I worked for two years at a company called PlaceIQ, where I built analytics pipelines to process and analyze terabytes of data. Now I work with geneticists, molecular biologists, and engineers, using my data science skills to unlock the full power of genomics.
What is your title at Phosphorus? What do you do for the company?
My title is Data Scientist and my role involves using machine learning and statistics to derive value from data. As a genomics company, we have many sources of data: raw genetic information from our lab, clinical data shared with us by our research partners, and all kinds of usage and logistics data from our applications. It is my job to use this data to answer questions and improve our products.
What do you enjoy most about your job?
I like that the work we do has a tangible impact on people’s lives. Whether we develop a new assay, improve something about our process, or push the envelope with research, our work makes a difference. This kind of environment fosters learning, encourages collaboration, and makes everyone want to overachieve. It is really a privilege to go to work each day with a group of people who share these values.
What have you learned from working at Phosphorus?
I’ve learned so much that it is hard to keep this answer brief! I have learned about the complexity and beauty of human biology, about the healthcare ecosystem, about what it takes to run a clinical laboratory, about bioinformatics and working with genomics data, and about the power of modern software to improve the functioning of complex processes.
What has been your favorite project so far?
My favorite project has been building tools to detect copy number variants using next-generation sequencing data. Copy number variants, in which a segment of DNA is deleted or duplicated, are an important type of structural variation and are known to cause disease in certain cases. Detecting them accurately is an important capability for clinical genetics, but doing so is challenging for a variety of reasons. I have been developing normalization techniques and algorithms to uncover these variants.
Do you have a personal mantra?
In terms of data modeling, I think complexity has to be earned. That is, I believe you should choose the simplest model possible and prove that any additional complexity you introduce brings a measurable improvement in performance. In my field, people are too often excited to use a shiny new tool, even if it offers no benefit over a simpler, more understandable, and easier-to-maintain solution.
Tell us something about you most people don’t know.
I am an avid watcher of cooking and home renovation shows. Top Chef and House Hunters are my favorites.
What is one thing you are looking forward to in your free time in the coming year?
At the beginning of December, I’m getting a puppy!
Do you have any hobbies?
I like discovering new and interesting places. One of the things I love about living in New York is that there is always a new restaurant to try, an interesting event to attend, or an undiscovered neighborhood to stroll around. I try to experience something new as often as I can.
Which ice cream flavor best describes you?
Mint chocolate chip.