For many years, medical discoveries and advancements were limited to people who had medical training and access to complicated lab equipment. Nowadays, you don’t necessarily need a medical background to create something that can improve or save lives. Tools like data science and machine learning allow people with different educational backgrounds to analyze medical data. With these skills, they can draw on their professional perspectives and experience to come up with fresh ideas that can potentially benefit all of us.
Carlos Coral-Gomez is the poster child for this trend. Originally a physicist and seismologist, he began studying cancer after a scare in his family. Relying on his experience in data analysis and optimal control systems research, he came up with an idea for a biomedical tool that could help doctors diagnose and classify diseases like cancer. With the help of TripleTen, he is now immersing himself in data science to make his project a reality.
Something worth fighting for
Carlos Coral-Gomez has spent his entire career working with data. He was first introduced to data as a college student in the school of physics and mathematics and later as a graduate student of geophysics in Russia. His work with data continued when he returned to Colombia and became a professor and consultant in seismology and seismic risk assessment. Because of the nature of geophysical processes, his job involved working with large amounts of data in order to infer characteristics of geophysical phenomena that are not directly observable, such as earthquake sources and lithosphere structures.
“Geophysics is like a kind of ‘forensic science’ in which data, as evidence, is used in ‘reverse’ to solve an ‘inverse problem’ that tries to understand or deduce the cause knowing its effect. The inverse problem is an amazingly interesting problem of physical sciences which was thoroughly investigated by Russian scientists well before data science became formally established at the end of the 20th century.”
Computers at that time were “primitive” compared to those we have today, and researchers had to write a separate program for every geophysics model they needed to build. “At that time, we had to produce our own computer programs to solve specific mathematical problems. It was time-consuming, and it took a lot of effort trying to produce a good program.” “Nowadays,” says Carlos, “you can solve a data-related problem with just a couple of lines of code, but thirty years ago it took weeks or months, writing hundreds or thousands of lines of code.”
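To make the idea of an inverse problem concrete, here is a minimal sketch in Python. It is not Carlos's code or method: the straight-line travel-time model, the station distances, the wave speed, and the origin time are all hypothetical values chosen for illustration. The forward model predicts the effect (arrival times) from the cause (wave speed and origin time), and the inverse step recovers the cause from the observed effect with an ordinary least-squares line fit.

```python
from statistics import mean


def forward(distances_km, speed_km_s, origin_time_s):
    """Forward problem: predict arrival times from a known cause.

    Assumes a simple hypothetical model: arrival = origin_time + distance / speed.
    """
    return [origin_time_s + d / speed_km_s for d in distances_km]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Synthetic "observations" generated from made-up true values (illustration only).
distances_km = [10.0, 25.0, 40.0, 60.0, 85.0]
observed_s = forward(distances_km, speed_km_s=6.0, origin_time_s=2.0)

# Inverse problem: infer the cause (speed, origin time) from the effect (arrivals).
slope, intercept = fit_line(distances_km, observed_s)
estimated_speed = 1.0 / slope      # the slope of the fitted line is 1 / speed
estimated_origin_time = intercept  # the intercept is the origin time

print(f"speed ~ {estimated_speed:.2f} km/s, origin time ~ {estimated_origin_time:.2f} s")
```

Because the synthetic data contain no noise, the fit should recover values very close to the 6.0 km/s and 2.0 s used to generate them. Real geophysical data are noisy and the models are far more complex, which is where the weeks of hand-written code Carlos describes used to go.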
Carlos could have lived and thrived in Colombia as a physicist and seismologist, but his career took an unexpected turn when tragedy struck his family. His two-year-old daughter was diagnosed with a rare type of cancer. In search of medical help, Carlos and his family moved to the U.S. At the hospital, Carlos met patients of different ages with distinct types of cancer. He was devastated. His analytical mind, though, sprang into action. He remembers asking himself, “How could a professional physicist help people alleviate the suffering caused by such a malignant disease?”
Connecting the dots
Carlos realized that cancer could be perceived as a failure of a control (regulation) process. Normally, a cell controls all the processes within and around it. Sometimes, something in this process goes wrong, and the cell can’t divide the way it’s supposed to, causing various diseases. His interests and thoughts started to migrate to the field of genomics, the study of human genes and their interaction with each other and with a person's environment. Trying to get to the heart of the matter, he enrolled in a master’s program in biomedical engineering and biotechnology at the University of Massachusetts.
Meanwhile, Carlos worked as an analyst in the field of optimal estimation theory and control of dynamic systems. This is a cross-disciplinary field that includes physics, math, computer science, statistics and decision theory, data processing and optimization, and optimal control theory. “I would say that this field is also a part of machine learning, except that it is concerned with dynamical systems, rather than with static, ‘non-evolving’ systems,” he says.
Inspired by this concept, he envisioned the genome as a dynamic system. With this idea in mind, Carlos decided to create a prototype of a medical tool that would help doctors diagnose, classify, and make prognoses for genome-related diseases such as cancer. Essentially, he wanted to create a computational genomic system.
Not being a computer scientist by education, Carlos built the first prototype using languages he knew from his previous experience: MATLAB, C, C++, and C#. While he managed to create a user-friendly interface, he also wanted to achieve good data visualization. However, building his own computer programs from scratch for every new task was exhausting. After realizing he needed another programming language to solve both of these issues, he started googling machine learning and came across TripleTen's Data Science course.
“I was looking on the web for some info on machine learning. Suddenly, I saw that TripleTen course, and it caught my attention. When I saw the topics covered in the course, I noticed they included ones I had been interested in for a while, particularly Python programming. I wasn't looking for a course like this one, but there it was.”
The course’s intensity and composition, combined with the affordable price and knowledge of the Yandex brand, convinced Carlos to give it a try.
Recycling his knowledge
Although Carlos had studied many programming languages, from Fortran to C++, he wasn’t initially aware of the potential of Python, a language widely used for machine learning.
“I saw that in the last three or four years Python has developed at a fast speed. There is a lot of work being done in Python, so I think that it’s a good choice, especially for machine learning.” The course was difficult at first, he says. The curriculum was intense, and he had never studied Python before, unlike many of his peers who had learned it in school. Moreover, he was used to things working in certain ways in other programming languages, and Python didn’t always behave the same way, so it took him a while to adjust. “So, to work out the projects, I spent a lot of time consulting the internet, learning how to use Python commands and properties.” However, Carlos was in no hurry to finish each sprint: he had stopped working due to the pandemic, so he decided to focus fully on the course and take the time to thoroughly understand each new concept.
“I’m motivated because the course is fascinating. I think it's getting more engaging than it was at the beginning because I'm getting more familiar with the programming,” he shares.
What’s more, Carlos had to adjust to working with business-related data, which is the focus of the projects he must complete in the Data Science course. However, it turned out that scientific and research skills and knowledge could be transferred quite easily into the business domain. “We learn in the first course of physics, which is mechanics, that wherever there is a force, there will always be a reaction.” Of course, it's not a simple formula, he explains, but the concept itself is the same: if you apply a force or action to a business, you obtain a reaction. The environment is still different, though: “The thing is that you need to understand how to pre-process the data in order to transform it to a suitable form for a certain algorithm.” Carlos isn’t afraid of the challenges. On the contrary, he’s enjoying the difficulty of the projects. “You know, I like challenges... ever since I was a kid. I think that's a way of living… I like the process of solving something that you don't know the solution to,” he shares.
So, what’s next? Halfway into the course, Carlos remains determined. “Now I already know a little bit about Python and visualization. And now I'm really into the path of machine learning. The next topic will be machine learning in business, and that will be exciting,” he says. He plans to update the first version of his prototype with the new skills he’ll acquire. “It is a computer program or ‘toy’ developed to demonstrate the proof of concept of the proposed computational genomic system. The system is conceived as a tool to help physicians with the diagnosis, classification, and prognosis of genome-related diseases like cancer, and it is based on the extraction of biomedical knowledge from large amounts of genomic and related data. My educated guess is that such a system might also find applications in other fields of life sciences, biopharma, and biotechnology.”
We’re rooting for you, Carlos! If you would like to explore the world of data science, we can help! Start learning with us and take your career to the next level.