In late February, researchers at McMaster University in Ontario, Canada, developed a tool that they believe can help determine how the virus that causes COVID-19 is spreading and whether it is evolving. Ultimately, their aim is to share it with the international health sciences community.
As explained by the researchers at the time, the tool is a set of molecular “fishing hooks” designed to isolate the virus, SARS-CoV-2, from biological samples. This enables laboratory researchers to gain insight into the properties of the isolated virus by then using technology called next-generation sequencing.
“You wouldn’t use this technology to diagnose the patient, but you could use it to track how the virus evolves over time, how it transmits between people, how well it survives outside the body, and to find answers to other questions,” said principal investigator Andrew McArthur, Ph.D., associate professor of biochemistry and biomedical sciences, and a member of the Michael G. DeGroote Institute for Infectious Disease Research (IIDR) at McMaster. “Our tool, partnered with next-generation sequencing, can help scientists understand, for example, if the virus has evolved between patient A and patient B.”
Specializing in infectious diseases and superbugs, McArthur runs the university’s McArthur Lab. He and his team are fighting the virus through data, and he recently spoke with Managing Editor Rajiv Leventhal about how his lab is using technology to speed up the decision-making process for the global battle against COVID-19, the core lessons learned from his research on the virus so far, his thoughts on vaccine development, and more. Below are excerpts of that interview.
Can you detail how this genotyping tool has progressed from its initial development earlier this year?
There are two sides to this, the laboratory side and the analytical side, and we contributed to both. Our favorite phrase here is, ‘you play to where the puck is going.’ Last summer, before all this started, we began collaborating with the [Toronto-based] Sunnybrook Health Sciences Centre’s infectious disease group, and asked them what their biggest problem was. They said it was viral respiratory infections, and that there isn’t a decent diagnostic tool for about 30 percent of the viruses that infect people, meaning the physician often has to guess. My group specializes in genomic surveillance: capturing and sequencing the genomes of pathogens and then figuring out exactly what they are and what they are doing. We do that at a computational surveillance level, but we also translate that data into workable lab techniques.
We started working with Sunnybrook to develop a “bait capture tool” that can specifically isolate respiratory viruses. We were designing this to capture every known respiratory virus on the planet. We were finishing the design phase and getting ready to conduct trials for Sunnybrook when the SARS-CoV-2 coronavirus came out. We did include coronaviruses [in our capture] because there are seasonal ones, as well as MERS and the original SARS, but when we saw the Wuhan genome, we knew this [capture] wouldn’t contain [that virus]. So we rapidly pivoted and designed a bait capture platform to capture just this virus. We weren’t the only ones; our colleagues at Harbor BioSciences in Michigan, who we often collaborate with, did the same thing almost to the same hour. So we collaborated to make the best possible tool.
Molecular epidemiology goes beyond a simple lab test that asks whether a person is infected (yes/no), which is what you need for front-line epidemiology and contact tracing; molecular tracing uses the entire genome sequence of the pathogen. In this case the virus is over 28,000 base pairs long, [meaning] 28,000 data points per infection. That gives you very high resolution to say, OK, I can now connect the strain to this part of Ontario and this patient.
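To illustrate the idea of treating each genome position as a data point, here is a minimal Python sketch. It assumes the genomes have already been aligned to the same length, and it counts the positions at which two sequences differ, then picks the closest reference. The function names, region labels, and toy sequences are hypothetical and are not the McArthur Lab's pipeline; real genomes are tens of thousands of bases long and real analyses use phylogenetic methods rather than a single distance.

```python
def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both genomes have an unambiguous base (A/C/G/T) and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in valid and y in valid and x != y  # skip gaps and ambiguous calls such as N
    )


def closest_reference(query: str, references: dict) -> str:
    """Return the name of the reference genome with the fewest differences from the query."""
    return min(references, key=lambda name: snp_distance(query, references[name]))


# Toy, invented sequences standing in for aligned genomes (hypothetical labels).
references = {
    "region_A": "ACGTACGTAC",
    "region_B": "ACGTTCGTAA",
}
query = "ACGTTCGNAA"  # N marks a position the sequencer could not call

print(closest_reference(query, references))
```

With only a handful of differences between strains, each informative position matters, which is why the full genome gives far more resolution than a yes/no test.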
We now know New York City was the first wave, and the bulk of their infections came from travelers from Europe. They [concluded] that by analyzing the genome of the virus. So we worked on building those lab techniques and ended up using those with Sunnybrook, and helped them isolate the first culture of the live virus in Canada. When you culture a virus, you are growing it in cells to test drugs and vaccines. What’s critical there is to make sure the virus doesn’t become adapted to the lab, and that it still reflects what it does in human beings. So we constantly sequence the strain growing in the lab to make sure it hasn’t changed at all, and that it still reflects what human beings are undergoing in the communities.
The challenge then becomes that everyone globally is sequencing. Canada is sort of federated, so we have a national program that is run at the provincial level to sequence every positive case in the country. The goal is to sequence 150,000 cases, and McMaster will handle about one-third of Ontario patients. So you now have an incredible amount of data, and other countries are doing it, too, so you enter [into] a data analytics problem.
This type of analytics and computer science is what you call NP[-hard]: you cannot write a mathematical formula to solve it; you have to do it with empirical and heuristic searching of massive amounts of data. When everyone sequences these viruses, the data goes to a global repository that uses the gold standard method, which takes days. So you might upload data from your community and you [may] wait a week or two until the results are available. It’s designed to work at the global level, so you can see if the strains you are seeing are related to the strains in the Netherlands, for example.
We worked with a group at the Vector Institute for Artificial Intelligence [based in Toronto], where we introduced the problem that we need faster answers to these questions. We then applied machine learning to this data, and now they have this coronavirus genotyping tool that is running. It allows us to use the massive global database to rapidly analyze our sequences in context, to see if the ones I sequenced last night are related to the cases in New York, for example.
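The interview does not describe how the genotyping tool works internally. As a rough illustration of what a fast, heuristic comparison against a large database can look like, the sketch below ranks database genomes by shared k-mers (short substrings) using Jaccard similarity, a common alignment-free shortcut. This is a swapped-in technique chosen for illustration, not the Vector Institute method, and the function names, labels, and toy sequences are hypothetical.

```python
def kmers(seq: str, k: int = 4) -> set:
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_matches(query: str, database: dict, k: int = 4) -> list:
    """Rank database entries from most to least similar to the query by shared k-mers."""
    q = kmers(query, k)
    scores = [(name, jaccard(q, kmers(seq, k))) for name, seq in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy, invented sequences (hypothetical labels); real genomes are far longer.
database = {
    "new_york": "ACGTTGCAAGGCT",
    "netherlands": "ACGTAGCTAGGAT",
}
query = "ACGTTGCAAGGCA"

for name, score in rank_matches(query, database):
    print(name, round(score, 3))
```

Set operations like these avoid a full alignment, which is the kind of trade-off that lets a heuristic answer in minutes a question the gold standard method would take days to resolve.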
The two biggest bottlenecks are how fast our DNA sequencer can run, and then how fast the data can be analyzed. So we are trying to buy faster sequencing gear, and to no one’s surprise, the market is pretty well sold out at the moment. But we are solving [the need for] higher computation.
[Officials point out that the lab replaced its traditional legacy storage infrastructure with Pure FlashBlade, a file and object storage designed to support highly complex processes. This sped up the time to research and seek cures for superbugs and to sequence genomes for COVID by allowing analysis of select data sets 24X faster, they attest, noting that it allows the McArthur Lab to keep up with DNA sequencing data, generating insights that lead to faster identification of global threats].
From your research in which you isolated infected patients and studied genomic sequences, what have been the core lessons learned?
Our top goal is to have data on how the virus came to our shores and how it spread, but there is a lag in getting that [knowledge]. Iceland and Australia already have that data, and Canada will get it soon. Another goal is to always watch the virus to make sure that it’s not evolving to become more dangerous or infective. Some viruses, like influenza, can evolve very quickly, [causing us] to retool the vaccine every year.
We know now from Canada and international sequencing that this coronavirus evolves very slowly. It’s tough to even say there are differences between the strains; they are really similar and evolve so slowly, so the virus is the virus. This is a really good thing since it means it’s unlikely to become more dangerous, with greater virulence. More important than that, if we spend millions of dollars to bring a vaccine to the table, it would be heartbreaking to find out that the virus has evolved away from it.
Today, we have high odds that if someone produces a viable vaccine, it will work since this coronavirus will not have evolved. It also means that if a vaccine can induce a big enough immune response, which is still an open question, you probably don’t have to use a different vaccine every year while we try to eliminate this virus. All the sequencing to date supports that this is a stable genome.
We do see what others have reported, that there’s a known mutation in the spike protein that others have associated with higher rates of infectivity. The lab evidence on that is still mixed, but we see this mutation more and more over time. So from a population genetics point of view, it supports the hypothesis that this thing has some sort of advantage in how it infects people. There is no evidence that it has different outcomes, so you don’t get any sicker with it, it just means that if you’re getting exposed to coronavirus in a community, this specific one has a higher chance of getting in your lung cells. The practical implications of that, however, are not different in terms of wearing PPE and washing your hands, etc.
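The observation that a mutation shows up "more and more over time" comes down to tracking its share of sequenced samples per time window. The minimal Python sketch below does that for monthly windows. The data is invented purely to show the calculation, and the function name and input format are hypothetical; a rising frequency on its own is suggestive rather than conclusive, consistent with the mixed lab evidence mentioned above.

```python
from collections import defaultdict


def variant_frequency_by_month(samples):
    """Given (month 'YYYY-MM', carries_mutation) pairs, return {month: fraction carrying it}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for month, carries_mutation in samples:
        totals[month] += 1
        if carries_mutation:
            hits[month] += 1
    # Sort months so the trend reads chronologically.
    return {month: hits[month] / totals[month] for month in sorted(totals)}


# Invented toy data: each tuple is one sequenced sample.
samples = [
    ("2020-03", False), ("2020-03", False), ("2020-03", True), ("2020-03", False),
    ("2020-04", True), ("2020-04", False),
    ("2020-05", True), ("2020-05", True), ("2020-05", True), ("2020-05", False),
]

for month, fraction in variant_frequency_by_month(samples).items():
    print(month, fraction)
```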
To that end, what’s your perspective on the vaccine development process? Are you confident we will get an effective one, and if so, when might we see that come to market?
There are three parts to vaccine development, with the first being the safety aspect. It has to be a safe vaccine without [major] side effects. We are confident on that; we have a good track record on vaccines not having side effects, with decades of experience. The other side of the coin is that the vaccine has to produce a strong and lasting immune response, and the door is still open on that. We don’t know. There are many options; we can get a coronavirus vaccine that is fantastic—one injection and you’re immune for the rest of your life. We could also get one that’s more seasonal, so you need another one every six months until the virus has been eliminated.
The third piece is if the target is stable. However we build the vaccine, will it target the [virus]? That’s my job as a genomicist, and I am quite confident. So many people have gotten infected in this pandemic, and if the virus were going to mutate in an expected way, we would have seen it by now because it’s gone through so many people. We have sequenced so many strains of this virus and so many patients, and we’re not seeing it drift in the genome space. I remain hopeful that what comes out of this is a vaccine that delivers a powerful immune response.
Herd immunity is a term that’s been thrown around often with this virus. What have you learned about this coronavirus potentially weakening or burning out after infecting a certain number of people?
We know so little about coronaviruses as a family, and they are very diverse. The “burns itself out” theory kind of assumes that you are getting herd immunity, and the immunity question is an open door. We don’t know if we’re generating a herd immunity, or if we’re generating an immunity that lasts just four weeks, two months, or six months. If we knew that this virus generated a stronger immune response, and it was robust, then the herd immunity model would work, and over time this would become a rarer thing, confined to unexposed populations.
There is no strong data yet of true reinfection. We have had cases of people turning up positive again, but we don’t know for sure if that was a latent, inactive virus that was in that person’s body the whole time, or if he or she was really reinfected. I think it’s safe to assume that this isn’t something that will just go away naturally. And there is nothing artificial about the virus; we interacted with it by encroaching upon the ecosystem of its host. But in this case, because of the scale, because it’s so novel, and because it has jumped into a human being where it’s not meant to be, we will probably need therapeutics and vaccines, or what we’re doing now, which is social behavior as our most dominant tool.
What’s next for you and your team?
We still have about 35,000 positive samples in Ontario to get through, and that will require a heck of a lot of computation. We need to get that baseline done because Canada’s future is deciding when to open borders. And we will get outbreaks. Our focus needs to be rapid response, so if there’s an outbreak in a community, we want to sequence the genome to [analyze] what caused the outbreak.