How Data Science and Machine Learning are helping to fight Covid
In this article, we discuss different ways in which data science can help in preventing Covid deaths and infections, along with algorithms used and datasets available.
High level overview of different ways in which data science can help in the Coronavirus pandemic.
Covid is a worldwide problem. Unlike in previous epidemics and pandemics such as the plague, Ebola, and SARS, we are now blessed with well-developed data collection and data management tools and techniques, as well as the computing resources and algorithms to understand and ultimately fight the spread of the virus.
Estimating the mortality of people infected with Covid, identifying risk factors, modelling the growth of the disease, developing vaccines, and sequencing the viral genome all involve data, so data science can help with each of them. Unlike previous epidemics, then, controlling Covid is as much a data war as a medical war.
Current large scale Covid related projects running in universities, governments, and companies
- The US has a Covid tracking project (https://covidtracking.com/) that collated all the data in one place to track the growth of the pandemic as a whole.
- The European Union has a number of projects (https://ellis.eu/covid-19/projects).
- Indian universities such as IISc have had a number of Covid related projects (https://covid19.iisc.ac.in/).
- The World Bank has projects (https://www.worldbank.org/en/who-we-are/news/coronavirus-covid19) related to the effect of Covid on the economy, such as projections of how much demand there will be for goods and services in economies affected by Covid.
- IEEE has funding for grassroots projects related to Covid (https://hac.ieee.org/funding-opportunities/covid-19-projects/).
Some of these projects might have funding available, but eligibility might be constrained by country or region. For example, US based projects might be limited to US researchers.
Use of data science in contact tracing of Covid infected persons
Contact tracing is critical in controlling the spread of a pandemic like Covid. It works by finding out whom each infected person has come in contact with over the last few days, and then quarantining those contacts in turn so that the disease does not spread further.
Many governments have introduced contact tracing apps. Some examples are:
- Singapore’s SQREEM Covid contact tracing app used machine learning models to predict how many people might have come in contact with a given person over a given time. It works this way: given the person’s home and office locations and the number of devices that have entered those locations (in 5 square meter blocks) in a given time, it predicts the number of people likely to have been in contact with the infected person. https://www.aithority.com/technology/analytics/tracing-a-million-steps-sqreem-launches-ai-driven-contact-tracing-and-communications-platform-to-fight-covid-19/
- UK’s NHS Covid 19 App (https://www.nhsx.nhs.uk/covid-19-response/nhs-covid-19-app/).
- India’s Aarogya Setu app
Models using data visualization and analytics tools can be used, along with contact tracing apps, to track the spread of the virus. Graph models can be used to model people’s connections and get an estimate of how fast a disease is likely to spread in an area.
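As a rough illustration of the graph idea, the sketch below stores contacts as an adjacency list and uses breadth-first search to find everyone within a given number of contact "hops" of an infected person. The names `contacts_within` and `contact_graph` and the toy data are assumptions made for this example; it does not describe how any of the apps above actually work.

```python
from collections import deque


def contacts_within(graph, source, max_hops):
    """Return {person: hops} for everyone reachable from source in at most max_hops."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in graph.get(person, ()):
            if other not in hops:
                hops[other] = hops[person] + 1
                queue.append(other)
    del hops[source]  # the infected person is not their own contact
    return hops


# Hypothetical contact graph: each person maps to the people they met.
contact_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

# Direct contacts of "A" and the contacts of those contacts.
exposed = contacts_within(contact_graph, "A", 2)
print(exposed)
```

The size of the result at each hop count gives a crude sense of how quickly an infection could fan out through the network.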
Use of data science to track the spread of Covid, and to decide which areas to lock down and which areas to open
MIT researchers trained a neural network to estimate how much effect quarantine has on controlling the spread of Covid. They started from epidemiological models (used to analyze how epidemics spread), including the SIR (susceptible, infected, recovered) and SEIR (susceptible, exposed, infected, recovered) systems of differential equations, added some parameters for quarantine control, and fitted the overall model to real data using a neural network. Their paper is here: https://www.medrxiv.org/content/10.1101/2020.04.03.20052084v1
One can read more about such models from this Wikipedia article: https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology
A good introduction to the SIR model is here: https://science.thewire.in/the-sciences/coronavirus-pandemic-infectious-disease-transmission-modelling-kermack-mckendrick-theory-seir-model/
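To make the SEIR idea concrete, here is a minimal sketch that steps the standard SEIR equations forward in time with simple Euler integration. It is only the plain textbook model, not the quarantine-augmented neural network model from the MIT paper. The function name `simulate_seir` and all parameter values are assumptions chosen for illustration, not estimates for Covid.

```python
def simulate_seir(beta, sigma, gamma, population, initial_infected, days, dt=0.1):
    """Euler integration of the SEIR equations; returns one (S, E, I, R) tuple per day.

    beta  : transmission rate (contacts per day that pass on the infection)
    sigma : rate at which exposed people become infectious (1 / incubation period)
    gamma : recovery rate (1 / infectious period)
    """
    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    r = 0.0
    history = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative (made-up) parameters: 5-day incubation, 10-day infectious period.
history = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                        population=1_000_000, initial_infected=10, days=160)
peak_infectious = max(i for _, _, i, _ in history)
print(f"peak number infectious at one time: {peak_infectious:.0f}")
```

Lowering `beta` in this sketch is a crude stand-in for measures such as quarantine that reduce contact between people.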
Use of data science to form projections of how much the Covid curve will rise, how long it will take to fall and how it can be flattened
The same epidemic models (like SIR and SEIR) discussed earlier can be used to form projections of the Covid curve for a particular country or region. Such a curve would predict how much it will rise and when it will flatten. Using these projections, governments might be able to make decisions about when and how to open up the wider economy.
Using a data driven model, one project released projections of infections and deaths for all of the US states and for other countries. It was also based on the SEIR model to simulate the epidemic. https://covid19-projections.com/
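A projection of this kind can be sketched in two steps: fit a model parameter to the cases seen so far, then run the model forward. The example below does this with a simple daily-step SIR model and a grid search over the transmission rate. It is an assumption-laden toy: the "observations" are synthetic (generated from a known rate), the names `sir_infected_curve` and `fit_beta` are made up for this example, and it is not the method used by the site above.

```python
def sir_infected_curve(beta, gamma, population, initial_infected, days):
    """Daily-step SIR model; returns the number of infectious people on each day."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    curve = [i]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        curve.append(i)
    return curve


def fit_beta(observed, gamma, population, initial_infected):
    """Grid-search the transmission rate that minimises the squared error."""
    candidates = [k / 100 for k in range(10, 61)]  # 0.10, 0.11, ..., 0.60

    def squared_error(beta):
        model = sir_infected_curve(beta, gamma, population,
                                   initial_infected, len(observed) - 1)
        return sum((m - o) ** 2 for m, o in zip(model, observed))

    return min(candidates, key=squared_error)


GAMMA, POPULATION, SEED_CASES = 0.1, 1_000_000, 10  # illustrative values only

# Synthetic "observed" data: the first 20 days of an outbreak with beta = 0.30.
observed = sir_infected_curve(0.30, GAMMA, POPULATION, SEED_CASES, 20)

best_beta = fit_beta(observed, GAMMA, POPULATION, SEED_CASES)
projection = sir_infected_curve(best_beta, GAMMA, POPULATION, SEED_CASES, 300)
peak_day = max(range(len(projection)), key=projection.__getitem__)
print(f"fitted beta: {best_beta:.2f}, projected peak on day {peak_day}")
```

With real case counts the fit would be noisy, and the projected peak would shift as new data arrives, which is why such projections are usually updated daily.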
Use of data science in predicting how vulnerable a person is to Covid, and whether they will need hospitalisation
The risk of a person contracting Covid, and the likely severity if they do, can be predicted using features such as age, gender, social conditions, income, area of living, occupation, and lifestyle. A machine learning model may be fitted on these features using existing patient data, and then used to predict the risk for a new person. Using this knowledge, doctors may be able to better decide who needs immediate hospitalisation, who might need a ventilator, and who might be fine with just home quarantine.
There is a study in the British medical journal The Lancet Oncology on the risk factors for Covid, using available data from Covid patients: https://www.thelancet.com/journals/lanonc/article/PIIS1470-2045(20)30309-0/fulltext
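As a minimal sketch of fitting such a risk model, the example below trains a logistic regression with plain gradient descent on two features: age (scaled to 0 to 1) and whether the person has an existing condition. The patient rows are entirely made up for illustration, the function names are assumptions, and a real model would need far more data, more features, and proper validation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(weights, bias, features):
    """Predicted probability of a severe outcome for one person."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, features)))


def train_logistic(rows, labels, epochs=2000, learning_rate=0.1):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_risk(weights, bias, x) - y
            for j, value in enumerate(x):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Made-up training data: (age / 100, has_existing_condition) -> severe outcome?
rows = [(0.25, 0), (0.30, 0), (0.35, 1), (0.45, 0), (0.50, 0),
        (0.60, 1), (0.65, 0), (0.70, 1), (0.80, 1), (0.85, 0)]
labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

weights, bias = train_logistic(rows, labels)
older_with_condition = predict_risk(weights, bias, (0.75, 1))
younger_without = predict_risk(weights, bias, (0.30, 0))
print(f"{older_with_condition:.2f} vs {younger_without:.2f}")
```

The output is a probability, so a hospital could rank patients by predicted risk rather than apply a single hard cut-off.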
Available Covid specific data sets
Some of the available datasets are as follows:
2. https://data.world/datasets/covid-19
3. Kaggle competitions https://www.kaggle.com/tags/covid19
4. China Daily cases dataset https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/MR5IJN
5. Covid global hackathon https://covid-global-hackathon.devpost.com/
6. Public datasets from Google https://console.cloud.google.com/marketplace/browse?filter=solution-type:dataset&filter=category:covid19&pli=1
7. Compendium of datasets https://www.marktechpost.com/2020/04/12/list-of-covid-19-resources-for-machine-learning-and-data-science-research/
How data and knowledge are shared between different countries to fight the pandemic
Data science has also enabled efficient data sharing between different countries. Since Covid is a worldwide phenomenon, data sharing is absolutely critical in combating the virus. By studying the conditions in other countries and how the virus spread there, policy makers can help prevent Covid in their own countries. For example, they can determine that a particular area, or a particular group of patients, is more susceptible to Covid. Accordingly, they can create policies and make medical resources more available to those areas or those groups of patients.
For example, the Chinese government has made datasets on the Wuhan Covid patients available online. So have most European countries and the USA for their respective cases.
Data shared by Italy related to Covid cases: https://data.humdata.org/dataset/covid-19-mobility-italy
This link gives the datasets specific to each country: https://lionbridge.ai/datasets/coronavirus-datasets-from-every-country/
Use of data science in development of vaccines and medicines for Covid
There are multiple ways in which data science can be used in medicine to develop a treatment or vaccine for Covid.
Covid is a fast mutating virus, so it is important to identify its mutations quickly. Using machine learning (mainly decision trees) on distances between sequences, researchers identified the genomic signature of the Covid virus. This is their paper: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0232391
Another paper describes research on classifying novel pathogens on the basis of similarity to the Covid gene sequence (https://pubmed.ncbi.nlm.nih.gov/32330208/), so one can quickly identify whether a new virus is a variant of Covid or just another flu.
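One simple way to put a number on the "distance between sequences" is to compare how often each short substring (a k-mer) occurs in two sequences. The sketch below does this with a Euclidean distance between k-mer frequency profiles. This is an illustrative choice and not necessarily the distance used in the papers above; the sequences are short made-up strings, not real genomes, and the function names are assumptions.

```python
import math
from collections import Counter


def kmer_profile(sequence, k=3):
    """Relative frequency of every length-k substring in the sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    kmers = set(p) | set(q)
    return math.sqrt(sum((p.get(m, 0.0) - q.get(m, 0.0)) ** 2 for m in kmers))


# Toy sequences (made up): a reference, a one-letter variant, and an unrelated one.
reference = "ATGGCGTACGTTAGC" * 2
variant = reference[:25] + "A" + reference[26:]  # single substitution
unrelated = "TTTTAAAACCCCGGGG" * 2

ref_profile = kmer_profile(reference)
d_variant = profile_distance(ref_profile, kmer_profile(variant))
d_unrelated = profile_distance(ref_profile, kmer_profile(unrelated))
print(f"variant: {d_variant:.3f}, unrelated: {d_unrelated:.3f}")
```

A classifier such as a decision tree could then take distances like these, computed against a set of known viruses, as its input features.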
Google’s DeepMind has used machine learning to predict the structures of the proteins in the Covid 19 virus (https://deepmind.com/research/open-source/computational-predictions-of-protein-structures-associated-with-COVID-19). This knowledge would be essential in developing medicines and vaccines.
Similarly, research is ongoing to identify the antibodies in the bloodstream of patients. This will likewise help in the development of Covid vaccines.
EVQLV is a US based biotech startup company that uses machine learning techniques to generate millions of therapeutic antibodies to fight Covid.
There is also research on using machine learning to recognize patterns in the Covid gene in order to develop vaccines: https://www.brookings.edu/techstream/can-artificial-intelligence-help-us-design-vaccines/
When a new vaccine or medicine is developed, it is tested on patients in randomized controlled trials. Validating how effective the medicine is from the trial results also uses standard data science techniques.
Use of data science in the future to prevent the occurrence of epidemics and pandemics
Analysis of tweets by geography (using clustering and other techniques) and in real time can help quickly predict an emerging epidemic or other natural disaster. Use of NLP (natural language processing) and speech processing tools on local news reports can be similarly used to flag epidemics in real time before they become too large. Social networks can be used to predict the spread as well. Analysis of mathematical models like SEIR, as we discussed above, can model the growth of epidemics generally.
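A very small version of such real-time flagging is sketched below: count the posts per region that mention a symptom keyword, and flag a region when today's count is far above its recent baseline. This uses a simple z-score threshold rather than the clustering or NLP models mentioned above. The keyword list, region names, and counts are all hypothetical.

```python
from statistics import mean, pstdev

SYMPTOM_KEYWORDS = {"fever", "cough", "breathless"}  # hypothetical keyword list


def mentions_symptom(text):
    """True if the post contains any symptom keyword (very crude text matching)."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    return bool(words & SYMPTOM_KEYWORDS)


def flag_regions(daily_counts, threshold=3.0):
    """Flag regions whose latest count is more than `threshold` standard
    deviations above the mean of the earlier days."""
    flagged = []
    for region, counts in daily_counts.items():
        past_days, today = counts[:-1], counts[-1]
        baseline = mean(past_days)
        spread = pstdev(past_days) or 1.0  # avoid dividing by zero on flat data
        if (today - baseline) / spread > threshold:
            flagged.append(region)
    return flagged


# Made-up daily counts of symptom-related posts per region (last value = today).
daily_counts = {
    "north": [4, 5, 6, 5, 4, 5, 21],
    "south": [7, 8, 6, 7, 8, 7, 8],
}
flagged = flag_regions(daily_counts)
print(flagged)
```

In practice the counts would come from running something like `mentions_symptom` over geotagged posts, and a flagged region would only be a prompt for human investigation.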
Limitations of using data science to fight Covid
Having listed the different ways in which data science can be useful, it is worthwhile to remember that it has limitations, and over-reliance on it is not wise. Any data science model is only as good as its training data. So a model may fail if the data it is based on is inaccurate or not applicable, or if there are errors in the model itself.
Also, there are other factors involved when fighting Covid. A machine learning model can predict that the number of cases will rise to a certain number at a certain time, but the government has to make enough hospital beds and ventilators available by that time. The model may predict that a particular area should be locked down, but the government has to make sure that the people in that area have enough food and other provisions and that the lockdown is enforced properly. A contact tracing app may be very accurate, but it still depends on people following proper social distancing measures. So data science cannot control everything or solve the Covid problem by itself. It needs help and cooperation from the government and wider society for it to work.