The good data in Madrid continues
The regional authority tweeted out an upbeat graphic. Was it designed to send the wrong message?
- Daily reports from the regional health authority
- Weekly reports from the regional health authority
- More on the daily reports from the regional health authority
- Hospital data since September
import pandas as pd
import re
import matplotlib.pyplot as plt
import plotnine
from plotnine import *
import warnings
warnings.filterwarnings('ignore')
On 3rd October 2020, the official account of the regional authority of Madrid put out a series of tweets under the heading The good data in Madrid continues. Here we will attempt to reproduce the first graphic and its statistics, and reflect on whether it communicates an appropriate message.
This has also been covered by El Diario: ¿Está Madrid realmente doblegando la curva? Los datos que no muestran los tuits triunfalistas de Ayuso (Is Madrid really bending the curve? The data that Ayuso's triumphalist tweets don't show).
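📉 Continúan los buenos datos en la Comunidad de Madrid. Entre todos, derrotaremos al virus. ¡Vamos! 💪 pic.twitter.com/2xfkHELFsG
'📉 The good data in the Community of Madrid continues. Together, we will defeat the virus. Let's go! 💪'
— Comunidad de Madrid (@ComunidadMadrid) October 3, 2020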
'La evolución de los casos diagnosticados durante la semana de 21 al 27 de septiembre es un 25% inferior a los de la semana del 14 al 20 de septiembre.'
'The evolution of the cases diagnosed during the week of September 21 to 27 is 25% lower than those of the week of September 14 to 20.'
The graph includes the annotation of -24.6%. The first challenge is to generate this statistic.
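def find_statistic(date, a, b):
    # Print both weekly totals and the percentage change from week 1 (a) to week 2 (b)
    pct_change = 100 * (sum(b) / sum(a) - 1)
    print('report {}, total week 1: {}, total week 2: {}, % change: {:.2f}'.format(date, sum(a), sum(b), pct_change))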
find_statistic('El Diario', [26781], [20206])
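For reference, the percentage change that find_statistic prints is the ratio of the two weekly totals minus one, times 100. Worked through with the two totals passed in the call labelled 'El Diario', it lands on the annotated figure once rounded to one decimal place:

$$
\Delta\% = 100\left(\frac{\text{total week 2}}{\text{total week 1}} - 1\right) = 100\left(\frac{20206}{26781} - 1\right) \approx -24.55
$$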
From the daily report from the regional authority for 2 October 2020, the cases diagnosed for these two weeks were:
sept_14_20=[2148,5562,5101,4965,6658,1615,1011]
sept_21_27=[1956,4617,4206,3852,4508,1194,770]
find_statistic('2 Oct',sept_14_20,sept_21_27)
Let's have a look at earlier reports and see if we can get closer to -24.6%.
sept_14_20=[2145,5550,5089,4934,6643,1608,1008]
sept_21_27=[1949,4600,4190,3834,4461,1188,768]
find_statistic('1 Oct',sept_14_20,sept_21_27)
sept_14_20=[2139,5550,5072,4917,6633,1601,1001]
sept_21_27=[1941,4580,4152,3818,4325,1109,743]
find_statistic('30 Sept',sept_14_20,sept_21_27)
sept_14_20=[2133,5460,5057,4897,6605,1591,996]
sept_21_27=[1932,4550,4100,3727,3741,980,694]
find_statistic('29 Sept',sept_14_20,sept_21_27)
sept_14_20=[2129,5448,5046,4885,6583,1584,984]
sept_21_27=[1906,4498,3844,3236,3078,769,457]
find_statistic('28 Sept',sept_14_20,sept_21_27)
The variation in the percentage change is due to the series for the daily figures changing as the information is backfilled. The daily reports of cases confirmed by PCR are updated as new cases are notified. They are allocated to the date on which the test was taken.
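Casos positivos de Covid-19 confirmados por PCR
La Comunidad de Madrid consolida diariamente la serie de casos confirmados por PCR, asignando a los casos nuevos notificados la fecha en la que se toma la muestra. Se realiza una actualización diaria de la serie de casos que se adjunta.
'Covid-19 positive cases confirmed by PCR. The Community of Madrid consolidates the series of PCR-confirmed cases daily, assigning newly notified cases to the date on which the sample was taken. The attached case series is updated daily.'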
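To see the backfilling at work, the figures from each daily report quoted above can be collected in one place and the percentage change recomputed per report. This is a minimal sketch: the names `reports` and `pct_change` are illustrative, and the numbers are copied from the five reports above.

```python
# Daily case series for 14-20 Sept and 21-27 Sept, as published in each daily report.
reports = {
    '28 Sept': ([2129, 5448, 5046, 4885, 6583, 1584, 984],
                [1906, 4498, 3844, 3236, 3078, 769, 457]),
    '29 Sept': ([2133, 5460, 5057, 4897, 6605, 1591, 996],
                [1932, 4550, 4100, 3727, 3741, 980, 694]),
    '30 Sept': ([2139, 5550, 5072, 4917, 6633, 1601, 1001],
                [1941, 4580, 4152, 3818, 4325, 1109, 743]),
    '1 Oct': ([2145, 5550, 5089, 4934, 6643, 1608, 1008],
              [1949, 4600, 4190, 3834, 4461, 1188, 768]),
    '2 Oct': ([2148, 5562, 5101, 4965, 6658, 1615, 1011],
              [1956, 4617, 4206, 3852, 4508, 1194, 770]),
}


def pct_change(week_1, week_2):
    """Percentage change in total cases from week_1 to week_2."""
    return 100 * (sum(week_2) / sum(week_1) - 1)


# One percentage change per report, in the order the reports were published.
changes = {report: round(pct_change(w1, w2), 2) for report, (w1, w2) in reports.items()}

for report, (w1, w2) in reports.items():
    print(f'{report}: week 1 = {sum(w1)}, week 2 = {sum(w2)}, change = {changes[report]:+.2f}%')
```

The totals for the earlier week barely move between reports, while the later week keeps growing as late results arrive, which is why the apparent fall shrinks with every report.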
Looking at the latest date:
sept_14_20=[2161,5591,5123,4963,6696,1622,1027]
sept_21_27=[1985,4659,4248,3880,4572,1219,791]
find_statistic('6 Oct',sept_14_20,sept_21_27)
Based on the regional authority's latest daily figures, the change between the two weeks is -21.4%: a fall from 27,183 positive PCRs to 21,354 in a week.
plt.scatter(['28-09','29-09','30-09','01-10','02-10','05-10'],[-33.28,-26.24,-23.20,-22.19,-22.01,-21.54,])
plt.xlabel('date of report')
plt.ylabel('% change total week on week')
plt.ylim(-35,-20)
# get rid of the frame
for spine in plt.gca().spines.values():
spine.set_visible(False)
plt.title('Compare week 14-20 Sept. to 21-27 Sept.');
Table 1 of the weekly epidemiological report for week 39 (provisional data) contains data similar to that shown in the tweet.
| | Week 36 | Week 37 | Week 38 | Week 39 |
|---|---|---|---|---|
| Dates | 31-08 - 06-09 | 07-09 - 13-09 | 14-09 - 20-09 | 21-09 - 27-09 |
| Total | 19665 | 24402 | 28685 | 21981 |
| Change (ratio to previous week) | | 1.24 | 1.18 | 0.77 |
find_statistic('Weekly epidemiological',[28685],[21981])
# https://www.comunidad.madrid/sites/default/files/doc/sanidad/epid/informe_epidemiologico_semanal_covid.pdf
# Week 39 provisional data
Let's look at the weekly data provided by the regional authority for municipalities and districts.
munis_df=pd.read_csv('./fastpages/covid19_tia_muni_y_distritos_s.csv', delimiter=';', encoding='latin')
munis_df.fecha_informe=munis_df.fecha_informe.apply(lambda x: x[5:10])
munis_df.casos_confirmados_totales=munis_df.casos_confirmados_totales.fillna(0).astype('int')
munis_df[['municipio_distrito','fecha_informe','casos_confirmados_totales']].head()
df_casos=munis_df[['municipio_distrito','fecha_informe','casos_confirmados_totales']].pivot(index='municipio_distrito', columns='fecha_informe', values='casos_confirmados_totales')
df_casos=df_casos.fillna(0).astype('int')
df_casos.sort_values('09/29', ascending=False).head(5)
find_statistic('weekly_report', [sum(df_casos.iloc[:,-2]-df_casos.iloc[:,-3])],[sum(df_casos.iloc[:,-1]-df_casos.iloc[:,-2])])
weekly_new_cases=[sum(df_casos.iloc[:,-i]-df_casos.iloc[:,(-i-1)]) for i in range(1,11)][::-1]; weekly_new_cases
weekly_pct_change=[100*(-1+weekly_new_cases[i+1]/weekly_new_cases[i]) for i in range(1,len(weekly_new_cases)-1)];weekly_pct_change
It's not clear to me why the weekly municipality data show only a 4% fall in the number of cases in the last week, far from the 25% we are looking for.
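The differencing used on the municipality data can be checked on a small table. The sketch below uses made-up numbers in the same layout as df_casos (one row per district, one column per report date, cumulative totals as values); the district names and figures are hypothetical, and it uses pandas' own `diff` and `pct_change` instead of the list comprehensions above.

```python
import pandas as pd

# Hypothetical cumulative totals per report date (made-up numbers, same layout as df_casos).
toy = pd.DataFrame(
    {'09/15': [100, 50], '09/22': [160, 80], '09/29': [205, 110]},
    index=['district A', 'district B'],
)

# Cases added between consecutive reports, summed over all districts.
# The first column has no earlier report to compare with, so it is dropped.
weekly_new = toy.diff(axis=1).iloc[:, 1:].sum()

# Percentage change in those weekly additions from one report to the next.
weekly_pct = 100 * weekly_new.pct_change()

print(weekly_new)
print(weekly_pct)
```

One thing worth keeping in mind is that differencing cumulative totals by report date counts cases when they are notified, whereas the daily series assigns them to the date of the test, so the two need not agree. Whether that accounts for the gap seen here is not established.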
fig, ax = plt.subplots(1,1,figsize=(10,6))
plt.bar(df_casos.columns[-10:],weekly_new_cases)
plt.ylabel('cases')
plt.xlabel('date')
# get rid of the frame
for spine in plt.gca().spines.values():
spine.set_visible(False)
plt.title('cases added in previous week\nbased on report of 29/09/2020');
fig, ax = plt.subplots(1,1,figsize=(12,6))
plt.plot(df_casos.columns[-8:],weekly_pct_change[-8:])
plt.ylim(-40,100)
plt.grid(True, which='major', axis='y')  # 'b' was renamed 'visible' in newer Matplotlib; the positional form works in both
plt.xlabel('date')
plt.ylabel('% change')
plt.title('change in number of new cases added weekly');
daily_df=pd.read_excel('./fastpages/CAM_casos_diarios.xlsx', skipfooter=18).rename(columns={'Unnamed: 0':'fecha'})
daily_df.cumul_casos_201006.tail()
# Excel created by hand from the daily pdf reports
fig, ax = plt.subplots(1,1,figsize=(10,6))
plt.bar(daily_df.fecha[:-1],daily_df.casos_201006[:-1])
# get rid of the frame
for spine in plt.gca().spines.values():
spine.set_visible(False)
plt.title('cases added daily');
daily_df.fecha=daily_df.fecha.apply(lambda x: str(x)[5:10])
weekly_df=daily_df.iloc[::7].copy()  # every 7th row; copy so the assignments below do not write to a view of daily_df
weekly_df.loc[:,'cases_added']=weekly_df.cumul_casos_201006.diff().fillna(0).astype('int')
weekly_df['percent_change']=100*weekly_df.cases_added.pct_change()
weekly_df.loc[weekly_df.index[:2],'percent_change']=0
weekly_df[['fecha','percent_change']]
fig, ax = plt.subplots(1,1,figsize=(10,6))
plt.bar(weekly_df.fecha[1:-1],weekly_df.cases_added[1:-1])
# get rid of the frame
for spine in plt.gca().spines.values():
spine.set_visible(False)
plt.xlabel('date')
plt.ylabel('new cases')
plt.title('cases added weekly');
fig, ax = plt.subplots(1,1,figsize=(12,6))
plt.plot(weekly_df.fecha[2:-1],weekly_df.percent_change[2:-1])
plt.ylim(-40,100)
plt.grid(True, which='major', axis='y')  # 'b' was renamed 'visible' in newer Matplotlib; the positional form works in both
plt.xlabel('date')
plt.ylabel('% change')
plt.title('change in number of new cases added weekly');
Using the daily reports we get close to reproducing the graphic tweeted out. This graphic shows the percentage change between the total number of new cases in one week compared to the previous week. It is a measure of change but in this context is very misleading.
A simple bar chart shows not only the change in the number of cases but also the magnitude of the problem. If cases double in a week from 100 to 200, the change is 100%, but you still have some possibility of tracing the contacts of 200 people. Once you have 25,000 new cases in a week, even if there are 'only' 25,000 new cases the following week (so 0% change), you have a major problem.
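The two scenarios in the paragraph above can be put side by side in a few lines. This is a minimal sketch: the helper name `pct_change` and the scenario labels are illustrative assumptions, and the figures are the round numbers from the paragraph, not reported data.

```python
def pct_change(previous, current):
    """Percentage change from the previous week's total to the current one."""
    return 100 * (current / previous - 1)


# (previous week, current week) totals for the two illustrative scenarios.
scenarios = {
    'small outbreak': (100, 200),
    'large outbreak': (25_000, 25_000),
}

for name, (previous, current) in scenarios.items():
    change = pct_change(previous, current)
    print(f'{name}: {previous} -> {current} new cases, change {change:+.0f}%')
```

Judged on percentage change alone, the second scenario looks better than the first, even though it involves 125 times as many new cases in the latest week.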
fig, ax = plt.subplots(1,1,figsize=(10,6))
plt.scatter(daily_df.fecha, daily_df.uci)
plt.xlabel('date')
plt.ylabel('number of people in intensive care')
# get rid of the frame
for spine in plt.gca().spines.values():
spine.set_visible(False)
plt.xticks(rotation=270)
plt.ylim(0,600)
plt.title('People in Intensive Care');
fig, ax = plt.subplots(1,1,figsize=(10,6))
plt.scatter(daily_df.fecha, daily_df.hospitalizados)
plt.xlabel('date')
# get rid of the frame
for spine in plt.gca().spines.values():
spine.set_visible(False)
plt.xticks(rotation=270)
plt.ylim(0,3500)
plt.ylabel('number of people in hospital')
plt.title('People in Hospital');
Going in the right direction!