import pandas as pd
import re

import matplotlib.pyplot as plt

import plotnine
from plotnine import *

import warnings
warnings.filterwarnings('ignore')

On 3rd October 2020, the official account of the regional authority of Madrid put out a series of tweets under the ending The good data in Madrid continues. Here we will attempt to reproduce the first graphic and statistics and reflect if this communicates an appropriate message.

This has also been covered by El Diario ¿Está Madrid realmente doblegando la curva? Los datos que no muestran los tuits triunfalistas de Ayuso

📉 Continúan los buenos datos en la Comunidad de Madrid.

Entre todos, derrotaremos al virus. ¡Vamos! 💪 pic.twitter.com/2xfkHELFsG
— Comunidad de Madrid (@ComunidadMadrid) October 3, 2020

'La evolución de los casos diagnosticados durante la semana de 21 al 27 de septiembre es un 25% inferior a los de la semana del 14 al 20 de septiembre.'

'The evolution of the cases diagnosed during the week of September 21 to 27 is 25% lower than those of the week of September 14 to 20.'

The graph includes the annotation of -24.6%. The first challenge is to generate this statistic.

def find_statistic(date,a,b):
  return print('report {}, total week 1: {}, total week 2: {}, % change: {}'.format(date,sum(a),sum(b),100*(-1+sum(b)/sum(a))))

find_statistic('El Diario', [26781], [20206])

report El Diario, total week 1: 26781, total week 2: 20206, % change: -24.550987640491396

Daily reports from the regional health authority

From the daily report from the regional authority for 2 October 2020, the cases diagnosed for these two weeks were

sept_14_20=[2148,5562,5101,4965,6658,1615,1011]
sept_21_27=[1956,4617,4206,3852,4508,1194,770]
find_statistic('2 Oct',sept_14_20,sept_21_27)

report 2 Oct, total week 1: 27060, total week 2: 21103, % change: -22.0140428677014

Let's have a look at earlier reports and see if we can get closer to -24.6%.

sept_14_20=[2145,5550,5089,4934,6643,1608,1008]
sept_21_27=[1949,4600,4190,3834,4461,1188,768]
find_statistic('1 Oct',sept_14_20,sept_21_27)

report 1 Oct, total week 1: 26977, total week 2: 20990, % change: -22.192979204507544

sept_14_20=[2139,5550,5072,4917,6633,1601,1001] 
sept_21_27=[1941,4580,4152,3818,4325,1109,743]
find_statistic('30 Sept',sept_14_20,sept_21_27)

report 30 Sept, total week 1: 26913, total week 2: 20668, % change: -23.20439936090365

sept_14_20=[2133,5460,5057,4897,6605,1591,996]
sept_21_27=[1932,4550,4100,3727,3741,980,694]
find_statistic('29 Sept',sept_14_20,sept_21_27)

report 29 Sept, total week 1: 26739, total week 2: 19724, % change: -26.235087325629237

sept_14_20=[2129,5448,5046,4885,6583,1584,984]
sept_21_27=[1906,4498,3844,3236,3078,769,457]
find_statistic('28 Sept',sept_14_20,sept_21_27)

report 28 Sept, total week 1: 26659, total week 2: 17788, % change: -33.27581679732923

The variation in the percentage change is due to the series for the daily figures changing as the information is backfilled. The daily reports of cases confirmed by PCR are updated as new cases are notified. They are allocated to the date on which the test was taken.

Casos positivos de Covid-19 confirmados por PCR

La Comunidad de Madrid consolida diariamente la serie de casos confirmados por PCR, asignando a los casos nuevos notificados la fecha en la que se toma la muestra. Se realiza una actualización diaria de la serie de casos que se adjunta.

Looking at the latest date:

sept_14_20=[2161,5591,5123,4963,6696,1622,1027]
sept_21_27=[1985,4659,4248,3880,4572,1219,791]
find_statistic('6 Oct',sept_14_20,sept_21_27)

report 6 Oct, total week 1: 27183, total week 2: 21354, % change: -21.443549277121733

Based on the regional authorities latest daily figures the % change between the two weeks is -21.4%. Falling from 27_183 positive PCRs to 21_354 positive PCRs in a week.

plt.scatter(['28-09','29-09','30-09','01-10','02-10','05-10'],[-33.28,-26.24,-23.20,-22.19,-22.01,-21.54,])
plt.xlabel('date of report')
plt.ylabel('% change total week on week')
plt.ylim(-35,-20)
# get rid of the frame
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.title('Compare week 14-20 Sept. to 21-27 Sept.');

Weekly reports from the regional health authority

Table 1 of the weekly epidemiological report for week 39 (provisional data) contains data similar to that shown in the tweet.

	Week 36	Week 37	Week 38	Week 39
Dates	31-08 - 06-09	07-09 - 13-09	14-09 - 20-09	21-09 - 27-09
Total	19665	24402	28685	21981
Change		1,24	1,18	0,77

find_statistic('Weekly epidemiological',[28685],[21981]) 
# https://www.comunidad.madrid/sites/default/files/doc/sanidad/epid/informe_epidemiologico_semanal_covid.pdf
# Week 39 provisional data

report Weekly epidemiological, total week 1: 28685, total week 2: 21981, % change: -23.371099877985014

Let's look at the weekly data provided by the regional authority for municipalities and districts.

munis_df=pd.read_csv('./fastpages/covid19_tia_muni_y_distritos_s.csv', delimiter=';', encoding='latin')
munis_df.fecha_informe=munis_df.fecha_informe.apply(lambda x: x[5:10])
munis_df.casos_confirmados_totales=munis_df.casos_confirmados_totales.fillna(0).astype('int')
munis_df[['municipio_distrito','fecha_informe','casos_confirmados_totales']].head()

df_casos=munis_df[['municipio_distrito','fecha_informe','casos_confirmados_totales']].pivot(index='municipio_distrito', columns='fecha_informe', values='casos_confirmados_totales')
df_casos=df_casos.fillna(0).astype('int')
df_casos.sort_values('09/29', ascending=False).head(5)

find_statistic('weekly_report', [sum(df_casos.iloc[:,-2]-df_casos.iloc[:,-3])],[sum(df_casos.iloc[:,-1]-df_casos.iloc[:,-2])])

report weekly_report, total week 1: 29316, total week 2: 28164, % change: -3.9295947605403225

weekly_new_cases=[sum(df_casos.iloc[:,-i]-df_casos.iloc[:,(-i-1)]) for i in range(1,11)][::-1]; weekly_new_cases

[1311, 2835, 5586, 10549, 14681, 17299, 20086, 24374, 29316, 28164]

weekly_pct_change=[100*(-1+weekly_new_cases[i+1]/weekly_new_cases[i]) for i in range(1,len(weekly_new_cases)-1)];weekly_pct_change

[97.03703703703704,
 88.84711779448622,
 39.169589534553026,
 17.832572713030448,
 16.110757847274403,
 21.34820272826845,
 20.275703618609995,
 -3.9295947605403225]

It's not clear to me why working with the weekly municipality data there is only a 4% fall in the last week in the number of cases, far from the 25% we are looking for.

fig, ax = plt.subplots(1,1,figsize=(10,6))
plt.bar(df_casos.columns[-10:],weekly_new_cases)
plt.ylabel('cases')
plt.xlabel('date')
# get rid of the frame
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.title('cases added in previous week\nbased on report of 29/09/2020');

fig, ax = plt.subplots(1,1,figsize=(12,6))
plt.plot(df_casos.columns[-8:],weekly_pct_change[-8:])
plt.ylim(-40,100)
plt.grid(b=True, which='major', axis='y')
plt.xlabel('date')
plt.ylabel('% change')
plt.title('change in number of new cases added weekly');

More on the daily reports from the regional health authority

daily_df=pd.read_excel('./fastpages/CAM_casos_diarios.xlsx', skipfooter=18).rename(columns={'Unnamed: 0':'fecha'})
daily_df.cumul_casos_201006.tail()
# Excel created by hand from the daily pdf reports

80    257145.0
81    257849.0
82    258102.0
83    259322.0
84         NaN
Name: cumul_casos_201006, dtype: float64

fig, ax = plt.subplots(1,1,figsize=(10,6))
plt.bar(daily_df.fecha[:-1],daily_df.casos_201006[:-1])
# get rid of the frame
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.title('cases added daily');

daily_df.fecha=daily_df.fecha.apply(lambda x: str(x)[5:10])
weekly_df=daily_df.iloc[::7]
weekly_df.loc[:,'cases_added']=weekly_df.cumul_casos_201006.diff().fillna(0).astype('int')
weekly_df['percent_change']=100*weekly_df.cases_added.pct_change()
weekly_df.loc[weekly_df.index[:2],'percent_change']=0
weekly_df[['fecha','percent_change']]

fig, ax = plt.subplots(1,1,figsize=(10,6))
plt.bar(weekly_df.fecha[1:-1],weekly_df.cases_added[1:-1])
# get rid of the frame
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.xlabel('date')
plt.ylabel('new cases')
plt.title('cases added weekly');

fig, ax = plt.subplots(1,1,figsize=(12,6))
plt.plot(weekly_df.fecha[2:-1],weekly_df.percent_change[2:-1])
plt.ylim(-40,100)
plt.grid(b=True, which='major', axis='y')
plt.xlabel('date')
plt.ylabel('% change')
plt.title('change in number of new cases added weekly');

Using the daily reports we get close to reproducing the graphic tweeted out. This graphic shows the percentage change between the total number of new cases in one week compared to the previous week. It is a measure of change but in this context is very misleading.

A simple bar chart shows not only the change in the number of cases but also the magnitude of the problem. If cases double in a week from 100 to 200 the change is 100% but you have some possibility of tracing the contacts of 200 people. Once you have 25_000 new cases in a week, even if there are 'only' 25_000 new cases the following week (so 0% change) you have a major problem.

Hospital data since September

With the number of new daily cases there should be a fall in the number of people hospitalized, although with a time lag. Is this already a pattern?

fig, ax = plt.subplots(1,1,figsize=(10,6))
plt.scatter(daily_df.fecha, daily_df.uci)
plt.xlabel('date')
plt.ylabel('number of people in intensive care')
# get rid of the frame
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.xticks(rotation=270)
plt.ylim(0,600)
plt.title('People in Intensive Care');

fig, ax = plt.subplots(1,1,figsize=(10,6))
plt.scatter(daily_df.fecha, daily_df.hospitalizados)
plt.xlabel('date')
# get rid of the frame
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.xticks(rotation=270)
plt.ylim(0,3500)
plt.ylabel('number of people in hospital')
plt.title('People in Hospital');

Going in the right direction!

fecha_informe	05/26	06/02	06/09	06/16	06/23	06/30	07/07	07/14	07/21	07/28	08/04	08/11	08/18	08/25	09/01	09/08	09/15	09/22	09/29
municipio_distrito
Madrid-Puente de Vallecas	3150	3199	3242	3279	3328	3344	3382	3424	3454	3523	3668	3997	4847	6190	7575	9033	10834	12628	14397
Madrid-Carabanchel	2798	2848	2883	2910	2943	2959	2991	3039	3073	3124	3293	3689	4442	5272	6289	7570	8840	10356	11656
Madrid-Ciudad Lineal	2315	2339	2373	2394	2406	2436	2480	2514	2542	2573	2663	2871	3233	3813	4559	5345	6569	7702	8527
Madrid-Latina	2890	2946	2982	3018	3055	3086	3104	3129	3166	3233	3344	3585	3995	4351	4777	5572	6189	7300	8202
Fuenlabrada	1626	1645	1659	1682	1699	1716	1733	1744	1783	1841	1957	2171	2663	3168	3960	4580	5608	6651	7703

	fecha	percent_change
0	07-14	0.000000
7	07-21	0.000000
14	07-28	66.260658
21	08-04	88.095238
28	08-11	64.323272
35	08-18	56.399621
42	08-25	31.090399
49	09-01	3.849711
56	09-08	13.163754
63	09-15	25.689833
70	09-22	2.038820
77	09-29	-23.604986
84	10-06	-100.000000

	municipio_distrito	fecha_informe	casos_confirmados_totales
0	Madrid-Retiro	09/29	3940
1	Madrid-Salamanca	09/29	4770
2	Madrid-Centro	09/29	4820
3	Madrid-Arganzuela	09/29	5293
4	Madrid-Chamartín	09/29	4543