SPAIN-AI 2020 Hackathon: Time Series Challenge
How Not to Win a Challenge
- Workshop Hackathon Spain AI 2020
- Challenge: what I had to do
- Reality: what I did
- The story: what I should have done
- Reflections
If ifs and ands were pots and pans...
https://www.spain-ai.com/hackathon2020_reto_Series_Temporales.php
https://competitions.codalab.org/competitions/28630
This blog post details the code I used in this competition. It is provided here in support of the SPAIN-AI presentation scheduled for 15 May 2021. The presentation is in Spanish, since SPAIN-AI serves a Spanish-speaking audience.
Workshop Hackathon Spain AI 2020
https://www.linkedin.com/feed/update/urn:li:activity:6798658932182142976/
Imports
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
import warnings
warnings.simplefilter("ignore", UserWarning)
pd.set_option('display.max_rows', 100)
Understanding the challenge
Build a portfolio of assets (darwins) that maximizes the Sharpe ratio:
- pick 18 darwins from the 96 in the training data
- assign a share of the total investment to each of the 18 darwins
- adjust the allocation over time, every hour from 18 August 2020 to 24 December 2020, covering the 2229 hours in submission.csv
- compute the Sharpe ratio
This is not a prediction problem (all the data are available on darwinex.com) but one of building a diversified portfolio of assets that are uncorrelated with each other and have stable returns.
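Before optimizing anything, it helps to pin down what a valid submission looks like. A minimal sanity check, assuming the allo_-prefixed column naming used later in this post (the CodaLab scorer returned NaN whenever an hourly row did not sum to exactly 1.0):
import pandas as pd

def check_submission(path):
    """Sanity-check a submission file before uploading it."""
    sub = pd.read_csv(path)
    allo_cols = [c for c in sub.columns if c.startswith('allo_')]
    assert len(allo_cols) == 18, 'expected allocations for exactly 18 darwins'
    # The scorer returns NaN whenever an hourly row does not sum to exactly 1.0
    assert (sub[allo_cols].sum(axis=1).round(6) == 1.0).all()
    return sub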
The metric to maximize
$$S = \frac{E[R_a-R_b]}{\sigma_a}$$
$R_a$ is the return of the portfolio
$R_b$ is the return of a benchmark investment
$\sigma_a$ is the standard deviation (volatility) of the investment's excess return
Assuming $R_b$ is constant, we have to maximize return divided by volatility, so we are looking for stable returns.
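As a minimal sketch of the metric (assuming a constant benchmark return; the exact CodaLab scoring code may differ in normalization):
import numpy as np

def sharpe_ratio(returns, benchmark=0.0):
    """Sharpe ratio of a return series against a constant benchmark return."""
    excess = np.asarray(returns) - benchmark
    return excess.mean() / excess.std()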
An investment percentage is needed for every hour
xs = [2,6,10,14,18,22,25,35,40,43,45,50,54,57,59,66,70,77]
ys = np.array([0.0173,0.0379,0.0629,0.0309,0.1232,0.0654,0.2552,0.0076,0.0378,0.0235,0.0329,0.0487,0.0272,0.0383,0.0762,0.0165,0.0930,0.0055])*100
fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot()
ax.scatter(xs,ys, color='#ffe559')
ax.set_xlabel('Darwin')
ax.set_ylabel('% inversión')
annotations=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'JTL', 'MUF', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'ZAB', 'ZCD', 'ZVQ']
for i, label in enumerate(annotations):
    plt.annotate(label, (xs[i], ys[i]))
plt.xticks([])
plt.tight_layout()
plt.savefig("darwins.png")
plt.show()
candles=pd.DataFrame(columns=['date','close','max','min','open','std_dev','score','darwin'])
filenames = [x for x in os.listdir('./data/TrainCandles')]
for filename in filenames:
    df = pd.read_csv('./data/TrainCandles/'+filename).rename(columns={'Unnamed: 0':'date'})
    df.date = pd.to_datetime(df.date)
    df['std_dev'] = df.std(axis=1)                                      # per-candle volatility
    df['score'] = (round((df.close-df.open)/df.std_dev, 6)).fillna(0)   # hourly return scaled by volatility
    df['darwin'] = filename[-13:-10]                                    # 3-letter ticker from the filename
    candles = candles.append(df)
candles.head()
plt.hist(candles.score, color='#FFE599')
plt.title('Count of hourly scores');
Identify which darwin had the maximum score in each hour.
Count how many times each darwin has the maximum score.
Investigate those darwins first.
lst=[candles[(candles['date']==hour) & (candles.score==candles[candles['date']==hour].score.max())].darwin.to_list()[0] for hour in sorted(candles['date'].unique())]
count_dict=Counter(lst)
df=pd.DataFrame.from_dict(count_dict, orient='index').sort_values(0, ascending=False)
darwins_lst=df.index.to_list()
darwins_lst=darwins_lst+['MMY','TMF'] # the 2 darwins that never had max score
df.T
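The list comprehension above rescans the whole frame once per hour; a vectorized formulation (a sketch that should give the same counts, although ties may resolve differently) is:
# Keep, for each hour, the row with the highest score, then count by darwin
per_hour_max = candles.sort_values('score').drop_duplicates('date', keep='last')
count_dict = Counter(per_hour_max.darwin)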
def plot_historic_dars(dars):
    rows, cols = 2, int(.5+len(dars)/2)
    fig, ax = plt.subplots(nrows=rows, ncols=cols, figsize=(20, 8))
    for i in range(rows):
        for j in range(cols):
            darwin = dars[cols*i+j]
            ax[i][j].plot(candles[candles.darwin==darwin].date.apply(lambda x: x.date()),
                          candles[candles.darwin==darwin]['open']-candles[candles.darwin==darwin]['open'][0],
                          color='#ffd966')
            ax[i][j].set_title(darwin, color='#ffd966', loc='center', y=0.9)
            ax[i][j].set_ylim(-50,100)
            ax[i][j].set_frame_on(False)
            ax[i][j].set_xlabel('Periodo de Entrenamiento')
            ax[i][j].set_xticklabels([])
            ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
plot_historic_dars(df.index[:6])
Download data for the darwins with the most max-score hours
From August 2020 to December 2020, from darwinex.com, via FTP
https://github.com/darwinex/darwinexapis/blob/master/darwinexapis/API/DarwinDataAnalyticsAPI/DWX_Data_Analytics_API.py
top=['BSX','FNM','ZTY','CBY','NYD','TKT','BFS','NWO','HZY','NVL','YEC','TER','PUL','VRT','NCT','MUF','FSK','PEW','LEN','LUG','PHI','BGN','TXR','UYZ','MET','REU','UEI','ZVQ',
'ZCD','SYO','BZC','XRX','ULT','HQU','WWT','CIS','TRO','FFV','MCA','AWW','GGR','AZG','GFJ','LWK','VVC','WFJ','OJG','OOS','SRI','LWE','HEO','RAT','TDD','ZXW','OXR','ACY',
'GRI','HCC','PPT','FIR','ULI','ZAB','ZUJ','SKN','EEY','SKI','SEH','NSC','SHC','EOP','WXN','LHB','SBY','IDT','RWJ','JTL']
Imports and server
from ftplib import FTP
from tqdm import tqdm
from io import BytesIO
import gzip
FTP_CRED = {'username': USERNAME,
'password': PASSWORD,
'server': "darwindata.darwinex.com",
'port': 21}
dwx_ftp_hostname=FTP_CRED['server']
dwx_ftp_user=FTP_CRED['username']
dwx_ftp_pass=FTP_CRED['password']
server = FTP(dwx_ftp_hostname)
server.login(dwx_ftp_user, dwx_ftp_pass)
FTP file naming pattern (the 'former_var10' layout): {DARWIN_TICKER}.{PRODUCTRISK}.{COLOUR}{PRODUCTID}_YYYY-MM-DD.HH.csv.gz
year = '2020'
darwins_lst_dld = ['FNM']
for darwin in darwins_lst_dld:
    print(darwin)
    for month in ['08','09','10','11','12']:
        quote_files = []
        server.retrlines(f'NLST {darwin}/_{darwin}_former_var10/quotes/{year}-{month}/', quote_files.append)
        quote_files = [f'{darwin}/_{darwin}_former_var10/quotes/{year}-{month}/{quote_file}' for quote_file in quote_files]
        # Process tick data files
        tqdm.write(f'\n[KERNEL] {len(quote_files)} files retrieved.. post-processing now, please wait..', end='')
        ticks_df = pd.DataFrame()
        ticks_pbar = tqdm(quote_files, position=0, leave=True)
        for tick_file in ticks_pbar:
            # Clear / reinitialize buffer
            retbuf = BytesIO()
            server.retrbinary(f"RETR {tick_file}", retbuf.write)
            retbuf.seek(0)
            # Extract data from BytesIO object
            ret = [line.strip().decode().split(',') for line in gzip.open(retbuf)]
            ticks_df = pd.concat([ticks_df, pd.DataFrame(ret[1:])], axis=0)
        # Clean up
        ticks_df.columns = ['timestamp','quote']
        ticks_df.timestamp = ticks_df.timestamp.apply(pd.to_numeric)
        ticks_df.set_index('timestamp', drop=True, inplace=True)
        ticks_df.index = pd.to_datetime(ticks_df.index, unit='ms')
        ticks_df.quote = ticks_df.quote.apply(pd.to_numeric)
        ticks_df = ticks_df.dropna()  # dropna returns a copy, so assign it back
        fn = 'quotes/'+darwin+'_'+year+'_'+month+'_quotes.csv'
        ticks_df.to_csv('./data/'+fn)
New path layout (for darwins whose quotes do not live under a former_var10 directory):
darwins_lst_dld = ['PPT']
for darwin in darwins_lst_dld:
    print(darwin)
    for month in ['08','09','10','11','12']:
        quote_files = []
        server.retrlines(f'NLST {darwin}/quotes/{year}-{month}/', quote_files.append)
        quote_files = [f'{darwin}/quotes/{year}-{month}/{quote_file}' for quote_file in quote_files]
        # Process tick data files
        tqdm.write(f'\n[KERNEL] {len(quote_files)} files retrieved.. post-processing now, please wait..', end='')
        ticks_df = pd.DataFrame()
        ticks_pbar = tqdm(quote_files, position=0, leave=True)
        for tick_file in ticks_pbar:
            # Clear / reinitialize buffer
            retbuf = BytesIO()
            server.retrbinary(f"RETR {tick_file}", retbuf.write)
            retbuf.seek(0)
            # Extract data from BytesIO object
            ret = [line.strip().decode().split(',') for line in gzip.open(retbuf)]
            ticks_df = pd.concat([ticks_df, pd.DataFrame(ret[1:])], axis=0)
        # Clean up
        ticks_df.columns = ['timestamp','quote']
        ticks_df.timestamp = ticks_df.timestamp.apply(pd.to_numeric)
        ticks_df.set_index('timestamp', drop=True, inplace=True)
        ticks_df.index = pd.to_datetime(ticks_df.index, unit='ms')
        ticks_df.quote = ticks_df.quote.apply(pd.to_numeric)
        ticks_df = ticks_df.dropna()  # dropna returns a copy, so assign it back
        fn = 'quotes/'+darwin+'_'+year+'_'+month+'_quotes.csv'
        ticks_df.to_csv('./data/'+fn)
def create_hourly(fn, darwin):
    df = pd.read_csv('./data/quotes/'+fn)
    df.timestamp = pd.to_datetime(df.timestamp)
    df['date'] = df.timestamp.dt.date
    df['hour'] = df.timestamp.dt.hour
    # One row per (date, hour): min/max/variance/count and first/last quote
    df1 = df.groupby(['date','hour']).agg({'quote': ['min','max','var','count','first','last']}).fillna(0)
    df1.columns = df1.columns.droplevel()
    df1['darwin'] = darwin
    return df1
Import the data
hourly = pd.DataFrame(columns=['min','max','var','count','first','last','score','darwin'])
for darwin in top:
    for filename in [darwin+'_2020_08_quotes.csv',
                     darwin+'_2020_09_quotes.csv',
                     darwin+'_2020_10_quotes.csv',
                     darwin+'_2020_11_quotes.csv',
                     darwin+'_2020_12_quotes.csv']:
        hourly = hourly.append(create_hourly(filename, darwin))
hourly['score']=round((hourly['last']-hourly['first'])/np.sqrt(hourly['var']),4).fillna(0)
hourly['return']=hourly['last']-hourly['first']
hourly.reset_index(inplace=True)
hourly.rename(columns={'index':'hour'},inplace=True)
hourly.hour=hourly.hour.apply(lambda x: pd.Timestamp(x[0])+pd.to_timedelta(x[1], unit='h'))
print(hourly.shape)
hourly.head()
DataFrame of hourly returns for each darwin, from 18 August 2020 to 24 December 2020
dars=sorted(hourly.darwin.unique())
lst=[hourly[hourly.darwin==dar][(hourly.hour>='2020-08-18 00:00:00') & (hourly.hour<'2020-12-24 22:00:00')][['hour','return']].rename(columns={'return':dar}) for i,dar in enumerate(dars)]
hly_rtns=pd.merge(lst[0], lst[1], how='outer')
for i in range(2, len(dars)):
    hly_rtns = pd.merge(hly_rtns, lst[i], how='outer')
hly_rtns.fillna(0., inplace=True)
hly_rtns.head()
Darwins with the highest mean hourly return
means=pd.DataFrame(hly_rtns.mean(0), columns=['mean'])
means.sort_values('mean', ascending=False).T
results=[1.72, 1.58, 4.71, 1.7, 4.69, 4.81, 5.38, 5.29, 5.82, 6.18, 6.81, 6.72, 6.70, 6.81, 7.15, 7.94, 7.87, 7.79, 7.08, 7.98, 8.33, 8.29, 8.24, 8.17, 7.99, 8.18, 8.24]
# excluding NaN results, returned when a row's sum was not exactly 1.0
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
ax.plot(results, color='#ffe599', linewidth=2)
ax.set_title('Historial de Envíos a CodaLab', fontsize=20)
ax.set_xlabel('Envío', fontsize=14)
ax.set_ylabel('Sharpe Ratio on Leaderboard', fontsize=14)
ax.set_ylim(0,10)
ax.set_xlim(0,26)
ax.annotate('8.33', xy=(20, 8.33), xycoords='data',
xytext=(0.8, 0.95), textcoords='axes fraction',
arrowprops=dict(facecolor='black', shrink=0.05),
horizontalalignment='right', verticalalignment='top',
fontsize=20
)
ax.axvline(x=20, ymin=0, ymax=1, color='red', alpha=.5, linestyle='dotted')
ax.axhline(y=0, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
ax.axhline(y=2, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
ax.axhline(y=4, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
ax.axhline(y=6, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
ax.axhline(y=8, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
ax.axhline(y=10, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
for i, label in enumerate(range(27)):
    ax.annotate(label, (i, results[i]+.1), fontsize=12);
ax.set_frame_on(False)
plt.tight_layout()
plt.savefig('envios.png', dpi=300)
Notable submissions:
- Submission 2: assign each darwin 1/18 of the investment, unchanged over time: 'sharpe_ratio': 4.71, 'cumulative_return': 4.92
- Submission 6: group the 18 darwins into 3 buckets of low, medium and high return, with less weight on the low ones and more on the high ones: 'sharpe_ratio': 5.38, 'cumulative_return': 5.97
- Submission 10: https://es.mathworks.com/help/finance/portfolio.estimatemaxsharperatio.html?s_tid=srchtitle with MATLAB installed locally, removing FNM, MET, NVL, REU, VRT and adding ZTY, NYD, TKT, NWO, YEC; optimization done in MATLAB
- Submission 15: 3 rounds of the top 22 darwins, then drop 4
- Submission 20: download data for more darwins
Take the first 22 darwins from the list and feed their mean returns (AssetMean) and covariances (AssetCovar) into MATLAB's https://www.mathworks.com/help/finance/portfolio.estimatemaxsharperatio.html,
'Estimate Efficient Portfolio that Maximizes the Sharpe Ratio for a Portfolio Object with Semicontinuous and Cardinality Constraints',
to select the best 18 of the 22 assets. (With more than 22 assets it often returned no weights because the solver did not converge.)
p = Portfolio('AssetMean', AssetMean, 'AssetCovar', AssetCovar);
p = setDefaultConstraints(p);
p = setMinMaxNumAssets(p, 18, 18);
pesos = estimateMaxSharpeRatio(p,'Method','iterative')
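For anyone without a MATLAB licence, a rough Python equivalent for the same inputs can be sketched with scipy.optimize; note that this sketch does not enforce the exactly-18-assets cardinality constraint that setMinMaxNumAssets provides:
import numpy as np
from scipy.optimize import minimize

def max_sharpe_weights(asset_mean, asset_covar):
    """Long-only weights that maximize mean return / volatility, summing to 1."""
    mu, cov = np.asarray(asset_mean), np.asarray(asset_covar)
    n = len(mu)
    def neg_sharpe(w):
        return -(w @ mu) / np.sqrt(w @ cov @ w)
    result = minimize(neg_sharpe,
                      x0=np.full(n, 1.0 / n),           # start from equal weights
                      bounds=[(0.0, 1.0)] * n,          # long-only
                      constraints=({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},))
    return result.x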
I kept working down the list of darwins, dropping the 4 darwins left out in the previous round and adding the next 4, to see whether the result improved.
Whenever it was better, I submitted it to the leaderboard, as sketched below.
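Schematically, the rounds looked something like this (illustrative only: run_matlab_max_sharpe is a hypothetical wrapper for the MATLAB call above, which I actually ran by hand):
ranked = means.sort_values('mean', ascending=False).index.to_list()
dropped = []                    # darwins the optimizer zeroed out in earlier rounds
for n in (22, 26, 30):          # each round considers the next 4 candidates
    pool = [d for d in ranked[:n] if d not in dropped]
    weights = run_matlab_max_sharpe(pool)   # hypothetical helper, not real code
    dropped += [d for d, w in zip(pool, weights) if w == 0.0]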
dars=['AZG','BFS','FSK','JTL','LUG','MUF','NCT','NWO','PEW','PHI','PUL','TER','TXR','UEI','UYZ','WWT','XRX','ZCD']
wts=[0.0061,0.0109,0.1885,0.0079,0.0157,0.0077,0.019,0.0084,0.0046,0.0056,0.0099000000000001,0.0064,0.0153,0.0062,0.0026,0.65,0.0134,0.0218]
fig, ax = plt.subplots(nrows=6, ncols=3, figsize=(20, 8))
rows, cols = 6, 3
for i in range(rows):
    for j in range(cols):
        darwin = dars[cols*i+j]
        wt = ' '+str(wts[cols*i+j]*100)[:5]+'%'
        ax[i][j].plot(range(len(hourly[hourly.darwin==darwin])),
                      hourly[hourly.darwin==darwin]['first']-hourly[hourly.darwin==darwin]['first'].to_list()[0],
                      color='#ffd966')
        ax[i][j].set_ylim(-35,50)
        ax[i][j].set_title(darwin+wt)
        ax[i][j].set_frame_on(False)
        ax[i][j].axes.get_xaxis().set_visible(False)
        ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
# get 'eod_ts' from the example submission file
sub=pd.read_csv('./data/submission.csv')
sub.eod_ts = pd.to_datetime(sub.eod_ts)
# create new submission file
new_sub=pd.DataFrame(columns=dars)
new_sub['eod_ts']=sub.eod_ts
new_sub.set_index('eod_ts', inplace=True)
# % allocation for each darwin
for i, dar in enumerate(dars):
    new_sub[dar] = wts[i]
# check that all rows sum to 1.0
print(new_sub[dars].sum(1).sum())
assert new_sub[dars].sum(1).sum()==len(sub)
# rename columns
for col in new_sub.columns:
    new_sub = new_sub.rename(columns={col:'allo_'+col})
# save submission file
new_sub.reset_index(inplace=True)
new_sub.to_csv('./data/sub.csv',index=False)
new_sub.head()
len(darwins_lst)
hourly = pd.DataFrame(columns=['min','max','var','count','first','last','score','darwin'])
for darwin in darwins_lst:
    for filename in [darwin+'_2020_08_quotes.csv',
                     darwin+'_2020_09_quotes.csv',
                     darwin+'_2020_10_quotes.csv',
                     darwin+'_2020_11_quotes.csv',
                     darwin+'_2020_12_quotes.csv']:
        hourly = hourly.append(create_hourly(filename, darwin))
hourly['score']=round((hourly['last']-hourly['first'])/np.sqrt(hourly['var']),4).fillna(0)
hourly['return']=hourly['last']-hourly['first']
hourly.reset_index(inplace=True)
hourly.rename(columns={'index':'hour'},inplace=True)
hourly.hour=hourly.hour.apply(lambda x: pd.Timestamp(x[0])+pd.to_timedelta(x[1], unit='h'))
print(hourly.shape)
hourly.head()
dars=sorted(hourly.darwin.unique())
lst=[hourly[hourly.darwin==dar][(hourly.hour>='2020-08-18 00:00:00') & (hourly.hour<'2020-12-24 22:00:00')][['hour','return']].rename(columns={'return':dar}) for i,dar in enumerate(dars)]
hly_rtns=pd.merge(lst[0], lst[1], how='outer')
for i in range(2, len(dars)):
    hly_rtns = pd.merge(hly_rtns, lst[i], how='outer')
hly_rtns.fillna(0., inplace=True)
hly_rtns.head()
means=pd.DataFrame(hly_rtns.mean(0), columns=['mean'])
means.sort_values('mean', ascending=False).T
def plot_dars(to_matlab):
    rows, cols = 2, 11
    fig, ax = plt.subplots(nrows=rows, ncols=cols, figsize=(22, 8))
    for i in range(rows):
        for j in range(cols):
            darwin = to_matlab[cols*i+j]
            color = '#ffd966'
            if darwin in drop:      # darwins the optimizer zeroed out are drawn in red
                color = 'red'
            ax[i][j].plot(hourly[hourly.darwin==darwin].hour,
                          hourly[hourly.darwin==darwin]['first']-hourly[hourly.darwin==darwin]['first'].to_list()[0],
                          color=color)
            ax[i][j].set_ylim(-40,55)
            ax[i][j].set_title(darwin, color=color)
            ax[i][j].set_frame_on(False)
            ax[i][j].axes.get_xaxis().set_visible(False)
            ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
to_matlab=sorted(means.sort_values('mean', ascending=False).index[:22])
print('darwins',to_matlab)
print('AssetMean',[round(hly_rtns[darwin].mean(),6) for darwin in to_matlab])
print('AssetCovar',np.round(np.cov(np.array([hly_rtns[dar] for dar in to_matlab]),bias=True), 6))
drop=['BGN','CBY','SRI','UYZ']
wts=[0.0183,0.0485,0.0924,0.0353,0.1918,0.0837,0.0061,0.0604,0.0509,0.0553,0.0386,0.0686,0.0192,0.0605,0.0162,0.0203,0.1249,0.0090]
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'HZY', 'JTL', 'NWO', 'NYP', 'PHI', 'PUL', 'SEH', 'TKT', 'WFJ', 'ZAB', 'ZCD', 'ZVQ']
plot_dars(to_matlab)
plt.savefig('round1.png')
to_matlab=sorted(means.sort_values('mean', ascending=False).index[:26])
to_matlab.remove('BGN')
to_matlab.remove('CBY')
to_matlab.remove('SRI')
to_matlab.remove('UYZ')
print('new:',sorted(means.sort_values('mean', ascending=False).index[22:26]))
print('darwins',to_matlab)
print('AssetMean',[round(hly_rtns[darwin].mean(),6) for darwin in to_matlab])
print('AssetCovar',np.round(np.cov(np.array([hly_rtns[dar] for dar in to_matlab]),bias=True), 6))
drop=['HZY','WFJ','SEH','TKT']
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'JTL', 'MUF', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'ZAB', 'ZCD', 'ZVQ']
wts=[0.0173,0.0379,0.0629,0.0309,0.1232,0.0654,0.2552,0.0076,0.0378,0.0235,0.0329,0.0487,0.0272,0.0383,0.0762,0.0165,0.0930,0.0055]
plot_dars(to_matlab)
plt.savefig('round2.png')
Proof of concept: CodaLab score, 7 April: 10.30
to_matlab=sorted(means.sort_values('mean', ascending=False).index[:30])
to_matlab.remove('BGN')
to_matlab.remove('CBY')
to_matlab.remove('SRI')
to_matlab.remove('UYZ')
to_matlab.remove('HZY')
to_matlab.remove('WFJ')
to_matlab.remove('SEH')
to_matlab.remove('TKT')
print('new:',sorted(means.sort_values('mean', ascending=False).index[26:30]))
print('darwins',to_matlab)
print('AssetMean',[round(hly_rtns[darwin].mean(),6) for darwin in to_matlab])
print('AssetCovar',np.round(np.cov(np.array([hly_rtns[dar] for dar in to_matlab]),bias=True), 6))
drop=['HCC','JTL','MUF','VRT']
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'UEI', 'YFC', 'ZAB', 'ZCD', 'ZVQ']
wts=[0.0165,0.0340,0.0680,0.0242,0.1321,0.0676,0.2115,0.0117,0.0385,0.0568,0.0292,0.0328,0.0646,0.0218,0.0718,0.0124,0.0980,0.0084]
plot_dars(to_matlab)
dars=['AZG','BFS','FSK','JTL','LUG','MUF','NCT','NWO','PEW','PHI','PUL','TER','TXR','UEI','UYZ','WWT','XRX','ZCD']
wts=[0.0061,0.0109,0.1885,0.0079,0.0157,0.0077,0.019,0.0084,0.0046,0.0056,0.0099000000000001,0.0064,0.0153,0.0062,0.0026,0.65,0.0134,0.0218]
labels = dars
w='#FFFFFF'
r='#E60026'
colors = [w,w,r,w,r,r,r,w,r,w,w,r,w,r,r,r,r,w]
left = np.array(0.)
patch_handles = []
fig, ax = plt.subplots(1, 1, figsize=(30, 1.5))
for i, w, l in zip(range(len(dars)), wts, labels):
    patch_handles.append(ax.barh(0, w, align='center', left=left,
                                 color=colors[i], edgecolor='black'))
    left += w
    patch = patch_handles[-1][0]
    bl = patch.get_xy()
    x = 0.5*patch.get_width() + bl[0]
    y = 0.5*patch.get_height() + bl[1]
    ax.text(x, y+.1, s=l[0], ha='center', va='center')
    ax.text(x, y,    s=l[1], ha='center', va='center')
    ax.text(x, y-.1, s=l[2], ha='center', va='center')
plt.yticks([])
ax.set_frame_on(False)
plt.tight_layout()
plt.savefig('winning_entry.png', dpi=300)
plt.show()
total = 0
for d, c, w in zip(dars, colors, wts):
    if c == '#E60026':
        total += w      # total weight on the red-flagged darwins
total
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'JTL', 'MUF', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'ZAB', 'ZCD', 'ZVQ']
wts=[0.0173,0.0379,0.0629,0.0309,0.1232,0.0654,0.2552,0.0076,0.0378,0.0235,0.0329,0.0487,0.0272,0.0383,0.0762,0.0165,0.0930,0.0055]
labels = dars
w='#FFFFFF'
y='#FFE599'
colors =[y,w,w,y,y,y,y,y,w,y,w,y,w,w,w,y,w,y]
left = np.array(0.)
patch_handles = []
fig, ax = plt.subplots(1, 1, figsize=(30, 1.5))
for i, w, l in zip(range(len(dars)), wts, labels):
    patch_handles.append(ax.barh(0, w, align='center', left=left,
                                 color=colors[i], edgecolor='black'))
    left += w
    patch = patch_handles[-1][0]
    bl = patch.get_xy()
    x = 0.5*patch.get_width() + bl[0]
    y = 0.5*patch.get_height() + bl[1]
    ax.text(x, y+.1, s=l[0], ha='center', va='center')
    ax.text(x, y,    s=l[1], ha='center', va='center')
    ax.text(x, y-.1, s=l[2], ha='center', va='center')
plt.yticks([])
ax.set_frame_on(False)
plt.tight_layout()
plt.savefig('better_entry.png', dpi=300)
plt.show()
total = 0
for d, c, w in zip(dars, colors, wts):
    if c == '#FFE599':
        total += w      # total weight on the highlighted darwins
total
def plot_train_and_test_dars(dars, color='black'):
    rows, cols = 2, int(.5+len(dars)/2)
    fig, ax = plt.subplots(nrows=rows, ncols=cols, figsize=(25, 6))
    for i in range(rows):
        for j in range(cols):
            darwin = dars[cols*i+j]
            # Concatenate the training candles with the downloaded test quotes,
            # both re-based to the first training open
            ax[i][j].plot(candles[candles.darwin==darwin].date.apply(lambda x: x.date()).to_list()
                          + hourly[hourly.darwin==darwin].hour.apply(lambda x: x.date()).to_list(),
                          (candles[candles.darwin==darwin]['open']-candles[candles.darwin==darwin]['open'][0]).to_list()
                          + (hourly[hourly.darwin==darwin]['first']-candles[candles.darwin==darwin]['open'].to_list()[0]).to_list(),
                          color=color)
            ax[i][j].set_title(darwin, color=color, loc='center', y=0.8)
            ax[i][j].set_ylim(-50,170)
            ax[i][j].set_frame_on(False)
            ax[i][j].set_xlabel('Entrenamiento | Test')
            ax[i][j].set_xticklabels([])
            ax[i][j].axvline(x=hourly[hourly.darwin==darwin].hour.apply(lambda x: x.date()).to_list()[0], color='grey', linestyle='dashed')
            ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
dars=['AUX','BOT','EOP','ERQ','FIR','HEO','MUF','NYP','ZAB','ZVQ']
plot_train_and_test_dars(dars,color='#ffd966')
plt.tight_layout()
plt.savefig('train_test.png', dpi=600)
dars=['FSK','LUG','MUF','NCT','PEW','TER','UEI','UYZ','WWT','XRX']
plot_train_and_test_dars(dars,color='red')
dars=['AZG','BFS','JTL','NWO','PHI','PUL','TXR','ZCD']
plot_train_and_test_dars(dars)
- Rethinking what I did for the challenge in order to explain it to others turned out to be a great way to expose the flaws.
- Stop and think! Take a time-out to reconsider. It pays to revisit every decision point and reframe the problem.
- Be patient!
  - wait for all the data to download, even if it takes hours
  - wait for the MATLAB algorithm to converge, even if you have to sink hours into it
- Focus on a single task!
  - We all have to juggle many tasks.
  - You need a clear head.
  - It is better to work sequentially than in parallel (I am not a GPU).
- Believe in yourself.
- We learn by doing.
- ...
I also took fourth place, together with #javic and #agnprz, in the computer vision challenge, and sixth place in the NLP challenge.
If you have data you need to make sense of, I will be happy to help 😊