SPAIN-AI 2020 Hackathon: Time Series Challenge
How Not to Win a Challenge
- Workshop Hackathon Spain AI 2020
- Challenge: what I had to do
- Reality: what I did
- Story: what I should have done
- Reflections
If ifs and ands were pots and pans...
https://www.spain-ai.com/hackathon2020_reto_Series_Temporales.php
https://competitions.codalab.org/competitions/28630
This blog post details the code I used in this competition. It is provided here as a support to the SPAIN-AI presentation scheduled for 15 May 2021. Both the presentation and this post are in Spanish, since SPAIN-AI serves a Spanish-speaking audience.
Workshop Hackathon Spain AI 2020
https://www.linkedin.com/feed/update/urn:li:activity:6798658932182142976/
Imports
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
import warnings
warnings.simplefilter("ignore", UserWarning)
pd.set_option('display.max_rows', 100)
Understanding the challenge
Create a portfolio of assets (darwins) that maximizes the Sharpe ratio:
- choose 18 darwins out of the 96 in the training data
- allocate a share of the total investment to each of the 18 darwins
- adjust the allocation over time, every hour from 18 August 2020 to 24 December 2020: the 2229 hours listed in the submission.csv file
- compute the Sharpe ratio

This is not a prediction problem (all the data are available on darwinex.com). It is a matter of building a diversified portfolio of mutually uncorrelated assets with stable returns.
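Uncorrelated assets matter because of how portfolio volatility is computed. For portfolio weights $w$ and return covariance matrix $\Sigma$, the volatility is $\sqrt{w^\top \Sigma w}$. The minimal NumPy sketch below uses made-up numbers: two assets with the same volatility, split 50/50. With zero correlation, the equal split lowers volatility by a factor of $\sqrt{2}$. With perfect correlation, diversification gives no benefit.

```python
import numpy as np

def portfolio_vol(weights, cov):
    """Portfolio volatility: sqrt(w^T Sigma w)."""
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt(w @ cov @ w))

sigma = 0.02          # illustrative volatility shared by both assets
w = [0.5, 0.5]        # equal split

# zero correlation: off-diagonal covariances are 0
cov_uncorr = np.array([[sigma**2, 0.0], [0.0, sigma**2]])
# perfect correlation: every covariance equals sigma^2
cov_corr = np.full((2, 2), sigma**2)

print(portfolio_vol(w, cov_uncorr))  # approximately sigma / sqrt(2)
print(portfolio_vol(w, cov_corr))    # approximately sigma
```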
The metric to maximize
$$S = \frac{E[R_a-R_b]}{\sigma_a}$$
$R_a$ is the portfolio return
$R_b$ is the return of a benchmark investment
$\sigma_a$ is the standard deviation (volatility) of the portfolio's excess return, $R_a-R_b$
Assuming $R_b$ is constant, we need to maximize return divided by volatility, so we are looking for stable returns.
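The sketch below computes $S$ on a series of hourly portfolio returns, assuming a constant benchmark return $R_b$ (0 by default). It uses the mean and NumPy's default population standard deviation (`ddof=0`) of the excess returns. The competition scorer's exact conventions (annualization, `ddof`) are not given here, so this is illustrative only. The two made-up series have the same mean return, but the steadier one gets the much higher ratio.

```python
import numpy as np

def sharpe_ratio(returns, benchmark=0.0):
    """S = E[R_a - R_b] / sigma_a, with sigma_a the std of the excess returns."""
    excess = np.asarray(returns, dtype=float) - benchmark
    return excess.mean() / excess.std()

# two made-up hourly return series with the same mean
stable = np.array([0.01, 0.012, 0.009, 0.011, 0.008])
volatile = np.array([0.05, -0.03, 0.04, -0.02, 0.01])

print(sharpe_ratio(stable), sharpe_ratio(volatile))
```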
A % allocation is required for every hour
xs = [2,6,10,14,18,22,25,35,40,43,45,50,54,57,59,66,70,77]
ys = np.array([0.0173,0.0379,0.0629,0.0309,0.1232,0.0654,0.2552,0.0076,0.0378,0.0235,0.0329,0.0487,0.0272,0.0383,0.0762,0.0165,0.0930,0.0055])*100
fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot()
ax.scatter(xs,ys, color='#ffe559')
ax.set_xlabel('Darwin')
ax.set_ylabel('% investment')
annotations=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'JTL', 'MUF', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'ZAB', 'ZCD', 'ZVQ']
# label each point with its darwin ticker
for i, label in enumerate(annotations):
    plt.annotate(label, (xs[i], ys[i]))
plt.xticks([])
plt.tight_layout()
plt.savefig("darwins.png")
plt.show()
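# load the training candles: one CSV file per darwin
frames = []
for filename in os.listdir('./data/TrainCandles'):
    df = pd.read_csv('./data/TrainCandles/' + filename).rename(columns={'Unnamed: 0': 'date'})
    df.date = pd.to_datetime(df.date)
    # spread of the four hourly prices (close, max, min, open)
    df['std_dev'] = df[['close', 'max', 'min', 'open']].std(axis=1)
    # score: price change over the hour relative to that spread (0 when undefined)
    df['score'] = round((df.close - df.open) / df.std_dev, 6).fillna(0)
    # the darwin ticker is taken from the file name
    df['darwin'] = filename[-13:-10]
    frames.append(df)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
candles = pd.concat(frames)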
candles.head()
plt.hist(candles.score, color='#FFE599')
plt.title('Count of hourly scores');
Identify which darwin had the highest score in each hour.
Count how many times each darwin had the highest score.
Investigate those darwins first.
lst=[candles[(candles['date']==hour) & (candles.score==candles[candles['date']==hour].score.max())].darwin.to_list()[0] for hour in sorted(candles['date'].unique())]
count_dict=Counter(lst)
df=pd.DataFrame.from_dict(count_dict, orient='index').sort_values(0, ascending=False)
darwins_lst=df.index.to_list()
darwins_lst=darwins_lst+['MMY','TMF'] # the 2 darwins that never had max score
df.T
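The list comprehension above rescans the whole table once per hour. Below is a vectorized sketch of the same count. It assumes a `candles`-like DataFrame with `date`, `score` and `darwin` columns, and uses `groupby(...).idxmax()` to find the row with the highest score in each hour. As in the original, a tie goes to the first row encountered. Because `idxmax` returns index labels, the index must be made unique first; the concatenated `candles` repeats labels across files. The tickers `AAA`, `BBB` and `CCC` are made up.

```python
import pandas as pd

def count_hourly_winners(candles: pd.DataFrame) -> pd.Series:
    """Count how many hours each darwin had the highest score."""
    # idxmax returns index labels, so make them unique first
    c = candles.reset_index(drop=True)
    winner_rows = c.groupby('date')['score'].idxmax()
    return c.loc[winner_rows, 'darwin'].value_counts()

toy = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01 00:00'] * 3 + ['2020-01-01 01:00'] * 3),
    'darwin': ['AAA', 'BBB', 'CCC'] * 2,
    'score': [0.5, 1.2, -0.3, 2.0, 0.1, 0.4],
})
winners = count_hourly_winners(toy)
print(winners)
```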
def plot_historic_dars(dars):
    rows, cols = 2, int(.5 + len(dars) / 2)
    fig, ax = plt.subplots(nrows=rows, ncols=cols, figsize=(20, 8))
    for i in range(rows):
        for j in range(cols):
            darwin = dars[cols * i + j]
            d = candles[candles.darwin == darwin]
            # price path relative to the first open of the training period
            ax[i][j].plot(d.date.apply(lambda x: x.date()), d['open'] - d['open'].iloc[0], color='#ffd966')
            ax[i][j].set_title(darwin, color='#ffd966', loc='center', y=0.9)
            ax[i][j].set_ylim(-50, 100)
            ax[i][j].set_frame_on(False)
            ax[i][j].set_xlabel('Training period')
            ax[i][j].set_xticklabels([])
            ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
plot_historic_dars(df.index[:6])
Darwins with the most hours of highest score
Download the data from August 2020 to December 2020 from darwinex.com via FTP
https://github.com/darwinex/darwinexapis/blob/master/darwinexapis/API/DarwinDataAnalyticsAPI/DWX_Data_Analytics_API.py
top=['BSX','FNM','ZTY','CBY','NYD','TKT','BFS','NWO','HZY','NVL','YEC','TER','PUL','VRT','NCT','MUF','FSK','PEW','LEN','LUG','PHI','BGN','TXR','UYZ','MET','REU','UEI','ZVQ',
'ZCD','SYO','BZC','XRX','ULT','HQU','WWT','CIS','TRO','FFV','MCA','AWW','GGR','AZG','GFJ','LWK','VVC','WFJ','OJG','OOS','SRI','LWE','HEO','RAT','TDD','ZXW','OXR','ACY',
'GRI','HCC','PPT','FIR','ULI','ZAB','ZUJ','SKN','EEY','SKI','SEH','NSC','SHC','EOP','WXN','LHB','SBY','IDT','RWJ','JTL']
Imports and server
from ftplib import FTP
from tqdm import tqdm
from io import BytesIO
import gzip
FTP_CRED = {'username': USERNAME,
'password': PASSWORD,
'server': "darwindata.darwinex.com",
'port': 21}
dwx_ftp_hostname=FTP_CRED['server']
dwx_ftp_user=FTP_CRED['username']
dwx_ftp_pass=FTP_CRED['password']
server = FTP(dwx_ftp_hostname)
server.login(dwx_ftp_user, dwx_ftp_pass)
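Quote file names on the server follow the pattern {DARWIN_TICKER}.{PRODUCTRISK}.{COLOUR}{PRODUCTID}_YYYY-MM-DD.HH.csv.gz. For some darwins (FNM below) the quotes are stored under a '_{darwin}_former_var10' folder.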
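Below is a sketch for reading the ticker and the hour out of a file name that follows the pattern above. It relies only on the ticker before the first dot and the `_YYYY-MM-DD.HH.csv.gz` suffix. The middle part (`{PRODUCTRISK}.{COLOUR}{PRODUCTID}`) is returned unparsed, because its exact format is an assumption. The example file name and the helper `parse_quote_filename` are hypothetical.

```python
import re
from datetime import datetime

# <ticker>.<middle part>_<YYYY-MM-DD>.<HH>.csv.gz
QUOTE_FILE_RE = re.compile(
    r'^(?P<ticker>[^.]+)\.(?P<middle>.+)_(?P<day>\d{4}-\d{2}-\d{2})\.(?P<hour>\d{2})\.csv\.gz$'
)

def parse_quote_filename(name):
    """Return (ticker, middle, hour timestamp), or None if the name does not match."""
    m = QUOTE_FILE_RE.match(name)
    if m is None:
        return None
    hour = datetime.strptime(f"{m['day']} {m['hour']}", '%Y-%m-%d %H')
    return m['ticker'], m['middle'], hour

print(parse_quote_filename('XYZ.4.20_2020-08-18.05.csv.gz'))
```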
year = '2020'
darwins_lst_dld = ['FNM']
for darwin in darwins_lst_dld:
    print(darwin)
    for month in ['08', '09', '10', '11', '12']:
        # list this month's quote files on the FTP server
        folder = f'{darwin}/_{darwin}_former_var10/quotes/{year}-{month}/'
        quote_files = []
        server.retrlines(f'NLST {folder}', quote_files.append)
        quote_files = [folder + quote_file for quote_file in quote_files]
        # Process tick data files
        tqdm.write(f'\n[KERNEL] {len(quote_files)} files retrieved.. post-processing now, please wait..', end='')
        frames = []
        for tick_file in tqdm(quote_files, position=0, leave=True):
            # download the gzipped file into a fresh in-memory buffer
            retbuf = BytesIO()
            server.retrbinary(f"RETR {tick_file}", retbuf.write)
            retbuf.seek(0)
            # decode the CSV lines, skipping the header row
            ret = [line.strip().decode().split(',') for line in gzip.open(retbuf)]
            frames.append(pd.DataFrame(ret[1:]))
        # Clean up
        ticks_df = pd.concat(frames, axis=0)
        ticks_df.columns = ['timestamp', 'quote']
        ticks_df.timestamp = pd.to_numeric(ticks_df.timestamp)
        ticks_df.set_index('timestamp', drop=True, inplace=True)
        ticks_df.index = pd.to_datetime(ticks_df.index, unit='ms')
        ticks_df.quote = pd.to_numeric(ticks_df.quote)
        # dropna returns a new frame, so assign it back
        ticks_df = ticks_df.dropna()
        fn = 'quotes/' + darwin + '_' + year + '_' + month + '_quotes.csv'
        ticks_df.to_csv('./data/' + fn)
New folder layout (PPT below): the quotes sit directly under {darwin}/quotes/
darwins_lst_dld = ['PPT']
for darwin in darwins_lst_dld:  # to do
    print(darwin)
    for month in ['08', '09', '10', '11', '12']:
        # list this month's quote files on the FTP server
        folder = f'{darwin}/quotes/{year}-{month}/'
        quote_files = []
        server.retrlines(f'NLST {folder}', quote_files.append)
        quote_files = [folder + quote_file for quote_file in quote_files]
        # Process tick data files
        tqdm.write(f'\n[KERNEL] {len(quote_files)} files retrieved.. post-processing now, please wait..', end='')
        frames = []
        for tick_file in tqdm(quote_files, position=0, leave=True):
            # download the gzipped file into a fresh in-memory buffer
            retbuf = BytesIO()
            server.retrbinary(f"RETR {tick_file}", retbuf.write)
            retbuf.seek(0)
            # decode the CSV lines, skipping the header row
            ret = [line.strip().decode().split(',') for line in gzip.open(retbuf)]
            frames.append(pd.DataFrame(ret[1:]))
        # Clean up
        ticks_df = pd.concat(frames, axis=0)
        ticks_df.columns = ['timestamp', 'quote']
        ticks_df.timestamp = pd.to_numeric(ticks_df.timestamp)
        ticks_df.set_index('timestamp', drop=True, inplace=True)
        ticks_df.index = pd.to_datetime(ticks_df.index, unit='ms')
        ticks_df.quote = pd.to_numeric(ticks_df.quote)
        # dropna returns a new frame, so assign it back
        ticks_df = ticks_df.dropna()
        fn = 'quotes/' + darwin + '_' + year + '_' + month + '_quotes.csv'
        ticks_df.to_csv('./data/' + fn)
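The loops above treat each quote file as a gzipped CSV with a header row and two columns: a millisecond timestamp and a quote. That decoding step can be checked offline, without the FTP server. The sketch below builds a tiny gzipped CSV in memory from made-up values and reads it with `pd.read_csv`, which accepts a binary file-like object with `compression='gzip'`. This is an alternative to splitting the lines by hand. The helper `read_quote_buffer` is hypothetical.

```python
import gzip
from io import BytesIO

import pandas as pd

def read_quote_buffer(buf: BytesIO) -> pd.Series:
    """Parse a gzipped 'timestamp,quote' CSV buffer into a quote Series indexed by time."""
    buf.seek(0)
    df = pd.read_csv(buf, compression='gzip', header=0, names=['timestamp', 'quote'])
    # millisecond epoch timestamps become the index
    df.index = pd.to_datetime(df.pop('timestamp'), unit='ms')
    return pd.to_numeric(df['quote'], errors='coerce').dropna()

# a tiny gzipped CSV built in memory (made-up values)
raw = b"timestamp,quote\n1597708800000,101.5\n1597708860000,101.7\n"
quotes = read_quote_buffer(BytesIO(gzip.compress(raw)))
print(quotes)
```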
def create_hourly(fn, darwin):
    # load one month of tick quotes for a darwin
    df = pd.read_csv('./data/quotes/' + fn)
    df.timestamp = pd.to_datetime(df.timestamp)
    df['date'] = df.timestamp.dt.date
    df['hour'] = df.timestamp.dt.hour
    # aggregate the ticks into hourly statistics, indexed by (date, hour)
    df1 = df.groupby(['date', 'hour']).agg({'quote': ['min', 'max', 'var', 'count', 'first', 'last']}).fillna(0)
    df1.columns = df1.columns.droplevel()
    df1['darwin'] = darwin
    return df1
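`create_hourly` groups the ticks by calendar date and hour of day. The sketch below computes the same per-hour statistics with `resample('h')` on a few made-up ticks, plus the hourly `return` (last quote minus first) used further on. One difference to be aware of: `resample` also emits a row for every hour without ticks (count 0, other statistics NaN). The groupby only returns hours that had ticks.

```python
import pandas as pd

ticks = pd.Series(
    [100.0, 100.4, 100.2, 101.0, 100.8],
    index=pd.to_datetime(['2020-08-18 00:05', '2020-08-18 00:40', '2020-08-18 00:55',
                          '2020-08-18 02:10', '2020-08-18 02:50']),
    name='quote',
)

# per-hour statistics with the same names as in create_hourly
hourly_stats = ticks.resample('h').agg(['min', 'max', 'var', 'count', 'first', 'last'])
# hourly return: last quote minus first quote of the hour
hourly_stats['return'] = hourly_stats['last'] - hourly_stats['first']
print(hourly_stats)
```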
Import the data
# hourly statistics for every darwin in `top`, August to December 2020
months = ['08', '09', '10', '11', '12']
hourly = pd.concat([create_hourly(f'{darwin}_2020_{month}_quotes.csv', darwin)
                    for darwin in top for month in months])
hourly['score'] = round((hourly['last'] - hourly['first']) / np.sqrt(hourly['var']), 4).fillna(0)
hourly['return'] = hourly['last'] - hourly['first']
# turn the (date, hour-of-day) index into a single 'hour' timestamp column
hourly.reset_index(inplace=True)
hourly['hour'] = pd.to_datetime(hourly['date']) + pd.to_timedelta(hourly['hour'], unit='h')
hourly.drop(columns='date', inplace=True)
print(hourly.shape)
hourly.head()
DataFrame of hourly returns for each darwin
from 18 August 2020 to 24 December 2020
dars = sorted(hourly.darwin.unique())
# keep only the competition window
in_window = (hourly.hour >= '2020-08-18 00:00:00') & (hourly.hour < '2020-12-24 22:00:00')
lst = [hourly[(hourly.darwin == dar) & in_window][['hour', 'return']].rename(columns={'return': dar}) for dar in dars]
# outer-join all darwins on 'hour'; hours without a quote count as a zero return
hly_rtns = lst[0]
for other in lst[1:]:
    hly_rtns = pd.merge(hly_rtns, other, how='outer')
hly_rtns.fillna(0., inplace=True)
hly_rtns.head()
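The chain of outer merges builds a wide table with one column per darwin. Below is a sketch that builds the same table with `pivot_table`. It assumes a long table with `hour`, `darwin` and `return` columns, like `hourly`; the rows and tickers are made up. Hours missing for a darwin become 0, as with `fillna(0.)` above. Because `hour` becomes the index rather than a column, `.mean()` then averages only the darwin columns.

```python
import pandas as pd

long_df = pd.DataFrame({
    'hour': pd.to_datetime(['2020-08-18 00:00', '2020-08-18 00:00', '2020-08-18 01:00']),
    'darwin': ['AAA', 'BBB', 'AAA'],
    'return': [0.5, -0.2, 0.3],
})

start, end = '2020-08-18 00:00:00', '2020-12-24 22:00:00'
window = long_df[(long_df.hour >= start) & (long_df.hour < end)]
# one row per hour, one column per darwin; missing hour/darwin pairs become 0
wide = window.pivot_table(index='hour', columns='darwin', values='return',
                          aggfunc='sum', fill_value=0.0)
print(wide)
print(wide.mean().sort_values(ascending=False))
```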
Darwins with the highest mean hourly return
# numeric_only=True skips the 'hour' timestamp column
means=pd.DataFrame(hly_rtns.mean(0, numeric_only=True), columns=['mean'])
means.sort_values('mean', ascending=False).T
results=[1.72, 1.58, 4.71, 1.7, 4.69, 4.81, 5.38, 5.29, 5.82, 6.18, 6.81, 6.72, 6.70, 6.81, 7.15, 7.94, 7.87, 7.79, 7.08, 7.98, 8.33, 8.29, 8.24, 8.17, 7.99, 8.18, 8.24]
# NaN results removed: they occurred when a row did not sum to exactly 1.0
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
ax.plot(results, color='#ffe599', linewidth=2)
ax.set_title('CodaLab submission history', fontsize=20)
ax.set_xlabel('Submission', fontsize=14)
ax.set_ylabel('Sharpe Ratio on Leaderboard', fontsize=14)
ax.set_ylim(0, 10)
ax.set_xlim(0, 26)
ax.annotate('8.33', xy=(20, 8.33), xycoords='data',
            xytext=(0.8, 0.95), textcoords='axes fraction',
            arrowprops=dict(facecolor='black', shrink=0.05),
            horizontalalignment='right', verticalalignment='top',
            fontsize=20)
# best leaderboard submission (#20)
ax.axvline(x=20, ymin=0, ymax=1, color='red', alpha=.5, linestyle='dotted')
# horizontal guide lines every 2 Sharpe units
for level in range(0, 11, 2):
    ax.axhline(y=level, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
# label each point with its submission number
for i, value in enumerate(results):
    ax.annotate(str(i), (i, value + .1), fontsize=12)
ax.set_frame_on(False)
plt.tight_layout()
plt.savefig('envios.png', dpi=300)
- Submission 2: give each darwin 1/18 of the investment, unchanged over time: 'ysharpe_ratio': 4.71, 'cumulative_return': 4.92
- Submission 6: split the 18 darwins into 3 groups (low, medium and high return), with less weight for the low group and more for the high group: 'ysharpe_ratio': 5.38, 'cumulative_return': 5.97
- Submission 10: https://es.mathworks.com/help/finance/portfolio.estimatemaxsharperatio.html?s_tid=srchtitle with MATLAB installed on my PC, removing FNM, MET, NVL, REU and VRT and adding ZTY, NYD, TKT, NWO and YEC; optimize in MATLAB.
- Submission 15: 3 rounds with the top 22 darwins, then drop 4
- Submission 20: download data for more darwins
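The first two strategies need no optimizer. The sketch below assumes 18 darwins with known mean hourly returns. Submission 2 gives each darwin 1/18. For submission 6 the exact weight per group is not given, so the tier multipliers 1, 2 and 3 are hypothetical. They only illustrate the idea (less weight for the low-return group, more for the high-return group) and are then normalized to sum to 1. The mean returns are random, made-up numbers.

```python
import numpy as np

def equal_weights(n):
    """Same share for every darwin."""
    return np.full(n, 1.0 / n)

def tiered_weights(mean_returns, multipliers=(1.0, 2.0, 3.0)):
    """Split assets into low/medium/high groups by mean return and weight them
    with (hypothetical) tier multipliers, normalised to sum to 1."""
    mean_returns = np.asarray(mean_returns, dtype=float)
    order = np.argsort(mean_returns)       # lowest mean return first
    groups = np.array_split(order, 3)      # low, medium, high
    raw = np.empty(len(mean_returns))
    for group, mult in zip(groups, multipliers):
        raw[group] = mult
    return raw / raw.sum()

rng = np.random.default_rng(0)
means = rng.normal(0.01, 0.005, size=18)   # made-up mean hourly returns
w_eq = equal_weights(18)
w_tier = tiered_weights(means)
print(w_eq.round(4))
print(w_tier.round(4))
```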
I took the first 22 darwins in the list and entered their mean returns (AssetMean) and covariances (AssetCovar) into MATLAB, following the example https://www.mathworks.com/help/finance/portfolio.estimatemaxsharperatio.html
'Estimate Efficient Portfolio that Maximizes the Sharpe Ratio for a Portfolio Object with Semicontinuous and Cardinality Constraints',
to select the best 18 of the 22 assets. (With more than 22 assets it often returned no weights, because it did not converge.)
p = Portfolio('AssetMean', AssetMean, 'AssetCovar', AssetCovar);
p = setDefaultConstraints(p);
p = setMinMaxNumAssets(p, 18, 18);
pesos = estimateMaxSharpeRatio(p,'Method','iterative')
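MATLAB's `estimateMaxSharpeRatio` handles the long-only default constraints and the 18-asset cardinality constraint set above. Those constraints are hard to reproduce in a few lines, so the NumPy sketch below swaps in a simpler, different technique: the unconstrained maximum-Sharpe (tangency) portfolio with a zero risk-free rate, whose weights are proportional to $\Sigma^{-1}\mu$. It allows short positions and uses every asset. It only shows what the optimizer is aiming at and does not replace the constrained MATLAB run. The means and covariances are made up; the names mirror AssetMean and AssetCovar.

```python
import numpy as np

def tangency_weights(asset_mean, asset_covar):
    """Unconstrained max-Sharpe weights (risk-free rate 0): w proportional to Sigma^-1 mu, summing to 1."""
    raw = np.linalg.solve(asset_covar, asset_mean)
    return raw / raw.sum()

def portfolio_sharpe(w, asset_mean, asset_covar):
    """Expected return divided by volatility for weights w."""
    return (w @ asset_mean) / np.sqrt(w @ asset_covar @ w)

asset_mean = np.array([0.010, 0.008, 0.012])
asset_covar = np.array([[0.040, 0.006, 0.002],
                        [0.006, 0.030, 0.004],
                        [0.002, 0.004, 0.050]])

w_tan = tangency_weights(asset_mean, asset_covar)
w_eq = np.full(3, 1 / 3)
print(w_tan.round(4))
print(portfolio_sharpe(w_tan, asset_mean, asset_covar), portfolio_sharpe(w_eq, asset_mean, asset_covar))
```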
I kept working down the list of darwins, removing the 4 darwins discarded in the previous round and adding 4 more, to see whether the result improved.
Whenever the result was better, I submitted it to the Leaderboard.
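The round-by-round procedure takes the next block of darwins from the ranking and skips every darwin already discarded. That is what the repeated `to_matlab.remove(...)` calls further down do by hand. Below is a sketch of a hypothetical helper, `next_round`. It assumes `ranked` holds the darwins sorted by mean hourly return, best first. The tickers in the usage lines are made up.

```python
def next_round(ranked, dropped, first_size=22):
    """Sorted candidate list for the next round.

    ranked:  darwin tickers sorted by mean hourly return, best first
    dropped: tickers discarded in earlier rounds
    The window grows by one position per dropped darwin, so the list keeps
    `first_size` names (top 22, then top 26 minus 4, then top 30 minus 8).
    """
    dropped = set(dropped)
    size = first_size + len(dropped)
    return sorted(d for d in ranked[:size] if d not in dropped)

ranked = [f'D{i:02d}' for i in range(40)]            # hypothetical tickers
round1 = next_round(ranked, [])
round2 = next_round(ranked, ['D03', 'D07', 'D11', 'D19'])
print(round2)
```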
dars=['AZG','BFS','FSK','JTL','LUG','MUF','NCT','NWO','PEW','PHI','PUL','TER','TXR','UEI','UYZ','WWT','XRX','ZCD']
wts=[0.0061,0.0109,0.1885,0.0079,0.0157,0.0077,0.019,0.0084,0.0046,0.0056,0.0099000000000001,0.0064,0.0153,0.0062,0.0026,0.65,0.0134,0.0218]
fig , ax = plt.subplots(nrows = 6, ncols = 3, figsize=(20, 8))
rows,cols = 6,3
for i in range(rows):
    for j in range(cols):
        darwin = dars[cols * i + j]
        # weight as a percentage, truncated to 5 characters
        wt = ' ' + str(wts[cols * i + j] * 100)[:5] + '%'
        d = hourly[hourly.darwin == darwin]
        # price path relative to the first quote of the period
        ax[i][j].plot(range(len(d)), d['first'] - d['first'].iloc[0], color='#ffd966')
        ax[i][j].set_ylim(-35, 50)
        ax[i][j].set_title(darwin + wt)
        ax[i][j].set_frame_on(False)
        ax[i][j].axes.get_xaxis().set_visible(False)
        ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
# get 'eod_ts' from the example submission file
sub=pd.read_csv('./data/submission.csv')
sub.eod_ts = pd.to_datetime(sub.eod_ts)
# create new submission file
new_sub=pd.DataFrame(columns=dars)
new_sub['eod_ts']=sub.eod_ts
new_sub.set_index('eod_ts', inplace=True)
# % allocation for each darwin
for i, dar in enumerate(dars):
    new_sub[dar] = wts[i]
# check that all rows sum to 1.0
print(new_sub[dars].sum(1).sum())
assert new_sub[dars].sum(1).sum() == len(sub)
# rename columns
for col in new_sub.columns:
    new_sub = new_sub.rename(columns={col: 'allo_' + col})
# save submission file
new_sub.reset_index(inplace=True)
new_sub.to_csv('./data/sub.csv',index=False)
new_sub.head()
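The submission table can also be built more compactly when the weights are constant over time. In the sketch below, `assign` broadcasts each weight to every hour and `add_prefix` adds the `allo_` column prefix. The exact row-sum check mirrors the assert above, because a row that did not sum to exactly 1.0 produced a NaN score. The helper `constant_allocation`, the timestamps and the tickers are hypothetical.

```python
import pandas as pd

def constant_allocation(timestamps, dars, wts):
    """One row per timestamp and one 'allo_<darwin>' column per darwin, with the same weights every hour."""
    sub = pd.DataFrame(index=pd.Index(pd.to_datetime(timestamps), name='eod_ts'))
    sub = sub.assign(**dict(zip(dars, wts))).add_prefix('allo_')
    return sub.reset_index()

ts = ['2020-08-18 00:00', '2020-08-18 01:00', '2020-08-18 02:00']
demo = constant_allocation(ts, ['AAA', 'BBB'], [0.25, 0.75])
row_sums = demo.filter(like='allo_').sum(axis=1)
print(demo)
print((row_sums == 1.0).all())
```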
len(darwins_lst)
# hourly statistics for every darwin in the training data, August to December 2020
months = ['08', '09', '10', '11', '12']
hourly = pd.concat([create_hourly(f'{darwin}_2020_{month}_quotes.csv', darwin)
                    for darwin in darwins_lst for month in months])
hourly['score'] = round((hourly['last'] - hourly['first']) / np.sqrt(hourly['var']), 4).fillna(0)
hourly['return'] = hourly['last'] - hourly['first']
# turn the (date, hour-of-day) index into a single 'hour' timestamp column
hourly.reset_index(inplace=True)
hourly['hour'] = pd.to_datetime(hourly['date']) + pd.to_timedelta(hourly['hour'], unit='h')
hourly.drop(columns='date', inplace=True)
print(hourly.shape)
hourly.head()
dars = sorted(hourly.darwin.unique())
# keep only the competition window
in_window = (hourly.hour >= '2020-08-18 00:00:00') & (hourly.hour < '2020-12-24 22:00:00')
lst = [hourly[(hourly.darwin == dar) & in_window][['hour', 'return']].rename(columns={'return': dar}) for dar in dars]
# outer-join all darwins on 'hour'; hours without a quote count as a zero return
hly_rtns = lst[0]
for other in lst[1:]:
    hly_rtns = pd.merge(hly_rtns, other, how='outer')
hly_rtns.fillna(0., inplace=True)
hly_rtns.head()
# numeric_only=True skips the 'hour' timestamp column
means=pd.DataFrame(hly_rtns.mean(0, numeric_only=True), columns=['mean'])
means.sort_values('mean', ascending=False).T
def plot_dars(to_matlab):
    # darwins in the global `drop` list are drawn in red
    rows, cols = 2, 11
    fig, ax = plt.subplots(nrows=rows, ncols=cols, figsize=(22, 8))
    for i in range(rows):
        for j in range(cols):
            darwin = to_matlab[cols * i + j]
            color = 'red' if darwin in drop else '#ffd966'
            d = hourly[hourly.darwin == darwin]
            # price path relative to the first quote of the period
            ax[i][j].plot(d.hour, d['first'] - d['first'].iloc[0], color=color)
            ax[i][j].set_ylim(-40, 55)
            ax[i][j].set_title(darwin, color=color)
            ax[i][j].set_frame_on(False)
            ax[i][j].axes.get_xaxis().set_visible(False)
            ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
to_matlab=sorted(means.sort_values('mean', ascending=False).index[:22])
print('darwins',to_matlab)
print('AssetMean',[round(hly_rtns[darwin].mean(),6) for darwin in to_matlab])
print('AssetCovar',np.round(np.cov(np.array([hly_rtns[dar] for dar in to_matlab]),bias=True), 6))
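`np.cov(..., bias=True)` with one row per darwin gives the population covariance, dividing by N rather than N−1, and that is what gets printed as AssetCovar. The quick check below, on made-up returns, confirms the equivalence with the explicit formula and with the rescaled default (`ddof=1`) estimate.

```python
import numpy as np

rng = np.random.default_rng(42)
returns = rng.normal(0.0, 0.01, size=(3, 100))   # 3 made-up darwins x 100 hours

cov_np = np.cov(returns, bias=True)
# explicit population covariance: centred returns, divided by N
centered = returns - returns.mean(axis=1, keepdims=True)
cov_manual = centered @ centered.T / returns.shape[1]

print(np.allclose(cov_np, cov_manual))
```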
drop=['BGN','CBY','SRI','UYZ']
wts=[0.0183,0.0485,0.0924,0.0353,0.1918,0.0837,0.0061,0.0604,0.0509,0.0553,0.0386,0.0686,0.0192,0.0605,0.0162,0.0203,0.1249,0.0090]
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'HZY', 'JTL', 'NWO', 'NYP', 'PHI', 'PUL', 'SEH', 'TKT', 'WFJ', 'ZAB', 'ZCD', 'ZVQ']
plot_dars(to_matlab)
plt.savefig('round1.png')
to_matlab=sorted(means.sort_values('mean', ascending=False).index[:26])
to_matlab.remove('BGN')
to_matlab.remove('CBY')
to_matlab.remove('SRI')
to_matlab.remove('UYZ')
print('new:',sorted(means.sort_values('mean', ascending=False).index[22:26]))
print('darwins',to_matlab)
print('AssetMean',[round(hly_rtns[darwin].mean(),6) for darwin in to_matlab])
print('AssetCovar',np.round(np.cov(np.array([hly_rtns[dar] for dar in to_matlab]),bias=True), 6))
drop=['HZY','WFJ','SEH','TKT']
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'JTL', 'MUF', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'ZAB', 'ZCD', 'ZVQ']
wts=[0.0173,0.0379,0.0629,0.0309,0.1232,0.0654,0.2552,0.0076,0.0378,0.0235,0.0329,0.0487,0.0272,0.0383,0.0762,0.0165,0.0930,0.0055]
plot_dars(to_matlab)
plt.savefig('round2.png')
Proof of concept: CodaLab score of 10.30 on 7 April
to_matlab=sorted(means.sort_values('mean', ascending=False).index[:30])
to_matlab.remove('BGN')
to_matlab.remove('CBY')
to_matlab.remove('SRI')
to_matlab.remove('UYZ')
to_matlab.remove('HZY')
to_matlab.remove('WFJ')
to_matlab.remove('SEH')
to_matlab.remove('TKT')
print('new:',sorted(means.sort_values('mean', ascending=False).index[26:30]))
print('darwins',to_matlab)
print('AssetMean',[round(hly_rtns[darwin].mean(),6) for darwin in to_matlab])
print('AssetCovar',np.round(np.cov(np.array([hly_rtns[dar] for dar in to_matlab]),bias=True), 6))
drop=['HCC','JTL','MUF','VRT']
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'UEI', 'YFC', 'ZAB', 'ZCD', 'ZVQ']
wts=[0.0165,0.0340,0.0680,0.0242,0.1321,0.0676,0.2115,0.0117,0.0385,0.0568,0.0292,0.0328,0.0646,0.0218,0.0718,0.0124,0.0980,0.0084]
plot_dars(to_matlab)
dars=['AZG','BFS','FSK','JTL','LUG','MUF','NCT','NWO','PEW','PHI','PUL','TER','TXR','UEI','UYZ','WWT','XRX','ZCD']
wts=[0.0061,0.0109,0.1885,0.0079,0.0157,0.0077,0.019,0.0084,0.0046,0.0056,0.0099000000000001,0.0064,0.0153,0.0062,0.0026,0.65,0.0134,0.0218]
labels = dars
w='#FFFFFF'
r='#E60026'
colors = [w,w,r,w,r,r,r,w,r,w,w,r,w,r,r,r,r,w]
left = 0.0
patch_handles = []
fig, ax = plt.subplots(1, 1, figsize=(30, 1.5))
# one stacked horizontal segment per darwin, as wide as its weight
for i, w, l in zip(range(len(dars)), wts, labels):
    patch_handles.append(ax.barh(0, w, align='center', left=left,
                                 color=colors[i], edgecolor='black'))
    left += w
    patch = patch_handles[-1][0]
    bl = patch.get_xy()
    x = 0.5 * patch.get_width() + bl[0]
    y = 0.5 * patch.get_height() + bl[1]
    # write the 3-letter ticker vertically in the middle of the segment
    ax.text(x, y + .1, s=l[0], ha='center', va='center')
    ax.text(x, y, s=l[1], ha='center', va='center')
    ax.text(x, y - .1, s=l[2], ha='center', va='center')
plt.yticks([])
ax.set_frame_on(False)
plt.tight_layout()
plt.savefig('winning_entry.png', dpi=300)
plt.show()
# total weight of the darwins highlighted in red (avoid shadowing built-in sum)
total = 0
for d, c, w in zip(dars, colors, wts):
    if c == '#E60026':
        total += w
total
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'JTL', 'MUF', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'ZAB', 'ZCD', 'ZVQ']
wts=[0.0173,0.0379,0.0629,0.0309,0.1232,0.0654,0.2552,0.0076,0.0378,0.0235,0.0329,0.0487,0.0272,0.0383,0.0762,0.0165,0.0930,0.0055]
labels = dars
w='#FFFFFF'
y='#FFE599'
colors =[y,w,w,y,y,y,y,y,w,y,w,y,w,w,w,y,w,y]
left = 0.0
patch_handles = []
fig, ax = plt.subplots(1, 1, figsize=(30, 1.5))
# one stacked horizontal segment per darwin, as wide as its weight
for i, w, l in zip(range(len(dars)), wts, labels):
    patch_handles.append(ax.barh(0, w, align='center', left=left,
                                 color=colors[i], edgecolor='black'))
    left += w
    patch = patch_handles[-1][0]
    bl = patch.get_xy()
    x = 0.5 * patch.get_width() + bl[0]
    y = 0.5 * patch.get_height() + bl[1]
    # write the 3-letter ticker vertically in the middle of the segment
    ax.text(x, y + .1, s=l[0], ha='center', va='center')
    ax.text(x, y, s=l[1], ha='center', va='center')
    ax.text(x, y - .1, s=l[2], ha='center', va='center')
plt.yticks([])
ax.set_frame_on(False)
plt.tight_layout()
plt.savefig('better_entry.png', dpi=300)
plt.show()
# total weight of the darwins highlighted in yellow (avoid shadowing built-in sum)
total = 0
for d, c, w in zip(dars, colors, wts):
    if c == '#FFE599':
        total += w
total
def plot_train_and_test_dars(dars, color='black'):
    rows, cols = 2, int(.5 + len(dars) / 2)
    fig, ax = plt.subplots(nrows=rows, ncols=cols, figsize=(25, 6))
    for i in range(rows):
        for j in range(cols):
            darwin = dars[cols * i + j]
            train = candles[candles.darwin == darwin]
            test = hourly[hourly.darwin == darwin]
            start = train['open'].iloc[0]
            # training candles followed by test quotes, both relative to the first training open
            ax[i][j].plot(train.date.apply(lambda x: x.date()).to_list() + test.hour.apply(lambda x: x.date()).to_list(),
                          (train['open'] - start).to_list() + (test['first'] - start).to_list(),
                          color=color)
            ax[i][j].set_title(darwin, color=color, loc='center', y=0.8)
            ax[i][j].set_ylim(-50, 170)
            ax[i][j].set_frame_on(False)
            ax[i][j].set_xlabel('Training | Test')
            ax[i][j].set_xticklabels([])
            # dashed line where the test period starts
            ax[i][j].axvline(x=test.hour.iloc[0].date(), color='grey', linestyle='dashed')
            ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
dars=['AUX','BOT','EOP','ERQ','FIR','HEO','MUF','NYP','ZAB','ZVQ']
plot_train_and_test_dars(dars,color='#ffd966')
plt.tight_layout()
plt.savefig('train_test.png', dpi=600)
dars=['FSK','LUG','MUF','NCT','PEW','TER','UEI','UYZ','WWT','XRX']
plot_train_and_test_dars(dars,color='red')
dars=['AZG','BFS','JTL','NWO','PHI','PUL','TXR','ZCD']
plot_train_and_test_dars(dars)
- Rethinking what I did for the challenge in order to explain it to others turned out to be a great way of exposing the mistakes.
- Stop and think! Take a time-out to rethink. It is worth reviewing every decision point and reframing the problem.
- Be patient!
  - wait for all the data to download, even if it takes hours
  - wait for the MATLAB algorithm to converge, even if it takes hours
- Focus on one task at a time!
  - We all have to juggle many tasks.
  - You need a clear head.
  - It is better to work sequentially than in parallel (I am not a GPU).
- Believe in yourself.
- We learn by doing.
- ...
I also took fourth place, together with #javic and #agnprz, in the computer vision challenge, and sixth place in the NLP challenge.
If you have data you need to understand, I'll be happy to help 😊