SPAIN-AI 2020 Hackathon: Time Series Challenge
How Not to Win a Challenge
- Workshop Hackathon Spain AI 2020
- Challenge: what I had to do
- Reality: what I did
- The story: what I should have done
- Reflections
If ifs and ands were pots and pans...
https://www.spain-ai.com/hackathon2020_reto_Series_Temporales.php
https://competitions.codalab.org/competitions/28630
This blog post details the code I used in this competition. It is provided here in support of the SPAIN-AI presentation scheduled for 15 May 2021. The presentation is in Spanish, since SPAIN-AI serves a Spanish-speaking audience.
Workshop Hackathon Spain AI 2020
https://www.linkedin.com/feed/update/urn:li:activity:6798658932182142976/
Imports
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
import warnings
warnings.simplefilter("ignore", UserWarning)
pd.set_option('display.max_rows', 100)
Understanding the challenge
Build a portfolio of assets (darwins) that maximizes the Sharpe ratio:
- pick 18 darwins from the 96 in the training data
- assign a share of the total investment to each of the 18 darwins
- adjust the allocation over time, every hour from 18 August 2020 to 24 December 2020, covering the 2229 hours in submission.csv
- compute the Sharpe ratio
This is not a prediction problem (all the data are available on darwinex.com) but one of building a diversified portfolio of assets that are uncorrelated with each other and have stable returns.
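Before optimizing anything, it helps to pin down what a valid submission looks like. A minimal sanity check, assuming the allo_-prefixed column naming used later in this post (the CodaLab scorer returned NaN whenever an hourly row did not sum to exactly 1.0):
import pandas as pd

def check_submission(path):
    """Sanity-check a submission file before uploading it."""
    sub = pd.read_csv(path)
    allo_cols = [c for c in sub.columns if c.startswith('allo_')]
    assert len(allo_cols) == 18, 'expected allocations for exactly 18 darwins'
    # The scorer returns NaN whenever an hourly row does not sum to exactly 1.0
    assert (sub[allo_cols].sum(axis=1).round(6) == 1.0).all()
    return sub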
The metric to maximize
$$S = \frac{E[R_a-R_b]}{\sigma_a}$$
$R_a$ is the return of the portfolio
$R_b$ is the return of a benchmark investment
$\sigma_a$ is the standard deviation (volatility) of the investment's excess return
Assuming $R_b$ is constant, we have to maximize return divided by volatility, so we are looking for stable returns.
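As a minimal sketch of the metric (assuming a constant benchmark return; the exact CodaLab scoring code may differ in normalization):
import numpy as np

def sharpe_ratio(returns, benchmark=0.0):
    """Sharpe ratio of a return series against a constant benchmark return."""
    excess = np.asarray(returns) - benchmark
    return excess.mean() / excess.std()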
An investment percentage is needed for every hour
xs = [2,6,10,14,18,22,25,35,40,43,45,50,54,57,59,66,70,77]
ys = np.array([0.0173,0.0379,0.0629,0.0309,0.1232,0.0654,0.2552,0.0076,0.0378,0.0235,0.0329,0.0487,0.0272,0.0383,0.0762,0.0165,0.0930,0.0055])*100
fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot()
ax.scatter(xs,ys, color='#ffe559')
ax.set_xlabel('Darwin')
ax.set_ylabel('% inversión')
annotations=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'JTL', 'MUF', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'ZAB', 'ZCD', 'ZVQ']
for i, label in enumerate(annotations):
    plt.annotate(label, (xs[i], ys[i]))
plt.xticks([])
plt.tight_layout()
plt.savefig("darwins.png")
plt.show()
candles=pd.DataFrame(columns=['date','close','max','min','open','std_dev','score','darwin'])
filenames = [x for x in os.listdir('./data/TrainCandles')]
for filename in filenames:
    df = pd.read_csv('./data/TrainCandles/'+filename).rename(columns={'Unnamed: 0':'date'})
    df.date = pd.to_datetime(df.date)
    df['std_dev'] = df.std(axis=1)                                      # per-candle volatility
    df['score'] = (round((df.close-df.open)/df.std_dev, 6)).fillna(0)   # hourly return scaled by volatility
    df['darwin'] = filename[-13:-10]                                    # 3-letter ticker from the filename
    candles = candles.append(df)
candles.head()
plt.hist(candles.score, color='#FFE599')
plt.title('Count of hourly scores');
Identify which darwin had the maximum score in each hour.
Count how many times each darwin has the maximum score.
Investigate those darwins first.
lst=[candles[(candles['date']==hour) & (candles.score==candles[candles['date']==hour].score.max())].darwin.to_list()[0] for hour in sorted(candles['date'].unique())]
count_dict=Counter(lst)
df=pd.DataFrame.from_dict(count_dict, orient='index').sort_values(0, ascending=False)
darwins_lst=df.index.to_list()
darwins_lst=darwins_lst+['MMY','TMF'] # the 2 darwins that never had max score
df.T
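The list comprehension above rescans the whole frame once per hour; a vectorized formulation (a sketch that should give the same counts, although ties may resolve differently) is:
# Keep, for each hour, the row with the highest score, then count by darwin
per_hour_max = candles.sort_values('score').drop_duplicates('date', keep='last')
count_dict = Counter(per_hour_max.darwin)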
def plot_historic_dars(dars):
    rows, cols = 2, int(.5+len(dars)/2)
    fig, ax = plt.subplots(nrows=rows, ncols=cols, figsize=(20, 8))
    for i in range(rows):
        for j in range(cols):
            darwin = dars[cols*i+j]
            ax[i][j].plot(candles[candles.darwin==darwin].date.apply(lambda x: x.date()),
                          candles[candles.darwin==darwin]['open']-candles[candles.darwin==darwin]['open'][0],
                          color='#ffd966')
            ax[i][j].set_title(darwin, color='#ffd966', loc='center', y=0.9)
            ax[i][j].set_ylim(-50,100)
            ax[i][j].set_frame_on(False)
            ax[i][j].set_xlabel('Periodo de Entrenamiento')
            ax[i][j].set_xticklabels([])
            ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
plot_historic_dars(df.index[:6])
Download data for the darwins with the most max-score hours
From August 2020 to December 2020, from darwinex.com, via FTP
https://github.com/darwinex/darwinexapis/blob/master/darwinexapis/API/DarwinDataAnalyticsAPI/DWX_Data_Analytics_API.py
top=['BSX','FNM','ZTY','CBY','NYD','TKT','BFS','NWO','HZY','NVL','YEC','TER','PUL','VRT','NCT','MUF','FSK','PEW','LEN','LUG','PHI','BGN','TXR','UYZ','MET','REU','UEI','ZVQ',
'ZCD','SYO','BZC','XRX','ULT','HQU','WWT','CIS','TRO','FFV','MCA','AWW','GGR','AZG','GFJ','LWK','VVC','WFJ','OJG','OOS','SRI','LWE','HEO','RAT','TDD','ZXW','OXR','ACY',
'GRI','HCC','PPT','FIR','ULI','ZAB','ZUJ','SKN','EEY','SKI','SEH','NSC','SHC','EOP','WXN','LHB','SBY','IDT','RWJ','JTL']
Imports and server
from ftplib import FTP
from tqdm import tqdm
from io import BytesIO
import gzip
FTP_CRED = {'username': USERNAME,
'password': PASSWORD,
'server': "darwindata.darwinex.com",
'port': 21}
dwx_ftp_hostname=FTP_CRED['server']
dwx_ftp_user=FTP_CRED['username']
dwx_ftp_pass=FTP_CRED['password']
server = FTP(dwx_ftp_hostname)
server.login(dwx_ftp_user, dwx_ftp_pass)
FTP file naming pattern (the 'former_var10' layout): {DARWIN_TICKER}.{PRODUCTRISK}.{COLOUR}{PRODUCTID}_YYYY-MM-DD.HH.csv.gz
year = '2020'
darwins_lst_dld = ['FNM']
for darwin in darwins_lst_dld:
    print(darwin)
    for month in ['08','09','10','11','12']:
        quote_files = []
        server.retrlines(f'NLST {darwin}/_{darwin}_former_var10/quotes/{year}-{month}/', quote_files.append)
        quote_files = [f'{darwin}/_{darwin}_former_var10/quotes/{year}-{month}/{quote_file}' for quote_file in quote_files]
        # Process tick data files
        tqdm.write(f'\n[KERNEL] {len(quote_files)} files retrieved.. post-processing now, please wait..', end='')
        ticks_df = pd.DataFrame()
        ticks_pbar = tqdm(quote_files, position=0, leave=True)
        for tick_file in ticks_pbar:
            # Clear / reinitialize buffer
            retbuf = BytesIO()
            server.retrbinary(f"RETR {tick_file}", retbuf.write)
            retbuf.seek(0)
            # Extract data from BytesIO object
            ret = [line.strip().decode().split(',') for line in gzip.open(retbuf)]
            ticks_df = pd.concat([ticks_df, pd.DataFrame(ret[1:])], axis=0)
        # Clean up
        ticks_df.columns = ['timestamp','quote']
        ticks_df.timestamp = ticks_df.timestamp.apply(pd.to_numeric)
        ticks_df.set_index('timestamp', drop=True, inplace=True)
        ticks_df.index = pd.to_datetime(ticks_df.index, unit='ms')
        ticks_df.quote = ticks_df.quote.apply(pd.to_numeric)
        ticks_df = ticks_df.dropna()  # dropna returns a copy, so assign it back
        fn = 'quotes/'+darwin+'_'+year+'_'+month+'_quotes.csv'
        ticks_df.to_csv('./data/'+fn)
New path layout (for darwins whose quotes do not live under a former_var10 directory):
darwins_lst_dld = ['PPT']
for darwin in darwins_lst_dld:
    print(darwin)
    for month in ['08','09','10','11','12']:
        quote_files = []
        server.retrlines(f'NLST {darwin}/quotes/{year}-{month}/', quote_files.append)
        quote_files = [f'{darwin}/quotes/{year}-{month}/{quote_file}' for quote_file in quote_files]
        # Process tick data files
        tqdm.write(f'\n[KERNEL] {len(quote_files)} files retrieved.. post-processing now, please wait..', end='')
        ticks_df = pd.DataFrame()
        ticks_pbar = tqdm(quote_files, position=0, leave=True)
        for tick_file in ticks_pbar:
            # Clear / reinitialize buffer
            retbuf = BytesIO()
            server.retrbinary(f"RETR {tick_file}", retbuf.write)
            retbuf.seek(0)
            # Extract data from BytesIO object
            ret = [line.strip().decode().split(',') for line in gzip.open(retbuf)]
            ticks_df = pd.concat([ticks_df, pd.DataFrame(ret[1:])], axis=0)
        # Clean up
        ticks_df.columns = ['timestamp','quote']
        ticks_df.timestamp = ticks_df.timestamp.apply(pd.to_numeric)
        ticks_df.set_index('timestamp', drop=True, inplace=True)
        ticks_df.index = pd.to_datetime(ticks_df.index, unit='ms')
        ticks_df.quote = ticks_df.quote.apply(pd.to_numeric)
        ticks_df = ticks_df.dropna()  # dropna returns a copy, so assign it back
        fn = 'quotes/'+darwin+'_'+year+'_'+month+'_quotes.csv'
        ticks_df.to_csv('./data/'+fn)
def create_hourly(fn, darwin):
    df = pd.read_csv('./data/quotes/'+fn)
    df.timestamp = pd.to_datetime(df.timestamp)
    df['date'] = df.timestamp.dt.date
    df['hour'] = df.timestamp.dt.hour
    # One row per (date, hour): min/max/variance/count and first/last quote
    df1 = df.groupby(['date','hour']).agg({'quote': ['min','max','var','count','first','last']}).fillna(0)
    df1.columns = df1.columns.droplevel()
    df1['darwin'] = darwin
    return df1
Import the data
hourly = pd.DataFrame(columns=['min','max','var','count','first','last','score','darwin'])
for darwin in top:
    for filename in [darwin+'_2020_08_quotes.csv',
                     darwin+'_2020_09_quotes.csv',
                     darwin+'_2020_10_quotes.csv',
                     darwin+'_2020_11_quotes.csv',
                     darwin+'_2020_12_quotes.csv']:
        hourly = hourly.append(create_hourly(filename, darwin))
hourly['score']=round((hourly['last']-hourly['first'])/np.sqrt(hourly['var']),4).fillna(0)
hourly['return']=hourly['last']-hourly['first']
hourly.reset_index(inplace=True)
hourly.rename(columns={'index':'hour'},inplace=True)
hourly.hour=hourly.hour.apply(lambda x: pd.Timestamp(x[0])+pd.to_timedelta(x[1], unit='h'))
print(hourly.shape)
hourly.head()
DataFrame of hourly returns for each darwin, from 18 August 2020 to 24 December 2020
dars=sorted(hourly.darwin.unique())
lst=[hourly[hourly.darwin==dar][(hourly.hour>='2020-08-18 00:00:00') & (hourly.hour<'2020-12-24 22:00:00')][['hour','return']].rename(columns={'return':dar}) for i,dar in enumerate(dars)]
hly_rtns=pd.merge(lst[0], lst[1], how='outer')
for i in range(2, len(dars)):
    hly_rtns = pd.merge(hly_rtns, lst[i], how='outer')
hly_rtns.fillna(0., inplace=True)
hly_rtns.head()
Darwins with the highest mean hourly return
means=pd.DataFrame(hly_rtns.mean(0), columns=['mean'])
means.sort_values('mean', ascending=False).T
results=[1.72, 1.58, 4.71, 1.7, 4.69, 4.81, 5.38, 5.29, 5.82, 6.18, 6.81, 6.72, 6.70, 6.81, 7.15, 7.94, 7.87, 7.79, 7.08, 7.98, 8.33, 8.29, 8.24, 8.17, 7.99, 8.18, 8.24]
# excluding NaN results, returned when a row's sum was not exactly 1.0
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
ax.plot(results, color='#ffe599', linewidth=2)
ax.set_title('Historial de Envíos a CodaLab', fontsize=20)
ax.set_xlabel('Envío', fontsize=14)
ax.set_ylabel('Sharpe Ratio on Leaderboard', fontsize=14)
ax.set_ylim(0,10)
ax.set_xlim(0,26)
ax.annotate('8.33', xy=(20, 8.33), xycoords='data',
xytext=(0.8, 0.95), textcoords='axes fraction',
arrowprops=dict(facecolor='black', shrink=0.05),
horizontalalignment='right', verticalalignment='top',
fontsize=20
)
ax.axvline(x=20, ymin=0, ymax=1, color='red', alpha=.5, linestyle='dotted')
ax.axhline(y=0, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
ax.axhline(y=2, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
ax.axhline(y=4, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
ax.axhline(y=6, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
ax.axhline(y=8, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
ax.axhline(y=10, xmin=0, xmax=1, color='orange', alpha=.5, linestyle='dotted')
for i, label in enumerate(range(27)):
    ax.annotate(label, (i, results[i]+.1), fontsize=12);
ax.set_frame_on(False)
plt.tight_layout()
plt.savefig('envios.png', dpi=300)
Notable submissions:
- Submission 2: assign each darwin 1/18 of the investment, unchanged over time: 'sharpe_ratio': 4.71, 'cumulative_return': 4.92
- Submission 6: group the 18 darwins into 3 buckets of low, medium and high return, with less weight on the low ones and more on the high ones: 'sharpe_ratio': 5.38, 'cumulative_return': 5.97
- Submission 10: https://es.mathworks.com/help/finance/portfolio.estimatemaxsharperatio.html?s_tid=srchtitle with MATLAB installed locally, removing FNM, MET, NVL, REU, VRT and adding ZTY, NYD, TKT, NWO, YEC; optimization done in MATLAB
- Submission 15: 3 rounds of the top 22 darwins, then drop 4
- Submission 20: download data for more darwins
Take the first 22 darwins from the list and feed their mean returns (AssetMean) and covariances (AssetCovar) into MATLAB's https://www.mathworks.com/help/finance/portfolio.estimatemaxsharperatio.html,
'Estimate Efficient Portfolio that Maximizes the Sharpe Ratio for a Portfolio Object with Semicontinuous and Cardinality Constraints',
to select the best 18 of the 22 assets. (With more than 22 assets it often returned no weights because the solver did not converge.)
p = Portfolio('AssetMean', AssetMean, 'AssetCovar', AssetCovar);
p = setDefaultConstraints(p);
p = setMinMaxNumAssets(p, 18, 18);
pesos = estimateMaxSharpeRatio(p,'Method','iterative')
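For anyone without a MATLAB licence, a rough Python equivalent for the same inputs can be sketched with scipy.optimize; note that this sketch does not enforce the exactly-18-assets cardinality constraint that setMinMaxNumAssets provides:
import numpy as np
from scipy.optimize import minimize

def max_sharpe_weights(asset_mean, asset_covar):
    """Long-only weights that maximize mean return / volatility, summing to 1."""
    mu, cov = np.asarray(asset_mean), np.asarray(asset_covar)
    n = len(mu)
    def neg_sharpe(w):
        return -(w @ mu) / np.sqrt(w @ cov @ w)
    result = minimize(neg_sharpe,
                      x0=np.full(n, 1.0 / n),           # start from equal weights
                      bounds=[(0.0, 1.0)] * n,          # long-only
                      constraints=({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},))
    return result.x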
I kept working down the list of darwins, dropping the 4 darwins left out in the previous round and adding the next 4, to see whether the result improved.
Whenever it was better, I submitted it to the leaderboard, as sketched below.
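Schematically, the rounds looked something like this (illustrative only: run_matlab_max_sharpe is a hypothetical wrapper for the MATLAB call above, which I actually ran by hand):
ranked = means.sort_values('mean', ascending=False).index.to_list()
dropped = []                    # darwins the optimizer zeroed out in earlier rounds
for n in (22, 26, 30):          # each round considers the next 4 candidates
    pool = [d for d in ranked[:n] if d not in dropped]
    weights = run_matlab_max_sharpe(pool)   # hypothetical helper, not real code
    dropped += [d for d, w in zip(pool, weights) if w == 0.0]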
dars=['AZG','BFS','FSK','JTL','LUG','MUF','NCT','NWO','PEW','PHI','PUL','TER','TXR','UEI','UYZ','WWT','XRX','ZCD']
wts=[0.0061,0.0109,0.1885,0.0079,0.0157,0.0077,0.019,0.0084,0.0046,0.0056,0.0099000000000001,0.0064,0.0153,0.0062,0.0026,0.65,0.0134,0.0218]
fig, ax = plt.subplots(nrows=6, ncols=3, figsize=(20, 8))
rows, cols = 6, 3
for i in range(rows):
    for j in range(cols):
        darwin = dars[cols*i+j]
        wt = ' '+str(wts[cols*i+j]*100)[:5]+'%'
        ax[i][j].plot(range(len(hourly[hourly.darwin==darwin])),
                      hourly[hourly.darwin==darwin]['first']-hourly[hourly.darwin==darwin]['first'].to_list()[0],
                      color='#ffd966')
        ax[i][j].set_ylim(-35,50)
        ax[i][j].set_title(darwin+wt)
        ax[i][j].set_frame_on(False)
        ax[i][j].axes.get_xaxis().set_visible(False)
        ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
# get 'eod_ts' from the example submission file
sub=pd.read_csv('./data/submission.csv')
sub.eod_ts = pd.to_datetime(sub.eod_ts)
# create new submission file
new_sub=pd.DataFrame(columns=dars)
new_sub['eod_ts']=sub.eod_ts
new_sub.set_index('eod_ts', inplace=True)
# % allocation for each darwin
for i, dar in enumerate(dars):
    new_sub[dar] = wts[i]
# check that all rows sum to 1.0
print(new_sub[dars].sum(1).sum())
assert new_sub[dars].sum(1).sum()==len(sub)
# rename columns
for col in new_sub.columns:
    new_sub = new_sub.rename(columns={col:'allo_'+col})
# save submission file
new_sub.reset_index(inplace=True)
new_sub.to_csv('./data/sub.csv',index=False)
new_sub.head()
len(darwins_lst)
hourly = pd.DataFrame(columns=['min','max','var','count','first','last','score','darwin'])
for darwin in darwins_lst:
    for filename in [darwin+'_2020_08_quotes.csv',
                     darwin+'_2020_09_quotes.csv',
                     darwin+'_2020_10_quotes.csv',
                     darwin+'_2020_11_quotes.csv',
                     darwin+'_2020_12_quotes.csv']:
        hourly = hourly.append(create_hourly(filename, darwin))
hourly['score']=round((hourly['last']-hourly['first'])/np.sqrt(hourly['var']),4).fillna(0)
hourly['return']=hourly['last']-hourly['first']
hourly.reset_index(inplace=True)
hourly.rename(columns={'index':'hour'},inplace=True)
hourly.hour=hourly.hour.apply(lambda x: pd.Timestamp(x[0])+pd.to_timedelta(x[1], unit='h'))
print(hourly.shape)
hourly.head()
dars=sorted(hourly.darwin.unique())
lst=[hourly[hourly.darwin==dar][(hourly.hour>='2020-08-18 00:00:00') & (hourly.hour<'2020-12-24 22:00:00')][['hour','return']].rename(columns={'return':dar}) for i,dar in enumerate(dars)]
hly_rtns=pd.merge(lst[0], lst[1], how='outer')
for i in range(2, len(dars)):
    hly_rtns = pd.merge(hly_rtns, lst[i], how='outer')
hly_rtns.fillna(0., inplace=True)
hly_rtns.head()
means=pd.DataFrame(hly_rtns.mean(0), columns=['mean'])
means.sort_values('mean', ascending=False).T
def plot_dars(to_matlab):
    rows, cols = 2, 11
    fig, ax = plt.subplots(nrows=rows, ncols=cols, figsize=(22, 8))
    for i in range(rows):
        for j in range(cols):
            darwin = to_matlab[cols*i+j]
            color = '#ffd966'
            if darwin in drop:      # darwins the optimizer zeroed out are drawn in red
                color = 'red'
            ax[i][j].plot(hourly[hourly.darwin==darwin].hour,
                          hourly[hourly.darwin==darwin]['first']-hourly[hourly.darwin==darwin]['first'].to_list()[0],
                          color=color)
            ax[i][j].set_ylim(-40,55)
            ax[i][j].set_title(darwin, color=color)
            ax[i][j].set_frame_on(False)
            ax[i][j].axes.get_xaxis().set_visible(False)
            ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
to_matlab=sorted(means.sort_values('mean', ascending=False).index[:22])
print('darwins',to_matlab)
print('AssetMean',[round(hly_rtns[darwin].mean(),6) for darwin in to_matlab])
print('AssetCovar',np.round(np.cov(np.array([hly_rtns[dar] for dar in to_matlab]),bias=True), 6))
drop=['BGN','CBY','SRI','UYZ']
wts=[0.0183,0.0485,0.0924,0.0353,0.1918,0.0837,0.0061,0.0604,0.0509,0.0553,0.0386,0.0686,0.0192,0.0605,0.0162,0.0203,0.1249,0.0090]
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'HZY', 'JTL', 'NWO', 'NYP', 'PHI', 'PUL', 'SEH', 'TKT', 'WFJ', 'ZAB', 'ZCD', 'ZVQ']
plot_dars(to_matlab)
plt.savefig('round1.png')
to_matlab=sorted(means.sort_values('mean', ascending=False).index[:26])
to_matlab.remove('BGN')
to_matlab.remove('CBY')
to_matlab.remove('SRI')
to_matlab.remove('UYZ')
print('new:',sorted(means.sort_values('mean', ascending=False).index[22:26]))
print('darwins',to_matlab)
print('AssetMean',[round(hly_rtns[darwin].mean(),6) for darwin in to_matlab])
print('AssetCovar',np.round(np.cov(np.array([hly_rtns[dar] for dar in to_matlab]),bias=True), 6))
drop=['HZY','WFJ','SEH','TKT']
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'JTL', 'MUF', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'ZAB', 'ZCD', 'ZVQ']
wts=[0.0173,0.0379,0.0629,0.0309,0.1232,0.0654,0.2552,0.0076,0.0378,0.0235,0.0329,0.0487,0.0272,0.0383,0.0762,0.0165,0.0930,0.0055]
plot_dars(to_matlab)
plt.savefig('round2.png')
Proof of concept: CodaLab score, 7 April: 10.30
to_matlab=sorted(means.sort_values('mean', ascending=False).index[:30])
to_matlab.remove('BGN')
to_matlab.remove('CBY')
to_matlab.remove('SRI')
to_matlab.remove('UYZ')
to_matlab.remove('HZY')
to_matlab.remove('WFJ')
to_matlab.remove('SEH')
to_matlab.remove('TKT')
print('new:',sorted(means.sort_values('mean', ascending=False).index[26:30]))
print('darwins',to_matlab)
print('AssetMean',[round(hly_rtns[darwin].mean(),6) for darwin in to_matlab])
print('AssetCovar',np.round(np.cov(np.array([hly_rtns[dar] for dar in to_matlab]),bias=True), 6))
drop=['HCC','JTL','MUF','VRT']
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'UEI', 'YFC', 'ZAB', 'ZCD', 'ZVQ']
wts=[0.0165,0.0340,0.0680,0.0242,0.1321,0.0676,0.2115,0.0117,0.0385,0.0568,0.0292,0.0328,0.0646,0.0218,0.0718,0.0124,0.0980,0.0084]
plot_dars(to_matlab)
dars=['AZG','BFS','FSK','JTL','LUG','MUF','NCT','NWO','PEW','PHI','PUL','TER','TXR','UEI','UYZ','WWT','XRX','ZCD']
wts=[0.0061,0.0109,0.1885,0.0079,0.0157,0.0077,0.019,0.0084,0.0046,0.0056,0.0099000000000001,0.0064,0.0153,0.0062,0.0026,0.65,0.0134,0.0218]
labels = dars
w='#FFFFFF'
r='#E60026'
colors = [w,w,r,w,r,r,r,w,r,w,w,r,w,r,r,r,r,w]
left = np.array(0.)
patch_handles = []
fig, ax = plt.subplots(1, 1, figsize=(30, 1.5))
for i, w, l in zip(range(len(dars)), wts, labels):
    patch_handles.append(ax.barh(0, w, align='center', left=left,
                                 color=colors[i], edgecolor='black'))
    left += w
    patch = patch_handles[-1][0]
    bl = patch.get_xy()
    x = 0.5*patch.get_width() + bl[0]
    y = 0.5*patch.get_height() + bl[1]
    ax.text(x, y+.1, s=l[0], ha='center', va='center')
    ax.text(x, y,    s=l[1], ha='center', va='center')
    ax.text(x, y-.1, s=l[2], ha='center', va='center')
plt.yticks([])
ax.set_frame_on(False)
plt.tight_layout()
plt.savefig('winning_entry.png', dpi=300)
plt.show()
total = 0
for d, c, w in zip(dars, colors, wts):
    if c == '#E60026':
        total += w      # total weight on the red-flagged darwins
total
dars=['AUX', 'AZG', 'BFS', 'BOT', 'EOP', 'ERQ', 'FIR', 'HEO', 'JTL', 'MUF', 'NWO', 'NYP', 'PHI', 'PUL', 'TXR', 'ZAB', 'ZCD', 'ZVQ']
wts=[0.0173,0.0379,0.0629,0.0309,0.1232,0.0654,0.2552,0.0076,0.0378,0.0235,0.0329,0.0487,0.0272,0.0383,0.0762,0.0165,0.0930,0.0055]
labels = dars
w='#FFFFFF'
y='#FFE599'
colors =[y,w,w,y,y,y,y,y,w,y,w,y,w,w,w,y,w,y]
left = np.array(0.)
patch_handles = []
fig, ax = plt.subplots(1, 1, figsize=(30, 1.5))
for i, w, l in zip(range(len(dars)), wts, labels):
    patch_handles.append(ax.barh(0, w, align='center', left=left,
                                 color=colors[i], edgecolor='black'))
    left += w
    patch = patch_handles[-1][0]
    bl = patch.get_xy()
    x = 0.5*patch.get_width() + bl[0]
    y = 0.5*patch.get_height() + bl[1]
    ax.text(x, y+.1, s=l[0], ha='center', va='center')
    ax.text(x, y,    s=l[1], ha='center', va='center')
    ax.text(x, y-.1, s=l[2], ha='center', va='center')
plt.yticks([])
ax.set_frame_on(False)
plt.tight_layout()
plt.savefig('better_entry.png', dpi=300)
plt.show()
total = 0
for d, c, w in zip(dars, colors, wts):
    if c == '#FFE599':
        total += w      # total weight on the highlighted darwins
total
def plot_train_and_test_dars(dars, color='black'):
    rows, cols = 2, int(.5+len(dars)/2)
    fig, ax = plt.subplots(nrows=rows, ncols=cols, figsize=(25, 6))
    for i in range(rows):
        for j in range(cols):
            darwin = dars[cols*i+j]
            # Concatenate the training candles with the downloaded test quotes,
            # both re-based to the first training open
            ax[i][j].plot(candles[candles.darwin==darwin].date.apply(lambda x: x.date()).to_list()
                          + hourly[hourly.darwin==darwin].hour.apply(lambda x: x.date()).to_list(),
                          (candles[candles.darwin==darwin]['open']-candles[candles.darwin==darwin]['open'][0]).to_list()
                          + (hourly[hourly.darwin==darwin]['first']-candles[candles.darwin==darwin]['open'].to_list()[0]).to_list(),
                          color=color)
            ax[i][j].set_title(darwin, color=color, loc='center', y=0.8)
            ax[i][j].set_ylim(-50,170)
            ax[i][j].set_frame_on(False)
            ax[i][j].set_xlabel('Entrenamiento | Test')
            ax[i][j].set_xticklabels([])
            ax[i][j].axvline(x=hourly[hourly.darwin==darwin].hour.apply(lambda x: x.date()).to_list()[0], color='grey', linestyle='dashed')
            ax[i][j].axhline(y=0, color='grey', linestyle='dotted')
dars=['AUX','BOT','EOP','ERQ','FIR','HEO','MUF','NYP','ZAB','ZVQ']
plot_train_and_test_dars(dars,color='#ffd966')
plt.tight_layout()
plt.savefig('train_test.png', dpi=600)
dars=['FSK','LUG','MUF','NCT','PEW','TER','UEI','UYZ','WWT','XRX']
plot_train_and_test_dars(dars,color='red')
dars=['AZG','BFS','JTL','NWO','PHI','PUL','TXR','ZCD']
plot_train_and_test_dars(dars)
- Rethinking what I did for the challenge in order to explain it to others turned out to be a great way to expose the flaws.
- Stop and think! Take a time-out to reconsider. It pays to revisit every decision point and reframe the problem.
- Be patient!
  - wait for all the data to download, even if it takes hours
  - wait for the MATLAB algorithm to converge, even if you have to sink hours into it
- Focus on a single task!
  - We all have to juggle many tasks.
  - You need a clear head.
  - It is better to work sequentially than in parallel (I am not a GPU).
- Believe in yourself.
- We learn by doing.
- ...
I also took fourth place, together with #javic and #agnprz, in the computer vision challenge, and sixth place in the NLP challenge.
If you have data you need to make sense of, I will be happy to help 😊