I am currently sitting in an apartment in Los Angeles, California (first time on the west coast!), feeling exhausted after my first day participating as a research scholar at the Institute for Research Design in Librarianship at Loyola Marymount University. I described the institute in another post when I was first accepted, and I am positive I will have much more to write on this whole experience as I go through it, but tonight I am going to focus on my experience so far teaching my first full semester course at Cornell Medical: Computational Methods in Health Informatics.

So, how did I come to be teaching this class? Last October, the professor designing this new Comp Methods course for the Master’s Program in Health Informatics asked me to suggest textbooks and resources that could introduce data mining methods without overwhelming beginners, ideally built around a common algorithm suite. I suggested Weka (and this is the book we initially decided to go with). But the more we discussed the course objectives, the more obvious it became that the students should really learn basic R coding: it’s a skillset they wouldn’t likely pick up in their other classes, and one far more versatile than any other platform they could be using in this domain. So we went looking for another textbook, and our conversation about resources grew. I put together a list of suggested books, but in the end the head instructor went with a book recommended to him by a colleague (this one) that is much less in-depth than any I had chosen. The chosen book focuses on specific applications of data mining methods and is shorter than any other text we could find, which cuts both ways: it assumes the students are already familiar with much of the background and terminology of this area, and that is certainly not the case. So we decided that I would develop a resource guide to supplement the textbook and give students a place to go for help on topics beyond the scope and goals of the course. The course syllabus is available here: Syllabus_HINF5008. In short, we wanted a course that would act as an introduction to data mining and computational approaches so that the students could make informed decisions about methodology and better communicate and collaborate with others using these techniques.

So I produced this resource guide for the class and was invited to introduce these resources and topics in the first lecture. Because I am an R user with graduate-level training in data mining, I was also invited to teach a few other lectures where I could work in some conversation about data literacy and good data management practices. For example, I taught our lectures on working with unknown values and on exploring datasets graphically; each of these topics is premised on the idea that a dataset may not come with enough documentation (metadata) to fully understand it, so more exploration is warranted before anything else can happen. That situation is a good framework for discussing the value of proper data management and curation in preventing cases where little is known about the data, so I stepped in and talked about some of the services offered by the consulting group I’m in, the Cornell Research Data Management Services Group, and some best practices :) I was also invited to teach a few lectures on specific data mining techniques that interested me, and to introduce the final project and places to go to find and download datasets that might be relevant to the students’ interests. The final project rubric isn’t finished yet, but we will have the students come up with their own data mining question about a dataset they go out and find themselves, then run and explain their analyses while documenting their work. They will then present their findings individually or in groups.

As I mentioned, I am not the only instructor for this class. Rather, we ended up basically splitting the content in half (this makes my being out of town much more reasonable– my co-instructor is teaching the two weeks that I am away), and as of this past Wednesday I finished teaching the first half of my content: three 1-hour lectures and three 2-hour labs. So what have I learned?

First, it feels like I’m relearning everything I know about these topics. I’m having to think about ways of approaching data mining and management topics as someone who has never heard anything about them. I was in that position just a few years ago, but it’s still a challenge to step back and reframe my explanations in ways that are relatable.

Next, I almost immediately realized that teaching a full semester (even just half of one) is an incredibly time-consuming ordeal! I am still in the process of finding out how/if I will be paid for this unplanned additional teaching outside of the library, but it looks like I will be getting additional compensation since the classes are in the evening. This experience has been so much more intense than the few 1-2 hour lectures I’ve given in the past, and I have a whole newfound appreciation for all of the planning that goes into a class!

I’m also seeing that teaching and working with the students is helping me more fully understand the content of my lectures– they are asking questions! And that forces me to think about new angles and ways of interpreting data analyses that I wouldn’t have seen on my own. The students aren’t coming up with their own data stories yet, but they are questioning the results of the analyses they’re running from their book, which is very useful: it means they are thinking critically about what they’re doing. So I’m finding that teaching things you’ve learned yourself helps you understand them more comprehensively. I’m also interacting with the students via Canvas, which is a completely new experience for me.

To conclude, here are outlines from my first and most recent lectures to give you an idea of what’s going on: Lecture1_HINF5008; Lecture7_HINF. And below is an example of how we’re running the lab. We decided to have the students use the textbook as a lab guide and to use the lectures as our venue to more fully discuss theory. This is the lab script I wrote for my most recent week as instructor. You can scroll left and right in the code block. I’m looking forward to the second half of this semester and to getting feedback from the students about how they liked the class.

```
# HINF: 5008 Week 7 Lab
# June 11th, 2014; 5-7 pm
# Predicting: Support Vector Machines, Monte Carlo Evaluation
# Book section 3.1-3.4.2.1 (pp. 126-164)
# Name your file using the convention: LastNameFirstInitial_Lab7
# You will need objects that you created in lab 6. If you have cleared your workspace, re-run any needed script from lab 6.
# Remember that the prediction models we're using were all selected because these techniques are well known by their ability
# to handle highly nonlinear regression problems-- like those inherent in time series prediction.
# Many other approaches can be applied to problems like ours though, so do not assume you are limited to the approaches we discuss here.
# You will find numbered questions throughout this lab guide.
# These correspond to questions in a Word document available for download in the Assignments tab of Canvas.
# (The 9 questions are the same in both places.)
# As usual, you will be handing in 1) lab script, and 2) homework assignment
# Please hand in your lab script and homework assignment as two separate files.
# The homework assignment for this week is to answer the 9 questions.
# Use the homework Word document (rather than the lab script) as a template to record your answers for the homework.
# To facilitate grading, please include only the answers to the questions in the homework. In other words, all code
# and terminal output should be restricted to the lab script; please do not include any script or code in the homework
# unless it's necessary to answer the question.
# Load the packages used in the book (the install.packages() calls only need to be run once):
install.packages("randomForest")
library(randomForest)
install.packages("quantmod")
library(quantmod)
install.packages("kernlab")
library(kernlab)
install.packages("e1071")
library(e1071)
install.packages("mda")
library(mda)
install.packages("PerformanceAnalytics")
library(PerformanceAnalytics)
# Support Vector Machines (SVMs) - supervised learning method for classification and regression tasks
# ** Question 1: Review - What is the difference between classification and regression?
# You may use online resources or resources on the class guide at http://med.cornell.libguides.com/HINF5008
# to answer this question if necessary- please cite your source if this is the case
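# (Side note, not from the textbook: a minimal base-R illustration of the
# distinction, using the built-in iris data. Regression predicts a numeric
# value; classification predicts a class label.)
reg.fit <- lm(Petal.Length ~ Sepal.Length, data = iris)        # regression: numeric target
reg.pred <- predict(reg.fit, iris[1:3, ])                      # numeric predictions
iris2 <- droplevels(iris[iris$Species != "setosa", ])          # two-class subset
cls.fit <- glm(Species ~ Sepal.Length + Petal.Length,
               data = iris2, family = binomial)                # classification: label target
cls.prob <- predict(cls.fit, iris2[1:3, ], type = "response")  # class probabilities in [0,1]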
# SVM is better at generalizing than our earlier ANN
# The basic idea behind SVMs is that of mapping the original data into a
# new, high-dimensional space so that it's possible to apply linear models to
# obtain a separating hyperplane (pg.127)
# The mapping of the original data into this new space is carried out with the help of kernel functions
# See lecture 7 notes and Section 9.3 in Han's "Data Mining: Concepts and Techniques" (available on the resource guide)
# for more information on kernel functions
# SVMs maximize the separation margin between cases belonging to different classes (pg.127)
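# (Side note, not from the textbook: the radial basis (Gaussian) kernel that
# svm() and ksvm() use by default can be computed directly -- it scores the
# similarity of two cases in the implicit high-dimensional space.)
rbf.kernel <- function(x, y, gamma = 0.001) exp(-gamma * sum((x - y)^2))
rbf.kernel(c(1, 2), c(1, 2))  # identical cases score exactly 1
rbf.kernel(c(1, 2), c(5, 9))  # more distant cases score lower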
# Try a regression task with SVM (pg.128)
sv <- svm(Tform, Tdata.train[1:1000, ], gamma = 0.001, cost = 100)
s.preds <- predict(sv, Tdata.train[1001:2000, ])
sigs.svm <- trading.signals(s.preds, 0.1, -0.1)
true.sigs <- trading.signals(Tdata.train[1001:2000, "T.ind.GSPC"],0.1,-0.1)
sigs.PR(sigs.svm, true.sigs)
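# (Side note, not from the textbook: conceptually, trading.signals() just
# thresholds the numeric predictions of T into a 3-level factor. A rough
# sketch of that idea -- not the actual DMwR implementation:)
to.signals <- function(preds, buy.t, sell.t)
  factor(ifelse(preds > buy.t, "b", ifelse(preds < sell.t, "s", "h")),
         levels = c("s", "h", "b"))
to.signals(c(0.15, 0, -0.2), 0.1, -0.1)  # -> b, h, s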
# ** Question 2: What can we observe about the precision and recall of this example compared to the ANN from week 6 (pg.126)?
# Now try a classification task with SVM (pg.128)
data <- cbind(signals = signals, Tdata.train[, -1])
ksv <- ksvm(signals ~ ., data[1:1000, ], C = 10)
ks.preds <- predict(ksv, data[1001:2000, ])
sigs.PR(ks.preds, data[1001:2000, 1])
# ** Question 3: Why did we change the C parameter of the ksvm() function?
# See pg. 128, online resources, or use the ?ksvm command for more information about this function
# We will skip to Section 3.5 (pg. 130)
# Predictions into Actions -
# We will examine how the signal predictions we obtained with our models can be used (assuming we are trading in futures markets).
# Stock specific terms: (pg. 131)
# Markets are based on contracts to buy or sell a commodity on a certain date at
# the price determined by the market at the future time
# Long positions are opened by buying at time t and price p, and selling later (t + x)
# Short positions are when a trader sells at time t with the obligation of buying in the future
# Generally one opens short positions when we believe prices are going down, and long positions when we believe prices are going up.
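# (Side note, not from the textbook: a quick numeric check of why those
# directions make money. A long position profits when the price rises; a
# short position profits when it falls.)
long.profit  <- function(p.buy, p.sell) p.sell - p.buy
short.profit <- function(p.sell, p.buy) p.sell - p.buy
long.profit(100, 110)   # bought at 100, sold at 110: +10
short.profit(100, 90)   # sold at 100, bought back at 90: +10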
# The trading strategies defined on pages 131-132 are summarized here:
# First trading strategy we will employ:
# End of first day, models provide evidence that prices are going down-- a low value of T (the sell signal)
# Therefore we issue sell order if one is not already being issued.
# When this order is carried out by the market at some price pr in the future, we will immediately post 2 other orders:
# 1. a "buy limit order" with a limit price of pr - p%, where p% is the target profit margin --
# this will only be carried out if the market price reaches the target limit price or below--
# This order expresses our target profit for the short position just opened -- we will wait 10 days for the target to be reached
# If the order isn't carried out by the 10th day, we will buy at the closing price of the 10th day.
# 2. a "buy stop order" with a price limit of pr + 1% --
# this order is placed with the goal of limiting eventual losses to 1% -- it will be executed if the market reaches the price pr + 1%
# Second trading strategy we will employ:
# End of first day, models provide evidence that prices are going up-- a high value of T (the buy signal)
# Therefore we issue a long order if one is not already being issued.
# We will post a buy order that will be accomplished at time t and price pr, and immediately post 2 other orders:
# 1. a "sell limit order" with a limit price of pr + p%, where p% is the target profit margin --
# this will only be carried out if the market price reaches the target limit price or above-- Sell limit order will have 10 day deadline
# 2. a "sell stop order" with a price limit of pr - 1% --
# this order is placed with the goal of limiting eventual losses to 1%-- it will be executed if the market reaches the price pr - 1%
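# (Side note, not from the textbook: the exit prices these strategies post,
# for an entry price pr, a target profit margin, and a 1% maximum loss.
# exit.prices() is my own illustrative helper, not a DMwR function.)
exit.prices <- function(pr, target = 0.025, max.loss = 0.01, type = "long") {
  if (type == "long")
    c(profit.limit = pr * (1 + target), stop.loss = pr * (1 - max.loss))
  else
    c(profit.limit = pr * (1 - target), stop.loss = pr * (1 + max.loss))
}
exit.prices(100, type = "long")   # take profit at 102.5, stop out at 99
exit.prices(100, type = "short")  # take profit at 97.5, stop out at 101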
# The metrics from 3.3.4 do not fully translate to overall economic performance, so we will use the R package:
# PerformanceAnalytics to analyse our performance metrics
# With respect to the overall results we will use:
# 1. Net balance between initial capital and the capital at the end of the testing period (profit/loss)
# 2. Percentage return that this net balance represents
# 3. The excess return over the buy and hold strategy
# More on these metrics is available on pg. 132
# For risk-related measures, we will use the Sharpe ratio coefficient to measure the return per unit of risk
# (the standard deviation of the returns)
# We will also calculate maximum draw-down-- this measures the maximum cumulative successive loss of the model
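# (Side note, not from the textbook: both risk measures are easy to compute
# by hand for a vector of period returns, assuming a zero risk-free rate.)
rets <- c(0.02, -0.01, 0.03, -0.04, 0.01)
sharpe <- mean(rets) / sd(rets)             # return per unit of risk
equity <- cumprod(1 + rets)                 # cumulative value of 1 unit of capital
max.dd <- max(1 - equity / cummax(equity))  # largest peak-to-trough loss (here 4%)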
# Performance of the positions held during the test period will be evaluated
# A simulated trader will be used to put everything together (pg. 133)
# The function trading.simulator() will be used-- this function is in the book package DMwR
# the result of the trader is an object of class tradeRecord containing information of the simulation--
# the object can be used in other functions to obtain economic evaluation metrics or graphs of the traidng activity
# the user needs to supply the simulator with trading policy functions, written with an awareness
# of how the simulator will call them
# at the end of each day d, the simulator calls the trading policy with 4 main arguments:
# 1. a vector with predicted signals until day d
# 2. market quotes up to day d
# 3. the currently opened positions
# 4. the money currently available to the trader
# Run the trading strategy definitions below, reading the comments so that you understand the functions
# Strategy 1:
policy.1 <- function(signals, market, opened.pos, money,
                     bet = 0.2, hold.time = 10,
                     exp.prof = 0.025, max.loss = 0.05)
{
  d <- NROW(market) # this is the ID of today
  orders <- NULL
  nOs <- NROW(opened.pos)
  # nothing to do!
  if (!nOs && signals[d] == 'h') return(orders)
  # First let's check if we can open new positions
  # i) long positions
  if (signals[d] == 'b' && !nOs) {
    quant <- round(bet*money/market[d,'Close'], 0)
    if (quant > 0)
      orders <- rbind(orders,
                      data.frame(order = c(1,-1,-1), order.type = c(1,2,3),
                                 val = c(quant,
                                         market[d,'Close']*(1+exp.prof),
                                         market[d,'Close']*(1-max.loss)),
                                 action = c('open','close','close'),
                                 posID = c(NA,NA,NA)))
  # ii) short positions
  } else if (signals[d] == 's' && !nOs) {
    # this is the money already committed to buy stocks
    # because of currently opened short positions
    need2buy <- sum(opened.pos[opened.pos[,'pos.type'] == -1,
                               "N.stocks"])*market[d,'Close']
    quant <- round(bet*(money-need2buy)/market[d,'Close'], 0)
    if (quant > 0)
      orders <- rbind(orders,
                      data.frame(order = c(-1,1,1), order.type = c(1,2,3),
                                 val = c(quant,
                                         market[d,'Close']*(1-exp.prof),
                                         market[d,'Close']*(1+max.loss)),
                                 action = c('open','close','close'),
                                 posID = c(NA,NA,NA)))
  }
  # Now let's check if we need to close positions
  # because their holding time is over
  if (nOs)
    for (i in 1:nOs) {
      if (d - opened.pos[i,'Odate'] >= hold.time)
        orders <- rbind(orders,
                        data.frame(order = -opened.pos[i,'pos.type'],
                                   order.type = 1,
                                   val = NA,
                                   action = 'close',
                                   posID = rownames(opened.pos)[i]))
    }
  orders
}
# Strategy 2:
policy.2 <- function(signals, market, opened.pos, money,
                     bet = 0.2, exp.prof = 0.025, max.loss = 0.05)
{
  d <- NROW(market) # this is the ID of today
  orders <- NULL
  nOs <- NROW(opened.pos)
  # nothing to do!
  if (!nOs && signals[d] == 'h') return(orders)
  # First let's check if we can open new positions
  # i) long positions
  if (signals[d] == 'b') {
    quant <- round(bet*money/market[d,'Close'], 0)
    if (quant > 0)
      orders <- rbind(orders,
                      data.frame(order = c(1,-1,-1), order.type = c(1,2,3),
                                 val = c(quant,
                                         market[d,'Close']*(1+exp.prof),
                                         market[d,'Close']*(1-max.loss)),
                                 action = c('open','close','close'),
                                 posID = c(NA,NA,NA)))
  # ii) short positions
  } else if (signals[d] == 's') {
    # this is the money already committed to buy stocks
    # because of currently opened short positions
    need2buy <- sum(opened.pos[opened.pos[,'pos.type'] == -1,
                               "N.stocks"])*market[d,'Close']
    quant <- round(bet*(money-need2buy)/market[d,'Close'], 0)
    if (quant > 0)
      orders <- rbind(orders,
                      data.frame(order = c(-1,1,1), order.type = c(1,2,3),
                                 val = c(quant,
                                         market[d,'Close']*(1-exp.prof),
                                         market[d,'Close']*(1+max.loss)),
                                 action = c('open','close','close'),
                                 posID = c(NA,NA,NA)))
  }
  orders
}
# ** Question 4: Explain the input parameters for the functions that define policy.1 (pg. 133)
# ** (signals, market, opened.pos, money, bet = 0.2, hold.time = 10, exp.prof = 0.025, max.loss = 0.05)
#Run the trading simulator with the first policy:
# Train and test periods
start <- 1
len.tr <- 1000
len.ts <- 500
tr <- start:(start+len.tr-1)
ts <- (start+len.tr):(start+len.tr+len.ts-1)
# getting the quotes for the testing period
data(GSPC)
date <- rownames(Tdata.train[start+len.tr,])
market <- GSPC[paste(date,'/',sep='')][1:len.ts]
# learning the model and obtaining its signal predictions
library(e1071)
s <- svm(Tform,Tdata.train[tr,],cost=10,gamma=0.01)
p <- predict(s,Tdata.train[ts,])
sig <- trading.signals(p,0.1,-0.1)
# now using the simulated trader
t1 <- trading.simulator(market,sig,
'policy.1',list(exp.prof=0.05,bet=0.2,hold.time=30))
# Check the results:
t1
tradingEvaluation(t1)
# Try plotting the results:
plot(t1, market, theme = "white", name = "SP500")
# Results of this trader are bad-- there is a negative return. Try the second policy
t2 <- trading.simulator(market, sig, "policy.2", list(exp.prof = 0.05, bet = 0.3))
summary(t2)
tradingEvaluation(t2)
# the return decreased further
# try a different training and testing period:
start <- 2000
len.tr <- 1000
len.ts <- 500
tr <- start:(start + len.tr - 1)
ts <- (start + len.tr):(start + len.tr + len.ts - 1)
# get the quotes for the new testing period (otherwise we would reuse the old market data)
date <- rownames(Tdata.train[start + len.tr, ])
market <- GSPC[paste(date, "/", sep = "")][1:len.ts]
s <- svm(Tform, Tdata.train[tr, ], cost = 10, gamma = 0.01)
p <- predict(s, Tdata.train[ts, ])
sig <- trading.signals(p, 0.1, -0.1)
t2 <- trading.simulator(market, sig, "policy.2", list(exp.prof = 0.05, bet = 0.3))
summary(t2)
tradingEvaluation(t2)
# This result was even worse-- do not be fooled by a few repetitions of the same experiment,
# even if it includes 2 years of training and testing periods--
# we need more repetitions under different conditions to ensure the statistical reliability of our results
## Model Evaluation and Selection- How to obtain reliable estimates of the selected evaluation criteria
# Monte Carlo Estimates
# We will use these to estimate the reliability of our evaluation metrics because we cannot use cross-validation
# ** Question 5: Why can we not use cross-validation? (pg. 141)
# We will use a train + test setup to obtain our estimates, ensuring that the sizes of both the train and test sets
# are smaller than N so that we can randomly generate different experimental scenarios
# We will use a training set of 10 years and a test set of 5 years (pg. 142) in a Monte Carlo experiment to obtain reliable
# measures of our evaluation metrics
# ** Question 6: Which windowing technique are we using here? (pg. 122 for review)
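# (Side note, not from the textbook: the two windowing schemes differ in what
# happens to the start of the training window each time the model is relearned.
# A toy illustration with N = 20 cases, window w = 10, relearn step = 5:)
N <- 20; w <- 10; step <- 5
sliding.train <- lapply(seq(1, N - w, by = step), function(s) s:(s + w - 1))
growing.train <- lapply(seq(w, N - 1, by = step), function(e) 1:e)
sliding.train[[2]]  # the window slid forward: cases 6..15
growing.train[[2]]  # the window grew: cases 1..15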
# We will then carry out paired comparisons to obtain statistical confidence levels on the observed differences in mean performance
# Create the following functions (pg. 143-144) that will be used to carry out the full train + test + evaluate cycle using different models
# Names ending in R are regression models; names ending in C are classification models
MC.svmR <- function(form, train, test, b.t = 0.1, s.t = -0.1,
...) {
require(e1071)
t <- svm(form, train, ...)
p <- predict(t, test)
trading.signals(p, b.t, s.t)
}
MC.svmC <- function(form, train, test, b.t = 0.1, s.t = -0.1,
...) {
require(e1071)
tgtName <- all.vars(form)[1]
train[, tgtName] <- trading.signals(train[, tgtName],
b.t, s.t)
t <- svm(form, train, ...)
p <- predict(t, test)
factor(p, levels = c("s", "h", "b"))
}
MC.nnetR <- function(form, train, test, b.t = 0.1, s.t = -0.1,
...) {
require(nnet)
t <- nnet(form, train, ...)
p <- predict(t, test)
trading.signals(p, b.t, s.t)
}
MC.nnetC <- function(form, train, test, b.t = 0.1, s.t = -0.1,
...) {
require(nnet)
tgtName <- all.vars(form)[1]
train[, tgtName] <- trading.signals(train[, tgtName],
b.t, s.t)
t <- nnet(form, train, ...)
p <- predict(t, test, type = "class")
factor(p, levels = c("s", "h", "b"))
}
MC.earth <- function(form, train, test, b.t = 0.1, s.t = -0.1,
...) {
require(earth)
t <- earth(form, train, ...)
p <- predict(t, test)
trading.signals(p, b.t, s.t)
}
single <- function(form, train, test, learner, policy.func,
...) {
p <- do.call(paste("MC", learner, sep = "."), list(form,
train, test, ...))
eval.stats(form, train, test, p, policy.func = policy.func)
}
slide <- function(form, train, test, learner, relearn.step,
policy.func, ...) {
real.learner <- learner(paste("MC", learner, sep = "."),
pars = list(...))
p <- slidingWindowTest(real.learner, form, train, test,
relearn.step)
p <- factor(p, levels = 1:3, labels = c("s", "h", "b"))
eval.stats(form, train, test, p, policy.func = policy.func)
}
grow <- function(form, train, test, learner, relearn.step,
policy.func, ...) {
real.learner <- learner(paste("MC", learner, sep = "."),
pars = list(...))
p <- growingWindowTest(real.learner, form, train, test,
relearn.step)
p <- factor(p, levels = 1:3, labels = c("s", "h", "b"))
eval.stats(form, train, test, p, policy.func = policy.func)
}
# The above functions obtain predictions and collect the evaluation statistics that we want to estimate
# We do this using eval.stats (pg. 145) defined as:
eval.stats <- function(form,train,test,preds,b.t=0.1,s.t=-0.1,...) {
# Signals evaluation
tgtName <- all.vars(form)[1]
test[,tgtName] <- trading.signals(test[,tgtName],b.t,s.t)
st <- sigs.PR(preds,test[,tgtName])
dim(st) <- NULL
names(st) <- paste(rep(c('prec','rec'),each=3),
c('s','b','sb'),sep='.')
# Trading evaluation
date <- rownames(test)[1]
market <- GSPC[paste(date,"/",sep='')][1:length(preds),]
trade.res <- trading.simulator(market,preds,...)
c(st,tradingEvaluation(trade.res))
}
# Next we set up a loop to go over a set of alternative trading systems (pg. 145)
# that calls the Monte Carlo routines (single, slide, and grow) with proper parameters to obtain estimates of their performance
pol1 <- function(signals,market,op,money)
policy.1(signals,market,op,money,
bet=0.2,exp.prof=0.025,max.loss=0.05,hold.time=10)
pol2 <- function(signals,market,op,money)
policy.1(signals,market,op,money,
bet=0.2,exp.prof=0.05,max.loss=0.05,hold.time=20)
pol3 <- function(signals,market,op,money)
policy.2(signals,market,op,money,
bet=0.5,exp.prof=0.05,max.loss=0.05)
# We are now able to run the Monte Carlo experiment (code on pages 146-147) but we will NOT--
# Just look over the code and read the comments
# ** Question 7: Why aren't we running the Monte Carlo code? (pg. 146)
# Results Analysis
# Download the objects resulting from the code at http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR/extraFiles.html
# We will NOT examine the file "earth.Rdata"
getwd() #make sure the files are in your working directory
load("svmR.Rdata")
load("svmC.Rdata")
load("nnetR.Rdata")
load("nnetC.Rdata")
# Precision is more important than recall in this application (pg. 148)
# We will use the function rankSystems() to examine our results
# ** Question 8: Why is precision more important than recall here? (pg. 148)
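# (Side note, not from the textbook: a small worked example of precision and
# recall for buy signals, computed from predicted vs. true signal vectors.)
pred <- c("b", "b", "h", "b", "s", "h")
true <- c("b", "h", "h", "b", "b", "h")
tp <- sum(pred == "b" & true == "b")  # correctly predicted buys
precision <- tp / sum(pred == "b")    # of the buys we signaled, fraction correct
recall    <- tp / sum(true == "b")    # of the true buys, fraction we caught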
# Examine: the return of the systems (Ret), the return over the buy and hold strategy (RetOverBH),
# the percentage of profitable trades (PercProf), the Sharpe ratio (SharpeRatio), and maximum draw-down (MaxDD) (pg. 149-150)
tgtStats <- c('prec.sb','Ret','PercProf',
'MaxDD','SharpeRatio')
allSysRes <- join(subset(svmR,stats=tgtStats),
subset(svmC,stats=tgtStats),
subset(nnetR,stats=tgtStats),
subset(nnetC,stats=tgtStats),
by = 'variants')
rankSystems(allSysRes,5,maxs=c(T,T,T,F,T))
# We have suspicious scores in our precision of buy/sell signals (obtaining 100% precision seems odd)
# Inspect these results closer:
summary(subset(svmC,
stats=c('Ret','RetOverBH','PercProf','NTrades'),
vars=c('slide.svmC.v5','slide.svmC.v6')))
# At most, these methods made a single trade over the testing period, with an average return of 0.25%,
# which is -77.1% below the naive buy and hold strategy. These models are useless (pg. 151)
# To reach some conclusions on the value of these variants, we need to add some constraints on the stats:
# we want a reasonable number of average trades (more than 20), an average return greater than 0.5%, and
# a percentage of profitable trades higher than 40%
# Check to see if there are systems that satisfy these constraints:
fullResults <- join(svmR, svmC, nnetC, nnetR, by = "variants")
nt <- statScores(fullResults, "NTrades")[[1]]
rt <- statScores(fullResults, "Ret")[[1]]
pp <- statScores(fullResults, "PercProf")[[1]]
s1 <- names(nt)[which(nt > 20)]
s2 <- names(rt)[which(rt > 0.5)]
s3 <- names(pp)[which(pp > 40)]
namesBest <- intersect(intersect(s1, s2), s3)
summary(subset(fullResults,
stats=tgtStats,
vars=namesBest))
# only 3 of the trading systems satisfy these criteria, and all of them use the regression task (have an R at the end of their name)
# The Ret of the single.nnetR.v2 shows marked instability (pg. 153) so we will compare the other two which have similar scores:
compAnalysis(subset(fullResults,
stats=tgtStats,
vars=namesBest)) # it's ok if you get warnings here (pg. 154)
# Despite the variability of the results, the above Wilcoxon significance test tells us that the average return of
# "single.nnetR.v12" is higher than those of the other systems with 95% confidence.
# Yet, with respect to the other statistics, this variant is clearly worse.
# Try plotting to get a better idea of the distribution of the scores across all 20 repetitions:
plot(subset(fullResults,
stats=c('Ret','PercProf','MaxDD'),
vars=namesBest))
#The scores of the two systems using windowing schemas are very similar, but the results of “single.nnetR.v12” are distinct.
# We can observe that the high average return is achieved thanks to an abnormal (around 2800%) return in one of
# the iterations of the Monte Carlo experiment.
# The remainder of the scores for this system seem inferior to the scores of the other two.
# Evaluating the final Test Data
# This section presents the results obtained by the "best" models in the final evaluation period.
# This period is formed by 9 years of quotes and we will apply the five selected systems (pg. 156)
# obtain the evaluation statistics of these systems on the 9-year test period
# We need the last 10 years before the evaluation period-- the models will be obtained with these 10 years of data
# and then will be asked to make their signal predictions for the 9 year evaluation period
#Check out our best model:
getVariant("grow.nnetR.v12", fullResults) # (pg. 157)
# Conduct a deeper analysis to obtain the trading record of the system during this period
data <- tail(Tdata.train, 2540)
model <- learner("MC.nnetR", list(maxit = 750, linout = T,
trace = F, size = 10, decay = 0.001))
preds <- growingWindowTest(model, Tform, data, Tdata.eval,
relearn.step = 120)
signals <- factor(preds, levels = 1:3, labels = c("s", "h",
"b"))
date <- rownames(Tdata.eval)[1]
market <- GSPC[paste(date, "/", sep = "")][1:length(signals),
]
trade.res <- trading.simulator(market, signals, policy.func = "pol2")
#plot the results
plot(trade.res, market, theme = "white", name = "SP500 - final test")
# ** Question 9: Save your final plot as a .png and insert it into your word doc homework submission
```

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.