Today we revisit our very first post on the application of visual pattern recognition methods to stock market price predictions. Since that publication, we have completed several projects involving regression with convolutional neural networks, mainly speed prediction from moving camera images.
Getting market data into images is not a complex process, but it contains pitfalls: missing data points and multiple possible image sizes. TensorFlow (through Keras) can flow images from a directory and resize them while conserving most of the information; doing the same with very noisy data and many possible input shapes, however, can erase valuable information during automated image processing.
The objective is to transform an arbitrary quantity of market data into visual form, then investigate how much information the images contain and how that information can be used for classification or regression. We will obtain market data through the QuantConnect research environment. The code can be used with any other data source as long as the input is a pandas dataframe with time in the index and the values for a single financial instrument in the columns.
Our function will generate an image in which the spatial axes are individual stocks and time. Using RGB color, each pixel can encode three separate channels that we will use to store data. The input will be minute-resolution market data for a single company, with time in the index and market values in the columns, in this manner:
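The original dataframe display is not reproduced here, so as a sketch, the expected input could look like this small synthetic frame (dates and values are made up for illustration):

```python
import pandas as pd

# Hypothetical minute bars for a single instrument; all values are made up.
index = pd.date_range("2021-04-06 09:31", periods=3, freq="min")
df = pd.DataFrame(
    {
        "open":   [100.0, 100.5, 100.2],
        "high":   [100.6, 100.9, 100.4],
        "low":    [99.8, 100.1, 99.9],
        "close":  [100.5, 100.2, 100.1],
        "volume": [12000, 9000, 15000],
    },
    index=index,
)
print(df)
```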
We will request this dataframe for each company in the market and obtain the corresponding color coding for each of the data elements we need. We will use the closing price for each period, the traded volume for each period, and the difference between the high and low prices for that period. This is just an example. We could encode any three values or any number of values into the image if we allow for the information to become n-dimensional; this information will be difficult to represent graphically but still usable for deep learning tasks.
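As a minimal sketch of the color encoding (all numbers made up), three series already scaled to the 0-255 range map onto the red, green, and blue channels of a single pixel column:

```python
import numpy as np

# Three hypothetical per-minute series, already scaled to [0, 255].
close_ret = np.array([10.0, 200.0, 128.0])    # close price change
volume_chg = np.array([255.0, 0.0, 64.0])     # traded volume change
spread_chg = np.array([30.0, 90.0, 255.0])    # high-low spread change

# Stack the three channels into one column of RGB pixels: (minutes, 1, 3).
pixel_column = np.stack([close_ret, volume_chg, spread_chg], axis=1)
pixel_column = pixel_column.reshape(-1, 1, 3).astype(np.uint8)
print(pixel_column.shape)  # (3, 1, 3)
```

One such column per company, placed side by side, forms the full daily image.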
Our image creation function takes a list of company symbols and a date:
def get_day_picture(symbols, date):
    start = date
    end = date + timedelta(days=1)
    indexes = range(len(symbols))
    pixels = 0
    pixels_init = False
We start by initializing the required variables and counters. We need the date and the following day as the limits to obtain a single day's worth of minute data from history calls.
We iterate over every index representing each company, obtain the history for one day, and assume that the correct length of a "normal" trading day is 390 minutes:
    for idx in indexes:
        stock_history = self.History(symbols[idx], start, end,
                                     Resolution.Minute)
        if stock_history.empty:
            continue
        n_points = 390
        stock_history['spread'] = stock_history['high'] - stock_history['low']
        data = stock_history[['close', 'volume', 'spread']].pct_change()
        data.replace([np.inf, -np.inf], np.nan, inplace=True)
        data = data.fillna(method='bfill').fillna(method='ffill')
We take the percentage change of the data, remove any infinities, and fill any remaining not-a-number (NaN) values backward and forward. This is our data for generating a pixel column array from each of the dataframe columns:
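A toy example, independent of the function above, shows why the infinity cleanup matters: a percentage change computed from a zero produces an infinity that must be removed before filling:

```python
import numpy as np
import pandas as pd

# Toy volume series with a zero: pct_change from 0 produces inf.
s = pd.Series([0.0, 5.0, 10.0])
changes = s.pct_change()            # [NaN, inf, 1.0]
changes = changes.replace([np.inf, -np.inf], np.nan)
changes = changes.bfill().ffill()   # fill remaining gaps both ways
print(changes.tolist())             # [1.0, 1.0, 1.0]
```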
        for column in data.columns:
            data[column] = data[column] + abs(data[column].min())
        data = data.values
        data = 255 * (data / data.max(axis=0))
For each column, we add the absolute value of that column's minimum to all values to remove negatives, which cannot be used to construct color images. Then we scale the values to a maximum of 255 for color coding.
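The shift-and-scale step can be verified on a single toy column (values are illustrative, not market data):

```python
import numpy as np

# One toy column containing negatives, as percentage returns typically do.
col = np.array([-0.02, 0.00, 0.02])

shifted = col + abs(col.min())            # minimum becomes exactly 0
scaled = 255 * (shifted / shifted.max())  # maximum becomes exactly 255
print(scaled)  # values: 0.0, 127.5, 255.0
```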
Erroneous data points can leave our images with slightly different sizes if we miss pixels, so we pad any missing data with zeros:
        if data.shape != (n_points, 3):
            z = np.array([0, 0, 0])
            defect = n_points - data.shape[0]
            for i in range(defect):
                data = np.insert(data, -1, z, axis=0)
Any pixel array that is not 390 points long gets the difference padded with zeros. Iterating over a range and applying np.insert is probably not the most efficient way; slicing tricks could insert the missing rows in one step, at the cost of readability.
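As an aside, and not part of the original code, NumPy's np.pad can perform the same zero padding in a single call:

```python
import numpy as np

n_points = 390
data = np.random.rand(385, 3)  # hypothetical short day, 5 minutes missing

# Pad only the first axis (time), appending zero rows at the end.
defect = n_points - data.shape[0]
padded = np.pad(data, ((0, defect), (0, 0)), mode="constant")
print(padded.shape)  # (390, 3)
```

Unlike the np.insert loop, this appends the zero rows at the very end of the array rather than before the last row.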
The function ends by returning the corresponding daily image:
        if pixels_init == False:
            pixels = data
            pixels_init = True
        else:
            pixels = np.insert(pixels, -1, data, axis=0)
    if type(pixels) == int:
        return False
    return pixels.astype(int).reshape(n_points, -1, 3)
If it is the first company, the pixels are just the data; if not, we insert the pixel array for the current company. If no company returned any history, pixels is still the integer 0 and we return False. To handle the image with the OpenCV package, we transform it into an array of integers.
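As a side note on the assembly step, a toy-sized sketch (not the author's code) shows an alternative with np.stack that keeps the time-by-company layout explicit:

```python
import numpy as np

n_points, n_companies = 4, 3  # toy sizes instead of 390 minutes x 500 stocks
# One hypothetical (n_points, 3) pixel column per company, filled with its index.
columns = [np.full((n_points, 3), i, dtype=float) for i in range(n_companies)]

# Stacking along axis 1 puts time on the rows and companies on the columns.
image = np.stack(columns, axis=1).astype(int)
print(image.shape)  # (4, 3, 3)
```

Here image[:, c] is exactly the pixel column of company c, which makes the intended x-axis ordering easy to check.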
The image for April 6th, 2021, obtained through this method is this:
img_1 = get_day_picture(sp500_symbols[0:500], datetime(2021,4,6))
The x-axis contains the companies, ordered by their GICS sub-industry; this ordering may contain clustered information, as sectors change from left to right in the image. The y-axis contains time, with the market open at the top and the close at the bottom. The color encodes, in a manner imperceptible to the human eye: return in the period, traded volume change, and high-to-low difference change. We have all the components for technical analysis in a single image.
This is a five-day sequence:
It looks like noise with some linear patterns; the images in sequence do not look very exciting. What does the change between images look like? This change gives us the daily differences in the values we encoded into the image; we can use the optical flow tools from the OpenCV package, as used in visual motion detection, for this task.
The function to obtain the optical flow from a pair of images, with built-in parameters (which could be changed for better visualizations), is this:
def obtain_flows(images):
    flows = {}
    for index in images:
        array_1 = images.get(index)
        array_2 = images.get(index + 1, None)
        if type(array_2) != np.ndarray:
            print('End reached.')
            break
        cv_img_1 = array_1.astype(np.uint8)
        cv_img_2 = array_2.astype(np.uint8)
        prev = cv2.cvtColor(cv_img_1, cv2.COLOR_BGR2GRAY)
        nxt = cv2.cvtColor(cv_img_2, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        # Set the image hue according to the optical flow direction.
        mask = np.zeros_like(cv_img_1)
        mask[..., 1] = 255
        mask[..., 0] = angle * 180 / np.pi / 2
        # Set the image value according to the (normalized) flow magnitude.
        mask[..., 2] = cv2.normalize(magnitude, None, 0, 255,
                                     cv2.NORM_MINMAX)
        # Convert the HSV mask to BGR color representation.
        flows[index] = cv2.cvtColor(mask, cv2.COLOR_HSV2BGR)
    return flows
One optical flow field for a pair of images looks like this:
Patterns seem to emerge. The image above is the data flow field for April 5th and 6th, 2021, showing some form of activity through the color changes; what it represents is difficult or outright impossible for the human eye and brain to interpret. This is what a 5-day variation looks like:
And here is where the predictively powerful data may be. Even if these images convey no information to us mere humans, a sufficiently advanced neural network may be able to make sense of these '90s-style GIF images. If there is information in these flows, a convolutional neural network trained for regression could approximate market or sector returns some time into the future. We will train such a network in future posts.
Information in ostirion.net does not constitute financial advice; we do not hold positions in any of the companies or assets that we mention in our posts at the time of posting. If you require quantitative model development, deployment, verification, or validation, do not hesitate to contact us. We will also be glad to help you with your machine learning or artificial intelligence challenges when applied to asset management, trading, or risk evaluations.