In this work we set out to determine the impact, if any, of news analysis on stock price prediction: on the basis of text analysis of news, can we predict stock movements more accurately, and on a consistent basis, than a proposed baseline or random guessing? We considered a methodology more accurate if its success rate is greater than that of the baseline or random guessing. We considered it consistently more accurate if its success rate, averaged over a specified number of runs, say one hundred, is greater than that of the baseline or random guessing. We found that the analysis of news, although news is readily available thanks to modern technology, comes paired with some problems. (1) The widespread availability of news makes it more difficult to find the news that is important to us, since news can cover anything and everything. (2) News can discuss events anywhere from the distant past to the far future, making consistent analysis difficult. (3) Most financial news sources tend to block mass data-mining attempts. These problems can mostly be solved by using so-called 8-K reports. These reports cover only major company events, sorted into nine categories. They narrow the time interval that the news affects from the distant past and far future to five business days, as the reports ought to be published within four business days. Finally, since the U.S. Securities and Exchange Commission obligates companies to publish these reports, they are readily available and easily accessible through the Commission's website. We can then analyze these texts using a rule-based or an automatic text analysis approach. However, the rule-based approach, which relies on lists of positive and negative words, tends to be unreliable, as text contains a plethora of challenging cases.
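To illustrate why word lists can be brittle, the following is a minimal sketch of a rule-based scorer. The word lists, the function name `rule_based_score`, and the example sentence are all hypothetical and chosen only for illustration; they are not the lists used in this work.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"gain", "growth", "profit", "improved"}
NEGATIVE = {"loss", "decline", "lawsuit", "impairment"}


def rule_based_score(text):
    """Return 1, 0 or -1 depending on whether positive or negative words dominate."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    # Sign of (positives - negatives): 1, 0 or -1.
    return (positives > negatives) - (positives < negatives)


# A negated phrase is one example of a challenging case: the word "loss"
# is counted as negative even though the sentence reports the absence of a loss.
print(rule_based_score("The company reported no loss this quarter"))  # -1
```

A pure word count has no notion of negation or context, which is one way such an approach can mislabel a text.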
The unreliability of the rule-based approach is addressed by automatic text analysis, which uses predetermined scores for texts. The form of automatic text analysis we used is a decision tree approach. Although the single decision trees we construct tend to overfit, we can solve this by constructing random forests of decision trees on subsets of our input data. For our analysis we looked at the stock prices of Tesla, Microsoft, EA and Amazon, chosen for their varying values. We scored the texts of the 8-K reports using the stock price movement on the day of publishing, and did the same for up to four business days prior to publishing. We also compensated for market movements, for days zero to four. Each text received a score of -1, 0 or 1 depending on the price movement. This generally resulted in success rates greater than our baseline of 33.33%, the success rate of random guessing among three classes. The highest success rates for Tesla, Microsoft, EA and Amazon were, in order, 73.09%, 100.00%, 88.64% and 84.95%.
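The scoring and evaluation criteria described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the threshold of 1% that separates a score of 0 from -1 or 1, the subtraction of the market return as the form of market compensation, and all function names are assumptions introduced here.

```python
from statistics import mean

# Random guessing among the three scores -1, 0 and 1.
BASELINE = 1 / 3


def label(stock_return, market_return=0.0, threshold=0.01):
    """Score a report as -1, 0 or 1 from the market-adjusted price movement.

    Assumption: market compensation is a simple subtraction of the market
    return, and movements within +/- threshold count as neutral (0).
    """
    adjusted = stock_return - market_return
    if adjusted > threshold:
        return 1
    if adjusted < -threshold:
        return -1
    return 0


def success_rate(predicted, actual):
    """Fraction of predictions that match the actual scores."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def consistently_better(run_rates, baseline=BASELINE):
    """True if the success rate averaged over all runs exceeds the baseline."""
    return mean(run_rates) > baseline
```

For example, `label(0.03, 0.01)` yields 1 under these assumptions, and `consistently_better([0.4, 0.3, 0.5])` is true because the mean success rate of 0.4 exceeds the 33.33% baseline, even though one individual run falls below it.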