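Problem
We have a csv file of stock trading data covering 2012-2018 with five columns: open, high, low, close, and volume. The task is to train a logistic regression that predicts whether the next day goes up or down. In the code below, the quantity being predicted is the trading volume: the label is 1 when the next day's volume is higher than the current day's, and 0 otherwise.
Sample csv file download: https://github.com/JintuZheng/Blog-/blob/master/FB.csv
Data sample: rows are indexed by date, and the feature columns are Open, High, Low, Close, and Volume.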
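Imports
import pandas as pd
import torch
import torch.nn
import torch.optim
from debug import ptf_tensor  # not part of pandas or PyTorch; this appears to be the author's own tensor-printing helper from a local debug module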
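Reading the data
(1) From the CSV file
We use the pandas package to read data from the csv file.
Open the file and select all of the data:
url = 'C:/Users/HUAWEI/Desktop/深度学习/Blog附带代码/FB.csv'
df = pd.read_csv(url, index_col=0) # read all the data, using the first column as the row index
Aside: some read_csv() options:
usecols = ['col_1','col_2'] # read only the specified columns (index_col, by contrast, selects the column(s) used as the row index)
on_bad_lines = 'skip' # skip malformed rows instead of raising an error, useful for dirty data (pandas 1.3+; older versions used error_bad_lines = False)
na_values = 'NULL' # treat NULL as a missing value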
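The options listed above can be tried on a small in-memory file. The following is a minimal sketch, assuming pandas 1.3 or later; the csv contents and the names `dirty`, `clean`, `df_dirty`, and `df_subset` are made up for illustration and are not taken from FB.csv.

```python
import io
import pandas as pd

# Hypothetical miniature csv: one NULL value and one row with an extra field.
dirty = """Date,Open,Close,Volume
2018-01-03,181.88,184.67,16886600
2018-01-02,177.68,181.42,NULL
2017-12-29,178.00,176.46,10261500,oops
"""

df_dirty = pd.read_csv(
    io.StringIO(dirty),
    index_col=0,          # first column (Date) becomes the row index
    na_values='NULL',     # the string NULL is parsed as a missing value
    on_bad_lines='skip',  # the row with too many fields is dropped
)

# Hypothetical clean csv, used to show column selection with usecols.
clean = """Date,Open,Close,Volume
2018-01-03,181.88,184.67,16886600
2018-01-02,177.68,181.42,18151900
"""

df_subset = pd.read_csv(
    io.StringIO(clean),
    usecols=['Date', 'Open', 'Volume'],  # read only these columns
    index_col='Date',
)

print(df_dirty)
print(df_subset)
```

The two calls are kept separate on purpose, so that each option's effect can be seen in isolation.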
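(2) Data preprocessing
# Splitting the dataset
'''
The rows are ordered newest date first, so later dates have smaller positional indices.
Counting the rows dated on or after a given year therefore gives the position where the earlier rows begin.
'''
train_start, train_end = sum(df.index >= '2017'), sum(df.index >= '2013') # training rows: 2013 through 2016
test_start, test_end = sum(df.index >= '2018'), sum(df.index >= '2017') # test rows: 2017
n_total_train = train_end - train_start
n_total_test = test_end - test_start
s_mean = df[train_start:train_end].mean() # mean of the training rows, used for standardization
s_std = df[train_start:train_end].std() # standard deviation of the training rows, used for standardization
n_features = 5 # five features
# Standardize, then select columns 0-4, i.e. Open, High, Low, Close, Volume
df_feature = ((df - s_mean) / s_std).iloc[:, :n_features]
s_labels = (df['Volume'] < df['Volume'].shift(1)).astype(int)
# .shift(1) moves the data down by one row, so with newest-first ordering each row is compared with the following trading day
# Usage notes: https://www.zhihu.com/question/264963268
# Label rule: 1 if the next day's volume is greater than the current day's volume, otherwise 0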
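The positional split, the standardization, and the shift-based label can be checked by hand on a tiny table. This is a minimal sketch on made-up numbers, ordered newest first like the original file; `toy`, `start`, `end`, `train_rows`, `standardized`, and `labels` are illustrative names only.

```python
import pandas as pd

# Hypothetical toy data, newest date first.
toy = pd.DataFrame(
    {'Volume': [50, 40, 70, 60, 30, 20]},
    index=['2018-01-02', '2017-12-29', '2017-01-03',
           '2016-12-30', '2016-01-04', '2015-12-31'],
)

# Date strings compare lexicographically, so counting the rows dated
# 2017 or later gives the position of the first 2016 row.
start = int((toy.index >= '2017').sum())
end = int((toy.index >= '2016').sum())
train_rows = toy.iloc[start:end]          # the two 2016 rows

# Standardize every row with statistics from the training rows only.
standardized = (toy - train_rows.mean()) / train_rows.std()

# shift(1) moves values down one row, so each row sees the volume of the
# row above it, which is the following trading day.
labels = (toy['Volume'] < toy['Volume'].shift(1)).astype(int)

print(start, end)
print(labels.tolist())
```

Note that the newest row has no following day: its shifted value is NaN, the comparison evaluates to False, and the label defaults to 0.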
(3) Converting to Tensor format
reshape(-1,1) is equivalent to reshape(m,1), where m is the number of samples.
x = torch.tensor(df_feature.values, dtype=torch.float32) # size: [m,5]
ptf_tensor(x, 'x')
y = torch.tensor(s_labels.values.reshape(-1,1), dtype=torch.float32) # size: [m,1]
ptf_tensor(y, 'y')
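The reshape matters because a linear layer with one output produces a tensor of size [m,1], and BCEWithLogitsLoss expects the target to have the same size as its input. A minimal sketch with made-up labels (`labels_demo`, `column`, and `y_demo` are illustrative names):

```python
import numpy as np
import torch

labels_demo = np.array([0, 1, 0, 1])     # hypothetical labels, shape (4,)
column = labels_demo.reshape(-1, 1)      # -1 lets NumPy infer the row count: shape (4, 1)
y_demo = torch.tensor(column, dtype=torch.float32)

print(labels_demo.shape, column.shape, tuple(y_demo.shape))
```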
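Building the linear logistic classifier
[Step 1] Create a single linear layer:
fc=torch.nn.Linear(n_features,1)
[Step 2] Get handles to the parameters (weights and bias):
weights,bias=fc.parameters()
[Step 3] Create the loss function:
criterion=torch.nn.BCEWithLogitsLoss()
[Step 4] Initialize the optimizer with the parameters:
optimizer=torch.optim.Adam(fc.parameters())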
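The model has no explicit sigmoid because BCEWithLogitsLoss applies it internally: the linear layer outputs raw scores (logits), and the loss combines the sigmoid with binary cross-entropy. The sketch below checks that equivalence on random data; all names ending in `_demo`, and the data itself, are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)

fc_demo = torch.nn.Linear(5, 1)
weights_demo, bias_demo = fc_demo.parameters()   # weight has size [1, 5], bias has size [1]

x_demo = torch.randn(8, 5)                        # 8 random samples, 5 features
y_demo = torch.randint(0, 2, (8, 1)).float()      # random 0/1 targets, size [8, 1]

logits = fc_demo(x_demo)

# Fused version, as used in the article.
loss_fused = torch.nn.BCEWithLogitsLoss()(logits, y_demo)

# Two-step version: explicit sigmoid followed by plain binary cross-entropy.
loss_two_step = torch.nn.BCELoss()(torch.sigmoid(logits), y_demo)

print(loss_fused.item(), loss_two_step.item())
```

The two values agree up to floating-point error; the fused form is generally preferred because it is more numerically stable for large-magnitude logits.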
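Training loop
n_steps = 20001 # run 20001 iterations
for step in range(n_steps):
    if step: # skipped on step 0, when no loss has been computed yet
        optimizer.zero_grad() # clear the gradients, otherwise they accumulate
        loss.backward() # compute the gradients of the parameters
        optimizer.step() # update the parameters using those gradients
    pred = fc(x) # compute the predictions (logits)
    loss = criterion(pred[train_start:train_end], y[train_start:train_end]) # loss on the training rows only
    if step % 500 == 0:
        #print('#{}, loss = {:g}'.format(step, loss))
        output = (pred > 0) # a logit above 0 corresponds to a predicted probability above 0.5
        correct = (output == y.bool())
        n_correct_train = correct[train_start:train_end].sum().item() # number of correct training predictions
        n_correct_test = correct[test_start:test_end].sum().item() # number of correct test predictions
        accuracy_train = n_correct_train / n_total_train # accuracy
        accuracy_test = n_correct_test / n_total_test
        print('train accuracy = {}, test accuracy = {}'.format(accuracy_train, accuracy_test))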
Output (one line every 500 steps):
train accuracy = 0.5119047619047619, test accuracy = 0.456
train accuracy = 0.6140873015873016, test accuracy = 0.548
train accuracy = 0.6121031746031746, test accuracy = 0.548
train accuracy = 0.6121031746031746, test accuracy = 0.548
train accuracy = 0.6121031746031746, test accuracy = 0.548
train accuracy = 0.6111111111111112, test accuracy = 0.548
train accuracy = 0.6101190476190477, test accuracy = 0.548
train accuracy = 0.6091269841269841, test accuracy = 0.548
train accuracy = 0.6121031746031746, test accuracy = 0.548
train accuracy = 0.6121031746031746, test accuracy = 0.548
train accuracy = 0.6121031746031746, test accuracy = 0.548
train accuracy = 0.6081349206349206, test accuracy = 0.556
train accuracy = 0.6130952380952381, test accuracy = 0.556
train accuracy = 0.6150793650793651, test accuracy = 0.552
train accuracy = 0.6140873015873016, test accuracy = 0.552
train accuracy = 0.6160714285714286, test accuracy = 0.564
train accuracy = 0.621031746031746, test accuracy = 0.576
train accuracy = 0.6259920634920635, test accuracy = 0.588
train accuracy = 0.6279761904761905, test accuracy = 0.588
train accuracy = 0.626984126984127, test accuracy = 0.588
train accuracy = 0.6319444444444444, test accuracy = 0.592
train accuracy = 0.6319444444444444, test accuracy = 0.588
train accuracy = 0.6319444444444444, test accuracy = 0.584
train accuracy = 0.6309523809523809, test accuracy = 0.58
train accuracy = 0.626984126984127, test accuracy = 0.576
train accuracy = 0.6220238095238095, test accuracy = 0.584
train accuracy = 0.621031746031746, test accuracy = 0.584
train accuracy = 0.623015873015873, test accuracy = 0.608
train accuracy = 0.623015873015873, test accuracy = 0.616
train accuracy = 0.623015873015873, test accuracy = 0.6
train accuracy = 0.6240079365079365, test accuracy = 0.616
train accuracy = 0.625, test accuracy = 0.616
train accuracy = 0.6240079365079365, test accuracy = 0.632
train accuracy = 0.6240079365079365, test accuracy = 0.628
train accuracy = 0.623015873015873, test accuracy = 0.632
train accuracy = 0.6190476190476191, test accuracy = 0.624
train accuracy = 0.6190476190476191, test accuracy = 0.624
train accuracy = 0.6190476190476191, test accuracy = 0.624
train accuracy = 0.6180555555555556, test accuracy = 0.62
train accuracy = 0.6200396825396826, test accuracy = 0.632
train accuracy = 0.621031746031746, test accuracy = 0.632
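The training loop in this article backpropagates the previous iteration's loss at the start of the next one. A more common ordering puts the forward pass, backward pass, and parameter update in the same iteration, and evaluates under torch.no_grad(). The sketch below shows that ordering on synthetic, linearly separable data rather than the stock data; the data, the `_demo` names, the step count, and the learning rate of 0.05 are all assumptions (the article uses Adam's default learning rate).

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in for the features: the label depends on feature 0 only.
x_demo = torch.randn(200, 5)
y_demo = (x_demo[:, :1] > 0).float()              # size [200, 1]

fc_demo = torch.nn.Linear(5, 1)
criterion_demo = torch.nn.BCEWithLogitsLoss()
optimizer_demo = torch.optim.Adam(fc_demo.parameters(), lr=0.05)

for step in range(300):
    optimizer_demo.zero_grad()                    # clear old gradients
    logits = fc_demo(x_demo)                      # forward pass
    loss = criterion_demo(logits, y_demo)         # loss for this iteration
    loss.backward()                               # gradients of this loss
    optimizer_demo.step()                         # parameter update

# Evaluation does not need gradients.
with torch.no_grad():
    predicted = fc_demo(x_demo) > 0               # logit > 0 means class 1
    accuracy = (predicted == y_demo.bool()).float().mean().item()

print('accuracy on the synthetic data = {}'.format(accuracy))
```

On separable synthetic data like this the accuracy approaches 1, which is a useful sanity check of the pipeline before pointing it at noisy market data.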
Original article: 股票成交量预测(Pytorch基础练习) (Stock Volume Prediction: a PyTorch Basics Exercise)