机器学习-线性回归法

线性回归算法

  • 解决回归问题
  • 思想简单,实现容易
  • 许多强大的非线性模型的基础
  • 结果具有很好的可解释性
  • 蕴含机器学习中的很多重要思想

image.png
样本特征只有一个,称为:简单线性回归
image.png
image.png
image.png
通过分析问题,确定问题的损失函数或者效用函数
通过最优化损失函数或者效用函数,获得机器学习的模型
几乎所有参数学习算法都是这样的套路
image.png

最小二乘法

image.png
image.png
image.png
image.png
image.png
image.png

代码实现 简单线性回归法

加载数据

import numpy as np
import matplotlib.pyplot as plt
x = np.array([1., 2., 3., 4., 5.])
y = np.array([1., 3., 2., 3., 5.])
plt.scatter(x, y)
plt.axis([0, 6, 0, 6])
plt.show()

无标题.png
计算过程

x_mean = np.mean(x)
y_mean = np.mean(y)
num = 0.0
d = 0.0
for x_i, y_i in zip(x, y):num += (x_i - x_mean) * (y_i - y_mean)d += (x_i - x_mean) ** 2a = num/d
b = y_mean - a * x_mean
y_hat = a * x + b

绘图

plt.scatter(x, y)
plt.plot(x, y_hat, color='r')
plt.axis([0, 6, 0, 6])
plt.show()

无标题.png
封装我们自己的SimpleLinearRegression

import numpy as npclass SimpleLinearRegression1:def __init__(self):"""初始化Simple Linear Regression 模型"""self.a_ = Noneself.b_ = Nonedef fit(self, x_train, y_train):"""根据训练数据集x_train,y_train训练Simple Linear Regression模型"""assert x_train.ndim == 1, \"Simple Linear Regressor can only solve single feature training data."assert len(x_train) == len(y_train), \"the size of x_train must be equal to the size of y_train"x_mean = np.mean(x_train)y_mean = np.mean(y_train)num = 0.0d = 0.0for x, y in zip(x_train, y_train):num += (x - x_mean) * (y - y_mean)d += (x - x_mean) ** 2self.a_ = num / dself.b_ = y_mean - self.a_ * x_meanreturn selfdef predict(self, x_predict):"""给定待预测数据集x_predict,返回表示x_predict的结果向量"""assert x_predict.ndim == 1, \"Simple Linear Regressor can only solve single feature training data."assert self.a_ is not None and self.b_ is not None, \"must fit before predict!"return np.array([self._predict(x) for x in x_predict])def _predict(self, x_single):"""给定单个待预测数据x,返回x的预测结果值"""return self.a_ * x_single + self.b_def __repr__(self):return "SimpleLinearRegression1()"

向量化运算

image.png
image.png

class SimpleLinearRegression2:def __init__(self):"""初始化Simple Linear Regression模型"""self.a_ = Noneself.b_ = Nonedef fit(self, x_train, y_train):"""根据训练数据集x_train,y_train训练Simple Linear Regression模型"""assert x_train.ndim == 1, \"Simple Linear Regressor can only solve single feature training data."assert len(x_train) == len(y_train), \"the size of x_train must be equal to the size of y_train"x_mean = np.mean(x_train)y_mean = np.mean(y_train)self.a_ = (x_train - x_mean).dot(y_train - y_mean) / (x_train - x_mean).dot(x_train - x_mean)self.b_ = y_mean - self.a_ * x_meanreturn selfdef predict(self, x_predict):"""给定待预测数据集x_predict,返回表示x_predict的结果向量"""assert x_predict.ndim == 1, \"Simple Linear Regressor can only solve single feature training data."assert self.a_ is not None and self.b_ is not None, \"must fit before predict!"return np.array([self._predict(x) for x in x_predict])def _predict(self, x_single):"""给定单个待预测数据x_single,返回x_single的预测结果值"""return self.a_ * x_single + self.b_def __repr__(self):return "SimpleLinearRegression2()"

衡量线性回归法的指标:MSE,RMSE和MAE

image.png
image.png
image.png
代码演示
加载波士顿房产数据

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
boston = datasets.load_boston()
x = boston.data[:,5] # 只使用房间数量这个特征
y = boston.target

可视化

plt.scatter(x, y)
plt.show()

无标题.png
去除最大值

x = x[y < 50.0]
y = y[y < 50.0]
plt.scatter(x, y)
plt.show()

无标题.png
使用简单线性回归法

from playML.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, seed=666)
from playML.SimpleLinearRegression import SimpleLinearRegression
reg = SimpleLinearRegression()
reg.fit(x_train, y_train)
plt.scatter(x_train, y_train)
plt.plot(x_train, reg.predict(x_train), color='r')
plt.show()

无标题.png
预测

y_predict = reg.predict(x_test)

MSE

mse_test = np.sum((y_predict - y_test)**2) / len(y_test)

RMSE

from math import sqrtrmse_test = sqrt(mse_test)

MAE

mae_test = np.sum(np.absolute(y_predict - y_test))/len(y_test)

image.png
封装

import numpy as np
from math import sqrtdef accuracy_score(y_true, y_predict):"""计算y_true和y_predict之间的准确率"""assert len(y_true) == len(y_predict), \"the size of y_true must be equal to the size of y_predict"return np.sum(y_true == y_predict) / len(y_true)def mean_squared_error(y_true, y_predict):"""计算y_true和y_predict之间的MSE"""assert len(y_true) == len(y_predict), \"the size of y_true must be equal to the size of y_predict"return np.sum((y_true - y_predict)**2) / len(y_true)def root_mean_squared_error(y_true, y_predict):"""计算y_true和y_predict之间的RMSE"""return sqrt(mean_squared_error(y_true, y_predict))def mean_absolute_error(y_true, y_predict):"""计算y_true和y_predict之间的MAE"""return np.sum(np.absolute(y_true - y_predict)) / len(y_true)

scikit-learn中的MSE和MAE

from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
mean_squared_error(y_test, y_predict)
mean_absolute_error(y_test, y_predict)

评价回归算法指标:R Square

image.png
image.png
image.png
image.png
代码

1 - mean_squared_error(y_test, y_predict)/np.var(y_test)

封装

import numpy as np
from math import sqrtdef accuracy_score(y_true, y_predict):"""计算y_true和y_predict之间的准确率"""assert len(y_true) == len(y_predict), \"the size of y_true must be equal to the size of y_predict"return np.sum(y_true == y_predict) / len(y_true)def mean_squared_error(y_true, y_predict):"""计算y_true和y_predict之间的MSE"""assert len(y_true) == len(y_predict), \"the size of y_true must be equal to the size of y_predict"return np.sum((y_true - y_predict)**2) / len(y_true)def root_mean_squared_error(y_true, y_predict):"""计算y_true和y_predict之间的RMSE"""return sqrt(mean_squared_error(y_true, y_predict))def mean_absolute_error(y_true, y_predict):"""计算y_true和y_predict之间的MAE"""assert len(y_true) == len(y_predict), \"the size of y_true must be equal to the size of y_predict"return np.sum(np.absolute(y_true - y_predict)) / len(y_true)def r2_score(y_true, y_predict):"""计算y_true和y_predict之间的R Square"""return 1 - mean_squared_error(y_true, y_predict)/np.var(y_true)

scikit-learn中的 r2_score

from sklearn.metrics import r2_scorer2_score(y_test, y_predict)

多元线性回归

image.png
image.png
image.png
image.png
image.png
image.png
image.png
image.png
代码实现

import numpy as np
from .metrics import r2_scoreclass LinearRegression:def __init__(self):"""初始化Linear Regression模型"""self.coef_ = Noneself.intercept_ = Noneself._theta = Nonedef fit_normal(self, X_train, y_train):"""根据训练数据集X_train, y_train训练Linear Regression模型"""assert X_train.shape[0] == y_train.shape[0], \"the size of X_train must be equal to the size of y_train"X_b = np.hstack([np.ones((len(X_train), 1)), X_train])self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)self.intercept_ = self._theta[0]self.coef_ = self._theta[1:]return selfdef predict(self, X_predict):"""给定待预测数据集X_predict,返回表示X_predict的结果向量"""assert self.intercept_ is not None and self.coef_ is not None, \"must fit before predict!"assert X_predict.shape[1] == len(self.coef_), \"the feature number of X_predict must be equal to X_train"X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])return X_b.dot(self._theta)def score(self, X_test, y_test):"""根据测试数据集 X_test 和 y_test 确定当前模型的准确度"""y_predict = self.predict(X_test)return r2_score(y_test, y_predict)def __repr__(self):return "LinearRegression()"

使用

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
boston = datasets.load_boston()X = boston.data
y = boston.targetX = X[y < 50.0]
y = y[y < 50.0]
from playML.model_selection import train_test_splitX_train, X_test, y_train, y_test = train_test_split(X, y, seed=666)
from playML.LinearRegression import LinearRegressionreg = LinearRegression()
reg.fit_normal(X_train, y_train)

image.png

scikit-learn中的线性回归

加载数据

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
boston = datasets.load_boston()X = boston.data
y = boston.targetX = X[y < 50.0]
y = y[y < 50.0]from playML.model_selection import train_test_splitX_train, X_test, y_train, y_test = train_test_split(X, y, seed=666)

线性回归

from sklearn.linear_model import LinearRegressionlin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

image.png
kNN Regressor

from sklearn.preprocessing import StandardScaler
standardScaler = StandardScaler()
standardScaler.fit(X_train, y_train)
X_train_standard = standardScaler.transform(X_train)
X_test_standard = standardScaler.transform(X_test)
from sklearn.neighbors import KNeighborsRegressor
knn_reg = KNeighborsRegressor()
knn_reg.fit(X_train_standard, y_train)
knn_reg.score(X_test_standard, y_test)

image.png
超参数

from sklearn.model_selection import GridSearchCVparam_grid = [{"weights": ["uniform"],"n_neighbors": [i for i in range(1, 11)]},{"weights": ["distance"],"n_neighbors": [i for i in range(1, 11)],"p": [i for i in range(1,6)]}
]knn_reg = KNeighborsRegressor()
grid_search = GridSearchCV(knn_reg, param_grid, n_jobs=-1, verbose=1)
grid_search.fit(X_train_standard, y_train)

image.png

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/458243.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

Halcon机器视觉实战----提取水平方向缝隙区域

前言 如何从一块区域内找到水平方向的缝隙区域&#xff08;不是高斯线条&#xff0c;从图像中提取&#xff0c;而是从区域内提取&#xff0c;考虑到了区域所在的方向&#xff09;&#xff1b; dev_close_window () dev_open_window (0, 0, 800, 800, black, WindowHandle) re…

从零开始手写mmo游戏从框架到爆炸(六)— 消息处理工厂

就好像门牌号一样&#xff0c;我们需要把消息路由到对应的楼栋和楼层&#xff0c;总不能像菜鸟一样让大家都来自己找数据吧。 首先这里我们参考了rabbitmq中的topic与tag模型&#xff0c;topic对应类&#xff0c;tag对应方法。 新增一个模块&#xff0c;专门记录路由eternity-…

[Tomcat问题]--使用Tomcat 10.x部署项目时,出现实例化Servlet类[xxx]异常

[Tomcat问题]–使用Tomcat 10.x部署项目时&#xff0c;出现实例化Servlet类[xxx]异常 本片博文在知乎同步更新 环境 OS: Windows 11 23H2Java Version: java 21.0.1 2023-10-17 LTSIDE: IntelliJ IDEA 2023.3.3Maven: Apache Maven 3.9.6Tomcat: Tomcat 10.1.18 ReleasedSer…

Netty源码 之 ByteBuf自适应扩缩容源码

Netty体系如何使得ByteBuf根据实际IO收发数据场景进行自适应扩容缩容的&#xff1f; IO收发数据的过程&#xff1a; read 读取&#xff08;"I"&#xff09;&#xff1a;网卡硬件通过网络传输介质读取对端传输过来的数据&#xff0c;网卡硬件再把数据写到recv-socke…

论文阅读——MP-Former

MP-Former: Mask-Piloted Transformer for Image Segmentation https://arxiv.org/abs/2303.07336 mask2former问题是&#xff1a;相邻层得到的掩码不连续&#xff0c;差别很大 denoising training非常有效地稳定训练时期之间的二分匹配。去噪训练的关键思想是将带噪声的GT坐标…

Golang 学习(一)基础知识

面向对象 Golang 也支持面向对象编程(OOP)&#xff0c;但是和传统的面向对象编程有区别&#xff0c;并不是纯粹的面向对象语言。 Golang 没有类(class)&#xff0c;Go 语言的结构体(struct)和其它编程语言的类(class)有同等的地位&#xff0c;Golang 是基于 struct 来实现 OOP…

由vscode自动升级导致的“终端可以ssh服务器,但是vscode无法连接服务器”

问题描述 简单来说就是&#xff0c;ssh配置没动&#xff0c;前两天还可以用vscode连接服务器&#xff0c;今天突然就连不上了&#xff0c;但是用本地终端ssh可以顺利连接。 连接情况 我的ssh配置如下&#xff1a; Host gpu3HostName aaaUser zwx现在直接在终端中进行ssh&am…

分布式事务组件Seata的TCC常见问题及解决方案

分布式事务组件Seata的TCC常见问题及解决方案 在 TCC 模型执行的过程中&#xff0c;还可能会出现各种异常&#xff0c;其中最为常见的有空回滚、幂等、悬挂等。TCC 模式是分布式事务中非常重要的事务模式&#xff0c;但是幂等、悬挂和空回滚一直是 TCC 模式需要考虑的问题&…

用云手机打造tiktok账号需要注意些什么?

随着tiktok平台的火热&#xff0c;越来越多的商家开始尝试更高效的tiktok运营方法。其中&#xff0c;tiktok云手机作为一种新科技引起了很多人的注意&#xff0c;那么用云手机运营tiktok需要注意些什么&#xff1f;下文将对此进行详细解析。 1. 不是所有的云手机都适合做tiktok…

人工智能 | 深度学习的进展

深度学习的进展 深度学习是人工智能领域的一个重要分支&#xff0c;它利用神经网络模拟人类大脑的学习过程&#xff0c;通过大量数据训练模型&#xff0c;使其能够自动提取特征、识别模式、进行分类和预测等任务。近年来&#xff0c;深度学习在多个领域取得了显著的进展&#…

安卓学习笔记之八:本地化的简单例子(kotlin版本)

本地化及多语言支持&#xff0c;是目前手机软件必须面对的问题&#xff0c;这里用一个简单的例子来说明在Android Studio下如何实现。 创建一个Empty Views Activity项目&#xff0c;语言选择Kotlin 实现一个简单的功能&#xff0c;一条欢迎&#xff0c;一个按钮&#xff0c;…

C#(C Sharp)学习笔记_If条件判断语句【五】

前言&#xff1a; 本期学习的是编程语言中的主要语句&#xff1a;if-条件判断语句。在这里我们会学到&#xff1a;if语法&#xff0c;if-else&#xff0c;和if嵌套。话不多说&#xff0c;我们开始吧&#xff01; 什么是条件判断语句&#xff1f; 条件语句是用来判断给定的条件…