Machine Learning: Feature Scaling

Contents

一、What is feature scaling:

二、Why do we need to perform feature scaling?

三、How to perform feature scaling:

        1、Normalization:

        2、Mean normalization:

        3、Standardization (z-score):


 一、What is feature scaling:

        Simply put, feature scaling is the process of bringing features onto a common scale. Features in a training dataset are usually measured in different units, so their raw numeric values can differ greatly in magnitude. Normalization and related methods map each feature into a comparable, relatively small range.

二、Why do we need to perform feature scaling?

        After reading many articles on this, I think of it as follows: just as an overly prominent aspect of something can give us a one-sided impression of it, a feature with much larger numeric values tends to dominate, and we unconsciously lean toward it. This is easiest to understand from a contour plot:

        Using Andrew Ng's example, suppose the setup for house price prediction is:

Total area (x_1): 300 to 2000 square meters
Number of rooms (x_2): 1 to 5
Group 1: w_1 = 50, w_2 = 0.1
Group 2: w_1 = 0.1, w_2 = 50

        Meanwhile, assume b = 50 and take a 2000 square meter, 5-room house whose actual price is 500,000 yuan. If prices are expressed in thousands of yuan, the target value is 500. Plugging in the two groups of w_1 and w_2 from the list: the group with the large coefficient on the large-valued feature (w_1 = 50, w_2 = 0.1) gives 2000 * 50 + 5 * 0.1 + 50 = 100050.5, roughly 100 million yuan and far from the actual price, while the other group (w_1 = 0.1, w_2 = 50) gives 2000 * 0.1 + 5 * 50 + 50 = 500, i.e. 500,000 yuan, which matches.
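
        The arithmetic for the two groups of weights can be checked with a few lines of Python. This is only a sketch: the function name predict_price is made up for illustration, and the model output is treated as a bare number in whatever unit the prices are expressed in.

```python
def predict_price(size_m2, rooms, w1, w2, b):
    """Linear model from the text: price = w1 * size + w2 * rooms + b."""
    return w1 * size_m2 + w2 * rooms + b


# The 2000 square meter, 5-room house with b = 50 from the example.
size_m2, rooms, b = 2000, 5, 50

# Large coefficient on the large-valued feature (size).
large_w1 = predict_price(size_m2, rooms, w1=50, w2=0.1, b=b)
# Small coefficient on size, large coefficient on the small-valued feature (rooms).
small_w1 = predict_price(size_m2, rooms, w1=0.1, w2=50, b=b)

print(large_w1)
print(small_w1)
```

        The first result is dominated almost entirely by the size term, while in the second the two features contribute comparable amounts.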

        In other words, a good fit pairs a feature that takes large values with a small coefficient, and a feature that takes small values with a large coefficient. So, what is the relationship between this and gradient descent?

        We can understand it from a contour plot of the cost function J(\vec{w},b), by looking at the path gradient descent may take to reach the minimum.

        Because the cost changes rapidly along the weight for size and slowly along the weight for rooms, the contours are elongated ellipses: the short axis corresponds to size and the long axis to rooms. On such contours gradient descent may bounce back and forth across the narrow direction while creeping along the long one, so it converges slowly. That is why we perform feature scaling. The best case is when the contours are circles rather than ellipses, because then the descent direction points much more directly at the minimum.
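
        A small experiment can make the slow-convergence claim concrete. The sketch below is not part of the original example: the six houses are invented, their prices are generated from w_1 = 0.1, w_2 = 50, b = 50 so that a perfect fit exists, and the learning rates (1e-7 for raw features, 0.5 for min-max scaled features) are hand-picked assumptions that keep each run stable. Both runs get the same number of steps.

```python
# Hypothetical data: six houses within the ranges used in the text.
sizes = [300, 900, 1500, 2000, 1200, 600]   # square meters
rooms = [2, 1, 5, 3, 4, 3]
# Prices generated from w1 = 0.1, w2 = 50, b = 50, so a perfect fit exists.
prices = [0.1 * s + 50 * r + 50 for s, r in zip(sizes, rooms)]


def min_max_scale(values):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def cost(x1, x2, y, w1, w2, b):
    """Half mean squared error of the linear model."""
    errors = [w1 * a + w2 * c + b - t for a, c, t in zip(x1, x2, y)]
    return sum(e * e for e in errors) / (2 * len(y))


def gradient_descent(x1, x2, y, lr, steps):
    """Run batch gradient descent from zero weights; return the final cost."""
    w1 = w2 = b = 0.0
    m = len(y)
    for _ in range(steps):
        errors = [w1 * a + w2 * c + b - t for a, c, t in zip(x1, x2, y)]
        grad_w1 = sum(e * a for e, a in zip(errors, x1)) / m
        grad_w2 = sum(e * c for e, c in zip(errors, x2)) / m
        grad_b = sum(errors) / m
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
        b -= lr * grad_b
    return cost(x1, x2, y, w1, w2, b)


steps = 1000
# Raw features: the step size must be tiny, otherwise the size weight diverges.
raw_cost = gradient_descent(sizes, rooms, prices, lr=1e-7, steps=steps)
# Scaled features: a much larger step size is stable.
scaled_cost = gradient_descent(
    min_max_scale(sizes), min_max_scale(rooms), prices, lr=0.5, steps=steps
)

print(f"final cost, raw features:    {raw_cost:.6f}")
print(f"final cost, scaled features: {scaled_cost:.6f}")
```

        On this toy data the scaled run ends far closer to the minimum. With raw features the step size is limited by the size direction, so the weights for rooms and the bias barely move in the same number of steps.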

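        We can also understand this through Euclidean distance: without scaling, the feature with the larger value range dominates the distance between two samples, and the other feature hardly matters.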
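
        The Euclidean distance view can be illustrated with three made-up houses. In this sketch the houses and the helper name scale are assumptions for illustration. The scaling uses the feature ranges from the example (300 to 2000 square meters, 1 to 5 rooms).

```python
import math

# Hypothetical houses as (size in square meters, number of rooms).
house_a = (300, 1)
house_b = (310, 5)   # almost the same size as A, but four more rooms
house_c = (400, 1)   # same number of rooms as A, but 100 square meters larger


def scale(house, size_range=(300, 2000), room_range=(1, 5)):
    """Min-max scale both features using the ranges from the text."""
    size, rooms = house
    return (
        (size - size_range[0]) / (size_range[1] - size_range[0]),
        (rooms - room_range[0]) / (room_range[1] - room_range[0]),
    )


raw_ab = math.dist(house_a, house_b)
raw_ac = math.dist(house_a, house_c)
scaled_ab = math.dist(scale(house_a), scale(house_b))
scaled_ac = math.dist(scale(house_a), scale(house_c))

print(raw_ab, raw_ac)        # raw: B looks closer to A than C does
print(scaled_ab, scaled_ac)  # scaled: C is now closer to A than B is
```

        Before scaling, the room difference between A and B is almost invisible next to differences in size. After scaling, both features contribute on the same footing.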

三、How to perform feature scaling:

        1、Normalization:

x^{'} = \frac{x - min(x)}{max(x) - min(x)}

        The corresponding value range is [0,1], but there are also more flexible forms:

x^{'} = a + \frac{x - min(x)}{max(x) - min(x)}(b - a)

        The corresponding value range is [a, b]. Generally speaking, a and b should be neither too large nor too small in magnitude; a range such as [-5, 5] is suitable.
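
        Both forms of normalization fit in one small function. This is a minimal sketch in plain Python: the name min_max_scale and the choice to raise an error for a constant feature (where max(x) = min(x) and the formula would divide by zero) are assumptions, not part of the formulas above.

```python
def min_max_scale(values, a=0.0, b=1.0):
    """Rescale values linearly so the minimum maps to a and the maximum to b."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: refuse a constant feature instead of dividing by zero.
        raise ValueError("cannot scale a constant feature")
    return [a + (v - lo) / (hi - lo) * (b - a) for v in values]


sizes = [300, 1150, 2000]
unit_range = min_max_scale(sizes)          # default range [0, 1]
wide_range = min_max_scale(sizes, -5, 5)   # custom range [-5, 5]

print(unit_range)   # [0.0, 0.5, 1.0]
print(wide_range)   # [-5.0, 0.0, 5.0]
```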

        2、Mean normalization:

x^{'} = \frac{x - \bar{x}}{max(x) - min(x)}
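
        Mean normalization differs from the first method only in what is subtracted: the mean instead of the minimum. A minimal sketch, where the function name mean_normalize is an assumption:

```python
def mean_normalize(values):
    """Subtract the mean, then divide by the range max(x) - min(x)."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return [(v - mean) / spread for v in values]


rooms = [1, 2, 3, 4, 5]
normalized = mean_normalize(rooms)
print(normalized)   # [-0.5, -0.25, 0.0, 0.25, 0.5]
```

        The results are centered on 0, and the gap between the largest and smallest value is exactly 1.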

        3、Standardization (z-score):

x^{'} = \frac{x - \bar{x}}{\sigma }

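        The denominator \sigma is the standard deviation of x. This has the same form as the z-score used to standardize a normal distribution, although the data does not have to be normally distributed for it to apply; the scaled feature simply ends up with mean 0 and standard deviation 1: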

x^{'} = \frac{x - \mu}{\sigma }
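
        Standardization can be sketched with Python's statistics module. The function name standardize is an assumption, and so is the use of the population standard deviation (pstdev). Some libraries and textbooks use the sample standard deviation instead, which gives slightly different values on small datasets.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mean) / sigma for v in values]


sizes = [300, 800, 1200, 1600, 2000]
z_scores = standardize(sizes)

print(z_scores)
# Mean and standard deviation of the result (0 and 1 up to rounding).
print(statistics.fmean(z_scores), statistics.pstdev(z_scores))
```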
