[Machine Learning] Nonlinear Independent Component Analysis - Aapo Hyvärinen

Linear independent component analysis (ICA)

$$x_i(k) = \sum_{j=1}^{n} a_{ij}\, s_j(k) \quad \text{for all } i = 1 \ldots n,\; k = 1 \ldots K$$

  • $x_i(k)$ is the $i$-th observed signal at sample point $k$ (possibly time)
  • $a_{ij}$ are constant parameters describing the “mixing”
  • The latent “sources” $s_j$ are assumed independent and non-Gaussian
  • ICA is identifiable, i.e. well-defined: observing only the $x_i$, we can recover both $a_{ij}$ and $s_j$.
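To make the model concrete, here is a minimal runnable sketch (using scikit-learn’s FastICA on hypothetical toy sources; not from the talk): linear ICA recovers non-Gaussian sources from their mixtures up to permutation, sign, and scale.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
K = 5000  # number of sample points k

# Independent, non-Gaussian sources s_j(k): Laplacian, uniform, sign-flipped exponential.
S = np.column_stack([
    rng.laplace(size=K),
    rng.uniform(-1.0, 1.0, size=K),
    rng.exponential(size=K) * rng.choice([-1.0, 1.0], size=K),
])

A = rng.normal(size=(3, 3))   # unknown mixing matrix (a_ij)
X = S @ A.T                   # observed signals x_i(k) = sum_j a_ij s_j(k)

# whiten="unit-variance" requires scikit-learn >= 1.1
ica = FastICA(n_components=3, whiten="unit-variance", random_state=0)
S_hat = ica.fit_transform(X)  # estimated sources

# Cross-correlations between true and estimated sources: each row/column
# should contain exactly one entry near +-1 if recovery succeeded.
C = np.corrcoef(S.T, S_hat.T)[:3, 3:]
print(np.round(C, 2))
```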

Fundamental difference between ICA and PCA

  • PCA doesn’t find the original coordinates, ICA does.


  • PCA and Gaussian factor analysis are not identifiable:
    • Any orthogonal rotation is equivalent: $s' = Us$ has the same distribution.
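A quick numerical check of the rotation argument (a hypothetical numpy/scipy sketch): for white Gaussian $s$, the rotated $s' = Us$ is again white Gaussian, so no statistic of the data can tell the two apart.

```python
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(0)
s = rng.normal(size=(100_000, 2))        # white Gaussian latent s
U = ortho_group.rvs(2, random_state=0)   # a random orthogonal rotation
s_rot = s @ U.T                          # s' = U s (applied row-wise)

# Both have mean ~0 and covariance ~I: the distributions are identical,
# so the "original" coordinate system cannot be recovered.
print(np.round(np.cov(s.T), 2))
print(np.round(np.cov(s_rot.T), 2))
```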

Nonlinear ICA is an unsolved problem

  • Extend ICA to nonlinear case to get general disentanglement?

  • Unfortunately, “basic” nonlinear ICA is not identifiable:

  • If we define the nonlinear ICA model for random variables $x_i$ as

    $$x_i = f_i(s_1, \ldots, s_n), \quad i = 1 \ldots n$$

    we cannot recover the original sources (Darmois, 1952; Hyvärinen & Pajunen, 1999)

Darmois construction

  • Darmois (1952) showed the impossibility of nonlinear ICA:

  • For any $x_1, x_2$, one can always construct $y = g(x_1, x_2)$ independent of $x_1$ as

    $$g(\xi_1, \xi_2) = P(x_2 < \xi_2 \mid x_1 = \xi_1)$$

  • Independence alone is too weak for identifiability:

    • We could even take $x_1$ itself as an “independent component”, which is absurd
  • Looking at non-Gaussianity is equally absurd:

    • A scalar transform $h(x_1)$ can give it any distribution
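To see the Darmois construction numerically (a toy sketch, assuming a standardized bivariate Gaussian pair so the conditional CDF has a closed form): the constructed $y = g(x_1, x_2)$ comes out uniform and independent of $x_1$, even though it is just a deterministic function of the observations.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
rho, N = 0.8, 100_000

# Standardized bivariate Gaussian pair with correlation rho.
x1 = rng.normal(size=N)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=N)

# Darmois construction: y = g(x1, x2) = P(X2 < xi2 | X1 = xi1).
# For this Gaussian pair the conditional CDF is Phi((xi2 - rho*xi1)/sqrt(1 - rho^2)).
y = norm.cdf((x2 - rho * x1) / np.sqrt(1 - rho**2))

# y is Uniform(0, 1) and independent of x1 -- a bogus "independent component".
print("corr(y, x1):", round(float(np.corrcoef(y, x1)[0, 1]), 3))               # ~0
print("mean, var of y:", round(float(y.mean()), 3), round(float(y.var()), 3))  # ~0.5, ~1/12
```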

Time-contrastive learning

  • Observe an $n$-dimensional time series $x(t)$
  • Divide $x(t)$ into $T$ segments (e.g., bins of equal size)
  • Train an MLP to tell which segment a single data point comes from
    • Number of classes is $T$
    • Labels given by index of segment
    • Multinomial logistic regression
  • In the hidden layer $h$, the NN should learn to represent the nonstationarity (= differences between segments)
  • Could this really do Nonlinear ICA?
  • Assume the data follows the nonlinear ICA model $x(t) = f(s(t))$ with
    • a smooth, invertible nonlinear mixing $f : \mathbb{R}^n \rightarrow \mathbb{R}^n$
    • components $s_i(t)$ that are nonstationary, e.g., in their variances
  • Assume we apply time-contrastive learning on $x(t)$
    • using an MLP with hidden layer $h(x(t))$, with $\dim(h) = \dim(x)$
  • Then, TCL will find $s(t)^2 = A\,h(x(t))$ for some linear mixing matrix $A$ (squaring is element-wise)
  • I.e.: TCL demixes the nonlinear ICA model up to a linear mixing (which can be estimated by linear ICA) and up to squaring
  • This is a constructive proof of identifiability
  • Imposing independence within every segment → more constraints → a unique solution; the added constraints guarantee identifiability

In other words: train an MLP via self-supervised classification (which time segment does a data point come from?), so that the MLP learns to represent the differences between segments. The squared original sources $s^2$ can then be recovered as a linear transform of the hidden-layer representation of the observations $x$, as sketched below.
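Below is a minimal end-to-end sketch of TCL (a hypothetical toy setup, not the talk’s code): generate nonstationary sources, mix them through an invertible leaky-ReLU network, train a segment classifier, then undo the remaining linear mixing on the hidden layer with linear ICA. Convergence of such a small MLP is not guaranteed and may need tuning.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n, T, seg_len = 2, 20, 500               # source dim, segments, points per segment

# Nonstationary sources: each component's variance changes from segment to segment.
stds = rng.uniform(0.2, 2.0, size=(T, n))
S = np.concatenate([rng.normal(size=(seg_len, n)) * stds[t] for t in range(T)])
labels = np.repeat(np.arange(T), seg_len)    # class label = segment index

# Smooth invertible nonlinear mixing f: leaky-ReLU layer between invertible matrices.
leaky = lambda z: np.where(z > 0, z, 0.2 * z)
A1, A2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
X = leaky(S @ A1.T) @ A2.T

# Time-contrastive learning: MLP + multinomial logistic regression on the segment label.
clf = MLPClassifier(hidden_layer_sizes=(n,), activation="tanh",
                    max_iter=5000, random_state=0).fit(X, labels)

# Hidden-layer representation h(x(t)), computed from the fitted weights.
H = np.tanh(X @ clf.coefs_[0] + clf.intercepts_[0])

# Theory: s(t)^2 = A h(x(t)); estimate the remaining linear demixing with linear ICA.
Z = FastICA(n_components=n, whiten="unit-variance", random_state=0).fit_transform(H)
C = np.corrcoef(np.concatenate([(S**2).T, Z.T]))[:n, n:]
print(np.round(C, 2))   # one large |corr| per row suggests recovery up to permutation/sign
```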

Deep Latent Variable Models

  • General framework with observed data vector $x$ and latent $s$:
    $$p_\theta(x, s) = p_\theta(x \mid s)\, p_\theta(s), \qquad p_\theta(x) = \int p_\theta(x, s)\, ds$$
    where $\theta$ is a vector of parameters, e.g., the weights of a neural network

  • In variational autoencoders (VAE):

    • Define the prior so that $s$ is white Gaussian (thus all $s_i$ are independent)
    • Define the observation model (decoder) so that $x = f(s) + n$
  • Looks like nonlinear ICA, but not identifiable

    • By Gaussianity, any orthogonal rotation is equivalent:
      $s' = Ms$ has exactly the same distribution if $M^T M = I$
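Concretely (a hypothetical numpy sketch): the rotation can always be absorbed into the decoder, so two different latent codes yield literally the same data, and no amount of data can decide between them.

```python
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(0)
n = 3
W = rng.normal(size=(n, n))
f = lambda s: np.tanh(s @ W.T)           # toy decoder standing in for a neural net

M = ortho_group.rvs(n, random_state=0)   # any orthogonal M (M^T M = I)
s = rng.normal(size=(10_000, n))         # white Gaussian prior sample
s_rot = s @ M.T                          # alternative "sources" s' = M s

f_rot = lambda t: f(t @ M)               # decoder absorbing the rotation: f'(s') = f(M^T s')

# Identical observations from two different latent representations:
print(np.allclose(f(s), f_rot(s_rot)))   # True -> the model is not identifiable
```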

Conditioning DLVM’s by another variable

The non-identifiability can be resolved by introducing an additional observed variable $u$. For example, when looking for relations between video and audio, the time index $t$ can serve as the auxiliary variable. The key assumption is that the components are conditionally independent given $u$.
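In symbols (a sketch of the conditioning idea; the exact parameterization varies, e.g. the exponential-family conditional prior used in iVAE), the latent prior is made to factorize conditionally on the auxiliary variable $u$:

$$p(s \mid u) = \prod_{i=1}^{n} p_i(s_i \mid u), \qquad p(x, s \mid u) = p(x \mid s)\, p(s \mid u)$$

The variation of $p(s \mid u)$ with $u$ then plays the same role as the nonstationarity across segments in TCL.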

Conclusion

  • Typical deep learning needs class labels, or some targets

  • If no class labels: unsupervised learning

  • Independent component analysis is a principled approach

    • can be made nonlinear
  • Identifiable: can recover the components that actually created the data (unlike PCA, VAE, etc.)

  • Special assumptions needed for identifiability, one of:

    • Nonstationarity (“time-contrastive learning”)
    • Temporal dependencies (“permutation-contrastive learning”)
    • Existence of auxiliary (conditioning) variable (e.g., “iVAE”)
  • Self-supervised methods are easy to implement

  • Connection to DLVM’s can be made → iVAE

  • Principled framework for “disentanglement”

In summary: linear ICA is solvable as-is, whereas nonlinear ICA requires additional assumptions to become identifiable (i.e., for the original sources to be separable). The ideas behind nonlinear ICA also carry over to other deep learning models.

Reference

  1. https://www.youtube.com/watch?v=_cBLSNRWt8c
