PTQ4SAM、Mamba-Attention、AniTalker、IceFormer、U-DiTs、CogDPM

This article was first published on the WeChat official account: 机器感知 (Machine Perception)

PTQ4SAM: Post-Training Quantization for Segment Anything

Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, its immense memory and computation costs hinder practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for the Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization, attributed to the bimodal distribution in post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives, and propose a Bimodal Integration strategy, which uses a mathematically equivalent sign operation to transform the bimodal distribution offline into a relatively easy-to-quantize normal distribution. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax th......
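
A minimal sketch of the sign-flip idea behind Bimodal Integration, assuming (as the abstract suggests) that channels of the post-Key-Linear activation fall into a negative and a positive mode that can be folded together. The per-channel sign vector `gamma` is a hypothetical name; applying it to both Q and K leaves the attention logits unchanged because (gamma * q) . (gamma * k) = q . k.

```python
import torch

def bimodal_integration(q, k):
    """q, k: (tokens, channels) activations after the Query/Key linear layers."""
    # Flip key channels whose mass sits in the negative mode so that the
    # merged distribution becomes unimodal and easier to quantize.
    gamma = torch.sign(k.mean(dim=0))      # (channels,), entries in {-1, +1}
    gamma[gamma == 0] = 1.0
    q_t, k_t = q * gamma, k * gamma        # equivalent transform, applied offline
    # Attention logits are unchanged since gamma**2 == 1 channel-wise.
    assert torch.allclose(q @ k.T, q_t @ k_t.T, atol=1e-4)
    return q_t, k_t
```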

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that focus primarily on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation that effectively captures a wide range of facial dynamics, including subtle expressions and head movements. AniTalker enhances motion depiction through two self-supervised learning strategies: the first reconstructs target video frames from source frames within the same identity to learn subtle motion representations, and the second develops an identity encoder using metric learning while actively minimizing mutual information between the identity and motion encoders. This approach ensures that the motion representation is dynamic and devoid of identity-specific details, significantly reducing the n......
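
A schematic training step for the two self-supervised objectives the abstract describes. The encoders, the decoder, and the mutual-information surrogate below are hypothetical stand-ins: the abstract does not specify the actual MI estimator, so a simple cosine-decorrelation penalty is used purely for illustration.

```python
import torch
import torch.nn.functional as F

def anitalker_losses(motion_enc, id_enc, decoder, src, tgt, other_id_frame):
    """src, tgt: frames of the same identity; other_id_frame: a different identity."""
    m = motion_enc(tgt)                  # motion code of the target frame
    i = id_enc(src)                      # identity code from the source frame
    recon = decoder(m, i)                # reconstruct target from (motion, identity)
    loss_recon = F.l1_loss(recon, tgt)

    # Metric learning on identity: same-person codes close, other-person codes far.
    loss_id = F.triplet_margin_loss(i, id_enc(tgt), id_enc(other_id_frame))

    # Decorrelate motion and identity codes (stand-in for MI minimization).
    loss_mi = F.cosine_similarity(m, i, dim=-1).pow(2).mean()
    return loss_recon + loss_id + 0.1 * loss_mi
```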

Matten: Video Generation with Mamba-Attention

In this paper, we introduce Matten, a latent diffusion model with a Mamba-Attention architecture for video generation. With minimal computational cost, Matten employs spatial-temporal attention for local video content modeling and bidirectional Mamba for global video content modeling. Our comprehensive experimental evaluation demonstrates that Matten is competitive with current Transformer-based and GAN-based models on standard benchmarks, achieving superior FVD scores and efficiency. Additionally, we observe a direct positive correlation between the complexity of our designed model and the improvement in video quality, indicating the excellent scalability of Matten. ......
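
A minimal sketch of a Matten-style block: spatial-temporal attention for local modeling plus a bidirectional Mamba pass for global modeling. The block layout is our guess from the abstract; `Mamba` is the layer from the `mamba_ssm` package, assumed to be installed.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba

class MattenBlock(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fwd, self.bwd = Mamba(d_model=dim), Mamba(d_model=dim)

    def forward(self, x):                    # x: (B, T, N, C) video tokens
        B, T, N, C = x.shape
        s = x.reshape(B * T, N, C)           # attend within each frame (local, spatial)
        x = x + self.spatial_attn(s, s, s)[0].reshape(B, T, N, C)
        t = x.transpose(1, 2).reshape(B * N, T, C)   # attend across time (local, temporal)
        x = x + self.temporal_attn(t, t, t)[0].reshape(B, N, T, C).transpose(1, 2)
        g = x.reshape(B, T * N, C)           # global bidirectional Mamba scan
        g = self.fwd(g) + self.bwd(g.flip(1)).flip(1)
        return x + g.reshape(B, T, N, C)
```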

SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion

Motion style transfer is a significant research direction in multimedia applications. It enables rapid switching between different styles of the same motion for virtual digital humans, vastly increasing the diversity and realism of movements, and is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most current work in this field adopts GANs, which can suffer from instability and convergence issues, making the generated motion sequences somewhat chaotic and unable to reflect a highly realistic and natural style. To address these problems, we treat style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can learn the style features of motion more comprehensively. Moreover, we apply the Mamba model to the motion style transfer field for the first time, introducing the Motion Style Mamba (MSM) module to handle longer motion sequences. Thirdly, aiming at the SMCD......
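
A schematic of style-conditioned diffusion training as the abstract describes it: the style motion sequence is passed to the denoiser as a condition. Here `denoiser` stands in for the paper's Motion Style Mamba (MSM) module, treated as a black box, and the noise schedule is a toy cosine schedule chosen only for illustration.

```python
import torch
import torch.nn.functional as F

def smcd_training_step(denoiser, x0, style, T=1000):
    """x0: (B, L, D) content motion; style: (B, Ls, D) style motion condition."""
    t = torch.randint(0, T, (x0.size(0),), device=x0.device)
    noise = torch.randn_like(x0)
    abar = torch.cos(0.5 * torch.pi * t / T).view(-1, 1, 1) ** 2  # toy schedule
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * noise            # forward diffusion
    pred = denoiser(x_t, t, style)        # predict the added noise, given the style
    return F.mse_loss(pred, noise)
```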

IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs

One limitation of existing Transformer-based models is that they cannot handle very long sequences as input, since their self-attention operations exhibit quadratic time and space complexity. This problem becomes especially acute when Transformers are deployed on hardware platforms equipped only with CPUs. To address this issue, we propose a novel method for accelerating self-attention at inference time that works with pretrained Transformer models out of the box, without requiring retraining. We use our method to accelerate various long-sequence Transformers, including a leading LLaMA 2-based LLM, on various benchmarks, and demonstrate a speedup of 2.73x-7.63x while retaining 98.6%-99.6% of the accuracy of the original pretrained models. The code is available on our project website at https://yuzhenmao.github.io/IceFormer/. ......
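
The abstract does not spell out the mechanism, but IceFormer belongs to the family of methods that approximate softmax attention by letting each query attend only to its k most similar keys, found with a fast nearest-neighbor index on CPU. Below is an exact top-k version for illustration; the practical speedup comes from replacing the full Q @ K^T with an approximate k-NN lookup.

```python
import torch

def topk_attention(q, k, v, topk=32):
    """q, k, v: (tokens, dim); each query attends only to its top-k keys."""
    scores = q @ k.T / q.size(-1) ** 0.5          # (n, n) -- a k-NN index replaces this
    vals, idx = scores.topk(topk, dim=-1)         # keep the k largest logits per query
    w = torch.softmax(vals, dim=-1)               # softmax over the kept keys only
    return torch.einsum('nk,nkd->nd', w, v[idx])  # weighted sum of selected values
```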

Efficient Text-driven Motion Generation via Latent Consistency Training

Motion diffusion models have recently proven successful for text-driven human motion generation. Despite their excellent generation performance, they are difficult to run in real time because the multi-step sampling mechanism involves tens or hundreds of repeated function evaluations. To this end, we investigate Motion Latent Consistency Training (MLCT) for motion generation, to reduce the computation and time cost of iterative inference. It applies the diffusion pipeline to a low-dimensional motion latent space to mitigate the computational burden of each function evaluation. Framing the diffusion process with probability-flow ordinary differential equation (PF-ODE) theory, MLCT enables inference in extremely few steps from the prior distribution to the motion latent distribution by maintaining consistency of the outputs along the PF-ODE trajectory. In particular, we introduce a quantization constraint to optimize motion latent r......
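
A minimal sketch of one consistency-training step in the motion latent space, following the generic consistency-model recipe the abstract invokes: the student's outputs must agree with an EMA teacher's outputs at adjacent noise levels of the same PF-ODE trajectory. The discrete noise grid `sigmas` and the EMA teacher are standard ingredients; the paper's quantization constraint is omitted here.

```python
import torch
import torch.nn.functional as F

def consistency_step(f, f_ema, z0, sigmas, n):
    """f: student denoiser f(z, sigma); f_ema: EMA teacher; z0: (B, D) clean latents."""
    noise = torch.randn_like(z0)
    z_hi = z0 + sigmas[n + 1] * noise        # same trajectory, higher noise level
    z_lo = z0 + sigmas[n] * noise            # same trajectory, lower noise level
    with torch.no_grad():
        target = f_ema(z_lo, sigmas[n])      # teacher output at the lower level
    return F.mse_loss(f(z_hi, sigmas[n + 1]), target)
```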

U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; meanwhile, the abandonment of the U-Net by DiTs and their follow-ups is worth rethinking. To this end, we conduct a simple toy experiment comparing a U-Net-architectured DiT with an isotropic one. It turns out that the U-Net architecture gains only a slight advantage from the U-Net inductive bias, indicating potential redundancies within the U-Net-style DiT. Inspired by the discovery that U-Net backbone features are low-frequency-dominated, we perform token downsampling on the query-key-value tuple for self-attention, which brings further improvements despite a considerable reduction in computation. Based on self-attention with downsampled tokens, we propose a series of U-shaped DiTs (U-DiTs) in the ......
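
A minimal sketch of self-attention on downsampled tokens, as the abstract describes: Q, K, and V are all spatially downsampled before attention (cutting the attention cost roughly 4x for a 2x2 pool), and the result is upsampled back to the full token grid. The exact U-DiT downsampling scheme may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownsampledAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, h, w):               # x: (B, h*w, C) latent tokens
        B, N, C = x.shape
        g = x.transpose(1, 2).reshape(B, C, h, w)
        g = F.avg_pool2d(g, 2)                 # 2x2 token downsampling
        g = g.flatten(2).transpose(1, 2)       # (B, h*w/4, C)
        y = self.attn(g, g, g)[0]              # attention over 4x fewer tokens
        y = y.transpose(1, 2).reshape(B, C, h // 2, w // 2)
        y = F.interpolate(y, scale_factor=2, mode='nearest')  # back to full grid
        return x + y.flatten(2).transpose(1, 2)
```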

From Generalization Analysis to Optimization Designs for State Space Models

A State Space Model (SSM) is a foundation model in time series analysis which has recently been shown to be an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a data-dependent generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging this bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of SSM output scales to different temporal patterns in the sequence data; and (2) introduce a new regularization method for training SSMs to enhance generalization performance. Numerical experiments are conducted to validate our results. ......
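
For readers unfamiliar with SSMs, here is a minimal discrete-time state space layer, the object whose generalization the paper analyzes. The paper's scaling rule and regularizer operate on parameters like A, B, C below; their exact form is not given in the abstract, so none is shown here.

```python
import torch

def ssm_forward(A, B, C, u):
    """A: (d, d), B: (d, 1), C: (1, d); u: (T,) input sequence -> (T,) output."""
    x = torch.zeros(A.size(0))
    ys = []
    for u_t in u:                       # x_{t+1} = A x_t + B u_t ;  y_t = C x_t
        x = A @ x + (B * u_t).squeeze(-1)
        ys.append((C @ x).squeeze(0))
    return torch.stack(ys)
```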

CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding

Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on PC theory, emulating its two core mechanisms: correcting predictions from residuals and hierarchical learning. However, these models do not show improved prediction skill on real-world forecasting tasks, and they ignore the precision weighting mechanism of PC theory, which posits that the brain allocates more attention to signals with lower precision, contributing to the cognitive ability of human brains. This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM), which demonstrate the connection between diffusion probabilistic models and PC theory. CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and we......
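
A schematic of precision weighting in a diffusion forecaster, as one plausible reading of the abstract: precision is estimated from the spread of repeated denoised samples (diffusion models sample stochastically, so multiple draws yield a per-pixel variance), and lower-precision regions receive more weight, following the abstract's reading of PC theory. The exact CogDPM estimator is not given in the abstract.

```python
import torch

def precision_weighted_error(sample_fn, target, n_samples=8, eps=1e-6):
    """sample_fn() -> (H, W) forecast sample; target: (H, W) ground truth."""
    draws = torch.stack([sample_fn() for _ in range(n_samples)])  # (S, H, W)
    mean, var = draws.mean(0), draws.var(0)
    w = var + eps                 # low precision (= high variance) -> more weight
    w = w / w.mean()              # normalize the weight field
    return (w * (mean - target) ** 2).mean()
```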
