END-TO-END OPTIMIZED IMAGE COMPRESSION (Paper Reading Notes)

END-TO-END OPTIMIZED IMAGE COMPRESSION

Table of Contents

      • END-TO-END OPTIMIZED IMAGE COMPRESSION
        • Vocabulary
          • Important
          • Less important
        • Abstract:

Vocabulary

Important

image compression

quantizer

rate–distortion performance

Less important

a variant of

construct

entropy

discrete value

Abstract:

We describe an image compression method, consisting of a nonlinear analysis transformation, a uniform quantizer, and a nonlinear synthesis transformation. The transforms are constructed in three successive stages of convolutional linear filters and nonlinear activation functions. Unlike most convolutional neural networks, the joint nonlinearity is chosen to implement a form of local gain control, inspired by those used to model biological neurons. Using a variant of stochastic gradient descent, we jointly optimize the entire model for rate–distortion performance over a database of training images, introducing a continuous proxy for the discontinuous loss function arising from the quantizer. Under certain conditions, the relaxed loss function may be interpreted as the log likelihood of a generative model, as implemented by a variational autoencoder. Unlike these models, however, the compression model must operate at any given point along the rate–distortion curve, as specified by a trade-off parameter. Across an independent set of test images, we find that the optimized method generally exhibits better rate–distortion performance than the standard JPEG and JPEG 2000 compression methods. More importantly, we observe a dramatic improvement in visual quality for all images at all bit rates, which is supported by objective quality estimates using MS-SSIM.


  • Data compression is a fundamental and well-studied problem in engineering, and is commonly formulated with the goal of designing codes for a given discrete data ensemble with minimal entropy (Shannon, 1948). The solution relies heavily on knowledge of the probabilistic structure of the data, and thus the problem is closely related to probabilistic source modeling. However, since all practical codes must have finite entropy, continuous-valued data (such as vectors of image pixel intensities) must be quantized to a finite set of discrete values, which introduces error. In this context, known as the lossy compression problem, one must trade off two competing costs: the entropy of the discretized representation (rate) and the error arising from the quantization (distortion). Different compression applications, such as data storage or transmission over limited-capacity channels, demand different rate–distortion trade-offs.

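The rate–distortion trade-off described in this paragraph is usually written as a Lagrangian objective. A minimal statement of it (notation mine, consistent with but not quoted from the paper):

```latex
% Rate-distortion Lagrangian: the multiplier \lambda selects a point on the
% rate-distortion curve. R is the expected code length (entropy) of the
% quantized representation \hat{y}; D is the expected reconstruction error.
L = R + \lambda D,
\qquad
R = \mathbb{E}\left[ -\log_2 P_{\hat{y}}(\hat{y}) \right],
\qquad
D = \mathbb{E}\left[ d(x, \hat{x}) \right]
```

A larger $\lambda$ spends more bits to reduce distortion; a smaller $\lambda$ accepts more distortion to save bits.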

  • Joint optimization of rate and distortion is difficult. Without further constraints, the general problem of optimal quantization in high-dimensional spaces is intractable (Gersho and Gray, 1992). For this reason, most existing image compression methods operate by linearly transforming the data vector into a suitable continuous-valued representation, quantizing its elements independently, and then encoding the resulting discrete representation using a lossless entropy code (Wintz, 1972; Netravali and Limb, 1980).


  • This scheme is called transform coding due to the central role of the transformation. For example, JPEG uses a discrete cosine transform on blocks of pixels, and JPEG 2000 uses a multi-scale orthogonal wavelet decomposition. Typically, the three components of transform coding methods – transform, quantizer, and entropy code – are separately optimized (often through manual parameter adjustment).

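To make the transform/quantizer/entropy-code recipe concrete, here is a toy, illustrative sketch (my own, not the paper's method and far simpler than JPEG) for a grayscale image: an 8×8 block DCT, uniform scalar quantization with an arbitrary step size, and an empirical entropy estimate standing in for a real lossless entropy coder.

```python
# Toy transform coding sketch: block DCT -> uniform quantization -> entropy estimate.
# Illustrative only; the step size and block size are arbitrary choices, not from the paper.
import numpy as np
from scipy.fft import dctn, idctn

def toy_transform_code(image: np.ndarray, step: float = 16.0, block: int = 8):
    h, w = image.shape
    h, w = h - h % block, w - w % block            # crop to a multiple of the block size
    image = image[:h, :w].astype(np.float64)

    coeffs = np.zeros_like(image)
    for i in range(0, h, block):                   # linear transform: per-block 2-D DCT
        for j in range(0, w, block):
            coeffs[i:i+block, j:j+block] = dctn(image[i:i+block, j:j+block], norm="ortho")

    q = np.round(coeffs / step)                    # independent uniform scalar quantization

    # Entropy (bits per coefficient, i.e. per pixel) of the quantized symbols;
    # a stand-in for the code length a real entropy coder would achieve.
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    rate = -(p * np.log2(p)).sum()

    recon = np.zeros_like(image)                   # decode: dequantize and inverse transform
    for i in range(0, h, block):
        for j in range(0, w, block):
            recon[i:i+block, j:j+block] = idctn(q[i:i+block, j:j+block] * step, norm="ortho")

    distortion = np.mean((image - recon) ** 2)     # MSE distortion
    return rate, distortion, recon
```

Sweeping `step` traces a crude rate–distortion curve for this fixed transform; the paper's argument is that learning the transform jointly with the rate model does much better than hand-designed components optimized separately.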

  • We have developed a framework for end-to-end optimization of an image compression model based on nonlinear transforms (figure 1). Previously, we demonstrated that a model consisting of linear–nonlinear block transformations, optimized for a measure of perceptual distortion, exhibited visually superior performance compared to a model optimized for mean squared error (MSE) (Ballé, Laparra, and Simoncelli, 2016). Here, we optimize for MSE, but use a more flexible transform built from cascades of linear convolutions and nonlinearities. Specifically, we use a generalized divisive normalization (GDN) joint nonlinearity that is inspired by models of neurons in biological visual systems, and has proven effective in Gaussianizing image densities (Ballé, Laparra, and Simoncelli, 2015). This cascaded transformation is followed by uniform scalar quantization (i.e., each element is rounded to the nearest integer), which effectively implements a parametric form of vector quantization on the original image space. The compressed image is reconstructed from these quantized values using an approximate parametric nonlinear inverse transform.

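The GDN nonlinearity mentioned above normalizes each channel by the pooled energy of all channels at the same spatial location, roughly $y_i = x_i / \sqrt{\beta_i + \sum_j \gamma_{ij} x_j^2}$. Below is a minimal PyTorch sketch of that forward pass (a simplified parameterization of mine; the paper's implementation additionally constrains $\beta$ and $\gamma$ so the pooled denominator stays positive during optimization).

```python
# Minimal GDN sketch: divide each channel by the pooled, weighted energy of all channels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GDN(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(channels))          # per-channel offset
        self.gamma = nn.Parameter(0.1 * torch.eye(channels))    # cross-channel weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        weight = self.gamma.view(*self.gamma.shape, 1, 1)       # use gamma as a 1x1 conv kernel
        norm_pool = F.conv2d(x * x, weight, bias=self.beta)     # beta_i + sum_j gamma_ij * x_j^2
        return x / torch.sqrt(norm_pool)
```

The synthesis transform uses an approximate inverse of this operation (IGDN), which multiplies rather than divides by the pooled term.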

  • For any desired point along the rate–distortion curve, the parameters of both analysis and synthesis transforms are jointly optimized using stochastic gradient descent. To achieve this in the presence of quantization (which produces zero gradients almost everywhere), we use a proxy loss function based on a continuous relaxation of the probability model, replacing the quantization step with additive uniform noise. The relaxed rate–distortion optimization problem bears some resemblance to those used to fit generative image models, and in particular variational autoencoders (Kingma and Welling, 2014; Rezende, Mohamed, and Wierstra, 2014), but differs in the constraints we impose to ensure that it approximates the discrete problem all along the rate–distortion curve. Finally, rather than reporting differential or discrete entropy estimates, we implement an entropy code and report performance using actual bit rates, thus demonstrating the feasibility of our solution as a complete lossy compression method.

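The train/test asymmetry described above is simple to express in code. A minimal sketch (mine, assuming the latent tensor is named `y`): during training, the rounding step is replaced by additive uniform noise on [-0.5, 0.5), which keeps gradients alive; at test time the latents are actually rounded and then entropy coded.

```python
# Uniform scalar quantization and its continuous training-time proxy.
import torch

def quantize(y: torch.Tensor, training: bool) -> torch.Tensor:
    if training:
        # Relaxation: i.i.d. uniform noise in [-0.5, 0.5); differentiable w.r.t. y.
        return y + torch.empty_like(y).uniform_(-0.5, 0.5)
    # Test time: round to the nearest integer (the actual quantizer).
    return torch.round(y)
```

The noisy latents feed both the distortion term and a continuous density model whose negative log likelihood serves as the proxy for the discrete code length.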

I can't read any further... I just can't understand it.

Study reference: 端到端的图像压缩——《End-to-end optimized image compression》笔记, by 叶笙箫 (CSDN blog).

[Figure 1: overall framework, consisting of the analysis transform $g_a$ (encoder), the uniform quantizer $q$, and the synthesis transform $g_s$ (decoder)]

The overall algorithm consists of three parts: a nonlinear analysis transform (the encoder), a uniform quantizer, and a nonlinear synthesis transform (the decoder).

$x$ and $\hat{x}$ denote the input image and the image reconstructed after passing through the encoder and decoder.
$g_a$ denotes the nonlinear analysis transform provided by the encoder: the input image passes through the encoder network to produce a latent representation $y$, which, after the quantizer $q$, gives the quantized result $\hat{y}$.

The decoder $g_s$ then reconstructs the image from $\hat{y}$.
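Putting the parts of the figure together, the forward pass is $x \to g_a \to y \to q \to \hat{y} \to g_s \to \hat{x}$. The sketch below is a structural illustration only: the layer counts and channel widths are placeholders, plain ReLU stands in for the paper's GDN/IGDN nonlinearities, and the quantizer uses the training-time noise proxy shown earlier.

```python
# Structural sketch of the analysis / quantize / synthesis pipeline.
# Placeholder sizes; ReLU stands in for the GDN/IGDN layers the paper actually uses.
import torch
import torch.nn as nn

class ToyCodec(nn.Module):
    def __init__(self, channels: int = 128):
        super().__init__()
        # g_a: downsampling convolutions produce the latent representation y.
        self.analysis = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )
        # g_s: transposed convolutions approximately invert g_a.
        self.synthesis = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.analysis(x)                                    # x -> y
        if self.training:                                       # q: noise proxy while training
            y_hat = y + torch.empty_like(y).uniform_(-0.5, 0.5)
        else:                                                   # q: rounding at test time
            y_hat = torch.round(y)
        return self.synthesis(y_hat)                            # y_hat -> x_hat

x = torch.rand(1, 3, 256, 256)
x_hat = ToyCodec().eval()(x)    # same spatial size as x for power-of-two inputs
```

Training would minimize the rate–distortion Lagrangian, with a rate term on $\hat{y}$ and a distortion term between $x$ and $\hat{x}$, fitting one model per choice of the trade-off parameter $\lambda$.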
