Apache celeborn 安装及使用教程

1.下载安装包

https://celeborn.apache.org/download/ 

测0.4.0时出现https://github.com/apache/incubator-celeborn/issues/835

2.解压

tar -xzvf apache-celeborn-0.3.2-incubating-bin.tgz

3.修改配置文件

cp celeborn-env.sh.template  celeborn-env.shcp log4j2.xml.template  log4j2.xmlcp celeborn-defaults.conf.template  cp celeborn-defaults.conf

3.1修改celeborn-env.sh

CELEBORN_MASTER_MEMORY=2g
CELEBORN_WORKER_MEMORY=2g
CELEBORN_WORKER_OFFHEAP_MEMORY=4g

3.2 修改celeborn-defaults.conf

# used by client and worker to connect to master
celeborn.master.endpoints 10.67.78.xx:9097# used by master to bootstrap
celeborn.master.host 10.67.78.xx
celeborn.master.port 9097celeborn.metrics.enabled true
celeborn.worker.flusher.buffer.size 256k# If Celeborn workers have local disks and HDFS. Following configs should be added.
# If Celeborn workers have local disks, use following config.
# Disk type is HDD by defaut.
#celeborn.worker.storage.dirs /mnt/disk1:disktype=SSD,/mnt/disk2:disktype=SSD# If Celeborn workers don't have local disks. You can use HDFS.
# Do not set `celeborn.worker.storage.dirs` and use following configs.
celeborn.storage.activeTypes HDFS
celeborn.worker.sortPartition.threads 64
celeborn.worker.commitFiles.timeout 240s
celeborn.worker.commitFiles.threads 128
celeborn.master.slot.assign.policy roundrobin
celeborn.rpc.askTimeout 240s
celeborn.worker.flusher.hdfs.buffer.size 4m
celeborn.storage.hdfs.dir hdfs://10.67.78.xx:8020/celeborn
celeborn.worker.replicate.fastFail.duration 240s# If your hosts have disk raid or use lvm, set celeborn.worker.monitor.disk.enabled to false
celeborn.worker.monitor.disk.enabled false

4.复制到其他节点

scp -r /root/apache-celeborn-0.3.2-incubating-bin 10.67.78.xx1:/root/
scp -r /root/apache-celeborn-0.3.2-incubating-bin 10.67.78.xx2:/root/

因为在配置文件中已经配置了master 所以启动matster和worker即可。

5.启动master和worker

cd $CELEBORN_HOME
./sbin/start-master.sh./sbin/start-worker.sh celeborn://<Master IP>:<Master Port>

 之后在master的日志中看woker是否注册上

 

6.在 spark客户端使用

复制 $CELEBORN_HOME/spark/*.jar   到   $SPARK_HOME/jars/

修改spark-defaults.conf

# Shuffle manager class name changed in 0.3.0:
#    before 0.3.0: org.apache.spark.shuffle.celeborn.RssShuffleManager
#    since 0.3.0: org.apache.spark.shuffle.celeborn.SparkShuffleManager
spark.shuffle.manager org.apache.spark.shuffle.celeborn.SparkShuffleManager
# must use kryo serializer because java serializer do not support relocation
spark.serializer org.apache.spark.serializer.KryoSerializer# celeborn master
spark.celeborn.master.endpoints clb-1:9097,clb-2:9097,clb-3:9097
# This is not necessary if your Spark external shuffle service is Spark 3.1 or newer
spark.shuffle.service.enabled false# options: hash, sort
# Hash shuffle writer use (partition count) * (celeborn.push.buffer.max.size) * (spark.executor.cores) memory.
# Sort shuffle writer uses less memory than hash shuffle writer, if your shuffle partition count is large, try to use sort hash writer.  
spark.celeborn.client.spark.shuffle.writer hash# We recommend setting spark.celeborn.client.push.replicate.enabled to true to enable server-side data replication
# If you have only one worker, this setting must be false 
# If your Celeborn is using HDFS, it's recommended to set this setting to false
spark.celeborn.client.push.replicate.enabled true# Support for Spark AQE only tested under Spark 3
# we recommend setting localShuffleReader to false to get better performance of Celeborn
spark.sql.adaptive.localShuffleReader.enabled false# If Celeborn is using HDFS
spark.celeborn.storage.hdfs.dir hdfs://<namenode>/celeborn# we recommend enabling aqe support to gain better performance
spark.sql.adaptive.enabled true
spark.sql.adaptive.skewJoin.enabled true# Support Spark Dynamic Resource Allocation
# Required Spark version >= 3.5.0 注意spark版本是否满足
spark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO
# Required Spark version >= 3.4.0, highly recommended to disable 注意spark版本是否满足
spark.dynamicAllocation.shuffleTracking.enabled false

7.启动spark-shell

./bin/spark-shell spark.sparkContext.parallelize(1 to 1000, 1000).flatMap(_ => (1 to 100).iterator.map(num => num)).repartition(10).count

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/488412.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

程序媛的mac修炼手册-- 小白入门Java篇

最近因为要用CiteSpace做文献综述&#xff0c;间接接触Java了。所以&#xff0c;继Python、C之后&#xff0c;又要涉猎Java了。刺激&#xff01;&#xff01; 由于CiteSpace与Java要求版本高度匹配&#xff0c;有个匹配详情明天为大家讲解。总之&#xff0c;我的Java之旅开始于…

在openEuler中通过KVM可视化安装华为FusionCompute的VRM节点

一、说明 本文是华为FusionCompute云平台配置的延续&#xff0c;是在CNA&#xff08;ComputingNode Agent&#xff0c;计算节点代理&#xff09;主机安装配置完成后&#xff0c;详细安装VRM&#xff08;Virtual Resource Manager&#xff0c;虚拟资源管理器&#xff09;节点的…

【论文阅读】ICCV 2023 计算和数据高效后门攻击

文章目录 一.论文信息二.论文内容1.摘要2.引言3.主要图表4.结论 一.论文信息 论文题目&#xff1a; Computation and Data Efficient Backdoor Attacks&#xff08;计算和数据高效后门攻击&#xff09; 论文来源&#xff1a; 2023-ICCV&#xff08;CCF-A&#xff09; 论文团…

前后端分离PHP+vue+mysql城市轨道交通线路公交查询系统

医院、厕所、药店、派出所、学校、营业厅、快递、银行 开发语言&#xff1a;php 后端框架&#xff1a;Thinkphp 前端框架&#xff1a;vue.js 服务器&#xff1a;apache 数据库&#xff1a;mysql 运行环境:phpstudy/wamp/xammp等 A.美食 快餐、中餐、自助餐、火锅、烧烤、奶…

【大数据】Flink 内存管理(一):设置 Flink 进程内存

Flink 内存管理&#xff08;一&#xff09;&#xff1a;设置 Flink 进程内存 1.配置 Total Memory2.JVM 参数3.根据比例限制的组件&#xff08;Capped Fractionated Components&#xff09; Apache Flink 通过严格控制各种组件的内存使用&#xff0c;在 JVM 上提供高效的工作负…

Siamfc论文中文翻译(详细!)

Fully-Convolutional Siamese Networks for Object Tracking 用于对象跟踪的Siamese网络 说明 建议对照siamfc&#xff08;2021版&#xff09;原文阅读&#xff0c;翻译软件翻译出来的效果不好&#xff0c;整体阅读体验不佳&#xff0c;所以我对译文重新进行了整理&#xff0…

EasyRecovery易恢复15Mac苹果电脑版下载及功能详细信息

一、软件概述 EasyRecovery易恢复15 Mac版是一款专为苹果电脑用户设计的强大数据恢复软件。它采用先进的扫描和恢复技术&#xff0c;能够帮助用户快速、安全地恢复因各种原因丢失或删除的数据。无论是误删除、格式化、系统崩溃还是硬件故障&#xff0c;EasyRecovery易恢复15 M…

第八篇【传奇开心果系列】python的文本和语音相互转换库技术点案例示例:Google Text-to-Speech虚拟现实(VR)沉浸式体验经典案例

传奇开心果博文系列 系列博文目录python的文本和语音相互转换库技术点案例示例系列 博文目录前言一、雏形示例代码二、扩展思路介绍三、虚拟导游示例代码四、交互式学习示例代码五、虚拟角色对话示例代码六、辅助用户界面示例代码七、实时语音交互示例代码八、多语言支持示例代…

2024.2.23

信号的课堂代码 1.试图处理普通信号 #include<myhead.h> void handler(int signo) {if(signoSIGINT){printf("用户按下了CTRL C\n");} } int main(int argc, const char *argv[]) {/*忽略if(signal(SIGINT,SIG_IGN)SIG_ERR){perror("signal error"…

Python流程控制有知道的吗?

流程控制是编程的核心概念之一&#xff0c;Python也不例外。在Python中&#xff0c;程序的流程控制结构主要包括顺序结构、选择结构和循环结构。这些结构让程序员能够更好地组织代码&#xff0c;使其按照特定的逻辑执行。 1.顺序结构 顺序结构是Python中最简单的流程控制结构&…

Python实战: 获取 后缀名(扩展名) 或 文件名

Python实战: 获取 后缀名(扩展名) 或 文件名 &#x1f308; 个人主页&#xff1a;高斯小哥 &#x1f525; 高质量专栏&#xff1a;Matplotlib之旅&#xff1a;零基础精通数据可视化、Python基础【高质量合集】、PyTorch零基础入门教程 &#x1f448; 希望得到您的订阅和支持~ &…

消息中间件篇之RabbitMQ-延时队列

一、延时队列 延迟队列&#xff1a;进入队列的消息会被延迟消费的队列。 场景&#xff1a;超时订单、限时优惠、定时发布。 延迟队列死信交换机TTL&#xff08;生存时间&#xff09;。 二、死信交换机 当一个队列中的消息满足下列情况之一时&#xff0c;可以成为死信&#xf…