I. Problem Background
1. Exception log from job submission
2023-06-29 15:48:20,877 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:21,129 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:21,381 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:21,633 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:21,885 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:22,137 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:22,389 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:22,641 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:22,894 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
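2. Problem description
Cluster resources: 180 vcores and 228 GB of memory in total, of which 120 vcores and 196 GB of memory are currently in use.
That leaves 32 GB of memory and 60 vcores free, yet new jobs could not be submitted; the exception log is shown above.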
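The free capacity quoted above is a simple subtraction. The minimal Python check below assumes that 196 GB and 120 vcores are the amounts in use, which is the reading the 32 GB / 60 vcore figures imply:

```python
# Cluster totals and current usage from the problem description
# (assumption: 196 GB / 120 vcores are the amounts already in use).
TOTAL_VCORES = 180
TOTAL_MEMORY_GB = 228
USED_VCORES = 120
USED_MEMORY_GB = 196


def free_resources(total_vcores, total_mem_gb, used_vcores, used_mem_gb):
    """Return (free_vcores, free_memory_gb) still available in the cluster."""
    return total_vcores - used_vcores, total_mem_gb - used_mem_gb


free_vcores, free_mem_gb = free_resources(
    TOTAL_VCORES, TOTAL_MEMORY_GB, USED_VCORES, USED_MEMORY_GB
)
print(f"free: {free_vcores} vcores, {free_mem_gb} GB")  # free: 60 vcores, 32 GB
```

Raw cluster capacity is therefore not what blocks the submission; the limit has to come from somewhere else in the scheduler.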
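II. Resolution
1. Explanation
By default, the Fair Scheduler does not let a queue spend all of its resources on ApplicationMasters: each queue's maxAMShare caps the fraction of the queue's fair share that AMs may occupy (0.5 by default in Apache Hadoop; -1.0 disables the check), so that AMs cannot crowd out the containers that do the actual work. In Flink on YARN, each job's JobManager runs in the YARN ApplicationMaster container, so submitting many Flink jobs to one queue can exhaust this AM share even while the cluster still has free capacity. What follows is the quick-and-dirty fix I applied to keep the project on schedule; I will study it in more detail when I have time. For more detailed explanations, see this blogger's articles "yarn队列之fair队列" (YARN queues: the Fair queue) and "YARN三种资源调度器解析" (An analysis of YARN's three resource schedulers).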
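The formula mentioned above is, in simplified form, the Fair Scheduler's AM-share check: a new ApplicationMaster may start only if the queue's current AM usage plus the new AM's request fits within fair share × maxAMShare, and a maxAMShare of -1 disables the check. The Python model below is a sketch under stated assumptions: `Resource` and `can_run_am` are hypothetical names, the queue's fair share is passed in as a given (the real scheduler derives it from weights and active queues), Hadoop's actual implementation differs in details, and all sample numbers are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A (memory, vcores) pair, loosely like a YARN resource vector (hypothetical)."""
    memory_mb: int
    vcores: int

    def plus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb + other.memory_mb, self.vcores + other.vcores)

    def scaled(self, factor: float) -> "Resource":
        return Resource(int(self.memory_mb * factor), int(self.vcores * factor))

    def fits_in(self, limit: "Resource") -> bool:
        return self.memory_mb <= limit.memory_mb and self.vcores <= limit.vcores


def can_run_am(fair_share: Resource, am_usage: Resource,
               am_request: Resource, max_am_share: float) -> bool:
    """Simplified AM-share check: usage + request must fit in fair_share * maxAMShare."""
    if max_am_share == -1.0:  # -1.0 disables the check
        return True
    am_limit = fair_share.scaled(max_am_share)
    return am_usage.plus(am_request).fits_in(am_limit)


# Invented example: the queue's fair share is the whole 228 GB / 180 vcore cluster,
# AMs (Flink JobManagers) already hold 112 GB / 28 vcores, and a new job asks for 4 GB / 1 vcore.
fair_share = Resource(228 * 1024, 180)
am_usage = Resource(112 * 1024, 28)
new_am = Resource(4 * 1024, 1)
print(can_run_am(fair_share, am_usage, new_am, 0.5))  # False: 116 GB exceeds the 114 GB AM limit
print(can_run_am(fair_share, am_usage, new_am, 1.0))  # True
```

Raising maxAMShare from 0.5 to 1 doubles the AM limit in this model, which is why the new AM is admitted in the second call.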
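2. Steps
In Cloudera Manager, search for the keyword "MaxAMShare" in the CDH YARN configuration search box; the results are as follows.
The Fair Scheduler configuration, once formatted, is shown below: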
{
  "defaultFairSharePreemptionThreshold": null,
  "defaultFairSharePreemptionTimeout": null,
  "defaultMinSharePreemptionTimeout": null,
  "defaultQueueSchedulingPolicy": "fair",
  "queueMaxAMShareDefault": 1,
  "queueMaxAppsDefault": null,
  "queuePlacementRules": [
    {
      "create": true,
      "name": "specified",
      "queue": null,
      "rules": null
    },
    {
      "create": true,
      "name": "nestedUserQueue",
      "queue": null,
      "rules": [
        {
          "create": true,
          "name": "default",
          "queue": "users",
          "rules": null
        }
      ]
    },
    {
      "create": null,
      "name": "default",
      "queue": null,
      "rules": null
    }
  ],
  "queues": [
    {
      "aclAdministerApps": "*",
      "aclSubmitApps": "*",
      "allowPreemptionFrom": null,
      "fairSharePreemptionThreshold": null,
      "fairSharePreemptionTimeout": null,
      "minSharePreemptionTimeout": null,
      "name": "root",
      "queues": [
        {
          "aclAdministerApps": null,
          "aclSubmitApps": null,
          "allowPreemptionFrom": null,
          "fairSharePreemptionThreshold": null,
          "fairSharePreemptionTimeout": null,
          "minSharePreemptionTimeout": null,
          "name": "users",
          "queues": [
            {
              "aclAdministerApps": null,
              "aclSubmitApps": null,
              "allowPreemptionFrom": null,
              "fairSharePreemptionThreshold": null,
              "fairSharePreemptionTimeout": null,
              "minSharePreemptionTimeout": null,
              "name": "admin",
              "queues": [],
              "schedulablePropertiesList": [
                {
                  "impalaClampMemLimitQueryOption": null,
                  "impalaDefaultQueryMemLimit": null,
                  "impalaDefaultQueryOptions": null,
                  "impalaMaxMemory": null,
                  "impalaMaxQueryMemLimit": null,
                  "impalaMaxQueuedQueries": null,
                  "impalaMaxRunningQueries": null,
                  "impalaMinQueryMemLimit": null,
                  "impalaQueueTimeout": null,
                  "maxAMShare": 1,
                  "maxChildResources": null,
                  "maxResources": null,
                  "maxRunningApps": null,
                  "minResources": null,
                  "scheduleName": "default",
                  "weight": 100
                }
              ],
              "schedulingPolicy": "drf",
              "type": null
            }
          ],
          "schedulablePropertiesList": [
            {
              "impalaClampMemLimitQueryOption": null,
              "impalaDefaultQueryMemLimit": null,
              "impalaDefaultQueryOptions": null,
              "impalaMaxMemory": null,
              "impalaMaxQueryMemLimit": null,
              "impalaMaxQueuedQueries": null,
              "impalaMaxRunningQueries": null,
              "impalaMinQueryMemLimit": null,
              "impalaQueueTimeout": null,
              "maxAMShare": 1,
              "maxChildResources": null,
              "maxResources": null,
              "maxRunningApps": null,
              "minResources": null,
              "scheduleName": "default",
              "weight": 1
            }
          ],
          "schedulingPolicy": "drf",
          "type": "parent"
        }
      ],
      "schedulablePropertiesList": [
        {
          "impalaClampMemLimitQueryOption": null,
          "impalaDefaultQueryMemLimit": null,
          "impalaDefaultQueryOptions": null,
          "impalaMaxMemory": null,
          "impalaMaxQueryMemLimit": null,
          "impalaMaxQueuedQueries": null,
          "impalaMaxRunningQueries": null,
          "impalaMinQueryMemLimit": null,
          "impalaQueueTimeout": null,
          "maxAMShare": 1,
          "maxChildResources": null,
          "maxResources": null,
          "maxRunningApps": null,
          "minResources": null,
          "scheduleName": "default",
          "weight": 1
        }
      ],
      "schedulingPolicy": "drf",
      "type": null
    }
  ],
  "userMaxAppsDefault": null,
  "users": []
}
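Key change: within "queues":[], I set every maxAMShare parameter to 1, which allows ApplicationMasters to use all of the resources allocated to the queue. The configuration above reflects this change. After editing, simply save the configuration.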
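The edit above was made by hand in the Cloudera Manager UI. If you work with a copy of this JSON (for example, pasted into a file), the same change can be scripted. The sketch below is a hypothetical helper, not a Cloudera tool: the function names are invented, and only the field names (`queues`, `name`, `schedulablePropertiesList`, `maxAMShare`) come from the configuration above. It walks the queue tree, sets every maxAMShare to 1, and lists the resulting value per queue path for checking.

```python
import json


def set_max_am_share(queue: dict, value: float) -> None:
    """Set maxAMShare on this queue and, recursively, on all child queues."""
    for props in queue.get("schedulablePropertiesList") or []:
        props["maxAMShare"] = value
    for child in queue.get("queues") or []:
        set_max_am_share(child, value)


def list_max_am_share(queue: dict, parent: str = "") -> list:
    """Return (queue path, maxAMShare) pairs such as ("root.users.admin", 1)."""
    path = f"{parent}.{queue['name']}" if parent else queue["name"]
    pairs = [(path, props.get("maxAMShare"))
             for props in queue.get("schedulablePropertiesList") or []]
    for child in queue.get("queues") or []:
        pairs.extend(list_max_am_share(child, path))
    return pairs


def update_config(config_json: str, value: float = 1) -> str:
    """Return the configuration JSON with maxAMShare set to `value` in every queue."""
    config = json.loads(config_json)
    for top_queue in config.get("queues") or []:
        set_max_am_share(top_queue, value)
    return json.dumps(config, indent=2)


# Trimmed-down example in the same shape as the configuration above.
sample = json.dumps({
    "queues": [{
        "name": "root",
        "schedulablePropertiesList": [{"maxAMShare": None}],
        "queues": [{
            "name": "users",
            "schedulablePropertiesList": [{"maxAMShare": 0.5}],
            "queues": [],
        }],
    }]
})
updated = json.loads(update_config(sample))
print(list_max_am_share(updated["queues"][0]))  # [('root', 1), ('root.users', 1)]
```

The top-level queueMaxAMShareDefault (the default for queues that set no value of their own) is a separate field that this helper leaves untouched; in the configuration above it is already 1.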
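3. After the change, I resubmitted the jobs; all of them were submitted normally and the remaining resources could be used.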