-
Install the required libraries:
pip install mrjob boto3
-
Write the job code:
from mrjob.job import MRJob


class WordCount(MRJob):
    def mapper(self, _, line):
        for word in line.split():
            yield word, 1

    def reducer(self, word, counts):
        yield word, sum(counts)


if __name__ == '__main__':
    WordCount.run()
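To see what the mapper and reducer compute before sending anything to a cluster, the same logic can be sketched in plain Python. This is only an illustration: `run_local` and the standalone `mapper` and `reducer` functions are hypothetical helpers, not part of mrjob, and the in-memory grouping merely stands in for the shuffle step that the framework performs between the two phases.

```python
from collections import defaultdict


def mapper(line):
    # Emit (word, 1) for every whitespace-separated token, as the job's mapper does.
    for word in line.split():
        yield word, 1


def reducer(word, counts):
    # Sum all counts collected for a single word.
    yield word, sum(counts)


def run_local(lines):
    # Hypothetical helper: imitate the shuffle step by grouping values per key in memory.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)

    # Feed each group to the reducer in sorted key order.
    result = {}
    for word in sorted(grouped):
        for key, total in reducer(word, grouped[word]):
            result[key] = total
    return result


if __name__ == "__main__":
    print(run_local(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

Grouping in a dictionary only works when the input fits in memory; on a real cluster the framework handles that step across machines.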
-
Edit the configuration file:
runners:
  emr:
    # AWS credentials
    aws_access_key_id: YOUR_ACCESS_KEY
    aws_secret_access_key: YOUR_SECRET_ACCESS_KEY
    region: ap-east-1
    # Cluster configuration
    instance_type: m5.xlarge
    num_core_instances: 3
    # EMR release version
    release_label: emr-7.3.0