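Customer Value Analysis
I. Objectives and Requirements
1. Learn the basic methods of processing data with the numpy and pandas libraries.
2. Learn the basic methods of extracting features from customer information with the RFM analysis model.
3. Learn the basic methods of standardizing feature data.
4. Learn how to run the K-Means clustering algorithm with the Sklearn library and how to evaluate its results.
5. Learn the basic methods of visualizing data analysis results with matplotlib and pandas.
II. Experiment Content
1. Use pandas and other Python libraries to preprocess the data, compute the three feature indicators R, F, and M, and save the processed file.
2. Use pandas and other Python libraries to standardize the data.
3. Use the Sklearn library and the RFM analysis method to build a clustering model, cluster the customers by value, and evaluate the clustering results.
4. Use pandas and matplotlib to visualize the clustering results.
III. Experiment Steps
1. Data preprocessing.
(1) Import the required packages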
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
from sklearn.cluster import KMeans
from datetime import datetime
(2) Read the file
datafile="/data/bigfiles/data2.csv"
data = pd.read_csv(datafile)
(3) Inspect the basic statistics of the data
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2832 entries, 0 to 2831
Data columns (total 54 columns):
买家会员名 2660 non-null object
买家实际支付积分 2660 non-null float64
买家实际支付金额 2660 non-null float64
买家应付货款 2660 non-null float64
买家应付邮费 2660 non-null float64
买家支付宝账号 2658 non-null object
买家支付积分 2660 non-null float64
买家服务费 2660 non-null object
买家留言 163 non-null object
修改后的sku 0 non-null float64
修改后的收货地址 358 non-null object
分阶段订单信息 0 non-null float64
卖家服务费 2660 non-null float64
发票抬头 0 non-null float64
含应开票给个人的个人红包 0 non-null float64
天猫卡券抵扣 0 non-null float64
定金排名 0 non-null float64
宝贝总数量 2660 non-null float64
宝贝标题 2397 non-null object
宝贝种类 2660 non-null float64
店铺Id 1581 non-null float64
店铺名称 2660 non-null object
异常信息 0 non-null float64
总金额 2660 non-null float64
打款商家金额 2660 non-null object
支付单号 1560 non-null object
支付详情 1560 non-null object
收货人姓名 2660 non-null object
收货地址 2660 non-null object
新零售交易类型 2660 non-null object
新零售发货门店id 0 non-null float64
新零售发货门店名称 0 non-null float64
新零售导购门店id 0 non-null float64
新零售导购门店名称 0 non-null float64
是否上传合同照片 2660 non-null object
是否上传小票 2660 non-null object
是否代付 2660 non-null object
是否手机订单 1838 non-null object
是否是O2O交易 0 non-null float64
物流公司 1425 non-null object
物流单号 1425 non-null object
特权订金订单id 0 non-null float64
确认收货时间 1876 non-null object
联系手机 2659 non-null object
联系电话 130 non-null object
订单付款时间 2148 non-null object
订单关闭原因 2660 non-null object
订单创建时间 2660 non-null object
订单备注 695 non-null object
订单状态 2660 non-null object
运送方式 2660 non-null object
返点积分 2660 non-null float64
退款金额 2660 non-null float64
数据采集时间 2660 non-null object
dtypes: float64(25), object(29)
memory usage: 1.2+ MB
len(data)
2832
data.describe()
买家实际支付积分 | 买家实际支付金额 | 买家应付货款 | 买家应付邮费 | 买家支付积分 | 修改后的sku | 分阶段订单信息 | 卖家服务费 | 发票抬头 | 含应开票给个人的个人红包 | ... | 异常信息 | 总金额 | 新零售发货门店id | 新零售发货门店名称 | 新零售导购门店id | 新零售导购门店名称 | 是否是O2O交易 | 特权订金订单id | 返点积分 | 退款金额 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 2660.0 | 2660.000000 | 2660.000000 | 2660.000000 | 2660.0 | 0.0 | 0.0 | 2660.0 | 0.0 | 0.0 | ... | 0.0 | 2660.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2660.0 | 2660.000000 |
mean | 0.0 | 155.113094 | 181.193241 | 1.257519 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | ... | NaN | 182.450759 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 10.436218 |
std | 0.0 | 350.332509 | 366.871965 | 4.408725 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | ... | NaN | 366.806966 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 131.244263 |
min | 0.0 | 0.000000 | 0.100000 | 0.000000 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | ... | NaN | 0.100000 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.000000 |
25% | 0.0 | 43.890000 | 50.860000 | 0.000000 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | ... | NaN | 51.870000 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.000000 |
50% | 0.0 | 62.860000 | 89.700000 | 0.000000 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | ... | NaN | 90.130000 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.000000 |
75% | 0.0 | 199.000000 | 268.000000 | 0.000000 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | ... | NaN | 268.000000 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.000000 |
max | 0.0 | 13246.800000 | 13246.800000 | 55.000000 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | ... | NaN | 13246.800000 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 3950.730000 |
8 rows × 25 columns
(4) Extract the required columns
data.columns
Index(['买家会员名', '买家实际支付积分', '买家实际支付金额', '买家应付货款', '买家应付邮费', '买家支付宝账号','买家支付积分', '买家服务费', '买家留言', '修改后的sku', '修改后的收货地址', '分阶段订单信息', '卖家服务费','发票抬头', '含应开票给个人的个人红包', '天猫卡券抵扣', '定金排名', '宝贝总数量', '宝贝标题 ', '宝贝种类 ','店铺Id', '店铺名称', '异常信息', '总金额', '打款商家金额', '支付单号', '支付详情', '收货人姓名','收货地址', '新零售交易类型', '新零售发货门店id', '新零售发货门店名称', '新零售导购门店id', '新零售导购门店名称','是否上传合同照片', '是否上传小票', '是否代付', '是否手机订单', '是否是O2O交易', '物流公司', '物流单号 ','特权订金订单id', '确认收货时间', '联系手机', '联系电话 ', '订单付款时间', '订单关闭原因', '订单创建时间','订单备注', '订单状态', '运送方式', '返点积分', '退款金额', '数据采集时间'],dtype='object')
data.订单状态.unique()
array(['买家已付款,等待卖家发货', '等待买家付款', '卖家已发货,等待买家确认', '交易关闭', '交易成功', nan],dtype=object)
data = data[data.订单状态 == '交易成功']
data
买家会员名 | 买家实际支付积分 | 买家实际支付金额 | 买家应付货款 | 买家应付邮费 | 买家支付宝账号 | 买家支付积分 | 买家服务费 | 买家留言 | 修改后的sku | ... | 联系电话 | 订单付款时间 | 订单关闭原因 | 订单创建时间 | 订单备注 | 订单状态 | 运送方式 | 返点积分 | 退款金额 | 数据采集时间 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
25 | gang_2015 | 0.0 | 143.64 | 143.64 | 0.0 | 18104860223 | 0.0 | 0元 | NaN | NaN | ... | NaN | 2018/1/27 | 订单未关闭 | 2018-01-27 09:57:23 | NaN | 交易成功 | 快递 | 0.0 | 0.0 | 2018/12/31 |
26 | tb6683844_2011 | 0.0 | 55.86 | 55.86 | 0.0 | 17743451991 | 0.0 | 0元 | NaN | NaN | ... | NaN | 2018/1/26 | 订单未关闭 | 2018-01-26 22:55:46 | NaN | 交易成功 | 快递 | 0.0 | 0.0 | 2018/12/31 |
30 | dlzslv | 0.0 | 90.72 | 90.72 | 0.0 | zs-lv@sohu.com | 0.0 | 0元 | NaN | NaN | ... | NaN | 2018/1/26 | 订单未关闭 | 2018-01-26 13:37:22 | NaN | 交易成功 | 快递 | 0.0 | 0.0 | 2018/12/31 |
31 | 劳什子2010 | 0.0 | 48.86 | 48.86 | 0.0 | tangzhai2010@163.com | 0.0 | 0元 | NaN | NaN | ... | NaN | 2018/1/26 | 订单未关闭 | 2018-01-26 10:12:18 | v6 | 交易成功 | 快递 | 0.0 | 0.0 | 2018/12/31 |
32 | 李氏江江48 | 0.0 | 103.74 | 103.74 | 0.0 | 849694657@qq.com | 0.0 | 0元 | NaN | NaN | ... | NaN | 2018/1/26 | 订单未关闭 | 2018-01-26 06:48:35 | NaN | 交易成功 | 快递 | 0.0 | 0.0 | 2018/12/31 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2655 | 旋光精灵 | 0.0 | 999.00 | 999.00 | 0.0 | 20323624@qq.com | 0.0 | 0元 | NaN | NaN | ... | NaN | 2017/1/4 | 订单未关闭 | 2017/1/4 15:05 | NaN | 交易成功 | 快递 | 0.0 | 0.0 | 2018/12/31 |
2656 | leryang | 0.0 | 268.00 | 268.00 | 0.0 | 9722165@163.com | 0.0 | 0元 | NaN | NaN | ... | NaN | 2017/1/3 | 订单未关闭 | 2017/1/3 16:51 | '中通快递:728773317678 | 交易成功 | 虚拟物品 | 0.0 | 0.0 | 2018/12/31 |
2657 | leryang | 0.0 | 134.00 | 134.00 | 0.0 | 9722165@163.com | 0.0 | 0元 | NaN | NaN | ... | NaN | 2017/1/3 | 订单未关闭 | 2017/1/3 16:51 | NaN | 交易成功 | 虚拟物品 | 0.0 | 0.0 | 2018/12/31 |
2658 | crazy283 | 0.0 | 268.00 | 268.00 | 0.0 | crazy355@126.com | 0.0 | 0元 | NaN | NaN | ... | NaN | 2017/1/3 | 订单未关闭 | 2017/1/3 16:01 | '中通:728773317331 【月月 01-04 08:58】 | 交易成功 | 虚拟物品 | 0.0 | 0.0 | 2018/12/31 |
2659 | zhangyang52058 | 0.0 | 63.70 | 48.70 | 15.0 | 13693516433 | 0.0 | 0元 | NaN | NaN | ... | NaN | 2017/1/2 | 订单未关闭 | 2017/1/2 23:28 | NaN | 交易成功 | 快递 | 0.0 | 0.0 | 2018/12/31 |
1876 rows × 54 columns
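# Keep only the columns we need:
# the buyer ID, the amount paid, and the payment time (the last payment time is derived from it)
data=data.filter(items=['买家会员名','打款商家金额','订单付款时间'])
(5) Handle abnormal data
# Count the missing values in each column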
datas=data.isnull().sum()
datas
买家会员名 0
打款商家金额 0
订单付款时间 0
dtype: int64
# Show the fully duplicated rows
result=data.duplicated()
df=data[result]
df
买家会员名 | 打款商家金额 | 订单付款时间 | |
---|---|---|---|
71 | qufan_xiao | 100.00元 | 2018/1/20 |
119 | kangfengtj | 55.86元 | 2018/1/16 |
207 | waterli2005 | 55.86元 | 2018/1/3 |
211 | 时尚乐器 | 268.00元 | 2018/1/3 |
584 | 猪头luing | 254.00元 | 2018/6/22 |
... | ... | ... | ... |
2308 | bill163com | 200.00元 | 2017/6/21 |
2320 | 南山熊00340 | 268.00元 | 2017/6/13 |
2354 | 铭铭猪是的念倒 | 201.00元 | 2017/6/1 |
2446 | 夜沉晨 | 201.00元 | 2017/4/15 |
2533 | chenzh3664951 | 201.00元 | 2017/3/14 |
103 rows × 3 columns
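The duplicate check above can be illustrated on a tiny frame. This is a minimal sketch with made-up rows (only the column names come from the report); it shows that `duplicated()` flags the second and later copies of a row, and that `drop_duplicates()` keeps the first copy.

```python
import pandas as pd

# Made-up rows for illustration; only the column names come from the report.
orders = pd.DataFrame({
    "买家会员名": ["a", "a", "b", "a"],
    "打款商家金额": ["1.00元", "1.00元", "2.00元", "3.00元"],
    "订单付款时间": ["2018/1/3", "2018/1/3", "2018/1/3", "2018/1/4"],
})

# duplicated() flags the second and later copies of a row, not the first one.
mask = orders.duplicated()
print(mask.tolist())

# drop_duplicates() keeps the first copy of each row by default.
deduped = orders.drop_duplicates()
print(len(orders), "->", len(deduped))
```

Because only three columns remain at this point, two genuine orders placed by the same buyer on the same day for the same amount look identical to a duplicated record. Whether the 103 flagged rows are true duplicates may be worth checking against the full 54-column data before dropping them.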
# Drop the fully duplicated rows
data = data.drop_duplicates()
# Drop the rows with no payment
data = data[data['打款商家金额'] != '0.00元'].copy()
# Parse the payment date
data['订单付款时间'] = data['订单付款时间'].map(lambda x: datetime.strptime(x, '%Y/%m/%d'))
# Strip the currency suffix '元' and convert the amount to float
data['打款商家金额'] = data['打款商家金额'].map(lambda x: float(re.sub('元', '', x)))
# print(data)
# Per buyer: total amount, most recent payment date, number of payments
data = data.groupby('买家会员名').agg(
    总金额=('打款商家金额', 'sum'),
    订单付款时间=('订单付款时间', 'max'),
    付款次数=('打款商家金额', 'count'),
)
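The cleaning and aggregation steps above can be tried end to end on a small example. The sketch below uses made-up orders (the buyer names and values are hypothetical; the column names follow the report) and named aggregation, which produces the three output columns directly.

```python
from datetime import datetime

import pandas as pd

# Hypothetical toy orders; column names follow the report, values are made up.
orders = pd.DataFrame({
    "买家会员名": ["alice", "alice", "bob", "bob", "carol"],
    "打款商家金额": ["100.00元", "50.50元", "0.00元", "20.00元", "268.00元"],
    "订单付款时间": ["2018/1/20", "2018/3/5", "2017/6/1", "2017/6/13", "2018/12/12"],
})

# Strip the currency suffix and convert the amount to float.
orders["打款商家金额"] = (
    orders["打款商家金额"].str.replace("元", "", regex=False).astype(float)
)
# Drop the orders with no payment.
orders = orders[orders["打款商家金额"] > 0].copy()
# Parse the date strings the same way the report does.
orders["订单付款时间"] = orders["订单付款时间"].map(
    lambda s: datetime.strptime(s, "%Y/%m/%d")
)

# One row per buyer: total amount, last payment date, number of payments.
per_customer = orders.groupby("买家会员名").agg(
    总金额=("打款商家金额", "sum"),
    订单付款时间=("订单付款时间", "max"),
    付款次数=("打款商家金额", "count"),
)
print(per_customer)
```

Comparing the amount as a number (`> 0`) instead of matching the string `'0.00元'` is a design choice of this sketch; it also catches zero amounts that are formatted differently.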
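(6) Compute R (recency)
# Compute R:
# the data collection date minus the date of the buyer's last payment
exdata_date=datetime(2018,12,31)
# First day of the observation window (used later for F and M)
start_date=datetime(2017,1,2)
data['R(最后一次消费时间)']=exdata_date-data['订单付款时间']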
data
总金额 | 订单付款时间 | 付款次数 | R(最后一次消费时间) | |
---|---|---|---|---|
买家会员名 | ||||
00牛哥哥00 | 402.00 | 2017-02-06 | 2 | 693 days |
020luo | 74.70 | 2017-11-18 | 1 | 408 days |
0587xueguangju | 268.00 | 2017-04-14 | 1 | 626 days |
0o秋天de童话 | 411.50 | 2018-10-09 | 2 | 83 days |
0残缺0 | 48.86 | 2018-01-19 | 1 | 346 days |
... | ... | ... | ... | ... |
黑河市2013 | 47.88 | 2018-01-11 | 1 | 354 days |
黑瑾瞳 | 158.44 | 2018-07-26 | 2 | 158 days |
鼠标右键点 | 51.87 | 2018-12-12 | 1 | 19 days |
龙星宇1018 | 198.00 | 2017-11-17 | 1 | 409 days |
龙魂爱上凤灵 | 43.86 | 2017-12-13 | 1 | 383 days |
1483 rows × 4 columns
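The R column above holds Timedelta values such as `693 days`. For later numerical steps it can be convenient to store R as a plain integer number of days. A minimal sketch, assuming the three payment dates below (copied from the table above) and the collection date used in the report:

```python
from datetime import datetime

import pandas as pd

collection_date = datetime(2018, 12, 31)  # data collection date used in the report

# Last payment dates of three buyers, copied from the table above.
last_paid = pd.Series(
    pd.to_datetime(["2017-02-06", "2017-11-18", "2018-12-12"]),
    index=["00牛哥哥00", "020luo", "鼠标右键点"],
)

recency = collection_date - last_paid  # Timedelta values
recency_days = recency.dt.days         # plain integers
print(recency_days)
```

A larger R means a longer time since the last purchase, so for this feature smaller values indicate more recently active buyers.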
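(7) Compute F (frequency)
from math import ceil
# Time between the start date and each buyer's last payment
period_day=data['订单付款时间']-start_date
# Empty list that collects the number of months
period_month=[]
for i in period_day:
    period_month.append(ceil(i.days/30))
# First printout of the month counts
print(period_month)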
[2, 11, 4, 22, 13, 3, 9, 15, 8, 7, 17, 12, 24, 23, 24, 17, 22, 17, 17, 5, 24, 13, 18, 18, 11, 24, 13, 13, 9, 22, 8, 22, 22, 11, 11, 12, 15, 13, 6, 20, 17, 13, 13, 22, 8, 15, 4, 24, 11, 10, 24, 13, 12, 18, 13, 15, 13, 13, 13, 9, 12, 23, 11, 12, 24, 24, 23, 24, 10, 17, 11, 24, 24, 6, 22, 24, 19, 8, 12, 18, 12, 2, 19, 25, 6, 6, 10, 17, 12, 10, 5, 25, 15, 12, 9, 18, 8, 7, 18, 23, 18, 8, 22, 9, 3, 17, 3, 9, 7, 5, 3, 10, 9, 20, 12, 11, 24, 23, 18, 17, 23, 1, 15, 8, 9, 4, 24, 22, 13, 20, 22, 11, 15, 10, 15, 22, 11, 5, 12, 12, 19, 1, 13, 6, 9, 9, 15, 19, 19, 19, 9, 10, 17, 15, 17, 5, 24, 10, 9, 3, 23, 22, 13, 15, 15, 12, 24, 11, 9, 15, 22, 11, 8, 22, 12, 12, 22, 6, 22, 11, 18, 8, 22, 2, 4, 13, 23, 23, 23, 23, 15, 9, 23, 24, 23, 24, 9, 13, 7, 23, 12, 8, 10, 12, 23, 22, 10, 10, 23, 9, 19, 3, 15, 13, 12, 13, 13, 15, 10, 17, 9, 15, 13, 13, 15, 17, 12, 13, 13, 19, 11, 17, 3, 3, 18, 12, 13, 15, 15, 19, 9, 15, 10, 8, 13, 12, 22, 17, 17, 15, 5, 12, 15, 23, 18, 13, 17, 24, 11, 22, 13, 5, 14, 5, 5, 13, 15, 11, 7, 11, 24, 9, 7, 13, 13, 17, 15, 6, 14, 18, 23, 11, 24, 19, 8, 25, 11, 17, 13, 7, 23, 13, 22, 15, 24, 3, 22, 7, 17, 7, 19, 24, 12, 12, 22, 1, 12, 17, 12, 24, 10, 17, 6, 19, 15, 12, 18, 15, 12, 19, 19, 19, 23, 24, 17, 3, 13, 11, 12, 12, 6, 13, 13, 6, 18, 18, 20, 19, 4, 1, 18, 11, 17, 13, 7, 8, 18, 19, 12, 23, 13, 23, 10, 23, 24, 11, 10, 15, 19, 19, 11, 17, 2, 21, 13, 22, 15, 3, 13, 24, 20, 15, 17, 11, 1, 7, 4, 12, 12, 12, 17, 12, 18, 3, 22, 23, 8, 23, 18, 10, 17, 13, 12, 23, 13, 7, 24, 23, 21, 18, 10, 24, 7, 18, 23, 5, 22, 8, 11, 13, 7, 8, 9, 7, 13, 18, 15, 9, 8, 5, 3, 7, 15, 15, 5, 20, 22, 25, 9, 19, 15, 24, 24, 14, 11, 13, 4, 13, 19, 2, 7, 13, 24, 8, 12, 11, 12, 11, 11, 12, 10, 3, 17, 3, 11, 7, 17, 6, 12, 11, 8, 12, 11, 15, 4, 17, 22, 3, 11, 13, 19, 3, 18, 12, 20, 13, 2, 10, 12, 12, 13, 2, 21, 24, 12, 24, 23, 15, 17, 13, 17, 15, 22, 25, 24, 2, 3, 3, 11, 11, 9, 18, 13, 22, 7, 17, 11, 12, 24, 5, 19, 8, 9, 10, 23, 12, 19, 24, 12, 24, 17, 24, 17, 24, 22, 19, 13, 19, 22, 21, 22, 
7, 12, 17, 1, 23, 24, 11, 22, 3, 13, 12, 21, 19, 8, 21, 18, 6, 6, 24, 23, 23, 19, 23, 10, 13, 7, 22, 5, 12, 19, 23, 24, 19, 17, 23, 4, 12, 12, 3, 9, 13, 13, 1, 15, 11, 11, 22, 8, 12, 18, 3, 15, 13, 13, 18, 9, 17, 2, 17, 18, 13, 23, 4, 13, 15, 6, 22, 4, 13, 3, 24, 5, 12, 7, 24, 14, 15, 15, 9, 15, 3, 4, 7, 18, 13, 11, 24, 24, 9, 23, 13, 22, 15, 12, 7, 6, 24, 19, 20, 12, 9, 2, 14, 18, 4, 10, 24, 3, 18, 22, 12, 20, 8, 11, 13, 21, 23, 13, 21, 9, 23, 13, 19, 24, 18, 23, 12, 1, 8, 23, 4, 5, 24, 2, 17, 23, 2, 24, 15, 13, 10, 13, 6, 15, 12, 21, 1, 9, 6, 9, 24, 11, 23, 8, 19, 19, 10, 9, 7, 8, 12, 18, 7, 17, 3, 10, 22, 3, 3, 1, 13, 11, 12, 18, 15, 19, 22, 17, 8, 24, 24, 4, 5, 22, 14, 13, 5, 6, 19, 12, 22, 23, 8, 11, 11, 15, 24, 13, 15, 10, 12, 12, 22, 12, 24, 2, 7, 12, 23, 10, 18, 12, 12, 5, 24, 12, 8, 15, 17, 24, 13, 17, 18, 9, 24, 9, 23, 14, 18, 8, 4, 10, 7, 21, 19, 17, 23, 15, 22, 22, 18, 23, 18, 18, 22, 23, 8, 23, 7, 6, 22, 4, 12, 19, 24, 18, 8, 15, 1, 23, 11, 11, 17, 12, 15, 15, 19, 8, 0, 18, 11, 25, 22, 18, 7, 19, 2, 4, 24, 13, 9, 19, 19, 10, 11, 19, 13, 2, 12, 24, 19, 17, 12, 15, 17, 25, 23, 18, 12, 12, 15, 21, 14, 15, 22, 6, 24, 15, 19, 18, 15, 24, 23, 23, 6, 7, 7, 2, 8, 9, 5, 12, 8, 7, 2, 10, 8, 14, 13, 15, 18, 3, 10, 23, 6, 8, 24, 22, 15, 2, 2, 2, 2, 10, 13, 3, 9, 13, 22, 5, 23, 8, 18, 12, 8, 13, 22, 3, 11, 13, 22, 19, 13, 2, 2, 19, 25, 18, 25, 17, 15, 9, 17, 18, 13, 24, 8, 23, 13, 13, 15, 8, 12, 19, 13, 13, 4, 22, 10, 21, 13, 24, 8, 8, 9, 11, 22, 1, 6, 15, 15, 13, 13, 22, 25, 19, 23, 12, 8, 13, 12, 24, 4, 15, 19, 10, 7, 24, 4, 17, 12, 17, 24, 13, 24, 13, 18, 22, 12, 2, 15, 12, 19, 11, 1, 23, 13, 24, 15, 13, 13, 23, 22, 13, 15, 8, 15, 12, 13, 13, 22, 11, 4, 19, 12, 18, 6, 3, 23, 4, 8, 12, 13, 7, 23, 12, 17, 22, 22, 15, 24, 11, 22, 8, 18, 22, 13, 15, 9, 7, 24, 23, 10, 5, 23, 1, 23, 9, 10, 15, 10, 25, 25, 13, 15, 14, 12, 18, 13, 11, 8, 18, 11, 12, 15, 22, 10, 25, 2, 6, 17, 14, 15, 14, 11, 12, 13, 11, 8, 24, 15, 23, 17, 10, 23, 23, 10, 17, 4, 7, 13, 3, 14, 12, 22, 19, 
23, 25, 15, 21, 22, 12, 10, 1, 5, 21, 13, 15, 22, 22, 8, 18, 12, 1, 13, 23, 2, 18, 12, 4, 7, 13, 13, 24, 14, 12, 13, 13, 15, 4, 24, 13, 25, 12, 15, 24, 4, 4, 7, 18, 2, 12, 22, 15, 11, 23, 15, 15, 13, 23, 17, 1, 23, 13, 21, 19, 8, 13, 11, 15, 18, 24, 17, 22, 24, 10, 15, 18, 10, 5, 3, 17, 15, 17, 17, 23, 12, 24, 14, 12, 10, 23, 15, 12, 13, 1, 8, 17, 13, 2, 5, 19, 25, 12, 15, 13, 13, 24, 12, 17, 8, 15, 22, 2, 18, 12, 17, 18, 17, 18, 8, 12, 22, 8, 15, 19, 20, 12, 5, 17, 22, 12, 24, 7, 8, 13, 12, 7, 11, 8, 19, 15, 23, 12, 18, 19, 5, 19, 24, 19, 18, 2, 4, 7, 17, 19, 15, 8, 13, 15, 12, 23, 13, 24, 3, 11, 17, 15, 22, 15, 15, 22, 20, 24, 13, 5, 1, 19, 14, 5, 15, 18, 24, 24, 11, 22, 15, 3, 4, 9, 13, 3, 3, 23, 19, 19, 22, 17, 18, 18, 18, 7, 13, 24, 13, 5, 17, 24, 22, 24, 15, 24, 23, 5, 12, 22, 22, 19, 15, 12, 23, 24, 19, 19, 13, 15, 18, 15, 12, 3, 18, 15, 19, 3, 17, 24, 9, 8, 22, 8, 17, 8, 15, 25, 11, 19, 18, 15, 23, 14, 19, 18, 12, 19, 2, 19, 9, 14, 22, 24, 12, 14, 3, 4, 21, 19, 17, 21, 3, 9, 23, 23, 24, 15, 13, 11, 10, 12, 9, 18, 22, 24, 16, 7, 4, 24, 3, 12, 24, 18, 12, 13, 19, 18, 8, 2, 8, 9, 6, 17, 19, 2, 12, 7, 23, 17, 20, 13, 12, 24, 5, 18, 9, 13, 24, 9, 13, 18, 23, 24, 18, 22, 13, 6, 12, 9, 15, 5, 9, 13, 19, 19, 23, 3, 10, 19, 15, 3, 15, 25, 5, 12, 3, 10, 10, 13, 23, 1, 13, 22, 17, 17, 15, 8, 20, 22, 3, 5, 24, 11, 18, 17, 5, 13, 15, 24, 24, 23, 10, 23, 13, 13, 22, 22, 18, 7, 3, 10, 18, 9, 22, 2, 8, 24, 8, 3, 13, 13, 24, 12, 12, 23, 17, 23, 10, 8, 18, 22, 18, 11, 15, 15, 13, 17, 12, 25, 22, 7, 23, 24, 23, 19, 13, 23, 18, 13, 13, 13, 19, 24, 11, 12]
# Replace zeros with 1 to avoid division by zero
for i in range(0,len(period_month)):
    if period_month[i]==0:
        period_month[i]=1
# Second printout of the month counts
print(period_month)
[2, 11, 4, 22, 13, 3, 9, 15, 8, 7, 17, 12, 24, 23, 24, 17, 22, 17, 17, 5, 24, 13, 18, 18, 11, 24, 13, 13, 9, 22, 8, 22, 22, 11, 11, 12, 15, 13, 6, 20, 17, 13, 13, 22, 8, 15, 4, 24, 11, 10, 24, 13, 12, 18, 13, 15, 13, 13, 13, 9, 12, 23, 11, 12, 24, 24, 23, 24, 10, 17, 11, 24, 24, 6, 22, 24, 19, 8, 12, 18, 12, 2, 19, 25, 6, 6, 10, 17, 12, 10, 5, 25, 15, 12, 9, 18, 8, 7, 18, 23, 18, 8, 22, 9, 3, 17, 3, 9, 7, 5, 3, 10, 9, 20, 12, 11, 24, 23, 18, 17, 23, 1, 15, 8, 9, 4, 24, 22, 13, 20, 22, 11, 15, 10, 15, 22, 11, 5, 12, 12, 19, 1, 13, 6, 9, 9, 15, 19, 19, 19, 9, 10, 17, 15, 17, 5, 24, 10, 9, 3, 23, 22, 13, 15, 15, 12, 24, 11, 9, 15, 22, 11, 8, 22, 12, 12, 22, 6, 22, 11, 18, 8, 22, 2, 4, 13, 23, 23, 23, 23, 15, 9, 23, 24, 23, 24, 9, 13, 7, 23, 12, 8, 10, 12, 23, 22, 10, 10, 23, 9, 19, 3, 15, 13, 12, 13, 13, 15, 10, 17, 9, 15, 13, 13, 15, 17, 12, 13, 13, 19, 11, 17, 3, 3, 18, 12, 13, 15, 15, 19, 9, 15, 10, 8, 13, 12, 22, 17, 17, 15, 5, 12, 15, 23, 18, 13, 17, 24, 11, 22, 13, 5, 14, 5, 5, 13, 15, 11, 7, 11, 24, 9, 7, 13, 13, 17, 15, 6, 14, 18, 23, 11, 24, 19, 8, 25, 11, 17, 13, 7, 23, 13, 22, 15, 24, 3, 22, 7, 17, 7, 19, 24, 12, 12, 22, 1, 12, 17, 12, 24, 10, 17, 6, 19, 15, 12, 18, 15, 12, 19, 19, 19, 23, 24, 17, 3, 13, 11, 12, 12, 6, 13, 13, 6, 18, 18, 20, 19, 4, 1, 18, 11, 17, 13, 7, 8, 18, 19, 12, 23, 13, 23, 10, 23, 24, 11, 10, 15, 19, 19, 11, 17, 2, 21, 13, 22, 15, 3, 13, 24, 20, 15, 17, 11, 1, 7, 4, 12, 12, 12, 17, 12, 18, 3, 22, 23, 8, 23, 18, 10, 17, 13, 12, 23, 13, 7, 24, 23, 21, 18, 10, 24, 7, 18, 23, 5, 22, 8, 11, 13, 7, 8, 9, 7, 13, 18, 15, 9, 8, 5, 3, 7, 15, 15, 5, 20, 22, 25, 9, 19, 15, 24, 24, 14, 11, 13, 4, 13, 19, 2, 7, 13, 24, 8, 12, 11, 12, 11, 11, 12, 10, 3, 17, 3, 11, 7, 17, 6, 12, 11, 8, 12, 11, 15, 4, 17, 22, 3, 11, 13, 19, 3, 18, 12, 20, 13, 2, 10, 12, 12, 13, 2, 21, 24, 12, 24, 23, 15, 17, 13, 17, 15, 22, 25, 24, 2, 3, 3, 11, 11, 9, 18, 13, 22, 7, 17, 11, 12, 24, 5, 19, 8, 9, 10, 23, 12, 19, 24, 12, 24, 17, 24, 17, 24, 22, 19, 13, 19, 22, 21, 22, 
7, 12, 17, 1, 23, 24, 11, 22, 3, 13, 12, 21, 19, 8, 21, 18, 6, 6, 24, 23, 23, 19, 23, 10, 13, 7, 22, 5, 12, 19, 23, 24, 19, 17, 23, 4, 12, 12, 3, 9, 13, 13, 1, 15, 11, 11, 22, 8, 12, 18, 3, 15, 13, 13, 18, 9, 17, 2, 17, 18, 13, 23, 4, 13, 15, 6, 22, 4, 13, 3, 24, 5, 12, 7, 24, 14, 15, 15, 9, 15, 3, 4, 7, 18, 13, 11, 24, 24, 9, 23, 13, 22, 15, 12, 7, 6, 24, 19, 20, 12, 9, 2, 14, 18, 4, 10, 24, 3, 18, 22, 12, 20, 8, 11, 13, 21, 23, 13, 21, 9, 23, 13, 19, 24, 18, 23, 12, 1, 8, 23, 4, 5, 24, 2, 17, 23, 2, 24, 15, 13, 10, 13, 6, 15, 12, 21, 1, 9, 6, 9, 24, 11, 23, 8, 19, 19, 10, 9, 7, 8, 12, 18, 7, 17, 3, 10, 22, 3, 3, 1, 13, 11, 12, 18, 15, 19, 22, 17, 8, 24, 24, 4, 5, 22, 14, 13, 5, 6, 19, 12, 22, 23, 8, 11, 11, 15, 24, 13, 15, 10, 12, 12, 22, 12, 24, 2, 7, 12, 23, 10, 18, 12, 12, 5, 24, 12, 8, 15, 17, 24, 13, 17, 18, 9, 24, 9, 23, 14, 18, 8, 4, 10, 7, 21, 19, 17, 23, 15, 22, 22, 18, 23, 18, 18, 22, 23, 8, 23, 7, 6, 22, 4, 12, 19, 24, 18, 8, 15, 1, 23, 11, 11, 17, 12, 15, 15, 19, 8, 1, 18, 11, 25, 22, 18, 7, 19, 2, 4, 24, 13, 9, 19, 19, 10, 11, 19, 13, 2, 12, 24, 19, 17, 12, 15, 17, 25, 23, 18, 12, 12, 15, 21, 14, 15, 22, 6, 24, 15, 19, 18, 15, 24, 23, 23, 6, 7, 7, 2, 8, 9, 5, 12, 8, 7, 2, 10, 8, 14, 13, 15, 18, 3, 10, 23, 6, 8, 24, 22, 15, 2, 2, 2, 2, 10, 13, 3, 9, 13, 22, 5, 23, 8, 18, 12, 8, 13, 22, 3, 11, 13, 22, 19, 13, 2, 2, 19, 25, 18, 25, 17, 15, 9, 17, 18, 13, 24, 8, 23, 13, 13, 15, 8, 12, 19, 13, 13, 4, 22, 10, 21, 13, 24, 8, 8, 9, 11, 22, 1, 6, 15, 15, 13, 13, 22, 25, 19, 23, 12, 8, 13, 12, 24, 4, 15, 19, 10, 7, 24, 4, 17, 12, 17, 24, 13, 24, 13, 18, 22, 12, 2, 15, 12, 19, 11, 1, 23, 13, 24, 15, 13, 13, 23, 22, 13, 15, 8, 15, 12, 13, 13, 22, 11, 4, 19, 12, 18, 6, 3, 23, 4, 8, 12, 13, 7, 23, 12, 17, 22, 22, 15, 24, 11, 22, 8, 18, 22, 13, 15, 9, 7, 24, 23, 10, 5, 23, 1, 23, 9, 10, 15, 10, 25, 25, 13, 15, 14, 12, 18, 13, 11, 8, 18, 11, 12, 15, 22, 10, 25, 2, 6, 17, 14, 15, 14, 11, 12, 13, 11, 8, 24, 15, 23, 17, 10, 23, 23, 10, 17, 4, 7, 13, 3, 14, 12, 22, 19, 
23, 25, 15, 21, 22, 12, 10, 1, 5, 21, 13, 15, 22, 22, 8, 18, 12, 1, 13, 23, 2, 18, 12, 4, 7, 13, 13, 24, 14, 12, 13, 13, 15, 4, 24, 13, 25, 12, 15, 24, 4, 4, 7, 18, 2, 12, 22, 15, 11, 23, 15, 15, 13, 23, 17, 1, 23, 13, 21, 19, 8, 13, 11, 15, 18, 24, 17, 22, 24, 10, 15, 18, 10, 5, 3, 17, 15, 17, 17, 23, 12, 24, 14, 12, 10, 23, 15, 12, 13, 1, 8, 17, 13, 2, 5, 19, 25, 12, 15, 13, 13, 24, 12, 17, 8, 15, 22, 2, 18, 12, 17, 18, 17, 18, 8, 12, 22, 8, 15, 19, 20, 12, 5, 17, 22, 12, 24, 7, 8, 13, 12, 7, 11, 8, 19, 15, 23, 12, 18, 19, 5, 19, 24, 19, 18, 2, 4, 7, 17, 19, 15, 8, 13, 15, 12, 23, 13, 24, 3, 11, 17, 15, 22, 15, 15, 22, 20, 24, 13, 5, 1, 19, 14, 5, 15, 18, 24, 24, 11, 22, 15, 3, 4, 9, 13, 3, 3, 23, 19, 19, 22, 17, 18, 18, 18, 7, 13, 24, 13, 5, 17, 24, 22, 24, 15, 24, 23, 5, 12, 22, 22, 19, 15, 12, 23, 24, 19, 19, 13, 15, 18, 15, 12, 3, 18, 15, 19, 3, 17, 24, 9, 8, 22, 8, 17, 8, 15, 25, 11, 19, 18, 15, 23, 14, 19, 18, 12, 19, 2, 19, 9, 14, 22, 24, 12, 14, 3, 4, 21, 19, 17, 21, 3, 9, 23, 23, 24, 15, 13, 11, 10, 12, 9, 18, 22, 24, 16, 7, 4, 24, 3, 12, 24, 18, 12, 13, 19, 18, 8, 2, 8, 9, 6, 17, 19, 2, 12, 7, 23, 17, 20, 13, 12, 24, 5, 18, 9, 13, 24, 9, 13, 18, 23, 24, 18, 22, 13, 6, 12, 9, 15, 5, 9, 13, 19, 19, 23, 3, 10, 19, 15, 3, 15, 25, 5, 12, 3, 10, 10, 13, 23, 1, 13, 22, 17, 17, 15, 8, 20, 22, 3, 5, 24, 11, 18, 17, 5, 13, 15, 24, 24, 23, 10, 23, 13, 13, 22, 22, 18, 7, 3, 10, 18, 9, 22, 2, 8, 24, 8, 3, 13, 13, 24, 12, 12, 23, 17, 23, 10, 8, 18, 22, 18, 11, 15, 15, 13, 17, 12, 25, 22, 7, 23, 24, 23, 19, 13, 23, 18, 13, 13, 13, 19, 24, 11, 12]
# Compute F
data['F(月平消费次数)']=data['付款次数']/period_month
data
总金额 | 订单付款时间 | 付款次数 | R(最后一次消费时间) | F(月平消费次数) | |
---|---|---|---|---|---|
买家会员名 | |||||
00牛哥哥00 | 402.00 | 2017-02-06 | 2 | 693 days | 1.000000 |
020luo | 74.70 | 2017-11-18 | 1 | 408 days | 0.090909 |
0587xueguangju | 268.00 | 2017-04-14 | 1 | 626 days | 0.250000 |
0o秋天de童话 | 411.50 | 2018-10-09 | 2 | 83 days | 0.090909 |
0残缺0 | 48.86 | 2018-01-19 | 1 | 346 days | 0.076923 |
... | ... | ... | ... | ... | ... |
黑河市2013 | 47.88 | 2018-01-11 | 1 | 354 days | 0.076923 |
黑瑾瞳 | 158.44 | 2018-07-26 | 2 | 158 days | 0.105263 |
鼠标右键点 | 51.87 | 2018-12-12 | 1 | 19 days | 0.041667 |
龙星宇1018 | 198.00 | 2017-11-17 | 1 | 409 days | 0.090909 |
龙魂爱上凤灵 | 43.86 | 2017-12-13 | 1 | 383 days | 0.083333 |
1483 rows × 5 columns
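The month count and F can also be computed without explicit loops. The sketch below is one possible vectorized version, not the report's code; it uses three rows copied from the table above and the same rule (30-day months, rounded up, with a minimum of 1).

```python
from datetime import datetime

import numpy as np
import pandas as pd

start_date = datetime(2017, 1, 2)  # start of the observation window in the report

# Three rows copied from the table above.
customers = pd.DataFrame(
    {
        "订单付款时间": pd.to_datetime(["2017-02-06", "2017-11-18", "2018-10-09"]),
        "付款次数": [2, 1, 2],
    },
    index=["00牛哥哥00", "020luo", "0o秋天de童话"],
)

# Days from the start date to the last payment, as integers.
days = (customers["订单付款时间"] - start_date).dt.days
# 30-day months, rounded up; clip replaces the zero-month case with 1.
months = np.ceil(days / 30).clip(lower=1).astype(int)
customers["F"] = customers["付款次数"] / months
print(customers)
```

Note that the period ends at each buyer's last payment, so a buyer whose last purchase was early in the window gets a short period and therefore a relatively high F.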
(8) Compute M (monetary value) and standardize the data
data['m(月平均消费金额)']=data['总金额']/period_month
data
总金额 | 订单付款时间 | 付款次数 | R(最后一次消费时间) | F(月平消费次数) | m(月平均消费金额) | |
---|---|---|---|---|---|---|
买家会员名 | ||||||
00牛哥哥00 | 402.00 | 2017-02-06 | 2 | 693 days | 1.000000 | 201.000000 |
020luo | 74.70 | 2017-11-18 | 1 | 408 days | 0.090909 | 6.790909 |
0587xueguangju | 268.00 | 2017-04-14 | 1 | 626 days | 0.250000 | 67.000000 |
0o秋天de童话 | 411.50 | 2018-10-09 | 2 | 83 days | 0.090909 | 18.704545 |
0残缺0 | 48.86 | 2018-01-19 | 1 | 346 days | 0.076923 | 3.758462 |
... | ... | ... | ... | ... | ... | ... |
黑河市2013 | 47.88 | 2018-01-11 | 1 | 354 days | 0.076923 | 3.683077 |
黑瑾瞳 | 158.44 | 2018-07-26 | 2 | 158 days | 0.105263 | 8.338947 |
鼠标右键点 | 51.87 | 2018-12-12 | 1 | 19 days | 0.041667 | 2.161250 |
龙星宇1018 | 198.00 | 2017-11-17 | 1 | 409 days | 0.090909 | 18.000000 |
龙魂爱上凤灵 | 43.86 | 2017-12-13 | 1 | 383 days | 0.083333 | 3.655000 |
1483 rows × 6 columns
# Standardization: select the three features
cdata=data[['R(最后一次消费时间)','F(月平消费次数)','m(月平均消费金额)']]
# Keep the buyer names as the index
cdata.index = data.index
cdata
R(最后一次消费时间) | F(月平消费次数) | m(月平均消费金额) | |
---|---|---|---|
买家会员名 | |||
00牛哥哥00 | 693 days | 1.000000 | 201.000000 |
020luo | 408 days | 0.090909 | 6.790909 |
0587xueguangju | 626 days | 0.250000 | 67.000000 |
0o秋天de童话 | 83 days | 0.090909 | 18.704545 |
0残缺0 | 346 days | 0.076923 | 3.758462 |
... | ... | ... | ... |
黑河市2013 | 354 days | 0.076923 | 3.683077 |
黑瑾瞳 | 158 days | 0.105263 | 8.338947 |
鼠标右键点 | 19 days | 0.041667 | 2.161250 |
龙星宇1018 | 409 days | 0.090909 | 18.000000 |
龙魂爱上凤灵 | 383 days | 0.083333 | 3.655000 |
1483 rows × 3 columns
z_cdata=(cdata-cdata.mean())/cdata.std()
# Rename the columns
z_cdata.columns=['R(标准化)','F(标准化)','m(标准化)']
z_cdata
R(标准化) | F(标准化) | m(标准化) | |
---|---|---|---|
买家会员名 | |||
00牛哥哥00 | 1.926851 | 4.432309 | 2.781167 |
020luo | 0.469973 | -0.211456 | -0.304766 |
0587xueguangju | 1.584357 | 0.601203 | 0.651941 |
0o秋天de童话 | -1.191378 | -0.211456 | -0.115461 |
0残缺0 | 0.153038 | -0.282899 | -0.352951 |
... | ... | ... | ... |
黑河市2013 | 0.193933 | -0.282899 | -0.354149 |
黑瑾瞳 | -0.807990 | -0.138134 | -0.280168 |
鼠标右键点 | -1.518537 | -0.462993 | -0.378330 |
龙星宇1018 | 0.475085 | -0.211456 | -0.126656 |
龙魂爱上凤灵 | 0.342177 | -0.250154 | -0.354595 |
1483 rows × 3 columns
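The standardization above is the z-score $z = (x - \mu) / \sigma$ applied to each column. One detail worth knowing: pandas `std()` uses the sample standard deviation (`ddof=1`), while Sklearn's `StandardScaler` uses the population standard deviation (`ddof=0`), so the two give slightly different values. The sketch below illustrates this on the first five rows of the feature table, with R already converted to days (an assumption of this sketch); the resulting z-scores differ from the report's because only five rows are used.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# First five rows of the feature table, with R expressed in days.
features = pd.DataFrame({
    "R": [693.0, 408.0, 626.0, 83.0, 346.0],
    "F": [1.0, 0.090909, 0.25, 0.090909, 0.076923],
    "M": [201.0, 6.790909, 67.0, 18.704545, 3.758462],
})

# The report's formula: pandas std() defaults to ddof=1.
z_pandas = (features - features.mean()) / features.std()

# StandardScaler divides by the population standard deviation (ddof=0).
z_sklearn = StandardScaler().fit_transform(features)
z_ddof0 = (features - features.mean()) / features.std(ddof=0)

print(np.allclose(z_sklearn, z_ddof0))   # same result once ddof matches
print(np.allclose(z_sklearn, z_pandas))  # differs by a constant factor
```

With 1483 rows the two conventions differ only by a factor very close to 1, which does not change the K-Means assignments in any meaningful way.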
(9) Save the preprocessed file
data.to_csv('/data/bigfiles/client.csv')
2. Data analysis
(1) Read the preprocessed file
# index_col=0 restores the buyer name as the index
data=pd.read_csv('/data/bigfiles/client.csv',index_col=0)
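(2) Determine k with the elbow method (plot)
# SSE records, for each k, the sum of squared Euclidean distances from the samples to their cluster centers
SSE=[]
# Cluster into 1 to 8 groups
for k in range(1,9):
    estimator =KMeans(n_clusters=k)
    estimator.fit(z_cdata)
    # Sum of squared distances from the samples to their nearest cluster center
    SSE.append(estimator.inertia_)
# x-axis data
X=range(1,9)
# Use a font that can render the Chinese labels
plt.rcParams['font.sans-serif']=['SimHei']
# Draw the plot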
plt.plot(X,SSE,'o-')
plt.xlabel('k')
plt.ylabel('SSE')
plt.title("肘部图")
plt.show()
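The elbow plot is read by eye. As a complementary, numeric evaluation, the silhouette coefficient from `sklearn.metrics` can be computed for each k (it needs at least two clusters). This is a sketch on synthetic data generated with `make_blobs`, not on the report's data; in the report, `X` would be replaced by `z_cdata`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the standardized RFM features.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Mean silhouette coefficient over all samples, in [-1, 1]; higher is better.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print("k with the highest silhouette score:", best_k)
```

The elbow method and the silhouette score do not always agree, so it is reasonable to treat both as guidance and to check that the resulting clusters can be interpreted in business terms.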
(3) Build the KMeans model
# Cluster analysis
kmodel=KMeans(n_clusters=4,n_init=4,max_iter=100,random_state = 0)
kmodel.fit(z_cdata)
KMeans(max_iter=100, n_clusters=4, n_init=4, random_state=0)
(4) Output the centroid of each cluster
# Cluster label assigned to each record
kmodel.labels_
# Coordinates of the cluster centers
kmodel.cluster_centers_
array([[ 1.57670505, 1.17239812, 0.98112868],[-1.03013307, -0.37085365, -0.28728299],[ 0.43389504, -0.13530733, -0.18934963],[ 1.70098269, 4.71659247, 5.44718135]])
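The centers above are in standardized units, which are hard to read directly. Since $z = (x - \mu) / \sigma$, a center can be mapped back to the original units with $x = z \cdot \sigma + \mu$. The sketch below shows this on made-up RFM values with two obvious groups; the data and the choice of two clusters are assumptions of the example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up RFM values with two clearly separated groups.
raw = pd.DataFrame({
    "R": [10.0, 12.0, 11.0, 300.0, 310.0, 305.0],
    "F": [2.0, 1.8, 2.2, 0.1, 0.2, 0.15],
    "M": [500.0, 450.0, 520.0, 20.0, 30.0, 25.0],
})

# Standardize with the same formula as the report.
mean, std = raw.mean(), raw.std()
z = (raw - mean) / std

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(z)

# Cluster centers in standardized units, then mapped back to the original units.
centers_z = pd.DataFrame(km.cluster_centers_, columns=raw.columns)
centers_raw = centers_z * std + mean
print(centers_raw)
```

In the original units each center is simply the mean R, F, and M of the buyers in that cluster, which makes the clusters easier to describe.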
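(5) Save the customer-type file
# Count the number of records in each cluster
r1=pd.Series(kmodel.labels_).value_counts()
r2=pd.DataFrame(kmodel.cluster_centers_)
# Join the cluster centers (r2) with the cluster sizes (r1)
result=pd.concat([r2,r1],axis=1)
# Rename the columns; the last column ('类别') holds the number of buyers in each cluster
result.columns=['R','F','M']+['类别']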
result
R | F | M | 类别 | |
---|---|---|---|---|
0 | 1.576705 | 1.172398 | 0.981129 | 157 |
1 | -1.030133 | -0.370854 | -0.287283 | 587 |
2 | 0.433895 | -0.135307 | -0.189350 | 712 |
3 | 1.700983 | 4.716592 | 5.447181 | 27 |
# Attach labels_ to z_cdata and to data
KM_data=pd.concat([z_cdata,pd.Series(kmodel.labels_,index=z_cdata.index)],axis=1)
data1=pd.concat([data,pd.Series(kmodel.labels_,index=data.index)],axis=1)
# Rename the columns
data1.columns=list(data.columns)+['类别']
KM_data.columns=['R','F','M']+['类别']
KM_data.head()
# Add the buyer name as a column next to the cluster label
KM_data['买家会员名']=KM_data.index
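3. Data visualization (plot the standardized R, F, and M values for each customer type)
# Mean of each standardized feature per cluster (numeric columns only)
kmeans_analysis =KM_data.groupby('类别')[['R','F','M']].mean()
# Rename the columns
kmeans_analysis.columns=['R','F','M']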
kmeans_analysis
R | F | M | |
---|---|---|---|
类别 | |||
0 | 1.580417 | 1.183741 | 0.988757 |
1 | -1.030133 | -0.370854 | -0.287283 |
2 | 0.436287 | -0.134135 | -0.187744 |
3 | 1.700983 | 4.716592 | 5.447181 |
# Draw the bar chart
kmeans_analysis.plot(kind ='bar',rot=0,yticks=range(-1,9))
# Add the title, tick labels, grid, and axis label
plt.title("聚类结果统计柱状图")
plt.xticks(range(0,4),['第0类','第1类','第2类','第3类'])
plt.grid(axis='y',color='grey',linestyle='--',alpha=0.5)
plt.ylabel("R,F,M 3个指标均值")
plt.savefig("聚类结果统计柱状图",dpi=128)
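4. Analysis and evaluation
Summary:
In this experiment we used numpy and pandas to process order data and applied the RFM analysis model to extract customer features.
We also standardized the features with z-scores, clustered them with the K-Means implementation in Sklearn, and used the elbow method to choose the number of clusters.
Finally, we visualized the results with matplotlib and pandas.
During preprocessing we kept only the successful orders, removed the duplicated and unpaid rows, computed the R, F, and M features for 1483 buyers, and saved the processed file.
We then standardized the three features and built a K-Means model with k = 4, which produced clusters of 157, 587, 712, and 27 buyers.
Cluster 3 (27 buyers) has by far the highest mean F and M, while cluster 1 (587 buyers) has the lowest mean R, that is, the most recent purchases. Lastly, we plotted the mean standardized R, F, and M of each cluster as a bar chart.
Through this experiment we gained a better understanding of customer value analysis and practised the related data processing and analysis methods, which gives us a solid basis for future data analysis work.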