一、乱象
1.1 中文注释乱码
hive> DESCRIBE test;
OK
# col_name data_type comment
id string ??ID ??
pcs string ?????
mzmc string ????
gzdb_addtime string ???????
swdd string ????
swyydm string ??????
dz string ????
xm string ??
gjmc string ????
zt string ??
二、平乱
2.1 建库指定 utf8 编码
CREATE DATABASE my_database
COMMENT 'My database'
LOCATION 'hdfs://localhost:9000/user/hive/warehouse/my_database.db'
WITH DBPROPERTIES ('charset'='utf8');
2.2 建表指定 utf8 编码
CREATE TABLE `test_table`(`xm` string COMMENT '姓名', `xb` string COMMENT '性别', `nl` string COMMENT '年龄', `czdz` string COMMENT '常住地址', `hjdz` string COMMENT '户籍地址', `csrq` string COMMENT '出生日期',
COMMENT '测试'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('charset'='utf8');
实际上以上方式都不能解决编码问题,根本原因并不出在 hive 上,而是存储 hive 元数据的 mysql 数据库上面
2.3 mysql 编码查看与修改
2.3.1 修改hive元数据库编码
(1)查看hive元数据库编码(显示:utf8mb3)
mysql> show create database hive;
+----------+-----------------------------------------------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------------------------------------------+
| hive | CREATE DATABASE `hive` /*!40100 DEFAULT CHARACTER SET utf8mb3 */ /*!80016 DEFAULT ENCRYPTION='N' */ |
+----------+-----------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
(2) 修改编码为 latin1
mysql> alter database hive character set latin1;
Query OK, 1 row affected (0.01 sec)
2.3.2 修改表编码
(1)查看hive库中有哪些表
mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -ADatabase changed
mysql> show tables;
+-------------------------------+
| Tables_in_hive |
+-------------------------------+
| AUX_TABLE |
| BUCKETING_COLS |
| CDH_VERSION |
| CDS |
| COLUMNS_V2 |
| COMPACTION_QUEUE |
| COMPLETED_COMPACTIONS |
| COMPLETED_TXN_COMPONENTS |
| CTLGS |
| CTLGS_bak20230606 |
| DATABASE_PARAMS |
| DBS |
| DB_PRIVS |
| DELEGATION_TOKENS |
| FUNCS |
| FUNC_RU |
| GLOBAL_PRIVS |
| HIVE_LOCKS |
| IDXS |
| INDEX_PARAMS |
| I_SCHEMA |
| KEY_CONSTRAINTS |
| MASTER_KEYS |
| MATERIALIZATION_REBUILD_LOCKS |
| METASTORE_DB_PROPERTIES |
| MIN_HISTORY_LEVEL |
| MV_CREATION_METADATA |
| MV_TABLES_USED |
| NEXT_COMPACTION_QUEUE_ID |
| NEXT_LOCK_ID |
| NEXT_TXN_ID |
| NEXT_WRITE_ID |
| NOTIFICATION_LOG |
| NOTIFICATION_SEQUENCE |
| NUCLEUS_TABLES |
| PARTITIONS |
| PARTITION_EVENTS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| PART_COL_PRIVS |
| PART_COL_STATS |
| PART_PRIVS |
| REPL_TXN_MAP |
| ROLES |
| ROLE_MAP |
| RUNTIME_STATS |
| SCHEMA_VERSION |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SKEWED_COL_NAMES |
| SKEWED_COL_VALUE_LOC_MAP |
| SKEWED_STRING_LIST |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES |
| SORT_COLS |
| TABLE_PARAMS |
| TAB_COL_STATS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
| TXNS |
| TXN_COMPONENTS |
| TXN_TO_WRITE_ID |
| TYPES |
| TYPE_FIELDS |
| VERSION |
| WM_MAPPING |
| WM_POOL |
| WM_POOL_TO_TRIGGER |
| WM_RESOURCEPLAN |
| WM_TRIGGER |
| WRITE_SET |
+-------------------------------+
76 rows in set (0.00 sec)
(2)需要修改如下表编码
# COLUMNS_V2
# TABLE_PARAMS
# PARTITION_PARAMS
# PARTITION_KEYS
# INDEX_PARAMS
(3)查看目前表编码
#查看表 COLUMNS_V2 (其他略)
mysql> show create table COLUMNS_V2;
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| COLUMNS_V2 | CREATE TABLE `COLUMNS_V2` (`CD_ID` bigint NOT NULL,`COMMENT` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,`COLUMN_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,`TYPE_NAME` mediumtext,`INTEGER_IDX` int NOT NULL,PRIMARY KEY (`CD_ID`,`COLUMN_NAME`),KEY `COLUMNS_V2_N49` (`CD_ID`),CONSTRAINT `COLUMNS_V2_FK1` FOREIGN KEY (`CD_ID`) REFERENCES `CDS` (`CD_ID`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
(4)修改以上表编码
mysql> alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;mysql> alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;mysql> alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;mysql> alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;mysql> alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
(5)查看表编码变更信息
#可以看到编码已经被设置为 utf8mb3
mysql> show create table COLUMNS_V2;
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| COLUMNS_V2 | CREATE TABLE `COLUMNS_V2` (`CD_ID` bigint NOT NULL,`COMMENT` varchar(256) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL,`COLUMN_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,`TYPE_NAME` mediumtext,`INTEGER_IDX` int NOT NULL,PRIMARY KEY (`CD_ID`,`COLUMN_NAME`),KEY `COLUMNS_V2_N49` (`CD_ID`),CONSTRAINT `COLUMNS_V2_FK1` FOREIGN KEY (`CD_ID`) REFERENCES `CDS` (`CD_ID`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
(5)重新创建 hive 库和表
#这里建库和建表就没有像之前那样指定 utf-8 编码去建了
#可以看到已经没有中文乱码了
hive> desc test;
OK
xm string 姓名
sfzhm string 身份证号码
xbdm string 性别代码
xb string 性别
csrq string 出生日期
hjdz string 户籍地址
lxdh string 联系电话
三、应变之策
如果以上并没有解决你的乱码问题,可以尝试如下方法:
修改 hive-site.xml 配置,如下是我的默认配置
<property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://hadoop105:3306/hive?createDatabaseIfNotExist=true</value><description>JDBC connect string for a JDBC metastore</description>
</property>
加入:useUnicode=true&characterEncoding=UTF-8 修改为:
<property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://hadoop105:3306/hive?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=UTF-8</value><description>JDBC connect string for a JDBC metastore</description>
</property>
重启后再去建库建表