1.更改DBS表中的location字段
2.更改SDS表中的location字段
非分区表:
UPDATE DBS
JOIN TBLS
JOIN SDS ON DBS.DB_ID = TBLS.DB_ID
AND TBLS.SD_ID = SDS.SD_ID
SET SDS.LOCATION = REPLACE ( `LOCATION`, '原始路径前缀', '目标路径前缀‘ )
WHERE
DBS.`NAME` = '库名'
AND TBLS.TBL_TYPE = 'MANAGED_TABLE'
AND SDS.LOCATION LIKE ’原始路径前缀%'
分区表:
UPDATE DBS
JOIN TBLS
JOIN SDS ON DBS.DB_ID = TBLS.DB_ID
AND TBLS.SD_ID = SDS.SD_ID
SET SDS.LOCATION = REPLACE ( `LOCATION`, '原始路径前缀', '目标路径前缀‘ )
WHERE
DBS.`NAME` = '库名'
AND TBLS.TBL_TYPE = 'MANAGED_TABLE'
AND SDS.LOCATION LIKE ’原始路径前缀%'
3.同步数据文件
aws s3 sync 原始路径 目标路径
示例:
aws s3 rm s3://bigdata-database-meta-cn-northwest-1-prod/globalbi.db --recursive
aws s3 sync s3://bigdata-analysis-bikpi-cn-northwest-1/prod/global/globalbi s3://bigdata-database-meta-cn-northwest-1-prod/globalbi.db
非分区表:
UPDATE DBS
JOIN TBLS
JOIN `PARTITIONS`
JOIN SDS ON DBS.DB_ID = TBLS.DB_ID
AND TBLS.TBL_ID = `PARTITIONS`.TBL_ID
AND `PARTITIONS`.SD_ID = SDS.SD_ID
SET SDS.LOCATION = REPLACE ( `LOCATION`, 's3://bigdata-analysis-bikpi-cn-northwest-1/prod/global/globalbi', 's3://bigdata-database-meta-cn-northwest-1-prod/globalbi.db' )
WHERE-- DBS.`NAME` = 'debi'
-- AND TBLS.TBL_TYPE = 'MANAGED_TABLE'
-- AND
SDS.LOCATION LIKE 's3://bigdata-analysis-bikpi-cn-northwest-1/prod/global/globalbi%'
SELECT
DBS.`NAME`,
TBLS.`TBL_NAME`,
TBLS.TBL_TYPE,
SDS.LOCATION
FROM
DBS
JOIN TBLS
JOIN SDS ON DBS.DB_ID = TBLS.DB_ID
AND TBLS.SD_ID = SDS.SD_ID
WHERE
DBS.`NAME` = 'globalbi' -- AND TBLS.TBL_TYPE = 'MANAGED_TABLE'
-- AND
-- SDS.LOCATION LIKE 's3://bigdata-analysis-bikpi-cn-northwest-1/prod/global/globalbi%'
ORDER BY
location
分区表:
UPDATE DBS
JOIN TBLS
JOIN `PARTITIONS`
JOIN SDS ON DBS.DB_ID = TBLS.DB_ID
AND TBLS.TBL_ID = `PARTITIONS`.TBL_ID
AND `PARTITIONS`.SD_ID = SDS.SD_ID
SET SDS.LOCATION = REPLACE ( `LOCATION`, 's3://bigdata-analysis-bikpi-cn-northwest-1/prod/global/globalbi', 's3://bigdata-database-meta-cn-northwest-1-prod/globalbi.db' )
WHERE
-- DBS.`NAME` = 'debi'
-- AND TBLS.TBL_TYPE = 'MANAGED_TABLE'
-- AND
SDS.LOCATION LIKE 's3://bigdata-analysis-bikpi-cn-northwest-1/prod/global/globalbi%'
SELECT
DBS.`NAME`,
TBLS.`TBL_NAME`,
TBLS.TBL_TYPE,
`PARTITIONS`.TBL_ID,
`PARTITIONS`.SD_ID,
SDS.LOCATION
FROM
DBS
JOIN TBLS
JOIN `PARTITIONS`
JOIN SDS ON DBS.DB_ID = TBLS.DB_ID
AND TBLS.TBL_ID = `PARTITIONS`.TBL_ID
AND `PARTITIONS`.SD_ID = SDS.SD_ID
WHERE
DBS.`NAME` = 'globalbi' -- AND TBLS.TBL_TYPE = 'MANAGED_TABLE'
AND SDS.LOCATION LIKE 's3://bigdata-analysis-bikpi-cn-northwest-1/prod/global/globalbi%'
ORDER BY
location
本文档详细介绍了如何使用SQL更新Hive的location字段,以实现数据存储路径的更改,同时提供了非分区表和分区表的操作示例。此外,还展示了如何使用awss3sync命令同步S3上的数据文件,确保数据迁移的完整性和一致性。适用于大数据环境中的数据治理和迁移场景。
1852

被折叠的 条评论
为什么被折叠?



