Importing and Exporting Data Between MySQL 5.7, Hive 2.3.4, and HBase 1.4.9 with Sqoop 1.4.7



1. What is Sqoop?

Sqoop is a tool for transferring data between Hadoop and relational databases or mainframes.

You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data with Hadoop MapReduce, and then export the data back into the RDBMS.

Sqoop automates most of this process, relying on the database to describe the schema of the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.
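In command-line terms, that round trip is a sqoop import followed later by a sqoop export. The following is only an illustrative sketch; the host, database, table, and directory names are placeholders, not the ones used later in this article:

# Pull a table from MySQL into HDFS (placeholder connection details)
sqoop import --connect jdbc:mysql://dbhost:3306/exampledb --username root -P --table example_in --target-dir /user/example/example_in

# ... transform the data with MapReduce or Hive ...

# Push the processed HDFS files back into a MySQL table
sqoop export --connect jdbc:mysql://dbhost:3306/exampledb --username root -P --table example_out --export-dir /user/example/example_out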


2. Download the Software and Configure Environment Variables

2.1 Download Sqoop 1.4.7

Download Sqoop with the following commands, extract it, and move it into the /home/work/_app/ directory:

[root@c0 _src]# pwd

/home/work/_src

[root@c0 _src]# wget http://mirrors.shu.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

[root@c0 _src]# tar -xzvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

[root@c0 _src]# mv sqoop-1.4.7.bin__hadoop-2.6.0 /home/work/_app/


2.2 Set Environment Variables

Set the Sqoop environment variables on every machine by running the following commands:

echo "" >> /etc/bashrc

echo "# Sqoop 1.4.7" >> /etc/bashrc

echo "export SQOOP_HOME=/home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0" >> /etc/bashrc

echo "" >> /etc/bashrc

echo "# Path" >> /etc/bashrc

echo "export PATH=\$PATH:\$SQOOP_HOME/bin" >> /etc/bashrc

source /etc/bashrc
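To confirm the variables took effect on a machine (a quick sanity check, not part of the original steps), you can run:

echo $SQOOP_HOME
which sqoop

The first should print /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0 and the second should resolve to the sqoop script under its bin directory.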


2.3 Prepare the Environment Required for Installation

3. Install Sqoop 1.4.7

3.1 Modify the Sqoop Configuration Files

3.1.1 Modify the configuration file sqoop-env.sh

Create the file /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/conf/sqoop-env.sh and save it with empty content. We can leave it empty because the environment variables were already configured in the section above, and the Hive and Hadoop environment variables were configured in the earlier cluster-setup article (referenced at the end of this post):

[root@c0 ~]# echo "" > /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/conf/sqoop-env.sh
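If you prefer not to rely on /etc/bashrc, the usual alternative is to put the dependency locations into sqoop-env.sh instead. Below is only a minimal sketch: the Hadoop and Hive paths match the ones that appear in the logs later in this article, while the HBase and ZooKeeper paths are assumptions about this cluster's layout and should be adjusted to your own installation:

export HADOOP_COMMON_HOME=/home/work/_app/hadoop-3.1.2
export HADOOP_MAPRED_HOME=/home/work/_app/hadoop-3.1.2
export HIVE_HOME=/home/work/_app/hive-2.3.4
export HBASE_HOME=/home/work/_app/hbase-1.4.9        # assumed path
export ZOOKEEPER_HOME=/home/work/_app/zookeeper-3.4.13   # assumed path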


Copy the files hive-hcatalog-core-2.3.4.jar, mysql-connector-java-5.1.47-bin.jar, hive-common-2.3.4.jar, and libthrift-0.9.3.jar from the /home/work/_app/hive-2.3.4/lib/ directory into the /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/lib directory:

[root@c0 ~]# cp /home/work/_app/hive-2.3.4/lib/hive-hcatalog-core-2.3.4.jar /home/work/_app/hive-2.3.4/lib/mysql-connector-java-5.1.47-bin.jar /home/work/_app/hive-2.3.4/lib/libthrift-0.9.3.jar /home/work/_app/hive-2.3.4/lib/hive-common-2.3.4.jar /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/


3.1.2 Modify the configuration file configure-sqoop

Edit the file /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/bin/configure-sqoop and save it with the content below. Compared with the stock script, the HCatalog and Accumulo sections are commented out, which suppresses the startup warnings about $HCAT_HOME and $ACCUMULO_HOME:

[root@c0 _src]# cat /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/bin/configure-sqoop
#!/bin/bash
#
# Copyright 2011 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is sourced in by bin/sqoop to set environment variables prior to
# invoking Hadoop.

bin="$1"

if [ -z "${bin}" ]; then
  bin=`dirname $0`
  bin=`cd ${bin} && pwd`
fi

if [ -z "$SQOOP_HOME" ]; then
  export SQOOP_HOME=${bin}/..
fi

SQOOP_CONF_DIR=${SQOOP_CONF_DIR:-${SQOOP_HOME}/conf}

if [ -f "${SQOOP_CONF_DIR}/sqoop-env.sh" ]; then
  . "${SQOOP_CONF_DIR}/sqoop-env.sh"
fi

# Find paths to our dependency systems. If they are unset, use CDH defaults.

if [ -z "${HADOOP_COMMON_HOME}" ]; then
  if [ -n "${HADOOP_HOME}" ]; then
    HADOOP_COMMON_HOME=${HADOOP_HOME}
  else
    if [ -d "/usr/lib/hadoop" ]; then
      HADOOP_COMMON_HOME=/usr/lib/hadoop
    else
      HADOOP_COMMON_HOME=${SQOOP_HOME}/../hadoop
    fi
  fi
fi

if [ -z "${HADOOP_MAPRED_HOME}" ]; then
  HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
  if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
    if [ -n "${HADOOP_HOME}" ]; then
      HADOOP_MAPRED_HOME=${HADOOP_HOME}
    else
      HADOOP_MAPRED_HOME=${SQOOP_HOME}/../hadoop-mapreduce
    fi
  fi
fi

# We are setting HADOOP_HOME to HADOOP_COMMON_HOME if it is not set
# so that hcat script works correctly on BigTop
if [ -z "${HADOOP_HOME}" ]; then
  if [ -n "${HADOOP_COMMON_HOME}" ]; then
    HADOOP_HOME=${HADOOP_COMMON_HOME}
    export HADOOP_HOME
  fi
fi

if [ -z "${HBASE_HOME}" ]; then
  if [ -d "/usr/lib/hbase" ]; then
    HBASE_HOME=/usr/lib/hbase
  else
    HBASE_HOME=${SQOOP_HOME}/../hbase
  fi
fi

#if [ -z "${HCAT_HOME}" ]; then
#  if [ -d "/usr/lib/hive-hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hive-hcatalog
#  elif [ -d "/usr/lib/hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hcatalog
#  else
#    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
#    if [ ! -d ${HCAT_HOME} ]; then
#      HCAT_HOME=${SQOOP_HOME}/../hcatalog
#    fi
#  fi
#fi

#if [ -z "${ACCUMULO_HOME}" ]; then
#  if [ -d "/usr/lib/accumulo" ]; then
#    ACCUMULO_HOME=/usr/lib/accumulo
#  else
#    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
#  fi
#fi

if [ -z "${ZOOKEEPER_HOME}" ]; then
  if [ -d "/usr/lib/zookeeper" ]; then
    ZOOKEEPER_HOME=/usr/lib/zookeeper
  else
    ZOOKEEPER_HOME=${SQOOP_HOME}/../zookeeper
  fi
fi

if [ -z "${HIVE_HOME}" ]; then
  if [ -d "/usr/lib/hive" ]; then
    export HIVE_HOME=/usr/lib/hive
  elif [ -d ${SQOOP_HOME}/../hive ]; then
    export HIVE_HOME=${SQOOP_HOME}/../hive
  fi
fi

# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_COMMON_HOME}" ]; then
  echo "Error: $HADOOP_COMMON_HOME does not exist!"
  echo 'Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.'
  exit 1
fi
if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
  echo "Error: $HADOOP_MAPRED_HOME does not exist!"
  echo 'Please set $HADOOP_MAPRED_HOME to the root of your Hadoop MapReduce installation.'
  exit 1
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HBASE_HOME}" ]; then
  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
fi

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi

if [ ! -d "${ZOOKEEPER_HOME}" ]; then
  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
fi

# Where to find the main Sqoop jar
SQOOP_JAR_DIR=$SQOOP_HOME

# If there's a "build" subdir, override with this, so we use
# the newly-compiled copy.
if [ -d "$SQOOP_JAR_DIR/build" ]; then
  SQOOP_JAR_DIR="${SQOOP_JAR_DIR}/build"
fi

function add_to_classpath() {
  dir=$1
  for f in $dir/*.jar; do
    SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
  done

  export SQOOP_CLASSPATH
}

# Add sqoop dependencies to classpath.
SQOOP_CLASSPATH=""
if [ -d "$SQOOP_HOME/lib" ]; then
  add_to_classpath $SQOOP_HOME/lib
fi

# Add HBase to dependency list
if [ -e "$HBASE_HOME/bin/hbase" ]; then
  TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`$HBASE_HOME/bin/hbase classpath`
  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
fi

# Add HCatalog to dependency list
if [ -e "${HCAT_HOME}/bin/hcat" ]; then
  TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`
  if [ -z "${HIVE_CONF_DIR}" ]; then
    TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
  fi
  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
fi

# Add Accumulo to dependency list
if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do
    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
  done
  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do
    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
  done
fi

ZOOCFGDIR=${ZOOCFGDIR:-/etc/zookeeper}
if [ -d "${ZOOCFGDIR}" ]; then
  SQOOP_CLASSPATH=$ZOOCFGDIR:$SQOOP_CLASSPATH
fi

SQOOP_CLASSPATH=${SQOOP_CONF_DIR}:${SQOOP_CLASSPATH}

# If there's a build subdir, use Ivy-retrieved dependencies too.
if [ -d "$SQOOP_HOME/build/ivy/lib/sqoop" ]; then
  for f in $SQOOP_HOME/build/ivy/lib/sqoop/*/*.jar; do
    SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
  done
fi
add_to_classpath ${SQOOP_JAR_DIR}

HADOOP_CLASSPATH="${SQOOP_CLASSPATH}:${HADOOP_CLASSPATH}"
if [ ! -z "$SQOOP_USER_CLASSPATH" ]; then
  # User has elements to prepend to the classpath, forcibly overriding
  # Sqoop's own lib directories.
  export HADOOP_CLASSPATH="${SQOOP_USER_CLASSPATH}:${HADOOP_CLASSPATH}"
fi

export SQOOP_CLASSPATH
export SQOOP_CONF_DIR
export SQOOP_JAR_DIR
export HADOOP_CLASSPATH
export HADOOP_COMMON_HOME
export HADOOP_MAPRED_HOME
export HBASE_HOME
export HCAT_HOME
export HIVE_CONF_DIR
export ACCUMULO_HOME
export ZOOKEEPER_HOME


3.2 Check the Sqoop Version

[root@c0 _src]# sqoop version

2019-03-11 22:30:16,837 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

Sqoop 1.4.7

git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8

Compiled by maugli on Thu Dec 21 15:59:58 STD 2017


4. Running and Testing Sqoop Data Imports and Exports

4.1 Exporting Data from Hive to MySQL with Sqoop

Create the database testmshk in MySQL and grant privileges on it to the root user, then create the table hive2mysql_mshk:

[root@c0 _src]# mysql -uroot -p123456

mysql: [Warning] Using a password on the command line interface can be insecure.

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 21

Server version: 5.7.25 MySQL Community Server (GPL)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE DATABASE testmshk DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

Query OK, 1 row affected (0.00 sec)

mysql> show databases;

+--------------------+

| Database |

+--------------------+

| information_schema |

| hive |

| mysql |

| performance_schema |

| sys |

| testmshk |

+--------------------+

6 rows in set (0.02 sec)

mysql> grant select,insert,update,delete,create on testmshk.* to root;

Query OK, 0 rows affected (0.01 sec)

mysql> flush privileges;

Query OK, 0 rows affected (0.01 sec)

mysql> use testmshk;

Database changed

mysql> create table hive2mysql_mshk(id int,namea varchar(50),nameb varchar(50));

Query OK, 0 rows affected (0.02 sec)

mysql> quit;

Bye


Query the table in MySQL through Sqoop; at this point you can see that the table is still empty:

[root@c0 ~]# sqoop eval --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --query "select * from hive2mysql_mshk"

2019-03-11 23:44:06,894 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

2019-03-11 23:44:06,945 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

2019-03-11 23:44:07,100 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

-------------------------------------------------------------

| id | namea | nameb |

-------------------------------------------------------------

-------------------------------------------------------------


In the earlier cluster-setup article (referenced at the end of this post), we created the test data /hive/warehouse/testtable/testdata001.dat while testing Hive. We will now export that data into MySQL.
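If you want to see what is about to be exported first (an optional check), you can cat the file on HDFS; based on the query results later in this section, it should contain two comma-separated rows along these lines:

[root@c0 ~]# hdfs dfs -cat /hive/warehouse/testtable/testdata001.dat
10086,my.mshk.top,you.mshk.top
12306,mname,yname

Then run the export: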

[root@c0 ~]# sqoop export --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --table hive2mysql_mshk --export-dir /hive/warehouse/testtable/testdata001.dat --input-fields-terminated-by ','

2019-03-11 23:47:10,400 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

2019-03-11 23:47:10,437 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

2019-03-11 23:47:10,571 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

2019-03-11 23:47:10,574 INFO tool.CodeGenTool: Beginning code generation

2019-03-11 23:47:10,914 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-11 23:47:10,943 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-11 23:47:10,952 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/work/_app/hadoop-3.1.2

Note: /tmp/sqoop-root/compile/9ea7f54fe87f35ed071ed75c293f25d8/hive2mysql_mshk.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

2019-03-11 23:47:12,652 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/9ea7f54fe87f35ed071ed75c293f25d8/hive2mysql_mshk.jar

2019-03-11 23:47:12,669 INFO mapreduce.ExportJobBase: Beginning export of hive2mysql_mshk

2019-03-11 23:47:12,669 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-03-11 23:47:12,804 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

2019-03-11 23:47:14,106 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

2019-03-11 23:47:14,112 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative

2019-03-11 23:47:14,112 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

2019-03-11 23:47:14,479 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2

2019-03-11 23:47:14,808 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1552315366846_0003

2019-03-11 23:47:16,429 INFO input.FileInputFormat: Total input files to process : 1

2019-03-11 23:47:16,432 INFO input.FileInputFormat: Total input files to process : 1

2019-03-11 23:47:16,513 INFO mapreduce.JobSubmitter: number of splits:4

2019-03-11 23:47:16,577 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative

2019-03-11 23:47:16,684 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552315366846_0003

2019-03-11 23:47:16,686 INFO mapreduce.JobSubmitter: Executing with tokens: []

2019-03-11 23:47:16,924 INFO conf.Configuration: resource-types.xml not found

2019-03-11 23:47:16,924 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.

2019-03-11 23:47:17,213 INFO impl.YarnClientImpl: Submitted application application_1552315366846_0003

2019-03-11 23:47:17,261 INFO mapreduce.Job: The url to track the job: http://c1:8088/proxy/application_1552315366846_0003/

2019-03-11 23:47:17,262 INFO mapreduce.Job: Running job: job_1552315366846_0003

2019-03-11 23:47:23,359 INFO mapreduce.Job: Job job_1552315366846_0003 running in uber mode : false

2019-03-11 23:47:23,360 INFO mapreduce.Job: map 0% reduce 0%

2019-03-11 23:47:31,454 INFO mapreduce.Job: map 75% reduce 0%

2019-03-11 23:47:32,462 INFO mapreduce.Job: map 100% reduce 0%

2019-03-11 23:47:32,473 INFO mapreduce.Job: Job job_1552315366846_0003 completed successfully

2019-03-11 23:47:32,619 INFO mapreduce.Job: Counters: 32

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=913424

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=782

HDFS: Number of bytes written=0

HDFS: Number of read operations=19

HDFS: Number of large read operations=0

HDFS: Number of write operations=0

Job Counters

Launched map tasks=4

Data-local map tasks=4

Total time spent by all maps in occupied slots (ms)=23446

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=23446

Total vcore-milliseconds taken by all map tasks=23446

Total megabyte-milliseconds taken by all map tasks=24008704

Map-Reduce Framework

Map input records=2

Map output records=2

Input split bytes=636

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=582

CPU time spent (ms)=3960

Physical memory (bytes) snapshot=830259200

Virtual memory (bytes) snapshot=11165683712

Total committed heap usage (bytes)=454557696

Peak Map Physical memory (bytes)=208502784

Peak Map Virtual memory (bytes)=2793611264

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=0

2019-03-11 23:47:32,626 INFO mapreduce.ExportJobBase: Transferred 782 bytes in 18.5015 seconds (42.2668 bytes/sec)

2019-03-11 23:47:32,629 INFO mapreduce.ExportJobBase: Exported 2 records.

--export-dir  the HDFS location of the Hive table's data files

--input-fields-terminated-by  the field delimiter used in the input files


Query MySQL through Sqoop again; the data has been loaded successfully:

[root@c0 ~]# sqoop eval --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --query "select * from hive2mysql_mshk"

2019-03-11 23:48:56,848 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

2019-03-11 23:48:56,884 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

2019-03-11 23:48:57,024 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

-------------------------------------------------------------

| id | namea | nameb |

-------------------------------------------------------------

| 10086 | my.mshk.top | you.mshk.top |

| 12306 | mname | yname |

-------------------------------------------------------------

In MySQL you can see that the hive2mysql_mshk table we created now holds 2 rows of data.
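You can also verify this directly with the MySQL client instead of going through sqoop eval (same data, different client; row order may differ):

[root@c0 ~]# mysql -uroot -p123456 -e "select * from testmshk.hive2mysql_mshk"
+-------+-------------+--------------+
| id    | namea       | nameb        |
+-------+-------------+--------------+
| 10086 | my.mshk.top | you.mshk.top |
| 12306 | mname       | yname        |
+-------+-------------+--------------+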


4.2 Importing Data from MySQL into Hive with Sqoop

The hive2mysql_mshk table we just created has no primary key; we simply added a few records to MySQL from Hive. By default, Sqoop identifies the primary key column of a table (if there is one) and uses it as the splitting column. The low and high values of the splitting column are retrieved from the database, and the map tasks operate on evenly sized components of the total range. If the actual values of the primary key are not uniformly distributed across that range, the tasks can become unbalanced. In that case you should explicitly choose a different splitting column with the --split-by argument, for example --split-by id.
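As an aside, if no column makes a sensible split key at all, the usual fallback is to turn off parallel splitting and run a single map task with -m 1. A sketch using the same connection details as below (the target directory name here is made up for illustration; this is not the command actually run in this article):

sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root -P --table hive2mysql_mshk --target-dir /hive/warehouse/mysql2hive_mshk_single -m 1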


The Sqoop command for importing the MySQL data into Hive adds a few more arguments:

sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --split-by id --table hive2mysql_mshk --target-dir /hive/warehouse/mysql2hive_mshk --fields-terminated-by "," --hive-import --hive-table testmshk.mysql2hive_mshk

--split-by  the column used to split the import into parallel tasks

--table  the MySQL table to import

--target-dir  the HDFS directory to import into

--hive-import  import the table into Hive

--hive-overwrite  overwrite any existing data in the Hive table

--hive-table  the table name to use in Hive

--fields-terminated-by  the field delimiter

The Sqoop operation that follows runs as a MapReduce job.

[root@c0 _src]# sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --split-by id --table hive2mysql_mshk --target-dir /hive/warehouse/mysql2hive_mshk --fields-terminated-by "," --hive-import --hive-table testmshk.mysql2hive_mshk --hive-overwrite

2019-03-12 20:21:05,060 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

2019-03-12 20:21:05,137 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

2019-03-12 20:21:05,337 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

2019-03-12 20:21:05,348 INFO tool.CodeGenTool: Beginning code generation

2019-03-12 20:21:05,785 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-12 20:21:05,821 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-12 20:21:05,831 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/work/_app/hadoop-3.1.2

Note: /tmp/sqoop-root/compile/202a5bda3950a7ccc6782f3cfcc3a99d/hive2mysql_mshk.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

2019-03-12 20:21:08,747 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/202a5bda3950a7ccc6782f3cfcc3a99d/hive2mysql_mshk.jar

2019-03-12 20:21:08,761 WARN manager.MySQLManager: It looks like you are importing from mysql.

2019-03-12 20:21:08,761 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

2019-03-12 20:21:08,761 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

2019-03-12 20:21:08,762 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

2019-03-12 20:21:08,764 INFO mapreduce.ImportJobBase: Beginning import of hive2mysql_mshk

2019-03-12 20:21:08,765 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-03-12 20:21:08,928 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

2019-03-12 20:21:09,656 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

2019-03-12 20:21:10,332 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2

2019-03-12 20:21:10,688 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1552315366846_0011

2019-03-12 20:21:12,618 INFO db.DBInputFormat: Using read commited transaction isolation

2019-03-12 20:21:12,619 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `hive2mysql_mshk`

2019-03-12 20:21:12,622 INFO db.IntegerSplitter: Split size: 555; Num splits: 4 from: 10086 to: 12306

2019-03-12 20:21:12,696 INFO mapreduce.JobSubmitter: number of splits:4

2019-03-12 20:21:13,137 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552315366846_0011

2019-03-12 20:21:13,140 INFO mapreduce.JobSubmitter: Executing with tokens: []

2019-03-12 20:21:13,443 INFO conf.Configuration: resource-types.xml not found

2019-03-12 20:21:13,443 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.

2019-03-12 20:21:13,533 INFO impl.YarnClientImpl: Submitted application application_1552315366846_0011

2019-03-12 20:21:13,593 INFO mapreduce.Job: The url to track the job: http://c1:8088/proxy/application_1552315366846_0011/

2019-03-12 20:21:13,594 INFO mapreduce.Job: Running job: job_1552315366846_0011

2019-03-12 20:21:20,705 INFO mapreduce.Job: Job job_1552315366846_0011 running in uber mode : false

2019-03-12 20:21:20,727 INFO mapreduce.Job: map 0% reduce 0%

2019-03-12 20:21:29,927 INFO mapreduce.Job: map 50% reduce 0%

2019-03-12 20:21:29,930 INFO mapreduce.Job: Task Id : attempt_1552315366846_0011_m_000000_0, Status : FAILED

[2019-03-12 20:21:28.236]Container [pid=19941,containerID=container_e15_1552315366846_0011_01_000002] is running 539445760B beyond the 'VIRTUAL' memory limit. Current usage: 199.9 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_e15_1552315366846_0011_01_000002 :

|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

|- 20026 19941 19941 19941 (java) 950 81 2678403072 50861 /opt/jdk1.8.0_201/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx820m -Djava.io.tmpdir=/home/work/_data/hadoop-3.1.2/nm-local-dir/usercache/root/appcache/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/work/_logs/hadoop-3.1.2/userlogs/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.11.32 41274 attempt_1552315366846_0011_m_000000_0 16492674416642

|- 19941 19939 19941 19941 (bash) 1 2 115900416 307 /bin/bash -c /opt/jdk1.8.0_201/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx820m -Djava.io.tmpdir=/home/work/_data/hadoop-3.1.2/nm-local-dir/usercache/root/appcache/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/work/_logs/hadoop-3.1.2/userlogs/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.11.32 41274 attempt_1552315366846_0011_m_000000_0 16492674416642 1>/home/work/_logs/hadoop-3.1.2/userlogs/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002/stdout 2>/home/work/_logs/hadoop-3.1.2/userlogs/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002/stderr

[2019-03-12 20:21:28.324]Container killed on request. Exit code is 143

[2019-03-12 20:21:28.335]Container exited with a non-zero exit code 143.

2019-03-12 20:21:30,978 INFO mapreduce.Job: map 75% reduce 0%

2019-03-12 20:21:37,021 INFO mapreduce.Job: map 100% reduce 0%

2019-03-12 20:21:37,032 INFO mapreduce.Job: Job job_1552315366846_0011 completed successfully

2019-03-12 20:21:37,145 INFO mapreduce.Job: Counters: 33

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=915840

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=425

HDFS: Number of bytes written=49

HDFS: Number of read operations=24

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters

Failed map tasks=1

Launched map tasks=5

Other local map tasks=5

Total time spent by all maps in occupied slots (ms)=31981

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=31981

Total vcore-milliseconds taken by all map tasks=31981

Total megabyte-milliseconds taken by all map tasks=32748544

Map-Reduce Framework

Map input records=2

Map output records=2

Input split bytes=425

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=318

CPU time spent (ms)=6520

Physical memory (bytes) snapshot=815542272

Virtual memory (bytes) snapshot=11174408192

Total committed heap usage (bytes)=437780480

Peak Map Physical memory (bytes)=206934016

Peak Map Virtual memory (bytes)=2795565056

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=49

2019-03-12 20:21:37,154 INFO mapreduce.ImportJobBase: Transferred 49 bytes in 27.4776 seconds (1.7833 bytes/sec)

2019-03-12 20:21:37,159 INFO mapreduce.ImportJobBase: Retrieved 2 records.

2019-03-12 20:21:37,159 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table hive2mysql_mshk

2019-03-12 20:21:37,188 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-12 20:21:37,208 INFO hive.HiveImport: Loading uploaded data into Hive

2019-03-12 20:21:37,220 INFO conf.HiveConf: Found configuration file file:/home/work/_app/hive-2.3.4/conf/hive-site.xml

2019-03-12 20:21:49,491 INFO hive.HiveImport:

2019-03-12 20:21:49,492 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/work/_app/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true

2019-03-12 20:21:56,558 INFO hive.HiveImport: OK

2019-03-12 20:21:56,561 INFO hive.HiveImport: Time taken: 5.954 seconds

2019-03-12 20:21:57,005 INFO hive.HiveImport: Loading data to table testmshk.mysql2hive_mshk

2019-03-12 20:21:58,181 INFO hive.HiveImport: OK

2019-03-12 20:21:58,181 INFO hive.HiveImport: Time taken: 1.619 seconds

2019-03-12 20:21:58,681 INFO hive.HiveImport: Hive import complete.


Finally, let's verify the output in Hive:

[root@c0 ~]# hive

Logging initialized using configuration in jar:file:/home/work/_app/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive> show databases;

OK

default

testmshk

Time taken: 3.512 seconds, Fetched: 2 row(s)

hive> use testmshk;

OK

Time taken: 0.032 seconds

hive> show tables;

OK

mysql2hive_mshk

testtable

Time taken: 0.083 seconds, Fetched: 2 row(s)

hive> select * from mysql2hive_mshk;

OK

10086 my.mshk.top you.mshk.top

12306 mname yname

Time taken: 1.634 seconds, Fetched: 2 row(s)

hive> quit;


The imported data can also be seen in HDFS.
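One way to check this from the command line (a sketch; the table's exact location under the Hive warehouse depends on hive.metastore.warehouse.dir, so it is looked up first rather than assumed):

[root@c0 ~]# hive -e "describe formatted testmshk.mysql2hive_mshk" | grep -i location
[root@c0 ~]# hdfs dfs -ls <location printed above>
[root@c0 ~]# hdfs dfs -cat "<location printed above>/part-m-*"

The part-m-* files should contain the same two comma-separated rows that were imported.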


4.3 Importing Data from MySQL into HBase with Sqoop

Next we import the data of the MySQL table hive2mysql_mshk into HBase, creating the HBase table mysql2hase_mshk in the process:

[root@c0 ~]# sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --split-by id --table hive2mysql_mshk --hbase-table mysql2hase_mshk --hbase-create-table --hbase-row-key id --column-family id

2019-03-13 12:04:33,647 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

2019-03-13 12:04:33,694 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

2019-03-13 12:04:33,841 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

2019-03-13 12:04:33,841 INFO tool.CodeGenTool: Beginning code generation

2019-03-13 12:04:34,162 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-13 12:04:34,197 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

...

2019-03-13 12:05:13,782 INFO mapreduce.Job: map 75% reduce 0%

2019-03-13 12:05:15,813 INFO mapreduce.Job: map 100% reduce 0%

2019-03-13 12:05:16,827 INFO mapreduce.Job: Job job_1552397454797_0002 completed successfully

2019-03-13 12:05:16,942 INFO mapreduce.Job: Counters: 33

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=1041632

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=425

HDFS: Number of bytes written=0

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=0

Job Counters

Failed map tasks=5

Launched map tasks=9

Other local map tasks=9

Total time spent by all maps in occupied slots (ms)=68882

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=68882

Total vcore-milliseconds taken by all map tasks=68882

Total megabyte-milliseconds taken by all map tasks=70535168

Map-Reduce Framework

Map input records=2

Map output records=2

Input split bytes=425

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=801

CPU time spent (ms)=15480

Physical memory (bytes) snapshot=1097326592

Virtual memory (bytes) snapshot=11271196672

Total committed heap usage (bytes)=629669888

Peak Map Physical memory (bytes)=295751680

Peak Map Virtual memory (bytes)=2828283904

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=0

2019-03-13 12:05:16,949 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 38.224 seconds (0 bytes/sec)

2019-03-13 12:05:16,954 INFO mapreduce.ImportJobBase: Retrieved 2 records.
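The HBase-related arguments in the command above mean the following:

--hbase-table  the HBase table to write into

--hbase-create-table  create that HBase table if it does not already exist

--hbase-row-key  the source column whose value becomes the HBase row key

--column-family  the column family that all imported columns are placed under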


Now connect to HBase with the HBase shell to check the data we just imported; you can see that mysql2hase_mshk exists and that its rows can be retrieved:

[root@c0 ~]# hbase shell

HBase Shell

Use "help" to get list of supported commands.

Use "exit" to quit this interactive shell.

Version 1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec 5 11:54:10 PST 2018

hbase(main):001:0> list

TABLE

mysql2hase_mshk

1 row(s) in 0.1870 seconds

=> ["mysql2hase_mshk"]

hbase(main):002:0> scan 'mysql2hase_mshk'

ROW COLUMN+CELL

10086 column=id:namea, timestamp=1552449912494, value=my.mshk.top

10086 column=id:nameb, timestamp=1552449912494, value=you.mshk.top

12306 column=id:namea, timestamp=1552449906986, value=mname

12306 column=id:nameb, timestamp=1552449906986, value=yname

2 row(s) in 0.1330 seconds

hbase(main):003:0> get 'mysql2hase_mshk','10086'

COLUMN CELL

id:namea timestamp=1552449912494, value=my.mshk.top

id:nameb timestamp=1552449912494, value=you.mshk.top

1 row(s) in 0.0230 seconds

hbase(main):004:0>
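If this was only a test and you want to remove the table afterwards, the usual HBase shell sequence (destructive, so only do this on a test cluster) is:

hbase(main):004:0> disable 'mysql2hase_mshk'
hbase(main):005:0> drop 'mysql2hase_mshk'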


For how to move data back and forth between HBase and Hive, see sections 9.2.4 and 9.2.5 of the article "Hadoop 3.1.2 (HA) + Zookeeper 3.4.13 + Hbase 1.4.9 (HA) + Hive 2.3.4 + Spark 2.4.0 (HA) High-Availability Cluster Setup".


I hope you found this useful. Thank you for your support and for reading my blog.

Author: 迦壹

Blog post: Importing and Exporting Data Between MySQL 5.7, Hive 2.3.4, and HBase 1.4.9 with Sqoop 1.4.7

Reprint notice: This article may be reproduced, but the original source, author information, and this copyright notice must be credited with a hyperlink. Thank you!


If you feel this article has helped you, you can donate via the addresses below. Thank you!

Bitcoin address: 1KdgydfKMcFVpicj5w4vyn3T88dwjBst6Y

Ethereum address: 0xbB0a92d634D7b9Ac69079ed0e521CC2e0a97c420

Original article: https://www.cnblogs.com/lion.net/p/10522311.html

