一、Environment setup
1. Hadoop
2. Sqoop 2.x
3. MySQL
二、Importing from MySQL into HDFS
1. Create the MySQL database, table, and test data
xxxxxxxx$ mysql -uroot -p
Enter password:
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)

test => the newly created database

mysql> use test;
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| test           |
+----------------+
1 row in set (0.00 sec)

test => the newly created table

mysql> desc test;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(45) | YES  |     | NULL    |                |
| age   | int(11)     | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> select * from test;
+----+------+------+
| id | name | age  |
+----+------+------+
|  7 | a    |    1 |
|  8 | b    |    2 |
|  9 | c    |    3 |
+----+------+------+
3 rows in set (0.00 sec)
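The original DDL is not shown in this post; a minimal sketch that reproduces the schema and rows from the `desc` and `select` output above (assuming MySQL 5.x) would look like:

```sql
-- Sketch only: recreate the schema shown by `desc test` above
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE TABLE test (
  id   INT(11) NOT NULL AUTO_INCREMENT,
  name VARCHAR(45),
  age  INT(11),
  PRIMARY KEY (id)
);
-- Sample rows matching the `select * from test` output
INSERT INTO test (name, age) VALUES ('a', 1), ('b', 2), ('c', 3);
```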
2. Grant privileges to the relevant users
Note: after Sqoop submits a job, each node accesses the database during the map phase, so the privileges must be granted in advance.
mysql> grant [all | select | ...] on {db}.{table} to {user}@{host} identified by {passwd};
mysql> flush privileges;

# Here I grant user root (password: root) on a specific hostname
# access to any table in db test, with all privileges:
mysql> grant all on test.* to 'root'@{host} identified by 'root';
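If the map tasks may run on many hosts, a concrete variant is to use MySQL's `'%'` wildcard host instead of a single hostname (a sketch in MySQL 5.x syntax; prefer a specific hostname or subnet in production):

```sql
-- Sketch: allow the grant to apply from any host via the '%' wildcard
GRANT ALL ON test.* TO 'root'@'%' IDENTIFIED BY 'root';
FLUSH PRIVILEGES;
```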
3. Start sqoop2-server
[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ pwd
/home/hadoop/sqoop-1.99.6-bin-hadoop200
[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ ./bin/sqoop2-server start
......

You can verify it is up via the web UI, and also check the logs.
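One quick way to verify the server from the command line is to hit its REST endpoint (a sketch; this assumes the default port 12000 and webapp name "sqoop" from `conf/sqoop.properties`, so adjust to your configuration):

```
# Sketch: check the server is responding (default port/webapp assumed)
$ curl http://localhost:12000/sqoop/version
```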
4. Start sqoop2-shell
[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ pwd
/home/hadoop/sqoop-1.99.6-bin-hadoop200
[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ ./bin/sqoop2-shell
......
sqoop:000> show version
......
sqoop:000> show connector
+----+------------------------+---------+------------------------------------------------------+----------------------+
| Id | Name                   | Version | Class                                                | Supported Directions |
+----+------------------------+---------+------------------------------------------------------+----------------------+
| 1  | generic-jdbc-connector | 1.99.6  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
| 2  | kite-connector         | 1.99.6  | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 3  | hdfs-connector         | 1.99.6  | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | kafka-connector        | 1.99.6  | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
+----+------------------------+---------+------------------------------------------------------+----------------------+

Create links for the connectors you need:

sqoop:000> create link -c 1  => create the JDBC link first; you will be prompted for name, jdbc-driver, url, username, password, etc.
sqoop:000> create link -c 3  => create the HDFS link; you will be prompted for name, the HDFS url, etc.

sqoop:000> show link
+----+-------------+--------------+------------------------+---------+
| Id | Name        | Connector Id | Connector Name         | Enabled |
+----+-------------+--------------+------------------------+---------+
| 3  | 10-21_jdbc1 | 1            | generic-jdbc-connector | true    |
| 4  | 10-21_hdfs1 | 3            | hdfs-connector         | true    |
+----+-------------+--------------+------------------------+---------+

Create the job (-f => from, -t => to, i.e. where to import from and where to write to):

sqoop:000> create job -f 3 -t 4
You will be prompted for the relevant table information, plus the HDFS details.

sqoop:000> show job
+----+---------------+----------------+--------------+---------+
| Id | Name          | From Connector | To Connector | Enabled |
+----+---------------+----------------+--------------+---------+
| 1  | 10-20_sqoopy2 | 1              | 3            | true    |
+----+---------------+----------------+--------------+---------+

# Start the job
sqoop:000> start job -j 2

You can follow the progress in the web UI, or query it from the shell:

sqoop:000> status job -j 2
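For reference, the interactive flow for `create link -c 1` looks roughly like the sketch below. The field names follow the generic-jdbc-connector prompts; the host and credentials are placeholders, and the driver class assumes MySQL Connector/J is on the server's classpath:

```
sqoop:000> create link -c 1
Name: 10-21_jdbc1
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://{mysql-host}:3306/test
Username: root
Password: ****
```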
See also the Sqoop guide.
5. Troubleshooting
Read the logs carefully and work through problems step by step.