sqoop

Definition

sqoop ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•œ๋‹ค. sql ์ฟผ๋ฆฌ๋‚˜ rdbms์—์„œ hdfs๋กœ ๋ฐ์ดํ„ฐ ์ด๋™์‹œ ์‚ฌ์šฉํ•œ๋‹ค. [Flow๊ตฌ์„ฑ]๋…ธ๋“œ ์ค‘ [sqoop]๋…ธ๋“œ๋ฅผ drag & drop ํ•œ ํ›„ Property ํ•ญ๋ชฉ์„ ์ž…๋ ฅํ•œ๋‹ค. Property ํŒจ๋„์˜ [๋”๋ณด๊ธฐ+] ๋ฒ„ํŠผ์„ ๋ˆ„๋ฅด๋ฉด ์ž…๋ ฅ๊ฐ€๋Šฅํ•œ ์ „์ฒด Property ํ•ญ๋ชฉ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

Set

[setting], [scheduler], [parameter] ์„ค์ •์€ [์›Œํฌํ”Œ๋กœ์šฐ ์ƒ์„ฑ] > [์„ค์ •]์„ ์ฐธ๊ณ ํ•œ๋‹ค.

property

[Node Description] ์ž‘์„ฑ ์ค‘์ธ ๋…ธ๋“œ๋ช… ์ž…๋ ฅ

flow022

  1. prepare : ๋…ธ๋“œ ์‹คํ–‰ ๊ฒฐ๊ณผ๋ฅผ HDFS์— ์ €์žฅํ•˜๊ธฐ ์œ„ํ•ด ํŒŒ์ผ์‚ญ์ œ/ํด๋”์ƒ์„ฑ ์ž‘์—… ์„  ์ˆ˜ํ–‰(์›Œํฌํ”Œ๋กœ์šฐ ๋ฐ˜๋ณต์‹คํ–‰ ์‹œ ํ™œ์šฉ๊ฐ€๋Šฅ)
    • delete : ๋…ธ๋“œ ์‹คํ–‰ ์ „ ์‚ญ์ œํ•  ํด๋”/ํŒŒ์ผ ๊ฒฝ๋กœ
    • mkdir : ๋…ธ๋“œ ์‹คํ–‰ ์ „ ์ƒ์„ฑํ•  ํด๋” ๊ฒฝ๋กœ
  2. file : ๋…ธ๋“œ์—์„œ ์‚ฌ์šฉํ•  ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ๊ฒฝ๋กœ๋ฅผ ์„ค์ •
  3. archive : archive๊ฒฝ๋กœ๋ฅผ ์„ค์ •
  4. retry
    • max : ์‹คํ–‰ ์‹คํŒจ ์‹œ ์žฌ์‹œ๋„ ํšŸ์ˆ˜
    • period : ์žฌ์‹œ๋„ ์ฃผ๊ธฐ(๋ถ„ ๋‹จ์œ„)
  5. configuration : ์‹คํ–‰์— ์‚ฌ์šฉํ•  ์†์„ฑ (key, value) ์ž…๋ ฅ
  6. argument : ์‹คํ–‰ํ•  ์ปค๋ฉ˜๋“œ(EL expressions)
  7. jobXml : jobXml ๊ฒฝ๋กœ ์ž…๋ ฅ. ์žก์— ์ „๋‹ฌํ•  ํ”„๋กœํผํ‹ฐ๋ฅผ ๋ณ„๋„ xml ๋กœ ์ž‘์„ฑํ•˜์—ฌ ์ „๋‹ฌ๊ฐ€๋Šฅ
  8. forceOK : ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ๊ฐ€ ์‹คํŒจํ•ด๋„ ์ •์ƒ์œผ๋กœ ํ‘œ์‹œํ•˜๊ณ  ์ข…๋ฃŒ

Example

RDBMS์— ์œ„์น˜ํ•œ ํ…Œ์ด๋ธ”์„ hdfs๋กœ ์ด๋™ํ•˜๊ธฐ ์œ„ํ•œ ์˜ˆ์ œ๋กœ sqoop import syntax ์— ๋งž๊ฒŒ argement๋ฅผ ์„ค์ •ํ•œ๋‹ค. sqoop import common argument๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

Argument [value]Description
--connect [jdbc-uri]Specify JDBC connect string
--connection-manager [class-name]Specify connection manager class to use
--driver [class-name]Manually specify JDBC driver class to use
--hadoop-mapred-home [dir]Override $HADOOP_MAPRED_HOME
--helpPrint usage instructions
--password-fileSet path for a file containing the authentication password
-PRead password from console
--password [password]Set authentication password
--username [username]Set authentication username
--verbosePrint more information while working
--connection-param-file [filename]Optional properties file that provides connection parameters
--relaxed-isolationSet connection transaction isolation to read uncommitted for the mappers.

[Note] sqoop ๊ด€๋ จ ์ƒ์„ธ ๋‚ด์šฉ์€ ์•„๋ž˜ ๋งํฌ ์ฐธ๊ณ 
https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_syntax

  1. [Flow๊ตฌ์„ฑ] > [sqoop] ๋…ธ๋“œ๋ฅผ drag & drop ํ•œ ํ›„ setting ํŒจ๋„์—์„œ ์‹คํ–‰ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ์„ ํƒํ•œ๋‹ค.

  2. "6.argement"์˜ ์ถ”๊ฐ€๋ฒ„ํŠผ ํด๋ฆญ ํ›„ sqoop import syntax์— ๋งž๊ฒŒ argment์ •๋ณด๋ฅผ ์ž…๋ ฅํ•œ๋‹ค. import keyword ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•ด์„œ argument, value๋ฅผ ์ˆœ์„œ๋Œ€๋กœ ์ž…๋ ฅํ•œ๋‹ค.

    argument
    import
    --connect
    jdbc:mysql://10.178.XXX.XXX:302XX/modeler
    --username
    modeler
    --password
    XXXXXX
    --table
    tbl_user_grp_map
    --m
    1
    --target-dir
    /user/hive/warehouse/test

[Note] -m,--num-mappers [n] : option Use n map tasks to import in parallel

flow077