fork

Definition

์ž‘์—… flow๋ฅผ ๋ถ„๊ธฐํ•˜์—ฌ ๋…ธ๋“œ๋“ค์„ ๋…๋ฆฝ๋œ flow์—์„œ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
์ขŒ์ธก [Flow๊ตฌ์„ฑ]๋…ธ๋“œ ์ค‘ [fork]๋…ธ๋“œ๋ฅผ drag & drop ํ•œ ํ›„ Property ํ•ญ๋ชฉ์„ ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค. ๋™์‹œ์— ์ˆ˜ํ–‰ํ•  ์ž‘์—…์„ ์ ์ ˆํžˆ ์„ ํƒํ•œ ํ›„ ์›Œํฌํ”Œ๋กœ์šฐ๋ฅผ ๊ตฌ์„ฑํ•˜๊ณ , ์ด๋ ‡๊ฒŒ ๋ถ„๊ธฐ๋œ ์ž‘์—…์€ join์œผ๋กœ ํ•ฉ์ณ์•ผ ํ•ฉ๋‹ˆ๋‹ค.

fork-join์œผ๋กœ ๊ตฌ์„ฑ๋˜๋Š” ์›Œํฌํ”Œ๋กœ์šฐ์˜ ํŠน์ง•์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

  • ๋ถ„๊ธฐ๋œ ๋ชจ๋“  ๋…ธ๋“œ๊ฐ€ ์ข…๋ฃŒํ•ด์•ผ ์›Œํฌํ”Œ๋กœ์šฐ๊ฐ€ succeed ๋ฉ๋‹ˆ๋‹ค. ์ฆ‰, ๋ถ„๊ธฐ๋œ ๋…ธ๋“œ ์ค‘ ์ข…๋ฃŒ๋œ ์ž‘์—…์ด ์žˆ์–ด๋„ ์ „์ฒด ๋…ธ๋“œ๊ฐ€ ์ข…๋ฃŒํ•  ๋•Œ๊นŒ์ง€ ๋Œ€๊ธฐํ•ฉ๋‹ˆ๋‹ค.
  • ๋ถ„๊ธฐ ๋…ธ๋“œ ์ค‘ ํ•˜๋‚˜๋ผ๋„ ์‹คํŒจํ•  ๊ฒฝ์šฐ ์ „์ฒด ์›Œํฌํ”Œ๋กœ์šฐ๊ฐ€ ์‹คํŒจ ํ•ฉ๋‹ˆ๋‹ค.

Set

[setting], [scheduler], [parameter] ์„ค์ •์€ [์›Œํฌํ”Œ๋กœ์šฐ] > [์ƒ์„ฑ] > [๊ธฐ๋ณธ๊ตฌ์„ฑ]์„ ์ฐธ๊ณ ํ•ฉ๋‹ˆ๋‹ค.

property

[Node Description] ์ž‘์„ฑ ์ค‘์ธ ๋…ธ๋“œ๋ช… ์ž…๋ ฅ

Example

์„œ์šธํŠน๋ณ„์‹œ ๋Œ€๊ธฐ์˜ค์—ผ ์ธก์ •์ •๋ณด(์ถœ์ฒ˜ : ๊ณต๊ณต๋ฐ์ดํ„ฐ ํฌํ„ธ, 2019๋…„ ๊ธฐ์ค€) ์ค‘ ๋ฏธ์„ธ๋จผ์ง€(PM10)์™€ ์ดˆ๋ฏธ์„ธ๋จผ์ง€(PM2.5)์— ํ•ด๋‹นํ•˜๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋ณ„๋„ ํ…Œ์ด๋ธ”๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

fork01

[Note] ์„œ์šธํŠน๋ณ„์‹œ ๋Œ€๊ธฐ์˜ค์—ผ ์ธก์ •์ •๋ณด ๋ฐ์ดํ„ฐ
์„œ์šธํŠน๋ณ„์‹œ ๋ณด๊ฑดํ™˜๊ฒฝ์—ฐ๊ตฌ์› ๋Œ€๊ธฐ์˜ค์—ผ์ธก์ •๋ง์‹œ์Šคํ…œ์—์„œ ์ œ๊ณตํ•˜๋Š” ๋Œ€๊ธฐ์˜ค์—ผ ์ธก์ •์ •๋ณด
์ถœ์ฒ˜ : https://www.data.go.kr (๊ณต๊ณต ๋ฐ์ดํ„ฐ ํฌํ„ธ)

  1. HDFS/Hive ๋ธŒ๋ผ์šฐ์ €๋ฅผ ํ†ตํ•ด ์›์‹œ๋ฐ์ดํ„ฐ ์ ์žฌ ๋ฐ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

    fork02

    CREATE EXTERNAL TABLE default.air_pollution(
    measurement_day STRING, -- ์ธก์ •์ผ์‹œ
    station_cd STRING, -- ์ธก์ •์†Œ ์ฝ”๋“œ
    metric_cd STRING, -- ์ธก์ •ํ•ญ๋ชฉ ์ฝ”๋“œ
    medium DOUBLE, -- ํ‰๊ท ๊ฐ’
    status INT, -- ์ธก์ •๊ธฐ ์ƒํƒœ
    national INT, -- ๊ตญ๊ฐ€ ๊ธฐ์ค€์ดˆ๊ณผ ๊ตฌ๋ถ„
    local_gov INT ) -- ์ง€์ž์ฒด ๊ธฐ์ค€์ดˆ๊ณผ ๊ตฌ๋ถ„
    COMMENT '์„œ์šธํŠน๋ณ„์‹œ ๋Œ€๊ธฐ์˜ค์—ผ ์ธก์ •์ •๋ณด(์ถœ์ฒ˜ : ๊ณต๊ณต๋ฐ์ดํ„ฐ ํฌํ„ธ)'
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE
    LOCATION '/tmp/ManSample'
  2. [executeHive]๋…ธ๋“œ(๋…ธ๋“œ๋ช… : executeHive_sum)๋กœ ํ†ต๊ณ„์„ฑ ํ…Œ์ด๋ธ” ์ƒ์„ฑ(์ „์ฒด ๋ฐ์ดํ„ฐ๋Ÿ‰์„ ์ธก์ •์†Œ๋ณ„๋กœ ๋ถ„๋ฆฌ)

    fork03

  3. fork ๋…ธ๋“œ ์—ฐ๊ฒฐ ๋ฐ [executeHive] ๋…ธ๋“œ๋ฅผ ๋ณ‘๋ ฌ๋กœ ์—ฐ๊ฒฐํ•˜์—ฌ ๋ฏธ์„ธ๋จผ์ง€(PM10), ์ดˆ๋ฏธ์„ธ๋จผ์ง€(PM25) ๋ฐ์ดํ„ฐ๋งŒ ์ถ”์ถœํ•œ ๋ณ„๋„ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

    • [executeHive_pm10] create table air_pollution_pm10 as select * from air_pollution where air_pollution.metric_cd = '8'
    • [executeHive_pm25] create table air_pollution_pm25 as select * from air_pollution where air_pollution.metric_cd = '9'

    [Note] air_pollution ํ…Œ์ด๋ธ”์˜ ์ธก์ •ํ•ญ๋ชฉ ์ฝ”๋“œ(metric_cd)๋Š” ์•„๋ž˜์™€ ๊ฐ™์Œ

    ์ฝ”๋“œ์„ค๋ช…
    1SO2 (์•„ํ™ฉ์‚ฐ๊ฐ€์Šค)
    3NO2 (์ด์‚ฐํ™”์งˆ์†Œ)
    5CO (์ผ์‚ฐํ™”ํƒ„์†Œ)
    6O3 (์˜ค์กด)
    8PM10 (๋ฏธ์„ธ๋จผ์ง€:Particulate Matter)
    9PM2.5 (์ดˆ๋ฏธ์„ธ๋จผ์ง€)
  4. [fork]๋…ธ๋“œ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ [join]๋…ธ๋“œ๋ฅผ ์‚ฌ์šฉํ•ด์„œ ์ข…๋ฃŒ

  5. ์›Œํฌํ”Œ๋กœ์šฐ ์‹คํ–‰ ํ›„ ์ƒˆ๋กœ์šด ํ…Œ์ด๋ธ” ์ƒ์„ฑ ํ™•์ธ

    fork04

[Note] Control Node
[Flow๊ตฌ์„ฑ] ๋…ธ๋“œ ์ค‘ decision, fork, join Node๋ฅผ Control Node๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ๋ณด๋‹ค ์›Œํฌํ”Œ๋กœ์šฐ๋ฅผ Control ํ•˜๋Š” ์šฉ๋„๋กœ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.