fork

Definition

์ž‘์—… flow๋ฅผ ๋ถ„๊ธฐ์‹œ์ผœ ๋…ธ๋“œ๋“ค์„ ๋…๋ฆฝ๋œ flow์—์„œ ์ˆ˜ํ–‰ํ•œ๋‹ค.
์ขŒ์ธก [Flow๊ตฌ์„ฑ]๋…ธ๋“œ ์ค‘ [fork]๋…ธ๋“œ๋ฅผ drag & drop ํ•œ๋‹ค. ๋™์‹œ์— ์ˆ˜ํ–‰ํ•  ์ž‘์—…์„ ์ ์ ˆํžˆ ์„ ํƒํ•œ ํ›„ ์›Œํฌํ”Œ๋กœ์šฐ๋ฅผ ๊ตฌ์„ฑํ•˜๋Š”๋ฐ, ์ด๋ ‡๊ฒŒ ๋ถ„๊ธฐ๋œ ์ž‘์—…์€ join์œผ๋กœ ํ•ฉ์ณ์•ผ ํ•œ๋‹ค.

fork-join์œผ๋กœ ๊ตฌ์„ฑ๋˜๋Š” ์›Œํฌํ”Œ๋กœ์šฐ์˜ ํŠน์ง•์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

  • ๋ถ„๊ธฐ๋œ ๋ชจ๋“  ๋…ธ๋“œ๊ฐ€ ์ข…๋ฃŒํ•ด์•ผ ์›Œํฌํ”Œ๋กœ์šฐ๊ฐ€ succeed ๋จ(์ฆ‰, ๋ถ„๊ธฐ๋œ ๋…ธ๋“œ ์ค‘ ์ข…๋ฃŒ๋œ ์ž‘์—…์ด ์žˆ์–ด๋„ ์ „์ฒด ๋…ธ๋“œ๊ฐ€ ์ข…๋ฃŒํ•  ๋•Œ๊นŒ์ง€ ๋Œ€๊ธฐ)
  • ๋ถ„๊ธฐ ๋…ธ๋“œ ์ค‘ ํ•˜๋‚˜๋ผ๋„ ์‹คํŒจํ•  ๊ฒฝ์šฐ ์ „์ฒด ์›Œํฌํ”Œ๋กœ์šฐ๊ฐ€ ์‹คํŒจ

Set

[setting], [scheduler], [parameter] ์„ค์ •์€ [์›Œํฌํ”Œ๋กœ์šฐ ์ƒ์„ฑ] > [์„ค์ •]์„ ์ฐธ๊ณ ํ•œ๋‹ค.

property

[Node Description] ์ž‘์„ฑ ์ค‘์ธ ๋…ธ๋“œ๋ช… ์ž…๋ ฅ

Example

์„œ์šธํŠน๋ณ„์‹œ ๋Œ€๊ธฐ์˜ค์—ผ ์ธก์ •์ •๋ณด(์ถœ์ฒ˜ : ๊ณต๊ณต๋ฐ์ดํ„ฐ ํฌํ„ธ, 2019๋…„ ๊ธฐ์ค€) ์ค‘ ๋ฏธ์„ธ๋จผ์ง€(PM10)์™€ ์ดˆ๋ฏธ์„ธ๋จผ์ง€(PM2.5)์— ํ•ด๋‹นํ•˜๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋ณ„๋„ ํ…Œ์ด๋ธ”๋กœ ์ƒ์„ฑํ•œ๋‹ค.

flow037

[Note] ์„œ์šธํŠน๋ณ„์‹œ ๋Œ€๊ธฐ์˜ค์—ผ ์ธก์ •์ •๋ณด ๋ฐ์ดํ„ฐ
์„œ์šธํŠน๋ณ„์‹œ ๋ณด๊ฑดํ™˜๊ฒฝ์—ฐ๊ตฌ์› ๋Œ€๊ธฐ์˜ค์—ผ์ธก์ •๋ง์‹œ์Šคํ…œ์—์„œ ์ œ๊ณตํ•˜๋Š” ๋Œ€๊ธฐ์˜ค์—ผ ์ธก์ •์ •๋ณด
์ถœ์ฒ˜ : https://www.data.go.kr (๊ณต๊ณต ๋ฐ์ดํ„ฐ ํฌํ„ธ)

  1. HDFS/Hive ๋ธŒ๋ผ์šฐ์ €๋ฅผ ํ†ตํ•ด ์›์‹œ๋ฐ์ดํ„ฐ ์ ์žฌ ๋ฐ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

    flow038

CREATE EXTERNAL TABLE default.air_pollution(
measurement_day STRING, -- ์ธก์ •์ผ์‹œ
station_cd STRING, -- ์ธก์ •์†Œ ์ฝ”๋“œ
metric_cd STRING, -- ์ธก์ •ํ•ญ๋ชฉ ์ฝ”๋“œ
medium DOUBLE, -- ํ‰๊ท ๊ฐ’
status INT, -- ์ธก์ •๊ธฐ ์ƒํƒœ
national INT, -- ๊ตญ๊ฐ€ ๊ธฐ์ค€์ดˆ๊ณผ ๊ตฌ๋ถ„
local_gov INT ) -- ์ง€์ž์ฒด ๊ธฐ์ค€์ดˆ๊ณผ ๊ตฌ๋ถ„
COMMENT '์„œ์šธํŠน๋ณ„์‹œ ๋Œ€๊ธฐ์˜ค์—ผ ์ธก์ •์ •๋ณด(์ถœ์ฒ˜ : ๊ณต๊ณต๋ฐ์ดํ„ฐ ํฌํ„ธ)'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/tmp/ManSample'
  1. [executeHive]๋…ธ๋“œ(๋…ธ๋“œ๋ช… : executeHive_sum)๋กœ ํ†ต๊ณ„์„ฑ ํ…Œ์ด๋ธ” ์ƒ์„ฑ(์ „์ฒด ๋ฐ์ดํ„ฐ๋Ÿ‰์„ ์ธก์ •์†Œ๋ณ„๋กœ ๋ถ„๋ฆฌ)

    flow039

  2. fork ๋…ธ๋“œ ์—ฐ๊ฒฐ ๋ฐ [executeHive] ๋…ธ๋“œ๋ฅผ ๋ณ‘๋ ฌ๋กœ ์—ฐ๊ฒฐํ•˜์—ฌ ๋ฏธ์„ธ๋จผ์ง€(PM10), ์ดˆ๋ฏธ์„ธ๋จผ์ง€(PM25) ๋ฐ์ดํ„ฐ๋งŒ ์ถ”์ถœํ•œ ๋ณ„๋„ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

  • [executeHive_pm10] create table air_pollution_pm10 as select * from air_pollution where air_pollution.metric_cd = '8'
  • [executeHive_pm25] create table air_pollution_pm25 as select * from air_pollution where air_pollution.metric_cd = '9'

[Note] air_pollution ํ…Œ์ด๋ธ”์˜ ์ธก์ •ํ•ญ๋ชฉ ์ฝ”๋“œ(metric_cd)๋Š” ์•„๋ž˜์™€ ๊ฐ™์Œ

์ฝ”๋“œ์„ค๋ช…
1SO2 (์•„ํ™ฉ์‚ฐ๊ฐ€์Šค)
3NO2 (์ด์‚ฐํ™”์งˆ์†Œ)
5CO (์ผ์‚ฐํ™”ํƒ„์†Œ)
6O3 (์˜ค์กด)
8PM10 (๋ฏธ์„ธ๋จผ์ง€:Particulate Matter)
9PM2.5 (์ดˆ๋ฏธ์„ธ๋จผ์ง€)
  1. [fork]๋…ธ๋“œ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ [join]๋…ธ๋“œ๋ฅผ ์‚ฌ์šฉํ•ด์„œ ์ข…๋ฃŒํ•œ๋‹ค.

์›Œํฌํ”Œ๋กœ์šฐ ์‹คํ–‰ ํ›„ ์ƒˆ๋กœ์šด ํ…Œ์ด๋ธ”์ด ์ƒ์„ฑ๋๋‹ค.

flow040

[Note] Control Node
[Flow๊ตฌ์„ฑ] ๋…ธ๋“œ ์ค‘ decision, fork, join Node๋ฅผ Control Node๋ผ๊ณ  ํ•œ๋‹ค.
์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ๋ณด๋‹ค ์›Œํฌํ”Œ๋กœ์šฐ๋ฅผ Control ํ•˜๋Š” ์šฉ๋„๋กœ ์‚ฌ์šฉํ•œ๋‹ค.