customCode

Definition

User-defined code can be entered using PySpark DataFrame functions. From the [Data Processing (Advanced)] nodes on the left, drag and drop the [customCode] node. Click the [More+] button in the Property panel to see all the Property items that can be entered.

Set

For the [setting] and [parameter] settings, see [Create Workflow] > [Settings].

property

[Node Description] Enter a name for the node being configured
customCode

  1. code : Write code that uses functions available on a DataFrame (filter(), drop(), limit(), etc.)
  2. variableName : Enter the variable name
  3. variableType : Enter the variable type (spark DF, pandas DF, RDD)
  4. overwriteSchema : Redefine the schema of the execution result (if unchecked, the previous schema is used as-is)
  5. newSchema

Note

The dataset can be modified (only a single line of input is allowed)

  • ex. withColumn('temp_filled_spark', filled_column)
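
Because only one line can be entered, it may help to check a snippet before pasting it into the code property. The following is a minimal sketch, not part of the product: it assumes the node applies the snippet as a method call on the incoming dataset (which the example above suggests, since it has no leading receiver), and the helper name is_single_line_call_chain and the placeholder receiver df are hypothetical.

```python
import ast


def is_single_line_call_chain(snippet: str) -> bool:
    """Return True if `snippet` is one line that parses as a chain of method calls."""
    snippet = snippet.strip()
    if not snippet or "\n" in snippet:
        return False
    try:
        # Assumption: the node applies the snippet to the incoming dataset,
        # so a placeholder receiver named `df` is prefixed before parsing.
        tree = ast.parse("df." + snippet, mode="eval")
    except SyntaxError:
        return False
    node = tree.body
    # Walk the chain from the outermost call inward; every link must be
    # a call on an attribute, e.g. df.filter(...).limit(...).
    while isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        node = node.func.value
    # The chain is valid only if it bottoms out at the placeholder receiver.
    return isinstance(node, ast.Name) and node.id == "df"


candidates = [
    "withColumn('temp_filled_spark', filled_column)",
    'filter("age > 20").limit(3)',
    "limit(3)\ndrop('x')",  # two lines, so rejected
]
for candidate in candidates:
    print(repr(candidate), is_single_line_call_chain(candidate))
```

The check is purely syntactic: it does not confirm that the method exists on a DataFrame or that names such as filled_column are defined in the node's environment.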

Example

Build a workflow that loads data from HDFS and uses PySpark's limit() function to display only 3 rows.

  1. Drag and drop the [Load HDFS] and [customCode] nodes onto the Designer to create the workflow
    customCode
  2. Enter the values shown below in the [customCode] node, then click snapshot
    customCode