AutoDL

Introduction to AutoDL

AutoDL automates hyperparameter tuning for deep learning models, reducing the time needed to search for the optimal hyperparameter combination. To make search results easy to compare at a glance, it also provides graphs that visualize performance for each hyperparameter combination. To minimize the steps an analyst has to go through, it automatically generates Docker images and auto-completes the trialTemplate, and it supports a variety of hyperparameter tuning algorithms to suit the purpose of the analysis. To use hyperparameter tuning in AutoDL, first create a Docker image and a trialTemplate in Model Management, then create an AutoDL experiment.

๋ชจ๋ธ ๊ด€๋ฆฌ๋ถ€

Image


  1. Image list
Field | Description
Image name | Name that identifies the image
Status | BUILDING, UPLOADING, COMPLETE, FAIL
Workspace name | Workspace where the Python model file is located
Description | Description of the image
Registered | Date and time the image was built
Owner | Creator information
Action | Edit: view image details and edit the description; Delete: delete the image
  2. Add image
  • Automatically builds a Docker image from the Python model file (.py) selected by the user and the filestorage folder
  • Deep learning frameworks: Tensorflow1, Tensorflow2, Keras, Pytorch, MXNet
    GPU versions are supported for every framework except MXNet
  • Select the Python model file from the Jupyter Notebook home directory
  • Click the 'Build' button to create the Docker image

trialTemplate


  • A trialTemplate is created automatically from each image built in the 'Image' tab
  • The number of workers and the cpu, memory, and gpu resource options can be set easily


  • You can also write a trialTemplate yourself by clicking the 'Add template' button
  • Selecting Job, TFJob, or PyTorchJob provides a basic trialTemplate for that job type

AutoDL

AutoDL experiment list


Field | Description
Name | Title that identifies the AutoDL Experiment
Status | CREATED, RUNNING, SUCCEEDED, FAIL, UNKNOWN
Registered | Date and time the AutoDL Experiment was created
Owner | Creator information
Action | Delete: delete the Experiment; View YAML file: view the YAML file that created the Experiment and copy it to the clipboard

Creating an AutoDL experiment


  1. Enter YAML file information
  • Settings made in 1 are reflected in the code mirror in 2
  • Click the 'Reset' button to reset the settings
  • Metadata
    • Name: title that identifies the AutoDL Experiment
    • Namespace: name of the namespace in which the experiment runs. Fixed value
  • Trial Spec
    • Specification used to create trial pods
    • Choose from the trialTemplates created in the trialTemplate tab of Model Management
  • Metrics Collector
    • Specifies how evaluation metrics are collected
      • StdOut: collects evaluation metrics that match the regular expression defined in metricsFormat
        metricsFormat must be set. Default: ([\w|-]+)\s*=\s*((-?\d+)(\.\d+)?)
      • TensorflowEvent: collects evaluation metrics from logs written with tf.summary (tensorflow2 not supported)
        The log path must be set
      • File: write the evaluation metrics to a log file; they are collected from that file according to metricsFormat
        The log path and metricsFormat must be set. Default: ([\w|-]+)\s*=\s*((-?\d+)(\.\d+)?)
  • Algorithm Name
  • Common Parameters
    • In AutoDL, a trial is one evaluation of model performance using a single hyperparameter combination
    • For example, with ParallelTrialCount=3 and MaxTrialCount=12, 3 trials run concurrently and this is repeated 4 times, for 12 trials in total
      • ParallelTrialCount: number of trials to run concurrently
      • MaxTrialCount: total number of trials to run. The experiment ends when this number is reached
      • MaxFailedTrialCount: maximum number of failed trials. The experiment ends when this number is reached
  • Parameters
    • ํƒ์ƒ‰ํ•  ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ์™€ ๊ทธ ๋ฒ”์œ„๋ฅผ ์„ค์ •
      • Name: ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ๋ช…
      • Parameter Type: ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ์˜ ํƒ€์ž…์„ double, int, categorical ์ค‘์—์„œ ์„ ํƒ
      • Type: ๋ฒ”์œ„์˜ ํƒ€์ž…์„ FeasibleSpace, list ์ค‘์—์„œ ์„ ํƒ
      • Range: Type์ด FeasibleSpace์ด๋ฉด Min๊ณผ Max๋กœ, Type์ด list์ด๋ฉด ๋ฒ”์ฃผํ˜•์œผ๋กœ ํƒ์ƒ‰ ๋ฒ”์œ„๋ฅผ ์„ค์ •
  • Objective
    • Sets the evaluation metric and its target value
      • Type: choose Maximize or Minimize
      • Goal: target value of the evaluation metric
      • ObjectiveMetricName: name of the evaluation metric
  2. Read YAML file
  • Edit: modify the yaml file
  • Upload: upload a yaml file written locally in advance
  • Create: create and run the experiment

Result details


  • Provides a results table and graph for each hyperparameter combination, based on the evaluation metric set in ObjectiveMetricName


  • In VIEW EXPERIMENT, check the yaml file used to run the AutoDL experiment and the best combination found

์ฃผ์˜์‚ฌํ•ญ

Notes on writing code

  • Set hyperparameters through an argument parser

    from argparse import ArgumentParser

    def build_parser():
        """Parse command-line settings and hyperparameters; returns the parsed namespace."""
        parser = ArgumentParser()
        # Output folder settings
        parser.add_argument("--save_path", dest="save_path", default="filestorage/result/I_event_model", type=str)
        parser.add_argument("--dir_name", dest="dir_name", default=None, type=str)
        # Log options
        parser.add_argument("--log_name", dest="log_name", default="train_event_log.txt", type=str)
        # Common options
        parser.add_argument("--device", dest="device", default="gpu", help='gpu, cpu, cuda', type=str)
        # Loader options
        parser.add_argument("--label_path", dest="label_path", default='filestorage/result/F_label/future_class_hynix_1d.csv', type=str)
        parser.add_argument("--test_portion", dest="test_portion", default=0.19, type=float)
        parser.add_argument("--window_size", dest="window_size", default=3, type=int)
        parser.add_argument("--max_doc_length", dest="max_doc_length", default=50, type=int)
        parser.add_argument("--max_sent_length", dest="max_sent_length", default=3, type=int)
        parser.add_argument("--num_worker", dest="num_worker", default=4, type=int)
        # Model options
        parser.add_argument("--hidden_size", dest="hidden_size", default=16, type=int)
        parser.add_argument("--n_layers", dest="n_layers", default=2, type=int)
        parser.add_argument("--dropout_p", dest="dropout_p", default=0.01, type=float)
        parser.add_argument("--num_class", dest="num_class", default=2, type=int)
        # Training options
        parser.add_argument("--n_epochs", dest="n_epochs", default=5, type=int)
        parser.add_argument("--lr", dest="lr", default=0.001, type=float)
        parser.add_argument("--early_stop", dest="early_stop", default=70, type=int)
        parser.add_argument("--batch_size", dest="batch_size", default=128, type=int)
        config = parser.parse_args()
        return config
  • Print evaluation metrics in the format defined by the Metrics Collector's metricsFormat so that AutoDL can collect them

    print("lastEpoch {} f1-score={:.4f} accuracy={:.4f}".format(str(history["last_epoch"]), f1_score, accuracy))
  • ๋ฉ”์ธ ๋ชจ๋ธ ํŒŒ์ผ ์™ธ์˜ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ๋ฐ ์ปค์Šคํ…€ ๋ชจ๋“ˆ ๋“ฑ์„ filestorage ํด๋”์— ์œ„์น˜์‹œํ‚ด
    filestorage ํด๋” ๋ช…์„ ์ˆ˜์ •ํ•˜๊ฑฐ๋‚˜ ์‚ญ์ œํ•˜๋ฉด ์•ˆ ๋จ
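
To check locally that a printed line will be picked up, a metricsFormat pattern can be applied to it with Python's re module. The sketch below assumes the default pattern allows optional whitespace around "=". The helper parse_metrics and the sample values are hypothetical and only illustrate the matching, not AutoDL's actual collector.

```python
import re

# Assumed default metricsFormat (optional whitespace around "=").
METRICS_FORMAT = r"([\w|-]+)\s*=\s*((-?\d+)(\.\d+)?)"


def parse_metrics(line, pattern=METRICS_FORMAT):
    """Return {metric_name: value} for every name=value pair found in line."""
    return {m.group(1): float(m.group(2)) for m in re.finditer(pattern, line)}


# Same layout as the print statement above, with made-up values.
line = "lastEpoch {} f1-score={:.4f} accuracy={:.4f}".format("5", 0.8123, 0.9)
print(parse_metrics(line))  # {'f1-score': 0.8123, 'accuracy': 0.9}
```

Note that "lastEpoch 5" is not collected because it has no "=" between the name and the value.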

ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹ ์•Œ๊ณ ๋ฆฌ์ฆ˜

  • Grid Search

    • When using int or double parameters, a step (interval) must be set
  • Hyperband

    • resource_name, eta, and r_l must be set
    • The hyperparameter set as resource_name must be added to Parameters
    • eta and r_l determine the minimum value of ParallelTrialCount
      smax=int(math.log(r_l)/math.log(eta))
      max_parallel=int(math.ceil(eta**smax))
    • ParallelTrialCount and MaxTrialCount must have the same value
    • For example, if resource_name is --epoch, eta=3, and r_l=9,
      eta and r_l give a minimum ParallelTrialCount of 9
      If ParallelTrialCount is set to 9 and epoch to 27, 9 trials run concurrently for 3 epochs each
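
The minimum ParallelTrialCount for Hyperband can be computed ahead of time with a small helper that wraps the two-line formula above. The function name min_parallel_trial_count is hypothetical. This is a sketch of the arithmetic only, not of how AutoDL validates the setting.

```python
import math


def min_parallel_trial_count(eta, r_l):
    """Minimum ParallelTrialCount implied by eta and r_l."""
    # smax: how many times r_l can be divided by eta (truncated log ratio).
    smax = int(math.log(r_l) / math.log(eta))
    return int(math.ceil(eta ** smax))


print(min_parallel_trial_count(eta=3, r_l=9))  # 9
```

Because smax is a truncated ratio of floating-point logarithms, it is worth checking the result by hand for values of r_l that are exact powers of eta.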