Publication Result Detail

Towards Efficient Scheduling of Transformer Neural Network Computation for Edge AI Deployment

SEDLÁK, D.; KLHŮFEK, J.; MRÁZEK, V.; VAŠÍČEK, Z.

Type

Conference proceedings paper not indexed in WoS or Scopus

Abstract

Transformer neural networks have gained popularity in recent years, demonstrating remarkable performance across many application domains. However, inference on resource-constrained embedded hardware remains challenging due to Transformers' substantial computational demands. We aim to address this problem by exploiting the inherent parallelism of the multi-head self-attention operations of Transformers to achieve a speedup in processing on embedded hardware. In this paper, we present an evolutionary-based scheduling approach for the distribution and allocation of Transformer operations across the systolic array-based hardware accelerators used for execution. Our methodology takes as input specifications of the Transformer workload and the target systolic array architecture and explores the large mapping space to identify an efficient plan of operation-to-array assignments. The plans are evaluated against a hardware-aware cost model, capturing the cost in computational cycles of a given operation on a given systolic array, with the objective of minimizing the total sum across all operations. Through extensive experimental evaluations across diverse systolic array dimensions, we demonstrate that our evolutionary-based scheduler surpasses conventional heuristics and finds plans offering up to a 33.8% average reduction in overall cycle count.
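To illustrate the kind of search the abstract describes, the following is a minimal, self-contained sketch of evolving operation-to-array assignments. It is not the authors' scheduler or cost model: the operation dimensions, array shapes, the toy tiling-based cycle formula, and the use of makespan (maximum per-array load) as the fitness are all invented here for illustration, whereas the paper minimizes the total cycle sum under its own hardware-aware cost model.

```python
import math
import random

random.seed(0)

# Hypothetical workload: (M, K, N) dimensions of matmul-like Transformer ops.
OPS = [(64, 64, 64), (128, 64, 128), (32, 512, 32),
       (256, 64, 64), (64, 256, 64), (128, 128, 128)]
# Hypothetical accelerator: systolic arrays of different (rows, cols) shapes.
ARRAYS = [(8, 8), (16, 16), (32, 8)]

def cycles(op, arr):
    """Toy cycle model: tile the M x N output over the array; each tile
    streams K values plus a fill/drain latency of rows + cols cycles."""
    M, K, N = op
    r, c = arr
    return math.ceil(M / r) * math.ceil(N / c) * (K + r + c)

def makespan(assign):
    """Fitness: arrays work in parallel, so cost is the largest per-array load."""
    loads = [0] * len(ARRAYS)
    for i, a in enumerate(assign):
        loads[a] += cycles(OPS[i], ARRAYS[a])
    return max(loads)

def greedy():
    """Conventional baseline: put each op on the currently cheapest array."""
    loads = [0] * len(ARRAYS)
    assign = []
    for op in OPS:
        a = min(range(len(ARRAYS)), key=lambda j: loads[j] + cycles(op, ARRAYS[j]))
        loads[a] += cycles(op, ARRAYS[a])
        assign.append(a)
    return assign

def evolve(pop_size=40, gens=200, mut=0.2):
    # Seed the population with the greedy plan plus random plans; with
    # elitism, the evolved result can never be worse than greedy.
    pop = [greedy()] + [[random.randrange(len(ARRAYS)) for _ in OPS]
                        for _ in range(pop_size - 1)]
    for _ in range(gens):
        pop.sort(key=makespan)
        nxt = pop[:4]                                  # elitism
        while len(nxt) < pop_size:
            p, q = random.sample(pop[:20], 2)          # truncation selection
            child = [random.choice(g) for g in zip(p, q)]  # uniform crossover
            if random.random() < mut:                  # point mutation
                child[random.randrange(len(child))] = random.randrange(len(ARRAYS))
            nxt.append(child)
        pop = nxt
    return min(pop, key=makespan)

best = evolve()
print("greedy makespan :", makespan(greedy()))
print("evolved makespan:", makespan(best))
```

Because the greedy plan is injected into the initial population and preserved by elitism, the evolved plan is guaranteed to be at least as good as the heuristic baseline, mirroring the paper's comparison against conventional heuristics.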

Keywords

transformer networks, edge AI, evolutionary algorithms

Published

14.07.2025

Publisher

Association for Computing Machinery

Place

Malaga

ISBN

979-8-4007-1464-1

Book

Proceedings of the Genetic and Evolutionary Computation Conference Companion

Pages from

2242

Pages to

2248

Number of pages

7

BibTeX

@inproceedings{BUT197537,
  author="David {Sedlák} and Jan {Klhůfek} and Vojtěch {Mrázek} and Zdeněk {Vašíček}",
  title="Towards Efficient Scheduling of Transformer Neural Network Computation for Edge AI Deployment",
  booktitle="Proceedings of the Genetic and Evolutionary Computation Conference Companion",
  year="2025",
  pages="2242--2248",
  publisher="Association for Computing Machinery",
  address="Malaga",
  doi="10.1145/3712255.3734345",
  isbn="979-8-4007-1464-1"
}