HyperCLOVA X: Leading AI Sovereignty in South Korea - 2. How NAVER Cloud developed the leading Sovereign AI for South Korea

(02) How NAVER Cloud developed the leading Sovereign AI for South Korea

HyperCLOVA X is the world's third hyperscale AI language model to be made available to the public. It is the result of years of research and development by NAVER Cloud. To build a competitive LLM, NAVER Cloud focused on four key aspects: the back-bone model, high-quality data, tuning/customization, and supercomputing infrastructure.

Back-bone Model

NAVER has invested heavily in hyperscale AI since fall 2020 and, in 2021, became the third company in the world to reveal its LLM, HyperCLOVA, to the public. After years of continued investment in R&D, NAVER Cloud developed a high-performance back-bone model and announced it ahead of other global tech companies.

High-quality Data

HyperCLOVA X is an LLM trained on NAVER's high-quality data, which includes a wide range of user-generated Korean content from NAVER Blog, Café, and Knowledge-iN, accumulated with user consent over more than 20 years. The training data contain 6,500 times more Korean-language data than GPT-3's, and carry relevant information on Korean culture and lifestyle.

Because NAVER Cloud is heavily invested in building an AI model that suits Korean culture, it conducted research to build a Korean social bias dataset called ‘KoSBi’. ‘KoSBi’ can be used to train safe sentence classifiers that filter and mitigate social biases in LLMs. According to the research, the ratio of unsafe generations decreased by 16.47% after filtering based on ‘KoSBi’ data, and HyperCLOVA showed better qualitative performance after filtering than GPT-3 (KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application, 2023). NAVER Cloud also conducted research to construct a Korean bias benchmark dataset called ‘KoBBQ’ to improve how well AI models reflect Korean culture (KoBBQ: Korean Bias Benchmark for Question Answering). These datasets also make HyperCLOVA X safer and more trustworthy.
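
The filtering idea described above can be illustrated with a minimal Python sketch. It is not NAVER Cloud's implementation: the keyword-based `toy_is_safe` function is a hypothetical stand-in for a safe sentence classifier trained on ‘KoSBi’-style labels, and the example sentences and placeholder markers are invented for illustration only.

```python
from typing import Callable, List

# Hypothetical placeholder markers standing in for biased content.
# A real system would use a trained classifier, not keyword matching.
UNSAFE_MARKERS = {"<stereotype>", "<insult>"}


def toy_is_safe(sentence: str) -> bool:
    """Assumed stand-in for a safe sentence classifier."""
    return not any(marker in sentence for marker in UNSAFE_MARKERS)


def filter_generations(
    candidates: List[str], is_safe: Callable[[str], bool]
) -> List[str]:
    """Keep only the generated sentences the classifier judges safe."""
    return [s for s in candidates if is_safe(s)]


def unsafe_ratio(sentences: List[str], is_safe: Callable[[str], bool]) -> float:
    """Share of sentences judged unsafe (0.0 for an empty list)."""
    if not sentences:
        return 0.0
    return sum(1 for s in sentences if not is_safe(s)) / len(sentences)


# Invented model outputs; two of the four carry a placeholder marker.
candidates = [
    "People from this region enjoy many kinds of food.",
    "People from this region are all <stereotype>.",
    "Older workers bring valuable experience.",
    "Older workers are <insult>.",
]

kept = filter_generations(candidates, toy_is_safe)
before = unsafe_ratio(candidates, toy_is_safe)
# Scoring with the same toy function is trivially zero after filtering;
# a real evaluation would need an independent judge.
after = unsafe_ratio(kept, toy_is_safe)
print(f"unsafe ratio before: {before:.2f}, after: {after:.2f}")
```

In this sketch the classifier is applied after generation, so the language model itself is unchanged and only its outputs are screened.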

By utilizing high-quality data, HyperCLOVA X can understand context and answer questions better in Korean, improving its usefulness for Korean users. 

Tuning/Customization

NAVER Cloud has many years of experience in cloud-based platform development and operation, as well as experts who specialize in LLMOps. HyperCLOVA X is continuously tuned and customized to meet the needs of actual customers.

Infrastructure

NAVER Cloud owns multiple supercomputers capable of training LLMs with more than 100B parameters and has years of operational experience in maintaining this infrastructure. NAVER Cloud has the largest AI data center in Korea and is developing optimized AI semiconductors to provide a safe, high-quality infrastructure that can efficiently support HyperCLOVA X.
