Recent advances in natural language processing (NLP) and large language models (LLMs) have been driven largely by access to vast amounts of high-quality training data. However, such data remains unevenly distributed across domains, languages, and stylistic variations, posing a significant challenge known as data sparsity. This dissertation investigates how machine translation (MT) models and LLMs can adapt effectively under data-sparse conditions, emphasizing generalization, robustness, and efficiency. We introduce the concept of adaptation to data sparsity (ADaS) and present a systematic exploration across three dimensions: domain, language, and style. The dissertation proposes and evaluates novel adaptation strategies, including robust meta-learning, multilingual meta-adaptation, bidirectional contrastive learning, cross-lingual instruction tuning, neuron-level style control, and joint activation editing. These methods are validated on a range of tasks and benchmarks, demonstrating improvements in performance, sample efficiency, and generalization under low-resource conditions. The contributions of this thesis span six peer-reviewed publications and include new datasets, methodologies, and frameworks that enhance the adaptability of both MT systems and LLMs. Taken together, they advance our understanding of how modern NLP models can operate reliably and fairly under data limitations or imbalances, laying the groundwork for more inclusive and resource-efficient language technologies.