AMÁLIA and the future of European Portuguese LLMs

TL;DR

Portugal announced AMÁLIA, a €5.5 million investment in a large language model focused on European Portuguese. The project is open-source but currently lacks model weights and full data release. It marks a significant step for Portuguese NLP, with ongoing challenges in data and benchmarking.

Portugal’s government announced the launch of AMÁLIA, a €5.5 million project to develop a fully open-source large language model focused on European Portuguese, marking a significant step in language-specific AI research.

AMÁLIA is a collaborative effort involving leading Portuguese universities and research labs, including NOVA, IST, IT, and FCT. It builds upon the EuroLLM project, continuing pre-training with a focus on European Portuguese data, primarily sourced from Arquivo.pt and synthetically generated datasets.

Despite its open-source ambitions, the project has not yet released model weights, training logs, or the full datasets, limiting immediate practical use. The model’s training involved 107 billion tokens, with only about 5.8 billion tokens from Portuguese sources, representing roughly 5.5% of the total training data.

AMÁLIA has demonstrated strong performance on Portuguese benchmarks, outperforming some state-of-the-art models like Qwen 3-8B on most tests, but still lags on specific benchmarks such as ALBA. The project also introduced four new benchmarks tailored for European Portuguese, focusing on grammar, syntax, general knowledge, and bias towards Brazilian Portuguese.

Why It Matters

This development is significant because it represents a dedicated effort to create AI models that understand and generate European Portuguese, a language with limited NLP resources compared to global languages like English or Chinese. It highlights Portugal’s commitment to advancing local AI capabilities and the importance of language-specific models in fostering digital sovereignty and cultural preservation.

However, the limited data and current lack of model weights mean practical applications are still in the future. The project underscores ongoing challenges in data availability, benchmarking, and open resource sharing for smaller languages.

AI Translator Language Translator Device Spanish & English Practice Companion – Supports 165 Languages，Tu compañero para practicar español e inglés,Travel Essentials traductor de idiomas portatil

Real-Time 165-Language Translation: Instant two-way translation with 0.5s latency
AI Language Tutor & Accent Support: Native pronunciation correction for Mexican Spanish and regional accents
Smart Vocabulary Flashcards: Automatically saves and reviews new words for better learning

View Latest Price

As an affiliate, we earn on qualifying purchases.

Background

Portugal has historically lagged behind in NLP research compared to English-speaking countries, largely due to limited data and resources. The AMÁLIA project follows earlier efforts like EuroLLM and aligns with international trends toward language-specific models, such as Italy’s Minerva. The project reflects a broader push for regional AI development, emphasizing open-source principles despite current limitations in resource sharing.

Previous benchmarks and models have often focused on Brazilian Portuguese, which differs significantly from European Portuguese, making specialized models like AMÁLIA crucial for accurate language processing in Portugal.

“AMÁLIA aims to treat European Portuguese as a first-class citizen in AI, leveraging dedicated data and collaborative efforts.”

— Portuguese research team lead

“Despite the impressive work, the lack of open model weights and datasets limits immediate usability, raising questions about transparency and progress.”

— Hacker News source

Fado Portugues – Songs from the Soul of Portugal

Includes Melody, Lyrics, and Chords: Melody, lyrics, and chords included
Number of Pages: 144 pages
Instrumentation Details: Features melody, lyrics, and chords

View Latest Price

As an affiliate, we earn on qualifying purchases.

What Remains Unclear

It remains unclear when the full model weights, datasets, and training logs will be publicly released. The impact of the relatively small proportion of Portuguese data in training on model performance is also still under assessment. Additionally, the effectiveness of the benchmarks in capturing Portuguese language nuances needs further validation.

Language Translator Device No WiFi Needed, Tradutor de Idiomas, Upgrade AI Translator Support ChatGPT, Real Time Voice Translation with 150+ Languages, Offline/Photo Translation for Business, Learning

Supports 138 Online Languages: Fast online translation for 138 languages
Offline Translation for 16 Languages: Accurate offline translation in 16 languages
Real-Time Voice Translation: Instant voice translation with AI support

View Latest Price

As an affiliate, we earn on qualifying purchases.

What’s Next

Next steps include the release of model weights and datasets, further benchmarking, and community engagement to improve Portuguese NLP tools. Continued development will likely focus on increasing Portuguese data, refining benchmarks, and expanding practical applications.

Portuguese Flash Cards – Learn Portuguese Language Vocabulary Words and Phrases – Basic Language for Beginners – Gift for Travelers, Kids, and Adults by Travelflips

Language: Portuguese vocabulary and phrases
Learning Aid: Phonetic pronunciation and English translation
Quality: Durable, high-quality cards with portable box

View Latest Price

As an affiliate, we earn on qualifying purchases.

Key Questions

Will the AMÁLIA model be publicly available for use?

It is not yet confirmed when the model weights and datasets will be released. Currently, only some components like processing scripts are accessible.

How does AMÁLIA compare to other multilingual or Portuguese models?

AMÁLIA outperforms some models like Qwen 3-8B on Portuguese benchmarks but still faces challenges in specific areas like the ALBA benchmark, indicating room for improvement.

What are the main challenges in developing European Portuguese NLP models?

The primary challenges include limited Portuguese data, especially high-quality, diverse datasets, and the need for benchmarks that accurately reflect Portuguese language and cultural nuances.

Why is open-sourcing important for models like AMÁLIA?

Open-sourcing promotes transparency, community-driven improvements, and broader accessibility, especially vital for smaller languages with limited commercial NLP resources.

AMÁLIA and the future of European Portuguese LLMs

Up next

Private equity firm EQT to buy Japan restaurant review operator for $3.7b

Author

The Genius Factory Team

Share article

Why It Matters

AI Translator Language Translator Device Spanish & English Practice Companion – Supports 165 Languages，Tu compañero para practicar español e inglés,Travel Essentials traductor de idiomas portatil

Background

Fado Portugues – Songs from the Soul of Portugal

What Remains Unclear

Language Translator Device No WiFi Needed, Tradutor de Idiomas, Upgrade AI Translator Support ChatGPT, Real Time Voice Translation with 150+ Languages, Offline/Photo Translation for Business, Learning

What’s Next

Portuguese Flash Cards – Learn Portuguese Language Vocabulary Words and Phrases – Basic Language for Beginners – Gift for Travelers, Kids, and Adults by Travelflips

Key Questions

Will the AMÁLIA model be publicly available for use?

How does AMÁLIA compare to other multilingual or Portuguese models?

What are the main challenges in developing European Portuguese NLP models?

Why is open-sourcing important for models like AMÁLIA?

What Practical Quantum Advantage Could Mean for Business

The Real Cost of a Local-Inference Rig in 2026

How Fish Finders Became Data Tools for Serious Anglers

The Link Between SDL3 And Better Gaming Signals In Minecraft Java Edition

AI Student Planners: Your Key To Success In 2026

12 Best Scientific Calculator For Students In 2026

Will The Maximum Temperature Be >90° On Jul 29, 2026?

Floodlight Cameras Do More Than Light Up a Driveway

AMÁLIA and the future of European Portuguese LLMs

Up next

Author

The Genius Factory Team

Share article

Why It Matters

AI Translator Language Translator Device Spanish & English Practice Companion – Supports 165 Languages，Tu compañero para practicar español e inglés,Travel Essentials traductor de idiomas portatil

Background

Fado Portugues – Songs from the Soul of Portugal

What Remains Unclear

Language Translator Device No WiFi Needed, Tradutor de Idiomas, Upgrade AI Translator Support ChatGPT, Real Time Voice Translation with 150+ Languages, Offline/Photo Translation for Business, Learning

What’s Next

Portuguese Flash Cards – Learn Portuguese Language Vocabulary Words and Phrases – Basic Language for Beginners – Gift for Travelers, Kids, and Adults by Travelflips

Key Questions

Will the AMÁLIA model be publicly available for use?

How does AMÁLIA compare to other multilingual or Portuguese models?

What are the main challenges in developing European Portuguese NLP models?

Why is open-sourcing important for models like AMÁLIA?

You May Also Like