TL;DR
Portugal announced AMÁLIA, a €5.5 million investment in a large language model focused on European Portuguese. The project is open-source but currently lacks model weights and full data release. It marks a significant step for Portuguese NLP, with ongoing challenges in data and benchmarking.
Portugal’s government announced the launch of AMÁLIA, a €5.5 million project to develop a fully open-source large language model focused on European Portuguese, marking a significant step in language-specific AI research.
AMÁLIA is a collaborative effort involving leading Portuguese universities and research labs, including NOVA, IST, IT, and FCT. It builds upon the EuroLLM project, continuing pre-training with a focus on European Portuguese data, primarily sourced from Arquivo.pt and synthetically generated datasets.
Despite its open-source ambitions, the project has not yet released model weights, training logs, or the full datasets, limiting immediate practical use. The model’s training involved 107 billion tokens, with only about 5.8 billion tokens from Portuguese sources, representing roughly 5.5% of the total training data.
AMÁLIA has demonstrated strong performance on Portuguese benchmarks, outperforming some state-of-the-art models like Qwen 3-8B on most tests, but still lags on specific benchmarks such as ALBA. The project also introduced four new benchmarks tailored for European Portuguese, focusing on grammar, syntax, general knowledge, and bias towards Brazilian Portuguese.
Why It Matters
This development is significant because it represents a dedicated effort to create AI models that understand and generate European Portuguese, a language with limited NLP resources compared to global languages like English or Chinese. It highlights Portugal’s commitment to advancing local AI capabilities and the importance of language-specific models in fostering digital sovereignty and cultural preservation.
However, the limited data and current lack of model weights mean practical applications are still in the future. The project underscores ongoing challenges in data availability, benchmarking, and open resource sharing for smaller languages.

Portuguese for Beginners: Practical Learning with SynapseLingo (Learn Portuguese)
As an affiliate, we earn on qualifying purchases.
As an affiliate, we earn on qualifying purchases.
Background
Portugal has historically lagged behind in NLP research compared to English-speaking countries, largely due to limited data and resources. The AMÁLIA project follows earlier efforts like EuroLLM and aligns with international trends toward language-specific models, such as Italy’s Minerva. The project reflects a broader push for regional AI development, emphasizing open-source principles despite current limitations in resource sharing.
Previous benchmarks and models have often focused on Brazilian Portuguese, which differs significantly from European Portuguese, making specialized models like AMÁLIA crucial for accurate language processing in Portugal.
“AMÁLIA aims to treat European Portuguese as a first-class citizen in AI, leveraging dedicated data and collaborative efforts.”
— Portuguese research team lead
“Despite the impressive work, the lack of open model weights and datasets limits immediate usability, raising questions about transparency and progress.”
— Hacker News source

Fado Portugues – Songs from the Soul of Portugal
Melody/Lyrics/Chords
As an affiliate, we earn on qualifying purchases.
As an affiliate, we earn on qualifying purchases.
What Remains Unclear
It remains unclear when the full model weights, datasets, and training logs will be publicly released. The impact of the relatively small proportion of Portuguese data in training on model performance is also still under assessment. Additionally, the effectiveness of the benchmarks in capturing Portuguese language nuances needs further validation.

AI Language Translator Device, 2026 Upgraded VORMOR Translator No WiFi Needed, Support ChatGPT, Instant Two-Way 150 Languages Translation, Offline/Photo Translation for Business Travel
【AI Translator Supporting 150 Languages】Vormor instant translator adopts the latest technology, ultra-fast and accurate translation, the response time…
As an affiliate, we earn on qualifying purchases.
As an affiliate, we earn on qualifying purchases.
What’s Next
Next steps include the release of model weights and datasets, further benchmarking, and community engagement to improve Portuguese NLP tools. Continued development will likely focus on increasing Portuguese data, refining benchmarks, and expanding practical applications.

Portuguese Flash Cards – Learn Portuguese Language Vocabulary Words and Phrases – Basic Language for Beginners – Gift for Travelers, Kids, and Adults by Travelflips
PORTUGUESE FLASH CARDS – Basic Portuguese words and phrases for beginners and travelers
As an affiliate, we earn on qualifying purchases.
As an affiliate, we earn on qualifying purchases.
Key Questions
Will the AMÁLIA model be publicly available for use?
It is not yet confirmed when the model weights and datasets will be released. Currently, only some components like processing scripts are accessible.
How does AMÁLIA compare to other multilingual or Portuguese models?
AMÁLIA outperforms some models like Qwen 3-8B on Portuguese benchmarks but still faces challenges in specific areas like the ALBA benchmark, indicating room for improvement.
What are the main challenges in developing European Portuguese NLP models?
The primary challenges include limited Portuguese data, especially high-quality, diverse datasets, and the need for benchmarks that accurately reflect Portuguese language and cultural nuances.
Why is open-sourcing important for models like AMÁLIA?
Open-sourcing promotes transparency, community-driven improvements, and broader accessibility, especially vital for smaller languages with limited commercial NLP resources.