Commit add1ad1a authored by Jan Kocoń

updated README

parent e45fcde4
This is a clone of the [Megatron-LM]( repository, adapted for the Polish language. A trained model for Polish is available [here](
Example run for text generation:

```
sh scripts/
```
Megatron is a large, powerful transformer. This repo is for ongoing research on training large, powerful transformer language models at scale. Currently, we support model-parallel, multinode training of [GPT2]( and [BERT]( in mixed precision.
Our codebase is capable of efficiently training a 72-layer, 8.3 billion parameter GPT2 language model with 8-way model and 64-way data parallelism across 512 GPUs. We find that bigger language models are able to surpass the current GPT2-1.5B wikitext perplexity in as little as 5 epochs of training.
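To illustrate how 8-way model and 64-way data parallelism cover 512 GPUs (8 × 64 = 512), here is a minimal, hypothetical sketch of one common rank-grouping scheme: consecutive ranks form a model-parallel group, and ranks at the same offset across groups form a data-parallel group. This is an illustrative assumption, not the repository's actual implementation; the function name `build_parallel_groups` is invented for this sketch.

```python
# Hypothetical sketch (not this repo's code) of partitioning GPU ranks
# into model-parallel and data-parallel groups.
def build_parallel_groups(world_size=512, model_parallel_size=8):
    assert world_size % model_parallel_size == 0
    data_parallel_size = world_size // model_parallel_size  # 64 here

    # Consecutive ranks share one model-parallel group (64 groups of 8).
    model_groups = [
        list(range(i * model_parallel_size, (i + 1) * model_parallel_size))
        for i in range(data_parallel_size)
    ]
    # Ranks with the same offset inside their model group form one
    # data-parallel group (8 groups of 64).
    data_groups = [
        list(range(offset, world_size, model_parallel_size))
        for offset in range(model_parallel_size)
    ]
    return model_groups, data_groups

model_groups, data_groups = build_parallel_groups()
print(len(model_groups), len(data_groups))  # 64 8
```

Under this layout each GPU belongs to exactly one model-parallel group (splitting the layers' weights) and one data-parallel group (averaging gradients across replicas).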