diff --git a/README.md b/README.md
index 27ad5c002f75025f60e2c4ced4e70106d0911473..7e1178e1517b4bab87b92408d821df64b72cc7bb 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,14 @@
+This is a cloned version of the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) repository adapted for the Polish language. A trained model for Polish is available [here](https://minio.clarin-pl.eu/minio/public/models/gpt2/).
+
+Example run for text generation:
+```
+sh scripts/generate_text_3.sh
+```
+
+Questions: jan.kocon@pwr.edu.pl
+
+ORIGINAL README:
+
 Megatron is a large, powerful transformer. This repo is for ongoing research on training large, powerful transformer language models at scale. Currently, we support model-parallel, multinode training of [GPT2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and [BERT](https://arxiv.org/pdf/1810.04805.pdf) in mixed precision.
 
 Our codebase is capable of efficiently training a 72-layer, 8.3 Billion Parameter GPT2 Language model with 8-way model and 64-way data parallelism across 512 GPUs. We find that bigger language models are able to surpass current GPT2-1.5B wikitext perplexities in as little as 5 epochs of training.
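
Below is a minimal sketch of putting the two pieces added above together: fetching the pretrained Polish checkpoint from the MinIO link and then invoking the generation script. The checkpoint file name and the `checkpoints/` directory are assumptions (browse the MinIO listing linked in the README for the actual object names), and depending on how the MinIO instance is configured, the files may need to be downloaded through its web interface rather than with `wget`.

```
#!/bin/sh
# Sketch only: the checkpoint file name and checkpoints/ directory are
# hypothetical -- check the MinIO listing linked in the README for real names.
MODEL_URL_BASE="https://minio.clarin-pl.eu/minio/public/models/gpt2"
CHECKPOINT_FILE="polish-gpt2-checkpoint.tar.gz"   # hypothetical file name

# Download the pretrained Polish model into ./checkpoints
mkdir -p checkpoints
wget -P checkpoints "${MODEL_URL_BASE}/${CHECKPOINT_FILE}"

# Run the text-generation example from the repository root
sh scripts/generate_text_3.sh
```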