This is a cloned version of the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) repository adapted for the Polish language. A trained model for Polish is available [here](https://minio.clarin-pl.eu/minio/public/models/gpt2/).
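
Before running the generation example below, the checkpoint has to be fetched from the MinIO link above. A minimal sketch, assuming the usual MinIO direct-object URL form; the archive name `gpt2_polish.zip` and target directory are placeholders, so browse the link for the actual published files:
```sh
# Hypothetical download sketch: the object path and archive name below are
# placeholders -- check https://minio.clarin-pl.eu/minio/public/models/gpt2/
# for the files actually published for the Polish GPT2 model.
mkdir -p checkpoints/gpt2-polish
wget -P checkpoints/gpt2-polish \
  "https://minio.clarin-pl.eu/public/models/gpt2/gpt2_polish.zip"
unzip checkpoints/gpt2-polish/gpt2_polish.zip -d checkpoints/gpt2-polish
```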
Example run for text generation:
```sh
# Generate text with the trained Polish GPT2 model
sh scripts/generate_text_3.sh
```
Questions: jan.kocon@pwr.edu.pl
ORIGINAL README:
Megatron is a large, powerful transformer. This repo is for ongoing research on training large, powerful transformer language models at scale. Currently, we support model-parallel, multinode training of [GPT2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and [BERT](https://arxiv.org/pdf/1810.04805.pdf) in mixed precision.
Our codebase is capable of efficiently training a 72-layer, 8.3 billion parameter GPT2 language model with 8-way model and 64-way data parallelism across 512 GPUs. We find that bigger language models are able to surpass current GPT2-1.5B wikitext perplexities in as little as 5 epochs of training.
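
The GPU count follows from the product of the two parallelism degrees: 8-way model parallelism × 64-way data parallelism = 512 GPUs. Below is a minimal launch sketch of that arithmetic, assuming the upstream `pretrain_gpt2.py` entry point and the `--model-parallel-size` flag of this Megatron-LM version; node counts, environment variables, and the omitted model/training arguments are placeholders, not a verified configuration.
```sh
# Illustrative arithmetic only: 8-way model parallelism x 64-way data parallelism
#   => 8 * 64 = 512 GPUs in total (e.g. 64 nodes with 8 GPUs each).
# Script and flag names follow this vintage of Megatron-LM and may differ;
# the remaining model and data arguments are omitted here.
GPUS_PER_NODE=8
NNODES=64
WORLD_SIZE=$((GPUS_PER_NODE * NNODES))   # 512

python -m torch.distributed.launch \
  --nproc_per_node $GPUS_PER_NODE \
  --nnodes $NNODES \
  --node_rank $NODE_RANK \
  --master_addr $MASTER_ADDR \
  --master_port 6000 \
  pretrain_gpt2.py \
  --model-parallel-size 8 \
  --fp16
  # ...model and data arguments go here; data-parallel size = 512 / 8 = 64
```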
......