Commit add1ad1a authored by Jan Kocoń

updated README

parent e45fcde4
This is a cloned version of the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) repository adapted for the Polish language. A trained model for Polish is available [here](https://minio.clarin-pl.eu/minio/public/models/gpt2/).
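The checkpoint has to be fetched from the MinIO listing above before generation can run. As a minimal sketch, assuming the listing corresponds to the `public` bucket under the `models/gpt2/` prefix and that objects are served directly over HTTPS; the object name below is a placeholder, not confirmed by the repository:
```
# Hypothetical download of the released Polish GPT2 checkpoint.
# FILE is a placeholder -- look up the real object name in the MinIO
# browser listing linked above.
FILE=model.tar.gz
wget "https://minio.clarin-pl.eu/public/models/gpt2/${FILE}"
tar -xzf "${FILE}"   # assumes a tar.gz archive; adjust to the actual format
```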
Example run for text generation:
```
sh scripts/generate_text_3.sh
```
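The contents of `scripts/generate_text_3.sh` are not reproduced here. As a rough sketch, the stock Megatron-LM text-generation scripts of this era wrap `generate_samples.py`, passing the model's hyperparameters, a checkpoint path, and sampling settings; every path, size, and sampling value below is a placeholder, and the flags actually used for the Polish model may differ:
```
#!/bin/bash
# Sketch only -- see scripts/generate_text_3.sh for the real settings.
# Checkpoint path, model sizes, and sampling parameters are placeholders;
# flag names follow the upstream Megatron-LM generation script.
CHECKPOINT_PATH=checkpoints/gpt2_polish   # placeholder path

python generate_samples.py \
       --model-parallel-size 1 \
       --num-layers 24 \
       --hidden-size 1024 \
       --num-attention-heads 16 \
       --load $CHECKPOINT_PATH \
       --out-seq-length 256 \
       --temperature 0.9 \
       --top_k 0 \
       --top_p 0.9
```
In the upstream script, a `top_k` of 0 together with a non-zero `top_p` selects nucleus sampling; the Polish script may choose different sampling defaults.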
Questions: jan.kocon@pwr.edu.pl
ORIGINAL README:
Megatron is a large, powerful transformer. This repo is for ongoing research on training large, powerful transformer language models at scale. Currently, we support model-parallel, multinode training of [GPT2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and [BERT](https://arxiv.org/pdf/1810.04805.pdf) in mixed precision.
Our codebase is capable of efficiently training a 72-layer, 8.3 billion parameter GPT2 language model with 8-way model parallelism and 64-way data parallelism across 512 GPUs. We find that bigger language models are able to surpass current GPT2-1.5B wikitext perplexities in as little as 5 epochs of training.
......