This is a cloned version of the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) repository adapted for the Polish language. A trained model for Polish is available [here](https://minio.clarin-pl.eu/minio/public/models/gpt2/).
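
Before running the generation example below, the checkpoint has to be fetched from the MinIO link above. A minimal sketch, assuming the usual MinIO direct-object URL form; the archive name `gpt2_polish.zip` and target directory are placeholders, so browse the link for the actual published files:
```sh
# Hypothetical download sketch: the object path and archive name below are
# placeholders -- check https://minio.clarin-pl.eu/minio/public/models/gpt2/
# for the files actually published for the Polish GPT2 model.
mkdir -p checkpoints/gpt2-polish
wget -P checkpoints/gpt2-polish \
  "https://minio.clarin-pl.eu/public/models/gpt2/gpt2_polish.zip"
unzip checkpoints/gpt2-polish/gpt2_polish.zip -d checkpoints/gpt2-polish
```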
Example run for text generation:
```sh
# Generate text with the trained Polish GPT2 model
sh scripts/generate_text_3.sh
```
Questions: jan.kocon@pwr.edu.pl
ORIGINAL README:
Megatron is a large, powerful transformer. This repo is for ongoing research on training large, powerful transformer language models at scale. Currently, we support model-parallel, multinode training of [GPT2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and [BERT](https://arxiv.org/pdf/1810.04805.pdf) in mixed precision.
Our codebase is capable of efficiently training a 72-layer, 8.3 billion parameter GPT2 language model with 8-way model and 64-way data parallelism across 512 GPUs. We find that bigger language models are able to surpass current GPT2-1.5B wikitext perplexities in as little as 5 epochs of training.
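
The GPU count follows from the product of the two parallelism degrees: 8-way model parallelism × 64-way data parallelism = 512 GPUs. Below is a minimal launch sketch of that arithmetic, assuming the upstream `pretrain_gpt2.py` entry point and the `--model-parallel-size` flag of this Megatron-LM version; node counts, environment variables, and the omitted model/training arguments are placeholders, not a verified configuration.
```sh
# Illustrative arithmetic only: 8-way model parallelism x 64-way data parallelism
#   => 8 * 64 = 512 GPUs in total (e.g. 64 nodes with 8 GPUs each).
# Script and flag names follow this vintage of Megatron-LM and may differ;
# the remaining model and data arguments are omitted here.
GPUS_PER_NODE=8
NNODES=64
WORLD_SIZE=$((GPUS_PER_NODE * NNODES))   # 512

python -m torch.distributed.launch \
  --nproc_per_node $GPUS_PER_NODE \
  --nnodes $NNODES \
  --node_rank $NODE_RANK \
  --master_addr $MASTER_ADDR \
  --master_port 6000 \
  pretrain_gpt2.py \
  --model-parallel-size 8 \
  --fp16
  # ...model and data arguments go here; data-parallel size = 512 / 8 = 64
```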
......