From add1ad1a4e84741b9c4114aac578205c0d904d65 Mon Sep 17 00:00:00 2001
From: Jan Kocon <jan.kocon@pwr.edu.pl>
Date: Wed, 3 Mar 2021 16:10:02 +0100
Subject: [PATCH] updated README

---
 README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/README.md b/README.md
index 27ad5c0..7e1178e 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,14 @@
+This is a clone of the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) repository, adapted for the Polish language. A trained model for Polish is available [here](https://minio.clarin-pl.eu/minio/public/models/gpt2/).
+
+Example run for text generation:
+```
+sh scripts/generate_text_3.sh
+```
+
+Questions: jan.kocon@pwr.edu.pl
+
+ORIGINAL README:
+
 Megatron is a large, powerful transformer. This repo is for ongoing research on training large, powerful transformer language models at scale. Currently, we support model-parallel, multinode training of [GPT2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and [BERT](https://arxiv.org/pdf/1810.04805.pdf) in mixed precision.
 
 Our codebase is capable of efficiently training a 72-layer, 8.3 Billion Parameter GPT2 Language model with 8-way model and 64-way data parallelism across 512 GPUs. We find that bigger language models are able to surpass current GPT2-1.5B wikitext perplexities in as little as 5 epochs of training.
--
GitLab
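
The patch references `scripts/generate_text_3.sh` but does not include its contents. For orientation, generation wrappers in Megatron-LM forks of this era typically follow the pattern of upstream's `scripts/generate_text.sh`, roughly as sketched below; the checkpoint path, model dimensions, tokenizer type, and the `generate_samples.py` entry point are assumptions for illustration, not the fork's actual settings, which should be taken from the script itself and the published Polish checkpoint.

```
#!/bin/bash
# Hypothetical sketch of a Megatron-LM text-generation wrapper, modeled on
# upstream's scripts/generate_text.sh. The checkpoint path, model dimensions,
# tokenizer type, and the generate_samples.py entry point are assumptions;
# consult scripts/generate_text_3.sh in this fork for the real values.

CHECKPOINT_PATH=checkpoints/gpt2_polish   # assumed location of the downloaded Polish checkpoint
MPSIZE=1                                  # model-parallel degree used at inference time
NLAYERS=24                                # architecture values must match the trained checkpoint
NHIDDEN=1024
NATT=16
MAXSEQLEN=1024

# Sampling parameters: temperature plus nucleus (top-p) sampling.
TEMP=0.9
TOPK=0
TOPP=0.9

python generate_samples.py \
       --model-parallel-size $MPSIZE \
       --num-layers $NLAYERS \
       --hidden-size $NHIDDEN \
       --num-attention-heads $NATT \
       --max-position-embeddings $MAXSEQLEN \
       --load $CHECKPOINT_PATH \
       --tokenizer-type GPT2BPETokenizer \
       --fp16 \
       --out-seq-length $MAXSEQLEN \
       --temperature $TEMP \
       --top_k $TOPK \
       --top_p $TOPP
```

In upstream Megatron-LM of this generation, the driver script then prompts interactively for a context string on stdin and prints the sampled continuation; whether the Polish wrapper behaves the same way is not shown in the patch.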