d0rj/rut5-base-summ
- ru - en - summarization - dialogue-summarization - text2text-generation - t5 - d0rj/samsum-ru - IlyaGusev/gazeta - zjkarina/matreshka - rc...
Model Documentation
rut5-base-summ
Model
Finetuned ai-forever/ruT5-base for text and dialogue summarization.
Data
All 'train' subsets were concatenated and shuffled with seed 1000 7.
Train subset = 155678 rows.
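The concatenate-then-shuffle step can be illustrated with a minimal standard-library sketch. The toy rows, the `concat_and_shuffle` helper, and the `SEED` value are hypothetical placeholders, not the data or seed used for this model. With the Hugging Face `datasets` library the same idea is usually expressed with `concatenate_datasets` followed by `Dataset.shuffle(seed=...)`.

```python
import random


def concat_and_shuffle(subsets, seed):
    """Concatenate several lists of rows and shuffle them reproducibly."""
    rows = [row for subset in subsets for row in subset]
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(rows)
    return rows


# Toy stand-ins for the 'train' splits of two source datasets (hypothetical).
dialogues = [{"text": f"dialogue {i}", "source": "dialogue"} for i in range(3)]
news = [{"text": f"article {i}", "source": "news"} for i in range(4)]

SEED = 0  # placeholder value, not the seed used for this model
train = concat_and_shuffle([dialogues, news], SEED)
print(len(train))  # 7
```

Reusing the same seed reproduces the same row order, which makes the training split repeatable across runs.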
Metrics
Evaluation on 10% of concatenated 'validation' subsets = 1458 rows.
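How the 10% evaluation sample was drawn is not stated here. One possible approach, shown as a rough sketch, is a seeded random sample; the `sample_percent` helper, the toy data, and the seed are assumptions for illustration only.

```python
import random


def sample_percent(rows, percent, seed):
    """Return a reproducible random sample holding `percent`% of `rows`."""
    k = len(rows) * percent // 100  # integer arithmetic avoids float rounding
    return random.Random(seed).sample(rows, k)


# Toy stand-in for the concatenated 'validation' subsets (hypothetical).
validation = [f"example {i}" for i in range(200)]
eval_rows = sample_percent(validation, percent=10, seed=0)
print(len(eval_rows))  # 20
```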
See WandB logs.
See report at REPORT WIP.
Notes
> Scheduler, optimizer, and trainer states are saved in this repo, so you can use them to continue fine-tuning on your own data from the existing optimizer state.
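Before resuming, it can help to confirm that a local copy of the repo actually contains the saved states. The sketch below assumes the files use the default names that the `transformers` `Trainer` writes into a checkpoint directory (`optimizer.pt`, `scheduler.pt`, `trainer_state.json`); whether this repo follows that naming is an assumption, and `missing_state_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Default file names written by the transformers Trainer into a checkpoint.
# Assumption: the state files in the downloaded repo use the same names.
EXPECTED_STATE_FILES = ("optimizer.pt", "scheduler.pt", "trainer_state.json")


def missing_state_files(checkpoint_dir):
    """List the expected training-state files absent from `checkpoint_dir`."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_STATE_FILES if not (root / name).is_file()]


# Demo on a throwaway directory that only holds an optimizer state file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "optimizer.pt").touch()
    missing = missing_state_files(tmp)

print(missing)  # ['scheduler.pt', 'trainer_state.json']
```

If the returned list is empty, the directory can be passed to `Trainer.train(resume_from_checkpoint=...)`, which restores the optimizer and scheduler states before training continues.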
Usage
Summarization pipeline
```python
from transformers import pipeline

# Placeholder input; replace with the text or dialogue to summarize.
text = '...'

pipe = pipeline('summarization', model='d0rj/rut5-base-summ')
pipe(text)
```
Text-to-text generation
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Placeholder input; replace with the text or dialogue to summarize.
text = '...'

tokenizer = T5Tokenizer.from_pretrained('d0rj/rut5-base-summ')
model = T5ForConditionalGeneration.from_pretrained('d0rj/rut5-base-summ').eval()

# Tokenize the input, generate summary token ids, then decode them to text.
input_ids = tokenizer(text, return_tensors='pt').input_ids
outputs = model.generate(input_ids)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
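The text-to-text snippet relies on default generation settings. As a hedged sketch, the same steps can be wrapped in a function that truncates long inputs and passes explicit generation arguments. The values in `GENERATION_KWARGS` and `MAX_INPUT_TOKENS` are illustrative assumptions, not settings recommended by the model author, and `load` and `summarize` are hypothetical helper names.

```python
from functools import lru_cache

MODEL_NAME = 'd0rj/rut5-base-summ'

# Illustrative values only; they are not taken from the model card.
GENERATION_KWARGS = {'max_new_tokens': 128, 'num_beams': 4, 'no_repeat_ngram_size': 3}
MAX_INPUT_TOKENS = 512  # assumed input cap, not stated in the model card


@lru_cache(maxsize=1)
def load():
    """Load the tokenizer and model once and reuse them across calls."""
    # Imported lazily so this module can be imported without loading the model.
    from transformers import T5Tokenizer, T5ForConditionalGeneration
    tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME).eval()
    return tokenizer, model


def summarize(text):
    """Summarize one text, truncating inputs longer than MAX_INPUT_TOKENS."""
    import torch
    tokenizer, model = load()
    inputs = tokenizer(text, return_tensors='pt',
                       truncation=True, max_length=MAX_INPUT_TOKENS)
    with torch.no_grad():  # inference only, so gradients are not needed
        outputs = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `summarize(text)` downloads the model on first use and returns the decoded summary string. Beam search and a larger token budget usually give longer summaries at the cost of slower generation.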