tsmatz/mt5_summarize_japanese
(Japanese caption : 日本語の要約のモデル)
This model is a fine-tuned version of google/mt5-small trained for Japanese summarization.
The model is fine-tuned on BBC news articles from the XL-Sum Japanese dataset, in which the first sentence (the headline sentence) is used as the summary and the remaining sentences are used as the article.
So, please enter a news story (covering, for example, the event, background, result, and comment) as the source text in the inference widget. (Other corpora, such as conversations, business documents, academic papers, or short tales, are not seen in the training set.)
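The headline-as-summary convention described above can be illustrated with a small sketch. It assumes that sentences end with the Japanese full stop `。`; the function name `split_headline_and_body` and the sample story are hypothetical and are not taken from the author's fine-tuning code, and the dataset itself may already provide the summary and the article as separate fields.

```python
from typing import Tuple


def split_headline_and_body(story: str, delimiter: str = "。") -> Tuple[str, str]:
    """Split a news story into (first sentence, remaining text).

    Assumption: the first occurrence of `delimiter` ends the headline sentence.
    """
    head, sep, rest = story.partition(delimiter)
    # Keep the delimiter on the headline so it stays a complete sentence.
    return (head + sep).strip(), rest.strip()


# Hypothetical three-sentence story, used only for illustration.
story = "日本はドイツに2対1で勝ちました。試合は前半、ドイツのペースで始まりました。後半に日本が逆転しました。"
summary, article = split_headline_and_body(story)
print(summary)   # the headline sentence, used as the target summary
print(article)   # the rest of the story, used as the source article
```

Text that contains no `。` at all is returned entirely as the headline with an empty article, so real preprocessing would probably want to filter such cases out.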
It achieves the following results on the evaluation set:
- Loss: 1.8952
- Rouge1: 0.4625
- Rouge2: 0.2866
- Rougel: 0.3656
- Rougelsum: 0.3868
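For readers unfamiliar with these metrics, the following is a minimal sketch of how a ROUGE-1 F-measure can be computed from unigram overlap. It is not the evaluation code behind the scores above: the function name `rouge1_f1` and the token lists are hypothetical, and the sketch assumes the text has already been split into tokens (the tokenization used for the reported scores is not stated here).

```python
from collections import Counter
from typing import Sequence


def rouge1_f1(reference: Sequence[str], candidate: Sequence[str]) -> float:
    """F-measure of unigram overlap between a reference and a candidate summary."""
    ref_counts = Counter(reference)
    cand_counts = Counter(candidate)
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical pre-tokenized summaries, used only for illustration.
reference = ["日本", "が", "ドイツ", "に", "勝利"]
candidate = ["日本", "が", "ドイツ", "を", "破る"]
score = rouge1_f1(reference, candidate)
print(round(score, 4))
```

ROUGE-2 follows the same pattern with bigrams instead of unigrams, while ROUGE-L is based on the longest common subsequence rather than n-gram counts.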
You can download the source code for fine-tuning from here.
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 90
- num_epochs: 10
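Two of these values interact in ways that are easy to misread, so here is a small worked sketch. It shows how the total batch size of 32 follows from the per-step batch size and gradient accumulation (which is consistent with training on a single device), and how a linear schedule with warmup shapes the learning rate. The function `linear_schedule_with_warmup` is a hypothetical helper written for illustration, and `ASSUMED_TOTAL_STEPS` is a rough estimate inferred from the step and epoch columns of the training results (about 277 steps per epoch over 10 epochs), not a reported value.

```python
def linear_schedule_with_warmup(
    step: int, base_lr: float, warmup_steps: int, total_steps: int
) -> float:
    """Learning rate at `step`: linear ramp-up, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Effective batch size per optimizer update (single device assumed).
train_batch_size = 2
gradient_accumulation_steps = 16
effective_batch_size = train_batch_size * gradient_accumulation_steps
print(effective_batch_size)

# Assumption: estimated total number of optimizer steps, not a reported value.
ASSUMED_TOTAL_STEPS = 2770
for step in (0, 45, 90, 1430, ASSUMED_TOTAL_STEPS):
    lr = linear_schedule_with_warmup(step, 0.0005, 90, ASSUMED_TOTAL_STEPS)
    print(step, lr)
```

Under these assumptions the learning rate peaks at 0.0005 exactly when warmup ends at step 90 and then falls linearly toward zero at the final step.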
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
| 4.2501 | 0.36 | 100 | 3.3685 | 0.3114 | 0.1654 | 0.2627 | 0.2694 |
| 3.6436 | 0.72 | 200 | 3.0095 | 0.3023 | 0.1634 | 0.2684 | 0.2764 |
| 3.3044 | 1.08 | 300 | 2.8025 | 0.3414 | 0.1789 | 0.2912 | 0.2984 |
| 3.2693 | 1.44 | 400 | 2.6284 | 0.3616 | 0.1935 | 0.2979 | 0.3132 |
| 3.2025 | 1.8 | 500 | 2.5271 | 0.3790 | 0.2042 | 0.3046 | 0.3192 |
| 2.9772 | 2.17 | 600 | 2.4203 | 0.4083 | 0.2374 | 0.3422 | 0.3542 |
| 2.9133 | 2.53 | 700 | 2.3863 | 0.3847 | 0.2096 | 0.3316 | 0.3406 |
| 2.9383 | 2.89 | 800 | 2.3573 | 0.4016 | 0.2297 | 0.3361 | 0.3500 |
| 2.7608 | 3.25 | 900 | 2.3223 | 0.3999 | 0.2249 | 0.3461 | 0.3566 |
| 2.7864 | 3.61 | 1000 | 2.2293 | 0.3932 | 0.2219 | 0.3297 | 0.3445 |
| 2.7846 | 3.97 | 1100 | 2.2097 | 0.4386 | 0.2617 | 0.3766 | 0.3826 |
| 2.7495 | 4.33 | 1200 | 2.1879 | 0.4100 | 0.2449 | 0.3481 | 0.3551 |
| 2.6092 | 4.69 | 1300 | 2.1515 | 0.4398 | 0.2714 | 0.3787 | 0.3842 |
| 2.5598 | 5.05 | 1400 | 2.1195 | 0.4366 | 0.2545 | 0.3621 | 0.3736 |
| 2.5283 | 5.41 | 1500 | 2.0637 | 0.4274 | 0.2551 | 0.3649 | 0.3753 |
| 2.5947 | 5.77 | 1600 | 2.0588 | 0.4454 | 0.2800 | 0.3828 | 0.3921 |
| 2.5354 | 6.14 | 1700 | 2.0357 | 0.4253 | 0.2582 | 0.3546 | 0.3687 |
| 2.5203 | 6.5 | 1800 | 2.0263 | 0.4444 | 0.2686 | 0.3648 | 0.3764 |
| 2.5303 | 6.86 | 1900 | 1.9926 | 0.4455 | 0.2771 | 0.3795 | 0.3948 |
| 2.4953 | 7.22 | 2000 | 1.9576 | 0.4523 | 0.2873 | 0.3869 | 0.4053 |
| 2.4271 | 7.58 | 2100 | 1.9384 | 0.4455 | 0.2811 | 0.3713 | 0.3862 |
| 2.4462 | 7.94 | 2200 | 1.9230 | 0.4530 | 0.2846 | 0.3754 | 0.3947 |
| 2.3303 | 8.3 | 2300 | 1.9311 | 0.4519 | 0.2814 | 0.3755 | 0.3887 |
| 2.3916 | 8.66 | 2400 | 1.9213 | 0.4598 | 0.2897 | 0.3688 | 0.3889 |
| 2.5995 | 9.03 | 2500 | 1.9060 | 0.4526 | 0.2820 | 0.3733 | 0.3946 |
| 2.3348 | 9.39 | 2600 | 1.9021 | 0.4595 | 0.2856 | 0.3762 | 0.3988 |
| 2.4035 | 9.74 | 2700 | 1.8952 | 0.4625 | 0.2866 | 0.3656 | 0.3868 |
Transformers 4.23.1
Pytorch 1.12.1+cu102
Datasets 2.6.1
Tokenizers 0.13.1
Intended uses
```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a summarization pipeline.
seq2seq = pipeline("summarization", model="tsmatz/mt5_summarize_japanese")
# Sample Japanese news story (Japan vs. Germany at the World Cup in Qatar).
sample_text = "サッカーのワールドカップカタール大会、世界ランキング24位でグループEに属する日本は、23日の1次リーグ初戦において、世界11位で過去4回の優勝を誇るドイツと対戦しました。試合は前半、ドイツの一方的なペースではじまりましたが、後半、日本の森保監督は攻撃的な選手を積極的に動員して流れを変えました。結局、日本は前半に1点を奪われましたが、途中出場の堂安律選手と浅野拓磨選手が後半にゴールを決め、2対1で逆転勝ちしました。ゲームの流れをつかんだ森保采配が功を奏しました。"
# Run summarization and print the raw pipeline output.
result = seq2seq(sample_text)
print(result)
```
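The summarization pipeline returns a list with one dictionary per input, and each dictionary holds the generated text under the `summary_text` key. The sketch below shows one way to pull out the plain strings. It runs on a mock result so that it works without downloading the model; the helper name `extract_summaries` and the placeholder string are hypothetical and are not real model output.

```python
from typing import Dict, List


def extract_summaries(result: List[Dict[str, str]]) -> List[str]:
    """Collect the generated summary strings from a summarization pipeline result."""
    return [item["summary_text"] for item in result]


# Mock of the pipeline's return shape; the text is a placeholder, not model output.
mock_result = [{"summary_text": "(placeholder summary, not real model output)"}]
summaries = extract_summaries(mock_result)
print(summaries[0])
```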