DataPilot/ArrowSmartPlus_3.6B_instruction

DataPilot committed on Mar 27, 2024

Commit 54ae212 · verified · 1 Parent(s): 87319d3

Files changed (1)

README.md +62 -0

@@ -1,3 +1,65 @@
 ---
 license: apache-2.0
+inference: false
+language: ja
 ---
+## Overview
+This is the third deliverable of team DataPilot at the "LOCAL AI HACKATHON". We fine-tuned "japanese-large-lm-3.6b-instruction-sft", developed by LINE, on Wikibooks content covering the junior high and high school curriculum.
+## How to use
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline  # Assumes transformers and torch are installed.
+# Model ID kept as written in the original README.
+model = AutoModelForCausalLM.from_pretrained("DataPilot/ArrowSmartPlus_3.6B_instant_sft")
+tokenizer = AutoTokenizer.from_pretrained("DataPilot/ArrowSmartPlus_3.6B_instant_sft")
+# device=0 places the pipeline on the first CUDA GPU.
+generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
+torch.cuda.empty_cache()
+# The prompt stays in Japanese, the model's language; roughly "What is organic matter?"
+input_text = """有機物とは"""
+text = generator(
+    f"ユーザー: {input_text} システム: ",
+    max_length=100,
+    do_sample=True,
+    temperature=0.7,
+    top_p=0.9,
+    top_k=0,
+    repetition_penalty=1.1,
+    num_beams=1,
+    pad_token_id=tokenizer.pad_token_id,
+    num_return_sequences=1,
+)
+print(text)
+```
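The snippet above prints the raw pipeline result, which for a `text-generation` pipeline is a list of dictionaries whose `generated_text` field includes the prompt by default. As a minimal sketch, assuming the `ユーザー: ... システム: ` prompt format shown above, the helpers below build the prompt and strip it from the output. The names `build_prompt` and `extract_reply` are hypothetical and not part of the model card.

```python
# Hypothetical helpers (not from the original README) for the prompt format
# "ユーザー: <input> システム: " used in the usage example.
USER_TAG = "ユーザー"      # "User"
SYSTEM_TAG = "システム"    # "System"


def build_prompt(user_text):
    """Wrap the user's text in the prompt format from the usage example."""
    return f"{USER_TAG}: {user_text} {SYSTEM_TAG}: "


def extract_reply(pipeline_output, prompt):
    """Return only the newly generated text from a text-generation result.

    pipeline_output is expected to look like [{"generated_text": "..."}],
    where generated_text starts with the prompt.
    """
    generated = pipeline_output[0]["generated_text"]
    if generated.startswith(prompt):
        generated = generated[len(prompt):]
    return generated.strip()


if __name__ == "__main__":
    prompt = build_prompt("有機物とは")
    # Stand-in for generator(prompt, ...); the reply text is a placeholder,
    # not real model output.
    fake_output = [{"generated_text": prompt + "(model reply here)"}]
    print(extract_reply(fake_output, prompt))
```

With the real pipeline, `extract_reply(generator(build_prompt(input_text), ...), build_prompt(input_text))` would be used the same way.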
+## Tokenization:
+The model uses a SentencePiece tokenizer with a unigram language model and byte fallback. No pre-tokenization by a Japanese tokenizer is applied, so users can feed raw sentences directly to the tokenizer.
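To illustrate what unigram segmentation with byte fallback means, here is a toy sketch in pure Python. It is not the model's tokenizer: the vocabulary, the probabilities, and the per-byte penalty are made-up assumptions, and the function names are hypothetical. It picks the highest-scoring split of the raw text under a unigram model and represents any character missing from the vocabulary as SentencePiece-style byte pieces such as `<0x41>`.

```python
import math

# Hypothetical toy vocabulary: piece -> log-probability.
# These values are invented for illustration, not taken from the model.
TOY_VOCAB = {
    "有機": math.log(0.2),
    "物": math.log(0.2),
    "有": math.log(0.05),
    "機": math.log(0.05),
    "と": math.log(0.25),
    "は": math.log(0.25),
}
BYTE_LOGPROB = math.log(1e-4)  # assumed score for each fallback byte piece


def byte_pieces(ch):
    """Represent one character as byte pieces, one per UTF-8 byte."""
    return [f"<0x{b:02X}>" for b in ch.encode("utf-8")]


def tokenize(text, vocab=TOY_VOCAB):
    """Viterbi search for the best segmentation under a unigram model."""
    n = len(text)
    # best[i] = (score, start index of last piece, pieces covering text[start:i])
    best = [(-math.inf, None, None)] * (n + 1)
    best[0] = (0.0, None, None)
    for end in range(1, n + 1):
        for start in range(end):
            prev_score = best[start][0]
            piece = text[start:end]
            if piece in vocab:
                cand = (prev_score + vocab[piece], start, [piece])
            elif end - start == 1:
                # Byte fallback: an unknown single character becomes its bytes.
                fallback = byte_pieces(piece)
                cand = (prev_score + BYTE_LOGPROB * len(fallback), start, fallback)
            else:
                continue
            if cand[0] > best[end][0]:
                best[end] = cand
    # Walk back from the end to recover the chosen pieces in order.
    pieces = []
    pos = n
    while pos > 0:
        _, start, last = best[pos]
        pieces = last + pieces
        pos = start
    return pieces


if __name__ == "__main__":
    print(tokenize("有機物とは"))  # raw text, no pre-tokenization step
    print(tokenize("A"))          # not in the toy vocabulary, so bytes are used
```

Because every character can always fall back to bytes, no input is out of vocabulary, which is why raw sentences can be passed in without a separate Japanese word segmenter.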
+## License:
+This LLM is open-source software. See the link below for details.
+https://www.apache.org/licenses/LICENSE-2.0
+## Acknowledgements:
+We thank Witness for lending us the equipment, さるどら for giving us this opportunity, the members of 「ローカルLLMに向き合う会」 for their advice, and everyone else involved.
+Witness:
+https://twitter.com/i_witnessed_it
+さるどら:
+https://twitter.com/sald_ra
+ローカルLLMに向き合う会:
+https://discord.com/invite/VuYCYkYaHK