model_id	GPUメモリ（VRAM)使用量 ※4bit量子化を使用	ストレージ使用量	使用したGPU
meta-llama/Llama-2-7b-hf	6.7GB	13GB	NVIDIA A4000 16GB x 1
meta-llama/Llama-2-13b-hf	10.3GB	25GB	NVIDIA A4000 16GB x 1
meta-llama/Llama-2-70b-hf	37.9GB	129GB	NVIDIA A100 80GB x 1
meta-llama/Llama-2-7b-chat-hf	6.7GB	13GB	NVIDIA A4000 16GB x 1
meta-llama/Llama-2-13b-chat-hf	10.1GB	25GB	NVIDIA A4000 16GB x 1
meta-llama/Llama-2-70b-chat-hf	37.9GB	129GB	NVIDIA A100 80GB x 1

HugginFaceの記事によると量子化を行わない場合は、Llama-2-70bの場合で、140GBのGPUメモリが必要になります。またGithubでは、8つのマルチGPU構成（=MP 8）を使用することを推奨されています。

Llama2を使用するための環境

Llama2は、基本的にはモデルをローカル環境またはクラウド環境にダウンロードして利用しますが、他にもブラウザやAPIを介して利用する方法もあります。

それぞれの使い方について詳しく説明していきます。

ローカルPC（Windows、Mac、Linux）
GPUクラウドサービス（Linux）
ブラウザ（デモサイト）
Web APIを介して利用する

Windows + GPUのローカル環境

Llama2-7bモデルの推論を実行する場合、GPUメモリ（VRAM）が16GBのローカルPCで利用できます。

ただし、70bモデルの推論やモデルのファインチューニングになると、より大きなGPUメモリが必要です。本格的なLLM開発には、マルチGPUで高速に計算ができるNVIDIA A100やH100が推奨されます。

Windows + GPUのローカル環境の使い方については、以下の記事で詳しく解説しています。

【Llama2】Windows GPUでの使い方この記事では、ローカルのWindows環境を使ってLlama2によるテキスト生成をする方法（推論）について紹介しています。業界最安級GPUクラウド | GPUSOROBAN

Windows + CPUのローカル環境（Llama.cpp、Llama-cpp-python）

Llama.cppはCPUやM1チップだけでLLMを動かせるランタイムです。

このランタイムでは、4bitの量子化を使ってモデルを軽くし、CPUでもLlama2を実行できます。ただし、処理速度はかなり低下しますので、できるだけGPUを使うことをお勧めします。

Llama.cppはC言語で書かれていますが、Pythonで動くlama-cpp-pythonも利用できます。

Windows + CPUのローカル環境で「Llama.cpp」を使う方法は、以下の記事で詳しく解説しています。

【Llama2】Winodws CPUでのLlama.cppの使い方この記事では、Windows環境のCPUを使ってLlama.cppによるテキスト生成をする方法（推論）について紹介しています。業界最安級GPUクラウド | GPUSOROBAN

Windows + CPUのローカル環境で「Llama-cpp-python」を使う方法については、以下の記事で解説しています。

【Llama2】 Windows CPUでのllama-cpp-pythonの使い方この記事では、Windows CPUのローカル環境を使ってLlama-cpp-pythonによるテキスト生成をする方法（推論）について紹介しています。業界最安級GPUクラウド | GPUSOROBAN

Macのローカル環境（Llama.cpp、Llama-cpp-python）

Llama.cppはもともとMacBookでLlamaを実行することを目指して開発されたランタイムです。

Macの環境では、主にM1チップを利用して、4bit量子化されたLlama2モデルを実行することが可能です。

Llama-2-7bの軽量モデルの推論には対応していますが、より大きなモデルの推論やファインチューニングにはGPUの使用をお勧めします。

Llama.cppはC言語で記述されていますが、Pythonで動くLlama-cpp-pythonも使用できます。

Macのローカル環境での「Llama.cpp」の具体的な使い方は、以下の記事で詳しく解説されています。

【Llama2】Macでの Llama.cppの使い方この記事では、Macのローカル環境を使ってLlama.cppによるテキスト生成をする方法（推論）について紹介しています。業界最安級GPUクラウド | GPUSOROBAN

「Llama-cpp-python」をMacのローカル環境で使う方法についても、以下の記事で解説しています。

【Llama2】Macでの llama-cpp-pythonの使い方この記事では、Macのローカル環境を使ってLlama-cpp-pythonによるテキスト生成をする方法（推論）について紹介しています。業界最安級GPUクラウド | GPUSOROBAN

GPUクラウドサービスとは

ハイスペックなPCの購入や運用にハードルを感じる方には、GPUクラウドサービスがおすすめです。

GPU搭載のサーバーをクラウド上で利用できるため、ハイスペックなPCを購入する必要はありません。PCを購入してしまうと後からスペックを上げることが難しいですが、クラウドサーバーなら時間単位で借りられて、スペックの変更も簡単です。

インターネット環境さえあれば、ローカルPCは低スペックでも問題なく、Windows・MacのどちらのPCからでも接続して利用できます。

GPUクラウドサービスでは、以下のようなサービスが提供されています。

Google Colab
GPUSOROBAN
Amazon Web Service(AWS)
Microsoft Azure（Azure）
Google Cloud Platform（GCP）

Google Colaboratory(Colab)の環境

Google Colab（Colab）は、Googleが提供するブラウザ上でJupyter Notebook形式のPython実行サービスです。Colab上でLlama2を実行することが可能です。

Colabは通常のクラウドサービスとは異なり、一時的に利用するための簡便なサービスです。そのため、セッションが自動的に切れたり、環境やデータの保存ができなかったり、GPUの使用に制限があるなど、いくつかの制約がありますので、ご注意ください。

ColabでLlama2を使用する方法については、以下の記事で詳しく解説しています。

【Llama2】Google Colabでの使い方この記事では、Google Colabの環境を使ってLlama2によるテキスト生成をする方法（推論）について紹介しています。業界最安級GPUクラウド | GPUSOROBAN

GPUクラウドサービス(GPUSOROBAN)の環境

GPUSOROBANは、業界最安級のGPUクラウドサービスで、1時間50円から利用できます。

AWSやGCP、AzureのGPUクラウドと比較して50％～70%も安い料金で、高性能なNVIDIA GPUを活用できます。Google ColabのようなGPU使用量の制限やランタイムリセット時にデータが削除される心配もありません。

GPUに特化したシンプルなサービスで、クラウドサーバー（インスタンス）の設定はわずか3分で完了します。データ転送料やストレージコストもインスタンス利用料に含まれ、明瞭で理解しやすい料金体系を採用しています。

国内最上位のNVIDIAエリートパートナーに認定され、日本人による技術サポートも無料で提供されます。

GPUSOROBANを使ったLlama2の使い方については、以下の記事で詳しく解説しています。

【Llama2】GPUSOROBANでの使い方(Ubuntu) この記事では、GPUSOROBANの環境を使ってLlama2によるテキストを生成（推論）する方法について紹介しています。業界最安級GPUクラウド | GPUSOROBAN

Llama2を追加学習・ファインチューニングする方法

Llama2のモデルを新しい領域やタスクに適応させるためにファインチューニング（追加学習）を行うことができます。

専門的な情報やローカルな情報など、未知のデータを使ってモデルをファインチューニングすることで、Llama2ベースの独自のモデルを作成できます。

Llama2をファインチューニングする手順については、以下の記事で詳しく解説しています。

【Llama2】追加学習・ファインチューニング | 7b・13b・70b この記事ではLlama2のモデルをファインチューニングする方法を解説しています。業界最安級GPUクラウド | GPUSOROBAN

Text generation web UIの使い方

Text generation web UI とは、言語生成AI（LLM）のモデルをGUIで使用できるツールです。

無料でインストールすることができ、Web UIの起動後は、コーディング不要でテキスト生成ができるようになります。

【Llama2】Text generation web UIのインストール・使い方この記事では、Text generation web UIからLlama2の日本語モデルを使って、テキスト生成する方法を解説しています。業界最安級GPUクラウド | GPUSOROBAN

ブラウザでLlama2を体験してみる

llama2.aiは、手軽にLlama2を試せるデモサイトです。

ブラウザのWebアプリ上でプロンプトを入力するだけで、簡単にテキスト生成ができます。

llama2.aiはあくまでもデモサイトになりますので、本格的に使用する場合は、ローカル環境やクラウドの環境を用意する必要があります。

llama2.aiの詳しい使い方は、以下の記事で解説されています。

【Llama2】無料でブラウザでお試しできる | llama2.ai この記事では、Llama2をWEBブラウザで使う方法について解説しています。業界最安級GPUクラウド | GPUSOROBAN

Llama2をAPIで使う方法（Replicate API）

Replicateは、手軽に生成AIモデルを実行できるクラウドプラットフォームです。

このプラットフォーム上で提供される生成AIモデルは、Web APIを通じてアクセス可能です。

APIを利用することで、数行のコードを実装するだけで、簡単にLlama2を実行できます。

Llama2のAPIを使用した具体的な手順については、以下の記事で詳しく解説しています。

【Llama2】APIでの使い方 | Replicate API この記事では、Llama2をWeb API経由で使う方法について解説しています。業界最安級GPUクラウド | GPUSOROBAN

Llama2を推論で動かす

ここからはLlama2のChatモデルを利用して、以下の推論タスクを実行していきます。

テキスト生成
質問応答
テキスト分類
テキスト要約
テキスト抽出
日本語翻訳
コード生成

Metaへのモデル利用申請とHuggingFaceの設定

Llama2を利用する前に、Meta社とHuggingFaceへのモデルの利用申請を行います。

設定が完了したら、HuggingFaceのアクセストークンを後で使いますので、メモしておきます。

Metaへのモデル利用申請・HuggingFaceの設定方法について、以下の記事で詳しく解説しています。

【Llama2】Meta・HuggingFaceへの利用申請この記事では、Llama2を使用するためのMeta・HuggingFaceへの利用申請について解説しています。業界最安級GPUクラウド | GPUSOROBAN

推論環境の準備

この記事では、Jupyter Lab形式でLlama2の推論を行います。そのため、環境構築の手順はそれぞれの環境に合わせて記事をご確認ください。

ローカルPC（Windows）の環境構築に関する詳細は、以下の記事で解説しています。

GPUSOROBAN（Ubuntu）の環境構築に関する詳細は、以下の記事で解説しています。

Google Colabの環境構築に関する詳細は、以下の記事で解説しています。

ライブラリのインストール

Jupyter Labを起動したら、新しいNotebookを開きます。

Notebookのコードセルに以下のコマンドを実行し、必要なライブラリをインストールします。

コードセルにコマンドを入力し、[Alt]+[Enter]キーで実行できます。

Windows, GPUSOROBANの場合

pip install transformers sentencepiece accelerate bitsandbytes scipy

Google Colabの場合

!pip install transformers sentencepiece accelerate bitsandbytes scipy

必要なライブラリをインポートします。

import torch
from torch import cuda,bfloat16
from transformers import AutoTokenizer,AutoModelForCausalLM
import transformers

モデルの設定

HuggingFaceのtransformersというライブラリを使用してモデルの準備をします。

HuggingFaceで利用申請したLlamaのモデルを読み込みます。

この段階でモデルがGPUメモリにロードされますので、しばらく時間がかかります。

model_id = "meta-llama/Llama-2-70b-chat-hf"

この記事ではLlama-2-70b-chat-hfのパラメータ70bのチャットモデルを使用していますが、他のモデルを使いたい場合は、表を参考に適宜model_idを変更してください。

model_id	GPUメモリ（VRAM)使用量 ※4bit量子化を使用	ストレージ使用量	使用したGPU
meta-llama/Llama-2-7b-hf	6.7GB	13GB	NVIDIA A4000 16GB x 1
meta-llama/Llama-2-13b-hf	10.3GB	25GB	NVIDIA A4000 16GB x 1
meta-llama/Llama-2-70b-hf	37.9GB	129GB	NVIDIA A100 80GB x 1
meta-llama/Llama-2-7b-chat-hf	6.7GB	13GB	NVIDIA A4000 16GB x 1
meta-llama/Llama-2-13b-chat-hf	10.1GB	25GB	NVIDIA A4000 16GB x 1
meta-llama/Llama-2-70b-chat-hf	37.9GB	129GB	NVIDIA A100 80GB x 1

HuggingFaceにアクセスするためのトークンを設定します。

token = 'hf_*******************************'

HuggignFaceでのアクセストークンの発行方法は以下の記事で解説しています。

モデルの量子化の設定を行います。
量子化は、モデルのパラメータや活性化関数などを低bitに変換する技術で、モデルサイズを軽量化することができます。

quant_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

この記事では、モデルのパラメータを4bitでロードするように設定し、4bitの計算に使用されるデータ型をBFloat16に設定しています。

モデルを読み込みます。初回はモデルをダウンロードするため時間がかかりますが、
2回目以降はモデルの読み込み

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    token=token,
    quantization_config=quant_config,
    device_map="auto"
)

トークナイザーを読み込みます。

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    token=token
)

transformers ライブラリの pipeline 関数を使用して、テキスト生成のためのパイプラインを構築します。

pipeline = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    repetition_penalty=1.1
)

テキスト生成のタスク

プロンプトの実行

タイ旅行のスケジュールを作成するプロンプトを実行してみます。

prompt = """USER:I'm planning to stay in Thailand 1day, so please make a travel plan and schedule.
SYSTEM:"""
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=40,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=300,
)
print(sequences[0]["generated_text"])

生成結果

タイの観光地やスケジュールなどプロンプトに忠実な回答が得られました。
便宜上max_length=300のトークン数で生成していますが、生成するトークン数の上限を増やすこともできます。

USER:I'm planning to stay in Thailand 1day, so please make a travel plan and schedule.
SYSTEM: Great! Thailand is an amazing country with a rich culture and history. Here's a suggested itinerary for your one-day trip:
9:00 AM - Arrival in Bangkok

Start your day by exploring the city's bustling streets and taking in the sights and sounds of Bangkok. Visit the Grand Palace, Wat Phra Kaew, and Wat Arun for a glimpse into Thailand's royal heritage.
12:00 PM - Lunch at a local restaurant
Head to a local restaurant to try some delicious Thai cuisine. You can sample dishes like tom yum soup, green curry, or pad thai.
1:30 PM - Explore Chinatown
Take a stroll through Bangkok's vibrant Chinatown district and discover its hidden gems. From street food stalls to traditional shops, there's plenty to see and experience.
3:00 PM - Shopping at MBK Center
Spend some time shopping at MBK Center, one of Bangkok's largest shopping malls. You can find everything from fashion items to

日本語翻訳

USER:タイに1日滞在する予定なので、旅行計画とスケジュールを立ててください。

システム: 素晴らしい! タイは豊かな文化と歴史を持つ素晴らしい国です。 日帰り旅行におすすめの旅程は次のとおりです。

9:00 AM - バンコク到着
街のにぎやかな通りを散策し、バンコクの景色や音を満喫することで 1 日を始めましょう。 王宮、ワット プラケオ、ワット アルンを訪れて、タイ王室の伝統を垣間見てみましょう。

12:00 PM - 地元のレストランでのランチ
地元のレストランに行って、おいしいタイ料理を試してみませんか。 トムヤムクンスープ、グリーンカレー、パッタイなどの料理を試すことができます。

1:30 PM - チャイナタウンを探索
バンコクの活気に満ちたチャイナタウン地区を散策し、隠れた名所を発見してください。 屋台から伝統的なお店まで、見どころや体験がたくさんあります。

3:00 PM - MBK センターでショッピング
バンコク最大のショッピング モールの 1 つである MBK センターでショッピングをお楽しみください。 ファッションアイテムから何でも揃う

質問応答のタスク

プロンプトの実行

LLMパラメーターの”temperature”を調整した場合にテキスト生成にどのような影響を及ぼすか質問をしました。

prompt = """USER:How does adjusting the LLM parameter temperature change the generated results?
SYSTEM:"""
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=40,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=300,
)
print(sequences[0]["generated_text"])

生成結果

LLMパラメーターの”temperature”の質問に対して適切な回答が得られました。

USER:How does adjusting the LLM parameter temperature change the generated results?

SYSTEM:Adjusting the LLM parameter temperature can have a significant impact on the generated results. The temperature parameter controls the complexity of the language model, with higher temperatures resulting in more complex and diverse outputs, while lower temperatures result in simpler and more predictable outputs. Here are some general observations on how adjusting the LLM parameter temperature can affect the generated results:

1. Increasing the temperature:
	* Generates more creative and unexpected responses, especially for open-ended prompts.
	* Can lead to more coherent and fluent text, but may also introduce more errors or incoherences.
	* May be more suitable for tasks that require novelty and creativity, such as storytelling or poetry generation.

2. Decreasing the temperature:
	* Generates more conservative and predictable responses, with fewer errors or incoherences.
	* May be more suitable for tasks that require accuracy and precision, such as text summarization or machine translation.
	* Can also help to reduce the risk of generating offensive or inappropriate content.

3. Finding the optimal temperature:
	* Requires experimentation and evaluation of the generated output quality.
	* A good starting point is often around 0.7-0.8, but

日本語翻訳

User:LLM パラメーターのtemperatureを調整すると、生成される結果はどのように変化しますか?

SYSTEM:LLM パラメータのtemperatureを調整すると、生成される結果に大きな影響を与える可能性があります。 
temperatureパラメーターは言語モデルの複雑さを制御します。
temperatureが高いほど、より複雑で多様な出力が生成され、temperatureが低いほど、出力はより単純で予測可能になります。
 LLM パラメータのtemperature調整が生成される結果にどのような影響を与えるかについて、一般的な観察結果をいくつか示します。

1. temperatureを上げる:
* 特に自由形式のプロンプトの場合、より創造的で予期せぬ応答が生成されます。
* より一貫性のある流暢なテキストが得られる可能性がありますが、より多くのエラーや矛盾が発生する可能性もあります。
* ストーリーテリングや詩の作成など、新規性と創造性が必要なタスクに適している可能性があります。

2. temperatureを下げる:
* エラーや矛盾が少なく、より保守的で予測可能な応答が生成されます。
* テキストの要約や機械翻訳など、正確さと精度が必要なタスクに適している可能性があります。
* 不快なコンテンツや不適切なコンテンツが生成されるリスクを軽減するのにも役立ちます。

3. 最適なtemperatureを見つける:
* 生成された出力品質の実験と評価が必要です。
* 多くの場合、適切な開始点は 0.7 ～ 0.8 程度ですが、

テキスト分類のタスク

プロンプトの実行

ニュース記事の文章から、記事が次のどのカテゴリに分類されるかを確認します。

カテゴリ：「テック」「エンタメ」「健康」

prompt = """USER:Which category does the following article fall under: "Tech", "Entertainment", or "Health"?Please also tell me the basis for that.
YouTube has revealed an artificial intelligence tool that allows users to imitate pop stars like Demi Lovato and John Legend.
The experimental feature, called Dream Track, allows users to create short songs by describing qualities including lyrical content and mood.
Nine artists have allowed their voice to be "cloned" by the software, including Charli XCX, Troye Sivan, T-Pain and Sia.
For now, about 100 creators in the US have been given access to the tool, which can only be used to soundtrack videos on YouTube Shorts - the platform's rival to TikTok.
The company released two sample videos created with Dream Track, featuring passable, but clearly inferior, imitations of Charlie Puth and T-Pain.
The Puth track was generated by the prompt: "A ballad about how opposites attract, upbeat acoustic."
The result feels like a low-quality MP3, full of digital artifacts. At times, Puth's voice sounds "smudged" with consonants that are occasionally muddied and indistinct.
SYSTEM:"""
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=40,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=500,
)
print(sequences[0]["generated_text"])

生成結果

生成結果はニュース記事を”Tech”に分類しており正解です。根拠も適切な内容になっています。

USER:Which category does the following article fall under: "Tech", "Entertainment", or "Health"?Please also tell me the basis for that.
YouTube has revealed an artificial intelligence tool that allows users to imitate pop stars like Demi Lovato and John Legend.
The experimental feature, called Dream Track, allows users to create short songs by describing qualities including lyrical content and mood.
Nine artists have allowed their voice to be "cloned" by the software, including Charli XCX, Troye Sivan, T-Pain and Sia.
For now, about 100 creators in the US have been given access to the tool, which can only be used to soundtrack videos on YouTube Shorts - the platform's rival to TikTok.
The company released two sample videos created with Dream Track, featuring passable, but clearly inferior, imitations of Charlie Puth and T-Pain.The Puth track was generated by the prompt: "A ballad about how opposites attract, upbeat acoustic."
The result feels like a low-quality MP3, full of digital artifacts. At times, Puth's voice sounds "smudged" with consonants that are occasionally muddied and indistinct.

SYSTEM:Based on the information provided in the article, the category that the article falls under is "Tech". 
This is because the article primarily deals with the launch of an AI-powered tool called Dream Track by YouTube, which allows users to create short songs using artificial intelligence. 
The article discusses the features and capabilities of the tool, as well as the voices of several artists who have allowed their voices to be cloned by the software. Therefore, the article belongs to the "Tech" category.

日本語翻訳

USER:次の記事は「テクノロジー」「エンタメ」「ヘルス」のどれに該当しますか?その根拠も教えてください。
YouTubeは、ユーザーがデミ・ロヴァートやジョン・レジェンドなどのポップスターを模倣できる人工知能ツールを公開した。
ドリーム トラックと呼ばれる実験的な機能を使用すると、ユーザーは歌詞の内容や雰囲気などの性質を記述して短い曲を作成できます。Charli XCX、Troye Sivan、T-Pain、Sia を含む 9 人のアーティストが、このソフトウェアによって自分の声を「クローン」することを許可しました。
現時点では、米国の約100人のクリエイターがこのツールへのアクセスを許可されているが、このツールはTikTokのライバルであるYouTubeショートの動画のサウンドトラックにのみ使用できる。
同社は、Dream Track で作成された 2 つのサンプル ビデオをリリースしました。このビデオでは、Charlie Puth と T-Pain の模倣は許容できるものの、明らかに劣っています。
Puth トラックは、「対立するものがどのように惹かれるかについてのバラード、アップビートなアコースティック」というプロンプトによって生成されました。
結果は、デジタルアーチファクトが満載の低品質の MP3 のように感じられます。 時々、プースの声は子音で「汚れた」ように聞こえ、時折濁って不明瞭になります。

SYSTEM:
記事で提供されている情報に基づいて、記事が該当するカテゴリは「テクノロジー」です。 
なぜなら、この記事は主に、ユーザーが人工知能を使って短い曲を作成できる、YouTube による Dream Track と呼ばれる AI を活用したツールの立ち上げについて扱っているからです。 
この記事では、このツールの機能と機能、およびソフトウェアによる声の複製を許可した数人のアーティストの声について説明しています。 したがって、この記事は「技術」カテゴリに属します。

テキスト要約のタスク

プロンプトの実行

ニュース記事を要約するプロンプトを実行します。

prompt = """USER:Please summarize the following sentences in one sentence.
Amazon, which launched with an ambition to become the "everything store", is adding another product to its online shopping site: cars.
Buyers in the US will be able to browse and purchase vehicles from dealers on Amazon starting next year, according to an announcement from the company.
The head of Amazon said the move was aimed at "changing the ease with which customers can buy vehicles online".
The selection will be limited to the Hyundai brand to start, it said.
Online car sales remain a tiny fraction of the car market, but a big surge in such transactions during the pandemic shattered the assumption that customers would avoid making such a big purchase online.
Forecasters are expecting such sales to become a bigger part of the business in the years ahead.
SYSTEM:"""
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=40,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=300,
)
print(sequences[0]["generated_text"])

生成結果

生成結果では、ニュース記事を適切に要約できています。

USER:Please summarize the following sentences in one sentence.
Amazon, which launched with an ambition to become the "everything store", is adding another product to its online shopping site: cars.
Buyers in the US will be able to browse and purchase vehicles from dealers on Amazon starting next year, according to an announcement from the company.
The head of Amazon said the move was aimed at "changing the ease with which customers can buy vehicles online".
The selection will be limited to the Hyundai brand to start, it said.
Online car sales remain a tiny fraction of the car market, but a big surge in such transactions during the pandemic shattered the assumption that customers would avoid making such a big purchase online.
Forecasters are expecting such sales to become a bigger part of the business in the years ahead.

SYSTEM:Sure! Here's a summary of the sentences you provided:
Amazon is expanding its online shopping platform to include cars for sale in the US, starting with the Hyundai brand. The move is aimed at making it easier for customers to buy vehicles online, despite the fact that online car sales currently make up a small fraction of the overall market.

日本語翻訳

USER:
次の文を一文にまとめてください。
「何でも屋」になるという目標を掲げて立ち上げたアマゾンは、自社のオンラインショッピングサイトにもう一つの商品、自動車を加えようとしている。
同社の発表によると、米国の購入者は来年からアマゾン上でディーラーから車両を閲覧して購入できるようになるという。
アマゾンの責任者は、この動きは「顧客がオンラインで自動車を購入することの容易さを変える」ことを目的としていると述べた。
当初は選択肢はヒュンダイブランドに限定されるという。
オンラインでの自動車販売は依然として自動車市場のほんの一部にすぎないが、パンデミック中にこうした取引が大幅に急増したことで、顧客はオンラインでそのような高額な購入を避けるだろうという想定は打ち砕かれた。
予測者らは、こうした売上が今後数年間でビジネスのより大きな部分を占めるようになるだろうと予想している。

SYSTEM:
もちろん! ご提供いただいた文章を要約すると次のとおりです。
アマゾンは、ヒュンダイブランドを皮切りに、米国で販売される自動車を含めてオンラインショッピングプラットフォームを拡大している。 現在、オンライン自動車販売が市場全体に占める割合はごく一部であるにもかかわらず、この動きは、顧客がオンラインで自動車を購入しやすくすることを目的としている。

テキスト抽出のタスク

プロンプトの実行

prompt = """USER:Below are the steps to make pancakes. Extract the one sentence that make pancakes fluffy from the steps below.
Mix Dry Ingredients in a Bowl:
In a large bowl, mix together all-purpose flour, sugar, baking powder, and salt.
Add Egg and Liquid Ingredients:
In a separate bowl, beat the egg and then add milk, melted butter (or alternatives), and vanilla extract if desired. Mix well.
Combine Dry and Liquid Ingredients:
Pour the liquid mixture into the bowl with dry ingredients. Stir lightly until just combined. It's okay if the batter is a bit lumpy.
Let the Batter Rest:
Allow the batter to rest for about 5 minutes. This resting period helps fluff up the texture in the pancakes.
Cook the Pancakes:
Preheat a skillet or griddle over medium heat and grease it with butter or oil.
Pour 1/4 cup of batter onto the skillet for each pancake. Spread the batter into a round shape.
When bubbles form on the surface and the edges start to puff up, flip the pancake and cook the other side until golden brown.
Serve:
Once the pancakes are cooked, serve them hot with your favorite toppings such as maple syrup, butter, fruit, or whipped cream.
SYSTEM:"""
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=40,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=300,
)
print(sequences[0]["generated_text"])

生成結果

生成結果では、要求に対して適切に文章を抽出できています。

USER:
Below are the steps to make pancakes. Extract the one sentence that make pancakes fluffy from the steps below.
Mix Dry Ingredients in a Bowl:
In a large bowl, mix together all-purpose flour, sugar, baking powder, and salt.
Add Egg and Liquid Ingredients:
In a separate bowl, beat the egg and then add milk, melted butter (or alternatives), and vanilla extract if desired. Mix well.
Combine Dry and Liquid Ingredients:
Pour the liquid mixture into the bowl with dry ingredients. Stir lightly until just combined. It's okay if the batter is a bit lumpy.
Let the Batter Rest:
Allow the batter to rest for about 5 minutes. This resting period helps fluff up the texture in the pancakes.
Cook the Pancakes:
Preheat a skillet or griddle over medium heat and grease it with butter or oil.
Pour 1/4 cup of batter onto the skillet for each pancake. Spread the batter into a round shape.
When bubbles form on the surface and the edges start to puff up, flip the pancake and cook the other side until golden brown.
Serve:
Once the pancakes are cooked, serve them hot with your favorite toppings such as maple syrup, butter, fruit, or whipped cream.

SYSTEM:
The sentence that makes pancakes fluffy is "Allow the batter to rest for about 5 minutes. This resting period helps fluff up the texture in the pancakes."

日本語翻訳

USER：
以下にパンケーキの作り方をご紹介します。以下の手順の中からパンケーキをふわふわに作る一文を抽出します。
乾燥した材料をボウルに入れて混ぜます。
大きなボウルに中力粉、砂糖、ベーキングパウダー、塩を入れて混ぜます。
卵と液体材料を追加します。
別のボウルで卵を溶き、牛乳、溶かしたバター（または代替品）、必要に応じてバニラエッセンスを加えます。 よく混ぜます。
乾燥成分と液体成分を組み合わせる:
液体混合物を乾燥材料の入ったボウルに注ぎます。 ちょうど結合するまで軽くかき混ぜます。 生地が多少ダマになっても大丈夫です。
バッターを休ませる:
生地を5分ほど休ませます。 この休息期間により、パンケーキの食感がふわふわになります。
パンケーキを調理します。
フライパンまたはグリドルを中火で予熱し、バターまたは油を塗ります。
パンケーキごとに1/4カップの生地をフライパンに注ぎます。 生地を丸い形に広げます。
表面に泡が立ち、端が膨らみ始めたら、パンケーキをひっくり返し、反対側もきつね色になるまで焼きます。
仕える：
パンケーキが焼き上がったら、メープルシロップ、バター、フルーツ、ホイップクリームなどのお好みのトッピングを添えて温かいうちにお召し上がりください。

SYSTEM：
パンケーキをふわふわに作る文章は、「生地を約 5 分間休ませてください。この休息期間は、パンケーキの質感をふわふわにするのに役立ちます。」です。

日本語翻訳のタスク

プロンプトの実行

英語から日本語へ翻訳するプロンプトを実行します。

prompt = """USER:Please translate the following sentences into Japanese.
YouTube has revealed an artificial intelligence tool that allows users to imitate pop stars like Demi Lovato and John Legend.
The experimental feature, called Dream Track, allows users to create short songs by describing qualities including lyrical content and mood.
Nine artists have allowed their voice to be "cloned" by the software, including Charli XCX, Troye Sivan, T-Pain and Sia.
For now, about 100 creators in the US have been given access to the tool, which can only be used to soundtrack videos on YouTube Shorts - the platform's rival to TikTok.
SYSTEM:"""
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=40,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=500,
)
print(sequences[0]["generated_text"])

生成結果

生成結果では正確な翻訳になっています。

SYSTEM:以下の文を日本語に翻訳してください。
USER:
Please translate the following sentences into Japanese.
YouTube has revealed an artificial intelligence tool that allows users to imitate pop stars like Demi Lovato and John Legend.
The experimental feature, called Dream Track, allows users to create short songs by describing qualities including lyrical content and mood.
Nine artists have allowed their voice to be "cloned" by the software, including Charli XCX, Troye Sivan, T-Pain and Sia.
For now, about 100 creators in the US have been given access to the tool, which can only be used to soundtrack videos on YouTube Shorts - the platform's rival to TikTok.

SYSTEM: 
Sure! Here are the translations of the sentences into Japanese:
YouTubeは、人工知能toolを公開した。その機能は、ポップスター達、デミ・ロバートソンやジョン・レジェンドといった歌手の声をイメージすることができます。
この機能は、曲の内容や雰囲気を描写する方法で作成された短い歌を生成することができます。ナインのアーティストが声を「クローン」によってソフトウェアによって複製され、Charli XCX、トロイ・サイヴァン、T-ペイン、シアといったアーティストらが参加しています。
現在は、アメリカ国内で約100人のクリーターがこのツールにアクセスがあります。このツールは、YouTubeShortsというプラットフォームの Rivaleでしか使用できません。

コード生成のタスク

プロンプトの実行

PyTorchのDataLoaderの使い方をLlama2に尋ねてみます。

prompt = """USER:How to use PyTorch DataLoader?
SYSTEM:"""
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=40,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=300,
)
print(sequences[0]["generated_text"])

生成結果

生成結果では、有用なアドバイスの内容になっています。

USER:How to use PyTorch DataLoader?
SYSTEM:PyTorch DataLoader is a class in PyTorch that allows you to efficiently load and preprocess large datasets for training machine learning models. Here's an example of how to use it:

1. First, import the DataLoader class from PyTorch:
```
import torch
from torch.utils.data import DataLoader
```
2. Next, define a dataset class that inherits from `Dataset`:
```
class MyDataset(Dataset):
    def __init__(self, data):
        self.data = data
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, index):
        item = {key: self.data[index][key] for key in ['feature1', 'feature2', 'label']}
        return item
```
3. Create an instance of the `DataLoader` class and pass in your custom dataset class and any additional arguments:
```
dataloader = DataLoader(MyDataset(train_data), batch_size=32, shuffle=True)
```
In this example, `train_data` is a list of tensors representing the input data for your model. The `batch_size` argument specifies the number of examples

LLMならGPUクラウド

Llama2やその他のLLMを使用する際には、モデルサイズやタスクに応じて必要なスペックが異なります。

LLMで使用されるGPUは高価なため、買い切りのオンプレミスよりも、コストパフォーマンスが高く柔軟な使い方ができるGPUクラウドがおすすめです。

GPUクラウドのメリットは以下の通りです。

必要なときだけ利用して、コストを最小限に抑えられる
タスクに応じてGPUサーバーを変更できる
需要に応じてGPUサーバーを増減できる
簡単に環境構築ができ、すぐに開発をスタートできる
新しいGPUを利用できるため、陳腐化による買い替えが不要
GPUサーバーの高電力・熱管理が不要

コスパをお求めなら、メガクラウドと比較して50%以上安いGPUクラウドサービス「GPUSOROBAN」がおすすめです。

生成AIに最適なGPUクラウド「高速コンピューティング」｜GPUSOROBAN GPUSOROBANの高速コンピューティングは、NVIDIAの高速GPUが業界最安級で使えるクラウドサービスです。NVIDIA A100を始めする高速GPUにより、画像生成AI、大規模言語モデルLLM、機械学習、シミュレーションを高速化します。業界最安級GPUクラウド | GPUSOROBAN

大規模なLLMを計算する場合は、NVIDIA H100のクラスタが使える「GPUSOROBAN AIスパコンクラウド」がおすすめです。

LLMに最適なH100が業界最安級「AIスパコンクラウド」| GPUSOROBAN AIスパコンクラウドはNVIDIA H100を搭載したGPUインスタンスが業界最安級で使えるクラウドサービスです。HGX H100（H100 x8枚）を複数連結したクラスタ構成により、LLMやマルチモーダルAIの計算時間を短縮します。料金はAWSのH100インスタンスと比較して75%安く設定しており、大幅なコストダウンが可能です。業界最安級GPUクラウド | GPUSOROBAN

まとめ

この記事では、Llama2を使用してテキストを生成する方法（推論）について紹介しました。

Llama2は無料で使えて商用利用可能な利便性の高いモデルでありながら、ChatGPTと同等以上の性能があります。

Llama2についてさらに詳しく知りたい方は、関連記事もあわせてご覧ください。

Llama2のファインチューニングの使い方

Llama2の日本語学習済みモデルの使い方

コード生成に特化したCodeLlamaの使い方

【Llama2】コード生成Code Llamaの使い方 | 7B・13B・34B この記事では、Code Llamaによるコード生成をする方法（推論）について紹介しています。業界最安級GPUクラウド | GPUSOROBAN

Llama2とは？使い方・日本語性能・商用利用について解説 | 初心者ガイド

Llama2とは

Llama2の性能と安全性（ChatGPTとの比較）

有用性の評価

安全性の評価

Llama2モデルのバリエーション（7b,13b,70b,Chat）

Llama2は無料で使えて商用利用も可能

クローズドなローカル環境で使える軽量LLM

Llama2の日本語モデル（ELYZA-japanese-Llama-2）

Llama2を動かすにはGPUが必要

Llama2を使用するための環境

Windows + GPUのローカル環境

Windows + CPUのローカル環境（Llama.cpp、Llama-cpp-python）

Macのローカル環境（Llama.cpp、Llama-cpp-python）

GPUクラウドサービスとは

Google Colaboratory(Colab)の環境

GPUクラウドサービス(GPUSOROBAN)の環境

Llama2を追加学習・ファインチューニングする方法

Text generation web UIの使い方

ブラウザでLlama2を体験してみる

Llama2をAPIで使う方法（Replicate API）

Llama2を推論で動かす

Metaへのモデル利用申請とHuggingFaceの設定

推論環境の準備

ライブラリのインストール

モデルの設定

テキスト生成のタスク

プロンプトの実行

生成結果

日本語翻訳

質問応答のタスク

プロンプトの実行

生成結果

日本語翻訳

テキスト分類のタスク

プロンプトの実行

生成結果

日本語翻訳

テキスト要約のタスク

プロンプトの実行

生成結果

日本語翻訳

テキスト抽出のタスク

プロンプトの実行

生成結果

日本語翻訳

日本語翻訳のタスク

プロンプトの実行

生成結果

コード生成のタスク

プロンプトの実行

生成結果

LLMならGPUクラウド

まとめ

前の記事

次の記事

関連記事

GPUでお困りの方はGPUSOROBANで解決！お気軽にご相談ください

GPUでお困りの方はGPUSOROBANで解決！
お気軽にご相談ください