We are the NUIST DH Internet Technology Club.
Here we collect trending technical articles and file them into the appropriate sections, building up the club's technical knowledge base for members past and present to study.
This site does not offer registration; visitors are welcome to browse.
| Task | Description | Modality | Pipeline identifier |
|---|---|---|---|
| Text classification | assign a label to a given sequence of text | NLP | pipeline(task="sentiment-analysis") |
| Text generation | generate text given a prompt | NLP | pipeline(task="text-generation") |
| Summarization | generate a summary of a sequence of text or document | NLP | pipeline(task="summarization") |
| Image classification | assign a label to an image | Computer vision | pipeline(task="image-classification") |
| Image segmentation | assign a label to each individual pixel of an image (supports semantic, panoptic, and instance segmentation) | Computer vision | pipeline(task="image-segmentation") |
| Object detection | predict the bounding boxes and classes of objects in an image | Computer vision | pipeline(task="object-detection") |
| Audio classification | assign a label to some audio data | Audio | pipeline(task="audio-classification") |
| Automatic speech recognition | transcribe speech into text | Audio | pipeline(task="automatic-speech-recognition") |
| Visual question answering | answer a question about the image, given an image and a question | Multimodal | pipeline(task="vqa") |
| Document question answering | answer a question about the document, given a document and a question | Multimodal | pipeline(task="document-question-answering") |
| Image captioning | generate a caption for a given image | Multimodal | pipeline(task="image-to-text") |
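The same one-line interface covers every modality in the table. As a small sketch of a non-NLP task, an image-classification pipeline can be called directly on an image path or URL (the file name here is a placeholder):

```python
from transformers import pipeline

# Same API as the text tasks, just a different task string
vision_classifier = pipeline(task="image-classification")
preds = vision_classifier("path/to/image.jpg")  # placeholder path
print(preds)
```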
```python
from transformers import pipeline

# Load the default sentiment-analysis pipeline
classifier = pipeline("sentiment-analysis")
classifier("We are very happy to show you the 🤗 Transformers library.")

# A pipeline also accepts a list of inputs
results = classifier(["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."])
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
```
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Swap in a multilingual sentiment model instead of the default
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
classifier("Nous sommes très heureux de vous présenter la bibliothèque 🤗 Transformers.")
```
```python
from transformers import AutoTokenizer

model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize a single sentence into input ids and an attention mask
encoding = tokenizer("We are very happy to show you the 🤗 Transformers library.")
print(encoding)
```
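For reference, the encoding is a dict containing `input_ids` and `attention_mask` (and, for BERT-style tokenizers, `token_type_ids`). You can verify the round trip with `decode`, which also shows the special tokens the tokenizer added:

```python
# Decode the token ids back to text (includes special tokens like [CLS]/[SEP])
print(tokenizer.decode(encoding["input_ids"]))
```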
```python
# Tokenize a batch: pad/truncate to a common length and return PyTorch tensors
pt_batch = tokenizer(
    ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
```
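The result is a dict of tensors keyed the same way as a single encoding; a quick way to sanity-check it is to inspect the shapes:

```python
# Each value is a (batch_size, sequence_length) tensor
for key, value in pt_batch.items():
    print(key, value.shape)
```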
```python
from transformers import AutoModelForSequenceClassification
from torch import nn

model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Forward pass: unpack the batch dict into keyword arguments
pt_outputs = pt_model(**pt_batch)

# Convert raw logits to probabilities
pt_predictions = nn.functional.softmax(pt_outputs.logits, dim=-1)
print(pt_predictions)
```
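To turn the probabilities into readable labels, you can take the argmax per row and look it up in the model config's `id2label` mapping, a minimal sketch (this particular model rates text from 1 to 5 stars):

```python
import torch

# Map each row's highest-probability class id to its label string
pred_ids = torch.argmax(pt_predictions, dim=-1)
print([pt_model.config.id2label[i.item()] for i in pred_ids])
```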
```python
# Save the tokenizer and model together so they can be reloaded as a pair
pt_save_directory = "./pt_save_pretrained"
tokenizer.save_pretrained(pt_save_directory)
pt_model.save_pretrained(pt_save_directory)

# Reload from the local directory
pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
```
```python
# Convert from TensorFlow to PyTorch:
# tf_save_directory is a directory containing a model previously saved with TensorFlow
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
pt_model = AutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
```
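The conversion also works in the other direction. Assuming the PyTorch checkpoint saved in `pt_save_directory` above (and TensorFlow installed), the TF model class can load it with `from_pt=True`:

```python
from transformers import TFAutoModelForSequenceClassification

# Load PyTorch weights into the equivalent TensorFlow model class
tf_model = TFAutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
```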
```python
from transformers import AutoConfig

# n_heads sets the number of attention heads in the model
my_config = AutoConfig.from_pretrained("distilbert-base-uncased", n_heads=12)
```
```python
from transformers import AutoModel

# Build a model from the custom config (weights are randomly initialized,
# no pretrained weights are loaded)
my_model = AutoModel.from_config(my_config)
```
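The customized configuration can be saved and reloaded on its own, a small sketch (the directory name is a placeholder):

```python
# Persist the custom configuration and read it back later
my_config.save_pretrained("./custom_config")
my_config = AutoConfig.from_pretrained("./custom_config")
```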
```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
```
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="path/to/save/folder/",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
)
```
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```
```python
from datasets import load_dataset

dataset = load_dataset("rotten_tomatoes")

# Tokenize the text column; batched=True processes many rows per call
def tokenize_dataset(dataset):
    return tokenizer(dataset["text"])

dataset = dataset.map(tokenize_dataset, batched=True)
```
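To keep a first run short, it is common to train on a subset; a hedged sketch using the standard datasets API (the subset sizes here are arbitrary):

```python
# Optional: shuffle and subsample for a quick smoke test
small_train = dataset["train"].shuffle(seed=42).select(range(1000))
small_eval = dataset["test"].shuffle(seed=42).select(range(200))
```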
```python
from transformers import DataCollatorWithPadding

# Dynamically pad each batch to the longest sequence in that batch
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```
```python
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)
trainer.train()
```
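Once training finishes, the Trainer can score the eval set and persist everything needed to reload the fine-tuned model, a minimal sketch (the save path is a placeholder):

```python
# Evaluate on the eval_dataset passed to the Trainer, then save model + tokenizer
metrics = trainer.evaluate()
print(metrics)
trainer.save_model("path/to/save/folder/")
```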