# Fine-tuning tasks

## Masked Language Model

https://huggingface.co/docs/transformers/v4.18.0/en/model_doc/auto#transformers.AutoModelForMaskedLM
    

## Next Sentence Prediction

https://huggingface.co/docs/transformers/v4.18.0/en/model_doc/auto#transformers.AutoModelForNextSentencePrediction

## Sequence classification

https://huggingface.co/docs/transformers/v4.18.0/en/model_doc/auto#transformers.AutoModelForSequenceClassification

## Token classification

https://huggingface.co/docs/transformers/v4.18.0/en/model_doc/auto#transformers.AutoModelForTokenClassification
    


## Question-Answering

https://huggingface.co/docs/transformers/v4.18.0/en/model_doc/auto#transformers.AutoModelForQuestionAnswering


# Fine-tuning a pre-trained model

In [1]:
import os
import numpy as np
import transformers
from datasets import load_metric

from dataset_loader import IntentDataset

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    DataCollatorWithPadding
)


In [2]:
# transformers.logging.set_verbosity_info()
transformers.logging.set_verbosity_error()
# Verbosity is set to "error" to silence the Hugging Face warnings shown when loading
# a model before training it. If something is not working, comment out the line above;
# setting the verbosity to "info" can also surface useful output.
os.environ['TOKENIZERS_PARALLELISM'] = "false"  # avoids the parallel-tokenization warning seen during training
os.environ["WANDB_DISABLED"] = "true"  # disables the wandb integration, which the trainer was complaining about

## Tokenizer

In [3]:
tokenizer_name = 'roberta-base' # try 'bert-base-uncased', 'bert-base-cased', 'bert-large-uncased'
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name) # loads a tokenizer

## Dataset

In [4]:
dataset_name = 'twiz-data' # rename to your dataset dir

train_dataset = IntentDataset(dataset_name, tokenizer, 'train')  # see dataset_loader.py for the dataset loading code
val_dataset = IntentDataset(dataset_name, tokenizer, 'val')
test_dataset = IntentDataset(dataset_name, tokenizer, 'test')

Loaded Intent detection dataset. 5916 examples. (train). 
Loaded Intent detection dataset. 819 examples. (val). 
Loaded Intent detection dataset. 842 examples. (test). 


## Model

In [5]:
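model_pre_trained = 'roberta-base'
model_finetuned = './twiz-intent/checkpoint-555'  # checkpoint written by the training run below

# Loads the pre-trained encoder weights (RoBERTa here); the classification head is newly initialized
model = AutoModelForSequenceClassification.from_pretrained(
    model_pre_trained,
    num_labels=len(train_dataset.all_intents),
)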

## Inspect a data sample

In [6]:
inspect_index = 0

print('All data keys:', train_dataset[inspect_index].keys())

print()
print("INPUT: ", tokenizer.decode(train_dataset[inspect_index]['input_ids']))

# The all_intents attribute maps a label index back to its intent name:
print()
print("INTENT: ", train_dataset[inspect_index]['label'], " ", train_dataset.all_intents[train_dataset[inspect_index]['label']])

All data keys: dict_keys(['input_ids', 'attention_mask', 'label'])

INPUT:  <s>Please be careful when using any tools or equipment. Remember, safety first! Here is some information about Bacon and Tomato Pasta. It has a 4.8 star rating.  It is estimated to take about 35 minutes. It serves 4. Its difficulty level is Easy.  If this is not quite what you are looking for say, go back. To continue the task, just say, show ingredients.</s></s>show ingredients</s><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>

INTENT:  tensor(29)   IngredientsConfirmationIntent
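
The decoded sample above follows RoBERTa's layout for a pair of texts: `<s>`, the first segment, `</s></s>`, the second segment, `</s>`, and then `<pad>` up to a fixed length. The sketch below mimics that layout on whitespace-split words instead of real subword tokens, to show how an attention mask lines up with the padding. `build_pair` and `max_length` are hypothetical names used only for illustration, and this is not how the tokenizer is actually implemented.

```python
def build_pair(first, second, max_length):
    """Toy version of RoBERTa's pair layout, using whitespace-split words as tokens."""
    tokens = ["<s>"] + first.split() + ["</s>", "</s>"] + second.split() + ["</s>"]
    if len(tokens) > max_length:
        raise ValueError("pair is longer than max_length; a real tokenizer would truncate")
    n_pad = max_length - len(tokens)
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [1] * len(tokens) + [0] * n_pad
    return tokens + ["<pad>"] * n_pad, attention_mask


tokens, mask = build_pair("just say, show ingredients.", "show ingredients", 12)
print(tokens)
print(mask)
```

The run of `<pad>` tokens in the real sample plays the same role as the trailing zeros in `mask` here: those positions only make every example the same length.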


## Training hyper-parameters

In [7]:
acc = load_metric('accuracy')

training_args = TrainingArguments(
    output_dir='twiz-intent',
    do_train=True,
    do_eval=True,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    logging_strategy='epoch',
    metric_for_best_model='accuracy',
    learning_rate=2e-5,
    num_train_epochs=3,
    weight_decay=0.01,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    load_best_model_at_end=True,
    disable_tqdm=False,
)

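def compute_metrics(eval_pred):
    # eval_pred unpacks into the model's raw logits and the reference labels
    logits, labels = eval_pred
    # the predicted class is the index of the largest logit in each row
    predictions = np.argmax(logits, axis=-1)
    # returns a dict of the form {'accuracy': value}
    accuracy = acc.compute(predictions=predictions, references=labels)
    return accuracy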

def get_trainer(model):
    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        compute_metrics=compute_metrics,
    )

trainer = get_trainer(model)
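
`compute_metrics` reduces each row of logits to its highest-scoring class and compares that with the reference label. As a minimal sketch, the same computation can be done in plain Python on a tiny hand-made batch. The logits and labels below are made up for illustration, and `accuracy_from_logits` is a hypothetical helper, not part of `transformers` or `datasets`.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest logit sits at the reference label's index."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Made-up scores for 4 examples over 3 classes (not taken from the dataset).
logits = [
    [2.0, 0.1, -1.0],   # highest score at index 0
    [0.3, 0.2, 1.5],    # highest score at index 2
    [-0.5, 0.9, 0.4],   # highest score at index 1
    [1.2, 1.1, 0.0],    # highest score at index 0
]
labels = [0, 2, 0, 0]

result = accuracy_from_logits(logits, labels)
print(result)  # {'accuracy': 0.75}
```

The `Trainer` prefixes the returned key with `eval_`, which is why the evaluation output reports `eval_accuracy`.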

## Results: Pre-trained model

In [8]:
trainer.evaluate(eval_dataset=test_dataset)

***** Running Evaluation *****
  Num examples = 842
  Batch size = 32


{'eval_loss': 3.577387571334839,
 'eval_accuracy': 0.0,
 'eval_runtime': 2.9657,
 'eval_samples_per_second': 283.913,
 'eval_steps_per_second': 9.104}
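
An accuracy of 0.0 with a loss near 3.58 is plausible before fine-tuning: the classification head on top of the pre-trained encoder starts from random weights, so its predictions carry almost no information. For a classifier that spreads probability evenly over K classes, the cross-entropy is ln K. The sketch below inverts that relation for the loss reported above. The resulting class count is only a rough estimate under the uniform-prediction assumption, not a value read from the dataset.

```python
import math

eval_loss = 3.577387571334839  # test-set loss reported above, before fine-tuning


def uniform_cross_entropy(num_classes):
    """Cross-entropy of a uniform prediction over num_classes classes."""
    return -math.log(1.0 / num_classes)


# If the model were guessing uniformly, loss = ln(K), so K = exp(loss).
implied_classes = math.exp(eval_loss)
nearest = round(implied_classes)

print(f"loss {eval_loss:.3f} ~ uniform guess over {implied_classes:.1f} classes")
print(f"ln({nearest}) = {uniform_cross_entropy(nearest):.3f}")
```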

## Model fine-tuning

In [9]:
trainer.train()

***** Running training *****
  Num examples = 5916
  Num Epochs = 3
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 555


Epoch,Training Loss,Validation Loss,Accuracy
1,1.9107,1.195577,0.752137
2,0.7996,0.885981,0.810745
3,0.5687,0.804752,0.820513


***** Running Evaluation *****
  Num examples = 819
  Batch size = 32
Saving model checkpoint to twiz-intent/checkpoint-185
Configuration saved in twiz-intent/checkpoint-185/config.json
Model weights saved in twiz-intent/checkpoint-185/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 819
  Batch size = 32
Saving model checkpoint to twiz-intent/checkpoint-370
Configuration saved in twiz-intent/checkpoint-370/config.json
Model weights saved in twiz-intent/checkpoint-370/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 819
  Batch size = 32
Saving model checkpoint to twiz-intent/checkpoint-555
Configuration saved in twiz-intent/checkpoint-555/config.json
Model weights saved in twiz-intent/checkpoint-555/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from twiz-intent/checkpoint-555 (score: 0.8205128205128205).


TrainOutput(global_step=555, training_loss=1.0929743036493524, metrics={'train_runtime': 180.1709, 'train_samples_per_second': 98.506, 'train_steps_per_second': 3.08, 'total_flos': 1642190814483840.0, 'train_loss': 1.0929743036493524, 'epoch': 3.0})
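
The step counts in the log follow from the dataset size and the batch size. With 5916 training examples, a total train batch size of 32, and no gradient accumulation, each epoch takes ceil(5916 / 32) = 185 optimization steps, so three epochs give 555. Because `save_strategy='epoch'`, a checkpoint is written at the end of each epoch, which matches the `checkpoint-185`, `checkpoint-370`, and `checkpoint-555` directories. A quick check of that arithmetic:

```python
import math

num_examples = 5916   # "Num examples" in the training log
batch_size = 32       # "Total train batch size" in the training log
num_epochs = 3

# The last batch of each epoch is smaller, but it still counts as one step.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
checkpoint_steps = [steps_per_epoch * epoch for epoch in range(1, num_epochs + 1)]

print(steps_per_epoch, total_steps, checkpoint_steps)  # 185 555 [185, 370, 555]
```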

## Results: Fine-tuned model

In [10]:
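# With load_best_model_at_end=True the trainer now holds the best checkpoint, so it can be evaluated directly.
# To evaluate a previously saved checkpoint instead, load it from model_finetuned and rebuild the trainer with get_trainer.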

trainer.evaluate(eval_dataset=test_dataset)

***** Running Evaluation *****
  Num examples = 842
  Batch size = 32


{'eval_loss': 0.5670531988143921,
 'eval_accuracy': 0.8634204275534442,
 'eval_runtime': 2.3037,
 'eval_samples_per_second': 365.504,
 'eval_steps_per_second': 11.72,
 'epoch': 3.0}