Quantize BERT on Task GLUE
This tutorial shows a simulated quantization process for transformers that the NNI team has found effective. Users can also use NNI to search for a better process.
We run the experiments with the BERT model and the trainer pipeline from the Transformers library. The whole quantization process consists of the following steps:
1. Fine-tune the BERT-base-uncased model on a downstream GLUE task with the Transformers trainer pipeline. In our experience, quantizing the fine-tuned model gives better final performance than quantizing the pre-trained model directly.
2. Quantize the fine-tuned model on the GLUE task with a specific quantizer. Here we apply the QAT, LSQ and PTQ quantizers to BERT so that we can compare the performance of the resulting quantized models on GLUE. LSQ and QAT are quantization-aware training methods, while PTQ is a post-training quantization method.
During the process of quantizing BERT:
- We use the BERT model and the trainer pipeline from the Transformers library to run the experiments.
- We quantize the Linear layers in bert.encoder to int8.
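To make "simulated quantization" to int8 more concrete, here is a minimal pure-Python sketch of the quantize-then-dequantize round trip on a handful of weights. It is not NNI's implementation: the helper names `affine_qparams`, `symmetric_qparams` and `fake_quantize` are hypothetical, and deriving the scale from the observed min/max is an assumption made for illustration.

```python
def affine_qparams(values, qmin=-128, qmax=127):
    """Derive (scale, zero_point) so that [min, max] maps onto [qmin, qmax]."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must contain 0
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def symmetric_qparams(values, qmax=127):
    """Symmetric variant: the zero point is fixed at 0."""
    scale = max(abs(v) for v in values) / qmax or 1.0
    return scale, 0


def fake_quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Quantize to integers, clamp to the int8 range, then dequantize to floats."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point
        q = max(qmin, min(qmax, q))
        out.append((q - zero_point) * scale)
    return out


weights = [-0.8, -0.1, 0.0, 0.25, 1.2]
for name, (scale, zp) in {
    "affine": affine_qparams(weights),
    "symmetric": symmetric_qparams(weights),
}.items():
    approx = fake_quantize(weights, scale, zp)
    max_err = max(abs(a - b) for a, b in zip(weights, approx))
    print(f"{name}: scale={scale:.5f} zero_point={zp} max_error={max_err:.5f}")
```

The model keeps computing in floating point, but every quantized tensor is snapped to one of 256 levels, so training or evaluation sees the rounding error that real int8 inference would introduce.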
Experiment
Preparation
This section covers fine-tuning the model on a downstream GLUE task. If you already know how to fine-tune BERT on a GLUE dataset, you can skip it.
Load the tokenizer and the BERT model from Hugging Face Transformers.
Create a trainer instance to fine-tune the BERT model.
Note
Set dev_mode to False to run this tutorial in full. dev_mode defaults to True only for generating the documentation.
from pathlib import Path
import argparse
import numpy as np
import torch
from torch.utils.data import ConcatDataset
import nni
from datasets import load_dataset, load_metric
from transformers import BertTokenizerFast, DataCollatorWithPadding, BertForSequenceClassification, EvalPrediction
from transformers.trainer import Trainer
from transformers.training_args import TrainingArguments
task_name = 'qnli'
finetune_lr = 4e-5
quant_lr = 1e-5
quant_method = 'lsq'
dev_mode = True
if dev_mode:
    quant_max_epochs = 1
    finetune_max_epochs = 1
else:
    quant_max_epochs = 10
    finetune_max_epochs = 10
Load the pre-trained model from Transformers
def build_model(pretrained_model_name_or_path: str, task_name: str):
    is_regression = task_name == 'stsb'
    num_labels = 1 if is_regression else (3 if task_name == 'mnli' else 2)
    model = BertForSequenceClassification.from_pretrained(pretrained_model_name_or_path, num_labels=num_labels)
    return model
Create the datasets for the chosen GLUE task
def prepare_datasets(task_name: str, tokenizer: BertTokenizerFast, cache_dir: str):
    task_to_keys = {
        'cola': ('sentence', None),
        'mnli': ('premise', 'hypothesis'),
        'mrpc': ('sentence1', 'sentence2'),
        'qnli': ('question', 'sentence'),
        'qqp': ('question1', 'question2'),
        'rte': ('sentence1', 'sentence2'),
        'sst2': ('sentence', None),
        'stsb': ('sentence1', 'sentence2'),
        'wnli': ('sentence1', 'sentence2'),
    }
    sentence1_key, sentence2_key = task_to_keys[task_name]

    # used to preprocess the raw data
    def preprocess_function(examples):
        # Tokenize the texts
        args = (
            (examples[sentence1_key],) if sentence2_key is None else (examples[sentence1_key], examples[sentence2_key])
        )
        result = tokenizer(*args, padding=False, max_length=128, truncation=True)

        if 'label' in examples:
            # In all cases, rename the column to labels because the model will expect that.
            result['labels'] = examples['label']
        return result

    raw_datasets = load_dataset('glue', task_name, cache_dir=cache_dir)
    for key in list(raw_datasets.keys()):
        if 'test' in key:
            raw_datasets.pop(key)

    processed_datasets = raw_datasets.map(preprocess_function, batched=True,
                                          remove_columns=raw_datasets['train'].column_names)

    train_dataset = processed_datasets['train']
    if task_name == 'mnli':
        validation_datasets = {
            'validation_matched': processed_datasets['validation_matched'],
            'validation_mismatched': processed_datasets['validation_mismatched']
        }
    else:
        validation_datasets = {
            'validation': processed_datasets['validation']
        }

    return train_dataset, validation_datasets
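The way task_to_keys switches between single-sentence and sentence-pair tokenization can be tried out without downloading anything. The sketch below mirrors the argument-building logic of preprocess_function; `toy_tokenizer` is a hypothetical whitespace tokenizer standing in for BertTokenizerFast, and the example batch is made up.

```python
TASK_TO_KEYS = {
    'sst2': ('sentence', None),        # single-sentence task
    'qnli': ('question', 'sentence'),  # sentence-pair task
}


def toy_tokenizer(first, second=None, max_length=128):
    """Hypothetical stand-in for a real tokenizer: whitespace split plus [SEP]."""
    encoded = []
    for i, text in enumerate(first):
        tokens = text.lower().split()
        if second is not None:
            tokens += ['[SEP]'] + second[i].lower().split()
        encoded.append(tokens[:max_length])  # truncate, similar to truncation=True
    return {'input_ids': encoded}


def preprocess(examples, task_name):
    key1, key2 = TASK_TO_KEYS[task_name]
    # One positional argument for single sentences, two for sentence pairs.
    args = (examples[key1],) if key2 is None else (examples[key1], examples[key2])
    result = toy_tokenizer(*args)
    if 'label' in examples:
        result['labels'] = examples['label']  # the model expects the name "labels"
    return result


batch = {
    'question': ['Which layers are quantized?'],
    'sentence': ['The Linear layers are quantized.'],
    'label': [0],
}
print(preprocess(batch, 'qnli'))
```

Because the batch is a dict of columns (batched=True in the tutorial), each key maps to a list with one entry per example.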
Create a trainer instance
Note
Set is_quant to False to fine-tune the BERT model. Set is_quant to True to create a traced trainer that uses quant_lr for model quantization.
def prepare_traced_trainer(model, load_best_model_at_end=False, is_quant=False):
    is_regression = task_name == 'stsb'
    metric = load_metric('glue', task_name)

    def compute_metrics(p: EvalPrediction):
        preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
        preds = np.squeeze(preds) if is_regression else np.argmax(preds, axis=1)
        result = metric.compute(predictions=preds, references=p.label_ids)
        result['default'] = result.get('f1', result.get('accuracy', 0.))
        return result

    tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
    train_dataset, validation_datasets = prepare_datasets(task_name, tokenizer, '')
    merged_validation_dataset = ConcatDataset([d for d in validation_datasets.values()])  # type: ignore
    data_collator = DataCollatorWithPadding(tokenizer)
    training_args = TrainingArguments(output_dir='./output/trainer',
                                      do_train=True,
                                      do_eval=True,
                                      evaluation_strategy='steps',
                                      per_device_train_batch_size=128, #128,
                                      per_device_eval_batch_size=128, #128,
                                      num_train_epochs=finetune_max_epochs,
                                      dataloader_num_workers=12,
                                      save_strategy='steps',
                                      save_total_limit=1,
                                      metric_for_best_model='default',
                                      greater_is_better=True,
                                      seed=1024,
                                      load_best_model_at_end=load_best_model_at_end,)
    if is_quant:
        training_args.learning_rate = quant_lr
    else:
        training_args.learning_rate = finetune_lr
    trainer = nni.trace(Trainer)(model=model,
                                 args=training_args,
                                 data_collator=data_collator,
                                 train_dataset=train_dataset,
                                 eval_dataset=merged_validation_dataset,
                                 tokenizer=tokenizer,
                                 compute_metrics=compute_metrics,
                                 )
    return trainer
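The post-processing inside compute_metrics can be illustrated in isolation. This is a pure-Python sketch with made-up logits; `postprocess_logits` and `pick_default` are hypothetical names that only mirror the argmax/squeeze step and the fallback used to fill the 'default' key.

```python
def postprocess_logits(logits, is_regression):
    """Turn raw model outputs into predictions, as compute_metrics does."""
    if is_regression:
        return [row[0] for row in logits]  # one score per example
    # Index of the largest logit in each row (argmax over classes).
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def pick_default(result):
    """Prefer F1 when the metric reports it, else accuracy, else 0."""
    return result.get('f1', result.get('accuracy', 0.0))


logits = [[0.1, 2.3], [1.7, -0.4], [0.2, 0.9]]
labels = [1, 0, 0]
preds = postprocess_logits(logits, is_regression=False)
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(preds, pick_default({'accuracy': accuracy}))
```

One consequence of this fallback is that a task whose metric reports neither key (CoLA, for instance, reports matthews_correlation) ends up with a 'default' of 0, so best-model selection on such a task would need a different key.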
Create the finetuned model
def build_finetuning_model(state_dict_path: str, is_quant=False):
    model = build_model('bert-base-uncased', task_name)
    if Path(state_dict_path).exists():
        model.load_state_dict(torch.load(state_dict_path))
    else:
        trainer = prepare_traced_trainer(model, True, is_quant)
        trainer.train()
        torch.save(model.state_dict(), state_dict_path)
    return model
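build_finetuning_model follows a load-or-train pattern: reuse the saved state dict if the file exists, otherwise fine-tune and save. The sketch below shows the same pattern with JSON files in place of torch checkpoints; `load_or_build` and the toy state dicts are hypothetical and only illustrate the control flow.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(cache_path, build_fn):
    """Return the cached result if the file exists, otherwise build and cache it."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text()), 'loaded'
    state = build_fn()  # the expensive step (fine-tuning in the tutorial)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state))
    return state, 'built'


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'bert_finetuned' / 'qnli.json'
    first = load_or_build(target, lambda: {'classifier.bias': [0.0, 0.0]})
    second = load_or_build(target, lambda: {'classifier.bias': [9.9, 9.9]})
    print(first[1], second[1])  # the second call reuses the cached file
```

The flip side of this pattern is that an existing file is always trusted, so a checkpoint produced with different settings is silently reused until the file is deleted.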
Quantization
After fine-tuning the BERT model on the chosen GLUE task, a quantizer instance can be created to run quantization-aware training or post-training quantization with BERT on that task.
The entire quantization process can be divided into the following steps:
1. Call build_finetuning_model to load or fine-tune the BERT model on a specific GLUE task.
2. Call prepare_traced_trainer with is_quant set to True to create a traced trainer instance for model quantization.
3. Call TransformersEvaluator to create an evaluator instance.
4. Use the defined config_list and the evaluator to create a quantizer instance.
5. Define max_steps or max_epochs. Note that max_steps and max_epochs cannot both be None.
6. Call quantizer.compress(max_steps, max_epochs) to execute the simulated quantization process.
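For a post-training method, the step budget is a number of calibration batches rather than a number of training updates. The following pure-Python sketch shows one common calibration rule, min/max observation; whether NNI's PtqQuantizer uses exactly this rule is an assumption here, and `MinMaxObserver` and `calibrate` are hypothetical names.

```python
class MinMaxObserver:
    """Tracks the running min/max of every batch it sees (hypothetical helper)."""

    def __init__(self):
        self.lo = float('inf')
        self.hi = float('-inf')

    def observe(self, batch):
        self.lo = min(self.lo, min(batch))
        self.hi = max(self.hi, max(batch))

    def qparams(self, qmin=-128, qmax=127):
        """Affine int8 parameters covering the observed range (and zero)."""
        lo, hi = min(self.lo, 0.0), max(self.hi, 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0
        return scale, round(qmin - lo / scale)


def calibrate(batches, max_steps):
    """Observe at most max_steps batches; no weights are updated."""
    observer = MinMaxObserver()
    for step, batch in enumerate(batches):
        if step >= max_steps:
            break
        observer.observe(batch)
    return observer.qparams()


activations = [[-1.0, 0.5, 2.0], [-3.0, 0.1, 1.0]]
print(calibrate(activations, max_steps=1))  # only the first batch is seen
print(calibrate(activations, max_steps=2))  # the observed range widens
```

With a budget of one step only the first batch shapes the quantization range, which is cheap but may clip values that appear later.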
import nni
from nni.contrib.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer
from nni.contrib.compression.utils import TransformersEvaluator
def fake_quantize():
    config_list = [{
        'op_types': ['Linear'],
        'op_names_re': ['bert.encoder.layer.{}'.format(i) for i in range(12)],
        'target_names': ['weight', '_output_'],
        'quant_dtype': 'int8',
        'quant_scheme': 'affine',
        'granularity': 'default',
    }]

    # create a finetune model
    Path('./output/bert_finetuned/').mkdir(parents=True, exist_ok=True)
    model: torch.nn.Module = build_finetuning_model(f'./output/bert_finetuned/{task_name}.bin', is_quant=False)  # type: ignore
    # is_quant=True so that the traced trainer uses quant_lr during quantization
    traced_trainer = prepare_traced_trainer(model, is_quant=True)
    evaluator = TransformersEvaluator(traced_trainer)
    if quant_method == 'lsq':
        quantizer = LsqQuantizer(model, config_list, evaluator)
        model, calibration_config = quantizer.compress(max_steps=None, max_epochs=quant_max_epochs)
    elif quant_method == 'qat':
        quantizer = QATQuantizer(model, config_list, evaluator, 1000)
        model, calibration_config = quantizer.compress(max_steps=None, max_epochs=quant_max_epochs)
    elif quant_method == 'ptq':
        quantizer = PtqQuantizer(model, config_list, evaluator)
        model, calibration_config = quantizer.compress(max_steps=1, max_epochs=None)
    else:
        raise ValueError(f"quantization method {quant_method} is not supported")
    print(calibration_config)
    # evaluate the performance of the fake quantize model
    quantizer.evaluator.bind_model(model, quantizer._get_param_names_map())
    print(quantizer.evaluator.evaluate())
def evaluate():
    model = build_finetuning_model(f'./output/bert_finetuned/{task_name}.bin', is_quant=False)
    trainer = prepare_traced_trainer(model, is_quant=False)
    metrics = trainer.evaluate()
    print(f"Evaluate metrics={metrics}")
fake_quantize()
evaluate()
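To see which modules the op_names_re patterns in config_list are meant to select, the patterns can be tried against a few module names with the standard re module. This sketch assumes matching from the start of the name (re.match semantics), which may differ from what NNI does internally; the module names are written in the style of a Hugging Face BERT classifier and `select` is a hypothetical helper.

```python
import re

patterns = ['bert.encoder.layer.{}'.format(i) for i in range(12)]

# Module names in the style of a Hugging Face BERT classifier (illustrative).
module_names = [
    'bert.embeddings.word_embeddings',
    'bert.encoder.layer.0.attention.self.query',
    'bert.encoder.layer.11.output.dense',
    'bert.pooler.dense',
    'classifier',
]


def select(names, patterns):
    """Keep names that match any pattern from the start of the string."""
    return [n for n in names if any(re.match(p, n) for p in patterns)]


print(select(module_names, patterns))
```

Under this assumption only modules inside the twelve encoder layers are selected, so the pooler and the classifier head are left unquantized. Note that an unescaped `.` in a regular expression matches any character, so these patterns are slightly more permissive than the literal names suggest.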
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/xinzhang3/bug_fix/nni/examples/tutorials/quantization_bert_glue.py:141: FutureWarning: load_metric is deprecated and will be removed in the next major version of datasets. Use 'evaluate.load' instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate
metric = load_metric('glue', task_name)
Found cached dataset glue (/home/xinzhang3/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
Loading cached processed dataset at /home/xinzhang3/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-d191c03109bc149f.arrow
Loading cached processed dataset at /home/xinzhang3/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-dde15e00c1ce7271.arrow
trainer.optimzer is not wrapped by nni.trace, or trainer.optimzer is None, will using huggingface default optimizer.
/anaconda/envs/bug_fix/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
trainer.lr_scheduler is not wrapped by nni.trace, or trainer.lr_scheduler is None, will using huggingface default lr_scheduler.
LsqQuantizer only supports symmetric mode, but got affine
42%|####2 | 347/819 [03:50<05:03, 1.55it/s]
42%|####2 | 348/819 [03:50<05:01, 1.56it/s]
43%|####2 | 349/819 [03:51<05:12, 1.50it/s]
43%|####2 | 350/819 [03:52<05:03, 1.54it/s]
43%|####2 | 351/819 [03:52<05:14, 1.49it/s]
43%|####2 | 352/819 [03:53<05:21, 1.45it/s]
43%|####3 | 353/819 [03:54<05:12, 1.49it/s]
43%|####3 | 354/819 [03:54<05:08, 1.51it/s]
43%|####3 | 355/819 [03:55<05:16, 1.46it/s]
43%|####3 | 356/819 [03:56<05:22, 1.44it/s]
44%|####3 | 357/819 [03:56<05:08, 1.50it/s]
44%|####3 | 358/819 [03:57<04:57, 1.55it/s]
44%|####3 | 359/819 [03:58<04:41, 1.63it/s]
44%|####3 | 360/819 [03:58<04:31, 1.69it/s]
44%|####4 | 361/819 [03:59<04:31, 1.69it/s]
44%|####4 | 362/819 [03:59<04:44, 1.60it/s]
44%|####4 | 363/819 [04:00<04:47, 1.59it/s]
44%|####4 | 364/819 [04:01<04:56, 1.53it/s]
45%|####4 | 365/819 [04:01<04:55, 1.54it/s]
45%|####4 | 366/819 [04:02<04:59, 1.51it/s]
45%|####4 | 367/819 [04:03<05:02, 1.49it/s]
45%|####4 | 368/819 [04:03<05:00, 1.50it/s]
45%|####5 | 369/819 [04:04<05:08, 1.46it/s]
45%|####5 | 370/819 [04:05<04:42, 1.59it/s]
45%|####5 | 371/819 [04:05<04:54, 1.52it/s]
45%|####5 | 372/819 [04:06<04:59, 1.49it/s]
46%|####5 | 373/819 [04:07<05:00, 1.48it/s]
46%|####5 | 374/819 [04:07<05:06, 1.45it/s]
46%|####5 | 375/819 [04:08<04:48, 1.54it/s]
46%|####5 | 376/819 [04:09<04:53, 1.51it/s]
46%|####6 | 377/819 [04:09<05:01, 1.47it/s]
46%|####6 | 378/819 [04:10<04:56, 1.49it/s]
46%|####6 | 379/819 [04:11<05:02, 1.45it/s]
46%|####6 | 380/819 [04:12<05:03, 1.45it/s]
47%|####6 | 381/819 [04:12<05:07, 1.43it/s]
47%|####6 | 382/819 [04:13<05:10, 1.41it/s]
47%|####6 | 383/819 [04:14<05:11, 1.40it/s]
47%|####6 | 384/819 [04:14<05:03, 1.43it/s]
47%|####7 | 385/819 [04:15<04:39, 1.55it/s]
47%|####7 | 386/819 [04:15<04:27, 1.62it/s]
47%|####7 | 387/819 [04:16<04:41, 1.54it/s]
47%|####7 | 388/819 [04:17<04:39, 1.54it/s]
47%|####7 | 389/819 [04:17<04:25, 1.62it/s]
48%|####7 | 390/819 [04:18<04:26, 1.61it/s]
48%|####7 | 391/819 [04:19<04:39, 1.53it/s]
48%|####7 | 392/819 [04:19<04:43, 1.51it/s]
48%|####7 | 393/819 [04:20<04:50, 1.46it/s]
48%|####8 | 394/819 [04:21<04:55, 1.44it/s]
48%|####8 | 395/819 [04:22<04:54, 1.44it/s]
48%|####8 | 396/819 [04:22<04:47, 1.47it/s]
48%|####8 | 397/819 [04:23<04:47, 1.47it/s]
49%|####8 | 398/819 [04:23<04:35, 1.53it/s]
49%|####8 | 399/819 [04:24<04:44, 1.48it/s]
49%|####8 | 400/819 [04:25<04:49, 1.45it/s]
49%|####8 | 401/819 [04:26<04:44, 1.47it/s]
49%|####9 | 402/819 [04:26<04:37, 1.50it/s]
49%|####9 | 403/819 [04:27<04:35, 1.51it/s]
49%|####9 | 404/819 [04:28<04:34, 1.51it/s]
49%|####9 | 405/819 [04:28<04:41, 1.47it/s]
50%|####9 | 406/819 [04:29<04:46, 1.44it/s]
50%|####9 | 407/819 [04:30<04:45, 1.44it/s]
50%|####9 | 408/819 [04:30<04:48, 1.43it/s]
50%|####9 | 409/819 [04:31<04:29, 1.52it/s]
50%|##### | 410/819 [04:31<04:14, 1.61it/s]
50%|##### | 411/819 [04:32<04:26, 1.53it/s]
50%|##### | 412/819 [04:33<04:23, 1.55it/s]
50%|##### | 413/819 [04:33<04:03, 1.67it/s]
51%|##### | 414/819 [04:34<04:02, 1.67it/s]
51%|##### | 415/819 [04:35<04:17, 1.57it/s]
51%|##### | 416/819 [04:35<04:12, 1.59it/s]
51%|##### | 417/819 [04:36<04:23, 1.52it/s]
51%|#####1 | 418/819 [04:37<04:29, 1.49it/s]
51%|#####1 | 419/819 [04:37<04:26, 1.50it/s]
51%|#####1 | 420/819 [04:38<04:17, 1.55it/s]
51%|#####1 | 421/819 [04:39<04:16, 1.55it/s]
52%|#####1 | 422/819 [04:39<04:05, 1.62it/s]
52%|#####1 | 423/819 [04:40<04:17, 1.54it/s]
52%|#####1 | 424/819 [04:40<04:11, 1.57it/s]
52%|#####1 | 425/819 [04:41<04:21, 1.51it/s]
52%|#####2 | 426/819 [04:42<04:24, 1.49it/s]
52%|#####2 | 427/819 [04:43<04:29, 1.45it/s]
52%|#####2 | 428/819 [04:43<04:33, 1.43it/s]
52%|#####2 | 429/819 [04:44<04:36, 1.41it/s]
53%|#####2 | 430/819 [04:45<04:28, 1.45it/s]
53%|#####2 | 431/819 [04:45<04:16, 1.51it/s]
53%|#####2 | 432/819 [04:46<04:19, 1.49it/s]
53%|#####2 | 433/819 [04:47<04:25, 1.46it/s]
53%|#####2 | 434/819 [04:47<04:24, 1.45it/s]
53%|#####3 | 435/819 [04:48<04:28, 1.43it/s]
53%|#####3 | 436/819 [04:49<04:30, 1.41it/s]
53%|#####3 | 437/819 [04:50<04:24, 1.45it/s]
53%|#####3 | 438/819 [04:50<04:27, 1.42it/s]
54%|#####3 | 439/819 [04:51<04:29, 1.41it/s]
54%|#####3 | 440/819 [04:52<04:16, 1.48it/s]
54%|#####3 | 441/819 [04:52<04:21, 1.45it/s]
54%|#####3 | 442/819 [04:53<04:24, 1.43it/s]
54%|#####4 | 443/819 [04:54<04:18, 1.46it/s]
54%|#####4 | 444/819 [04:54<04:13, 1.48it/s]
54%|#####4 | 445/819 [04:55<04:18, 1.45it/s]
54%|#####4 | 446/819 [04:56<04:18, 1.45it/s]
55%|#####4 | 447/819 [04:56<04:21, 1.42it/s]
55%|#####4 | 448/819 [04:57<04:09, 1.48it/s]
55%|#####4 | 449/819 [04:58<04:07, 1.50it/s]
55%|#####4 | 450/819 [04:58<04:03, 1.51it/s]
55%|#####5 | 451/819 [04:59<03:56, 1.56it/s]
55%|#####5 | 452/819 [05:00<04:04, 1.50it/s]
55%|#####5 | 453/819 [05:00<04:06, 1.49it/s]
55%|#####5 | 454/819 [05:01<03:50, 1.58it/s]
56%|#####5 | 455/819 [05:02<04:00, 1.51it/s]
56%|#####5 | 456/819 [05:02<03:49, 1.58it/s]
56%|#####5 | 457/819 [05:03<03:40, 1.64it/s]
56%|#####5 | 458/819 [05:03<03:49, 1.57it/s]
56%|#####6 | 459/819 [05:04<03:44, 1.60it/s]
56%|#####6 | 460/819 [05:05<03:54, 1.53it/s]
56%|#####6 | 461/819 [05:05<04:01, 1.48it/s]
56%|#####6 | 462/819 [05:06<04:06, 1.45it/s]
57%|#####6 | 463/819 [05:07<04:09, 1.43it/s]
57%|#####6 | 464/819 [05:08<04:11, 1.41it/s]
57%|#####6 | 465/819 [05:08<04:03, 1.45it/s]
57%|#####6 | 466/819 [05:09<03:49, 1.54it/s]
57%|#####7 | 467/819 [05:10<03:47, 1.55it/s]
57%|#####7 | 468/819 [05:10<03:54, 1.49it/s]
57%|#####7 | 469/819 [05:11<04:00, 1.46it/s]
57%|#####7 | 470/819 [05:12<04:03, 1.43it/s]
58%|#####7 | 471/819 [05:12<04:05, 1.42it/s]
58%|#####7 | 472/819 [05:13<04:07, 1.40it/s]
58%|#####7 | 473/819 [05:14<04:07, 1.40it/s]
58%|#####7 | 474/819 [05:15<04:05, 1.41it/s]
58%|#####7 | 475/819 [05:15<03:44, 1.53it/s]
58%|#####8 | 476/819 [05:16<03:41, 1.55it/s]
58%|#####8 | 477/819 [05:16<03:48, 1.49it/s]
58%|#####8 | 478/819 [05:17<03:53, 1.46it/s]
58%|#####8 | 479/819 [05:18<03:56, 1.43it/s]
59%|#####8 | 480/819 [05:19<03:59, 1.42it/s]
59%|#####8 | 481/819 [05:19<04:00, 1.40it/s]
59%|#####8 | 482/819 [05:20<03:57, 1.42it/s]
59%|#####8 | 483/819 [05:21<03:45, 1.49it/s]
59%|#####9 | 484/819 [05:21<03:50, 1.45it/s]
59%|#####9 | 485/819 [05:22<03:41, 1.51it/s]
59%|#####9 | 486/819 [05:23<03:46, 1.47it/s]
59%|#####9 | 487/819 [05:23<03:27, 1.60it/s]
60%|#####9 | 488/819 [05:24<03:24, 1.61it/s]
60%|#####9 | 489/819 [05:24<03:20, 1.64it/s]
60%|#####9 | 490/819 [05:25<03:22, 1.63it/s]
60%|#####9 | 491/819 [05:26<03:30, 1.56it/s]
60%|###### | 492/819 [05:26<03:30, 1.55it/s]
60%|###### | 493/819 [05:27<03:34, 1.52it/s]
60%|###### | 494/819 [05:28<03:33, 1.52it/s]
60%|###### | 495/819 [05:28<03:37, 1.49it/s]
61%|###### | 496/819 [05:29<03:32, 1.52it/s]
61%|###### | 497/819 [05:30<03:38, 1.47it/s]
61%|###### | 498/819 [05:30<03:30, 1.53it/s]
61%|###### | 499/819 [05:31<03:27, 1.54it/s]
61%|######1 | 500/819 [05:32<03:16, 1.62it/s]
{'loss': 0.0289, 'learning_rate': 1.557997557997558e-05, 'epoch': 0.61}
100%|##########| 43/43 [00:09<00:00, 5.21it/s]
{'eval_loss': 0.5235913395881653, 'eval_accuracy': 0.9057294526816767, 'eval_default': 0.9057294526816767, 'eval_runtime': 10.089, 'eval_samples_per_second': 541.481, 'eval_steps_per_second': 4.262, 'epoch': 0.61}
100%|##########| 819/819 [09:13<00:00, 1.90it/s]
{'train_runtime': 553.4024, 'train_samples_per_second': 189.271, 'train_steps_per_second': 1.48, 'train_loss': 0.023367454426338093, 'epoch': 1.0}
100%|##########| 819/819 [09:13<00:00, 1.48it/s]
defaultdict(<class 'dict'>, {'bert.encoder.layer.1.output.dense': {'weight': {'scale': tensor(0.0054, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6852, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6852, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0588, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(7.4687, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-7.4687, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.4.output.dense': {'weight': {'scale': tensor(0.0050, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6319, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6319, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0541, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.8723, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-6.8723, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.4.attention.output.dense': {'weight': {'scale': tensor(0.0035, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4439, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4439, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0464, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.8866, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-5.8866, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.11.attention.output.dense': {'weight': {'scale': tensor(0.0053, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 
'quant_bits': 8, 'tracked_max': tensor(0.6738, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6738, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0597, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(7.5858, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-7.5858, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.0.attention.self.key': {'weight': {'scale': tensor(0.0051, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6526, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6526, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1399, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(17.7635, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-17.7635, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.6.attention.output.dense': {'weight': {'scale': tensor(0.0049, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6181, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6181, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0404, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.1277, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-5.1277, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.11.intermediate.dense': {'weight': {'scale': tensor(0.0056, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.7168, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7168, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.2705, grad_fn=<ToCopyBackward0>), 
'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(34.3541, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-34.3541, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.2.attention.self.query': {'weight': {'scale': tensor(0.0063, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.8044, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.8044, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1587, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(20.1498, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-20.1498, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.2.attention.self.key': {'weight': {'scale': tensor(0.0056, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.7145, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7145, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1538, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(19.5304, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-19.5304, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.7.output.dense': {'weight': {'scale': tensor(0.0050, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6346, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6346, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0451, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.7215, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-5.7215, grad_fn=<ToCopyBackward0>)}}, 
'bert.encoder.layer.6.output.dense': {'weight': {'scale': tensor(0.0057, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.7268, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7268, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0451, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.7270, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-5.7270, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.7.attention.output.dense': {'weight': {'scale': tensor(0.0042, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5304, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5304, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0357, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(4.5284, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-4.5284, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.7.attention.self.value': {'weight': {'scale': tensor(0.0034, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4330, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4330, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0857, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(10.8800, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-10.8800, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.6.attention.self.value': {'weight': {'scale': tensor(0.0042, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': 
tensor(0.5325, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5325, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0944, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(11.9918, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-11.9918, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.3.output.dense': {'weight': {'scale': tensor(0.0051, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6445, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6445, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0609, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(7.7393, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-7.7393, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.8.attention.self.value': {'weight': {'scale': tensor(0.0032, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4110, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4110, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0967, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(12.2853, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-12.2853, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.10.intermediate.dense': {'weight': {'scale': tensor(0.0050, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6302, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6302, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1629, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 
'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(20.6859, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-20.6859, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.3.attention.self.value': {'weight': {'scale': tensor(0.0027, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.3490, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.3490, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0816, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(10.3665, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-10.3665, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.8.intermediate.dense': {'weight': {'scale': tensor(0.0033, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4196, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4196, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.2929, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(37.1974, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-37.1974, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.8.attention.self.query': {'weight': {'scale': tensor(0.0059, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.7503, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7503, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1686, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(21.4158, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-21.4158, grad_fn=<ToCopyBackward0>)}}, 
'bert.encoder.layer.6.intermediate.dense': {'weight': {'scale': tensor(0.0036, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4597, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4597, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.3280, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(41.6567, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-41.6567, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.11.attention.self.key': {'weight': {'scale': tensor(0.0057, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.7220, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7220, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1521, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(19.3130, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-19.3130, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.9.attention.self.value': {'weight': {'scale': tensor(0.0034, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4343, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4343, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0962, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(12.2156, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-12.2156, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.7.intermediate.dense': {'weight': {'scale': tensor(0.0043, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 
'tracked_max': tensor(0.5433, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5433, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.3017, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(38.3149, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-38.3149, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.2.output.dense': {'weight': {'scale': tensor(0.0039, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4978, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4978, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0559, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(7.1001, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-7.1001, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.9.attention.output.dense': {'weight': {'scale': tensor(0.0044, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5543, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5543, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0329, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(4.1778, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-4.1778, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.3.attention.output.dense': {'weight': {'scale': tensor(0.0035, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4417, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4417, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0314, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 
'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(3.9923, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-3.9923, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.5.output.dense': {'weight': {'scale': tensor(0.0039, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4904, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4904, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0508, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.4460, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-6.4460, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.4.intermediate.dense': {'weight': {'scale': tensor(0.0036, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4611, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4611, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.3553, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(45.1181, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-45.1181, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.10.attention.self.query': {'weight': {'scale': tensor(0.0054, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6905, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6905, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1725, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(21.9104, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-21.9104, grad_fn=<ToCopyBackward0>)}}, 
'bert.encoder.layer.4.attention.self.query': {'weight': {'scale': tensor(0.0042, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5366, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5366, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1567, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(19.8983, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-19.8983, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.8.attention.output.dense': {'weight': {'scale': tensor(0.0039, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4892, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4892, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0336, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(4.2649, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-4.2649, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.11.output.dense': {'weight': {'scale': tensor(0.0057, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.7215, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7215, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0499, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.3371, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-6.3371, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.6.attention.self.query': {'weight': {'scale': tensor(0.0054, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 
'tracked_max': tensor(0.6914, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6914, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1747, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(22.1843, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-22.1843, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.11.attention.self.value': {'weight': {'scale': tensor(0.0045, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5677, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5677, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1048, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(13.3124, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-13.3124, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.9.attention.self.query': {'weight': {'scale': tensor(0.0050, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6292, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6292, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1702, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(21.6130, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-21.6130, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.5.attention.self.key': {'weight': {'scale': tensor(0.0037, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4738, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4738, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1789, grad_fn=<ToCopyBackward0>), 'zero_point': 
tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(22.7168, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-22.7168, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.4.attention.self.value': {'weight': {'scale': tensor(0.0033, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4173, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4173, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0951, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(12.0836, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-12.0836, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.9.output.dense': {'weight': {'scale': tensor(0.0052, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6581, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6581, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0465, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.9058, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-5.9058, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.3.attention.self.query': {'weight': {'scale': tensor(0.0064, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.8093, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.8093, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1454, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(18.4688, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-18.4688, grad_fn=<ToCopyBackward0>)}}, 
'bert.encoder.layer.5.attention.self.query': {'weight': {'scale': tensor(0.0052, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6615, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6615, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1797, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(22.8165, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-22.8165, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.0.intermediate.dense': {'weight': {'scale': tensor(0.0046, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5833, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5833, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.4222, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(53.6154, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-53.6154, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.7.attention.self.key': {'weight': {'scale': tensor(0.0055, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6941, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6941, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1787, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(22.6901, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-22.6901, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.3.attention.self.key': {'weight': {'scale': tensor(0.0058, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 
'tracked_max': tensor(0.7379, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7379, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1494, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(18.9734, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-18.9734, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.1.attention.self.key': {'weight': {'scale': tensor(0.0052, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6605, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6605, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1510, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(19.1781, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-19.1781, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.1.intermediate.dense': {'weight': {'scale': tensor(0.0043, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5488, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5488, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.3736, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(47.4496, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-47.4496, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.1.attention.self.query': {'weight': {'scale': tensor(0.0061, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.7766, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7766, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1507, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 
'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(19.1350, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-19.1350, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.10.output.dense': {'weight': {'scale': tensor(0.0049, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6203, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6203, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0832, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(10.5676, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-10.5676, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.6.attention.self.key': {'weight': {'scale': tensor(0.0051, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6519, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6519, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1748, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(22.2015, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-22.2015, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.10.attention.output.dense': {'weight': {'scale': tensor(0.0050, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6349, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6349, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0660, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(8.3815, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-8.3815, grad_fn=<ToCopyBackward0>)}}, 
'bert.encoder.layer.0.attention.self.value': {'weight': {'scale': tensor(0.0030, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.3765, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.3765, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0629, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(7.9911, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-7.9911, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.5.attention.output.dense': {'weight': {'scale': tensor(0.0042, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5374, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5374, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0416, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.2791, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-5.2791, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.5.intermediate.dense': {'weight': {'scale': tensor(0.0040, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5036, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5036, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.3207, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(40.7252, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-40.7252, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.9.intermediate.dense': {'weight': {'scale': tensor(0.0034, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 
'tracked_max': tensor(0.4361, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4361, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.3014, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(38.2716, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-38.2716, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.0.attention.output.dense': {'weight': {'scale': tensor(0.0044, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5631, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5631, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0296, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(3.7581, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-3.7581, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.10.attention.self.key': {'weight': {'scale': tensor(0.0072, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.9113, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.9113, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1685, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(21.3934, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-21.3934, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.4.attention.self.key': {'weight': {'scale': tensor(0.0061, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.7721, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7721, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1541, grad_fn=<ToCopyBackward0>), 'zero_point': 
tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(19.5688, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-19.5688, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.3.intermediate.dense': {'weight': {'scale': tensor(0.0043, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5519, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5519, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.3450, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(43.8199, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-43.8199, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.2.attention.self.value': {'weight': {'scale': tensor(0.0022, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.2770, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.2770, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0759, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(9.6415, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-9.6415, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.1.attention.self.value': {'weight': {'scale': tensor(0.0029, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.3693, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.3693, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0780, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(9.9091, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-9.9091, grad_fn=<ToCopyBackward0>)}}, 
'bert.encoder.layer.2.intermediate.dense': {'weight': {'scale': tensor(0.0042, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5300, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5300, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.3788, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(48.1056, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-48.1056, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.1.attention.output.dense': {'weight': {'scale': tensor(0.0034, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4258, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4258, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0237, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(3.0116, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-3.0116, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.8.output.dense': {'weight': {'scale': tensor(0.0064, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.8166, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.8166, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0466, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.9245, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-5.9245, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.7.attention.self.query': {'weight': {'scale': tensor(0.0059, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': 
tensor(0.7470, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7470, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1779, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(22.5906, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-22.5906, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.0.output.dense': {'weight': {'scale': tensor(0.0068, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.8580, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.8580, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1156, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(14.6778, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-14.6778, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.10.attention.self.value': {'weight': {'scale': tensor(0.0046, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5852, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5852, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0945, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(11.9993, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-11.9993, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.2.attention.output.dense': {'weight': {'scale': tensor(0.0034, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.4334, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.4334, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0280, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 
'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(3.5577, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-3.5577, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.0.attention.self.query': {'weight': {'scale': tensor(0.0067, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.8450, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.8450, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1277, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(16.2190, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-16.2190, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.11.attention.self.query': {'weight': {'scale': tensor(0.0062, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.7828, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7828, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1547, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(19.6471, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-19.6471, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.9.attention.self.key': {'weight': {'scale': tensor(0.0054, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.6884, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.6884, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1607, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(20.4064, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-20.4064, grad_fn=<ToCopyBackward0>)}}, 
'bert.encoder.layer.5.attention.self.value': {'weight': {'scale': tensor(0.0045, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.5669, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.5669, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.0918, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(11.6587, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-11.6587, grad_fn=<ToCopyBackward0>)}}, 'bert.encoder.layer.8.attention.self.key': {'weight': {'scale': tensor(0.0056, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.7112, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-0.7112, grad_fn=<ToCopyBackward0>)}, '_output_0': {'scale': tensor(0.1620, grad_fn=<ToCopyBackward0>), 'zero_point': tensor(0.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(20.5749, grad_fn=<ToCopyBackward0>), 'tracked_min': tensor(-20.5749, grad_fn=<ToCopyBackward0>)}}})
/anaconda/envs/bug_fix/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
100%|##########| 43/43 [00:09<00:00, 4.60it/s]
Evaluation function returns a dict metric without key `default`,will return None as the model evaluation metric value.
(None, {'eval_loss': 0.5088651180267334, 'eval_accuracy': 0.9110378912685337, 'eval_default': 0.9110378912685337, 'eval_runtime': 10.0053, 'eval_samples_per_second': 546.013, 'eval_steps_per_second': 4.298})
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Found cached dataset glue (/home/xinzhang3/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
100%|##########| 3/3 [00:00<00:00, 831.82it/s]
Loading cached processed dataset at /home/xinzhang3/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-d191c03109bc149f.arrow
Loading cached processed dataset at /home/xinzhang3/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-dde15e00c1ce7271.arrow
100%|##########| 43/43 [00:07<00:00, 5.79it/s]
Evaluate metrics={'eval_loss': 0.4634885787963867, 'eval_accuracy': 0.9167124290682775, 'eval_default': 0.9167124290682775, 'eval_runtime': 8.0757, 'eval_samples_per_second': 676.471, 'eval_steps_per_second': 5.325}
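The calibration config printed above reports a zero_point of 0 and a tracked_min equal to the negative of tracked_max for every entry, which suggests a symmetric int8 range. The printed scales also appear to match tracked_max / 127. The sketch below checks that relation against a few values copied from the output. The relation itself, the `QMAX` constant, and the `scale_from_range` helper are assumptions inferred from the printed numbers, not taken from the NNI source.

```python
# Values copied from the calibration config printed above: (tracked_max, scale).
# ASSUMPTION: symmetric int8 quantization with scale = tracked_max / 127.
# This is inferred from the printed numbers, not from the NNI implementation.
QMAX = 127  # hypothetical constant: largest int8 magnitude under this assumption

printed = {
    "bert.encoder.layer.0.output.dense / weight": (0.8580, 0.0068),
    "bert.encoder.layer.0.output.dense / _output_0": (14.6778, 0.1156),
    "bert.encoder.layer.10.attention.self.value / weight": (0.5852, 0.0046),
    "bert.encoder.layer.10.attention.self.value / _output_0": (11.9993, 0.0945),
}


def scale_from_range(tracked_max: float, qmax: int = QMAX) -> float:
    """Hypothetical helper: estimate a symmetric scale from the tracked range."""
    return tracked_max / qmax


for name, (tracked_max, printed_scale) in printed.items():
    estimate = scale_from_range(tracked_max)
    print(f"{name}: estimated {estimate:.4f}, printed {printed_scale:.4f}")
```

If the estimates agree with the printed scales to four decimal places, the assumed relation is at least consistent with this run. It does not confirm how the quantizer computes the scale internally.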
Result¶
We ran the PTQ, LSQ, and QAT algorithms on the MNLI, QNLI, MRPC, and QQP datasets on an A100 GPU. The results are as follows.
Quant Method | MNLI | QNLI | MRPC | QQP |
---|---|---|---|---|
Metrics | ACC | ACC | F1 | F1 |
Baseline | 85.04% | 91.67% | 87.69% | 88.42% |
LSQ | 84.34% | 91.69% | 89.9% | 88.16% |
QAT | 83.68% | 90.52% | 89.16% | 87.62% |
PTQ | 76.37% | 67.67% | 74.79% | 84.42% |
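To make the comparison easier to read, the following sketch computes each method's change relative to the baseline, in percentage points, from the numbers in the table above. The `results` dictionary and the `delta_vs_baseline` helper are illustrative names introduced here and are not part of the tutorial script.

```python
# Scores copied from the result table above (ACC for MNLI/QNLI, F1 for MRPC/QQP).
results = {
    "Baseline": {"MNLI": 85.04, "QNLI": 91.67, "MRPC": 87.69, "QQP": 88.42},
    "LSQ": {"MNLI": 84.34, "QNLI": 91.69, "MRPC": 89.9, "QQP": 88.16},
    "QAT": {"MNLI": 83.68, "QNLI": 90.52, "MRPC": 89.16, "QQP": 87.62},
    "PTQ": {"MNLI": 76.37, "QNLI": 67.67, "MRPC": 74.79, "QQP": 84.42},
}


def delta_vs_baseline(table, baseline="Baseline"):
    """Hypothetical helper: per-task difference from the baseline row, in points."""
    base = table[baseline]
    return {
        method: {task: score - base[task] for task, score in scores.items()}
        for method, scores in table.items()
        if method != baseline
    }


deltas = delta_vs_baseline(results)
for method, per_task in deltas.items():
    row = ", ".join(f"{task} {diff:+.2f}" for task, diff in per_task.items())
    print(f"{method}: {row}")
```

On these numbers, LSQ and QAT stay within about 1.4 points of the baseline on every task, while PTQ loses between 4 and 24 points.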
Total running time of the script: (9 minutes 40.647 seconds)