AutoTokenizer.from_pretrained Fails To Load Locally Saved Pretrained Tokenizer (PyTorch)

I am new to PyTorch and recently, I have been trying to work with Transformers. I am using pretrained tokenizers provided by HuggingFace. I am successful in downloading and runnin

Solution 1:

I see several issues in your code, listed below:

  1. distilroberta-tokenizer is a directory that holds the vocab, config, and other tokenizer files. Make sure this directory exists before saving to it.

  2. AutoTokenizer only loads from this directory if it contains config.json rather than tokenizer_config.json, so rename that file after saving.

I modified your code below and it works.

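import os

dir_name = "distilroberta-tokenizer"
if not os.path.isdir(dir_name):
    os.mkdir(dir_name)

# `tokenizer` is the tokenizer object from the question's code
tokenizer.save_pretrained(dir_name)

# Rename the config file now, as described in point 2
os.rename(os.path.join(dir_name, "tokenizer_config.json"),
          os.path.join(dir_name, "config.json"))

# tmp = AutoTokenizer.from_pretrained(dir_name)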
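
If loading still fails, it can help to look at which of the two files named in point 2 actually ended up in the directory. The following is a minimal diagnostic sketch using only the standard library. check_tokenizer_dir is a hypothetical helper name, not part of transformers, and it only looks for the two file names mentioned in this answer.

```python
import os


def check_tokenizer_dir(dir_name):
    """Report which of the config files named in this answer exist in dir_name.

    Hypothetical helper for illustration; it is not part of transformers.
    """
    names = ("config.json", "tokenizer_config.json")
    # Map each expected file name to True/False depending on whether it is present
    return {name: os.path.isfile(os.path.join(dir_name, name)) for name in names}
```

Calling check_tokenizer_dir("distilroberta-tokenizer") right after tokenizer.save_pretrained(dir_name) shows whether the rename step is still pending.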

I hope this helps!

Thanks!

Solution 2:

There is currently an issue under investigation that affects only AutoTokenizer, not the underlying tokenizer classes such as RobertaTokenizer. For example, the following should work:

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('YOURPATH')

To work with AutoTokenizer, you also need to save the config so that it can be loaded offline:

from transformers import AutoTokenizer, AutoConfig

tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')
config = AutoConfig.from_pretrained('distilroberta-base')

tokenizer.save_pretrained('YOURPATH')
config.save_pretrained('YOURPATH')

tokenizer = AutoTokenizer.from_pretrained('YOURPATH')

I recommend either using a different path for the tokenizer and the model, or keeping a copy of your model's config.json. Some modifications you apply to your model are stored in the config.json created by model.save_pretrained(), and that file is overwritten if you save the tokenizer as described above after saving your model to the same path (i.e. you won't be able to load your modified model with the tokenizer's config.json).
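
One way to follow the first recommendation is to give the tokenizer and the model their own subdirectories, so the two config.json files never collide. The sketch below assumes only that each object exposes save_pretrained(path), as transformers tokenizers, configs and models do. The function name save_separately and the subdirectory names "tokenizer" and "model" are illustrative choices, not a library convention.

```python
import os


def save_separately(tokenizer, config, model, root):
    """Save tokenizer (plus its config) and model under separate subdirectories.

    Hypothetical helper: it relies only on each argument having a
    save_pretrained(path) method.
    """
    tok_dir = os.path.join(root, "tokenizer")
    model_dir = os.path.join(root, "model")
    os.makedirs(tok_dir, exist_ok=True)
    os.makedirs(model_dir, exist_ok=True)

    # Tokenizer files plus the config.json that AutoTokenizer needs per this answer
    tokenizer.save_pretrained(tok_dir)
    config.save_pretrained(tok_dir)

    # The model writes its own config.json into a different directory,
    # so it is not overwritten by the config saved next to the tokenizer
    model.save_pretrained(model_dir)
    return tok_dir, model_dir
```

Afterwards the tokenizer would be loaded with AutoTokenizer.from_pretrained(tok_dir) and the model from model_dir, each reading its own config.json.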