May 31, 2020
For vocab.bpe, I looked at the learn_bpe.py implementation in subword-nmt (https://github.com/rsennrich/subword-nmt/blob/master/subword_nmt/learn_bpe.py), so maybe that will help! For encoder.json, look at the encode function in encoder.py: you pass in your text and it encodes it for you. If you are working with languages other than English, you might need to modify self.pat, though.
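In case it helps to see the idea behind learn_bpe.py, here is a minimal sketch of BPE merge learning. It is not the subword-nmt code: the function names (count_pairs, merge_pair, learn_bpe) and the toy word counts are made up for illustration, and it works on plain characters, whereas GPT-2's encoder.py works on bytes mapped to unicode characters. The repeated "merge the most frequent adjacent pair" loop is the part that carries over.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts, num_merges):
    """Return the list of merges, most frequent pair first.

    word_counts maps a word to its frequency (hypothetical toy input).
    Ties are broken by comparing the pairs themselves, which is an
    arbitrary choice made here only to keep the result deterministic.
    """
    vocab = {tuple(word): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


if __name__ == "__main__":
    toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    for left, right in learn_bpe(toy_counts, 3):
        # One merge per line, space separated
        print(left, right)
```

As far as I can tell, vocab.bpe is essentially this list of merges written one pair per line after a version header, so treat the output format above as an assumption and check it against the file you have.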