Compare commits

...

2 commits

Author SHA1 Message Date
array-in-a-matrix 99b70ab839 export chat history using element 2022-08-16 01:27:53 -04:00
array-in-a-matrix 2fcda33cf8 created 2 functions to gen text and train 2022-08-16 01:25:38 -04:00
2 changed files with 29 additions and 1 deletion

@@ -17,6 +17,8 @@ Client has started!
...
```
If you do not want to wait for the bot to build its own dataset from new messages, you can export existing chat history with [Element](https://element.io/blog/element-1-9-1-export-is-finally-here/). You will then need to remove the timestamps from the exported text file manually.
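The timestamp cleanup can also be scripted. The following is a minimal sketch, not part of this project: it assumes each exported line has the shape `<timestamp> - <sender>: <message>`, which may not match your export, and the helper names `strip_timestamp` and `clean_export` are hypothetical.

```python
import re

# Assumption: the timestamp and the rest of the line are joined by " - ".
SEPARATOR = " - "

# Assumption: a timestamp prefix contains a clock time such as 1:27 or 01:27.
CLOCK_TIME = re.compile(r"\d{1,2}:\d{2}")


def strip_timestamp(line: str) -> str:
    """Drop a leading '<timestamp> - ' prefix if the line appears to have one."""
    prefix, sep, rest = line.partition(SEPARATOR)
    # Strip only when the text before the separator looks like a timestamp,
    # so messages that merely contain " - " are left untouched.
    if sep and CLOCK_TIME.search(prefix):
        return rest
    return line


def clean_export(text: str) -> str:
    """Apply strip_timestamp to every line of an exported chat log."""
    return "\n".join(strip_timestamp(line) for line in text.splitlines())
```

Check a few lines of your own export before relying on this, since the timestamp format may depend on the client's locale settings.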
## Setup
The project is split into two parts: `index.js` and `textgen.py`. The `index.js` file contains the code that interacts with users on Matrix and sends the text generated by `textgen.py`.
@@ -44,7 +46,7 @@ Before a bot can be used the fields in the `config.json` file must be populated
► user* ⇢ Account's User ID.
- ► file ⇢ Path of file used for training the AI.
+ ► file ⇢ Path of file used for training the AI (.txt file only).
► prefix ⇢ Bot listens to commands that start with this prefix.
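The `.txt` restriction on `file` could be checked before training starts. This is a rough sketch under stated assumptions: only the `file` key from the list above is read, and the function name `training_file_from_config` is hypothetical rather than something this project defines.

```python
import json
from pathlib import Path


def training_file_from_config(config_path: str = "config.json") -> Path:
    """Return the training file path from the config, rejecting non-.txt files."""
    with open(config_path, "r", encoding="utf-8") as fh:
        config = json.load(fh)

    # "file" is the only key this sketch relies on.
    path = Path(config["file"])
    if path.suffix.lower() != ".txt":
        raise ValueError(f"Training file must be a .txt file, got: {path}")
    return path
```

Failing early here gives a clearer error than letting the tokenizer or dataset loader fail on an unexpected file type.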

@@ -0,0 +1,26 @@
from aitextgen.TokenDataset import TokenDataset
from aitextgen.tokenizers import train_tokenizer
from aitextgen.utils import GPT2ConfigCPU
from aitextgen import aitextgen
import json

# Read the path of the training text file from the bot's config.
with open('config.json', 'r') as file:
    json_object = json.load(file)
    file_name = json_object['file']


def generate_message():
    # Load the model and tokenizer saved by train_ai().
    ai = aitextgen(model_folder="trained_model",
                   tokenizer_file="aitextgen.tokenizer.json")
    # generate() only prints to the console; generate_one() returns the text.
    return ai.generate_one()


def train_ai():
    # Train a tokenizer on the dataset; this writes aitextgen.tokenizer.json.
    train_tokenizer(file_name)
    tokenizer_file = "aitextgen.tokenizer.json"
    # Small GPT-2 configuration intended for CPU training.
    config = GPT2ConfigCPU()
    ai = aitextgen(tokenizer_file=tokenizer_file, config=config)
    data = TokenDataset(file_name, tokenizer_file=tokenizer_file, block_size=64)
    ai.train(data, batch_size=8, num_steps=50000, generate_every=5000, save_every=5000)
    print("AI has been trained!")


# Requires a model previously saved by train_ai().
print(generate_message())
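
Because generating text depends on a model that training has already saved, a caller might want to check for one first. The sketch below is an illustration, not part of this commit: the helper `needs_training` is hypothetical, and it assumes training leaves its output in the `trained_model` folder and the `aitextgen.tokenizer.json` file that the generation code loads.

```python
from pathlib import Path


def needs_training(model_dir: str = "trained_model",
                   tokenizer_file: str = "aitextgen.tokenizer.json") -> bool:
    """Return True when no saved model or tokenizer is available yet."""
    model_path = Path(model_dir)
    # Treat a missing or empty model folder as "not trained".
    has_model = model_path.is_dir() and any(model_path.iterdir())
    has_tokenizer = Path(tokenizer_file).is_file()
    return not (has_model and has_tokenizer)
```

A caller could run `train_ai()` only when `needs_training()` returns `True`, and call `generate_message()` otherwise.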