Hi, I've been training my SenticGCN model, and while my embed_models and tokenizers folders are populated, it seems like my models folder has not been populated at all...
Here is my sentic_gcn_config.json
According to this config, my model should be saved and updated in `./models/senticgcn/`.
However, when I open that directory, it's empty: no models have been saved at all.
This image, for example, shows that the model is supposed to be saved to the filepath, but when I go to the filepath, it's empty.
I've run the code for a few hours already, and the folder is still not populated. Is there something I'm doing wrong, or something I'm not aware of? 🙁
Hi,
From your screenshot, I'm assuming that you are using Windows to run the training script.
Could you kindly try replacing the `save_model_path` config with an absolute path to see if the weights could be saved? (e.g. C:\Users\user\Desktop\AISG\models\senticgcn\).
Please also note that the default config is written for Unix, which uses forward slashes (`/`) as path separators, whereas Windows uses backslashes (`\`).
https://jrogel.com/backslashes-v-forward-slashes-windows-linux-and-mac/
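If you want to check the path handling yourself, Python's standard `pathlib` module shows what is going on. `PureWindowsPath` accepts both `/` and `\` as separators, and a relative path such as `./models/senticgcn/` is resolved against the current working directory of the training process, not the folder that contains the config file. This is a minimal sketch; the helper `describe_save_path` is hypothetical and not part of sgnlp.

```python
import os
from pathlib import Path, PureWindowsPath


def describe_save_path(save_model_path: str) -> Path:
    """Hypothetical helper: show where a (possibly relative) save path resolves to."""
    resolved = Path(save_model_path).resolve()  # relative paths are joined onto the cwd
    print(f"cwd:      {os.getcwd()}")
    print(f"resolved: {resolved}")
    return resolved


# On Windows, forward slashes and backslashes are interchangeable separators.
unix_style = PureWindowsPath("C:/Users/user/Desktop/AISG/models/senticgcn/")
windows_style = PureWindowsPath(r"C:\Users\user\Desktop\AISG\models\senticgcn")
print(unix_style == windows_style)  # True

# The relative path from the config, rendered with Windows separators.
print(PureWindowsPath("./models/senticgcn/"))  # models\senticgcn

describe_save_path("./models/senticgcn/")
```

If the printed `resolved` path is not the folder you have been watching, the weights are being written somewhere else.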
Hope this helps.
Hi, thank you for your suggestion! I tried using an absolute file path, but the model still isn't being saved. The tokenizer and the embedding model, however, are being saved (their folders were updated); it's just the model that isn't. Has anyone else encountered this problem?
Hi, I spent a little bit of time going through the module, and I think there's probably nothing wrong. Here's my take (see the code explanation below):
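Basically, the trainer does NOT save the model weights to your `save_model_path` folder UNTIL the end of the whole run, unlike the tokenizer and embed_model, which are saved at the start. While training is running, the best checkpoint of each repeat is written to a temporary directory instead.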
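To make that behaviour concrete, here is a toy sketch of the same pattern (not the sgnlp code): the best weights are checkpointed into a temporary directory during training and only copied to the final save path once every repeat has finished. Everything here is a hypothetical stand-in: `toy_train`, the `weights.txt` file and the fake validation accuracies. The point is only that the final folder stays empty until the run ends.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def toy_train(save_model_path: Path, val_accs_per_repeat: List[List[float]]) -> float:
    """Hypothetical trainer: checkpoint into a temp dir, save to the final folder last."""
    save_model_path.mkdir(parents=True, exist_ok=True)  # the (empty) folder you see
    best_overall = float("-inf")
    best_ckpt = None
    with tempfile.TemporaryDirectory() as tmp:
        temp_dir = Path(tmp)
        for i, val_accs in enumerate(val_accs_per_repeat):
            repeat_dir = temp_dir / f"repeat{i + 1}"
            repeat_dir.mkdir()
            max_val_acc = float("-inf")
            for epoch, val_acc in enumerate(val_accs):  # stand-in for real epochs
                if val_acc > max_val_acc:
                    max_val_acc = val_acc
                    # Intermediate checkpoint: written to the temp dir, NOT save_model_path.
                    (repeat_dir / "weights.txt").write_text(f"epoch={epoch} acc={val_acc}")
            # The final folder is still empty, even though checkpoints exist.
            print(f"after repeat {i + 1}:", sorted(p.name for p in save_model_path.iterdir()))
            if max_val_acc > best_overall:
                best_overall = max_val_acc
                best_ckpt = repeat_dir
        # Only after every repeat has finished is anything copied to the final folder.
        if best_ckpt is not None:
            shutil.copytree(best_ckpt, save_model_path, dirs_exist_ok=True)
    # Leaving the `with` block deleted the temp dir and all intermediate checkpoints.
    return best_overall


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp()) / "models" / "senticgcn"
    best = toy_train(out, [[0.60, 0.71, 0.69], [0.65, 0.74]])
    print(best)  # 0.74
    print((out / "weights.txt").read_text())  # epoch=1 acc=0.74
```

Both "after repeat" lines print an empty list, which is exactly what you see when you open the folder mid-run.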
So, in conclusion, my laptop is probably just slow (and I need to rerun the training :/)
Hope this helps anyone else who may be facing this problem (and who can't sleep at 3am wondering whether their model will be saved after 6 hours of training)!
```python
# How the model is eventually saved
def _save_model(self):
    # Other stuff
    self.model.save_pretrained(self.config.save_model_path)


# self._save_model() is called in:
class SenticGCNTrainer:
    def train(self):
        repeat_result = self._train()  # dataloader arguments omitted in this excerpt
        # Other important stuff
        self._save_model()  # Only saved eventually, after the full training run
        # Other important stuff

    def _train(
        self,
        train_dataloader: Union[DataLoader, BucketIterator],
        val_dataloader: Union[DataLoader, BucketIterator],
    ) -> Dict[str, Dict[str, Union[int, float]]]:
        # Setting up variables
        for i in range(self.config.repeats):
            # Crucial: this is where the models are actually saved before training completes
            repeat_tmpdir = self.temp_dir.joinpath(f"repeat{i + 1}")
            self._reset_params()
            # Calls self._train_loop; critically, with repeat_tmpdir as the target directory
            max_val_acc, max_val_f1, max_val_epoch = self._train_loop(
                criterion, optimizer, train_dataloader, val_dataloader, repeat_tmpdir
            )
            # Record repeat run results
            # Overwrite global stats
        return repeat_result

    def _train_loop(
        self, criterion, optimizer, train_dataloader, val_dataloader, tmpdir: pathlib.Path
    ) -> Tuple[float, float, int]:
        # Setting up some config variables
        for epoch in range(self.config.epochs):
            global_step += 1
            self.model.train()  # nn.Module.train(): switch the model to training mode
            # Other config steps
            if val_acc > max_val_acc:
                # Saving variables
                self.model.save_pretrained(tmpdir)  # Saved to tmpdir, NOT save_model_path
            # Code for early stopping
        return max_val_acc, max_val_f1, max_val_epoch


# The big question now is: what is repeat_tmpdir?
with tempfile.TemporaryDirectory() as tmpdir:
    self.temp_dir = pathlib.Path(tmpdir)

# Doing a little more tracing: with no arguments, TemporaryDirectory uses
# prefix="tmp", suffix="" and dir=tempfile.gettempdir(), which picks a directory
# from this candidate list (Python stdlib, tempfile.py):
def _candidate_tempdir_list():
    """Generate a list of candidate temporary directories which
    _get_default_tempdir will try."""
    dirlist = []

    # First, try the environment.
    for envname in 'TMPDIR', 'TEMP', 'TMP':
        dirname = _os.getenv(envname)
        if dirname:
            dirlist.append(dirname)

    # Failing that, try OS-specific locations.
    if _os.name == 'nt':
        dirlist.extend([_os.path.expanduser(r'~\AppData\Local\Temp'),
                        _os.path.expandvars(r'%SYSTEMROOT%\Temp'),
                        r'c:\temp', r'c:\tmp', r'\temp', r'\tmp'])
    else:
        dirlist.extend(['/tmp', '/var/tmp', '/usr/tmp'])

    # As a last resort, the current directory.
    try:
        dirlist.append(_os.getcwd())
    except (AttributeError, OSError):
        dirlist.append(_os.curdir)

    return dirlist

# _get_default_tempdir() then tries each candidate in turn: it takes the directory's
# absolute path, creates a small binary test file there, writes to it and deletes it.
# The first directory where this succeeds becomes the temp dir.
```
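If you want to see where the in-progress checkpoints would live on your machine, the standard library can tell you. `tempfile.gettempdir()` returns the directory chosen from the candidate list above, and `TemporaryDirectory()` creates a `tmp`-prefixed folder inside it. A minimal sketch; the `repeat1` subfolder just mimics the naming in the trainer excerpt. Note that the whole folder is deleted as soon as the `with` block exits, so the intermediate checkpoints do not outlive the run.

```python
import tempfile
from pathlib import Path

# The directory tempfile picked from the candidate list (TMPDIR/TEMP/TMP first).
print("tempfile.gettempdir():", tempfile.gettempdir())

with tempfile.TemporaryDirectory() as tmpdir:
    temp_dir = Path(tmpdir)
    # Default naming: prefix "tmp", empty suffix, created inside gettempdir().
    print("training temp dir:", temp_dir)
    repeat_dir = temp_dir / "repeat1"  # mimics the trainer's per-repeat folder
    repeat_dir.mkdir()
    print("exists during training:", repeat_dir.exists())  # True

# Once the block exits, the whole directory tree is removed.
print("exists after training:", temp_dir.exists())  # False
```

Since `TMPDIR`, `TEMP` and `TMP` are checked first, on Windows this is usually whatever `%TEMP%` points to.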
(To AISG staff, please confirm if my suspicions are correct, thanks!)