# Question SenticGCN Trained Model not Appearing

7 Posts
2 Users
0 Likes
101 Views
Posts: 4
Member
Topic starter
(@yen)
Active Member
Joined: 2 months ago

Hi, I've been training my SenticGCN model, and while my embed_models and tokenizers folders are populated, it seems like my models folder has not been populated at all...

Here is my sentic_gcn_config.json

From this, my model is supposed to be saved and updated into ./models/senticgcn/

However, when I enter that directory, it's empty, with no models saved at all

This image, for example, shows that the model is supposed to be saved to the filepath, but when I go to the filepath, it's empty.

I've run the code for a few hours already, and the folder is still not populated. Is there something I'm doing wrong, or something I'm not aware of? 🙁

6 Replies
Posts: 35
AISG Staff
(@raymond_aisg)
Eminent Member
Joined: 6 months ago

Hi,

From your screenshot, I'm assuming that you are using Windows to run the training script.

Could you kindly try replacing the save_model_path config with an absolute path to see if the weights could be saved? (e.g. C:\Users\user\Desktop\AISG\models\senticgcn\).

Please also note that the default config uses Unix-style paths, which use forward slashes (/), whereas Windows paths use backslashes (\).
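For illustration only, the change might look like this in sentic_gcn_config.json (other fields omitted; note that backslashes must be escaped in JSON):

```json
{
  "save_model_path": "C:\\Users\\user\\Desktop\\AISG\\models\\senticgcn\\"
}
```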

Hope this helps.

Posts: 4
Member
Topic starter
(@yen)
Active Member
Joined: 2 months ago

Hi, thank you for your suggestion! I tried using an absolute file path, but the model still isn't being saved. The tokenizer and the embedding model, however, are being saved (their folders were updated); it's just the model that isn't. Has anyone else encountered this problem?

Posts: 35
AISG Staff
(@raymond_aisg)
Eminent Member
Joined: 6 months ago

@yen Hi,

Thanks for trying out the suggestions.

The tokenizer and embedding models are pre-trained models that are downloaded directly from the Huggingface hub at the start of training as part of the required setup, which is why those folders are populated immediately.

Posts: 4
Member
Topic starter
(@yen)
Active Member
Joined: 2 months ago

Hi, I spent a little time going through the module, and I think there's probably nothing wrong. Here's my take (see the code explanation below):

Basically, the model does NOT save its weights to your save_model_path folder UNTIL the end of the run, unlike the tokenizer and embed_model, which are saved at the start.

So, in conclusion, my laptop is probably just slow (and I need to rerun the training :/)

Hope this helps anyone else who may be facing this problem (and who can't sleep at 3 a.m. wondering whether their model will be saved after 6 hours of training)!

# How the model is saved eventually
def _save_model(self):
    # Other stuff
    self.model.save_pretrained(self.config.save_model_path)

# self._save_model() is called in:

class SenticGCNTrainer:
    def train(self):
        repeat_result = self._train()
        # Other important stuff
        self._save_model()  # Only saved eventually, after the full training run completes
        # Other important stuff
    def _train(
        self, train_dataloader: DataLoader[BucketIterator], val_dataloader: DataLoader[BucketIterator]
    ) -> Dict[str, Dict[str, Union[int, float]]]:
        # Setting up variables
        for i in range(self.config.repeats):
            repeat_tmpdir = self.temp_dir.joinpath(f"repeat{i + 1}")  # This is crucial: it is where the models are actually saved before training completes
            self._reset_params()
            # Calls self._train_loop. Critically, with the save directory set to repeat_tmpdir
            max_val_acc, max_val_f1, max_val_epoch = self._train_loop(
            )
            # Record repeat run results
            # Overwrite global stats
        return repeat_result

# Inside self._train_loop:
# Setting up of some config variables
for epoch in range(self.config.epochs):
    global_step += 1
    self.model.train()  # To check what this is

    # Other config steps to get stuff
    if val_acc > max_val_acc:
        # Saving variables
        self.model.save_pretrained(tmpdir)  # Saved to tmpdir, NOT save_model_path

    # Code for early stopping
return max_val_acc, max_val_f1, max_val_epoch

# The big question now is: what is repeat_tmpdir? self.temp_dir is created here:
with tempfile.TemporaryDirectory() as tmpdir:
    self.temp_dir = pathlib.Path(tmpdir)

"""
Doing a little more tracing...
prefix = "tmp"
suffix = ""
dir:
"""

# Calls a dir from here (in CPython's tempfile module):
def _candidate_tempdir_list():
    """Generate a list of candidate temporary directories which
    _get_default_tempdir will try."""

    dirlist = []

    # First, try the environment.
    for envname in 'TMPDIR', 'TEMP', 'TMP':
        dirname = _os.getenv(envname)
        if dirname: dirlist.append(dirname)

    # Failing that, try OS-specific locations.
    if _os.name == 'nt':
        dirlist.extend([_os.path.expanduser(r'~\AppData\Local\Temp'),
                        _os.path.expandvars(r'%SYSTEMROOT%\Temp'),
                        r'c:\temp', r'c:\tmp', r'\temp', r'\tmp'])
    else:
        dirlist.extend(['/tmp', '/var/tmp', '/usr/tmp'])

    # As a last resort, the current directory.
    try:
        dirlist.append(_os.getcwd())
    except (AttributeError, OSError):
        dirlist.append(_os.curdir)

    return dirlist

"""After getting the dirlist, it creates a binary file by getting the absolute path of the directory and writing into a binary file."""


Posts: 35
AISG Staff
(@raymond_aisg)
Eminent Member
Joined: 6 months ago

@yen Hi,

Your observation is correct. As stated in the paper, the full train loop is run 10 times and the best model out of the 10 runs is saved at the end. Intermediate model weights are saved in a temp folder between train runs, as indicated here,

For a quick test run to check if it's possible to save the final model, first reduce the number of repeats to 1 for a single run,

Next reduce the epoch to a small figure like 1 or 2 here,
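For illustration, the two quick-test settings might look like this in sentic_gcn_config.json (field names assumed from the training code's self.config.repeats and self.config.epochs; other fields omitted):

```json
{
  "repeats": 1,
  "epochs": 2
}
```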

Run the training script again and you should be able to quickly observe if the model will save to the folder indicated in the save_model_path config.

Lastly, for the quick test above, could you try running the training script in debug mode with a breakpoint at the following line, and observe whether the breakpoint is triggered,

As indicated in the model card, our training on an A100 40GB GPU with the SemEval14/15/16 datasets takes only around an hour. If you are training with CPU, please ensure that you have enough system RAM and hard disk resources available throughout the training duration.

Hope this helps.

Posts: 4
Member
Topic starter
(@yen)
Active Member
Joined: 2 months ago

(To AISG staff, please confirm if my suspicions are correct, thanks!)
