Hello, may I know how I can combine the LSR relation extraction model with the AllenNLP coreference resolution and NER models? I am having difficulty understanding the format of the instance to be passed into the LSR preprocessor, and I am confused about where the other two models come into the pipeline to produce the final output list.
May I also know if, and how, I will have to utilise the DocRED JSON files to train the model or output the data? Any help would be greatly appreciated. Thank you in advance.
Regarding training the model, you can pass the JSON files to the trainer via the `--train_file` argument and specify the `--output_dir` argument to indicate the folder for the training metrics as well as the saved trained weights.
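For reference, a rough sketch of what the invocation might look like (the training script name and JSON path here are placeholders; only `--train_file` and `--output_dir` are the actual arguments mentioned above):
```
python train.py --train_file path/to/docred_train.json --output_dir ./lsr_output
```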
@raymond_aisg Hi Raymond, thank you for the prompt and helpful response. However, I'm sorry to say that I am still confused about the input instance dictionary and exactly what I need to pass into the preprocessor of the LSR model. Would I even need to use the two AllenNLP models mentioned in the LSR model card? Or are those the models I need to use in order to produce the instance input needed for the LSR preprocessor? Thank you very much.
As mentioned in the model card, the LSR model by itself does not include the capability to perform the coreference resolution and named entity recognition required for input to the LSR's preprocessor. As such, in order to build a functioning demo, we utilize the pre-trained NER and Coref models from AllenNLP to generate the predictions required for the DocRED input format.
Please note that it is also possible to utilize other Coref and NER models as long as their prediction output is post-processed to meet the DocRED format.
For an example of how the AllenNLP Coref and NER predictions are utilized, we have a Text2DocRED pipeline that initializes the Coref and NER models from AllenNLP, and their inference call is performed in the `predict` API endpoint here:
The `preprocess` method of the Text2DocRED pipeline parses the prediction output of both the Coref and NER models to fit the DocRED format before returning it to the prediction endpoint. If different Coref and NER models are used, both prediction output parsers will need to be updated.
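For illustration, here is a minimal sketch (not the actual Text2DocRED implementation) of how the AllenNLP Coref and NER predictions could be combined into a DocRED-style instance. The model archive paths, the single-sentence simplification, and the placeholder entity type are assumptions on my part:
```
from allennlp.predictors.predictor import Predictor

# Archive paths are placeholders; substitute the Coref and NER archives you actually use.
coref_predictor = Predictor.from_path("path/to/coref-spanbert.tar.gz")
ner_predictor = Predictor.from_path("path/to/fine-grained-ner.tar.gz")

def text_to_docred(text: str) -> dict:
    coref_out = coref_predictor.predict(document=text)  # tokens + coreference clusters
    ner_out = ner_predictor.predict(sentence=text)       # BIO tags per token

    # DocRED expects tokenized sentences plus a vertexSet: one entry per entity,
    # each entry being a list of mentions with sentence index, token span and entity type.
    doc = {
        "title": "user_input",
        "sents": [coref_out["document"]],  # treat the whole input as one sentence for simplicity
        "vertexSet": [],
    }
    for cluster in coref_out["clusters"]:
        mentions = []
        for start, end in cluster:  # AllenNLP spans use inclusive end indices
            mentions.append({
                "name": " ".join(coref_out["document"][start:end + 1]),
                "sent_id": 0,
                "pos": [start, end + 1],  # DocRED spans use exclusive end indices
                "type": "MISC",           # in practice, resolved from ner_out["tags"]
            })
        doc["vertexSet"].append(mentions)
    return doc
```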
Thank you very much for your response. It has been extremely beneficial in helping me implement the AI brick in my project.
Moving on from that, however, I have run into an optimisation problem related to my use of the text-to-DocRED pipeline, specifically in passing document data through both AllenNLP models in VSCode. When I ran text string inputs on both the SGnlp and AllenNLP docs websites, the output was returned to me almost instantly. However, when I ran the predict function of the AllenNLP Coref and NER predictors in my Python file in VSCode, it took up to 20 seconds to give me the desired output. Below are the messages printed in the terminal as I ran the program:
```
error loading _jsonnet (this is expected on Windows), treating C:\Users\Admin\AppData\Local\Temp\tmpz3_ikzx5\config.json as plain json
Some weights of BertModel were not initialized from the model checkpoint at SpanBERT/spanbert-large-cased and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
May I know how I can optimise this file so that I can run the AllenNLP model predictions, and hence the text-to-DocRED pipeline, at the same speed as the demo web app? Thank you very much.
Judging from the inference speed you've mentioned, I'm assuming you are performing the inferences with the CPU on a laptop or desktop, meaning the models are loaded into system RAM. The performance you are experiencing is typical of consumer-grade computers.
For the SGnlp demo, the models are deployed in the cloud on server-grade hardware, usually with multiple CPU cores dedicated solely to performing inference. I can't say for sure what hardware the AllenNLP demo uses to deploy their models, but it is typical (and recommended) to use a GPU for inference in applications that require multiple concurrent inferences (which is a must given the amount of traffic the AllenNLP demo webpages receive).
To speed up inference for a Deep Learning model, you would require GPU hardware, as GPUs are designed for parallel processing and optimized for vector operations, and can therefore significantly speed up a Deep Learning model's inference and training.
A simple explanation of why a GPU is needed for Deep Learning:
If you wish to optimize the models for a little more inference speed, you can also try techniques like Dynamic Quantization or Pruning. Please note that these techniques usually trade lower model accuracy for reduced latency and may require re-training the model.
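As an illustration, here is what PyTorch dynamic quantization could look like, assuming `model` is the already-loaded LSR model (a sketch, not something I've benchmarked on LSR):
```
import torch

# Dynamic quantization: replace the Linear layers' fp32 weights with int8 weights;
# activations are quantized on the fly at inference time. CPU-only, no re-training
# required, but expect some loss in accuracy.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```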
Thank you for the very detailed explanation addressing my problem. I have checked my PC and it actually contains both a CPU and an Nvidia RTX 3060 Ti GPU. May I know how to check which processing unit on my computer the model is using to perform inferences? And if it is using my CPU, how would I be able to use my GPU to perform inferences instead? I am running my code in VSCode rather than Jupyter Notebook or Google Colab, if that helps. Thank you.
1. Install the Nvidia CUDA driver and toolkit for your GPU
After this step, you should be able to run the `nvidia-smi` command in your terminal to view your GPU.
For example, the following shows the output of the `nvidia-smi` command for an Nvidia Tesla V100 GPU setup with CUDA version 11.7.
2. Update your PyTorch version according to your CUDA version
Go to https://pytorch.org/, select the configuration that corresponds to the setup on your PC, and install the correct PyTorch package.
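For example, at the time of writing, the selector on pytorch.org produced a command along these lines for a pip + CUDA 11.7 setup (always copy the exact command it generates for your own configuration):
```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
```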
3. Test that the PyTorch in your environment is able to detect your GPU via the following example. If the CUDA installation is set up properly and the correct PyTorch version is installed, the `is_available()` method should return True and the `device_count()` method should return 1.
```
import torch
print(torch.cuda.is_available())
print(torch.cuda.device_count())
```
4. Since the SGnlp package does not utilize the GPU by default, there are a few places which need to be updated.
First, for the Coref and NER models, you need to specify the CUDA device when the models are initialized.
According to the AllenNLP documentation, you can define the CUDA device via the `cuda_device` argument of the `from_path` method. If you have only one GPU on your PC, you should set this value to 0; by default it is -1, which indicates that the CPU should be used.
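A minimal sketch of what this could look like (the archive paths are placeholders for whichever Coref and NER model archives you are using):
```
from allennlp.predictors.predictor import Predictor

# cuda_device=0 targets the first (and only) GPU; the default of -1 means CPU.
coref_predictor = Predictor.from_path("path/to/coref-model.tar.gz", cuda_device=0)
ner_predictor = Predictor.from_path("path/to/ner-model.tar.gz", cuda_device=0)
```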
Next is the LSR model. Since the LSR model utilizes the HuggingFace Transformers package, the model object is essentially a PyTorch module, hence you can simply use the `.to("cuda")` method to move the model onto the GPU.
Example from the model usage code:
```
model = LsrModel.from_pretrained('https://storage.googleapis.com/sgnlp/models/lsr/v2/pytorch_model.bin', config=config).to("cuda")
```
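One thing to note (an addition on my part): the inputs you pass to the model must end up on the same device as the model. A rough sketch, assuming the preprocessor output is a dict of tensors (here called `tensor_doc` purely for illustration):
```
import torch

# Move every tensor produced by the preprocessor onto the GPU as well,
# so the model and its inputs are on the same device.
tensor_doc = {
    key: value.to("cuda") if torch.is_tensor(value) else value
    for key, value in tensor_doc.items()
}
output = model(**tensor_doc)
```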
Try running the inference example code again to check for the inference speed up.
P.S.: Moving all 3 models to the GPU requires all 3 models to reside in the GPU's vRAM, and the 3060 Ti only has 8 GB of vRAM, which might not be sufficient to load all 3 models. If you encounter out-of-memory issues, try loading only 1 or 2 models on the GPU to see if they fit. You can check the models' memory consumption using the same `nvidia-smi` command shown above to see how much GPU memory is consumed.
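If you prefer checking from within Python rather than `nvidia-smi`, PyTorch also exposes rough memory counters:
```
import torch

# Bytes currently allocated by tensors vs. bytes reserved by PyTorch's caching allocator.
print(torch.cuda.memory_allocated() / 1024 ** 2, "MiB allocated")
print(torch.cuda.memory_reserved() / 1024 ** 2, "MiB reserved")
```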
It turns out the file took so long to run mainly because of the loading of the AllenNLP models, even though I had already changed the models to run on my GPU. Now the inference takes only a few seconds!
Thank you so much for all the detailed replies and the wonderful help that you have given me for the past few days. I truly do appreciate the time and effort you took to assist me with my understanding.
@raymond_aisg Hi Raymond, sorry to bother you again. The LSR model worked great on my desktop with a GPU!
However, I wanted to make my ML app portable so it can run on other machines, such as my laptop, which does not have a CUDA-compatible GPU. Hence, I reverted the settings for the models as well as the LSR preprocessor from using the CUDA GPU to using the CPU.
However, it still shows me the runtime error saying that the tensors in `tensor_doc` are on 2 devices, GPU and CPU, as shown below.
May I know how it would be possible to rectify this error in order to get the LSR model running on CPU only again? Thank you!
Unfortunately, just from the code in the screenshot, I am unable to pinpoint exactly which tensor/module is affected.
But there are 2 things you can try.
First, based on line 84 of your first screenshot, I'm assuming you are using the TextInputToDocredPipeline class; please check that you are not casting the NER and Coref models to a CUDA device.
Next, if the above does not solve your issue, you can try casting the LSR model to the CPU:
```
model = LsrModel.from_pretrained('https://storage.googleapis.com/sgnlp/models/lsr/v2/pytorch_model.bin', config=config).to("cpu")
```
Lastly, if the above still doesn't work, you might wish to debug through your solution and, for each tensor/module, use the PyTorch `is_cuda` attribute to determine which tensor/module is being sent to the CUDA device; see the snippet below.
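A quick way to do this, where `inputs` stands for whatever dict of tensors you pass to the model and `model` for the LSR model object (both names are placeholders for your own variables):
```
import torch

# Print the device of every tensor in the model input and of the model's
# parameters, to spot anything that is still on CUDA.
for key, value in inputs.items():
    if torch.is_tensor(value):
        print(key, value.device, value.is_cuda)

print({param.device for param in model.parameters()})
```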
Yes, I have ensured that the AllenNLP models, as well as the LSR model, are not cast to a CUDA device. I believe the problem came from the tensors in the `tensor_doc` dictionary, as shown below.
I have checked `is_cuda` on all the values, and the values which are tensors all returned False.
To be clear, I believe the error may have arisen from the `tensor_doc` variable, as the runtime error traces back to `output = model(**tensor_doc)`, as shown below:
However, even when the tensors don't seem to be on CUDA, the same runtime error appears. May I know how it would be possible to circumvent this problem? Thank you!
Hmmm, this is quite tricky, as the default implementation should already work on CPU.
One last thing to check: could you kindly try uninstalling your PyTorch installation and reinstalling a build that specifically supports only the CPU, to see if that helps?
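One common way to do this with pip is via the official CPU-only wheel index, roughly:
```
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
```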
I uninstalled the current PyTorch and torchvision packages and installed the ones with the +cpu suffix. However, the same runtime error still appeared.
May I check with you if the same error also appears when running the code on your PC which does not have a GPU?
As the original implementation was already working with CPU, could you also try to check out the original code in a separate location and see if the code still works?
As I do not have full visibility of your changes, it is quite hard for me to debug. Would you happen to have a repo with your changes that I could look at, to see if I can replicate the error?
Sorry for the late reply as I was busy with NS for the past few days.
I have tried to run the code on my laptop, but another problem seems to appear with the srsly package pulled in by AllenNLP, which happens before the original torch problem.
The URLs of my client and server GitHub repositories are listed below:
Please do note that the uploaded frontend repository should be working as intended, and the backend repo is the functional version in which CUDA is enabled for torch, working only on my desktop with a CUDA-compatible GPU. The problem originally arose in the '/extract' route, where the preprocessor and LSR models are.
Thanks for sharing your code. I'm only able to investigate the LSR portion of the code base as I'm not familiar with the other portion of the solution.
I've copied the code from the `/extract` route function and tried replicating the issue in my local environment. I encountered an issue when trying to install the dependencies from the `requirements.txt`, but I suspect this is simply because I'm running the code on macOS, and I was able to bypass the error by removing the `+cpu` suffix.
Once I had fixed the environment, I was able to run the `/extract` route function code without issues. I've attached the code I copied out for your reference.
I do not think there is anything out of place in your implementation, which leaves only the dependencies as a possible source of issues. May I suggest that you create a new virtual environment and reinstall all your required dependencies before re-running your solution, to see if that fixes your issue.
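For reference, a rough sequence on Windows might look like this (the environment name is arbitrary):
```
python -m venv fresh-env
fresh-env\Scripts\activate
pip install -r requirements.txt
```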
Thank you so much for taking time out of your weekend to help me with my issue. I'm elated to say that creating a new virtual environment has indeed worked for me! I really can't thank you enough. If you are free and it's not too much trouble, could you explain why installing the existing dependencies in a new virtual environment works, so that I may use this knowledge in other projects? I really appreciate your help.
There are many, many possible reasons why setting up a new environment works. For example, there could be some poorly implemented packages that cache certain configs during the first run; then, when the dependent package is updated, the cache isn't updated, resulting in unexpected errors. (My guess is this is most likely what happened in this case: the previous PyTorch installation expected CUDA support, and overwriting it with a non-CUDA version resulted in some mismatched configs.)
Also, dependency resolution is a very hard problem to solve, and the complexity goes up when a solution requires a lot of third-party packages. When a package is updated in a virtual environment, the package management software (e.g. pip, Conda) needs to re-evaluate every single installed package to ensure that their respective dependencies are still met (an NP-hard problem). This process tends to be problematic and can result in unsolvable dependency conflicts; the majority of the time the package manager will flag such issues when it encounters them, but quite often this kind of issue does not surface.
In my experience, re-installing dependencies in a new virtual environment is the equivalent of 'turning the PC off and on again' to resolve unexplainable issues when developing in Python.