This guide helps you set up the required environment.
Step 1: Install Anaconda
Anaconda is already installed on the cluster. You should also install Anaconda on your laptop for programming your models and running minimal tests. For large-scale testing and training, you can use the cluster.
Simply follow the instructions according to your case:
All of the following steps require you to open the command line interface (CLI) and execute the outlined instructions. Open the terminal to access the command line interface as follows:
Anaconda allows the creation of Python virtual environments. The main purpose of Python virtual environments is to create an isolated environment for Python projects. This means that each project can have its own dependencies, regardless of what dependencies every other project has.
In Anaconda, virtual environments are referred to as conda environments.
PS: A great cheat sheet covering the possible operations for manipulating conda environments and packages: Cheat Sheet.
Step 2: Creating conda environments
Run the following command to create a conda env:
$ conda create -n myenv python=3.9 ipykernel numpy scipy scikit-learn pandas tqdm jupyter matplotlib gensim flask flask_cors ipympl -c defaults -c conda-forge
Note that we specified python=3.9, but other Python versions are available to install.
Step 3: Activate conda environments
Since you may have multiple conda environments, you need to activate the environment in your current shell/terminal:
$ conda activate myenv
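To confirm that the activation took effect, the following minimal sketch can be run with the Python interpreter of the current shell. It relies on the CONDA_DEFAULT_ENV variable that conda exports on activation; the helper names are illustrative, and the prefix comparison is only a heuristic that assumes the environment directory is named after the environment (as with myenv above).

```python
import os
import sys


def active_conda_env():
    """Return the name of the active conda environment, or None if none is set."""
    # `conda activate` exports CONDA_DEFAULT_ENV in the current shell.
    return os.environ.get("CONDA_DEFAULT_ENV")


def interpreter_in_env(env_name):
    """Heuristic: the interpreter's install prefix ends with the environment name."""
    return os.path.basename(os.path.normpath(sys.prefix)) == env_name


print("Active conda env:", active_conda_env())
print("Interpreter prefix:", sys.prefix)
print("Running from 'myenv':", interpreter_in_env("myenv"))
```

If the last line prints False even though the environment is active, the shell is most likely still picking up a Python interpreter from outside the environment.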
The easiest and cleanest way to install PyTorch is through Anaconda. Therefore, first you should create a conda environment (check the latest version of Python supported by PyTorch) and activate it.
Step 1: Install PyTorch
Go to the PyTorch website and scroll down to find the options that can be used to generate the conda install command. Choose Linux and Conda, then choose the latest CUDA release (11.3 at the moment of writing). Then copy and execute the command.
Installing on Linux/Windows without GPU:
$ conda install pytorch torchvision torchaudio cpuonly -c pytorch
Installing on Linux/Windows with GPU:
$ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
NOTE: If this does not work, check the PyTorch guide at https://pytorch.org/get-started/locally/. If you do not have an NVIDIA GPU, install the CPU version instead.
Installing the CPU version on Mac:
$ conda install pytorch torchvision torchaudio -c pytorch
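Whichever variant you installed, a quick check from Python shows whether PyTorch imports and whether it can see a GPU. This is a minimal sketch: describe_torch is an illustrative helper name, and the function simply reports when PyTorch is missing instead of failing.

```python
def describe_torch():
    """Summarise the PyTorch install, or report that it is missing."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    lines = [f"PyTorch {torch.__version__}"]
    # True only when a CUDA build is installed and a usable NVIDIA GPU is visible.
    lines.append(f"CUDA available: {torch.cuda.is_available()}")
    # Create a small random tensor to confirm that basic operations work.
    sample = torch.rand(2, 3)
    lines.append(f"Sample tensor shape: {tuple(sample.shape)}")
    return "\n".join(lines)


print(describe_torch())
```

With the CPU-only install, "CUDA available: False" is the expected result.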
Step 2: Install HuggingFace
You need to install the following libraries:
$ pip install transformers
$ pip install accelerate
$ pip install ipywidgets
$ pip install bertviz
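To verify that the four packages ended up in the active environment, their installed versions can be read with the standard library. This is a sketch rather than a required step; installed_versions is an illustrative helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


required = ["transformers", "accelerate", "ipywidgets", "bertviz"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'not installed'}")
```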
Step 3: Install spaCy
$ conda install spacy
$ python -m spacy download en_core_web_sm
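A short smoke test can confirm that both the library and the en_core_web_sm model are usable. The sketch below is an assumption about how you might check this, not part of the required setup; spacy_tokens is an illustrative helper that returns None when either piece is missing.

```python
def spacy_tokens(text, model="en_core_web_sm"):
    """Tokenise text with a spaCy model; return None if spaCy or the model is missing."""
    try:
        import spacy
        nlp = spacy.load(model)
    except (ImportError, OSError):
        # ImportError: spaCy is not installed.
        # OSError: the model has not been downloaded.
        return None
    return [token.text for token in nlp(text)]


print(spacy_tokens("The environment is ready."))
```

A printed list of tokens means the install succeeded; None means one of the two commands above still needs to be run in this environment.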
Step 4: Install the OpenSearch client
$ pip install opensearch-py
Step 5: Tokenizers
To install the BPE and WPE tokenizers run this command:
$ pip install tokenizers
For the BERT tokenizers, you need to download one of these files and store it in your working directory:
'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt"
'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt"
'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt"
'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt"
'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt"
'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt"
'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt"
'bert-base-german-cased': "https://int-deepset-models-bert.s3.eu-central-1.amazonaws.com/pytorch/bert-base-german-cased-vocab.txt"
'bert-large-uncased-whole-word-masking': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-whole-word-masking-vocab.txt"
'bert-large-cased-whole-word-masking': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-whole-word-masking-vocab.txt"
'bert-large-uncased-whole-word-masking-finetuned-squad': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-whole-word-masking-finetuned-squad-vocab.txt"
'bert-large-cased-whole-word-masking-finetuned-squad': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-whole-word-masking-finetuned-squad-vocab.txt"
'bert-base-cased-finetuned-mrpc': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-mrpc-vocab.txt"
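As a sketch of how one of these files might be fetched and used, the helpers below download a vocabulary into the working directory and load it with BertWordPieceTokenizer from the tokenizers package. The helper names are illustrative, the choice of bert-base-uncased is only an example, and the download assumes the URL listed above is still reachable.

```python
import os
import urllib.request

# Example URL taken from the list above; any other entry works the same way.
BERT_VOCAB_URL = "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt"


def vocab_filename(url):
    """Derive the local file name from the last path segment of the URL."""
    return url.rsplit("/", 1)[-1]


def download_vocab(url, directory="."):
    """Download the vocabulary file unless it is already present; return its path."""
    path = os.path.join(directory, vocab_filename(url))
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


def load_bert_tokenizer(vocab_path, lowercase=True):
    """Build a WordPiece tokenizer from a local vocabulary file."""
    # Imported here so the download helpers work without `tokenizers` installed.
    from tokenizers import BertWordPieceTokenizer
    return BertWordPieceTokenizer(vocab_path, lowercase=lowercase)


# Example usage (needs network access and the `tokenizers` package):
# path = download_vocab(BERT_VOCAB_URL)
# tokenizer = load_bert_tokenizer(path)
# print(tokenizer.encode("Hello, world!").tokens)
```

For the cased vocabularies, pass lowercase=False so that the tokenizer does not lowercase the input before looking up word pieces.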
Step 1: JupyterHub
Let’s say you have an Anaconda environment called myenv. To run things on JupyterHub, you need to install the ipykernel from within that environment:
$ conda activate myenv
$ python -m ipykernel install --user --name myenv --display-name "myenv"
In the command above, change the name and display name to the name of your environment.
This creates a new IPython kernel for your env and stores a kernel spec file in:
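The exact directory depends on the operating system and user account. One way to look it up is to list the registered kernels from Python. This is a minimal sketch that assumes the jupyter_client package is available (it is installed as a dependency of ipykernel); installed_kernels is an illustrative helper name.

```python
def installed_kernels():
    """Return a dict mapping kernel names to their kernel spec directories."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        # jupyter_client is not installed in this environment.
        return {}
    return KernelSpecManager().find_kernel_specs()


for name, path in installed_kernels().items():
    print(f"{name}: {path}")
```

After running the install command above, the output should include an entry for the name you passed to --name.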
Step 2: Start Jupyter Lab locally
With your environment activated, start Jupyter Lab in a terminal with the command:
For more information about Jupyter Lab visit this link.
Step 3: Check Python version from inside Jupyter/JupyterHub
Check which version is running on Jupyter notebook
from platform import python_version
print(python_version())
Check version inside your Python program
import sys
print(sys.version)
Check version in command line or shell
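In a shell, the usual check is python --version. As a small sketch, the same command can also be driven from Python against the interpreter that is currently running, which makes it easy to compare a notebook kernel with the shell; cli_python_version is an illustrative helper name.

```python
import subprocess
import sys


def cli_python_version(python=sys.executable):
    """Run `<python> --version` and return its output, e.g. 'Python 3.9.x'."""
    result = subprocess.run(
        [python, "--version"], capture_output=True, text=True, check=True
    )
    # Python 3 prints the version to stdout; very old releases used stderr.
    return (result.stdout or result.stderr).strip()


print(cli_python_version())
```

If the version printed in the notebook differs from the one printed in the activated shell, the notebook is probably using a kernel from a different environment.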
Once you have created your Python Environment as described above, you can set up your favourite development environment. PyCharm and VSCode are popular choices offering different advantages. VSCode is lighter while PyCharm is better for teams. Use your favourite one:
The main pieces of advice to ensure everything runs smoothly are the following:
$ conda create -n pyserini python=3.8 ipykernel cython numpy scipy scikit-learn pandas tqdm tensorflow
$ conda activate pyserini
$ conda install faiss-cpu -c pytorch
$ pip install pyserini
$ python -m ipykernel install --user --name pyserini --display-name "Python (pyserini)"
Enter this in the first cell of your notebook