# Environment Setup

## Python Installation - Local Setup

First, you need to set up Python on your laptop/computer. To do so, go to [Python Environment Setup](https://wiki.novasearch.org/wiki/lab_setup) and follow the steps to set up your Python environment.

In [None]:
# Check your Python Version
from platform import python_version
print(python_version())

# Python Basics

This tutorial will cover the basics. Please find below a list of more comprehensive references to learn Python.

Python References: 

 * [Official Python 3.9 Reference](https://docs.python.org/3.9/index.html)
 * [Python Programming And Numerical Methods: A Guide For Engineers And Scientists](https://pythonnumericalmethods.berkeley.edu/notebooks/Index.html)

In [None]:
# Test your environment by checking if you can import the following libraries

import numpy as np
import matplotlib
import pandas as pd

## Python Types

Python is a dynamically typed language. This means that you do not need to declare the type of your variables: the type belongs to the value, and the Python interpreter keeps track of it at runtime.

Those familiar with Java, C, C++ or any other statically typed language will find this very convenient. However, it comes with a set of pitfalls (or features!) that one must be aware of.

In [None]:
## Examples in C++
# int a = 10;
# int a = 10.2; // Can you guess what we see if we print the value of a?
# std::string s;
# std::list<std::string> list_of_strings = {"my", "list", "of", "strings"};

## Same examples in Python
a = 10
a = 10.2
s = "a string"
s = ["my", "list", "of", "strings"]

## You can also mix types in a Collection
s = ["my", "list", "of", 10, 20.4, False]
print(s)
print([type(e) for e in s])

The advantages are clear now. But what about the disadvantages?

In [None]:
def super_complicated_function(a):
    print("Trust me, it's a super complicated function.")
    
    # Do something with a
    a = a**4 + 0.5
    
    return a

Now, let's say that you have this function in your code and you want to execute it:

In [None]:
super_complicated_function(a=2)

In [None]:
super_complicated_function("passing a string")

In this case there is only one short function, so it is easy to read its body and work out what type the argument should have. In a larger codebase, however, this quickly becomes cumbersome.

### Python Type Hints

Fortunately, Python supports Type Hints, and they are very useful!

https://docs.python.org/3/library/typing.html

In [None]:
# Adding type hints to the previous function
def super_complicated_function_typified(a : int) -> float:
    print("Trust me, it's a super complicated function.")
    
    # Do something with a
    a = a**4 + 0.5
    
    return a

In [None]:
super_complicated_function_typified(a=2)
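
Keep in mind that the interpreter does not check type hints at runtime; they are metadata for readers and for external tools such as static type checkers. The sketch below illustrates this with a hypothetical function named `scale`, which is not part of this notebook:

```python
from typing import get_type_hints

def scale(a: int) -> float:
    # Hypothetical example function: the hints only document the intent.
    return a * 1.5

# The hints are stored on the function object and can be inspected.
hints = get_type_hints(scale)
print(hints)

# Passing a float where an int is hinted still runs: nothing is enforced.
result = scale(2.5)
print(result)
```

To actually catch such mismatches, the code has to be run through a separate static type checker before execution.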

## Basic data types

### Numbers

Integers and floats work as you would expect from other languages:

In [None]:
x = 3
y = 3.0
print(x, type(x), y, type(y))

In [None]:
print(x + 1)   # Addition
print(x - 1)   # Subtraction
print(x * 2)   # Multiplication
print(x ** 2)  # Exponentiation

In [None]:
y = 2.5
print(type(y))
print(y, y + 1, y * 2, y ** 2)

Note that unlike many languages, Python does not have unary increment (x++) or decrement (x--) operators.

Python integers have arbitrary precision (there is no separate long type in Python 3), and there is also a built-in type for complex numbers; you can find all of the details in the [documentation](https://docs.python.org/3.9/library/stdtypes.html#numeric-types-int-float-complex).
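
A short sketch of these points, using only built-in features: augmented assignment stands in for the missing `++` and `--` operators, integers grow as large as needed, and complex numbers are written with a `j` suffix. The variable names are chosen for illustration only.

```python
x = 3
x += 1          # Augmented assignment takes the place of x++
x -= 2          # ... and of x--
print(x)

big = 2 ** 100  # Integers do not overflow; they grow as needed
print(big)

z = complex(1, 2)   # Same value as the literal 1 + 2j
product = z * z     # (1 + 2j)^2 = 1 + 4j + 4j^2 = -3 + 4j
print(product, abs(3 + 4j))
```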

#### Boolean Operations and Comparisons

Python implements all of the usual operators for Boolean logic, but uses English words rather than symbols (`&&`, `||`, etc.):

In [None]:
t, f = True, False
print(type(t))

Now let's look at the operations:

In [None]:
print(t and f) # Logical AND;
print(t or f)  # Logical OR;
print(not t)   # Logical NOT;
print(t != f)  # Logical XOR;

In [None]:
print(4 == 4) # Equality comparison (this operator can be overloaded)
print(np.pi > 3)
print(type(t) is bool) # Comparison using object identity

### Strings

Textual data in Python is handled with *str* objects, or strings. Strings are immutable sequences. 

For more details and a complete list of all supported string operations, check the [Official Strings Documentation](https://docs.python.org/3.9/library/stdtypes.html#text-sequence-type-str).
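
Immutability means that a string cannot be changed in place; any "modification" builds a new string. A minimal sketch, with variable names chosen for illustration:

```python
greeting = "hello"

try:
    greeting[0] = "J"   # Strings do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# Build a new string instead of modifying the existing one.
changed = "J" + greeting[1:]
print(greeting, changed)
```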

In [None]:
word1 = 'hello'   # String literals can use single quotes
word2 = "world"   # or double quotes
print(word1, len(word2))

In [None]:
str_concatenated = word1 + ' ' + word2  # String concatenation
print(str_concatenated)

In [None]:
str_formatted1 = '{} {} {}'.format(word1, word2, 12)  # string formatting
str_formatted2 = f'Let me say {word1} to the {word2} at least {12} times.' # string formatting (better alternative)
print(str_formatted1)
print(str_formatted2)

String objects have a bunch of useful methods; for example:

In [None]:
s = "hello world"
print(s.capitalize())  # Capitalize a string
print(s.upper())       # Convert a string to uppercase;
print(s.rjust(20))      # Right-justify a string, padding with spaces
print(s.center(20))     # Center a string, padding with spaces
print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another
print('  world '.strip(), len('  world '), len('  world '.strip()))  # Strip leading and trailing whitespace
print(s.title())  # Convert a string to a "Titlecased" string

You can find a list of all string methods in the [documentation](https://docs.python.org/3.7/library/stdtypes.html#string-methods).

# Python Sequences

Official documentation on [Python Sequences](https://docs.python.org/3.9/library/stdtypes.html#sequence-types-list-tuple-range).

Summary of the supported operations over Python Sequences:

![image.png](attachment:3578174c-0620-49c5-b6cc-30c6144fce4e.png)
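
The common operations are shared by lists, tuples and strings alike. The sketch below tries a few of them on small example values (the variable names are illustrative):

```python
numbers = [10, 20, 25, 40]   # list
point = (3, 4)               # tuple
text = "hello"               # str

print(20 in numbers, "z" in text)         # Membership test
print(numbers + [50], point * 2)          # Concatenation and repetition
print(text[1:4], numbers[::2])            # Slicing, with an optional step
print(len(point), min(numbers), max(numbers))
print(text.index("l"), text.count("l"))   # First position and number of occurrences
```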

## Lists

Official documentation on [Python Lists](https://docs.python.org/3.9/library/stdtypes.html#list).

In [None]:
list1 = [10,20,25,40]
list2 = ["ten", "twenty", "thirty", "forty"]


In [None]:
len(list1)

In [None]:
for i in range(len(list2)):
    print(i)

In [None]:
# List comprehensions (very handy)
print([x.upper() for x in list2])
print([x**2 for x in list1 if x <= 25]) # List comprehension with a condition.

In [None]:
for (a,b) in zip(list1,list2):
    print(a, "->", b)


In [None]:
for i, (a,b) in enumerate(zip(list1,list2)):
    print(f"Index {i} :",a, "->", b)

In [None]:
print("First element: ", list1[0])
print("Last element: ", list1[-1])
print("List: ", list1)
print("Reverse list: ", list1[::-1])


In [None]:
list1[1:3]

In [None]:
list1.append(60) 
print(list1)

In [None]:
list1.index(20) # This is not efficient. If you need lookups, use Dictionaries instead (below)

In [None]:
list1.index(30) # 30 is not in the list, so this raises a ValueError

In [None]:
a = ["Geeks", "For", "Geeks"]
print(" ".join(a))

In [None]:
## Exercise 1:
# Write list comprehension that computes the intersection of two lists

x = ["apple", "pear", "cherry", "watermelon", "orange"]
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# Sets

Official documentation on [Python Sets](https://docs.python.org/3.9/library/stdtypes.html#set-types-set-frozenset).

In [None]:
my_set = set()

In [None]:
my_set.add(3)
my_set.add(5)
my_set.add(3)
my_set.add(2)
my_set.add(7)


In [None]:
if 3 in my_set:
    print('Hurray!!')

In [None]:
for i in my_set:
    print(i)

In [None]:
set2 = set([2, 2, 2, 2, 2, 3, 9]) # Creating a set from a List

In [None]:
my_set.intersection(set2), my_set & set2 # Set Intersection

In [None]:
my_set.difference(set2), my_set - set2 # Set Difference

In [None]:
my_set.union(set2), my_set | set2 # Set Union

# Dictionaries

Official documentation on [Python Dictionaries](https://docs.python.org/3.9/library/stdtypes.html#mapping-types-dict).

In [None]:
animals={}
animals['fox'] = 'mammal'
animals['snake'] = 'reptile'
animals['whale'] = 'mammal'
animals['fly'] = 'insect'

In [None]:
for key in animals:
    print(key)
    print(animals[key])

In [None]:
for item in animals.items(): # Iterating through dictionary entries (key,values)
    print(item)

In [None]:
del animals["snake"] # Removing an element

for key, value in animals.items():
    print(key)
    print(value)
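
Looking up a key that is not in the dictionary raises a `KeyError`. A minimal sketch of the usual ways to handle that, rebuilding the `animals` dictionary in the state it has after the deletion above:

```python
animals = {"fox": "mammal", "whale": "mammal", "fly": "insect"}

print("fox" in animals)                  # Membership test on the keys
print(animals.get("snake"))              # Missing key: get() returns None
print(animals.get("snake", "unknown"))   # ... or a default value of your choice

try:
    animals["snake"]                     # Indexing a missing key raises KeyError
except KeyError as err:
    print("KeyError:", err)
```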

# Functions

Python functions are defined using the `def` keyword. For example:

In [None]:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))

Arguments can be given default values, which makes them optional when calling the function:

In [None]:
def hello(name, loud=False):
    if loud:
        print('HELLO, {}'.format(name.upper()))
    else:
        print('Hello, {}!'.format(name))

hello('Bob')
hello('Fred', loud=True)

## Functions and Local vs. Global Variables

**NOTE**: Be very careful with Local vs. Global Variables!

Remember the `y` variable that was defined earlier as `y = 2.5`? You don't, right? Exactly... The function below assigns to its own local `y`, which leaves the global one untouched.

In [None]:
def example_function(a, b, c):
    y = a + b + c
    print(f'The value y within the function is {y}')
    return y

d = example_function(1, 2, 3)
print(f'The value of y outside the function is {y}')
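
The scoping rule at work here can be sketched in isolation: a function can read a global variable, but assigning to the same name inside the function creates a separate local variable. The function names below are hypothetical.

```python
y = 2.5  # Defined at the top level: a global variable

def read_global():
    return y * 2     # No assignment to y here, so the global y is read

def shadow_global():
    y = 100          # Assignment creates a new local y; the global is untouched
    return y

doubled = read_global()
local_value = shadow_global()
print(doubled, local_value, y)
```

This is why a forgotten global can silently leak into a result: the code runs without error, but uses a value defined many cells earlier.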

## Returning multiple values

In [None]:
def fn_multiple_return_values(a, b, c):
    return a + b, c

arg1, arg2 = fn_multiple_return_values(1, 2, 3)
print(arg1, arg2)
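
Strictly speaking, such a function returns a single tuple, which the caller may unpack into several names. A small sketch with a hypothetical helper `min_max`:

```python
def min_max(values):
    # Hypothetical helper: the comma builds a tuple, which is what gets returned.
    return min(values), max(values)

result = min_max([3, 1, 2])
print(type(result), result)

low, high = result   # Tuple unpacking
print(low, high)
```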

# Classes

Official Tutorial on [Python Classes](https://docs.python.org/3.9/tutorial/classes.html).

In [None]:
class Greeter:

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print(f'HELLO, {self.name.upper()}!')
        else:
            print(f'Hello, {self.name}!')
            
    # Defining the '+' operator through overloading
    def __add__(self, o):
        print(f'Hello, {self.name} and {o.name}!')

g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred!"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"

In [None]:
g = Greeter('Fred')
g2 = Greeter('Jane')
g + g2
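
In the example above `__add__` only prints, so `g + g2` evaluates to `None`. More commonly the method returns a new object so that the result can be reused. A minimal sketch, using a hypothetical `Vector2D` class that is not part of this notebook:

```python
class Vector2D:
    # Hypothetical class, used only to illustrate operator overloading.
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Return a new object instead of printing.
        return Vector2D(self.x + other.x, self.y + other.y)

    def __repr__(self):
        # Readable representation used by print() and the notebook output.
        return f"Vector2D({self.x}, {self.y})"

v = Vector2D(1, 2) + Vector2D(3, 4)
print(v)
```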

# NumPy

https://numpy.org/doc/1.23/reference/index.html#reference

In [None]:
m2 = np.linspace(1,9,3)
print(m2)

In [None]:
m1 = np.arange(1,10,1)
print(m1)

In [None]:
np.tile(m1,[2,1])

In [None]:
m4=np.ones([2,3])
print(m4)

In [None]:
np.shape(m4)

In [None]:
np.size(m4)

In [None]:
np.reshape(m4, [1,6])

In [None]:
np.min(m4)

In [None]:
np.max(m4)

In [None]:
np.sum(m4, axis = 0)

In [None]:
m4/np.sum(m4, axis = 0)

In [None]:
np.sum(m4, axis = 1)

In [None]:
m4.T/np.sum(m4, axis = 1)

In [None]:
np.sum(m4)

In [None]:
m4/np.sum(m4)

In [None]:
m5 = np.random.rand(2,3)

In [None]:
m5>0.5

In [None]:
np.where(m5>0.5)

In [None]:
np.multiply(m4,m5), np.multiply(m4,m5).shape

In [None]:
np.dot(m4, m5.T), np.dot(m4, m5.T).shape

In [None]:
np.power(m5, m4)

In [None]:
## Exercise 2: Create a random 4x4 NumPy matrix and normalize it using z-score normalization
# Z-score normalization: https://en.wikipedia.org/wiki/Standard_score


## Matrix operations broadcasting
https://numpy.org/doc/stable/user/basics.broadcasting.html

In [None]:
a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
print(a)

In [None]:
b = np.array([1.0, 2.0, 3.0])
c = np.array([1.0, 2.0, 3.0, 4.0])
print(b)

In [None]:
result = a + b
print(result)
print(np.shape(result))

In [None]:
result = a + c  # Raises a ValueError: shapes (4, 3) and (4,) cannot be broadcast together
print(result)
print(np.shape(result))

The following figure illustrates this process:

![image.png](attachment:75360a15-01b1-4926-b5ce-c5d3df83bd85.png)
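
Broadcasting compares shapes starting from the last axis, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below reuses the arrays from the cells above to show why `a + b` works, why `a + c` fails, and one possible way to add `c` along the columns instead:

```python
import numpy as np

a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])         # shape (4, 3)
b = np.array([1.0, 2.0, 3.0])              # shape (3,): matches the last axis of a
c = np.array([1.0, 2.0, 3.0, 4.0])         # shape (4,): does not match the last axis

row_wise = a + b                  # b is added to every row
col_wise = a + c[:, np.newaxis]   # c reshaped to (4, 1) is added to every column
print(row_wise.shape, col_wise.shape)

try:
    a + c                         # (4, 3) and (4,) are not compatible
except ValueError as err:
    print("ValueError:", err)
```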

A more complicated example:

In [None]:
aa = np.ones([2, 4])
bb = np.ones([3, 4])/2
print(aa)
print()
print(bb)

In [None]:
print(np.shape(aa[:,None]))
print(np.shape(bb))

In [None]:
cc = np.multiply(aa[:,None],bb)
print(cc)
print(np.shape(cc))

In [None]:
dd = np.sum(cc, axis = 2)
print(np.shape(dd))
print(dd)
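
The multiply-then-sum pattern above computes the dot product of every row of `aa` with every row of `bb`, which is the same as a matrix product with the transpose of `bb`. A short sketch that checks this equivalence (the name `ee` is introduced here for illustration):

```python
import numpy as np

aa = np.ones([2, 4])
bb = np.ones([3, 4]) / 2

# Broadcasting route: (2, 1, 4) * (3, 4) -> (2, 3, 4), then sum over the last axis.
dd = np.sum(aa[:, None] * bb, axis=2)

# The same numbers as a matrix product: (2, 4) @ (4, 3) -> (2, 3).
ee = aa @ bb.T

print(dd.shape, ee.shape)
print(np.allclose(dd, ee))
```

The matrix product avoids materializing the intermediate three-dimensional array, which matters once the inputs get large.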

# Plotting

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
t = np.arange(0.0, 2.0, 0.01)  
s = np.sin(2*np.pi*t)  

plt.plot(t, s, label="my function s", color="darkcyan")  

# List of available colors: https://matplotlib.org/stable/gallery/color/named_colors.html

plt.ylabel('yaxis') 
plt.xlabel('xaxis') 

plt.title(r'function $s(t)=\sin(2\pi t)$')
plt.legend()
plt.show()

## Subplots

In [None]:
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Set up a subplot grid that has height 2 and width 1,
# and set the first such subplot as active.
plt.subplot(2, 1, 1)

# Make the first plot
plt.plot(x, y_sin, c="darkcyan")
plt.title('Sin')

# Set the second subplot as active, and make the second plot.
plt.subplot(2, 1, 2)
plt.plot(x, y_cos, c="orange")
plt.title('Cos')

# Show the figure.
plt.show()

For more details on the `subplot` function, please refer to the [documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.subplot).

# A very brief sneak peek at what you will learn in this course :)

In [None]:
import gensim
from gensim.models import Word2Vec
from gensim.models.word2vec import FAST_VERSION
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import pickle

In [None]:
with open('words_clusters_w2v.pickle', 'rb') as f:
    data = pickle.load(f)


In [None]:
words = data["words"]
word_clusters = data["word_clusters"]
embeddings_w2v_2d = data["embeddings_w2v_2d"]
embeddings_w2v_300d = data["embeddings_w2v_300d"]

In [None]:
print(words)

In [None]:
embeddings_w2v_300d.shape, embeddings_w2v_2d.shape

In [None]:
%matplotlib widget

def tsne_plot_similar_words(title, labels, embedding_clusters, word_clusters, alpha):
    figsize = (16,10)
    plt.figure(figsize=(figsize))
    colors = cm.tab20(np.linspace(0, 1, len(labels))) # Use matplotlib colormaps
    for label, embeddings, words, color in zip(labels, embedding_clusters, word_clusters, colors):

        x = embeddings[:, 0] # Get the x-coordinate
        y = embeddings[:, 1] # Get the y-coordinate
        
        plt.scatter(x, y, c=[color], alpha=alpha, label=label)
        
        for i, word in enumerate(words):
            plt.annotate(word, alpha=0.5, xy=(x[i], y[i]), xytext=(5, 2),
                         textcoords='offset points', ha='right', va='bottom', size=10)
    plt.legend(loc=4)
    plt.title(title)
    plt.grid(True)
    plt.show()



tsne_plot_similar_words('Word embeddings clusters - Word2Vec trained with the Google News Corpus (3 million words & phrases)',
                        labels=words, embedding_clusters=embeddings_w2v_2d, word_clusters=word_clusters, alpha=0.9)

In [None]:
## Exercise 3: Read the questions below and try to answer them with what you know and the plot above.

# a) What can you tell from each cluster? Do they make sense? Can you come up with an explanation for such grouping?
# b) Is simple string matching enough to find all relevant documents? (hint: inspect the 'war' and 'peace' clusters)
# c) How to build a system that accounts for typos on search keywords? (hint: inspect the 'Portugal' cluster)

# References

This notebook uses adapted content from the following courses/materials:

 * Python Official Tutorial (https://docs.python.org/3.9/tutorial/).
 * NumPy Documentation (https://numpy.org/).
 * Stanford course CS231n: Convolutional Neural Networks for Visual Recognition (https://cs231n.github.io/).