# Python basics

__Installation__

```$ sudo apt-get install python```

```$ sudo apt-get install python-pip```

__Option 1: Install system-wide__

```$ sudo pip install -r requirements.txt```

__Option 2: virtualenv__
    
Install virtualenv via pip:

```$ sudo pip install virtualenv```

Test your installation:

```$ virtualenv --version```

Create a virtual environment for a project:

```$ cd my_project_folder```

```$ virtualenv my_project```

`virtualenv my_project` creates a folder named my_project in the current directory. It contains the Python executable files and a copy of the pip library, which you can use to install other packages. The name of the virtual environment (in this case, my_project) can be anything; passing `.` as the name places the files in the current directory instead.

You can also choose which Python interpreter the environment uses, for example python2.7:

```$ virtualenv -p /usr/bin/python2.7 my_project```

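Before installing packages into the environment, activate it with `source my_project/bin/activate`. Then install packages as usual, for example:

```$ pip install -r requirements.txt```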

If you are done working in the virtual environment for the moment, you can deactivate it:
    
```$ deactivate```

This returns you to the system's default Python interpreter and its installed libraries.

## General

### Official Python documentation

RTFM: https://docs.python.org/2.7/tutorial/datastructures.html

### Information about functions

In Jupyter or the interactive shell, run

```>>> help(function)```

or, in Jupyter/IPython only,

```?function```

to get information about the function.

### Information about packages/classes

`dir()` lists the attributes and methods of an object or class, here of `list` (output from Python 2.7):

```python
dir(list)
```

```
['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__delslice__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getslice__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__setslice__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']
```

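The double-underscore ("dunder") entries are special methods that Python uses internally. A small sketch that hides them and keeps only the regular methods; the exact result depends on the Python version (Python 3, for example, adds `clear` and `copy`):

```python
# Keep only the "public" names, i.e. those that do not start with an underscore
public_methods = [name for name in dir(list) if not name.startswith('_')]
print(public_methods)
```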
## Lists and arrays
(Basic data types such as int, float and str are assumed to be known already.)

### Creating a list

```my_list = []```

or 

```my_list = list()```

or

```my_list = ["a", 2, 15, 3.1415]```


### References and values

When you create a new list `my_list`, the list itself is stored in your computer's memory, and the variable `my_list` only holds a reference (the address) to it, not the elements. If you copy a list with the assignment operator alone, as in `my_list_copy = my_list`, only the reference is copied: both variables then point to the same list, so changing one changes the other. To copy the actual values, use `list(my_list)` or slicing (`my_list[:]`).

```python
my_list = ["a", 2, 15, 3.1415]
my_copy = my_list
my_copy[0] = "b"
print my_list
```

```
['b', 2, 15, 3.1415]
```

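A short sketch comparing plain assignment with the two copying idioms mentioned above. It also shows a point the paragraph does not cover: `list(...)` and `[:]` create *shallow* copies, so nested lists are still shared; `copy.deepcopy` from the standard library copies those too.

```python
import copy

my_list = ["a", 2, 15, 3.1415]

by_reference = my_list       # same list object, nothing is copied
by_list = list(my_list)      # new list containing the same elements
by_slice = my_list[:]        # also a new list

my_list[0] = "b"
print(by_reference[0])  # prints b: shares the list with my_list
print(by_list[0])       # prints a: independent copy
print(by_slice[0])      # prints a: independent copy

# Both copies are shallow: inner lists are still shared objects
nested = [[1, 2], [3, 4]]
shallow = nested[:]
deep = copy.deepcopy(nested)
nested[0][0] = 99
print(shallow[0][0])  # prints 99
print(deep[0][0])     # prints 1
```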

### List Slicing

Slicing is a powerful tool to select subsets or elements from a list (or an np.array, as we will see later).
Indexing starts at 0!

```python
my_list[0] # select the first element
```

```
'b'
```

```python
my_list[:2] # select the first two elements
```

```
['b', 2]
```

```python
my_list[-2:] # select the last two elements
```

```
[15, 3.1415]
```

```python
my_list[1:3] # select elements 1 and 2 (the end index 3 is excluded)
```

```
[2, 15]
```

```python
my_list[::2] # select every second element
```

```
['b', 15]
```

```python
my_list[::-1] # reverse
```

```
[3.1415, 15, 2, 'b']
```

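All of the slices above follow the pattern `[start:stop:step]`, where `start` is included and `stop` is excluded. A small sketch of the consequences, using the same list as above:

```python
my_list = ['b', 2, 15, 3.1415]

# A slice [start:stop:step] includes start and excludes stop
print(my_list[1:3])        # [2, 15]

# Because stop is excluded, splitting at any index k loses nothing
k = 2
print(my_list[:k] + my_list[k:] == my_list)   # True

# A negative step walks backwards; omitted bounds default to the ends
print(my_list[::-1])       # [3.1415, 15, 2, 'b']

# Slices never raise IndexError, they are clipped to the list length
print(my_list[2:100])      # [15, 3.1415]
```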
### numpy

Powerful, fast package for scientific computing.

Multidimensional, efficient array implementations with a lot of built-in methods.

__Important__: All elements of a numpy array have the same type (the array's `dtype`).

```python
import numpy as np
```

### Creating arrays

Similar to lists, numpy arrays can be created empty or from existing data:

```my_array = np.array([]) # from an empty list```

```my_array = np.array([[1,2,3],[4,5,6]]) # from a nested list with elements```

```my_array = np.zeros((2,3), dtype=int) # creates a 2x3 array (six elements) and sets them to zero```

```my_array = np.empty((2,3), dtype=int) # reserves space but does not set the values, so they are arbitrary (marginally faster than zeros)```

```python
my_array = np.array([[3.,2,3],[4,5,6]])
print my_array # all values are cast to float, because 3. is a float
print
print "Shape: ", my_array.shape
print "Dtype: ", my_array.dtype
```

```
[[ 3.  2.  3.]
 [ 4.  5.  6.]]

Shape:  (2, 3)
Dtype:  float64
```

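To see the "same type" rule in action: when the input mixes types, numpy converts every element to one common type, and `astype` converts explicitly. A short sketch (the exact integer and string dtypes depend on the platform and Python version):

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float upcasts the whole array to float
text = np.array([1, "two"])     # mixed with a string, every element becomes a string

print(ints.dtype)    # an integer type, e.g. int64 (platform dependent)
print(mixed.dtype)   # float64
print(text.dtype)    # a string type, e.g. <U21

as_int = mixed.astype(int)      # explicit conversion truncates 2.5 to 2
print(as_int.tolist())          # [1, 2, 3]
```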

```python
my_array_copy = my_array # like lists, only reference assigned!
my_array_copy[0,0] = 1337
print my_array
```

```
[[ 1337.     2.     3.]
 [    4.     5.     6.]]
```

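To get an independent array instead of a second reference, numpy arrays provide a `.copy()` method. A minimal sketch with the same values as above:

```python
import numpy as np

my_array = np.array([[3., 2, 3], [4, 5, 6]])

alias = my_array                # same object, as in the cell above
independent = my_array.copy()   # new array with its own data

my_array[0, 0] = 1337
print(alias[0, 0])        # 1337.0
print(independent[0, 0])  # 3.0
print(alias is my_array)  # True
```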

### Slicing in multidimensional numpy arrays

```python
my_array[1,2] # get third element of second row
```

```
6.0
```

```python
my_array[:,1] # get all elements of second column
```

```
array([ 2.,  5.])
```

```python
my_array[1,:] # get all elements of second row
```

```
array([ 4.,  5.,  6.])
```

```python
my_array[1,::-1] # get all elements of second row in reverse order
```

```
array([ 6.,  5.,  4.])
```

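Slicing numpy arrays differs from slicing lists in one important way: a basic slice of an array is a *view* on the same data, not a copy. A minimal sketch (the array values are made up for illustration):

```python
import numpy as np

my_array = np.array([[1., 2, 3], [4, 5, 6]])

row = my_array[1, :]          # basic slicing returns a view, not a copy
row[0] = -1                   # ... so writing to it changes my_array too
print(my_array[1, 0])         # -1.0
print(np.shares_memory(row, my_array))       # True

block = my_array[0:2, 1:3]    # rows 0-1, columns 1-2
print(block.shape)            # (2, 2)

row_copy = my_array[1, :].copy()  # .copy() detaches a slice from the original
print(np.shares_memory(row_copy, my_array))  # False
```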
### Some operations

```python
a = np.array([[1,0,0],
              [0,1,0],
              [0,0,1]])

b = np.array([[1,2,3],
              [4,5,6],
              [7,8,9]])

print "elementwise operations\n", a*b
print
print "matrix product\n", np.dot(a,b)
```

```
elementwise operations
[[1 0 0]
 [0 5 0]
 [0 0 9]]

matrix product
[[1 2 3]
 [4 5 6]
 [7 8 9]]
```

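Because `a` is the identity matrix, the matrix product above simply reproduces `b`, which hides the difference between the two operations. A sketch with non-trivial matrices, plus a naive triple loop that only spells out the definition of the matrix product (far slower than `np.dot`). In Python 3.5 and later, `a @ b` gives the same result as `np.dot(a, b)` for 2-D arrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

def matmul(x, y):
    """Naive matrix product: result[i, j] = sum over k of x[i, k] * y[k, j]."""
    n, m = x.shape
    m2, p = y.shape
    assert m == m2, "inner dimensions must match"
    result = np.zeros((n, p), dtype=np.result_type(x, y))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                result[i, j] += x[i, k] * y[k, j]
    return result

print((a * b).tolist())        # elementwise: [[5, 12], [21, 32]]
print(matmul(a, b).tolist())   # matrix product: [[19, 22], [43, 50]]
print(np.array_equal(matmul(a, b), np.dot(a, b)))  # True
```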

#### One more dimension, please

In Python, images are usually represented as numpy arrays.

The first dimension is the y-axis (rows), the second is the x-axis (columns), and the third holds the color channels:

<img src="./pictures/chair-layers.png" />
[source] https://mmeysenburg.github.io/image-processing/02-opencv-images/

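Indexing follows the same order: `img[y, x, c]` addresses row `y`, column `x` and channel `c`. A small sketch on a synthetic array, so no image file is needed (the sizes and pixel values are arbitrary):

```python
import numpy as np

height, width = 4, 6
# A hypothetical tiny image: shape (rows, columns, channels), 8-bit values
img = np.zeros((height, width, 3), dtype=np.uint8)

img[1, 4, :] = [255, 0, 0]   # pixel at y=1 (row), x=4 (column)
img[:, 0, 2] = 255           # whole first column, third channel

print(img.shape)             # (4, 6, 3)
print(img[1, 4].tolist())    # [255, 0, 0]
print(img[:, :, 2].shape)    # (4, 6): a single channel is a 2-D array
```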
```python
import cv2 # python-wrapper for the OpenCV (Open Source Computer Vision) library
img = cv2.imread("./pictures/Lenna.png")
print img.shape
print img.dtype
```

```
(512, 512, 3)
uint8
```


```python
%matplotlib notebook
import matplotlib
import matplotlib.pyplot as plt # powerful plotting package

fig = plt.figure()
plt.imshow(img)
plt.axis('off')
plt.show()
```

What's wrong? OpenCV stores the color channels in B-G-R order, whereas matplotlib expects R-G-B!

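Besides swapping the red and blue channels by hand, reversing the channel axis with a slice converts B-G-R to R-G-B in one step (with OpenCV available, `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` does the same and returns a new array). A sketch on a hypothetical 1x2 image:

```python
import numpy as np

# Hypothetical 1x2 image in OpenCV's B-G-R order: a pure blue and a pure red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis turns B-G-R into R-G-B (the result is a view on bgr)
rgb = bgr[:, :, ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255]: blue is now in the last channel
print(rgb[0, 1].tolist())  # [255, 0, 0]: red is now in the first channel
```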
```python
# Swap the red and the blue channels manually
tmp = img[:,:,0].copy()
img[:,:,0] = img[:,:,2]
img[:,:,2] = tmp

fig = plt.figure()
plt.imshow(img)
plt.show()
```

```python
# get one specific pixel value - Remember, R-G-B
img[300,300,:]
```

```
array([213,  88,  87], dtype=uint8)
```

```python
# get a 2d slice of the image, with all color channels
fig = plt.figure()
sub_img = img[200:300, 200:400,:]
plt.imshow(sub_img)
plt.show()
```

```python
# get a 2d-slice of the image, with only the red channel
# (a 2-D array is drawn with matplotlib's default colormap, not in red)
fig = plt.figure()
sub_img = img[200:300, 200:400,0]
plt.imshow(sub_img)
plt.show()
```

```python
# set the blue-channel to zero
fig = plt.figure()
img[:,:,2] = 0
plt.imshow(img)
plt.show()
```

## Plotting

"...make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding."
-- F. J. Anscombe, 1973

<img src="./pictures/AllDinosGrey.png" />

[source] https://www.autodeskresearch.com/publications/samestats

### Matplotlib and seaborn

Matplotlib: covers most plotting needs, including publication-quality plots.

Seaborn: builds on matplotlib, gives plots nicer default styles and adds tools for statistical data visualization.

```python
import matplotlib.pyplot as plt
import numpy as np

random_data = np.array([np.random.randint(0,10,50), np.random.randint(0,10,50)])
random_data
```

```
array([[2, 9, 9, 5, 5, 2, 8, 0, 9, 2, 2, 6, 5, 1, 4, 9, 7, 0, 8, 9, 7, 4,
        1, 4, 0, 3, 2, 1, 9, 2, 2, 3, 4, 6, 3, 3, 1, 4, 9, 0, 0, 0, 2, 6,
        4, 3, 1, 1, 5, 8],
       [2, 6, 7, 8, 4, 2, 7, 5, 3, 9, 7, 3, 2, 6, 6, 2, 4, 4, 3, 5, 3, 6,
        9, 5, 8, 8, 8, 3, 6, 2, 0, 0, 4, 0, 1, 9, 9, 9, 1, 6, 4, 2, 6, 0,
        6, 0, 7, 9, 3, 1]])
```

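The values above change every time the cell runs. For plots that should look the same on every run, a seeded generator helps. A sketch using numpy's `default_rng`, which needs NumPy 1.17 or newer (so not the Python 2.7 setup used elsewhere in this notebook; there, calling `np.random.seed(42)` before `np.random.randint` serves the same purpose). The seed 42 is arbitrary.

```python
import numpy as np

# A seeded generator makes the "random" data reproducible between runs
rng = np.random.default_rng(42)
random_data = rng.integers(0, 10, size=(2, 50))   # 2 rows of 50 ints in [0, 10)

print(random_data.shape)                              # (2, 50)
print(random_data.min() >= 0 and random_data.max() < 10)  # True

# The same seed reproduces exactly the same values
same = np.random.default_rng(42).integers(0, 10, size=(2, 50))
print(np.array_equal(random_data, same))             # True
```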
```python
fig = plt.figure()
plt.plot(random_data[0], label='random0')
plt.plot(random_data[1], label='random1')
plt.legend(loc=1, frameon=True)
plt.show()
```

```python
import seaborn as sns
sns.set() # apply seaborn's default style to all following matplotlib plots

fig = plt.figure()
plt.plot(random_data[0], label='random0')
plt.plot(random_data[1], label='random1')
plt.legend(loc=1, frameon=True)
plt.show()
```

```python
fig = plt.figure()
plt.hist([random_data[0],random_data[1]], label=["random0", "random1"])
plt.legend(frameon=True)
plt.show()
```

```python
from mpl_toolkits.mplot3d import Axes3D # registers the '3d' projection (required by older matplotlib versions)

fig = plt.figure(figsize=(10,7))
ax = fig.add_subplot(111, projection='3d')
for idx,(c, z) in enumerate(zip(['g', 'b'], [10, 0])):
    xs = np.arange(len(random_data[0]))
    ys = random_data[idx]

    # You can provide either a single color or an array. To demonstrate this,
    # the first bar of each set will be colored cyan.
    cs = [c] * len(xs)
    cs[0] = 'c'
    ax.bar(xs, ys, zs=z, zdir='y', color=cs, alpha=0.8)

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

plt.show()
```

```python
import pandas as pd

# skip the 11 header lines of the PCD file; each remaining line holds one "x y z" point
source = "https://raw.githubusercontent.com/PointCloudLibrary/data/master/tutorials/ism_train_cat.pcd"
cat_df = pd.read_csv(source, skiprows=11, delimiter=" ", names=["x","y","z"], encoding='latin_1')
cat_df.head()

fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111, projection='3d', facecolor='white')
ax.scatter(cat_df.x, cat_df.y, cat_df.z)
ax.view_init(azim=180, elev=10)
plt.axis('equal')
plt.ylim((-100,100))
plt.xlim((-100,100))
plt.show()
```

```python
# Load iris data
fig = plt.figure()
iris = sns.load_dataset("iris")
sns.swarmplot(x="species", y="petal_length", data=iris)
plt.show()
```

```python
fig = plt.figure()
sns.boxplot(data=iris)
plt.show()
```

```python
sns.pairplot(data=iris)
plt.show()
```

```python
sns.pairplot(data=iris, hue='species')
plt.show()
```

### Best practices
Make sure that
* all plots in a report/paper/document have a consistent style
* all text is large enough to read
* colors are also suited for colorblind people (http://www.vischeck.com/vischeck/vischeckImage.php)
* colored plots can still be understood when printed in grayscale
* every plot has a title or a caption
* axes are labeled AND show their units

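A sketch of a plot that follows these guidelines: a title, axis labels with units, and lines told apart by marker and line style as well as by color, so the figure still works in grayscale. The data and labels are hypothetical, and `matplotlib.use("Agg")` is only there so the snippet runs as a standalone script without a display; leave it out inside a notebook.

```python
import io

import matplotlib
matplotlib.use("Agg")   # render without a display (standalone scripts only)
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical measurements: distance covered by two runners over time
time_s = np.arange(0, 60, 5)
distance_a_m = 1.5 * time_s
distance_b_m = 1.0 * time_s + 10

fig, ax = plt.subplots(figsize=(6, 4))
# Different markers and line styles keep the lines distinguishable in grayscale
ax.plot(time_s, distance_a_m, marker="o", linestyle="-", label="runner A")
ax.plot(time_s, distance_b_m, marker="s", linestyle="--", label="runner B")
ax.set_title("Distance over time")
ax.set_xlabel("Time [s]")       # axis labels include the unit
ax.set_ylabel("Distance [m]")
ax.legend()
fig.tight_layout()

# Save to an in-memory PNG; use a file name instead to write to disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
plt.close(fig)
```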
## pandas: loading and handling tabular datasets
Sources:
* https://pandas.pydata.org/pandas-docs/stable/10min.html#min

```python
import pandas as pd
import seaborn as sns
iris = sns.load_dataset('iris') # seaborn gives you sample datasets directly as pandas dataframe
print type(iris)
iris.head()
```

```
<class 'pandas.core.frame.DataFrame'>
```

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


```python
# ... but of course you can just parse a csv-file
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
print type(iris)
iris.head()
```

```
<class 'pandas.core.frame.DataFrame'>
```

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |

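A DataFrame can also be built directly from Python data, and `read_csv` accepts file-like objects as well as paths and URLs. A small sketch with a hypothetical three-row table that is written to CSV text and parsed back:

```python
import io

import pandas as pd

# A tiny hypothetical table, built from a dict of columns
df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "species": ["setosa", "versicolor", "virginica"],
})

print(df.shape)          # (3, 2): rows, columns
print(list(df.columns))  # ['sepal_length', 'species']

# Round trip through CSV text, read back from an in-memory file object
csv_text = df.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.values.tolist() == df.values.tolist())  # True
```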

```python
iris.describe()
```

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |


```python
iris.dtypes
```

```
sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
species          object
dtype: object
```

```python
iris[0:2]
```

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |


```python
iris.sepal_width.head()
```

```
0    3.5
1    3.0
2    3.2
3    3.1
4    3.6
Name: sepal_width, dtype: float64
```

```python
iris.sepal_width.describe()
```

```
count    150.000000
mean       3.057333
std        0.435866
min        2.000000
25%        2.800000
50%        3.000000
75%        3.300000
max        4.400000
Name: sepal_width, dtype: float64
```

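```python
iris[iris.sepal_width > 3.0]
```

(output truncated, only the first rows are shown)

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 10 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 11 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |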


```python
iris[iris.sepal_width > 3.0].count()
```

```
sepal_length    67
sepal_width     67
petal_length    67
petal_width     67
species         67
dtype: int64
```

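`count()` reports the number of non-missing values per column. To get a single number, summing the boolean mask is a common idiom, and several conditions can be combined with `&` (and), `|` (or) and `~` (not). A sketch on a few hypothetical rows in the iris format:

```python
import pandas as pd

# Hypothetical sample rows in the same format as the iris data
df = pd.DataFrame({
    "sepal_width": [3.5, 3.0, 3.2, 2.8, 3.4],
    "species": ["setosa", "setosa", "versicolor", "virginica", "virginica"],
})

# Each comparison yields a boolean Series (a "mask")
wide = df.sepal_width > 3.0
print(wide.sum())            # 3: True counts as 1, so this counts matching rows

# Parentheses are required because & binds more tightly than > and ==
wide_virginica = df[(df.sepal_width > 3.0) & (df.species == "virginica")]
print(len(wide_virginica))   # 1
```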
```python
features = iris.columns[:4]
iris[features].head()
```

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


```python
iris.groupby(iris.species).mean()
```

| species | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| setosa | 5.006 | 3.428 | 1.462 | 0.246 |
| versicolor | 5.936 | 2.77 | 4.26 | 1.326 |
| virginica | 6.588 | 2.974 | 5.552 | 2.026 |

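`mean()` computes one statistic per group; `agg` computes several in one call. A sketch on a few hypothetical rows (values made up for illustration):

```python
import pandas as pd

# Hypothetical rows, two per species
df = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# One row per species, one column per statistic
stats = df.groupby("species")["petal_length"].agg(["mean", "min", "max", "count"])
print(stats)
```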

```python
for idx, row in iris.iterrows():
    if idx > 10:
        break
    print idx,":",row.species
```

```
0 : setosa
1 : setosa
2 : setosa
3 : setosa
4 : setosa
5 : setosa
6 : setosa
7 : setosa
8 : setosa
9 : setosa
10 : setosa
```

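Looping over rows with `iterrows` is readable but slow on large tables, and because each row becomes a Series, it does not preserve the column dtypes. `itertuples` is usually faster, and operating on whole columns at once is typically fastest. A sketch comparing the three on a hypothetical table (the `ratio` column is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "sepal_width": [3.5, 3.2, 3.3],
    "species": ["setosa", "versicolor", "virginica"],
})

# Row by row with iterrows: each row is a Series
ratios_loop = [row.sepal_length / row.sepal_width for _, row in df.iterrows()]

# itertuples yields lightweight named tuples instead
ratios_tuples = [row.sepal_length / row.sepal_width for row in df.itertuples()]

# Vectorized: one operation on whole columns
df["ratio"] = df.sepal_length / df.sepal_width
```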