diff --git a/assignment-1/submission/18340986009/README.md b/assignment-1/submission/18340986009/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..fa66f00c0a8bb1084d3920b47714534b3a660dfe
--- /dev/null
+++ b/assignment-1/submission/18340986009/README.md
@@ -0,0 +1,159 @@
# KNN Classification

This report includes two parts:
1. Find a KNN model that maximizes accuracy on a given dataset (each class follows a Gaussian distribution, with distribution parameters chosen at random).
2. Assess how the distribution parameters affect model accuracy, using the model built in part 1.


## 1. Model Generation

### 1.1 Overview of Mock Data

Generate three classes of points from two-dimensional Gaussian distributions:

$
 N_0 = 150 \hspace{1cm}
 C_0 \sim \mathcal{N}(\mu = \begin{bmatrix}50\\\\50\end{bmatrix},\Sigma = \begin{bmatrix}60 & -50\\\\-50 & 140\end{bmatrix})
$

$
 N_1 = 250 \hspace{1cm}
 C_1 \sim \mathcal{N}(\mu = \begin{bmatrix}60\\\\20\end{bmatrix},\Sigma = \begin{bmatrix}130 & 10\\\\10 & 100\end{bmatrix})
$

$
 N_2 = 100 \hspace{1cm}
 C_2 \sim \mathcal{N}(\mu = \begin{bmatrix}20\\\\60\end{bmatrix},\Sigma = \begin{bmatrix}120 & 20\\\\20 & 90\end{bmatrix})
$

Mock Data 1 Overview:

![Mock Data 1](img/Figure%201.png)

The 500 points are then split randomly into a training set (80%) and a testing set (20%).

### 1.2 Model Accuracy with Different K and Distance Metric

A rule of thumb is to let $K = \sqrt{N}$, where $N$ is the size of the training set; here $N = 0.8 \times (N_0 + N_1 + N_2) = 400$, so we first try some values of K around $\sqrt{400} = 20$ using both Euclidean and Manhattan distance.

| Accuracy (%) | K = 10 | K = 15 | K = 20 | K = 25 | K = 30 |
| ------------ |:------:|:------:|:------:|:------:|:------:|
| **Euclidean** |83.0|82.0|83.0|81.0|80.0|
| **Manhattan** |83.0|82.0|81.0|81.0|81.0|

The KNN model with $K = 10$ gives the best accuracy of 83% for both distance metrics, so we choose $K_{0} = 10$ as the starting point for model optimization. Below is a scatter plot showing the prediction result of the chosen model ($K = 10$, Euclidean distance). Each red dot represents a misclassification.

*Note that model accuracy shows little difference between the two distance metrics on this dataset.

![Prediction result with K = 10, Euclidean distance](img/Figure%202.png)

### 1.3 Model Optimization

General idea: $K_{i+1} = \lceil{K_{i} + Step_{i+1}}\rceil$

Detailed steps (a code sketch follows at the end of this section):

 - For each $K_{i+1}$, calculate its accuracy rate $R_{i+1}$.
 - If $R_{i+1} > R_{0}$, a better model has been found; end the optimization. Otherwise:
   - If $R_{i+1} > R_{i}$, let $Step_{i+1} = \frac{1}{C} Step_{i}$, where $C = (R_{i+1} - R_{i}) / R_{i}$.
     That is, if model accuracy improves, continue in this direction; the step size is inversely related to the relative improvement.
   - If $R_{i+1} \le R_{i}$, let $Step_{i+1} = - \frac{1}{2} Step_{i}$.
     That is, if the new K does not improve model accuracy, try a smaller step in the reverse direction.

The model from 1.2 gives K = 10 and Euclidean distance. Using this model as the starting point, define the first step as $Step_{1} = -\frac{1}{100}N_{total} = -5$, i.e., we first search toward smaller K.

Optimization process:

| \ | K = 10 | K = 5 | K = 8 |
| ------------ |:------:|:------:|:------:|
| **Accuracy rate (%)** |83.0|83.0|85.0|

After two optimization steps, a higher accuracy of 85% is reached when K is adjusted to 8. Thus, our final KNN model uses K = 8 and Euclidean distance.

Prediction result evaluation:

![Prediction result with K = 8, Euclidean distance](img/Figure%203.png)

Compared with the model before optimization, two points near the top are now classified correctly.
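To make the search loop concrete, here is a minimal sketch of the procedure above. It is illustrative only: `accuracy` is assumed to be any callable that trains and evaluates a KNN model for a given K and returns an accuracy rate in percent, and the iteration cap is an added safeguard, not part of the rule itself.

```python
import math

def optimize_k(accuracy, K0=10, step0=5, max_iter=20):
    # Adaptive search over K, following the update rules in section 1.3.
    R0 = accuracy(K0)                 # accuracy of the starting model
    K, R, step = K0, R0, -step0       # the first step goes toward smaller K
    for _ in range(max_iter):         # safeguard against non-termination
        K_new = math.ceil(K + step)
        R_new = accuracy(K_new)
        if R_new > R0:                # better than the starting model: stop
            return K_new, R_new
        if R_new > R:                 # improved: keep direction, rescale step
            step = step / ((R_new - R) / R)
        else:                         # no improvement: halve and reverse
            step = -step / 2
        K, R = K_new, R_new
    return K, R
```

With the `KNN` class from `source.py`, one could call this as `optimize_k(lambda K: model.calc_accuracy(K, "Euclidean", train_data, train_label))`; on the accuracies reported above it reproduces the trace K = 10 → 5 → 8 and stops at 85%.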
## 2. Distribution Parameters & Model Accuracy

Intuitively, we hypothesize that any change that mixes the classes together more heavily will make classification harder, thereby decreasing model accuracy. Below, we modify the parameters of the Gaussian distributions to test this hypothesis.

### 2.1 Change of Variance and Covariance

Keep the means the same, and modify the variance-covariance matrix of each class to increase the overlap between classes:

$
 N_0 = 150 \hspace{1cm}
 C_0 \sim \mathcal{N}(\mu = \begin{bmatrix}50\\\\50\end{bmatrix},\Sigma = \begin{bmatrix}300 & 0\\\\0 & 200\end{bmatrix})
$

$
 N_1 = 250 \hspace{1cm}
 C_1 \sim \mathcal{N}(\mu = \begin{bmatrix}60\\\\20\end{bmatrix},\Sigma = \begin{bmatrix}250 & 0\\\\0 & 150\end{bmatrix})
$

$
 N_2 = 100 \hspace{1cm}
 C_2 \sim \mathcal{N}(\mu = \begin{bmatrix}20\\\\60\end{bmatrix},\Sigma = \begin{bmatrix}150 & 0\\\\0 & 150\end{bmatrix})
$

Mock Data 2 Overview:

![Mock Data 2](img/Figure%204.png)

Prediction result evaluation:

![Prediction result on Mock Data 2](img/Figure%205.png)

The accuracy of our model drops from 85% to 79%, as expected.

### 2.2 Change of Mean

Keeping the other parameters the same, move the means of the classes closer together to increase the overlap:

$
 N_0 = 150 \hspace{1cm}
 C_0 \sim \mathcal{N}(\mu = \begin{bmatrix}50\\\\50\end{bmatrix},\Sigma = \begin{bmatrix}60 & -50\\\\-50 & 140\end{bmatrix})
$

$
 N_1 = 250 \hspace{1cm}
 C_1 \sim \mathcal{N}(\mu = \begin{bmatrix}50\\\\40\end{bmatrix},\Sigma = \begin{bmatrix}130 & 10\\\\10 & 100\end{bmatrix})
$

$
 N_2 = 100 \hspace{1cm}
 C_2 \sim \mathcal{N}(\mu = \begin{bmatrix}40\\\\60\end{bmatrix},\Sigma = \begin{bmatrix}120 & 20\\\\20 & 90\end{bmatrix})
$

Mock Data 3 Overview:

![Mock Data 3](img/Figure%206.png)

Prediction result evaluation:

![Prediction result on Mock Data 3](img/Figure%207.png)

The accuracy of our model drops from 85% to 73%, as expected.

### 2.3 N & Model Accuracy

In an attempt to increase model accuracy, we double each class size of Data 3 while keeping the class proportions. With $N_{total} = 1000$, we expect some increase in model accuracy.

Mock Data 4 Overview:

![Mock Data 4](img/Figure%208.png)

Prediction result evaluation:

![Prediction result on Mock Data 4](img/Figure%209.png)

Model accuracy decreases from 73% to 62.5% even though the data size doubled. This suggests that sample size contributes much less to model accuracy than the distribution parameters do. This makes sense: if the data labeled as different categories in fact come from heavily overlapping distributions, increasing N mainly provides more evidence of the similarity between those categories rather than sharpening the boundaries between them.

## Summary

The main takeaways from this exercise:

Model accuracy depends mainly on the distribution parameters and the choice of K. The distance metric has little influence on model accuracy, and whether an increase of N improves model accuracy depends on whether the true distributions of the categories are significantly different (a statistical test and its p-value might be used to evaluate this).
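As one possible way to make that last suggestion concrete, the sketch below compares two classes feature by feature with a two-sample Kolmogorov-Smirnov test. This assumes `scipy` is available; the function name, the per-feature approach, and the `alpha` threshold are illustrative choices, not part of this submission's code.

```python
from scipy import stats

def classes_differ(a, b, alpha=0.05):
    # a, b: (n, 2) arrays of points sampled from two classes.
    # Run a two-sample KS test per feature; if any feature's p-value falls
    # below alpha, the two class distributions likely differ on that feature.
    pvalues = [stats.ks_2samp(a[:, d], b[:, d]).pvalue for d in range(a.shape[1])]
    return min(pvalues) < alpha, pvalues
```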
diff --git a/assignment-1/submission/18340986009/img/Figure 1.png b/assignment-1/submission/18340986009/img/Figure 1.png
new file mode 100644
index 0000000000000000000000000000000000000000..32d5ded9c9d662bf7eacaede5e9316ba1d545335
Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 1.png differ
diff --git a/assignment-1/submission/18340986009/img/Figure 2.png b/assignment-1/submission/18340986009/img/Figure 2.png
new file mode 100644
index 0000000000000000000000000000000000000000..c7e7752721f808ea5ca19a56a7e642badb1617fd
Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 2.png differ
diff --git a/assignment-1/submission/18340986009/img/Figure 3.png b/assignment-1/submission/18340986009/img/Figure 3.png
new file mode 100644
index 0000000000000000000000000000000000000000..5a3fd62c0681f995d32c1ea794258095239261ee
Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 3.png differ
diff --git a/assignment-1/submission/18340986009/img/Figure 4.png b/assignment-1/submission/18340986009/img/Figure 4.png
new file mode 100644
index 0000000000000000000000000000000000000000..9c1e05f712b290be595b12c812476c72e0f0002d
Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 4.png differ
diff --git a/assignment-1/submission/18340986009/img/Figure 5.png b/assignment-1/submission/18340986009/img/Figure 5.png
new file mode 100644
index 0000000000000000000000000000000000000000..e49ec9595ac9c813a2e6044375c534bb669b3a7c
Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 5.png differ
diff --git a/assignment-1/submission/18340986009/img/Figure 6.png b/assignment-1/submission/18340986009/img/Figure 6.png
new file mode 100644
index 0000000000000000000000000000000000000000..11a84369882f65a2a3e46237e51fe479d4f14b88
Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 6.png differ
diff --git a/assignment-1/submission/18340986009/img/Figure 7.png b/assignment-1/submission/18340986009/img/Figure 7.png
new file mode 100644
index 0000000000000000000000000000000000000000..ee33c60766eb907d5b8992c24ca3806c297d9fc8
Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 7.png differ
diff --git a/assignment-1/submission/18340986009/img/Figure 8.png b/assignment-1/submission/18340986009/img/Figure 8.png
new file mode 100644
index 0000000000000000000000000000000000000000..a3f42ac859f2ef35448cb16f0412df387ba8e7a8
Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 8.png differ
diff --git a/assignment-1/submission/18340986009/img/Figure 9.png b/assignment-1/submission/18340986009/img/Figure 9.png
new file mode 100644
index 0000000000000000000000000000000000000000..0de5d1f658bdd5860681cfee20432e8074f39a1d
Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 9.png differ
diff --git a/assignment-1/submission/18340986009/source.py b/assignment-1/submission/18340986009/source.py
new file mode 100644
index 0000000000000000000000000000000000000000..410b588394c97d15227671e94f6e24e6cbb46882
--- /dev/null
+++ b/assignment-1/submission/18340986009/source.py
@@ -0,0 +1,249 @@
#!/usr/bin/env python
# coding: utf-8

import sys
import numpy as np
import matplotlib.pyplot as plt


# ## Define Global Functions

# Generate training and testing sets
def generate(Ns, Means, Covs, train_frac):

    # Draw 2-D Gaussian samples for each class
    data = list()
    label = list()

    for i in range(len(Ns)):
        Ci = np.random.multivariate_normal(Means[i], Covs[i], Ns[i])
        data.append(Ci)
        label.append([i] * Ns[i])

    data = np.concatenate(data)
    label = np.concatenate(label)

    # Shuffle the points
    idx = np.arange(sum(Ns))
    np.random.shuffle(idx)

    data = data[idx]
    label = label[idx]

    # Split into training and testing sets
    split_point = int(label.size * train_frac)
    train_data, test_data = data[:split_point], data[split_point:]
    train_label, test_label = label[:split_point], label[split_point:]

    # dtype=object lets arrays of different shapes live in one .npy file
    np.save("data.npy", np.array(((train_data, train_label),
                                  (test_data, test_label)), dtype=object))

    return train_data, train_label, test_data, test_label


# Read back the saved data
def read():
    (train_data, train_label), (test_data, test_label) = np.load(
        "data.npy", allow_pickle=True)
    return (train_data, train_label), (test_data, test_label)


# Create a scatter plot of the different categories
def display(data, colorby, name, title):
    colors = ['red', 'grey', 'blue']
    datas = [[], [], []]

    # Group the points by their class label
    for i in range(len(data)):
        datas[colorby[i]].append(data[i])

    for i in range(len(datas)):
        each = np.array(datas[i])
        if len(each) == 0:
            continue
        plt.scatter(each[:, 0], each[:, 1],
                    marker='o',
                    color=colors[i],
                    alpha=0.7)

    plt.xlabel("X1")
    plt.ylabel("X2")
    plt.title(title)
    plt.savefig(f'img/{name}')
    plt.show()
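# Example round-trip for the helpers above (assumed usage; it mirrors the
# __main__ block at the bottom of this file):
#   generate(Ns=[150, 250, 100],
#            Means=[[50, 50], [60, 20], [20, 60]],
#            Covs=[[[60, -50], [-50, 140]],
#                  [[130, 10], [10, 100]],
#                  [[120, 20], [20, 90]]],
#            train_frac=0.8)
#   (train_data, train_label), (test_data, test_label) = read()
#   display(train_data, train_label, 'train', 'Scatter Plot of Training Data')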
# ## Define Class KNN

class KNN:

    def __init__(self):
        self.K = None
        self.Dist = None
        self.data = None
        self.label = None

    # Calculate the distance between two given points
    def get_distance(self, x, y, dist_type="Euclidean"):
        if dist_type == "Euclidean":
            return np.sqrt(sum((x[i] - y[i]) ** 2 for i in range(len(x))))
        if dist_type == "Manhattan":
            return sum(np.abs(x[i] - y[i]) for i in range(len(x)))
        raise ValueError(f"Unknown distance type: {dist_type}")

    # Make a prediction for one point
    def predict_for_one(self, K, Dist, target, train_data, train_label,
                        exclude_self=False):
        # Calculate distances between the target point and all training points
        dists = []
        for i in range(len(train_data)):
            dist = self.get_distance(target, train_data[i], Dist)
            dists.append((train_data[i], train_label[i], dist))

        # Get the K nearest neighbors; when the target itself belongs to the
        # training set, skip the closest match (the target itself)
        dists.sort(key=lambda e: e[-1])
        neighbors = dists[1:K + 1] if exclude_self else dists[:K]

        # Predict the majority class among the neighbors
        neighbors_class = [e[-2] for e in neighbors]
        prediction = max(neighbors_class, key=neighbors_class.count)

        return prediction

    # Calculate model accuracy on the training set
    def calc_accuracy(self, K, Dist, train_data, train_label):
        predictions = []
        # Make predictions for the training data
        for i in range(len(train_label)):
            prediction = self.predict_for_one(
                K, Dist, train_data[i], train_data, train_label,
                exclude_self=True
            )
            predictions.append(prediction)

        correct = 0
        for i in range(len(predictions)):
            if train_label[i] == predictions[i]:
                correct += 1
        accuracy = correct / len(predictions) * 100

        return accuracy
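    # Note: calc_accuracy evaluates on the training data itself; together with
    # exclude_self=True in predict_for_one, this amounts to leave-one-out
    # cross-validation, so K and the distance metric are chosen without ever
    # touching the test set.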
    # Find the K & distance-metric combination that gives the highest accuracy
    def fit(self, K_list, Dist_list, train_data, train_label):

        # Loop through the given options for K and distance metrics
        accuracy_list = []
        for Dist in Dist_list:
            row = []
            for K in K_list:
                row.append(self.calc_accuracy(K, Dist, train_data, train_label))
            accuracy_list.append(row)

        # Locate every (distance, K) pair that reaches the global maximum
        ac_array = np.array(accuracy_list)
        params = np.where(ac_array == ac_array.max())

        # Randomly choose ONE of the maximizing pairs; sampling the row and
        # column indices independently could combine them into a pair that
        # is not a maximum
        pick = np.random.randint(len(params[0]))
        self.Dist = Dist_list[params[0][pick]]
        self.K = K_list[params[1][pick]]
        self.data = train_data
        self.label = train_label

        return ac_array

    def predict(self, test_data):
        # Predict every point in the test data with the fitted parameters
        predictions = []
        for i in range(len(test_data)):
            prediction = self.predict_for_one(
                self.K, self.Dist,
                test_data[i],
                self.data,
                self.label)
            predictions.append(prediction)

        return np.array(predictions)


# ## Start of Program

if __name__ == '__main__':

    if len(sys.argv) > 1 and sys.argv[1] == "g":
        # Class sizes follow the report: N0 = 150, N1 = 250, N2 = 100
        generate(
            Ns=[150, 250, 100],

            Means=[[50, 50],
                   [60, 20],
                   [20, 60]],

            Covs=[[[60, -50], [-50, 140]],
                  [[130, 10], [10, 100]],
                  [[120, 20], [20, 90]]],

            train_frac=0.8
        )

    elif len(sys.argv) > 1 and sys.argv[1] == "d":
        (train_data, train_label), (test_data, test_label) = read()

        display(train_data, train_label,
                'train', 'Scatter Plot of Training Data')
        display(test_data, test_label,
                'test', 'Scatter Plot of Testing Data')
    else:
        (train_data, train_label), (test_data, test_label) = read()

        model = KNN()

        # Same candidate grid as section 1.2 of the README
        model.fit(
            K_list=[10, 15, 20, 25, 30],
            Dist_list=["Euclidean", "Manhattan"],
            train_data=train_data,
            train_label=train_label)

        res = model.predict(test_data)

        print("acc =", np.mean(np.equal(res, test_label)))
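# Usage (assumed from the argument handling above):
#   python source.py g   # generate a fresh dataset and save it to data.npy
#   python source.py d   # plot the saved training and testing sets
#   python source.py     # fit the KNN model and print the test accuracy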