import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
species = iris.target  # species labels (0, 1, 2)

# Reduce the four iris features to three principal components
x_reduced = PCA(n_components=3).fit_transform(iris.data)

# SCATTERPLOT 3D
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.set_title('Iris Dataset by PCA', size=14)
ax.scatter(x_reduced[:, 0], x_reduced[:, 1], x_reduced[:, 2], c=species)
ax.set_xlabel('First Eigenvector')
ax.set_ylabel('Second Eigenvector')
ax.set_zlabel('Third Eigenvector')
ax.xaxis.set_ticklabels([])
ax.yaxis.set_ticklabels([])
ax.zaxis.set_ticklabels([])
plt.show()
SVM (Support Vector Machine): refers to a family of machine learning methods. The most basic task is to decide which of two classes a new observation belongs to. During the learning phase, these classifiers map the training data into a multidimensional space called the decision space and create a separating surface, called the decision boundary, that splits the decision space into two regions. The family divides into SVR (Support Vector Regression) and SVC (Support Vector Classification).
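A minimal sketch of the SVC side (my own illustration using sklearn's svm module on the iris data; the RBF kernel and the split sizes are arbitrary defaults, not from the notes):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Fit a support vector classifier; the decision boundary is learned here
clf = SVC(kernel='rbf')
clf.fit(x_train, y_train)
print(clf.score(x_test, y_test))  # mean accuracy on the held-out set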

Iteration: refers to the activity of repeating a feedback process, usually with the aim of approaching and eventually reaching a desired goal or result. Each repetition of the process is called an "iteration", and the result of each iteration is used as the starting value for the next.
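A tiny sketch of such a feedback loop (the Babylonian square-root method is my own illustrative choice, not from the notes):

# Approximate sqrt(2): each iteration's result seeds the next one
x = 1.0  # initial value
for i in range(5):
    x = (x + 2 / x) / 2  # feed the previous result back into the process
    print(i, x)  # converges toward 1.41421356...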

# Import EarlyStopping, plus the model pieces this snippet needs
from keras.callbacks import EarlyStopping
from keras.models import Sequential
from keras.layers import Dense

# predictors (feature matrix) and target (one-hot labels) are assumed preloaded

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape=input_shape))
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Define early_stopping_monitor: stop after 2 epochs without improvement
early_stopping_monitor = EarlyStopping(patience=2)

# Fit the model
model.fit(predictors, target, epochs=30, validation_split=0.3,
          callbacks=[early_stopping_monitor])
Since PCA uses the absolute variance of a feature to rotate the data, a feature with a broader range of values will overpower the others and bias the algorithm. To avoid this, we must first normalize our data. There are a few ways to do this, but a common one is standardization, which rescales every feature to mean = 0 and standard deviation = 1 (the result is a z-score).
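A minimal sketch of standardizing before PCA (using sklearn's StandardScaler; the iris data stands in for any feature matrix):

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

x = datasets.load_iris().data

# Rescale every feature to mean 0 and standard deviation 1 (z-scores)
x_std = StandardScaler().fit_transform(x)

# PCA on standardized data: no single wide-ranged feature dominates the rotation
x_reduced = PCA(n_components=2).fit_transform(x_std)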
random_state exists so that the program splits the data into the same training and test sets on every run; otherwise the same model will perform differently on different train/test splits. After splitting the data with sklearn and fixing the model and its initial parameters, you will find that every run produces a different accuracy, which makes tuning impossible. That happens because random_state was not set; once it is, the split is reproducible and you can tune parameters.
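A short sketch (the seed value 42 is arbitrary; any fixed integer gives reproducible splits):

from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# With random_state fixed, every run yields the same train/test split,
# so accuracy is comparable across runs and tuning becomes possible
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)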

Precision and recall:
precision = TP / (TP + FP)
recall = TP / (TP + FN)
F1: the harmonic mean of precision and recall:
F1 = 2 * precision * recall / (precision + recall)
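A quick check with sklearn's metric functions (the toy label vectors are made up for illustration):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 0.75
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean = 0.75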

MaxPooling. This passes a (2, 2) moving window over the image and downscales it by outputting the maximum value within the window.
Conv2D. This adds a third convolutional layer, since deeper models, i.e. models with more convolutional layers, are better able to learn features from images.
Dropout. This prevents the model from overfitting, i.e. perfectly remembering each image, by randomly setting 25% of the input units to 0 at each update during training.
Flatten. As its name suggests, this flattens the output from the convolutional part of the CNN into a one-dimensional feature vector which can be passed into the following fully connected layers.
Dense. A fully connected layer where every input is connected to every output.
Dropout. Another dropout layer to safeguard against overfitting, this time with a rate of 50% (a sketch of this whole stack follows below).

Flask is super easy and is used for a lot of API development in data engineering and for productionizing machine learning models.
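Looping back to the layer list above, a hedged sketch of how those layers might stack in Keras (the input shape, filter counts, dense width, and 10-class output are placeholder assumptions, not from the notes):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
# Placeholder input: 28x28 grayscale images; first two conv layers assumed
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))  # (2, 2) max window downscales the image
model.add(Conv2D(64, (3, 3), activation='relu'))  # third convolutional layer
model.add(Dropout(0.25))  # randomly zero 25% of inputs at each update
model.add(Flatten())      # one-dimensional feature vector for the dense layers
model.add(Dense(128, activation='relu'))  # fully connected layer
model.add(Dropout(0.5))   # second dropout, rate 50%
model.add(Dense(10, activation='softmax'))  # placeholder: 10 classes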
Start: pick something you are interested in and solve a problem (it can be soccer betting for all I know).
Step 4: automate data collection, transformation, upload, and analysis.