import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
x = iris.data[:, 1]  # sepal width
y = iris.data[:, 2]  # petal length
species = iris.target  # species labels (0, 1, 2)

# Project the 4-dimensional iris data onto its first three principal components
x_reduced = PCA(n_components=3).fit_transform(iris.data)

# SCATTERPLOT 3D
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.set_title('Iris Dataset by PCA', size=14)
ax.scatter(x_reduced[:, 0], x_reduced[:, 1], x_reduced[:, 2], c=species)
ax.set_xlabel('First Eigenvector')
ax.set_ylabel('Second Eigenvector')
ax.set_zlabel('Third Eigenvector')
ax.xaxis.set_ticklabels([])
ax.yaxis.set_ticklabels([])
ax.zaxis.set_ticklabels([])
plt.show()
SVM (Support Vector Machine) refers to a family of machine learning methods. The most basic task is to decide which of two classes a new observation belongs to. During the learning phase, these classifiers map the training data into a multidimensional space called the decision space and create a separating surface, called the decision boundary, that divides the decision space into two regions. The family splits into SVR (Support Vector Regression) and SVC (Support Vector Classification).
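As a minimal sketch (assuming scikit-learn and the iris data used earlier in these notes), an SVC can be fit and scored like this:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
# Split into train/test sets, fit a linear SVC, and score it on held-out data
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
svc = SVC(kernel='linear')
svc.fit(x_train, y_train)
print(svc.score(x_test, y_test))  # accuracy on the test set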
Iteration refers to the activity of repeating a feedback process, usually in order to approach and eventually reach a desired goal or result. Each repetition of the process is called an "iteration", and the result of each iteration is used as the initial value of the next one.
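A small illustrative sketch of this (a hypothetical example, not from the original notes): Newton's method for a square root, where each iteration's output becomes the next iteration's input.

# Newton's method for sqrt(2): each iteration feeds its result into the next
x = 1.0  # initial value
for _ in range(6):
    x = (x + 2 / x) / 2  # this result is the initial value of the next iteration
print(x)  # converges toward 1.41421356...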
# Import EarlyStopping plus the model and layer classes
from keras.callbacks import EarlyStopping
from keras.models import Sequential
from keras.layers import Dense

# Save the number of columns in predictors: n_cols
# (assumes the `predictors` and `target` arrays were defined earlier)
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape=input_shape))
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Define early_stopping_monitor: stop training once validation loss
# fails to improve for 2 consecutive epochs
early_stopping_monitor = EarlyStopping(patience=2)

# Fit the model
model.fit(predictors, target, epochs=30, validation_split=0.3,
          callbacks=[early_stopping_monitor])
Since PCA uses the absolute variance of each feature to rotate the data, a feature with a broader range of values will overpower the others and bias the algorithm. To avoid this, we must first normalize our data. There are a few ways to do this, but a common one is standardization, which rescales every feature to mean = 0 and standard deviation = 1 (i.e., converts each value to a z-score).
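A minimal sketch of standardizing before PCA with scikit-learn (illustrative, reusing the iris data):

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

iris = datasets.load_iris()
# Rescale each feature to mean 0 and standard deviation 1 (z-scores)
x_scaled = StandardScaler().fit_transform(iris.data)
# Run PCA on the standardized data, so no feature dominates by scale alone
x_pca = PCA(n_components=3).fit_transform(x_scaled)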
random_state exists so that the program splits the same training and test sets on every run; otherwise, the same model will perform differently on different train/test splits. After splitting the data with sklearn and fixing the model and its initial parameters, you will find that each run of the program yields a different accuracy, making it impossible to tune parameters. That is because random_state was not set; once you set it, you can tune.
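For example (a minimal sketch with scikit-learn; the iris data stands in for any dataset):

from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# Fixing random_state makes the split identical on every run
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)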
Precision and recall:
precision = TP / (TP + FP)
recall = TP / (TP + FN)
F1: the harmonic mean of precision and recall:
F1 = 2 * precision * recall / (precision + recall)
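A quick sketch of computing these with scikit-learn (the labels below are made up for illustration):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1, 1]  # hypothetical ground truth
y_pred = [0, 1, 0, 0, 1, 1]  # hypothetical predictions
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # harmonic mean of the two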
MaxPooling. This passes a (2, 2) moving window over the image and downscales the image by outputting the maximum value within the window.
Conv2D. This adds a third convolutional layer since deeper models, i.e. models with more convolutional layers, are better able to learn features from images.
Dropout. This prevents the model from overfitting, i.e. perfectly remembering each image, by randomly setting 25% of the input units to 0 at each update during training.
Flatten. As its name suggests, this flattens the output from the convolutional part of the CNN into a one-dimensional feature vector which can be passed into the following fully connected layers.
Dense. Fully connected layer where every input is connected to every output.
Dropout. Another dropout layer to safeguard against overfitting, this time with a rate of 50%.
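A minimal sketch of a stack like the one described above, in Keras (the layer sizes, input shape, and class count are illustrative assumptions, not from the original notes):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))         # (2, 2) window, keeps the max
model.add(Conv2D(64, (3, 3), activation='relu'))  # the third convolutional layer
model.add(Dropout(0.25))                          # zero out 25% of units in training
model.add(Flatten())                              # to a 1-D feature vector
model.add(Dense(128, activation='relu'))          # fully connected layer
model.add(Dropout(0.5))                           # second dropout, rate 50%
model.add(Dense(10, activation='softmax'))        # e.g. 10 output classes (assumed)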
Flask is super easy and used for a lot of API development in data engineering and for productionizing machine learning models.
Start: pick something you are interested in and solve a problem (it can be soccer betting for all I know).
Step 4: automate data collection, transformation, upload, and analysis.