How to load a locally downloaded dataset with Keras

https://blog.csdn.net/houchaoqun_xmu/article/details/78492718

November 10, 2017, 09:57:06 · Houchaoqun_XMU

 

Preface:

 

The Keras source downloads MNIST with path = get_file(path, origin='https://s3.amazonaws.com/img-datasets/mnist.npz'), i.e. the data comes from the URL https://s3.amazonaws.com/img-datasets/mnist.npz. That URL is blocked in mainland China, so every MNIST-related example stalls at the download step. This post presents a workaround so that readers can run the example code and get a feel for it.

 

The main contributions of this post are:

 

1) Provides the mnist.npz dataset itself;

2) Analyzes the pieces of source code related to MNIST;

3) Shows how to get the MNIST examples under keras\examples running;

4) Points to several alternative solutions, with links.

 

numpy.load(path)

 

The numpy.load() function does the heavy lifting here. It reads .npy, .npz and similar files and returns the corresponding data:

1) For a .npy file it returns the single array stored in the file.

2) For a .npz file it returns a dict-like object holding {filename: array} pairs. In our case the pairs are read out like this:

f = np.load(path)
x_train, y_train = f['x_train'], f['y_train']
x_test, y_test = f['x_test'], f['y_test']
f.close()

For details, see: https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html
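
To make the dict-like behavior concrete, here is a tiny round-trip of my own (the file name toy.npz and the arrays are invented for the demo):

import numpy as np

# save two named arrays into a single .npz archive
np.savez('toy.npz', x_train=np.arange(6).reshape(2, 3), y_train=np.array([0, 1]))

f = np.load('toy.npz')       # an NpzFile: dict-like access by array name
print(f.files)               # ['x_train', 'y_train']
print(f['x_train'].shape)    # (2, 3)
f.close()                    # the archive holds a file handle until closed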

 

 

Original .\keras\examples\mnist_mlp.py

# -*- coding: utf-8 -*-
'''Trains a simple deep NN on the MNIST dataset.

Gets to 98.40% test accuracy after 20 epochs
(there is *a lot* of margin for parameter tuning).
2 seconds per epoch on a K520 GPU.
'''

from __future__ import print_function

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop


batch_size = 128
num_classes = 10
epochs = 20

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

model.summary()

###
# 1) categorical_crossentropy(output, target, from_logits=False):
#    Computes the categorical crossentropy between an output tensor and a
#    target tensor; the two must have the same shape. This is the multi-class
#    log loss that pairs with a softmax classifier.
#
# 2) RMSprop()
#    An improvement on AdaGrad. Neural-network objectives are non-convex, and
#    RMSProp tends to do better there: it replaces the accumulated squared
#    gradient with an exponentially decaying moving average, discarding the
#    distant past.
#    reference: http://blog.csdn.net/bvl10101111/article/details/72616378
#
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
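The "binary class matrices" the script converts labels into are just one-hot rows; a quick illustration of my own of what keras.utils.to_categorical() produces:

import numpy as np
import keras

y = np.array([0, 2, 9])                       # three labels out of the 10 classes
one_hot = keras.utils.to_categorical(y, 10)   # shape (3, 10), a single 1 per row
print(one_hot.shape)                          # (3, 10)
print(one_hot[0])                             # row for class 0: 1 in column 0, zeros elsewhere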

 

.\keras\keras\datasets\mnist.py - load_data()

# -*- coding: utf-8 -*-
from ..utils.data_utils import get_file
import numpy as np


def load_data(path='mnist.npz'):
    """Loads the MNIST dataset.

    # Arguments
        path: path where to cache the dataset locally
            (relative to ~/.keras/datasets).

    # Returns
        Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`.

    # numpy.load()
    # numpy.load(file, mmap_mode=None, allow_pickle=True, fix_imports=True, encoding='ASCII')
    # 1) Load arrays or pickled objects from .npy, .npz or pickled files
    # reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html
    """
    path = get_file(path, origin='https://s3.amazonaws.com/img-datasets/mnist.npz')
    f = np.load(path)
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
    f.close()
    return (x_train, y_train), (x_test, y_test)
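
One consequence of the docstring above: get_file() caches relative to ~/.keras/datasets, and since load_data() passes no file hash, a file already sitting there is used as-is. So instead of editing the example (Method 1 below), you can drop the downloaded archive into that cache directory and leave the Keras code untouched. A minimal sketch of my own, assuming the file has already been saved as ./mnist.npz (see the next section for the download):

import os
import shutil

# copy the pre-downloaded archive into Keras's dataset cache; get_file()
# then finds it already present and never touches the blocked URL
cache_dir = os.path.expanduser(os.path.join('~', '.keras', 'datasets'))
if not os.path.isdir(cache_dir):
    os.makedirs(cache_dir)
shutil.copy('./mnist.npz', os.path.join(cache_dir, 'mnist.npz'))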

 

Downloading the mnist.npz dataset

 

The mnist.npz used in this post was fetched through a server in Japan and is shared here free of charge. If the download gives you trouble, leave a comment.

 

Download link: https://pan.baidu.com/s/1jH6uFFC  Password: dw3d
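
Before training, it is worth a quick sanity check that the download is intact; this little check is my addition, not part of the original workflow:

import numpy as np

f = np.load('./mnist.npz')
print(f.files)                                 # expect the four keys x_train, y_train, x_test, y_test
print(f['x_train'].shape, f['x_train'].dtype)  # expect (60000, 28, 28) uint8
print(f['x_test'].shape)                       # expect (10000, 28, 28)
f.close()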

 

 

Modifying mnist_mlp.py

 

Method 1:

The original mnist_mlp.py fetches the dataset with:

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

This calls load_data(path='mnist.npz') in .\keras\keras\datasets\mnist.py, which is exactly where the blocked URL keeps the script from running. With mnist.npz downloaded ahead of time, we change a few lines so that the example runs normally. In other words, this post takes the "read a local dataset" route, in two steps:

1) Download the mnist.npz dataset and put it in the .\keras\examples directory.

2) The modified mnist_mlp.py:

# -*- coding: utf-8 -*-
'''Trains a simple deep NN on the MNIST dataset.

Gets to 98.40% test accuracy after 20 epochs
(there is *a lot* of margin for parameter tuning).
2 seconds per epoch on a K520 GPU.
'''

from __future__ import print_function

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

batch_size = 128
num_classes = 10
epochs = 20

# the data, shuffled and split between train and test sets
# (x_train, y_train), (x_test, y_test) = mnist.load_data()

import numpy as np
path = './mnist.npz'
f = np.load(path)
x_train, y_train = f['x_train'], f['y_train']
x_test, y_test = f['x_test'], f['y_test']
f.close()

x_train = x_train.reshape(60000, 784).astype('float32')
x_test = x_test.reshape(10000, 784).astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
# labels are the 10 classes 0-9; Keras expects them as binary class matrices

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# add by hcq-20171106
# Dense of keras is full-connection.
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

 

Running it produces output like the following:

60000 train samples
10000 test samples
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 512)               401920
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5130
=================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
2017-11-09 23:06:16.881800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)

... ...

60000/60000 [==============================] - 1s 23us/step - loss: 0.0387 - acc: 0.9888 - val_loss: 0.0706 - val_acc: 0.9814
Epoch 8/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0341 - acc: 0.9899 - val_loss: 0.0789 - val_acc: 0.9827
Epoch 9/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0304 - acc: 0.9911 - val_loss: 0.0851 - val_acc: 0.9833
Epoch 10/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0290 - acc: 0.9918 - val_loss: 0.0867 - val_acc: 0.9818
Epoch 11/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0264 - acc: 0.9924 - val_loss: 0.0881 - val_acc: 0.9833
Epoch 12/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0261 - acc: 0.9928 - val_loss: 0.1095 - val_acc: 0.9801
Epoch 13/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0246 - acc: 0.9931 - val_loss: 0.1012 - val_acc: 0.9830
Epoch 14/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0233 - acc: 0.9935 - val_loss: 0.1116 - val_acc: 0.9812
Epoch 15/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0223 - acc: 0.9942 - val_loss: 0.1016 - val_acc: 0.9832
Epoch 16/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0214 - acc: 0.9943 - val_loss: 0.1053 - val_acc: 0.9832
Epoch 17/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0178 - acc: 0.9950 - val_loss: 0.1095 - val_acc: 0.9838
Epoch 18/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0212 - acc: 0.9949 - val_loss: 0.1158 - val_acc: 0.9822
Epoch 19/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0197 - acc: 0.9951 - val_loss: 0.1112 - val_acc: 0.9831
Epoch 20/20
60000/60000 [==============================] - 1s 23us/step - loss: 0.0203 - acc: 0.9951 - val_loss: 0.1097 - val_acc: 0.9833
Test loss: 0.109655842465
Test accuracy: 0.9833

Method 2: following another blog post, pass a local path directly to load_data():

(x_train, y_train), (x_test, y_test) = mnist.load_data(path='/home/duchao/下载/mnist.npz')
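
Why Method 2 works with an absolute path deserves a sentence (my reading of the Keras 2.x get_file() source, not something the original post states): get_file() builds its cache path with os.path.join('~/.keras/datasets', path), and os.path.join discards everything before an absolute component, so an absolute path points get_file() straight at your local file and the download is skipped. The directory names below are made up for illustration:

import os.path

# os.path.join throws away earlier components once it hits an absolute path
print(os.path.join('/home/user/.keras/datasets', 'mnist.npz'))
# -> /home/user/.keras/datasets/mnist.npz   (relative name: cached under ~/.keras)
print(os.path.join('/home/user/.keras/datasets', '/home/user/mnist.npz'))
# -> /home/user/mnist.npz                   (absolute name: used directly)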

Reference:

 

Keras documentation in Chinese: http://keras-cn.readthedocs.io/en/latest/

Notes on some TF/Keras functions and issues met while reading the source: http://blog.csdn.net/jsliuqun/article/details/64444302

Reading the MNIST dataset in Python: https://blog.mythsman.com/2016/01/25/1/

 

