References
- Bahdanau et al., 2014
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- Bishop, 1995
Bishop, C. M. (1995). Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1), 108–116.
- Bishop, 2006
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
- Bollobas, 1999
Bollobás, B. (1999). Linear analysis. Cambridge University Press, Cambridge.
- Brown & Sandholm, 2017
Brown, N., & Sandholm, T. (2017). Libratus: The superhuman AI for no-limit poker. IJCAI (pp. 5226–5228).
- Campbell et al., 2002
Campbell, M., Hoane Jr, A. J., & Hsu, F.-h. (2002). Deep Blue. Artificial Intelligence, 134(1-2), 57–83.
- Csiszar, 2008
Csiszár, I. (2008). Axiomatic characterizations of information measures. Entropy, 10(3), 261–273.
- Edelman et al., 2007
Edelman, B., Ostrovsky, M., & Schwarz, M. (2007). Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American Economic Review, 97(1), 242–259.
- Ginibre, 1965
Ginibre, J. (1965). Statistical ensembles of complex, quaternion, and real matrices. Journal of Mathematical Physics, 6(3), 440–449.
- Goodfellow et al., 2016
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org.
- Goodfellow et al., 2014
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems (pp. 2672–2680).
- Hebb & Hebb, 1949
Hebb, D. O., & Hebb, D. (1949). The organization of behavior. Vol. 65. Wiley, New York.
- Hochreiter & Schmidhuber, 1997
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
- Hu et al., 2018
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7132–7141).
- Jia et al., 2018
Jia, X., Song, S., He, W., Wang, Y., Rong, H., Zhou, F., … others. (2018). Highly scalable deep learning training system with mixed-precision: Training ImageNet in four minutes. arXiv preprint arXiv:1807.11205.
- Karras et al., 2017
Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
- Koller & Friedman, 2009
Koller, D., & Friedman, N. (2009). Probabilistic graphical models: Principles and techniques. MIT Press.
- Kolter, 2008
Kolter, Z. (2008). Linear algebra review and reference. Available online: http.
- LeCun et al., 1998
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., & others. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
- Li, 2017
Li, M. (2017). Scaling Distributed Machine Learning with System and Algorithm Co-design (Doctoral dissertation). PhD Thesis, CMU.
- Lin et al., 2010
Lin, Y., Lv, F., Zhu, S., Yang, M., Cour, T., Yu, K., … others. (2010). ImageNet classification: Fast descriptor coding and large-scale SVM training. Large Scale Visual Recognition Challenge.
- McCulloch & Pitts, 1943
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133.
- Morey et al., 2016
Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E.-J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 23(1), 103–123.
- Neyman, 1937
Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, 236(767), 333–380.
- Pennington et al., 2017
Pennington, J., Schoenholz, S., & Ganguli, S. (2017). Resurrecting the sigmoid in deep learning through dynamical isometry: Theory and practice. Advances in Neural Information Processing Systems (pp. 4785–4795).
- Petersen et al., 2008
Petersen, K. B., Pedersen, M. S., & others. (2008). The matrix cookbook. Technical University of Denmark, 7(15), 510.
- Reed & DeFreitas, 2015
Reed, S., & De Freitas, N. (2015). Neural programmer-interpreters. arXiv preprint arXiv:1511.06279.
- Rumelhart et al., 1988
Rumelhart, D. E., Hinton, G. E., Williams, R. J., & others. (1988). Learning representations by back-propagating errors. Cognitive Modeling, 5(3), 1.
- Shannon, 1948
Shannon, C. E. (1948, July). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.
- Silver et al., 2016
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., … others. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484.
- Srivastava et al., 2014
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.
- Strang, 1993
Strang, G. (1993). Introduction to linear algebra. Vol. 3. Wellesley-Cambridge Press, Wellesley, MA.
- Sukhbaatar et al., 2015
Sukhbaatar, S., Weston, J., Fergus, R., & others. (2015). End-to-end memory networks. Advances in Neural Information Processing Systems (pp. 2440–2448).
- VanLoan & Golub, 1983
Van Loan, C. F., & Golub, G. H. (1983). Matrix computations. Johns Hopkins University Press.
- Wasserman, 2013
Wasserman, L. (2013). All of statistics: a concise course in statistical inference. Springer Science & Business Media.
- Watkins & Dayan, 1992
Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279–292.
- Xiong et al., 2018
Xiong, W., Wu, L., Alleva, F., Droppo, J., Huang, X., & Stolcke, A. (2018). The Microsoft 2017 conversational speech recognition system. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5934–5938).
- You et al., 2017
You, Y., Gitman, I., & Ginsburg, B. (2017). Large batch training of convolutional networks. arXiv preprint arXiv:1708.03888.
- Zhu et al., 2017
Zhu, J.-Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision (pp. 2223–2232).