Data investigation methodology (HPC)

Type: Normative

Department: system design

Curriculum

Semester: 8
Credits: 5
Reporting: Setoff

Lectures

Semester: 8
Amount of hours: 32
Lecturer: Associate Professor Lyashkevych V. Y.
Group(s):

Laboratory works

Semester: 8
Amount of hours: 32
Group:
Teacher(s):

Description of the academic discipline

The discipline is designed to give participants the knowledge needed to master the basic concepts of data, the specifics of using data, the technologies for working with data, and the technology and methodology of data research, and to solve a range of problems in data science and artificial intelligence systems.

The discipline gives an overview of the basic tools for working with data, and of the knowledge and tools needed for typical tasks in using and setting up environments and technologies for working with data in order to solve data science problems.

Recommended Literature

  1. Christopher M. Bishop (2018) Pattern Recognition and Machine Learning, 738p. 
  2. Sarah Guido (2016) Introduction to Machine Learning with Python: A Guide for Data Scientists, 400p. 
  3. EMC Education Services (2015) Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, 432p. 
  4. Cole Nussbaumer Knaflic (2015) Storytelling with Data: A Data Visualization Guide for Business Professionals, 288p. 
  5. Peter Bruce (2017) Statistics for Data Scientists: 50 Essential Concepts, 298p
  6. Data Mining: The Complete Guide. – Columbia Engineering, 2023. URL: https://bootcamp.cvn.columbia.edu/blog/data-mining-guide/ 
  7. Paul Crickard. Data Engineering with Python – Birmingham: Packt Publishing, 2020. – 337 p. – ISBN 978-1-83921-418-9. 
  8. Wang L., Fu X. Data Mining with Computational Intelligence. – Springer, 2005. – 280 p.
  9. Wes McKinney. Python for Data Analysis – Sebastopol: O’Reilly Media, 2018. – 522 p. – ISBN 978-1-491-95766-0.
  10. Joakim Sundnes. Introduction to Scientific Programming with Python – Lysaker: Simula SpringerBriefs, 2020, Volume 6. – ISBN: 978-3-030-50355-0. (eBook)
  11. Michael T. Goodrich, Roberto Tamassia, Michael H. Goldwasser. Data Structures & Algorithms in Python. Wiley: Courier Westford, 2013. – 748 p. (eBook)
  12. Massimo di Pierro. Annotated Algorithms in Python – Chicago: Experts4Solutions, 2017. – 227 p. – ISBN: 978-0-9911604-0-2.
  13. Allen B. Downey. Think Stats. Exploratory Data Analysis in Python – Needham: Green Tea Press, 2014. – 244 p.
  14. Jake VanderPlas. Python Data Science Handbook – Sebastopol: O’Reilly Media, 2017. – 517 p. – ISBN: 978-1-491-91205-8.
  15. Jiawei Han, Micheline Kamber, Jian Pei. Data Mining: Concepts and Techniques – Waltham: Elsevier, 2012. – 703 p.
  16. Peter Bruce, Andrew Bruce, Peter Gedeck. Practical Statistics for Data Scientists. – Sebastopol: O’Reilly, 2020. – 329 p. – ISBN: 978-1-492-07294-2.
  17. Brian Godsey. Think Like a Data Scientist. – Shelter Island: Manning Publications, 2017. – 299 p. – ISBN: 9781633430273.
  18. Meher Krishna Patel. Pandas Guide. – May, 2020. – 62 p.
  19. Aurelien Geron. Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow. – Sebastopol: O’Reilly, 2019. – 482 p. – ISBN: 978-1-492-03264-9.
  20.  Lewandowska, A.; Joachimiak-Lechman, K.; Kurczewski, P. A Dataset Quality Assessment—An Insight and Discussion on Selected Elements of Environmental Footprints Methodology. Energies 2021, 14, 5004. https://doi.org/10.3390/en14165004
  21. Leo L. Pipino, Yang W. Lee, and Richard Y. Wang. Data Quality Assessment / Communications of the ACM, Volume 45, Issue 4, April 2002, pp. 211–218. – https://doi.org/10.1145/505248.506010
  22. J. Bicevskis, Z. Bicevska, A. Nikiforova and I. Oditis, “An Approach to Data Quality Evaluation,” 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), 2018, pp. 196-201, doi: 10.1109/SNAMS.2018.8554915.
  23. Mats Bergdahl, Manfred Ehling, Eva Elvers and others. Handbook on Data Quality Assessment Methods and Tools. – Wiesbaden, 2007. – 139 p.
  24. The Ultimate Guide to Basic Data Cleaning: Atlan, 2014. – 66 p.
  25. Dr. Ossama Embarak. Data Analysis and Visualization Using Python – Abu Dhabi: Apress Media LLC, 2018. – 374 p. – ISBN-13 (pbk): 978-1-4842-4108-0. 
  26. Dimensionality reduction. – Available at: http://bioconductor.org/books/3.15/OSCA.basic/dimensionality-reduction.html
  27. Data exploration with alluvial plots. – Available at: https://www.datisticsblog.com/2018/10/intro_easyalluvial/#features
  28. Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow: O’Reilly, 2017. – 718 p.
  29. Rezaul Karim, Mahedi Kaysar. Large Scale Machine Learning with Spark: Packt Publishing, 2016. – 472 p. 
  30. Andrew Ng. Machine Learning Yearning. – [Electronic resource]. – Available at: https://nessie.ilab.sztaki.hu/~kornai/2020/AdvancedMachineLearning/Ng_MachineLearningYearning.pdf
  31. Bishop C. M. Pattern Recognition and Machine Learning. Springer, 2006. 
  32. Mohammed J. Zaki, Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, 2014.
  33. Charu C. Aggarwal. Recommender Systems: Springer, 2016. – 518 p.
  34. Kishan G. Mehrotra, Chilukuri K. Mohan, HuaMing Huang. Anomaly Detection Principles and Algorithms: Springer, 2017. – 229 p. – DOI: https://doi.org/10.1007/978-3-319-67526-8
  35. Machine Learning in Computer Vision / N. Sebe, Ira Cohen, Ashutosh Garg, Thomas S. Huang // Springer, 2005. – 249 p. – Available at:
    http://silverio.net.br/heitor/disciplinas/eeica/papers/Livros/[Sebe]%20-%20Machine%20Learning%20in%20Computer%20Vision.pdf  
  36. Mark Richards. Software Architecture Patterns. – Sebastopol: O’Reilly Media, 2015. – 45 p. – ISBN: 978-1-491-92424-2.
  37. Tomcy John, Pankaj Misra. Data Lake for Enterprises. – Packt Publishing, 2017. – 855p. 
  38. Viktor Mayer-Schonberger, Kenneth Cukier (2013) Big Data: A Revolution That Will Transform How We Live, Work and Think, 256 p. 
  39. Alex Holmes. Hadoop in Practice: Manning Publications, 2012. – 537 p. – Available at: https://ia600201.us.archive.org/7/items/HadoopInPractice/Hadoop%20in%20Practice.pdf
  40. Apache HBase Team. Apache HBase ™ Reference Guide. – [Electronic resource]. – Available at: https://hbase.apache.org/apache_hbase_reference_guide.pdf
  41. Google. Cloud Bigtable. – [Electronic resource]. – Available at: https://cloud.google.com/bigtable
