Data investigation methodology (HPC)
Type: Normative
Department: System Design
Curriculum
Semester | Credits | Assessment |
8 | 5 | Pass/fail test |
Lectures
Semester | Hours | Lecturer | Group(s) |
8 | 32 | Associate Professor Lyashkevych V. Y. |
Laboratory work
Semester | Hours | Group | Teacher(s) |
8 | 32 |
Course description
The course is designed to give participants the knowledge needed to master the basic concepts of data, the specifics of using data, the technologies for working with data, and the technology and methodology of data research, so that they can solve a range of problems in data science and artificial intelligence systems.
The course surveys the basic tools for working with data, together with the knowledge and tools required for typical data science tasks: using data and setting up the environments and technologies for working with it.
Recommended Literature
- Christopher M. Bishop. Pattern Recognition and Machine Learning. – Springer, 2006. – 738 p.
- Andreas C. Müller, Sarah Guido. Introduction to Machine Learning with Python: A Guide for Data Scientists. – O'Reilly Media, 2016. – 400 p.
- EMC Education Services. Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. – Wiley, 2015. – 432 p.
- Cole Nussbaumer Knaflic. Storytelling with Data: A Data Visualization Guide for Business Professionals. – Wiley, 2015. – 288 p.
- Peter Bruce, Andrew Bruce. Practical Statistics for Data Scientists: 50 Essential Concepts. – O'Reilly Media, 2017. – 298 p.
- Data Mining: The Complete Guide. – Columbia Engineering, 2023. URL: https://bootcamp.cvn.columbia.edu/blog/data-mining-guide/
- Paul Crickard. Data Engineering with Python – Birmingham: Packt Publishing, 2020. – 337 p. – ISBN 978-1-83921-418-9.
- Wang L., Fu X. Data Mining with Computational Intelligence. – Springer, 2005. – 280 p.
- Wes McKinney. Python for Data Analysis – Sebastopol: O’Reilly Media, 2018. – 522 p. – ISBN 978-1-491-95766-0.
- Joakim Sundnes. Introduction to Scientific Programming with Python – Lysaker: Simula SpringerBriefs, 2020, Volume 6. – ISBN: 978-3-030-50355-0. (eBook)
- Michael T. Goodrich, Roberto Tamassia, Michael H. Goldwasser. Data Structures & Algorithms in Python. Wiley: Courier Westford, 2013. – 748 p. (eBook)
- Massimo di Pierro. Annotated Algorithms in Python – Chicago: Experts4Solutions, 2017. – 227 p. – ISBN: 978-0-9911604-0-2.
- Allen B. Downey. Think Stats. Exploratory Data Analysis in Python – Needham: Green Tea Press, 2014. – 244 p.
- Jake VanderPlas. Python Data Science Handbook. – Sebastopol: O'Reilly Media, 2017. – 517 p. – ISBN: 978-1-491-91205-8.
- Jiawei Han, Micheline Kamber, Jian Pei. Data Mining: Concepts and Techniques. – Waltham: Elsevier, 2012. – 703 p.
- Peter Bruce, Andrew Bruce, Peter Gedeck. Practical Statistics for Data Scientists. – Sebastopol: O'Reilly, 2020. – 329 p. – ISBN: 978-1-492-07294-2.
- Brian Godsey. Think Like a Data Scientist. – Shelter Island: Manning Publications, 2017. – 299 p. – ISBN: 9781633430273.
- Meher Krishna Patel. Pandas Guide. – May, 2020. – 62 p.
- Aurélien Géron. Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow. – Sebastopol: O'Reilly, 2019. – 482 p. – ISBN: 978-1-492-03264-9.
- Lewandowska, A.; Joachimiak-Lechman, K.; Kurczewski, P. A Dataset Quality Assessment—An Insight and Discussion on Selected Elements of Environmental Footprints Methodology. Energies 2021, 14, 5004. https://doi.org/10.3390/en14165004
- Leo L. Pipino, Yang W. Lee, and Richard Y. Wang. Data Quality Assessment / Communications of the ACM, Volume 45, Issue 4, April 2002 pp. 211–218. – https://doi.org/10.1145/505248.506010
- J. Bicevskis, Z. Bicevska, A. Nikiforova and I. Oditis, “An Approach to Data Quality Evaluation,” 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), 2018, pp. 196-201, doi: 10.1109/SNAMS.2018.8554915.
- Mats Bergdahl, Manfred Ehling, Eva Elvers and others. Handbook on Data Quality Assessment Methods and Tools. – Wiesbaden, 2007. – 139 p.
- The Ultimate Guide to Basic Data Cleaning: Atlan, 2014. – 66 p.
- Dr. Ossama Embarak. Data Analysis and Visualization Using Python – Abu Dhabi: Apress Media LLC, 2018. – 374 p. – ISBN-13 (pbk): 978-1-4842-4108-0.
- Dimensionality reduction [Electronic resource]. – Available at: http://bioconductor.org/books/3.15/OSCA.basic/dimensionality-reduction.html
- Data exploration with alluvial plots [Electronic resource]. – Available at: https://www.datisticsblog.com/2018/10/intro_easyalluvial/#features
- Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow: O’Reilly, 2017. – 718 p.
- Rezaul Karim, Mahedi Kaysar. Large Scale Machine Learning with Spark: Packt Publishing, 2016. – 472 p.
- Andrew Ng. Machine Learning Yearning [Electronic resource]. – Available at: https://nessie.ilab.sztaki.hu/~kornai/2020/AdvancedMachineLearning/Ng_MachineLearningYearning.pdf
- Mohammed J. Zaki, Wagner Meira Jr. Data Mining and Analysis. Fundamental Concepts and Algorithms. Cambridge University Press, 2014.
- Charu C. Aggarwal. Recommender Systems: Springer, 2016. – 518 p.
- Kishan G. Mehrotra, Chilukuri K. Mohan, HuaMing Huang. Anomaly Detection Principles and Algorithms. – Springer, 2017. – 229 p. – DOI: https://doi.org/10.1007/978-3-319-67526-8
- N. Sebe, Ira Cohen, Ashutosh Garg, Thomas S. Huang. Machine Learning in Computer Vision. – Springer, 2005. – 249 p. – Available at: http://silverio.net.br/heitor/disciplinas/eeica/papers/Livros/[Sebe]%20-%20Machine%20Learning%20in%20Computer%20Vision.pdf
- Mark Richards. Software Architecture Patterns. – Sebastopol: O'Reilly Media, 2015. – 45 p. – ISBN: 978-1-491-92424-2.
- Tomcy John, Pankaj Misra. Data Lake for Enterprises. – Packt Publishing, 2017. – 855 p.
- Viktor Mayer-Schönberger, Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work and Think. – 2013. – 256 p.
- Alex Holmes. Hadoop in Practice. – Manning Publications, 2012. – 537 p. – Available at: https://ia600201.us.archive.org/7/items/HadoopInPractice/Hadoop%20in%20Practice.pdf
- Apache HBase Team. Apache HBase ™ Reference Guide [Electronic resource]. – Available at: https://hbase.apache.org/apache_hbase_reference_guide.pdf
- Google. Cloud Bigtable [Electronic resource]. – Available at: https://cloud.google.com/bigtable