Methods and technologies of data processing (HPC)

Type: Normative

Department: system design

Curriculum

Semester | Credits | Reporting
5 | 3.5 | Setoff

Lectures

Semester | Amount of hours | Lecturer | Group(s)
5 | 32 | Associate Professor Lyashkevych V. Y. |

Laboratory works

Semester | Amount of hours | Group | Teacher(s)
5 | 32 | |

Course description

The course gives participants the knowledge needed to master basic concepts of algorithms, data processing methods and tools, data pipeline construction, metrics and data evaluation tools, data architecture, and the interpretation of structured and unstructured data. To that end, it surveys basic data processing concepts and tools, along with the tools needed for typical tasks in building data pipelines and in analyzing and visualizing data.

Recommended Literature

  1. Paul Crickard. Data Engineering with Python – Birmingham: Packt Publishing, 2020. – 337 p. – ISBN 978-1-83921-418-9. 
  2. Wes McKinney. Python for Data Analysis – Sebastopol: O’Reilly Media, 2018. – 522 p. – ISBN 978-1-491-95766-0.
  3. Joakim Sundnes. Introduction to Scientific Programming with Python – Lysaker: Simula SpringerBriefs, 2020, Volume 6. – ISBN: 978-3-030-50355-0. (eBook)
  4. Michael T. Goodrich, Roberto Tamassia, Michael H. Goldwasser. Data Structures & Algorithms in Python. Wiley: Courier Westford, 2013. – 748 p. (eBook)
  5. Numpy community. Numpy User Guide. Release 1.18.4: May 24, 2020. – 166 p. 
  6. Dr. Ossama Embarak. Data Analysis and Visualization Using Python – Abu Dhabi: Apress Media LLC, 2018. – 374 p. – ISBN-13 (pbk): 978-1-4842-4108-0.
  7. Massimo di Pierro. Annotated Algorithms in Python – Chicago: Experts4Solutions, 2017. – 227 p. – ISBN: 978-0-9911604-0-2.
  8. Allen B. Downey. Think Stats. Exploratory Data Analysis in Python – Needham: Green Tea Press, 2014. – 244 p.
  9. Jake VanderPlas. Python Data Science Handbook – Sebastopol: O’Reilly Media, 2017. – 517 p. – ISBN: 978-1-491-91205-8.
  10.  The Ultimate Guide to Basic Data Cleaning: Atlan, 2014. – 66 p.
  11.  Jiawei Han, Micheline Kamber, Jian Pei. Data Mining: Concepts and Techniques – Waltham: Elsevier, 2012. – 703 p.
  12.  Peter Bruce, Andrew Bruce, Peter Gedeck. Practical Statistics for Data Scientists. – Sebastopol: O’Reilly, 2020. – 329 p. – ISBN: 978-1-492-07294-2.
  13.  Brian Godsey. Think Like a Data Scientist. – Shelter Island: Manning Publications, 2017. – 299 p. – ISBN: 9781633430273.
  14.  Meher Krishna Patel. Pandas Guide. – May, 2020. – 62 p.
  15.  Aurelien Geron. Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow. – Sebastopol: O’Reilly, 2019. – 482 p. – ISBN: 978-1-492-03264-9.
  16.  Lewandowska, A.; Joachimiak-Lechman, K.; Kurczewski, P. A Dataset Quality Assessment—An Insight and Discussion on Selected Elements of Environmental Footprints Methodology. Energies 2021, 14, 5004. https://doi.org/10.3390/en14165004
  17.  Leo L. Pipino, Yang W. Lee, and Richard Y. Wang. Data Quality Assessment / Communications of the ACM, Volume 45, Issue 4, April 2002 pp. 211–218. – https://doi.org/10.1145/505248.506010 
  18. J. Bicevskis, Z. Bicevska, A. Nikiforova and I. Oditis, “An Approach to Data Quality Evaluation,” 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), 2018, pp. 196-201, doi: 10.1109/SNAMS.2018.8554915.
  19.  Mats Bergdahl, Manfred Ehling, Eva Elvers and others. Handbook on Data Quality Assessment Methods and Tools. – Wiesbaden, 2007. – 139 p.
  20.  Mark Richards. Software Architecture Patterns. – Sebastopol: O’Reilly Media, 2015. – 45 p. – ISBN: 978-1-491-92424-2.
  21.  Dimensionality reduction [Access mode]: http://bioconductor.org/books/3.15/OSCA.basic/dimensionality-reduction.html
  22.  Data exploration with alluvial plots [Access mode]: https://www.datisticsblog.com/2018/10/intro_easyalluvial/#features
  23.  Khaled El Emam, Lucy Mosquera, Richard Hoptroff. Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data. – O’Reilly, 2020.
  24. Amazon. Lambda Architecture for Batch and Stream Processing. – AWS, 2018. – 12 p.
  25. Tomcy John, Pankaj Misra. Data Lake for Enterprises. – Packt Publishing, 2017. – 855 p.

Syllabus:

Download syllabus