Watch Working with Big Data in Python

2018
1 Season

Working with Big Data in Python is a comprehensive course offered by Packt Publishing that delves into the intricacies of working with large sets of data using Python. The course provides an in-depth understanding of the various tools and libraries utilized to handle big data, making it an important asset for anyone who wants to master the art of big data processing.

The course is designed to cater to the needs of all levels of Python developers, right from beginners who are just starting with Python to those who already have some experience in working with Python programming. The instructor of the course is a data expert who takes you through the various concepts and tools needed to handle big data in Python, including data processing, cleaning, visualizing, and analyzing large datasets.

Working with Big Data in Python is a hands-on course that provides you with real-world examples from industry, making it easier for you to understand how big data can be used in practical applications. The course covers various big data frameworks such as Hadoop, Spark, and Dask, giving you a complete understanding of the big data ecosystem and how different tools fit together. The course also includes sections on machine learning, deep learning, and neural networks, providing you with an insight into how these techniques can be used to analyze and uncover hidden patterns within big data.

The course begins with an introduction to big data and its applications in various industries, including healthcare, finance, and retail. It then dives into Python programming basics, covering topics such as data types, functions, loops, and conditional statements. The course then covers Python libraries such as NumPy, pandas, matplotlib, and seaborn, providing an in-depth look at how they can be used to process, clean, and visualize large datasets.

The course then moves on to big data frameworks such as Hadoop and Spark, providing an overview of their architecture and how they can be used to process large datasets. The course covers Apache Hadoop components such as HDFS (Hadoop Distributed File System) and YARN (Yet Another Resource Negotiator). The course also covers Spark components such as Spark SQL, Spark Streaming, and Spark MLlib, providing an insight into how Spark can be used for big data processing and machine learning.

The course then covers Dask, a parallel computing library in Python designed to handle big datasets. The course provides an overview of Dask's architecture and how it can be used to process large datasets in parallel. The course also covers machine learning algorithms such as linear regression, decision trees, and random forests, providing an insight into how these techniques can be used to analyze and predict trends and patterns within big data.

The course then moves on to deep learning and neural networks, providing an overview of these techniques and how they can be used to analyze big data. The course covers popular deep learning frameworks such as TensorFlow and Keras, providing an insight into how these frameworks can be used to build deep learning models for big data.

Overall, Working with Big Data in Python covers big data frameworks, libraries, and techniques in Python for developers at all levels, and its real-world industry examples make it easier to apply what you learn in practice. It is an essential course for anyone interested in mastering big data processing with Python.

Working with Big Data in Python is a series that ran for 1 season (19 episodes), released on February 20, 2018, on Packt Publishing.


Seasons

21. Predicting Up Votes Using pyspark.ml

February 20, 2018

The popularity of Reddit comments depends on many factors. Can a predictive model help us understand what makes a post popular? This video explores that question.
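The video builds its model with pyspark.ml, and the specific model is not stated here. As a minimal plain-Python sketch of the underlying idea, assuming a single numeric feature (for example a hypothetical comment length) and the comment's score as the label, an ordinary least-squares line can be fitted in closed form. The commented lines show how a comparable fit might be requested from spark.ml's LinearRegression, assuming a DataFrame that already has an assembled "features" column.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations over sum of squared x-deviations gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted label for a single feature value."""
    return slope * x + intercept


# A comparable spark.ml fit (hypothetical column names "features" and "score"):
#   from pyspark.ml.regression import LinearRegression
#   model = LinearRegression(featuresCol="features", labelCol="score").fit(train_df)
#   predictions = model.transform(test_df)
```

A single feature keeps the arithmetic visible; a real upvote model would use several features and Spark's distributed solver.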

20. Preparing Data for Prediction Task Using spark.ml

February 20, 2018

Spark dataframe columns can store data of many different types, but spark.ml expects the input features to be combined into a single column of type Vector.
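In PySpark this step is usually done with VectorAssembler from pyspark.ml.feature, which combines several numeric or boolean columns into one vector column. The plain-Python sketch below only mimics what that transform produces for each row, using hypothetical column names; it illustrates the idea and is not Spark's implementation.

```python
def assemble(rows, input_cols, output_col="features"):
    """Append a list of floats built from input_cols to each row (a dict)."""
    assembled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows unchanged
        # float() turns ints into floats and booleans into 0.0 / 1.0.
        new_row[output_col] = [float(row[col]) for col in input_cols]
        assembled.append(new_row)
    return assembled


# The PySpark equivalent, assuming a DataFrame `df` with these (hypothetical) columns:
#   from pyspark.ml.feature import VectorAssembler
#   assembler = VectorAssembler(inputCols=["body_length", "has_link"], outputCol="features")
#   df_with_features = assembler.transform(df)
```

String columns would first need to be converted to numbers (for example with an indexer) before they can be assembled.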

19. Loading Data from MongoDB in Spark, Transform into Pandas DF

February 20, 2018

Data stored in MongoDB must be loaded into a valid Spark data structure before Spark can process it; the result can then be converted into a pandas DataFrame.

18. Making Reddit Data Available to PySpark

February 20, 2018

JSON data often exists in data dumps rather than being extracted from an API incrementally. Reddit is a popular site for posting and commenting.
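Assuming the dump stores one JSON object per line (the JSON Lines layout that Reddit comment dumps typically use, and the layout spark.read.json expects by default), a minimal plain-Python reader can stream it line by line and keep only a few fields. The field names and file name below are assumptions about the dump, not a documented schema.

```python
import json


def iter_comments(lines, fields=("id", "subreddit", "body", "score")):
    """Yield one trimmed dict per JSON line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            comment = json.loads(line)
        except json.JSONDecodeError:
            continue  # a truncated or corrupt record should not stop the whole load
        if not isinstance(comment, dict):
            continue  # valid JSON, but not a comment object
        yield {field: comment.get(field) for field in fields}


# With a real file, iterate lazily so the dump is never held in memory at once:
#   with open("RC_sample.json", encoding="utf-8") as f:   # hypothetical file name
#       for comment in iter_comments(f):
#           ...
# In PySpark the same file could be read with spark.read.json("RC_sample.json").
```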

17. Connecting to MongoDB with PySpark

February 20, 2018

If our data resides in MongoDB, we need to extract it into a Spark data structure to analyze it.

15. Data Structures in Spark

February 20, 2018

Spark data structures are key to building effective processing pipelines; understand the difference between RDDs and dataframes.

14. What Is Spark and When Do We Need It?

February 20, 2018

Modern datasets are challenging to process as our memory and processing needs are large and variable. Spark helps to scale analysis over a cluster of processors.
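The pattern Spark uses to scale out (split the data into partitions, compute a partial result for each partition, then combine the partial results) can be sketched on a single machine. The plain-Python example below is only an analogy: its partitions are processed one after another, whereas Spark would schedule them on executors across the cluster.

```python
from functools import reduce


def partition(data, n_parts):
    """Split a list into at most n_parts contiguous chunks."""
    if not data or n_parts < 1:
        raise ValueError("need non-empty data and at least one partition")
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(chunk):
    """Per-partition work: a (count, sum) pair is all a mean needs."""
    return len(chunk), sum(chunk)


def combine(a, b):
    """Merge two partial results; associative and commutative, as Spark's reduce requires."""
    return a[0] + b[0], a[1] + b[1]


def partitioned_mean(data, n_parts=4):
    """Mean computed from per-partition partial results."""
    count, total = reduce(combine, map(partial_stats, partition(data, n_parts)))
    return total / count


# Rough PySpark analogue, assuming a SparkContext `sc`:
#   count, total = sc.parallelize(data, 4).map(lambda x: (1, x)).reduce(combine)
```

Carrying (count, sum) rather than per-partition means is the design choice that makes the merge exact regardless of partition sizes.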

13. Querying Weather Data from MongoDB

February 20, 2018

Raw data doesn't provide insights on its own; develop aggregation pipeline operations to summarise and filter the data iteratively.
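A MongoDB aggregation pipeline is a list of stages, and in pyMongo each stage is a Python dict passed to collection.aggregate(). The sketch below assumes weather documents with hypothetical city and temp fields: it filters readings, groups them per city, and sorts the summary. Building the pipeline in a function makes it easy to refine one stage at a time.

```python
def city_temperature_summary(min_temp):
    """Aggregation pipeline: filter readings, group per city, sort by average."""
    return [
        # Stage 1: keep only readings at or above min_temp.
        {"$match": {"temp": {"$gte": min_temp}}},
        # Stage 2: one output document per city with average, maximum and count.
        {"$group": {
            "_id": "$city",
            "avg_temp": {"$avg": "$temp"},
            "max_temp": {"$max": "$temp"},
            "readings": {"$sum": 1},
        }},
        # Stage 3: warmest cities first.
        {"$sort": {"avg_temp": -1}},
    ]


# Against a running server (database and collection names are hypothetical):
#   from pymongo import MongoClient
#   collection = MongoClient()["weather"]["observations"]
#   for row in collection.aggregate(city_temperature_summary(min_temp=0)):
#       print(row["_id"], row["avg_temp"])
```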

11. Grabbing Weather Data via OpenWeather API

February 20, 2018

Web APIs are a common source of data. Learn how to leverage pyMongo and requests to extract useful information from API data.
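A minimal sketch of that flow, assuming OpenWeather's current-weather endpoint and its response fields name, main.temp, main.humidity and dt (a Unix timestamp): the pure functions build the request parameters and flatten a response into the document to store. The network and database calls are shown as comments because they need an API key and a running MongoDB; the database and collection names are hypothetical.

```python
from datetime import datetime, timezone

# Current-weather endpoint of the OpenWeather API; an API key is required.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def weather_params(city, api_key):
    """Query parameters for one city; units=metric returns temperatures in Celsius."""
    return {"q": city, "appid": api_key, "units": "metric"}


def to_document(payload):
    """Keep only the fields we want to store (assumed response layout)."""
    return {
        "city": payload["name"],
        "temp": payload["main"]["temp"],
        "humidity": payload["main"]["humidity"],
        "observed_at": datetime.fromtimestamp(payload["dt"], tz=timezone.utc),
    }


# With network access and a running MongoDB, fetching and storing might look like:
#   import requests
#   from pymongo import MongoClient
#   resp = requests.get(BASE_URL, params=weather_params("London", "YOUR_KEY"), timeout=10)
#   resp.raise_for_status()
#   MongoClient()["weather"]["observations"].insert_one(to_document(resp.json()))
```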

10. Using Operators, Updates, and Aggregations

February 20, 2018

Building on what you have learnt about finding and matching documents, learn how to use operators and the aggregate function to calculate aggregated statistics and to update documents.
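Update operators describe how to change a document rather than giving its new contents. The simplified plain-Python stand-in below applies $set and $inc to a top-level dict so their semantics are easy to test; it is not how the server implements updates, and it ignores nested fields and all other operators. The commented line shows the corresponding pyMongo call on hypothetical field names.

```python
def apply_update(doc, update):
    """Return a copy of doc with $set and $inc applied (top-level fields only)."""
    result = dict(doc)
    for operator_name, fields in update.items():
        if operator_name == "$set":
            result.update(fields)  # overwrite or add each field
        elif operator_name == "$inc":
            for field, delta in fields.items():
                # As in MongoDB, $inc on a missing field creates it with the delta.
                result[field] = result.get(field, 0) + delta
        else:
            raise ValueError(f"unsupported update operator: {operator_name}")
    return result


# The equivalent server-side update with pyMongo:
#   collection.update_one({"city": "London"}, {"$set": {"checked": True}, "$inc": {"visits": 1}})
```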

9. Return Codes and Exceptions

February 20, 2018

MongoDB is a highly scalable database capable of many simultaneous connections. Sometimes this causes errors in operations. Learn how to deal with these errors.
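One common pattern is to retry operations that fail for transient reasons, such as a dropped connection, and to let other errors surface immediately. The helper below is a generic sketch with exponential backoff, not a pyMongo feature. With pyMongo you might pass pymongo.errors.AutoReconnect as the retryable type, while an error such as pymongo.errors.DuplicateKeyError should not be retried, because repeating the same insert will fail again.

```python
import time


def with_retries(operation, retry_on, attempts=3, base_delay=0.5):
    """Call operation(); on a retryable exception, wait and try again.

    retry_on is an exception class or a tuple of classes. The delay doubles
    after each failure, and the last exception is re-raised once attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical pyMongo usage:
#   from pymongo.errors import AutoReconnect, DuplicateKeyError
#   try:
#       with_retries(lambda: collection.insert_one(doc), retry_on=AutoReconnect)
#   except DuplicateKeyError:
#       pass  # the document already exists; retrying would not help
```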

8. Inserting and Finding Documents

February 20, 2018

MongoDB provides a rich syntax to precisely control which data is returned from a query using the query and projection operators. Learn how to specify these arguments.
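To make the semantics of query and projection documents concrete, the toy plain-Python evaluator below supports equality, $gt, $gte, $lt, $lte and $in on top-level fields, plus inclusion projections. It is a simplified model for illustration only and assumes comparable value types; the real filtering happens on the server when you call collection.find(filter, projection).

```python
import operator

# Comparison operators supported by this toy evaluator.
_COMPARISONS = {"$gt": operator.gt, "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


def matches(doc, query):
    """True if doc satisfies every condition in query (top-level fields only)."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$in":
                    if value not in operand:
                        return False
                elif op in _COMPARISONS:
                    # A missing field never satisfies a range comparison.
                    if value is None or not _COMPARISONS[op](value, operand):
                        return False
                else:
                    raise ValueError(f"unsupported query operator: {op}")
        elif value != condition:
            return False
    return True


def project(doc, projection):
    """Apply an inclusion projection; _id is kept unless explicitly set to 0."""
    fields = [f for f, include in projection.items() if include and f != "_id"]
    result = {f: doc[f] for f in fields if f in doc}
    if projection.get("_id", 1) and "_id" in doc:
        result["_id"] = doc["_id"]
    return result


# Server-side equivalent with pyMongo:
#   collection.find({"temp": {"$gt": 20}}, {"_id": 0, "city": 1, "temp": 1})
```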

7. Using pyMongo Cursors

February 20, 2018

Often our queries return large numbers of documents through cursors. Let us learn how cursors work, so we can deal with these queries without a large memory footprint.
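Because a pyMongo cursor fetches documents from the server in batches as you iterate, code that consumes it one document at a time keeps memory use roughly constant. The sketch below computes a running mean over any iterable of documents; the field name is hypothetical, and batch_size is an optional Cursor method that sets how many documents each round trip returns.

```python
def streaming_mean(docs, field):
    """Mean of doc[field] over an iterable, without building a list first."""
    count = 0
    total = 0.0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # skip documents that lack the field
        count += 1
        total += value
    return total / count if count else None


# With pyMongo, pass the cursor directly instead of calling list() on it:
#   cursor = collection.find({}, {"temp": 1}).batch_size(500)
#   print(streaming_mean(cursor, "temp"))
```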

6. Setting Up pyMongo

February 20, 2018

pyMongo is the Python API for MongoDB. Let us learn how to get up and running with pyMongo.

5. Setting Up MongoDB and Running Our First MongoDB Query

February 20, 2018

Get up and running with fundamental MongoDB operations like creating a database and storing/retrieving documents through the Mongo shell.

4. MongoDB Indices and Datatypes

February 20, 2018

Like SQL databases, MongoDB can use indexes to speed up common queries.
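MongoDB's indexes are B-tree structures, and creating one in pyMongo is a single create_index call. As a rough plain-Python analogy only, keeping (key, position) pairs sorted lets a range query use binary search instead of scanning every document; the field name is hypothetical.

```python
import bisect


class SortedIndex:
    """Toy single-field index: sorted (key, position) pairs over a list of dicts.

    Every document must contain the indexed field.
    """

    def __init__(self, docs, field):
        self._entries = sorted((doc[field], pos) for pos, doc in enumerate(docs))
        self._keys = [key for key, _ in self._entries]

    def between(self, low, high):
        """Positions of documents with low <= key <= high, found by binary search."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return [pos for _, pos in self._entries[start:end]]


# Creating a real ascending index on the server with pyMongo:
#   import pymongo
#   collection.create_index([("temp", pymongo.ASCENDING)])
```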

3. From Tabular Data to JSON Documents

February 20, 2018

We often think of data in tabular form, but JSON (JavaScript Object Notation) is the data format of modern web applications and a natural format for big data applications.
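One typical conversion groups flat rows that share a key into a single nested document, which is how related records are often embedded in MongoDB. The sketch below uses hypothetical column names and groups order rows under each customer.

```python
import json
from collections import defaultdict


def rows_to_documents(header, rows, group_by="customer", nested_key="orders"):
    """Group flat rows into one nested, JSON-ready document per group_by value."""
    groups = defaultdict(list)  # dicts keep insertion order, so output order is stable
    for row in rows:
        record = dict(zip(header, row))
        key = record.pop(group_by)
        groups[key].append(record)
    return [{group_by: key, nested_key: records} for key, records in groups.items()]


if __name__ == "__main__":
    header = ("customer", "order_id", "item")
    rows = [("ann", 1, "pen"), ("ann", 2, "ink"), ("bob", 3, "cap")]
    print(json.dumps(rows_to_documents(header, rows), indent=2))
```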

2. What Is MongoDB and Why Should I Use It?

February 20, 2018

MongoDB is a non-relational database. Let us explore what a non-relational database is and why you might use one.

1. The Course Overview

February 20, 2018

This video provides an overview of the entire course.


Where to Watch Working with Big Data in Python

Working with Big Data in Python is available for streaming on the Packt Publishing website, both individual episodes and full seasons. You can also watch Working with Big Data in Python on demand at Amazon.