Working with Big Data in Python


  • 2018
  • 1 Season

Working with Big Data in Python is a comprehensive course from Packt Publishing on working with large datasets in Python. It gives an in-depth look at the tools and libraries used to handle big data, making it a useful resource for anyone who wants to master big data processing.

The course is designed for Python developers of all levels, from beginners who are just starting with Python to those who already have some programming experience. The instructor, a data expert, takes you through the concepts and tools needed to handle big data in Python, including processing, cleaning, visualizing, and analyzing large datasets.

Working with Big Data in Python is a hands-on course built around real-world examples from industry, which makes it easier to see how big data is used in practical applications. The course covers big data frameworks such as Hadoop, Spark, and Dask, giving you an overview of the big data ecosystem and how the different tools fit together. It also includes sections on machine learning, deep learning, and neural networks, showing how these techniques can be used to analyze big data and uncover hidden patterns in it.

The course begins with an introduction to big data and its applications in various industries, including healthcare, finance, and retail. It then dives into Python programming basics, covering topics such as data types, functions, loops, and conditional statements. The course then covers Python libraries such as NumPy, pandas, matplotlib, and seaborn, providing an in-depth look at how they can be used to process, clean, and visualize large datasets.

The course then moves on to big data frameworks such as Hadoop and Spark, providing an overview of their architecture and how they can be used to process large datasets. The course covers Apache Hadoop components such as HDFS (Hadoop Distributed File System) and YARN (Yet Another Resource Negotiator). The course also covers Spark components such as Spark SQL, Spark Streaming, and Spark MLlib, providing an insight into how Spark can be used for big data processing and machine learning.

The course then covers Dask, a parallel computing library in Python designed to handle big datasets. The course provides an overview of Dask's architecture and how it can be used to process large datasets in parallel. The course also covers machine learning algorithms such as linear regression, decision trees, and random forests, providing an insight into how these techniques can be used to analyze and predict trends and patterns within big data.

The course then moves on to deep learning and neural networks, providing an overview of these techniques and how they can be used to analyze big data. The course covers popular deep learning frameworks such as TensorFlow and Keras, providing an insight into how these frameworks can be used to build deep learning models for big data.

Overall, Working with Big Data in Python covers the frameworks, libraries, and techniques used to process big data with Python, and its real-world examples help learners apply what they have learned in practical settings.

Working with Big Data in Python is a series that is currently running and has 1 season (19 episodes). The series first aired on February 20, 2018.


Seasons
21. Predicting Up Votes Using pyspark.ml
February 20, 2018
The popularity of Reddit comments depends on many factors. Can a predictive model help us understand what makes a post popular? Let us explore this question in this video.
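The video builds its model with pyspark.ml (where a comparable estimator is `pyspark.ml.regression.LinearRegression`). As a library-free illustration of the underlying idea only, the sketch below fits an ordinary least-squares line that predicts a comment's up votes from one numeric feature. The feature (comment length) and the sample values are hypothetical, and this is not the pyspark.ml API.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new feature value."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training data: comment length (characters) vs. up votes.
lengths = [20, 40, 60, 80]
upvotes = [3, 5, 7, 9]
model = fit_line(lengths, upvotes)
print(model)
print(predict(model, 100))
```

A real up-vote model would use many features at once, which is why spark.ml expects them packed into a single vector column.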
20. Preparing Data for Prediction Task Using spark.ml
February 20, 2018
spark.ml expects the input features to be in a single DataFrame column of type Vector, whereas Spark DataFrame columns can store data of many different types.
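In pyspark.ml this reshaping is usually done with the `VectorAssembler` transformer, which combines several numeric columns into one vector column and keeps the original columns. The stdlib sketch below mimics that behaviour on rows stored as Python dicts so the transformation is visible without a Spark cluster. Plain lists stand in for Spark Vectors, and the column names are hypothetical.

```python
def assemble(rows, input_cols, output_col="features"):
    """Return copies of rows with input_cols combined into one list-valued column.

    Mimics the effect of pyspark.ml's VectorAssembler; lists stand in for Vectors.
    The input rows are not modified.
    """
    assembled = []
    for row in rows:
        new_row = dict(row)  # shallow copy keeps the original columns
        new_row[output_col] = [float(row[c]) for c in input_cols]
        assembled.append(new_row)
    return assembled

# Hypothetical Reddit-like rows with numeric feature columns and a label.
rows = [
    {"id": "a1", "length": 120, "hour": 14, "score": 10},
    {"id": "b2", "length": 45, "hour": 3, "score": 2},
]
for r in assemble(rows, ["length", "hour"]):
    print(r["id"], r["features"], r["score"])
```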
19. Loading Data from MongoDB in Spark, Transform into Pandas DF
February 20, 2018
Data stored in MongoDB must be loaded into a valid Spark data structure before Spark can process it, and the result can then be transformed into a pandas DataFrame.
18. Making Reddit Data Available to PySpark
February 20, 2018
JSON data often exists in data dumps rather than being extracted from an API incrementally. Reddit is a popular site for posting and commenting.
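Bulk JSON dumps, including the Reddit comment dumps commonly shared online, are typically newline-delimited JSON: one object per line. Spark's `spark.read.json(path)` reads this layout directly. The stdlib sketch below shows the same idea line by line, skipping malformed records instead of failing. The field names (`subreddit`, `score`, `body`) are assumptions about the dump's schema, and the in-memory sample stands in for a real file.

```python
import io
import json

def iter_comments(lines, fields=("subreddit", "score", "body")):
    """Yield one dict per non-empty JSON line, keeping only the requested fields.

    Malformed lines are skipped so one bad record does not abort the whole dump.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        yield {f: record.get(f) for f in fields}

# Tiny in-memory stand-in for a dump file (real dumps are usually compressed).
dump = io.StringIO(
    '{"subreddit": "python", "score": 12, "body": "Nice post", "id": "x1"}\n'
    "not valid json\n"
    '{"subreddit": "datascience", "score": 3, "body": "Thanks", "id": "x2"}\n'
)
for comment in iter_comments(dump):
    print(comment)
```

Because `iter_comments` is a generator, only one line is held in memory at a time.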
17. Connecting to MongoDB with PySpark
February 20, 2018
If our data resides in MongoDB, we need to extract it into a Spark data structure to analyze it.
15. Data Structures in Spark
February 20, 2018
Spark data structures are key to building effective processing pipelines; understand the difference between RDDs and dataframes.
14. What Is Spark and When Do We Need It?
February 20, 2018
Modern datasets are challenging to process as our memory and processing needs are large and variable. Spark helps to scale analysis over a cluster of processors.
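Spark scales analysis by splitting a dataset into partitions, running the same function on each partition on separate workers, and combining the partial results. The stdlib sketch below imitates that map-then-reduce pattern on a single machine with a thread pool, counting words per partition and merging the counts. It only illustrates the idea and is not how Spark is implemented or invoked. Because of CPython's global interpreter lock, threads do not speed up CPU-bound Python code; a process pool or a real cluster is needed for that.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def partition(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    k, m = divmod(len(items), n)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]

def count_words(lines):
    """Map step: word counts for a single partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def word_count(lines, n_partitions=4):
    """Run count_words on each partition concurrently, then merge the results."""
    parts = partition(lines, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(count_words, parts))
    # Reduce step: add the per-partition Counters together.
    return reduce(lambda a, b: a + b, partials, Counter())

lines = ["big data in python", "python and spark", "spark scales big data"]
print(word_count(lines, n_partitions=2).most_common(3))
```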
13. Querying Weather Data from MongoDB
February 20, 2018
Raw data doesn't provide insights on its own; develop aggregation pipelines to summarise and filter data in an iterative fashion.
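An aggregation pipeline is a list of stages, each transforming the documents produced by the previous one. The pipeline below uses MongoDB's stage syntax (`$match`, `$group`, `$sort`); with pyMongo it would be run on the server via `collection.aggregate(pipeline)`. The field names and temperature readings are hypothetical, and `run_pipeline` is a toy in-memory evaluator that understands only these three stages, so the example can be tried without a database.

```python
from collections import defaultdict

# Average temperature per city, ignoring sub-zero readings, warmest first.
pipeline = [
    {"$match": {"temp": {"$gte": 0}}},
    {"$group": {"_id": "$city", "avg_temp": {"$avg": "$temp"}}},
    {"$sort": {"avg_temp": -1}},
]

def run_pipeline(docs, stages):
    """Toy evaluator supporting only $match/$gte, $group/$avg and $sort."""
    for stage in stages:
        (op, spec), = stage.items()
        if op == "$match":
            (field, cond), = spec.items()
            docs = [d for d in docs
                    if d.get(field) is not None and d[field] >= cond["$gte"]]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            accumulators = {k: v for k, v in spec.items() if k != "_id"}
            (out_field, acc), = accumulators.items()
            value_field = acc["$avg"].lstrip("$")
            groups = defaultdict(list)
            for d in docs:
                groups[d[key_field]].append(d[value_field])
            docs = [{"_id": k, out_field: sum(v) / len(v)} for k, v in groups.items()]
        elif op == "$sort":
            (field, direction), = spec.items()
            docs = sorted(docs, key=lambda d: d[field], reverse=direction < 0)
        else:
            raise NotImplementedError(op)
    return docs

readings = [
    {"city": "Oslo", "temp": 2.0},
    {"city": "Oslo", "temp": 4.0},
    {"city": "Rome", "temp": 18.0},
    {"city": "Rome", "temp": -1.0},  # removed by $match
]
print(run_pipeline(readings, pipeline))
```

Adding, removing, or reordering stages and rerunning is the iterative workflow the video describes.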
11. Grabbing Weather Data via OpenWeather API
February 20, 2018
Web APIs are a common source of data. Learn how to use pyMongo and requests to extract useful information from API data.
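The sketch below builds a request URL for OpenWeather's current-weather endpoint and flattens a response into a document ready for `collection.insert_one(doc)`. It does not make the HTTP call; with requests that would be `requests.get(BASE_URL, params=params, timeout=10).json()`. The selected fields follow OpenWeather's current-weather response (`name`, `dt`, `main.temp`, `main.humidity`), but the sample payload is made up and the output field names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://api.openweathermap.org/data/2.5/weather"

def build_url(city, api_key, units="metric"):
    """Build a current-weather request URL; api_key is your own OpenWeather key."""
    return f"{BASE_URL}?{urlencode({'q': city, 'appid': api_key, 'units': units})}"

def to_document(payload):
    """Keep only the fields of interest from a response, as a MongoDB-ready dict."""
    main = payload.get("main", {})
    return {
        "city": payload.get("name"),
        "temp": main.get("temp"),
        "humidity": main.get("humidity"),
        # dt is a Unix timestamp; pyMongo stores datetime objects as BSON dates.
        "observed_at": datetime.fromtimestamp(payload["dt"], tz=timezone.utc),
    }

print(build_url("London", "YOUR_API_KEY"))
sample = {"name": "London", "dt": 1519084800, "main": {"temp": 4.2, "humidity": 81}}
print(to_document(sample))
```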
10. Using Operators, Updates, and Aggregations
February 20, 2018
Building on what you have learnt about finding and matching documents, learn how to use operators to update documents and the aggregate function to calculate aggregated statistics.
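Updates in MongoDB are expressed as update documents built from operators such as `$set` (assign a value) and `$inc` (add to a number, creating the field if it is missing). With pyMongo such a document is passed to `collection.update_one(filter, update)`. The toy `apply_update` below reproduces those two operators on top-level fields of an in-memory dict so their effect is easy to see; real MongoDB supports many more operators and dotted paths. The station document is hypothetical.

```python
import copy

def apply_update(doc, update):
    """Apply a MongoDB-style update supporting only $set and $inc on top-level fields.

    Returns a new dict; the input document is left unchanged.
    """
    new_doc = copy.deepcopy(doc)
    for op, fields in update.items():
        for field, value in fields.items():
            if op == "$set":
                new_doc[field] = value
            elif op == "$inc":
                new_doc[field] = new_doc.get(field, 0) + value
            else:
                raise NotImplementedError(f"operator {op} not supported in this sketch")
    return new_doc

station = {"_id": 1, "city": "Oslo", "readings": 4, "status": "new"}
update = {"$inc": {"readings": 1}, "$set": {"status": "active"}}
# With pyMongo the same update document would be sent as:
#   collection.update_one({"_id": 1}, update)
print(apply_update(station, update))
```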
9. Return Codes and Exceptions
February 20, 2018
MongoDB is a highly scalable database capable of many simultaneous connections. Sometimes this causes errors in operations. Learn how to deal with these errors.
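A common way to handle transient failures, such as a temporarily unreachable primary (pyMongo raises `pymongo.errors.AutoReconnect` in that case), is to retry with exponential backoff while letting other errors propagate. In the sketch below, `TransientError` is a local stand-in for such a driver exception and `flaky_insert` is a hypothetical operation. Retrying writes that are not idempotent can apply them twice, so this pattern suits reads and idempotent writes best.

```python
import time

class TransientError(Exception):
    """Stand-in for a retryable driver error such as pymongo.errors.AutoReconnect."""

def with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TransientError wait base_delay * 2**i, then retry.

    Other exceptions propagate immediately. The last TransientError is re-raised
    once all attempts are used up.
    """
    for i in range(attempts):
        try:
            return operation()
        except TransientError:
            if i == attempts - 1:
                raise
            sleep(base_delay * 2 ** i)

calls = {"n": 0}

def flaky_insert():
    """Hypothetical operation that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("primary unavailable")
    return "inserted"

# A no-op sleep keeps the demo instant.
print(with_retries(flaky_insert, sleep=lambda seconds: None), calls["n"])
```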
8. Inserting and Finding Documents
February 20, 2018
MongoDB provides a rich syntax to precisely control which data is returned from a query using the query and projection operators. Learn how to specify these arguments.
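In pyMongo the query and projection are the first two arguments of `find`, for example `collection.find({"temp": {"$gt": 20}}, {"_id": 0, "city": 1, "temp": 1})`. To show what those arguments mean, the toy evaluator below implements equality plus the `$gt`, `$lt`, and `$in` operators and inclusion projections on in-memory dicts. It is a simplified model of MongoDB's semantics, not a replacement for them, and the documents are hypothetical.

```python
OPERATORS = {
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$in": lambda value, arg: value in arg,
}

def matches(doc, query):
    """True if doc satisfies every top-level condition (equality or the operators above)."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            if not all(OPERATORS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            return False
    return True

def project(doc, projection):
    """Inclusion projection: keep fields set to 1; _id is kept unless set to 0."""
    out = {}
    if projection.get("_id", 1) and "_id" in doc:
        out["_id"] = doc["_id"]
    for field, flag in projection.items():
        if field != "_id" and flag and field in doc:
            out[field] = doc[field]
    return out

def find(docs, query, projection):
    return [project(d, projection) for d in docs if matches(d, query)]

docs = [
    {"_id": 1, "city": "Rome", "temp": 24, "country": "IT"},
    {"_id": 2, "city": "Oslo", "temp": 3, "country": "NO"},
    {"_id": 3, "city": "Milan", "temp": 21, "country": "IT"},
]
print(find(docs, {"temp": {"$gt": 20}, "country": {"$in": ["IT", "ES"]}},
           {"_id": 0, "city": 1, "temp": 1}))
```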
7. Using pyMongo Cursors
February 20, 2018
Often our queries return large numbers of documents through cursors. Let us learn how cursors work, so we can deal with these queries without a large memory footprint.
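A pyMongo cursor does not load a whole result set at once: iterating `for doc in collection.find(...)` pulls documents from the server in batches (the size can be tuned with `cursor.batch_size(n)`), whereas `list(collection.find(...))` materialises everything. The generator below models that behaviour. `fetch_batch(skip, limit)` is a hypothetical stand-in for a server round trip; real cursors continue from a server-side cursor id rather than using skip.

```python
from itertools import islice

def batched_cursor(fetch_batch, batch_size=100):
    """Yield documents one at a time, requesting batch_size documents per round trip.

    Only the current batch is held in memory; iteration stops on an empty batch.
    """
    skip = 0
    while True:
        batch = fetch_batch(skip, batch_size)
        if not batch:
            return
        yield from batch
        skip += len(batch)

requests_made = []

def fake_fetch(skip, limit):
    """Simulated server holding 250 documents; records each round trip."""
    requests_made.append((skip, limit))
    return [{"_id": i} for i in range(skip, min(skip + limit, 250))]

# Taking three documents costs a single round trip.
first_three = list(islice(batched_cursor(fake_fetch, batch_size=100), 3))
print(first_three, len(requests_made))
```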
6. Setting Up pyMongo
February 20, 2018
pyMongo is the Python API for MongoDB. Let us learn how to get up and running with pyMongo.
5. Setting Up MongoDB and Running Our First MongoDB Query
February 20, 2018
Get up and running with fundamental MongoDB operations like creating a database and storing/retrieving documents through the Mongo shell.
4. MongoDB Indices and Datatypes
February 20, 2018
MongoDB, like SQL databases, can use indexes to speed up common queries.
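An index trades extra storage and write work for faster lookups: instead of scanning every document, a query jumps straight to the matching ones. In pyMongo an index on a field is created with `collection.create_index("city")`. The stdlib sketch below contrasts a full scan with a dictionary-based index on hypothetical station documents. It is only an analogy: MongoDB's regular indexes are B-trees, which also support range queries and sorting, while a dictionary supports equality lookups only.

```python
from collections import defaultdict

def build_index(docs, field):
    """Map each value of field to the positions of the documents holding it."""
    index = defaultdict(list)
    for pos, doc in enumerate(docs):
        index[doc.get(field)].append(pos)
    return index

def find_scan(docs, field, value):
    """Without an index every document is examined: O(n) work per query."""
    return [d for d in docs if d.get(field) == value]

def find_indexed(docs, index, value):
    """With the index only the matching documents are touched."""
    return [docs[pos] for pos in index.get(value, [])]

stations = [{"_id": i, "city": c} for i, c in enumerate(["Oslo", "Rome", "Oslo", "Milan"])]
city_index = build_index(stations, "city")
print(find_indexed(stations, city_index, "Oslo"))
```

The index has to be updated on every insert, which is why indexing only frequently queried fields is the usual advice.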
3. From Tabular Data to JSON Documents
February 20, 2018
We often think of data in tabular form, but JSON (JavaScript Object Notation) is the data format of modern web applications and a natural fit for big data applications.
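The main structural difference is that a JSON document can nest related fields instead of spreading them across flat columns. The stdlib sketch below converts CSV rows into nested documents. Two conventions are assumptions of this example rather than any standard: a dotted column name such as `location.lat` becomes a nested field, and any value that parses as a number is stored as a float (so a code like `001` would become `1.0`). The sample row is illustrative.

```python
import csv
import io
import json

def rows_to_documents(csv_text):
    """Turn flat CSV rows into nested JSON-style documents.

    Dotted column names become nested fields; numeric-looking values become floats.
    """
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {}
        for column, raw in row.items():
            try:
                value = float(raw)
            except (TypeError, ValueError):  # text, or None for a short row
                value = raw
            target = doc
            *parents, leaf = column.split(".")
            for parent in parents:
                target = target.setdefault(parent, {})
            target[leaf] = value
        docs.append(doc)
    return docs

table = "station,location.lat,location.lon,temp\nOSL,59.91,10.75,3.5\n"
print(json.dumps(rows_to_documents(table), indent=2))
```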
2. What Is MongoDB and Why Should I Use It?
February 20, 2018
MongoDB is a non-relational database. Let us explore what a non-relational database is and why you might use one.
1. The Course Overview
February 20, 2018
This video provides an overview of the entire course.
Where to Watch Working with Big Data in Python
Working with Big Data in Python is available for streaming on the Packt Publishing website, both individual episodes and full seasons. You can also watch Working with Big Data in Python on demand at Amazon.
  • Premiere Date
    February 20, 2018