🔥 Budget-Friendly Big Data Analysis: Python & Google Colab On an Everyday Laptop (2024)

1. Introduction

As a software developer I have had to face different challenges throughout my career. The use of Python as a programming language is becoming more and more widespread, and it serves as a basis for web development, AI, crypto, etc.

Big Data, according to this Oracle article, encompasses the phenomenon of “larger and more complex data sets.” In software development with Big Data, the volume of data and the speed of processing are fundamental. I consider that a large part of the development life cycle depends on these two factors, and given their nature, these processes are usually expected to run on hardware that can quickly and efficiently handle volumes ranging from thousands to billions of records.

However, in the middle of 2024, I think that circumstances have changed in favor of developers, so in this article I will show you how you can do Big Data on an everyday laptop (under $300 USD) using open-source tools and optimization techniques.

2. Realistic minimum hardware requirements

Hewlett-Packard, also known as HP, is a multinational company considered one of the leaders in technology and the creation of computer equipment. They explain here what, according to them, are the minimum requirements that data science demands to do big data. The list of requirements they share is clear:

  • Min. 16GB of RAM memory
  • A GPU with a minimum of 4GB of memory (they emphasize the use of NVIDIA as an option for GPUs)
  • Intel® Core™ i7, i9, or Xeon® processor, with a minimum of 4 cores and a base speed of 2.0GHz
  • Windows 11 or Ubuntu operating system

However, not everyone has such equipment at their fingertips. In my case, I code on an ASUS Vivobook laptop that I bought for $295 USD, which has:

  • 8GB RAM
  • An Intel i5-1135G7 4-core processor
  • An integrated Intel Iris Xe Graphics GPU.

So, in this article, we will take these requirements as the minimum (and we will even see if we can reduce them even further) for the development of Big Data.

3. Work tools

These are the tools we are going to use:

  • Pandas and NumPy
  • Dask
  • Google Colab

3.1. Pandas and NumPy

Pandas and NumPy are two Python libraries popularly used in data science. They are used for data manipulation and scientific computing, respectively. We will use these because they can efficiently handle data structures and multidimensional arrays, which will help us deal with large amounts of data.

3.2. Dask

Dask is a library whose interface closely resembles that of Pandas and NumPy, with the difference that it focuses on large-scale parallel and distributed data processing. We will use it for its efficiency in processing large data sets.

3.3. Google Colaboratory (Colab)

For the last use case, we will use the Google Colaboratory service to run Python code from the web browser. We will use it because it offers free access to GPUs and TPUs for running the aforementioned libraries. It also has subscription plans for access to more powerful cloud machines. Alternatively, you can run the code locally if you have the necessary hardware and want to try it anyway.

Finally, this is what our work environment will look like:

[Image: work environment]

4. Exercise

We are going to do a simple ETL using Google Colab and the mentioned libraries.

4.1. Create a Google Colab Notebook

We will create a Google Colab notebook using the following link.

4.2. Import modules

4.3. Create dataset

We are going to create an example dataset for the exercise. For this, we will create a new code block in the notebook and execute the following script:
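
A minimal sketch of such a script, assuming a hypothetical schema (review_id, restaurant, rating, price_usd) and a row count chosen only for illustration. Only the file name restaurant_reviews.csv comes from this article:

```python
import numpy as np
import pandas as pd

N_ROWS = 100_000  # illustrative size; raise it to stress-test your machine
rng = np.random.default_rng(seed=42)  # fixed seed for reproducible data

# Hypothetical schema: the column names and values below are assumptions
df = pd.DataFrame({
    "review_id": np.arange(N_ROWS),
    "restaurant": rng.choice(
        ["Luigi's", "Sakura", "El Patio", "Green Bowl"], size=N_ROWS
    ),
    "rating": rng.integers(1, 6, size=N_ROWS),  # whole stars, 1 to 5
    "price_usd": rng.uniform(5, 80, size=N_ROWS).round(2),
})

# Write the frame to disk without the Pandas index column
df.to_csv("restaurant_reviews.csv", index=False)
print(f"Wrote {len(df):,} rows to restaurant_reviews.csv")
```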

This will create a new dummy dataset called restaurant_reviews.csv.

4.3.1. Check dataset size

In another block of code, we are going to execute the following script to validate the size of the created dataset.
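
One way to do this check is sketched below, assuming the file sits in the notebook's working directory. The helper name file_size_mb is hypothetical:

```python
import os

CSV_PATH = "restaurant_reviews.csv"  # file name used in this exercise


def file_size_mb(path):
    """Return the size of a file in megabytes (1 MB = 1024 * 1024 bytes)."""
    return os.path.getsize(path) / (1024 * 1024)


if os.path.exists(CSV_PATH):
    print(f"{CSV_PATH}: {file_size_mb(CSV_PATH):.2f} MB")
else:
    print(f"{CSV_PATH} not found; run the dataset step first")
```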

4.4. Extract using Pandas & Dask

In another block of code, we are going to load the data into a data frame in Python and compare the loading speeds of Pandas and Dask:

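We conclude that, for this case, Dask returns a data frame much faster than Pandas. Keep in mind that Dask's read_csv is lazy: it reads only a sample of the file to infer the schema and defers the full load until a computation is requested, which accounts for part of the difference.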

4.5. Transform & clean data using Pandas & NumPy

Now, we will perform some data cleaning and transformation using both Pandas and NumPy functions. We will start with the Pandas functions:
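
A minimal sketch of typical Pandas cleaning steps, assuming a small stand-in frame with hypothetical columns. The specific steps (dropping missing values, trimming whitespace, removing duplicates, fixing types) are illustrative choices:

```python
import pandas as pd

# Small stand-in frame; the columns and values are assumptions
raw = pd.DataFrame({
    "restaurant": [" Sakura", "El Patio ", "Sakura", None, " Sakura"],
    "rating": [5, 3, None, 4, 5],
    "price_usd": [12.5, 30.0, 18.0, 22.0, 12.5],
})

clean = (
    raw
    .dropna(subset=["restaurant", "rating"])  # drop rows missing key fields
    .assign(restaurant=lambda d: d["restaurant"].str.strip())  # trim whitespace
    .drop_duplicates()  # remove rows that are now exact duplicates
    .astype({"rating": "int64"})  # ratings are whole stars
    .reset_index(drop=True)
)
print(clean)
```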

Then, we will use NumPy functions to continue with the data frame transformation process:
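
A minimal sketch of NumPy-based transformations on a Pandas data frame, assuming hypothetical columns and thresholds. The functions np.where, np.log1p, and np.select are used here as examples of vectorized operations:

```python
import numpy as np
import pandas as pd

# Small stand-in frame; the columns and values are assumptions
df = pd.DataFrame({
    "rating": [5, 3, 1, 4],
    "price_usd": [12.5, 30.0, 8.0, 95.0],
})

# np.where: vectorized if/else, here flagging reviews with 4 or 5 stars
df["is_positive"] = np.where(df["rating"] >= 4, 1, 0)

# np.log1p: compress the skewed price range using log(1 + x)
df["log_price"] = np.log1p(df["price_usd"])

# np.select: bucket prices into labeled tiers (thresholds are illustrative)
conditions = [df["price_usd"] < 15, df["price_usd"] < 50]
df["price_tier"] = np.select(conditions, ["cheap", "mid"], default="expensive")

print(df)
```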

4.6. Load data into DB via API

Now, we are going to simulate an upload process using a fake normalization template. This will allow us to convert each data frame entry into a request to a dummy REST API that simulates uploading the data to a server.
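
A minimal sketch of this idea, assuming a hypothetical endpoint URL and payload shape. To keep the example free of network calls, the sender is a stand-in function. In a real run it could wrap an HTTP client call such as requests.post(url, json=payload):

```python
import pandas as pd

API_URL = "https://example.com/api/reviews"  # hypothetical endpoint


def to_payload(row):
    """Normalize one data frame row (as a dict) into the JSON body to send."""
    return {
        "restaurant": str(row["restaurant"]).strip(),
        "rating": int(row["rating"]),
    }


def upload(df, send):
    """Pass each row to send(url, payload) and count the 201 responses."""
    uploaded = 0
    for row in df.to_dict(orient="records"):
        status = send(API_URL, to_payload(row))
        if status == 201:
            uploaded += 1
    return uploaded


def fake_send(url, payload):
    """Stand-in for a real HTTP call; always reports that a record was created."""
    return 201


df = pd.DataFrame({"restaurant": ["Sakura", "El Patio "], "rating": [5, 3]})
print(upload(df, fake_send), "rows uploaded")
```

Passing the sender in as a parameter keeps the normalization logic testable and lets you swap in a real client later without changing the loop.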

5. Conclusions

In summary, this article serves as a comprehensive guide for developers who want to address Big Data challenges efficiently on affordable laptops.

By using Python, Google Colab, and core libraries like Pandas, NumPy, and Dask, users can successfully manage huge data sets with ease. Performance comparison, data cleansing processes, and simulated data loads underscore the practicality and affordability of these tools, allowing developers to do complicated tasks seamlessly.

Article information

Author: Delena Feil