Install Python libraries on EMR clusters (2024)

Short description

You can install Python libraries using a bootstrap action.

Amazon EMR uses Puppet, the deployment mechanism used by Apache Bigtop, to configure and initialize applications on instances. Instance-controller is the Amazon EMR software component that runs on every instance of the cluster. Instance-controller initializes and then provisions instances based on the instance configuration.

The instance-controller runs the provision-node script at /usr/share/aws/emr/node-provisioner/bin/provision-node to start NodeProvisioner at cluster startup. NodeProvisioner provisions all of the EMR distribution's applications for the node and cluster configuration. NodeProvisioner is treated as a final bootstrap action that runs after all other bootstrap actions are run on each node of the cluster.

Resolution

In the latest EMR clusters, bootstrap actions run before Amazon EMR installs any applications specified at cluster creation. The bootstrap action runs before cluster nodes begin processing data. If you add nodes to a running cluster, then bootstrap actions also run on those nodes in the same way. You can create custom bootstrap actions and specify applications to install when you create your cluster. For more information, see Create bootstrap actions to install additional software.
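As an illustration of such a custom bootstrap action, the following is a minimal sketch rather than an official AWS script. It assumes that the package specifiers are passed as bootstrap action arguments and that python3 with pip is available on the node. The function names run_pip and install_python_libs are hypothetical.

```shell
#!/bin/bash
# Sketch of a bootstrap action that installs Python libraries with pip.
# Assumption: package specifiers (for example numpy==1.20.1) arrive as
# bootstrap action arguments. run_pip and install_python_libs are
# illustrative names, not part of Amazon EMR.

# Wrapper around pip so that the invocation lives in one place.
run_pip() {
  sudo python3 -m pip "$@"
}

# Install or upgrade every package specifier passed as an argument.
install_python_libs() {
  if [ "$#" -eq 0 ]; then
    echo "No packages specified; nothing to install" >&2
    return 1
  fi
  run_pip install --upgrade "$@"
}

# Run only when packages were passed as arguments.
if [ "$#" -gt 0 ]; then
  install_python_libs "$@"
fi
```

If the script is uploaded to Amazon S3, it can be attached at cluster creation with the --bootstrap-actions option of aws emr create-cluster, for example Path=s3://amzn-s3-demo-bucket/install-libs.sh,Args=[numpy==1.20.1]. The bucket and file name here are placeholders.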

Troubleshoot libraries installed by bootstrap actions that are overridden by default libraries

Libraries installed using bootstrap actions might be overridden by Amazon EMR default libraries. Bootstrap actions run before NodeProvisioner installs the cluster's applications on the node. So, when provisioning installs its own default version of a library, that version can replace the one that your bootstrap action installed.

To avoid this issue, create a delayed bootstrap action or a second-stage bootstrap action that keeps running in the background until provisioning is complete. Or, install packages only after the node reports NODEPROVISIONSTATE SUCCESSFUL.

The following bootstrap script upgrades the library after the application provisioning stage. You can add this script as a bootstrap script that runs in the background and exits successfully so that cluster provisioning continues. This script continues to monitor node provisioning and upgrades the library after provisioning.

The following example script upgrades the NumPy version:

#!/bin/bash
while true; do
  NODEPROVISIONSTATE=` sed -n '/localInstance [{]/,/[}]/{
    /nodeProvisionCheckinRecord [{]/,/[}]/ {
      /status: / { p }
      /[}]/a
    }
    /[}]/a
  }' /emr/instance-controller/lib/info/job-flow-state.txt | awk ' { print $2 }'`
  if [ "$NODEPROVISIONSTATE" == "SUCCESSFUL" ]; then
    sleep 10;
    echo "Running my post provision bootstrap"
    # your code here
    # Example lines:
    # sudo python3 -m pip uninstall numpy==1.16.5   (this is the default version of numpy)
    # sudo python3 -m pip install --upgrade numpy==1.20.1   (new version of numpy)
    exit;
  fi
  sleep 10;
done
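One way to run a monitoring script like this in the background is a small launcher that starts it detached and then returns immediately, so that the bootstrap action finishes and provisioning can continue. The following is a minimal sketch, assuming that the monitoring script has already been copied to the node. The function name launch_in_background and the default log path are hypothetical.

```shell
#!/bin/bash
# Sketch of a launcher bootstrap action (hypothetical names and paths).
# It starts a second-stage script detached from the bootstrap action, so the
# bootstrap action returns at once and node provisioning can continue.

# launch_in_background SCRIPT_PATH LOG_PATH
launch_in_background() {
  local script_path="$1"
  local log_path="$2"
  # nohup keeps the script alive after the launcher exits. Redirecting all
  # three standard streams stops the launcher from waiting on them.
  nohup bash "$script_path" > "$log_path" 2>&1 < /dev/null &
}

# Run only when a script path is given as the first argument.
if [ "$#" -ge 1 ]; then
  launch_in_background "$1" "${2:-/tmp/post-provision.log}"
  exit 0
fi
```

The launcher exits with status 0 right after it starts the second-stage script. The output of that script goes to the log file, which can help when you check whether the upgrade ran.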

Note: In some cases, YARN containers that run Python code might not use a package that was installed or upgraded with the preceding resolution. If the container isn't running the updated package, you see module not found errors when the code tries to import it. This is because the YARN NodeManager process is responsible for launching containers. The NodeManager's containers might already be running or allocated before NODEPROVISIONSTATE is SUCCESSFUL. This issue is often seen in multi-tenant clusters that have frequent auto scaling.

You can avoid module not found errors by polling the state of the nodemanager service. Then, run the desired bootstrap action as soon as the nodemanager starts.
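The polling approach can be sketched as follows. This is an illustration under stated assumptions, not an official AWS script. It assumes that the node manages services with systemd and that the NodeManager service is named hadoop-yarn-nodemanager. The helper names wait_until and nodemanager_is_active, and the attempt count and delay, are hypothetical.

```shell
#!/bin/bash
# Sketch: wait for the YARN NodeManager service, then install packages.
# Assumptions: services are managed by systemd and the NodeManager service
# is named hadoop-yarn-nodemanager. Helper names are illustrative.

# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
wait_until() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Succeeds when systemd reports the NodeManager service as active.
nodemanager_is_active() {
  systemctl is-active --quiet hadoop-yarn-nodemanager
}

# Run only when package specifiers are passed as arguments.
if [ "$#" -gt 0 ]; then
  if wait_until 180 5 nodemanager_is_active; then
    sudo python3 -m pip install --upgrade "$@"
  else
    echo "NodeManager did not become active; skipping install" >&2
  fi
fi
```

Because services start only after bootstrap actions finish, a script like this needs to run in the background, in the same way as the provisioning monitor. On a node that doesn't run NodeManager, the loop gives up after the attempts are exhausted.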

Article information

Author: Lidia Grady
