Empowering Nonprofits with Python: A Pro Bono Initiative
Chapter 1: Introduction to the Initiative
Not long ago, I received a message on LinkedIn from a representative of a nonprofit organization.
The individual reached out after discovering my blog on LinkedIn. She mentioned that she commits to one pro bono project each month, but web scraping was outside her skill set. Recognizing this as a valuable opportunity to contribute, and knowing I had relevant code ready, I readily agreed to assist.
The nonprofit had compiled a list of about 6,000 organizations, and the initial requirement was to find the URLs for these entities. The subsequent step involved extracting text from the home page of each website, while the nonprofit would handle the text analysis themselves.
Following the download of an Excel file containing the organization names, I initiated my project with the necessary imports:
from googlesearch import search
import glob
import pandas as pd
import time
pd.set_option('display.max_columns', None)
Next, I loaded the spreadsheet using pandas:
df = pd.read_excel('c:/users/denni/downloads/March 2022 CJNP.xlsx')
df.tail()
The next phase involved creating a function to retrieve the nonprofit's URL based on its name and state:
def getURL(name, location):
    try:
        term = name + ' ' + location
        for url in search(term, num_results=1):
            return url
    except:
        return ''
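As a quick sanity check before running it over the full list, the function can be called directly. The organization name below is just an illustrative placeholder, not an entry from the actual spreadsheet:

# Hypothetical example — prints the first Google result for the search term
print(getURL('Feeding America', 'IL'))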
I then iterated through the dataframe, providing the name and location to the function, and updating the dataframe with the retrieved URLs:
for index, row in df.iterrows():
    if index > 4 and index < 5638:
        URL = getURL(row['Name'], row['State'])
        df.at[index, 'URL'] = URL
        if index % 10 == 0 and index > 1:
            print(index, row['Name'], URL)
            time.sleep(15)
        print(index)
        time.sleep(1)
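The original write-up jumps ahead here, but the gathered URLs need to be written out before the next phase, since the Selenium portion reads them back from a file named nonprofit.xlsx. A minimal sketch of that intermediate export (the path simply mirrors the one used elsewhere in this post and is an assumption):

# Assumed intermediate step: persist the gathered URLs for the scraping phase
df.to_excel('c:/users/denni/downloads/nonprofit.xlsx', index=False)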
Once this process was finished, I employed Selenium to extract the text from the pages. Below are my necessary imports (some of which were copied from another Jupyter notebook):
import pandas as pd
import shutil
import os, re, requests, urllib
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from datetime import datetime, date, timedelta
import datetime
from dateparser.search import search_dates
from bs4 import BeautifulSoup
import glob as glob
I loaded the file that contained the URLs I had previously gathered:
df = pd.read_excel('c:/users/denni/downloads/nonprofit.xlsx')
df.tail()
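The loop below assumes a driver object already exists; the original notebook doesn't show that setup, so here is a minimal sketch of one way to create it with Chrome (the chromedriver path is a placeholder):

# Assumed driver setup — adjust the chromedriver path for your machine
service = Service('c:/path/to/chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.set_page_load_timeout(30)  # avoid hanging indefinitely on slow sites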
Finally, I looped through all the URLs to scrape the text from each page:
for index, row in df.iterrows():
    if index > -1:
        try:
            driver.get(row['URL'])
            text = driver.find_element(By.XPATH, "/html/body").text
            df.at[index, 'Content'] = text
            if index % 10 == 0:
                print(index, row['URL'], len(text))
        except:
            pass
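The post doesn't show the final export, but the scraped text has to be handed back to the nonprofit for their analysis; a reasonable last step (the filename here is my assumption) would be:

# Assumed final step: write the scraped text out and close the browser
df.to_excel('c:/users/denni/downloads/nonprofit_content.xlsx', index=False)
driver.quit()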
And just like that, the project was complete!