Web Scraping and Mapping

Creating a Map of Major Ports in China using Python and Mapbox's Map

Sedar Sahin, May 2023

Introduction

In this tutorial, we will learn how to create an interactive map showcasing the major ports in China. Port locations become especially important if you ship goods to or from China. Therefore, for those of you who work (or will work) as a Data Scientist or Analyst at a logistics, trading, or manufacturing company, it is good to know where these ports are. The skills you learn in this exercise will help you visualize for stakeholders not only the locations of ports, but any kind of establishment that has latitude and longitude information, and you can take it a step further and apply geospatial analyses on top.

We will use Python libraries such as Requests, BeautifulSoup, Pandas, and Folium to gather the data, manipulate it, and visualize it on a map. Initially, we'll encounter a small challenge with the map's language, but we'll overcome it by using Mapbox's English base map.

Prerequisites:

  • Basic knowledge of Python

  • Familiarity with web scraping using BeautifulSoup

  • Understanding of data manipulation with Pandas

  • Basic awareness of the Folium library for map visualization

Step 1: Installing Required Libraries

First things first, ensure you have Python installed on your system. Then, install the necessary libraries using pip if you haven't already:

pip install requests beautifulsoup4 pandas folium
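As an optional sanity check, you can import the libraries and print their versions to confirm the installation (the exact version numbers will depend on your environment):

# optional: verify the four libraries import correctly
import requests, bs4, pandas, folium
print(requests.__version__, bs4.__version__, pandas.__version__, folium.__version__)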

Step 2: Web Scraping to Get Major Ports Data

Once we have the necessary packages installed, we can start working in our favorite Python interface, whether that is an IDE of your choice or a Jupyter Notebook.

Because we don't have access to a ready-made dataset with the port names along with their latitudes and longitudes, we need to create it. We will fetch the list of major ports in China from the Wikipedia page https://en.wikipedia.org/wiki/List_of_ports_in_China (see below) using Python's Requests and BeautifulSoup libraries. To do this, we'll write a Python script to scrape the data.

Here is the list of ports in tabular form:

| NR | Port |
|----|------|
| 1 | Dalian |
| 2 | Yingkou |
| 3 | Jinzhou |
| 4 | Qinhuangdao |
| 5 | Tianjin |
| 6 | Yantai |
| 7 | Weihai |
| 8 | Qingdao |
| 9 | Rizhao |
| 10 | Lianyungang |
| 11 | Nantong |
| 12 | Zhenjiang |
| 13 | Jiangyin |
| 14 | Nanjing |
| 15 | Shanghai |
| 16 | Ningbo |
| 17 | Zhoushan |
| 18 | Jiujiang |
| 19 | Taizhou (North of Wenzhou) |
| 20 | Wenzhou |
| 21 | Taizhou (South of Wenzhou) |
| 22 | Changle |
| 23 | Quanzhou |
| 24 | Xiamen |
| 25 | Shantou |
| 26 | Jieyang |
| 27 | Guangzhou |
| 28 | Zhuhai |
| 29 | Shenzhen |
| 30 | Zhanjiang |
| 31 | Beihai |
| 32 | Fangchenggang |
| 33 | Haikou |
| 34 | Basuo |

To retrieve the coordinates, we need to click on each one of the ports on that page. The screenshot below shows the details of the Port of Dalian (Wikipedia: https://en.wikipedia.org/wiki/Dalian); its coordinates appear in two different places on the page, both of which are highlighted with red rectangles:

The way Wikipedia structures its URLs is:

"https://en.wikipedia.org/wiki/" + <Port Name>

Using this information, we will create URLs to access each port's page in order to extract its coordinates. We can use apps like Excel, Numbers, or Google Sheets, or simply a text editor, to build these URLs.

The following table shows the ports with their corresponding Wikipedia pages:

| NR | Port | Wikipedia URL |
|----|------|---------------|
| 1 | Dalian | https://en.wikipedia.org/wiki/Dalian |
| 2 | Yingkou | https://en.wikipedia.org/wiki/Yingkou |
| 3 | Jinzhou | https://en.wikipedia.org/wiki/Jinzhou |
| 4 | Qinhuangdao | https://en.wikipedia.org/wiki/Qinhuangdao |
| 5 | Tianjin | https://en.wikipedia.org/wiki/Tianjin |
| 6 | Yantai | https://en.wikipedia.org/wiki/Yantai |
| 7 | Weihai | https://en.wikipedia.org/wiki/Weihai |
| 8 | Qingdao | https://en.wikipedia.org/wiki/Qingdao |
| 9 | Rizhao | https://en.wikipedia.org/wiki/Rizhao |
| 10 | Lianyungang | https://en.wikipedia.org/wiki/Lianyungang |
| 11 | Nantong | https://en.wikipedia.org/wiki/Nantong |
| 12 | Zhenjiang | https://en.wikipedia.org/wiki/Zhenjiang |
| 13 | Jiangyin | https://en.wikipedia.org/wiki/Jiangyin |
| 14 | Nanjing | https://en.wikipedia.org/wiki/Nanjing |
| 15 | Shanghai | https://en.wikipedia.org/wiki/Shanghai |
| 16 | Ningbo | https://en.wikipedia.org/wiki/Ningbo |
| 17 | Zhoushan | https://en.wikipedia.org/wiki/Zhoushan |
| 18 | Jiujiang | https://en.wikipedia.org/wiki/Jiujiang |
| 19 | Taizhou,_Zhejiang | https://en.wikipedia.org/wiki/Taizhou,_Zhejiang |
| 20 | Wenzhou | https://en.wikipedia.org/wiki/Wenzhou |
| 21 | Taizhou (South of Wenzhou) | NA |
| 22 | Changle | https://en.wikipedia.org/wiki/Changle |
| 23 | Quanzhou | https://en.wikipedia.org/wiki/Quanzhou |
| 24 | Xiamen | https://en.wikipedia.org/wiki/Xiamen |
| 25 | Shantou | https://en.wikipedia.org/wiki/Shantou |
| 26 | Jieyang | https://en.wikipedia.org/wiki/Jieyang |
| 27 | Guangzhou | https://en.wikipedia.org/wiki/Guangzhou |
| 28 | Zhuhai | https://en.wikipedia.org/wiki/Zhuhai |
| 29 | Shenzhen | https://en.wikipedia.org/wiki/Shenzhen |
| 30 | Zhanjiang | https://en.wikipedia.org/wiki/Zhanjiang |
| 31 | Beihai | https://en.wikipedia.org/wiki/Beihai |
| 32 | Fangchenggang | https://en.wikipedia.org/wiki/Fangchenggang |
| 33 | Haikou | https://en.wikipedia.org/wiki/Haikou |
| 34 | Basuo | https://en.wikipedia.org/wiki/Basuo |

Please note that port #19 was replaced by "Taizhou,_Zhejiang" in order to retrieve its coordinates, and #21 was skipped because no geographical coordinate data is available on Wikipedia for this port under this name.

Keep in mind that extracting the URLs can be automated via scraping as well; a small sketch of how the URLs could be built programmatically follows below. For the sake of this tutorial, I wanted to use scraping for only one job, which is extracting the coordinates of the ports.
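Here is such a sketch: the URLs can be assembled directly from the port names using the Wikipedia URL pattern shown earlier (the port_names list below is a small, hypothetical subset used only for illustration):

# build Wikipedia URLs from port names using the "https://en.wikipedia.org/wiki/" + <Port Name> pattern
port_names = ['Dalian', 'Yingkou', 'Taizhou, Zhejiang']  # illustrative subset only
base_url = 'https://en.wikipedia.org/wiki/'
urls = [base_url + name.replace(', ', ',_').replace(' ', '_') for name in port_names]
print(urls)
# ['https://en.wikipedia.org/wiki/Dalian', 'https://en.wikipedia.org/wiki/Yingkou', 'https://en.wikipedia.org/wiki/Taizhou,_Zhejiang']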

Now that we have the ports' links, it is time to retrieve their coordinates. To do that, we first "inspect" the HTML (no need to worry about this for the tutorial) to see in which element/attribute the latitude and longitude information is stored.

The image below shows that the latitudes and longitudes are stored in HTML elements whose class attributes are latitude and longitude, respectively.

Now that we have all the information, including the names of the ports, their URLs, and how to access their coordinates, we can roll up our sleeves and write the Python script that combines them.

# Import Libraries

# Web Scraping
import requests
from bs4 import BeautifulSoup

# Data Manipulation
import pandas as pd

# Regular Expression Operations (to be used in degree conversion)
import re

# Maps
import folium

Since we will be scraping multiple pages (33, to be exact), it is good to create a function to do the heavy lifting:

def get_latlon_from_wikipedia(url):
    
    # access to the url
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception("Failed to fetch the page.")

    # parse the page
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # retrieve latitude and longitude info
    latitude_span = soup.find('span', {'class': 'latitude'})
    longitude_span = soup.find('span', {'class': 'longitude'})

    if not latitude_span:
        raise Exception("Latitude data not found on the page.")
        
    if not longitude_span:
        raise Exception("Longitude data not found on the page.")
    
    # assign the coordinates to latitude and longitude variables
    latitude = latitude_span.text.strip()
    longitude = longitude_span.text.strip()
    
    # store the latitude and longitude info in a list
    latlon = [latitude, longitude]
    
    return latlon
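As a quick test (assuming an internet connection), the helper can be called on a single page; for the Dalian page it should return the same DMS coordinates that appear in the full output further below:

# quick single-page test of the helper function
print(get_latlon_from_wikipedia('https://en.wikipedia.org/wiki/Dalian'))
# expected output: ['38°54′N', '121°36′E']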

Next, we need to define a list that will store all the port URLs we collected earlier. This is simply a copy-and-paste step.

url_list=[
'https://en.wikipedia.org/wiki/Dalian',
'https://en.wikipedia.org/wiki/Yingkou',
'https://en.wikipedia.org/wiki/Jinzhou',
'https://en.wikipedia.org/wiki/Qinhuangdao',
'https://en.wikipedia.org/wiki/Tianjin',
'https://en.wikipedia.org/wiki/Yantai',
'https://en.wikipedia.org/wiki/Weihai',
'https://en.wikipedia.org/wiki/Qingdao',
'https://en.wikipedia.org/wiki/Rizhao',
'https://en.wikipedia.org/wiki/Lianyungang',
'https://en.wikipedia.org/wiki/Nantong',
'https://en.wikipedia.org/wiki/Zhenjiang',
'https://en.wikipedia.org/wiki/Jiangyin',
'https://en.wikipedia.org/wiki/Nanjing',
'https://en.wikipedia.org/wiki/Shanghai',
'https://en.wikipedia.org/wiki/Ningbo',
'https://en.wikipedia.org/wiki/Zhoushan',
'https://en.wikipedia.org/wiki/Jiujiang',
'https://en.wikipedia.org/wiki/Taizhou,_Zhejiang',
'https://en.wikipedia.org/wiki/Wenzhou',
'https://en.wikipedia.org/wiki/Taizhou (South of Wenzhou)',
'https://en.wikipedia.org/wiki/Changle',
'https://en.wikipedia.org/wiki/Quanzhou',
'https://en.wikipedia.org/wiki/Xiamen',
'https://en.wikipedia.org/wiki/Shantou',
'https://en.wikipedia.org/wiki/Jieyang',
'https://en.wikipedia.org/wiki/Guangzhou',
'https://en.wikipedia.org/wiki/Zhuhai',
'https://en.wikipedia.org/wiki/Shenzhen',
'https://en.wikipedia.org/wiki/Zhanjiang',
'https://en.wikipedia.org/wiki/Beihai',
'https://en.wikipedia.org/wiki/Fangchenggang',
'https://en.wikipedia.org/wiki/Haikou',
'https://en.wikipedia.org/wiki/Basuo',
]

Time to run our function and extract the coordinates from each port's page. We will store them in a dictionary called lat_log_dict.

# create a dictionary to store the coordinates
lat_log_dict = {}

# counter
cnt = 0

# loop through the list of urls and extract and store the coordinates

for url in url_list:
    # copy url
    wikipedia_url = url
    
    # extract the name of the port
    port = wikipedia_url.split("/")[-1]
    
    try:
        latlon = get_latlon_from_wikipedia(wikipedia_url)
        # print(f" {cnt} - Lat/Lon for {port}:", latlon) # to display current port and its coordinates
        
        lat_log_dict[port] = latlon
        
    except Exception as e:
        print("Error:", str(e))
    cnt+=1
        
lat_log_dict

Output:

{'Dalian': ['38°54′N', '121°36′E'],
 'Yingkou': ['40°37′30″N', '122°13′08″E'],
 'Jinzhou': ['41°07′44″N', '121°08′53″E'],
 'Qinhuangdao': ['39°53′18″N', '119°31′13″E'],
 'Tianjin': ['39°08′01″N', '117°12′19″E'],
 'Yantai': ['37°27′53″N', '121°26′52″E'],
 'Weihai': ['37°30′48″N', '122°07′14″E'],
 'Qingdao': ['36°04′01″N', '120°22′58″E'],
 'Rizhao': ['35°25′01″N', '119°31′37″E'],
 'Lianyungang': ['34°35′48″N', '119°13′17″E'],
 'Nantong': ['31°58′52″N', '120°53′38″E'],
 'Zhenjiang': ['32°11′17″N', '119°25′26″E'],
 'Jiangyin': ['31°50′20″N', '120°17′42″E'],
 'Nanjing': ['32°03′39″N', '118°46′44″E'],
 'Shanghai': ['31°13′43″N', '121°28′29″E'],
 'Ningbo': ['29°51′37″N', '121°37′28″E'],
 'Zhoushan': ['29°59′08″N', '122°12′27″E'],
 'Jiujiang': ['29°39′40″N', '115°57′14″E'],
 'Taizhou,_Zhejiang': ['28°39′21″N', '121°25′15″E'],
 'Wenzhou': ['27°59′38″N', '120°41′57″E'],
 'Changle': ['25°55′N', '119°33′E'],
 'Quanzhou': ['24°52′28″N', '118°40′33″E'],
 'Xiamen': ['24°28′47″N', '118°05′20″E'],
 'Shantou': ['23°21′14″N', '116°40′55″E'],
 'Jieyang': ['23°33′04″N', '116°22′22″E'],
 'Guangzhou': ['23°07′48″N', '113°15′36″E'],
 'Zhuhai': ['22°16′18″N', '113°34′37″E'],
 'Shenzhen': ['22°32′29″N', '114°03′35″E'],
 'Zhanjiang': ['21°16′12″N', '110°21′27″E'],
 'Beihai': ['21°28′52″N', '109°07′12″E'],
 'Fangchenggang': ['21°41′12″N', '108°21′17″E'],
 'Haikou': ['20°01′07″N', '110°20′56″E'],
 'Basuo': ['19°05′31″N', '108°40′16″E']}

Step 3: Storing Data with Pandas

With the data extracted, we'll use the Pandas library to organize and store the port details in a structured format, making it easier to work with the data.

# create a dataframe
df = pd.DataFrame(data=lat_log_dict)
df = df.T
df.reset_index(inplace=True)

# update the column names
df = df.rename(columns={'index':'Port', 0:'Lat (DMS)',1:'Lon (DMS)'})

# display first 3 rows
df[:3]

Output:
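Based on the coordinates collected above, the first three rows of the dataframe should look roughly like this (exact display formatting may vary):

      Port    Lat (DMS)    Lon (DMS)
0   Dalian      38°54′N     121°36′E
1  Yingkou   40°37′30″N  122°13′08″E
2  Jinzhou   41°07′44″N  121°08′53″E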

We now have coordinates for all the major ports in China. All the coordinates are in degrees-minutes-seconds (DMS) format. Folium, Python's mapping package, however, works with decimal degrees. Therefore, before we proceed to map the ports, we need to convert the latitude and longitude values to decimal degrees.

Conversion Formula:

decimal degrees = degrees + (minutes / 60) + (seconds/3600)

To do that we will create another function called dms2dd that will convert each coordinate to decimal degrees.

def dms2dd(s):
    # example: s = '0°11′23.29″S' (note the prime ′ and double-prime ″ characters)
    
    if '″' not in s:
        s = s[:-1] + '0″'+ s[-1]
    
    degrees, minutes, seconds, direction = re.split('[°′″\"]+', s)
    
    dd = float(degrees) + float(minutes)/60 + float(seconds)/(60*60);
    
    if direction in ('S','W'):
        dd*= -1
    return dd
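As a quick sanity check of the conversion, a coordinate without a seconds component, such as Dalian's latitude, becomes 38 + 54/60 = 38.9 decimal degrees:

# quick check of the DMS-to-decimal-degrees conversion
print(dms2dd('38°54′N'))    # 38.9
print(dms2dd('121°36′E'))   # 121.6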

We will now apply the function to the latitude and longitude columns using pandas' apply method:

df['Lat'] = df['Lat (DMS)'].apply(dms2dd)
df['Lon'] = df['Lon (DMS)'].apply(dms2dd)
df.head(10)

Output:

Step 4: Setting Up Folium and OpenStreetMap (OSM)

We'll begin visualizing the port locations using the Folium library with the 'OpenStreetMap (OSM)' base map. It will give us a basic map view but may present location names in Chinese, which can be a bit challenging for non-Chinese speakers.

# Tiles from OpenStreetMap (it does not offer English names for local cities, use MapBox instead)
tiles = 'OpenStreetMap'

# put geo. locations to a list 
location_list_df = df[['Lat','Lon']].values.tolist()


# Display the ports

# create basemap
map_obj = folium.Map(location=[df.Lat.mean(), df.Lon.mean()],
                    tiles=tiles,
                    zoom_start=4)

# place ports on the map
for loc in range(len(location_list_df)):
    folium.Marker(location=location_list_df[loc],
                  popup=df.iloc[loc].Port,
#                   icon = folium.Icon(color='blue', icon='info-sign')  # use this instead if you prefer a standard marker icon
                  icon = folium.DivIcon(
                  html=('<svg height="100" width="100">'
                        '<circle cx="10" cy="10" r="5" stroke="red" stroke-width="3" fill="yellow" />'
                        f'<text x="15" y="15" font-size="5pt" fill="black">{df.iloc[loc].Port}</text>'
                        '</svg>')
                    )
                 ).add_to(map_obj)

# save the map
# map_obj.save('china_ports_osm.html')

# display the map
map_obj

Output: Major Ports in China via OpenStreetMap

Due to HTML limitations, the map shown above is not interactive. Once you run the code, you will be able to interact with it on your computer.

As you can see on the map, all names are in the local language, meaning that unless you read Chinese, there is no way of telling where, for instance, the city of Dalian is. Therefore, we need the names in English. At the time of this writing (May 2023), OpenStreetMap does not offer English labels for countries where English is not the official language.

Step 5: Creating a Mapbox Account

To overcome the language issue, we'll create an account on Mapbox. Mapbox offers an English base map, which will make our map more user-friendly for non-Chinese speakers.

Step 6: Generating a Mapbox Public Key

Once you have your Mapbox account set up, generate a public access token (or use the default one) to access Mapbox maps from our Python script. For this tutorial, I created an access token called china_ports.
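If you prefer not to paste the token directly into your script, one optional pattern is to read it from an environment variable (the variable name MAPBOX_ACCESS_TOKEN below is just an example):

# optional: read the Mapbox access token from an environment variable instead of hard-coding it
import os
mapbox_access_token = os.environ.get('MAPBOX_ACCESS_TOKEN', '<YOUR ACCESS TOKEN>')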

Step 7: Using Mapbox's English Base Map

We'll modify our Python script to use Mapbox's map with the English base layer. This will allow us to display location names in English and enhance the user experience. The map will have interactive features (once run), allowing users to zoom in, zoom out, and click on port markers to get additional information.

# Tiles from MapBox 
mapbox_access_token = '<YOUR ACCESS TOKEN>' # example: mapbox_access_token='pk.eyJ1I...'

# Select a tile style for Mapbox map
tileset_ID_str = "streets-v12" # Mapbox Styles: https://docs.mapbox.com/api/maps/styles/
tilesize_pixels = "256"
tiles = f"https://api.mapbox.com/styles/v1/mapbox/{tileset_ID_str}/tiles/{tilesize_pixels}/{{z}}/{{x}}/{{y}}@2x?access_token={mapbox_access_token}" 

# Display the ports
# create basemap
location_list_df = df[['Lat','Lon']].values.tolist()

# place ports on the map
map_obj = folium.Map(location=[df.Lat.mean(), df.Lon.mean()],
                    tiles=tiles,
                    zoom_start=4,
                    attr='Mapbox')

for loc in range(len(location_list_df)):
    folium.Marker(location=location_list_df[loc],
                  popup=df.iloc[loc].Port,
#                   icon = folium.Icon(color='blue', icon='info-sign')  # use this instead if you prefer a standard marker icon
                  icon = folium.DivIcon(
                  html=('<svg height="100" width="100">'
                        '<circle cx="10" cy="10" r="5" stroke="red" stroke-width="3" fill="yellow" />'
                        f'<text x="15" y="15" font-size="5pt" fill="black">{df.iloc[loc].Port}</text>'
                        '</svg>')
                    )
                 ).add_to(map_obj)
                
# save the map 
map_obj.save('china_ports_mbx.html')

# display the map
map_obj

Output: Major Ports in China via Mapbox

Once again, due to HTML limitations, the map shown above is not interactive. Once you run the code, you will be able to interact with it on your computer; the script also saves the interactive Mapbox map as china_ports_mbx.html, which you can open in any browser.

Conclusion

Congratulations! We've successfully created an interactive map displaying major ports in China using Python, BeautifulSoup, Pandas, and Folium with Mapbox's English base map. You can now explore and share this map with others, enhancing their understanding of the major ports in China.

