import collections
from itertools import combinations, permutations
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from geopy.geocoders import Nominatim
from geopy.distance import vincenty
%matplotlib inline
import matplotlib.pyplot as plt
Three of my best friends and I decided to do the stereotypical thing and backpack around Europe after we graduated college. Three of us studied computer science, and the other one studied mechanical engineering. This is the story of some of the more engineering-y things we did for planning.
Our timeframe was constricted by various individual personal and work obligations, so our first real decision was where we wanted to go. Lance made a list of the top 25 places to go in Europe, and sent out an Excel spreadsheet wherein we all ranked each of the 25 destinations in order.
df = pd.read_excel('Europe_Rankings.xlsx')
df.head()
By adding each place's rankings together and sorting by lowest overall rank, we decided on our top 8 placess.
names = ['Sam', 'Steve', 'Mauricio', 'Lance']
df['Total'] = sum(df[n] for n in names)
df.sort_values('Total')[:9]
At this point, I was curious about how similar we were in our travel preferences. We can graph our preferences in a two-dimensional space by running our individual preferences through PCA.
pca_2 = PCA(2)
vectors = [df[n].tolist()[:25] for n in names]
plot_columns = pca_2.fit_transform(vectors)
plt.title('Travel preferences (2D projection)')
colors = ['red', 'yellow', 'black', 'lime']
for t in zip(colors, plot_columns.tolist(), names):
plt.scatter(t[1][0], t[1][1], c=t[0], marker='o', label=t[2])
plt.legend(loc='lower right')
plt.show()
Now, if you put a pin on each of those cities and look at a map, the optimal route is fairly clear. But that didn't stop me from being curious about this particular instance of the travelling salesman problem.
cities = [(r['City'], r['Country']) for i, r in df.sort_values('Total').iterrows()][:9]
cities[1] = ('Omaha Beach', 'France')
cities.append(('Dublin', 'Ireland')) # we were flying in/out of Dublin
cities
First, we find the latitude and longitude of each destination.
gl = Nominatim()
locations = [gl.geocode(ci + ' ' + co) for ci, co in cities]
coded_cities = list(zip(cities, locations))
Then, we find the distance between each pair of cities. I'm using Vincenty's Formulae to get an as-the-crow-flies distance. Admittedly this isn't perfect, but it should be close enough.
weights = collections.defaultdict(lambda: collections.defaultdict(lambda: np.inf))
for city1, city2 in combinations(coded_cities, 2):
name1, name2 = city1[0], city2[0]
loc1 = (city1[1].latitude, city1[1].longitude)
loc2 = (city2[1].latitude, city2[1].longitude)
weights[name1][name2] = vincenty(loc1, loc2).miles
weights[name2][name1] = vincenty(loc2, loc1).miles
middle_cities = cities[:-1]
DUBLIN = ('Dublin', 'Ireland')
min_path_weight = np.inf
min_path = None
for p in permutations(middle_cities, r=len(middle_cities)):
path_weight = weights[DUBLIN][p[0]]
for i in range(len(p) - 1):
path_weight += weights[p[i]][p[i + 1]]
path_weight += weights[p[-1]][DUBLIN]
if path_weight < min_path_weight:
min_path_weight = path_weight
min_path = p
min_path_weight, [DUBLIN] + list(min_path) + [DUBLIN]
At this point, we needed to figure out what we were going to do in each place (and where we were going to sleep). We employed a divide and conquer strategy, putting each of us in charge of two or three cities. This strategy worked out well, because we know each other well and trust each other. Each person's process was different, but by and large we used HostelWorld, TripAdvisor, and various internet searches to figure out where to stay and what to do.
We had a great, mostly stress-free time. I highly recommend planning a trip this way!