Categories
Coding Data Analysis Pandas Python Tutorial

Analysing Cryptocurrencies with Percentage Differences in Python with Pandas

Carrying on from my last introduction, Using Python and Pandas to Analyse Cryptocurrencies with CoinAPI, I would like to take this even further with some custom functions.

Please do note that the information below is what I find interesting when it comes to analysing data so it may not suit your needs and requirements.

Things we’ll be covering include:

  • Develop a function that will work out the percentage difference between two numbers
  • Add two new columns to our data set that will use the percentage difference function to calculate the difference of the open and close price as well as the high and low price of the day
  • Develop a function that will give us a clear output of data using the days of the week column we developed last time

The Imports for Python

We’ll only be using Pandas so we’ll just import that for this tutorial.

import pandas as pd

Percentage Difference Function

Let’s start with our percentage difference function. This will take two numbers and output the percentage difference between them.

def percentage_difference(high, low):
    if high >= low:
        return ((high - low) / ((high + low) / 2)) * 100
    else:
        return ((low - high) / ((high + low) / 2)) * 100

So what’s happening in the above function? Firstly it is doing a check to see if the first number is higher in value than the second one. The reason for this is because we don’t actually need a negative percentage for our example as we just want to output the percentage difference as a whole.

Next it is doing the necessary calculations to work out the different and will return it once it is called.

Reading a CSV File with Pandas

If you followed along last time then you should already have your cryptocurrency CSV file. If not then please feel free to go through that first so you can follow along now.

To do this we will be using the read_csv() method from Pandas. The only parameter we will need to give is the name of the file we wish to open.

df = pd.read_csv("LTC Day History.csv")

When using Pandas for data analysis it is standard practice to use df, short for DataFrame, to store your DataFrame in so you may see this crop up fairly often.

Adding New Columns

Let’s start by working out the percentage difference of the high and low prices from each day.

To do this we will create an empty array called open_close_difference to place our values and then start a for loop to cycle through our data set.

open_close_difference = []
for x in df.index:
    open_close_difference.append(percentage_difference(df.iloc[x, 1], df.iloc[x, 2]))

The for loop is using the Pandas .index attribute which returns the number of rows in our data set. Inside the loop we are using the iloc accessor to find the row followed by the column. Our x from the loop will be our row number. The 1 and 2 are the index numbers of our Price High and Price Low columns. This may change for you if you have ordered your columns in a different way. These values are then passed to our percentage_difference() function and then returned into our array.

Now that our open_close_difference array has the data stored within it, it’s time to add it to a new column. This is as easy as writing the below.

df["Open Close % Difference"] = open_close_difference

This works the way it does because our existing DataFrame doesn’t have a columns called Open Close % Difference. Pandas knows to create this for us and it will use the information from the array to populate the rows.

The below code will add another new column but this time for the high and low values.

high_low_difference = []
for x in df.index:
    high_low_difference.append(percentage_difference(df.iloc[x, 3], df.iloc[x, 4]))

df["High Low % Difference"] = high_low_difference

Filtering Our New Columns

Now that we have our new columns, we can start filtering them to give us better insights.

Let’s say we would like to work out the mean for the High Low % Difference column for every Wednesday during September. We would use the below to get the date ranges and then applying this to the Day of the Week column so that it is equal to Wednesday.

start_date = df["Start Time"] >= "2020-09-01"
end_date = df["Start Time"] <= "2020-09-30"

filter_dates = df[start_date & end_date]
filter_dates[filter_dates["Day of the Week"] == "Wednesday"]["High Low % Difference"].describe()["mean"]

# [OUTPUT] 6.662062495339

While the above does a good job of returning a single day, it would require six further lines to get a view of the whole week. This view would benefit us as we could see at a glance which does are potentially more volatile.

To get a week view we will develop another function that will output each day of the week and the calculation we would like it to return.

def describe_days(data, column, summary):
    days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    
    for day in days:
        output = data[data["Day of the Week"] == day][column].describe()[summary]
        print(day + ": \t" + str(output))

The function above takes three parameters which are the data, column and summary. The data will be our filter_dates. The column will be any one of our numeric columns. Finally the summary will be either min, max, mean, std as these are found within the describe() method.

There is an array of days which we will use to loop through each of our data. The function will look for all the days, collect the data from the column and then print it out using the describe() method.

Here is our function in action where we are requesting the mean of the High Low % Difference columns.

describe_days(filter_dates, "High Low % Difference", "mean")

# [OUTPUT] Monday: 7.706157419969
# [OUTPUT] Tuesday: 6.333708463529
# [OUTPUT] Wednesday: 6.662062495339
# [OUTPUT] Thursday: 9.156648607491
# [OUTPUT] Friday: 7.726340459178
# [OUTPUT] Saturday: 7.899497831153
# [OUTPUT] Sunday: 8.466911851066

Looking at the above data objectively you can see that Thursday had the greatest percentage difference during September.

Conclusion

I guess what you do with this information is up to you as you may want to check previous months to see any correlation.

There are many things we could add to our functions. Perhaps rounding down those long percentage values?

Let me know your thoughts on how you would like to see these tutorials continuing.

Full Code Example

import pandas as pd

def percentage_difference(high, low):
    if high >= low:
        return ((high - low) / ((high + low) / 2)) * 100
    else:
        return ((low - high) / ((high + low) / 2)) * 100

def describe_days(data, column, summary):
    days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    
    for day in days:
        output = data[data["Day of the Week"] == day][column].describe()[summary]
        print(day + ": \t" + str(output))

df = pd.read_csv("LTC Day History.csv")

# Calculate the percentage difference of two columns
open_close_difference = []
for x in df.index:
    open_close_difference.append(percentage_difference(df.iloc[x, 1], df.iloc[x, 2]))

# Adding the new column
df["Open Close % Difference"] = open_close_difference

# Calculate the percentage difference of two columns
high_low_difference = []
for x in df.index:
    high_low_difference.append(percentage_difference(df.iloc[x, 3], df.iloc[x, 4]))

# Adding the new column
df["High Low % Difference"] = high_low_difference

# Our date ranges
start_date = df["Start Time"] >= "2020-09-01"
end_date = df["Start Time"] <= "2020-09-30"
filter_dates = df[start_date & end_date]

# Getting the mean of the High Low % Difference
describe_days(filter_dates, "High Low % Difference", "mean")