Categories
Coding Data Analysis Pandas Python Tutorial

Using Python and Pandas to Analyse Cryptocurrencies with CoinAPI

I want to go through how you can use Python along with Pandas to analyse different cryptocurrencies using CoinAPI. While trading cryptocurrencies may not be to every bodies fancy, I still feel it’s a good real-world example to get you started.

The Tutorial

I’m not going to go through the process of setting up Python. My hope is you already have a basic understanding of the language. The types of things I will be going over however include the following:

  • Connecting to CoinAPI using Python
  • Getting the historical data of a cryptocurrency
  • Renaming, dropping and reordering columns from the data we retrieve
  • Using DateTime to get the day of the week and store this information as a new column
  • Saving data from the API to a CSV file
  • Taking the information for a CSV file into a Pandas DateFrame
  • Analysing the data to find things such as the mean, median, percentiles and more
  • Filtering the data by different dates

Register an API Key

The first thing you will need to do is register for your free CoinAPI API key. The API is good for only 100 daily requests. This is why we’ll be adding the data from the API to a CSV file. This way we don’t need to connect every time we want to analysis the data.

The Imports for Python

We’ll only be using four imports which will be JSON and Requests for connecting to the API. Pandas for the analysing the data and DateTime to work with dates.

import json
import requests
import pandas as pd
import datetime as dt

Connecting to the CoinAPI API

For my example I will be using Litecoin and the historical daily data CoinAPI has on it.

You can change the structure of the URL to suit your needs. For a Bitcoin example you would just need to change LTC to BTC. The period_id can be set to seconds but for our purposes we’ll just be getting the daily values as this would no doubt exceed the daily limit quite quickly.

First of all you will need to add your own API key within the api_key variable. Next the response variable will attempt to connect to the API. We’ll do a simple status_code check to see if we’re successful or not. The 429 status code comes back from CoinAPI if you have had to many requests for that day. You will need to try again the next day if this is the case.

api_key = "YOUR-API-KEY-GOES-HERE"

url = "https://rest.coinapi.io/v1/ohlcv/LTC/GBP/history?period_id=1DAY&time_start=2015-01-01T00:00:00&time_end=2020-10-31T23:59:00&limit=100000"
headers = {"X-CoinAPI-Key" : api_key}
response = requests.get(url, headers = headers)

if(response.status_code == 429):
    # API response
    print("Too many requests.")

Processing and Storing the Data

Assuming you were able to get access to the API, we can now move on to processing the data.

The first thing we’ll need to do is use the JSON module and get the text response back from CoinAPI and store this in a variable called coin_data.

coin_data  = json.loads(response.text)

Creating the DataFrame

We will now use Pandas to create the DataFrame from our coin_data variable and assign this to ltc_data but you could call this btc_data if you’re working with Bitcoin for example.

ltc_data = pd.DataFrame(coin_data)

Renaming Columns

Now that we have our data stored in a DataFrame we can begin to rename our columns. I personally do this as CoinAPI uses underscores for the columns where I like to use spaces so I can separate it better from the code I’m using.

So here we will call the rename() method from Pandas and use the columns parameter to create a mapper of the column names we wish to change. The left is the current name and the right will be our new one. If you’re happy with a particular column name then you can just leave it and Pandas will just keep it. I’ve set the inplace parameter to True so that our changes are stored in our variable for the next time it’s called.

ltc_data.rename(columns = {
    "time_period_start": "Start Time",
    "time_period_end": "End Time",
    "time_open": "Open Time",
    "time_close": "Close Time",
    "price_open": "Price Open",
    "price_high": "Price High",
    "price_low": "Price Low",
    "price_close": "Price Close",
    "volume_traded": "Volume Traded",
    "trades_count": "Trade Count",
}, inplace = True)

Dropping Columns

For my purposes I don’t feel the End Time, Open Time and Close Time are needed since cryptocurrencies are more or less 24 hours. For this reason I will just remove these from the data set.

To drop columns we will call the Drop() method from Pandas. To drop these three columns we will wrap them inside some squared brackets and list them. We will then set the axis parameter to columns as rows is the default in Pandas and we will also, again, set the inplace to True.

ltc_data.drop(["End Time", "Open Time", "Close Time"], axis = "columns", inplace = True)

Reordering Columns

Unlike when we were renaming our columns, Pandas requires us to include all of the names when reordering them.

To reorder the columns we will call the reindex() method from Pandas. Since we will be passing more information into this method it’s good practice to create an array of columns. This will just help to make our code a little more readable. I have just called this reorder_columns.

reorder_columns = [
    'Start Time',
    'Price Open',
    'Price Close',
    'Price High',
    'Price Low',
    'Volume Traded',
    'Trade Count'
]

Now we will pass the reorder_columns array into the reindex() method. We will set this against the columns parameter. What we are technically doing here by storing this information against itself is “overwriting” the old order with the new. This is required as the reindex() method doesn’t have the inplace parameter as our previous examples have.

ltc_data = ltc_data.reindex(columns = reorder_columns)

Getting Days of the Week from the Date

When I’m viewing the data of cryptocurrencies I like to see what days are the most popular. Since CoinAPI doesn’t give this data we will need to convert our date stamps to days of the week.

But first we will need to convert our Start Time column to a datetime data type. To do this we will call the to_datetime() method from Pandas.

ltc_data["Start Time"] = pd.to_datetime(ltc_data["Start Time"])

What the code above is doing is overwriting the Start Time column, which is currently being stored as a string, and replacing it with its current values but they are now seen as a date data type.

Next we will create a new column and use the dayofweek property from the DateTime module. To create the new column we just need to call the ltc_data and use squared brackets and give the new columns a name. Since this new name won’t exist in our data set Pandas will know to create a new column for us.

ltc_data["Day of the Week"] = ltc_data['Start Time'].dt.dayofweek

Now the DateTime module above will get the day of the week from the date that it has retrieved from the Start Time column. However it stores this information as a number from 0 to 6. 0 = Monday, 1 = Tuesdays and so on. While this is useful from a memory and storage standpoint, it may be a little difficult for us to see the day quickly at a glance.

To convert these day numbers to written days of the week we will use a custom function along with the apply() method from Pandas.

The custom function below is quite straightforward as it just requires one parameter and uses this to go through a last of the days and returns the correct one. Since 0 = Monday our array starts with Monday.

def number_to_day(number):
    days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    return days[number]

Now we will use the number_to_day function along with the apply() method.

From the left we are overwriting our current Day of the Week columns which currently has the days of the week as numbers with our new function. The apply() method is basically going down the whole of the Day of the Week column, getting the value and then passing this to our number_to_day function.

ltc_data["Day of the Week"] = ltc_data["Day of the Week"].apply(number_to_day)

Saving the Data to a CSV File

Once we’re happy with our data we can now save it into a CSV file. Do feel free to reorder the columns again as the Day of the Week we have just added will automatically be position as the last column.

To save our data to a CSV file we just need to use the to_csv() method from Pandas. The first parameter will be the name of our CSV file and I am also setting the index parameter to False. This just stops Pandas from adding another column called index to the CSV file.

ltc_data.to_csv("LTC Day History.csv", index = False)

You will now be able to open the CSV in most spreadsheet software and view the data we retrieved from CoinAPI.

Reading a CSV File with Pandas

Now we are ready to start analysing the data from our CSV file we have just created. To do this we will be using the read_csv() method from Pandas. The only parameter we will need to give is the name of the file we wish to open.

df = pd.read_csv("LTC Day History.csv")

When using Pandas for data analysis it is standard practice to use df, short for DataFrame, to store your DataFrame in so you may see this crop up fairly often.

Using the Describe() Method

A super useful method from Pandas is the Describe() method. This will take our data and workout the following for us:

  • Count – This is the total number of rows found within the DataFrame
  • Mean – The average value of each numeric column
  • Percentiles – The defaults are 25%, 50% and 75%
  • Min and Max – The minimum and maximum values of each numeric column

Now Pandas is excellent at understanding our meaning if we were to execute the below code as Pandas will return the values of each numeric column.

df.describe()

If however we wanted to specify a column we can use squared brackets and enter the column number.

df["Price High"].describe()

We can use our squared brackets further by adding them to the end of the describe() method and requests the information we want to get back. For example the mean.

df["Price High"].describe()["mean"]

So the above code will bring us the mean of the Price High column.

Filtering by Dates

While getting information on the full range of our data set, it would be better to choose between a date range.

The below example will retrieve the mean value of the Price High from our data set for the month of September.

start_date = df["Start Time"] >= "2020-09-01"
end_date = df["Start Time"] <= "2020-09-30"
df[start_date & end_date]["Price High"].describe()["mean"]

Filtering by Dates and Other Columns

Finally let’s get a little more advance and take advantage of our date filter and get values for specific days of the week.

First we’ll set our date filter against a variable.

filter_dates = df[start_date & end_date]

Next we’ll use this variable and get our mean value for the Price High column for the Wednesdays in September.

filter_dates[filter_dates["Day of the Week"] == "Wednesday"]["Price High"].describe()["mean"]

All we’re doing here is searching through our September data, looking for Wednesday and then using the describe() method to get the mean for those columns. A good challenge to set yourself would be to write a function that would return all of the days of the week so you could see where the Price High tends to fall for a given day in a month.

Conclusion

Well, I think that’s about it. I really hope you’ve found this tutorial useful and has helped you to see the potential of using Python and Pandas for data analysis. Below you’ll be able to see the full code and please feel free to leave any feedback in the comments section.

Also let me know if you would like me to take this tutorial further as there are a number of things we could add to it. These may include percentage differences between the high and low prices. This would allow us to see days where the most trading is happening. Or even using our day of the week example and condensing that down to times of the day.

Update

I have extended this tutorial further. You can find it here.

Full Code Example

import json
import requests
import pandas as pd
import datetime as dt

def number_to_day(number):
    days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    return days[number]

api_key = "YOUR-API-KEY-GOES-HERE"
url = "https://rest.coinapi.io/v1/ohlcv/LTC/GBP/history?period_id=1DAY&time_start=2015-01-01T00:00:00&time_end=2020-10-31T23:59:00&limit=100000"
headers = {"X-CoinAPI-Key" : api_key}
response = requests.get(url, headers = headers)

if(response.status_code == 429):
    # API response
    print("Too many requests.")
else:
    # Get the information from the API
    coin_data  = json.loads(response.text)
    
    # Assign information to a DataFrame for later use
    ltc_data = pd.DataFrame(coin_data)
    
    ltc_data.rename(columns = {
        "time_period_start": "Start Time",
        "time_period_end": "End Time",
        "time_open": "Open Time",
        "time_close": "Close Time",
        "price_open": "Price Open",
        "price_high": "Price High",
        "price_low": "Price Low",
        "price_close": "Price Close",
        "volume_traded": "Volume Traded",
        "trades_count": "Trade Count",
    }, inplace = True)
    
    # Convert the Start Time to a date/time data type
    ltc_data["Start Time"] = pd.to_datetime(ltc_data["Start Time"])
    
    # Assign day of the week number (0, 1, 2, 3, 4, 5, 6)
    ltc_data["Day of the Week"] = ltc_data['Start Time'].dt.dayofweek
    
    # Convert number to the written day of the week
    ltc_data["Day of the Week"] = ltc_data["Day of the Week"].apply(number_to_day)
    
    # Drop columns as the Start Time is enough for my purposes
    ltc_data.drop(["End Time", "Open Time", "Close Time"], axis = "columns", inplace = True)
    
    reorder_columns = [
        'Day of the Week',
        'Start Time',
        'Price Open',
        'Price Close',
        'Price High',
        'Price Low',
        'Volume Traded',
        'Trade Count'
    ]
    ltc_data = ltc_data.reindex(columns = reorder_columns)
    
    # Create a CSV file with the values so not to waste the free daily API calls
    ltc_data.to_csv("LTC Day History.csv", index = False)

df = pd.read_csv("LTC Day History.csv")

start_date = df["Start Time"] >= "2020-09-01"
end_date = df["Start Time"] <= "2020-09-30"
df[start_date & end_date]["Price High"].describe()["mean"]