## Airline Fares - What analysis should be used to detect competitive price-setting behavior and price correlations?

12

2

I want to investigate price-setting behavior of airlines -- specifically how airlines react to competitors pricing.

As I would say my knowledge about more complex analysis is quite limited I've done mostly all basic methods to gather a overall view of the data. This includes simple graphs which already help to identify similar patterns. I am also using SAS Enterprise 9.4.

However I am looking for a more number based approach.

## Data Set

The (self) collected data set I am using contain around ~54.000 fares. All fares were collected within a 60 day time window, on a daily basis (every night at 00:00).

Hence, every fare within that time window occurs $$n$$ times subject to the availability of the fare as well as the departure date of the flight, when it is passed by the collection date of the fare. (You can't collect a fare for a flight when the departure date of the flight is in the past)

The unformatted that looks basically like this: (fake data)

+--------------------+-----------+--------------------+--------------------------+---------------+
| requestDate        | price| tripStartDeparture | tripDestinationDeparture | flightCarrier |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 725.32    | 16APR2015:10:50:02 | 23APR2015:21:55:04       | XA            |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 966.32    | 16APR2015:13:20:02 | 23APR2015:19:00:04       | XY            |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 915.32    | 16APR2015:13:20:02 | 23APR2015:21:55:04       | XH            |
+--------------------+-----------+--------------------+--------------------------+---------------+

"DaysBeforeDeparture" is calculated via $$I=s-c$$ where

• I & interval (days before departure)
• s & date of the fare (flight departure)
• c & date of which the fare was collected

Here is a example of grouped data set by I (DaysBeforeDep.) (fake data!):

+-----------------+------------------+------------------+------------------+------------------+
| DaysBefDeparture | AVG_of_sale | MIN_of_sale | MAX_of_sale | operatingCarrier |
+-----------------+------------------+------------------+------------------+------------------+
| 0               | 880.68           | 477.99           | 2,245.23         | DL           |
+-----------------+------------------+------------------+------------------+------------------+
| 0               | 904.89           | 477.99           | 2,534.55         | DL           |
+-----------------+------------------+------------------+------------------+------------------+
| 0               | 1,044.39         | 920.99           | 2,119.09         | LH               |
+-----------------+------------------+------------------+------------------+------------------+

## What I came up with so far

Looking at the line graphs I can already estimate that several lines will have a high correlation factor. Hence, I tried to use correlation analysis first on the grouped data. But is that the correct way? Basically I try now to make correlations on the averages rather then on the individual prices? Is there an other way?

I am unsure which regression model fits here, as the prices do not move in any linear form and appear non-linear. Would I need to fit a model to each of price developments of an airline

PS: This is a long text-wall. If I need to clarify anything let me know. I am new to this sub.

Anyone a clue? :-)

9

Word of warning from a former airline Revenue Management analyst: you might be barking up the wrong tree with this approach. Apologies for the wall of text that follows, but this data is a lot more complex and noisy than might appear at first glance, so wanted to provide a short description of how it's generated; forewarned is forearmed.

Airline fares have two components to them: all the actual fares (complete with fare rules and what have you) that an airline has available for a certain route, most of which are published the Airline Tariff Publishing Company (a few special-use ones are not, but those are the exception rather than the rule) and the actual inventory management performed by the airline on a day-to-day basis.

Fares can be submitted to ATPCO four times a day, at set intervals, and when airlines do so, it will usually consist of a mixture of additions, deletions, and modifications of existing fares. When an airline initiates a pricing action (assuming their competitors aren't trying to make their own moves here), they usually have to wait until the next update to see if their competitors follow/respond. The converse goes when a competitor initiates a pricing action, as the airline has to wait until the next update before they can respond.

Now, this is all well and good with respect to fares, but the problem is that, because this is all getting published in ATPCO, fares are the next best thing to public information... all your competitors get to see what you've got in your arsenal, so attempts to obfuscate are not unheard of, such as publishing fares that will never actually be assigned any inventory, listing all the fares as day-of-departure, etc.

In many ways, the secret sauce comes down to the actual inventory allocation, i.e. how many seats on each flight will you be willing to sell for a given fare, and this information is not publicly available. You can get some glimpses by scraping web info, but the potential combinations of departure time/date and fare rules are quite numerous and may quickly escalate beyond your ability to easily keep track of.

Typically an airline will only be willing to sell a handful of seats for a very low fare and the people who snag those have to book quite far in advance lest the fare rules lock them out, or other travelers simply beat them to the punch. The airline will be willing to sell a few more seats for a higher fare, and so on and so forth. They will be quite happy to sell all of the seats for the highest fare they've got published, but this is not usually feasible.

What you're seeing with fares getting higher the closer you get to the day of departure is simply the natural process of having the cheap seats get booked farther out, while the remaining inventory gradually gets more expensive. Of course, there are some caveats here. The RM process is actively managed and human intervention is quite common as the RM team generally strives to meet its revenue goals and maximize revenue on each flight. As such, flights that fill up quickly may be "tightened up" by closing out low fares. Flights that are booking slowly may be "loosened up" by allocating more seats to lower fares.

There is a constant interplay and competition between airlines in this area, but you are not very likely to capture the actual dynamics just from scraping fares. Don't get me wrong, we had such tools at our disposal, and, despite their limitations, they were quite valuable, but they were just one data source that fed into the decision-making process. You'd need access to the hundreds, if not thousands of operational decisions made by RM teams on a daily basis, as well as state-of-the-world information as they see it at the time. If you cannot find an airline partner to work with in order to get this data, you might need to consider alternate data sources.

I'd recommend looking into getting access to O&D fare data from the Official Airline Guide (or one of their competitors) and try to use that for your analysis. It's sample-based (about 10% of all tickets sold) and aggregated at a higher level than would be ideal so careful route selection is imperative (I'd recommend something with plenty of airlines, flying non-stop multiple times a day, with large aircraft), but you may be able to get a better picture of what was actually sold (average fare) and how much of it was sold (load factor), vs. merely what is available for sale at a given point in time. Using that information you might be in better position to at least explore the outcomes of the airlines' pricing strategy, and make your inferences from there.

Thanks for your thorough explanation. I agree with you that such analysis based on prices only are quite limited. This also includes notably fare rules (Refundable tickets, minimum stay etc.) Some of those limitation can be overcome by collecting always same fares to make the comparable. However, a important information - as you mentioned, is missing the amount of seats available (can be != seats in a plane) and the the actually amount of sold tickets. – s1x – 2015-05-21T13:16:38.673

Access to such data is very limited and if - outdated (eg. Databank 1B from US DOT). Some research such as Clark R. and Vincent N. (2012) Capacity-contingent pricing [...] link includes such data and offer much better insights.

I'am aware of the limitations (hopefully ;-) ) and as you mentioned as there are much more information influencing prices. Still when observing a specific market you can get a feeling of what happens. You can see if there is any compeitive behaviour and different pricing strategy approachs. However, you would never be able to find the cause.

– s1x – 2015-05-21T13:19:26.397

1@s1x - I agree and I wish I had a solid alternative to offer, but, as you've learned yourself, detailed revenue data is the most jealously guarded secret at any airline. Just wanted to make sure you're aware of that and what goes into the data generation process. Beyond, that, I like what you're trying to do and I think the other answer is a step in the right direction, technique-wise. If I might suggest, you could also take a look at using cross-correlation between your various TS during your data exploration, as it is often valuable for discerning patterns between linked TS. – habu – 2015-05-21T13:24:43.460

4

In addition to exploratory data analysis (EDA), both descriptive and visual, I would try to use time series analysis as a more comprehensive and sophisticated analysis. Specifically, I would perform time series regression analysis. Time series analysis is a huge research and practice domain, so, if you're not familiar with the fundamentals, I suggest starting with the above-linked Wikipedia article, gradually searching for more specific topics and reading corresponding articles, papers and books.

Since time series analysis is a very popular approach, it is supported by most open source and closed source commercial data science and statistical environments (software), such as R, Python, SAS, SPSS and many others. If you want to use R for this, check my answers on general time series analysis and on time series classification and clustering. I hope that this is helpful.

Thank you for your answer @Aleksandr Blekh - really appreciated. Ill digg right into that.

Maybe a stupid question, but please correct me here if I'am wrong here:a correlation analysis, while using one airline as the variable to correlate with. The results were compelling so far, as some airlines espc. those who had codeshare agreements had similar prices.

Would such high correlations e.g.: ColumnUA(LH) 0.90435 <.0001 ColumnSQ 0.32544 <.0001 ColumnAF(DL) 0.55336 <.0001

I assume such results indicate similar price patterns. With a regression analysis, what would I find out? – s1x – 2015-05-18T02:55:16.293

@s1x: You're very welcome (feel free to upvote/accept, if you value the answer and when you'll get enough reputation to do so, of course). Now, on to your question. As I said, TS analysis is more sophisticated and comprehensive. In particular TS regression, accounts for so-called autoregression and other TS complexities. Hence, my suggestion to use TS regression analysis instead of simpler traditional one. Also, you should always start with EDA, no matter what data analysis you plan to perform (actually, EDA will often change your plans).

– Aleksandr Blekh – 2015-05-18T03:21:38.997