TicketMaster System Design

Shahar Shokrani
8 min read · Jul 18, 2024


Designing a system for a high-demand service like TicketMaster requires careful consideration of both functional and non-functional requirements. This article highlights the common mistakes made during the design of such a system and provides corrections to improve clarity, efficiency, and robustness. From handling user interactions to ensuring system scalability, we delve into the critical aspects that make up a well-rounded TicketMaster system.

Credit: https://www.youtube.com/watch?v=fhdPyoO6aXI

Requirements

Each of these requirements starts with “User should…”.

  1. View an event: User should be able to view an event’s details and the number of available tickets.
  2. Search events: User should be able to search for events.
  3. Book tickets: User should be able to claim tickets for 5 minutes and then proceed to book them.
  4. Booking popular events: User should be able to book tickets for popular concerts, handling high traffic and ensuring fair ticket distribution.

Non-Functional Requirements

It is recommended to lay out the most important non-functional requirements in the context of the problem’s functional requirements:

  1. View an event: Availability. When a user opens the event page, they should always see the event details.
  2. Search events: Performance. Search results should always be returned quickly.
  3. Book tickets: Consistency. There must be no double booking of tickets.
  4. Book popular events: Scalability. The system should handle many users during peak times.

Core Entities

The data that will be persisted or exchanged via the API:

Events: including locations and artists as properties of the events.

PartialEvent: A smaller version of the Event entity returned only within search results to reduce latency.

PaymentMethod: Will be sent to a third-party vendor without being persisted (only the Token will be persisted).

Reservation: Temporary entity to support ticket reserving.

Tickets: Represents the actual tickets available for booking.

API

These are the requests the clients will make in order to satisfy the functional requirements by exchanging the core entities:

  1. View an event: GET /events/{eventId} => event, tickets[].
  2. Search events: GET /search?filters => PartialEvent[]
  3. Book tickets: Since there is a gap between the time the user clicks to book an event and the time they actually purchase the ticket, we will separate it into two steps: POST /reserve {ticketId} => reserveId, and POST /ticket {reserveId} => ticketId.
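The two-step booking flow can be illustrated with a small in-memory stand-in for the two endpoints. This is only a sketch: the class name `BookingApi`, its fields, and the method names are assumptions made for illustration, and a real service would sit behind HTTP and a database.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class BookingApi:
    """In-memory stand-in for the two booking endpoints (hypothetical names)."""

    available: set[str]  # ticket IDs that are neither reserved nor booked
    reservations: dict[str, str] = field(default_factory=dict)  # reserveId -> ticketId
    booked: set[str] = field(default_factory=set)

    def post_reserve(self, ticket_id: str) -> str:
        """POST /reserve {ticketId} => reserveId"""
        if ticket_id not in self.available:
            raise ValueError(f"ticket {ticket_id} is not available")
        self.available.remove(ticket_id)
        reserve_id = str(uuid.uuid4())
        self.reservations[reserve_id] = ticket_id
        return reserve_id

    def post_ticket(self, reserve_id: str) -> str:
        """POST /ticket {reserveId} => ticketId"""
        # pop() raises KeyError if the reservation is unknown or already used.
        ticket_id = self.reservations.pop(reserve_id)
        self.booked.add(ticket_id)
        return ticket_id


api = BookingApi(available={"110", "112"})
reserve_id = api.post_reserve("110")      # step 1: hold the ticket
ticket_id = api.post_ticket(reserve_id)   # step 2: finalize after payment
```

The sketch leaves out the 5-minute expiry of a reservation, which the booking options discussed later deal with.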

High Level Design

Events are temporary by nature, so in terms of model capacity, we can say that it does not matter which

If user A has claimed ticket IDs 110 and 112 and user B then tries to claim ticket 110, user B should be denied. So our writes must be atomic. The usual ways to achieve this are to use a transactional DB or to implement locking of our own.
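One way to get an atomic claim from a transactional DB is a conditional update that only succeeds while the ticket is unclaimed. The sketch below uses Python's built-in `sqlite3` module purely for illustration; the table name `tickets` and the column `claimed_by` are assumptions, not part of the original design.

```python
import sqlite3


def claim_ticket(conn: sqlite3.Connection, ticket_id: int, user_id: str) -> bool:
    """Claim a ticket only if nobody holds it yet. Returns True on success."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE tickets SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (user_id, ticket_id),
        )
    # Exactly one row changes when the claim wins; zero rows when it loses.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, claimed_by TEXT)")
conn.executemany("INSERT INTO tickets (id) VALUES (?)", [(110,), (112,)])
conn.commit()

a_110 = claim_ticket(conn, 110, "A")  # user A claims 110
a_112 = claim_ticket(conn, 112, "A")  # user A claims 112
b_110 = claim_ticket(conn, 110, "B")  # user B is denied
```

Because the check and the write happen in a single statement, two concurrent claims cannot both see the ticket as free.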

View an event:

We will have a client and an API gateway, which is responsible for routing to the load-balanced services, authentication, and rate limiting.

The API gateway should be cloud-based, so that we do not have to implement it on our own and so that we have a reliable gateway.

High level only design for View an event requirement

Search events:

At first we can use the same Event Service and add a search route that queries the DB like: “SELECT * FROM EventTable WHERE Title LIKE ‘%…%’”. The problem is that each search will do a full table scan, which is really slow.
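The full table scan can be observed with SQLite's `EXPLAIN QUERY PLAN`, used here only as a convenient stand-in for the real database. The `Venue` column, the index name `idx_title`, and the search strings are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EventTable (Id INTEGER PRIMARY KEY, Title TEXT, Venue TEXT)")
conn.execute("CREATE INDEX idx_title ON EventTable (Title)")


def query_plan(sql: str) -> str:
    """Return the 'detail' text of the first step of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return rows[0][3]


# A leading wildcard cannot use the index on Title, so every row is visited.
like_plan = query_plan("SELECT * FROM EventTable WHERE Title LIKE '%rock%'")

# An exact match on the indexed column can use the index instead.
exact_plan = query_plan("SELECT * FROM EventTable WHERE Title = 'Rock Night'")

print(like_plan)
print(exact_plan)
```

The first plan begins with `SCAN` and the second with `SEARCH`; the exact wording after that differs between SQLite versions.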

High level only design for Search Events requirement

Book events (option 1, with a status column):

We can introduce a “Status” column in the Tickets table. When a user claims a ticket, its Status becomes Claimed; once the payment goes through with the third-party vendor, it becomes Booked. We introduce a webhook so that we are notified when the payment is confirmed.

Reserve and Book routes with a third-party vendor and a Status column in the Tickets table

The problem with this approach: what happens if the user changes their mind and never finalizes the booking within their 5 minutes? The ticket’s status will then be stuck on Claimed.

We can introduce a second column, Expiry, along with a cron job that runs every 5 minutes and reverts tickets that have been claimed for more than 5 minutes.

Introduce an Expiry column with a simple cron job that runs every 5 minutes

Again, the problem with this approach is that we have to query the entire table every 5 minutes, and a ticket can stay unavailable for up to 9:59 minutes (if the user claims the ticket immediately after the last cron job finished).
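The 9:59 worst case follows from simple arithmetic, worked through below. The sketch assumes the cron job fires at exact multiples of 5 minutes and reverts any claim that is at least 5 minutes old; the function names are made up for illustration.

```python
HOLD_SECONDS = 5 * 60   # how long a claim is meant to last
CRON_INTERVAL = 5 * 60  # the cron job fires at t = 0, 300, 600, ...


def release_time(claimed_at: int) -> int:
    """First cron tick at which the claim is at least HOLD_SECONDS old."""
    earliest = claimed_at + HOLD_SECONDS
    ticks = -(-earliest // CRON_INTERVAL)  # ceiling division
    return ticks * CRON_INTERVAL


def held_for(claimed_at: int) -> int:
    """Seconds the ticket stays unavailable if the user never pays."""
    return release_time(claimed_at) - claimed_at


best_case = held_for(0)   # claimed exactly on a tick: released at t = 300
worst_case = held_for(1)  # claimed one second after a tick: released at t = 600
```

Here `best_case` is 300 seconds (5:00) and `worst_case` is 599 seconds (9:59), so an abandoned ticket can be held for almost twice the intended time.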

Book events (option 2, with a distributed lock):

So we can introduce a distributed lock (and remove the Status and Expiry columns):

Use a distributed lock with a 5-minute TTL, ensuring horizontal scalability and considering failover strategies.

When a user claims a ticket, we add it to the claims in-memory DB (Redis) with a TTL (time-to-live) of 5 minutes. When another user tries to claim it, we can check in O(1) whether it exists in the claims list. The Event Service will also query the DB for available tickets and filter out the tickets found in Redis. With that, the Consistency requirement is fulfilled.

It is important to note that the key must be shared across all booking service instances, in case we need to horizontally scale the booking and events services.

And if Redis goes down, all the users who reserved tickets will lose their reservations.
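The claim-with-TTL behavior can be sketched in a single process as follows. This is not a distributed lock: it only mimics the semantics that a shared store such as Redis would provide, and the class `TtlClaims`, its methods, and the helper `available_tickets` are hypothetical names. The clock is injectable so that expiry can be demonstrated without waiting 5 minutes.

```python
import time


class TtlClaims:
    """Single-process stand-in for a claims store with per-key expiry."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._claims = {}  # ticket_id -> (user_id, expires_at)

    def try_claim(self, ticket_id: int, user_id: str) -> bool:
        """Set the claim only if no unexpired claim exists (O(1) lookup)."""
        now = self._clock()
        holder = self._claims.get(ticket_id)
        if holder is not None and holder[1] > now:
            return False
        self._claims[ticket_id] = (user_id, now + self._ttl)
        return True

    def is_claimed(self, ticket_id: int) -> bool:
        holder = self._claims.get(ticket_id)
        return holder is not None and holder[1] > self._clock()


def available_tickets(db_ticket_ids, claims: TtlClaims):
    """Filter the tickets the DB reports as unsold by the active claims."""
    return [t for t in db_ticket_ids if not claims.is_claimed(t)]


now = [0.0]  # fake clock so the example runs instantly
claims = TtlClaims(clock=lambda: now[0])

first = claims.try_claim(110, "A")                # A gets the ticket
second = claims.try_claim(110, "B")               # B is denied
visible = available_tickets([110, 111, 112], claims)
now[0] = 301.0                                    # A's 5 minutes have passed
third = claims.try_claim(110, "B")                # B can claim now
```

With Redis itself, the same "set only if absent, with expiry" step is typically a single `SET key value NX EX 300` command, so no cron job is needed to clean up abandoned claims.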

Deep Dives

Search Performance

In order to fulfill the search performance requirement we can introduce Elasticsearch. It should hold an inverted index of the search terms mapped to the actual keys (EventId) in the main database.

If the search results are lightweight, we can keep all the properties of the search results inside Elasticsearch. Searches will return faster because they do not have to go back to the main DB (it is not recommended to use Elasticsearch as the main DB, though).

Introducing Elasticsearch to speed up our searches
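To make the idea of an inverted index concrete, here is a toy version in plain Python. It is far simpler than what a search engine actually does (no stemming, ranking, or fuzzy matching), and the sample events, the use of event IDs as keys, and the function names are assumptions for illustration.

```python
from collections import defaultdict


def build_inverted_index(events: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase term in a title to the IDs of events containing it."""
    index = defaultdict(set)
    for event_id, title in events.items():
        for term in title.lower().split():
            index[term].add(event_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the IDs of events whose title contains every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


events = {1: "Rock Night Live", 2: "Jazz Night", 3: "Rock Festival"}
index = build_inverted_index(events)

rock_ids = search(index, "rock")             # events 1 and 3
rock_night_ids = search(index, "night rock") # only event 1
```

Each lookup touches only the entries for the query terms rather than every row, which is why this structure avoids the full table scan of the LIKE query.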

One issue is keeping Elasticsearch in sync with the main DB. Since the system’s write ratio is low, we could use one of the following:

Own implementation: Every time an event changes, we update Elasticsearch after the primary DB has been updated. It is easy and fast to implement, but it adds complexity to the code in terms of failure handling.

CDC (Change Data Capture): Kafka Connect is a solution that can keep the Elasticsearch sink updated from the stream of changes in the DB source. You need to be familiar with event-based queues, and it costs extra, but it is the much cleaner and more reliable solution. (If the write ratio were high, we would have to introduce a way to protect Elasticsearch, since it is costly to update the inverted indexes a lot.)

Caching

In terms of caching, we could use the cloud provider’s configurable Elasticsearch caching or introduce a cache (Redis) of our own; we only have to invalidate the cache when an update occurs. Alternatively, we could use the already defined CDN to cache the API requests.

It is important to note that we are able to cache search results only because all users should see the same results for the same search.
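A minimal sketch of caching search results and invalidating them on update is shown below. It uses a plain dictionary in place of Redis or a CDN, and the class `SearchCache`, the `backend` callable, and the sample titles are assumptions for illustration.

```python
class SearchCache:
    """Caches results per normalized query; cleared whenever events change."""

    def __init__(self, backend):
        self._backend = backend  # callable: query string -> list of results
        self._cache = {}
        self.backend_calls = 0   # counts cache misses, for demonstration only

    def search(self, query: str):
        key = query.strip().lower()  # same key for every user issuing this query
        if key not in self._cache:
            self.backend_calls += 1
            self._cache[key] = self._backend(key)
        return self._cache[key]

    def invalidate_all(self):
        """Call this after any event is created, updated, or deleted."""
        self._cache.clear()


titles = ["Rock Night Live", "Jazz Night"]
cache = SearchCache(lambda q: [t for t in titles if q in t.lower()])

first = cache.search("rock")
second = cache.search("Rock ")      # served from the cache
calls_before_update = cache.backend_calls

titles.append("Rock Festival")      # an event is added
cache.invalidate_all()
third = cache.search("rock")        # recomputed after invalidation
```

Clearing the whole cache on every write is only reasonable because writes are rare here; a write-heavy system would want finer-grained invalidation.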

Implement ElasticSearch for fast searches, ensuring it stays updated with the main database using CDC (Change Data Capture) methods.

Popular events

If many users try to book tickets, it is very likely that two users, Tom and Bob, both see ticket 999 as available on screen. Tom will succeed in reserving the ticket and Bob will get an error. Bob will have to refresh the page to view the available tickets, and by then it may be too late for him.

The simple solution is to emulate real-time communication between the client and the server. One way is long polling: the client keeps a long-lived request open to the server, and the server responds as soon as there is an update. Compared with frequent short polling, this reduces the total number of requests and minimizes latency, so Bob will see the seat turn red in real time and will try to select another one. Long polling is recommended when users stay on the page for a short period of time, for example while ordering a ticket.
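The server side of long polling can be sketched with a condition variable: a poll blocks until something changes or a timeout expires. This is an in-process illustration only, with no HTTP layer; the class `SeatUpdates`, its version counter, and the timings are assumptions.

```python
import threading


class SeatUpdates:
    """Holds the set of taken seats and lets pollers wait for changes."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._taken = set()

    def mark_taken(self, seat_id: int) -> None:
        """Record a reservation and wake up every waiting poller."""
        with self._cond:
            self._taken.add(seat_id)
            self._version += 1
            self._cond.notify_all()

    def long_poll(self, last_seen_version: int, timeout: float):
        """Block until the state is newer than last_seen_version or timeout passes."""
        with self._cond:
            self._cond.wait_for(lambda: self._version > last_seen_version, timeout=timeout)
            return self._version, set(self._taken)


updates = SeatUpdates()

# Tom reserves seat 999 shortly after Bob starts polling.
threading.Timer(0.05, updates.mark_taken, args=(999,)).start()

# Bob's open request returns as soon as the seat changes, not after the full timeout.
version, taken = updates.long_poll(last_seen_version=0, timeout=2.0)
```

The client would then immediately issue the next poll with the returned version, so it always has one request waiting on the server.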

Extremely popular events

We could introduce an in-memory waiting queue with FIFO or random-based prioritization (for example, batching N tickets at once), or consider persistent queuing solutions like RabbitMQ or ServiceBus (my favorite, Sean). We can then notify the users via Server-Sent Events when they can move on to claim tickets and pay, which is a fast, simple solution.
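A FIFO waiting queue that lets users through in batches can be sketched with a `deque`. This is an in-memory illustration only, it admits N users at a time as one possible reading of the batching idea, and the class `WaitingRoom` and its methods are hypothetical names; the notification step is left out.

```python
from collections import deque


class WaitingRoom:
    """FIFO queue that admits users to the booking page in fixed-size batches."""

    def __init__(self, batch_size: int):
        self._queue = deque()
        self._batch_size = batch_size

    def join(self, user_id: str) -> int:
        """Add a user to the back of the queue and return their position."""
        self._queue.append(user_id)
        return len(self._queue)

    def admit_next_batch(self) -> list:
        """Remove and return up to batch_size users from the front of the queue."""
        batch = []
        while self._queue and len(batch) < self._batch_size:
            batch.append(self._queue.popleft())
        return batch


room = WaitingRoom(batch_size=2)
positions = [room.join(u) for u in ["tom", "bob", "alice"]]

first_batch = room.admit_next_batch()   # tom and bob go to the booking page
second_batch = room.admit_next_batch()  # alice follows in the next round
```

Each admitted batch is the set of users who would be notified that they can proceed; a random-based policy would replace `popleft()` with a random pick.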

Database selection

The database selection should align with the core features and system requirements. In our system, it is crucial that the database can handle complex queries, such as joining ticket and arena information. Since we handle locking outside the database, there is no strong requirement for ACID compliance.

Complex Queries: The database must efficiently support complex queries, particularly joins between ticket and arena data. This ensures that the system can quickly and accurately retrieve the necessary information for various operations, such as viewing event details and available tickets.

Scalability: The chosen database should scale horizontally to accommodate high traffic, especially during peak times when popular events are being booked. This includes the ability to distribute data and queries across multiple servers to maintain performance.

Consistency: While strict ACID compliance is not necessary due to external locking mechanisms, the database should still provide eventual consistency to ensure data accuracy over time. This is particularly important for maintaining the integrity of ticket reservations and bookings.

Performance: The database should be optimized for read-heavy operations, as users frequently search for events and view event details. High read performance is critical to providing a responsive user experience.

Postgres can indeed fulfill the requirements for the TicketMaster system design. It supports complex queries efficiently, offers solutions for horizontal scalability, provides strong consistency guarantees, and can be optimized for high performance. By leveraging modern extensions and configurations, Postgres can be a robust choice for the database backend in a system designed to handle high demand and complex operations.

Conclusion

Designing a robust TicketMaster system requires careful planning and attention to detail. By identifying and correcting common mistakes in functional and non-functional requirements, core entities, API design, and high-level architecture, we can ensure a system that meets user needs and handles high demand effectively. Implementing these corrections will lead to a more reliable, scalable, and user-friendly ticketing platform.
