Requests per second (RPS) is a metric that measures the throughput of a system, which is typically its most important measure. Other parameters, such as latency, can also be interesting depending on the application, but in a typical application, throughput is the main metric. In this context, throughput is the number of requests a system can process and answer within a specified period of time.
The requests can be as simple as fetching a URL from a web server, or they can be other kinds of server requests: database queries, fetching e-mail, bank transactions, and so on. The basic principles remain the same.
There are two types of requests: input/output (I/O) bound and central processing unit (CPU) bound.
Typically, requests are limited by input and output: they fetch information from a database, read a file, or get data over the network, and the CPU does nothing most of the time. Thanks to the operating system, you can create multiple instances that keep serving requests while other instances wait. In this case, the server is limited by the number of instances it can run at one time: the more RAM you have, the more instances can run simultaneously.
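A minimal Python sketch of this overlap, using a thread pool and `time.sleep` as a stand-in for waiting on a database or the network (the worker count and sleep duration are arbitrary illustration values):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(n):
    # Simulate an I/O-bound request: the thread sleeps while
    # "waiting" for a database, file, or network response.
    time.sleep(0.2)
    return n

start = time.monotonic()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(handle_request, range(5)))
elapsed = time.monotonic() - start

# Five 0.2 s "requests" overlap, so the total wall time stays close
# to 0.2 s rather than 1.0 s: while one thread waits, the others run.
```

Because the threads spend their time waiting rather than computing, the wall time is close to the time of a single request, not the sum of all five.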
In I/O-bound systems, the number of requests per second is bound by the memory available. To calculate requests per second, divide the total memory by the memory required per instance (giving the number of concurrent instances), then multiply by the inverse of the time an instance takes to complete a request.
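That calculation can be sketched directly. The numbers below (4096 MB of RAM, 50 MB per instance, 0.1 s per request) are hypothetical values chosen only to illustrate the formula:

```python
def io_bound_rps(total_memory_mb, memory_per_instance_mb, task_time_s):
    # Instances that fit in memory, times the number of requests
    # each instance completes per second (1 / task time).
    instances = total_memory_mb // memory_per_instance_mb
    return instances * (1 / task_time_s)

# Hypothetical example: 4096 MB / 50 MB -> 81 instances,
# each finishing 10 requests per second -> 810 RPS.
rps = io_bound_rps(4096, 50, 0.1)
```

Note that adding RAM raises the instance count, and therefore the RPS, linearly in this model.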
Some other requests, like image processing or doing calculations, are CPU-bound. That means that the limiting factor is the amount of CPU power the machine has. Having a lot of instances does not help, as each core can compute only one instance at a time. Two cores in the CPU mean two instances can run simultaneously. The limit here is CPU power and the number of cores the machine has.
In CPU-bound systems, requests per second is calculated by multiplying the number of cores by the inverse of the time one instance takes to complete.
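The CPU-bound formula in the same style; the core count and task time here (4 cores, 0.5 s of CPU work per request) are again hypothetical illustration values:

```python
def cpu_bound_rps(cores, task_time_s):
    # Each core completes 1 / task_time_s requests per second,
    # and cores work independently in parallel.
    return cores * (1 / task_time_s)

# Hypothetical example: 4 cores, 0.5 s per request -> 8 RPS.
rps = cpu_bound_rps(4, 0.5)
```

Unlike the I/O-bound case, adding memory changes nothing here; only more cores (or a shorter task time) raise the RPS.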