Boosting a DB Application Using Simple Caching
Recently, I got a project that is quite complex; honestly, I had never expected a case like this before. The requirements of the project are as follows:
Total users number roughly 15,000, but active users number roughly 500.
The web application will be accessed by those active users, each making about 2–100 API calls per second (on average 50 calls per user per second), so by simple math the average load on this web app is 50 × 500 = 25,000 req/second.
The database is Postgres.
The DB connection pool allotted to this web app is only 20 connections (maximum).
My client will give me a server for the web app with these specifications: 32 CPU cores, 128 GB RAM.
I can only install applications and tools on the provided server; for example, if I need Redis, I can only install Redis on that server.
The API calls mostly display historical data aggregated from a few tables (4 tables: table A has 1 million rows, table B has 2 million rows, table C has 27 million rows, and table D has 500 million rows).
Table D in particular will have its data volume brought in line with table C's.
The data in tables C and D changes every second.
The aggregated data displayed from tables C and D is historical data at per-minute granularity.
The stack is Java and Spring Boot.
And for all of those requirements, the team building this web app is just me.
At first, I built a solution using only DB pooling with an aggregate query, but the result was really bad: when I stress-tested it with wrk, I got only about 68 requests/second.
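For context, the only knob I had in that first attempt was the pool itself. Here is a minimal sketch, assuming HikariCP (Spring Boot's default connection pool); the JDBC URL and credentials are illustrative, not the project's real values:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {

    // a sketch only, assuming HikariCP (Spring Boot's default pool);
    // the URL and credentials are hypothetical, the size is the client's hard limit
    static HikariDataSource dataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db-host:5432/appdb"); // illustrative URL
        config.setUsername("app");      // illustrative credentials
        config.setPassword("secret");
        config.setMaximumPoolSize(20);  // the maximum the client allows
        return new HikariDataSource(config);
    }
}
```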
Then I realized that I couldn't rely on the database's capabilities alone, especially since the connection pool provided was so limited,
and that I needed something dependable alongside the database as the source of the data.
I figured the only advantage I could really exploit was the large amount of RAM.
As we know, a key to application performance and speed is caching, and since the provided server has plenty of RAM, the caching mechanism had to take advantage of it.
So I started researching which tools could exploit the power of all this RAM. Among them: Apache Ignite, Redis, and the MySQL MEMORY engine.
I tried the MySQL MEMORY engine first. The idea was to mirror the data from the Postgres database into MySQL MEMORY tables, run the queries against those tables instead, and raise the connection pool size on the MySQL side.
But when I queried the MySQL MEMORY engine, the results were not what I expected: it turns out its performance is not much faster than Postgres at all.
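The mirroring idea was roughly the following. This is a sketch under my assumptions, with illustrative URLs, table, and column names rather than the project's real schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

// a sketch of the mirroring idea, not the real code: copy Postgres rows
// into a MySQL MEMORY table over plain JDBC
public class MemoryMirror {
    public static void main(String[] args) throws Exception {
        try (Connection pg = DriverManager.getConnection("jdbc:postgresql://db-host:5432/appdb");
             Connection my = DriverManager.getConnection("jdbc:mysql://localhost:3306/mirror")) {

            try (Statement ddl = my.createStatement()) {
                // ENGINE=MEMORY keeps the mirrored table entirely in RAM
                ddl.execute("CREATE TABLE IF NOT EXISTS c_mirror (ts DATETIME, device_id BIGINT, val DOUBLE) ENGINE=MEMORY");
            }

            try (Statement read = pg.createStatement();
                 ResultSet rs = read.executeQuery("SELECT ts, device_id, val FROM table_c");
                 PreparedStatement write = my.prepareStatement("INSERT INTO c_mirror VALUES (?,?,?)")) {
                int batch = 0;
                while (rs.next()) {
                    write.setTimestamp(1, rs.getTimestamp(1));
                    write.setLong(2, rs.getLong(2));
                    write.setDouble(3, rs.getDouble(3));
                    write.addBatch();
                    if (++batch % 10_000 == 0) write.executeBatch(); // flush in chunks
                }
                write.executeBatch();
            }
        }
    }
}
```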
Next I tried Redis. There was an improvement, but not what I had hoped for: the increase was only about 350%, to around 250 req/sec, which is of course very far from this application's requirement of at least 25,000 req/sec.
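The Redis attempt was essentially look-aside caching of the aggregated results. A minimal sketch, assuming Spring's cache abstraction backed by a Redis CacheManager (which needs @EnableCaching plus spring-boot-starter-data-redis); the service, repository, and cache names are all illustrative:

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

// hypothetical repository standing in for the real Postgres aggregate query
interface HistoryRepository {
    List<Map<String, Object>> aggregatePerMinute(Instant from, Instant to);
}

@Service
public class HistoryService {

    private final HistoryRepository repository;

    public HistoryService(HistoryRepository repository) {
        this.repository = repository;
    }

    // cache hit: answered from Redis; cache miss: the expensive Postgres
    // aggregate query runs once and its result is stored under the key
    @Cacheable(value = "minuteAggregates", key = "#from.toString() + ':' + #to.toString()")
    public List<Map<String, Object>> findAggregatesPerMinute(Instant from, Instant to) {
        return repository.aggregatePerMinute(from, to);
    }
}
```

One likely reason the gain was modest: because the underlying rows change every second, the cached entries need a short TTL, and every request still pays a network and serialization round-trip to Redis.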
Then I tried Apache Ignite with a single-node cluster (since I'm doing this alone, I had limited time to set up a large-scale cluster). The result was not what I wanted either, and Apache Ignite has some limitations: its set of aggregate functions is quite small, and you cannot create custom aggregate functions in it.
Honestly, all of this made me quite stressed.
But thank God, I got a hint: use simple caching by storing all the data from these tables in plain variables, which I then aggregate and filter using Java Streams. The idea was inspired by Speedment's HyperStream. At first I wanted to use Speedment's HyperStream itself, but since I didn't have the budget for the license, I gave up on it and built my own HyperStream-style implementation. Voilà: the results made me happy, averaging about 30,000 req/sec.
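To give an idea of the shape of this approach (a sketch only, not my actual implementation; the row fields, names, and the average aggregation are illustrative): keep the rows on the heap as an immutable snapshot, refresh it from Postgres in the background, and let Java Streams do the filtering and per-minute grouping.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;

// a sketch of the idea: keep the table data on the heap and aggregate
// with Java Streams instead of querying the database per request
public class InMemoryHistoryCache {

    // an immutable snapshot swapped atomically, so readers never block
    private final AtomicReference<List<Row>> snapshot = new AtomicReference<>(List.of());

    // hypothetical row shape; the real tables have their own columns
    public record Row(Instant timestamp, long deviceId, double value) {}

    // called by a scheduled job that re-reads changed rows from Postgres
    public void refresh(List<Row> latestRows) {
        snapshot.set(List.copyOf(latestRows));
    }

    // per-minute historical aggregate, filtered and grouped entirely in memory
    public Map<Instant, Double> averagePerMinute(Instant from, Instant to) {
        return snapshot.get().stream()
                .filter(r -> !r.timestamp().isBefore(from) && r.timestamp().isBefore(to))
                .collect(Collectors.groupingBy(
                        r -> r.timestamp().truncatedTo(ChronoUnit.MINUTES),
                        Collectors.averagingDouble(Row::value)));
    }
}
```

Since every request is served from the heap, the 20-connection pool is only needed by the background refresh, and with 128 GB of RAM there is plenty of room for tens of millions of compact rows.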