Our log data is sent to a log server, which saves it in a PostgreSQL database. To keep the database from growing too large, we periodically delete data older than 24 hours and archive it for future queries. However, queries on the remaining data are still not fast enough for real-time monitoring. Worse, PostgreSQL's vacuum process doesn't handle the large volume of deletes gracefully and slows down our system significantly when it runs.
I’d like to eventually replace the whole logging system with a Hadoop-based one. But right now, I need something that can be up and running in two to three days and has no or minimal impact on the rest of the system.
The log server would save all log data into log files. At the same time, it runs an embedded in-memory database holding the most recent log data for our real-time monitoring applications to query. Among the final candidates, H2, HSQLDB, and Apache Derby, I decided to go with H2.
Since our monitoring applications are written in Python, access to H2 from Python is important. There is no native H2 driver for Python, so H2 needs to run in its PostgreSQL compatibility mode.
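As a rough sketch of what such a monitoring client might look like, here is one way to query H2's PostgreSQL mode with psycopg2. The host, port, credentials, and the `recent_logs` table are illustrative assumptions, not details from our actual setup:

```python
# Sketch of a Python monitoring client talking to H2's PostgreSQL mode.
# Assumptions (not from the original post): H2's PG server listens on
# 127.0.0.1:5432 with user "sa" and an empty password, psycopg2 is the
# driver, and a table recent_logs(ts, level, message) exists.
try:
    import psycopg2
except ImportError:  # the driver may not be installed everywhere
    psycopg2 = None

def query_recent_logs(dbname="mem@hc_log", limit=100):
    """Fetch the newest rows from the in-memory log table.

    The odd-looking database name "mem@hc_log" is the workaround
    described in the next paragraph.
    """
    if psycopg2 is None:
        raise RuntimeError("psycopg2 is required to talk to H2's PG mode")
    conn = psycopg2.connect(host="127.0.0.1", port=5432,
                            dbname=dbname, user="sa", password="")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT ts, level, message FROM recent_logs "
                        "ORDER BY ts DESC LIMIT %s", (limit,))
            return cur.fetchall()
    finally:
        conn.close()
```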
When using H2’s own JDBC driver, the URL jdbc:h2:tcp://127.0.0.1:5432/mem:hc_log works as expected. The mem: prefix before the database name hc_log is necessary to connect to an in-memory database. The PostgreSQL equivalent jdbc:postgresql://127.0.0.1:5432/mem:hc_log doesn’t work: only mem, not mem:hc_log, is passed to H2 as the database name. The workaround is to use mem@hc_log as the database name and apply a small patch to H2’s ConnectionInfo that restores the name to mem:hc_log.
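The effect of that small ConnectionInfo patch can be modeled in a few lines. The real patch lives in H2's Java source; this is only an illustrative sketch of the name restoration it performs, assuming it maps the first @ back to a colon:

```python
def restore_db_name(wire_name: str) -> str:
    # The PostgreSQL wire protocol delivers "mem@hc_log" intact, whereas
    # "mem:hc_log" arrives truncated to "mem". Restoring ':' on the
    # server side lets H2 see the in-memory database name it expects.
    return wire_name.replace("@", ":", 1)

print(restore_db_name("mem@hc_log"))  # mem:hc_log
```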
H2’s PostgreSQL mode doesn’t work with JDBC4 drivers; JDBC2 is required.