MSA Struggles - (1) Resolving Java's DNS Cache Issue


The services I built all address each other by DNS name: account.yangs.internal, search.yangs.internal, mariadb.yangs.internal, and so on.
When a service fails, a configuration changes, or something goes wrong inside Amazon, we keep things highly available by switching the IP behind the DNS record.
One day a DB failure occurred and the IP was switched automatically, but the application kept connecting to the old address, and we had to restart it to recover.
I first suspected the DNS server configuration, but the DNS server was fine. The cause was the DNS cache policy of the Java client.
Today, I want to talk about that story.
Java's Policy to Prevent DNS Attacks
The behavior depends on the JVM version: before 1.6, successful lookups were cached forever by default, and from 1.6 onward they are cached for 30 seconds by default.
The stated reason is to guard against DNS spoofing attacks.
Back when IPs rarely changed this probably caused few problems, but in today's environment I think the policy is somewhat problematic.
Google's reCAPTCHA documentation also warns about the DNS cache when developing Java applications.
[Using reCAPTCHA with Java/JSP | Google Developers](https://developers.google.com/recaptcha/old/docs/java?hl=ko)
Java's DNS Cache Policy
I touched on Java's DNS cache policy above, but let me summarize it once more.
The policy depends on whether a SecurityManager is installed. With a SecurityManager, successful lookups stay cached for as long as the application runs, for security reasons. Without one, the default is to cache them for 30 seconds.
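To see which policy a given JVM is actually configured with, you can read the security property directly. The following is a minimal sketch, not part of the original setup: the class name DnsCachePolicyInspector and the helper describeTtl are hypothetical names made up for illustration. It only reports the configured value; when the property is not set, the JVM falls back to the defaults described above.

```java
import java.security.Security;

public class DnsCachePolicyInspector {

    // Hypothetical helper: turn a raw networkaddress.cache.ttl value
    // into a human-readable description.
    static String describeTtl(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return "not set (JVM default applies)";
        }
        int ttl;
        try {
            ttl = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return "unparseable value: " + raw;
        }
        if (ttl < 0) {
            return "cache forever";
        }
        if (ttl == 0) {
            return "no caching";
        }
        return "cache for " + ttl + " seconds";
    }

    public static void main(String[] args) {
        // Returns null when the property is commented out in java.security.
        String configured = Security.getProperty("networkaddress.cache.ttl");
        System.out.println("networkaddress.cache.ttl: " + describeTtl(configured));
    }
}
```

Running this once on each server is a quick way to confirm whether a java.security change was actually picked up by the JVM in use.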
Can This Cache Policy Be Changed?
Of course it can. Java provides the following two methods:
- Editing the Java configuration file
Open $JAVA_HOME/jre/lib/security/java.security (on Java 9 and later, $JAVA_HOME/conf/security/java.security) and find the key networkaddress.cache.ttl.
Uncomment it and enter the TTL you want. The unit is seconds: 0 disables caching, and a negative value caches forever.
networkaddress.cache.ttl=0
- Setting the security property in code
Changing the policy globally with the first method works, but there are many situations where you only want to change it for a specific application.
In that case, add the following to the application's initialization code:
java.security.Security.setProperty("networkaddress.cache.ttl" , "60");
In my case I trust our DNS server, but I set the TTL to 60 seconds to be safe against unexpected situations.
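Where that call goes matters. As far as I know, the JVM reads the property once, the first time it resolves a host name, so the call should run before any lookup happens. Here is a minimal sketch under that assumption: the class name DnsTtlBootstrap and the helper configureDnsCacheTtl are hypothetical, and "localhost" is only a stand-in for a real service name such as mariadb.yangs.internal.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

public class DnsTtlBootstrap {

    // Hypothetical helper: set the TTL (in seconds) for successful lookups
    // and return the value now stored in the security properties.
    static String configureDnsCacheTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
        return Security.getProperty("networkaddress.cache.ttl");
    }

    public static void main(String[] args) throws UnknownHostException {
        // Run this first, before anything else triggers a host name lookup.
        String ttl = configureDnsCacheTtl(60);
        System.out.println("networkaddress.cache.ttl = " + ttl);

        // Stand-in lookup; a real application would resolve its service host here.
        InetAddress address = InetAddress.getByName("localhost");
        System.out.println("resolved: " + address.getHostAddress());
    }
}
```

In a framework-based application the same call would typically go at the very top of main, ahead of the framework bootstrap, so that connection pools created during startup already see the new TTL.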
Conclusion
Cloud servers whose IPs change frequently are increasingly common, and in an MSA the services are split across many servers, so addressing by DNS name rather than by IP is becoming the norm. Sooner or later, everyone runs into this issue.
I hope this article makes the problem feel less overwhelming for anyone else who hits it. That's all for today.
Appendix 1. How Does Python Handle This?
[I Don't Know DNS Caching | charsyam.wordpress.com](https://charsyam.wordpress.com/2017/12/22/%ec%9e%85-%ea%b0%9c%eb%b0%9c-i-dont-know-dns-caching/)
While wrapping up this issue, I wondered how other languages handle it, so I looked into Python, which is so popular these days.
It was a bit of a shock: Python does not cache DNS lookups at the language level at all. I realize that is an odd thing to say right after complaining about Java's DNS cache, but I was surprised all the same.
In other words, Python accepts the latency of a DNS lookup on every request. A caching daemon such as nscd on Linux can remove that cost, but it still surprised me.