
The Problem & The Question

I'm trying to create an AWS ElastiCache Redis-flavored cluster and connect to it from an instance in the same VPC. When I create the cluster initially, I can connect to it just fine via redis-cli. If I wait a day or two and then try to connect to it again via redis-cli, I get the following error:

$ redis-cli -h <aws-elasticache-cluster-primary-endpoint>
Could not connect to Redis at <aws-elasticache-cluster-primary-endpoint>:6379: Name or service not known
not connected> 

I should be able to connect again with no issue. Why would I be getting DNS errors the next day when it worked initially? Nothing has changed about the cluster since I created it. How could I fix this?
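A quick way to confirm that the failure is in name resolution, and not in the network path, is to call the system resolver directly. Below is a minimal Python sketch using only the standard library. The function names `resolve` and `diagnose` are illustrative and do not come from any AWS or Redis tool.

```python
import socket


def resolve(host: str, port: int = 6379) -> list[str]:
    """Return the unique IP addresses that host resolves to, or raise socket.gaierror."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def diagnose(host: str, port: int = 6379) -> str:
    """Report whether host resolves, without attempting a TCP connection."""
    try:
        addrs = resolve(host, port)
    except socket.gaierror as exc:
        # On glibc, EAI_NONAME's message is "Name or service not known",
        # the same text redis-cli printed above.
        return f"DNS failure for {host}: {exc}"
    return f"{host} resolves to {', '.join(addrs)}"
```

Running `print(diagnose("<aws-elasticache-cluster-primary-endpoint>"))` from the instance shows whether the endpoint name resolves at all, independently of Redis or the security group.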

Steps to Reproduce

  1. ElastiCache Dashboard -> Redis -> Create.

  2. Set the following options:

    Option                        Value
    Cluster engine                Redis
    Location                      Amazon Cloud
    Engine version compatibility  6.x
    Port                          6379
    Parameter group               default.redis6.x
    Node type                     cache.t3.micro
    Number of replicas            0
    Multi-AZ                      false
    Subnet group                  default
    Security groups               default
    Encryption at-rest            false
    Encryption in-transit         false

  3. Create.

  4. After creating, verify you can connect to the cluster with:

    redis-cli -h <aws-elasticache-cluster-primary-endpoint>

  5. Disconnect.

  6. Try reconnecting with the same redis-cli command a few days later. You should see the error shown in the section above.
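The connection check in steps 4 and 6 can also be scripted without redis-cli. The following is a minimal standard-library Python sketch. It opens a TCP connection and sends a RESP-encoded PING, and a Redis server replies to that with `+PONG`. The name `redis_ping` is hypothetical. This plain-TCP approach only works because in-transit encryption is off in step 2 and no AUTH token is configured. Because `socket.create_connection` resolves the hostname itself, a DNS failure surfaces as `socket.gaierror` (a subclass of `OSError`) before any packet is sent to port 6379.

```python
import socket


def redis_ping(host: str, port: int = 6379, timeout: float = 3.0) -> bool:
    """Send a RESP-encoded PING and return True if the server answers +PONG."""
    # create_connection performs the DNS lookup, so a resolution failure
    # raises socket.gaierror here, just like redis-cli's error.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # RESP array containing one bulk string: ["PING"]
        sock.sendall(b"*1\r\n$4\r\nPING\r\n")
        reply = sock.recv(64)
    return reply.startswith(b"+PONG")
```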

Additional Information

nping

$ sudo nping --tcp -p 6379 <aws-elasticache-cluster-primary-endpoint>
Failed to resolve given hostname/IP: <aws-elasticache-cluster-primary-endpoint>.  Note that you can't use '/mask' AND '1-4,7,100-' style IP ranges
Cannot find a valid target. Please make sure the specified hosts are either IP addresses in standard notation or hostnames that can be resolved with DNS

nslookup

$ sudo nslookup <aws-elasticache-cluster-primary-endpoint>
Server:         75.75.75.75
Address:        75.75.75.75#53

** server can't find <aws-elasticache-cluster-primary-endpoint>: REFUSED
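In this output nslookup queried 75.75.75.75. That is not one of the Amazon-provided VPC resolver addresses, which are the VPC's base IPv4 address plus two, or the link-local 169.254.169.253. The following is a minimal Python sketch for checking which resolver a host is configured to use. It assumes a Linux-style `/etc/resolv.conf`. The example CIDR 172.31.0.0/16 is the usual default-VPC range and is an assumption here. The helper names are hypothetical.

```python
import ipaddress


def nameservers(resolv_conf: str) -> list[str]:
    """Extract nameserver addresses from resolv.conf-formatted text."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so their first field never matches.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def amazon_dns_addresses(vpc_cidr: str) -> set[str]:
    """Amazon-provided DNS: VPC base address + 2, plus 169.254.169.253."""
    base = ipaddress.ip_network(vpc_cidr).network_address
    return {str(base + 2), "169.254.169.253"}


def uses_vpc_resolver(resolv_conf: str, vpc_cidr: str) -> bool:
    """True if any configured nameserver is an Amazon-provided VPC resolver."""
    allowed = amazon_dns_addresses(vpc_cidr)
    return any(ns in allowed for ns in nameservers(resolv_conf))
```

For example, `uses_vpc_resolver(open("/etc/resolv.conf").read(), "172.31.0.0/16")` returns False on a host whose only nameserver is 75.75.75.75.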

Reachability Analyzer

I followed AWS's instructions for testing connectivity from my EC2 instance to the ElastiCache cluster. The connectivity test returned:

Reachability status    State
Reachable              Succeeded

Cluster Metrics

Metric                              Value
CPU Utilization                     ~1.000%
Engine CPU Utilization              ~0.283%
Database Memory Usage Percentage    ~1.093%

Local WireGuard configuration

[Interface]
PrivateKey = <value>
ListenPort = 21841
Address = 10.0.0.2/32
DNS = 9.9.9.9 # 75.75.75.75 75.75.76.76

[Peer]
PublicKey = <value>
AllowedIPs = 0.0.0.0/0, ::/0
Endpoint = <public-IP>:51820
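wg-quick discards everything after a `#` on a line. In the DNS line above, only 9.9.9.9 is therefore active, and the 75.75.75.75 and 75.75.76.76 entries are commented out. Below is a small Python sketch that reads the active DNS entries in the same way. It is a simplified parser written for illustration and is not part of WireGuard's tooling. The name `wg_dns_servers` is hypothetical.

```python
def wg_dns_servers(config: str) -> list[str]:
    """Return the DNS entries set in the [Interface] section, ignoring # comments."""
    servers: list[str] = []
    section = None
    for raw in config.splitlines():
        # Drop everything after '#', as wg-quick does, then trim whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        key, sep, value = line.partition("=")
        if sep and section == "Interface" and key.strip() == "DNS":
            # DNS accepts a comma-separated list of entries.
            servers += [v.strip() for v in value.split(",") if v.strip()]
    return servers
```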

Zachary Delano

1 Answer


It could be due to VPC DNS throttling; you can find more information in the official documentation:

https://aws.amazon.com/premiumsupport/knowledge-center/vpc-find-cause-of-failed-dns-queries/
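The article describes a per-interface limit on packets sent to the Amazon-provided DNS server (1024 packets per second per network interface). A common mitigation is to cache lookups locally so that repeated connections do not each trigger a DNS query. Below is a minimal Python sketch of that idea. It is a hypothetical `CachingResolver` that uses a fixed TTL instead of the record's real TTL, unlike a proper caching resolver. Keep the TTL short, because an ElastiCache endpoint's IP can change when a node is replaced.

```python
import socket
import time
from typing import Callable

Resolver = Callable[[str], list[str]]


def system_resolver(host: str) -> list[str]:
    """Resolve host with the OS resolver, returning unique IP addresses."""
    return sorted({ai[4][0] for ai in socket.getaddrinfo(host, None)})


class CachingResolver:
    """Memoize lookups for a fixed ttl (seconds) to cut repeated DNS queries."""

    def __init__(self, resolve: Resolver = system_resolver, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve
        self._ttl = ttl
        self._clock = clock
        self._cache: dict[str, tuple[float, list[str]]] = {}

    def lookup(self, host: str) -> list[str]:
        now = self._clock()
        hit = self._cache.get(host)
        if hit and now < hit[0]:
            return hit[1]  # still fresh: no DNS query sent
        addrs = self._resolve(host)
        self._cache[host] = (now + self._ttl, addrs)
        return addrs
```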

Hik