View Full Version : Packages cache for local network using Squid proxy server

15th September 2016, 08:31 PM
To save internet bandwidth, it is possible to set up a local mirror to share packages among all computers on the network, so that each download happens only once. Standard mirroring, however, downloads whole repositories, which can be disk- and network-intensive. Moreover, not all packages will ever be used by the computers on the network, so a mirror keeps synchronizing files we are never going to use. This is usually overkill for home or small office use.

Using Squid, on the other hand, it is possible to cache only the packages we actually use, so the next time some computer on the network requires one, it is served from the cache. But there is a catch: this does not work out of the box. Since package managers use mirrors, the URL of a package changes depending on the mirror chosen, so a default Squid setup does not recognize the same package coming from different mirrors. We need a way to tell Squid to store package files by package name: packages rarely (if ever) change, and we don't care which URL a file came from, as it is almost certainly the same file. Squid 3.4 introduced the Store ID feature, which allows URLs to be mapped to a custom ID. An ACL rule decides which URLs are passed to a helper program; the helper decides what to do with each URL and responds to Squid with a storage ID for that file. The simplest way to achieve what we want is for the helper to trim the URL down to the package file name and return that to Squid as the storage ID.
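The helper protocol is simple enough that the idea fits in a few lines. The following is not the store_id_program.py from the repository linked below, just an illustrative sketch, assuming squid.conf does not set concurrency= on store_id_children (with concurrency enabled, each line carries a channel ID that must be echoed back):

```python
#!/usr/bin/env python3
# Sketch of a Squid Store ID helper: map every *.rpm URL to a
# mirror-independent ID so the same package from different mirrors
# hits the same cache entry. The "package-cache.local" prefix is an
# arbitrary placeholder; any fixed, syntactically valid URL works,
# since only uniqueness of the ID matters.
import sys
from os.path import basename
from urllib.parse import urlsplit

def store_id(url):
    """Return a storage ID for RPM URLs, or None to leave the URL alone."""
    name = basename(urlsplit(url).path)
    if name.endswith(".rpm"):
        return "http://package-cache.local/" + name
    return None

def main():
    # Squid writes one URL per line; we answer "OK store-id=..." or "ERR".
    for line in sys.stdin:
        parts = line.split()
        if not parts:
            continue
        sid = store_id(parts[0])
        sys.stdout.write("OK store-id=%s\n" % sid if sid else "ERR\n")
        sys.stdout.flush()  # Squid expects an immediate, unbuffered reply

if __name__ == "__main__":
    main()
```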

Squid setup:

If using Fedora for the server, just install Squid from the default repositories.

$ sudo dnf install squid

CentOS 7 ships the older Squid 3.3, which doesn't support the Store ID feature, so a third-party repo is required; the Squid wiki lists some repositories that can be used. CentOS 7.3 ships Squid 3.5, which has the required functionality.

Install store_id_program.py from here (https://github.com/yevmel/squid-rpm-cache) to /usr/local/bin/ and make it executable. Modify the Squid configuration /etc/squid/squid.conf based on the settings from that link. The value 10000 (10 GB) is the cache size used for storing files and may be adjusted to whatever space you have available, but it is recommended not to use more than 70% of the available space.

# keep objects fresh between 3 months (min) and 12 months (max), in minutes
refresh_pattern . 129600 33% 525600

cache_dir ufs /var/spool/squid 10000 16 256

store_id_program /usr/local/bin/store_id_program.py
store_id_children 5 startup=1

# have not seen a larger RPM yet
maximum_object_size 1 GB

# cache RPMs only
acl rpm_only urlpath_regex \.rpm
cache allow rpm_only
cache deny all

After that we need to start Squid and enable the service:

$ sudo systemctl start squid
$ sudo systemctl enable squid

Network setup:

The network setup is pretty simple: all we need to do is tell each machine on the network to use our proxy/cache server. We do that by configuring dnf's proxy= option. The server running Squid must have its TCP port open to accept these requests; Squid's default proxy port is 3128.

Configure dnf on each machine on the network to use our proxy by adding a line to /etc/dnf/dnf.conf:

proxy=http://<server ip>:3128
Where <server ip> is the IP address of the server running the Squid cache.

Open Squid's port 3128 on the server:

$ sudo firewall-cmd --permanent --add-port=3128/tcp
$ sudo firewall-cmd --reload

For troubleshooting, or to see whether the cache is working as expected, we can monitor Squid's log file; TCP_MISS entries indicate a fresh download, while TCP_HIT (or TCP_MEM_HIT) entries indicate the file was served from the cache:

$ sudo tail -f /var/log/squid/access.log

Client(s) setup:

Lately more and more mirrors provide both http and https connections, and dnf seems to prefer https. This is a problem for Squid: it cannot read an encrypted connection, so it just "tunnels" it and the package gets downloaded again. The solution is to append "&protocol=http" to the metalink URL on each client, so the metalink service returns http mirrors only.
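On Fedora clients this means editing the repo files under /etc/yum.repos.d/. A sketch of the change for the main Fedora repo (exact file contents vary by release; the &protocol=http suffix at the end of the metalink line is the relevant part):

```ini
# /etc/yum.repos.d/fedora.repo (excerpt)
# The original metalink line ends at ...&arch=$basearch;
# appending &protocol=http asks the metalink service for http mirrors only.
[fedora]
name=Fedora $releasever - $basearch
metalink=https://mirrors.fedoraproject.org/metalink?repo=fedora-$releasever&arch=$basearch&protocol=http
enabled=1
```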

27th July 2017, 10:23 PM
Changing the regex to:
acl rpm_only urlpath_regex \.[d]?rpm

Should also match delta RPMs...


4th August 2017, 07:22 AM

Just changing the Squid configuration is not enough; store_id_program needs that modification as well, right?

I didn't want to bother with drpms, since they are already small and cannot be applied in all cases, for example if the installed packages have fallen too far out of date.
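For anyone who does want delta RPMs cached end to end, the helper's own suffix check would indeed need the same widening as the squid.conf ACL. A hypothetical sketch of such a check (the actual store_id_program.py may test the suffix differently):

```python
import re

# Widen the package test to accept both .rpm and .drpm,
# mirroring the urlpath_regex \.[d]?rpm change in squid.conf.
PKG_RE = re.compile(r"\.d?rpm$")

def is_package(url_path):
    """True for regular RPMs and delta RPMs, False for anything else."""
    return bool(PKG_RE.search(url_path))
```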