
Packages cache for local network using Squid proxy server



srakitnican
15th September 2016, 08:31 PM
To save internet bandwidth, you can set up a local mirror that shares packages among all computers on the network, so that each package is downloaded only once. Standard mirroring, however, downloads whole repositories, which consumes a lot of disk space and bandwidth. On top of that, not every package will be used by the computers on the network, so a mirror keeps synchronizing content we are never going to use. This is usually overkill for a home or small office.

With Squid, on the other hand, it is possible to cache only the packages we actually use, so the next time a computer on the network requests one, it gets it from the cache. There is a catch, though: this does not work out of the box. Package managers use mirrors, and the URL of a package changes with the mirror it is downloaded from, so a default Squid setup does not recognize the same package fetched from different mirrors as the same object. We therefore need a way to tell Squid to store package files by package name: packages rarely or never change, and we don't care which URL a file came from, since it is most likely the same file. Squid 3.4 introduced the Store ID feature, which maps a URL to a custom storage ID. Based on ACL rules, Squid decides whether to pass a URL to a helper program; the helper decides what to do with the URL and replies to Squid with a storage ID for that file. The simplest way to get what we want is for the helper to trim the URL down to the package file name and return that to Squid as the storage ID.
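The Store ID idea above can be sketched as a small Python helper. This is not the store_id_program.py from the repository linked in the setup section; it is a hypothetical, minimal version for illustration. It assumes the helper runs without the concurrency= option of store_id_children (so Squid sends one "URL extras..." line per request with no channel ID), that only URLs whose file name ends in .rpm should be rewritten, and it uses an invented prefix, http://rpms.squid.internal/, to turn the file name into a storage ID. For packages it replies "OK store-id=...", and for everything else "ERR", which tells Squid to keep the original URL as the cache key.

```python
#!/usr/bin/env python3
"""Hypothetical minimal Squid Store ID helper: one cache key per RPM file name."""
import posixpath
import sys
from urllib.parse import urlsplit

# Invented, illustrative prefix; it only has to be unique to this cache.
STORE_ID_PREFIX = "http://rpms.squid.internal/"


def store_id_for(url):
    """Return a mirror-independent store ID for an RPM URL, or None."""
    path = urlsplit(url).path           # drops scheme, host and query string
    filename = posixpath.basename(path)
    if filename.endswith(".rpm"):
        return STORE_ID_PREFIX + filename
    return None


def handle_line(line):
    """Answer one request line; without concurrency it is 'URL [extras...]'."""
    fields = line.split()
    if not fields:
        return "ERR"
    store_id = store_id_for(fields[0])
    if store_id is None:
        return "ERR"                    # keep Squid's default (URL) key
    return "OK store-id=" + store_id


def serve(stdin, stdout):
    """Read requests line by line and reply to each one immediately."""
    for line in stdin:
        stdout.write(handle_line(line) + "\n")
        stdout.flush()                  # Squid waits for every reply


if __name__ == "__main__":
    serve(sys.stdin, sys.stdout)
```

Two mirrors serving the same file then map to one key: both `http://mirror1.example.org/.../bash-4.3.43-4.fc24.x86_64.rpm` and `https://mirror2.example.net/.../bash-4.3.43-4.fc24.x86_64.rpm` become `http://rpms.squid.internal/bash-4.3.43-4.fc24.x86_64.rpm`. The trade-off, as noted above, is that two different files with the same name would collide, which is acceptable for RPMs.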


Squid setup:

If you are using Fedora for the server, just install Squid from the default repositories:


$ sudo dnf install squid

CentOS 7 originally shipped the older Squid 3.3, which doesn't support the Store ID feature, so a third-party repo was required; the Squid wiki lists some repositories that can be used. CentOS 7.3 updated Squid to 3.5, which has the required functionality.

Install store_id_program.py from here (https://github.com/yevmel/squid-rpm-cache) to /usr/local/bin/ and make it executable. Then modify the Squid configuration in /etc/squid/squid.conf based on the settings from that link. The value 10000 in the cache_dir line is the cache size in megabytes (about 10 GB) used for storing files; adjust it to the space you have available, but it is recommended not to use more than 70% of that space.


# min 129600 minutes = 90 days (3 months), max 525600 minutes = 365 days (12 months)
refresh_pattern . 129600 33% 525600


cache_dir ufs /var/spool/squid 10000 16 256

store_id_program /usr/local/bin/store_id_program.py
store_id_children 5 startup=1

# have not seen a larger RPM yet
maximum_object_size 1 GB

# cache RPMs only
acl rpm_only urlpath_regex \.rpm
cache allow rpm_only
cache deny all
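To pick a cache_dir size that respects the 70% guideline, the arithmetic can be done in a few lines of Python. This is a sketch under two assumptions: "space available" is read as the free space that `shutil.disk_usage` reports for the filesystem holding the cache directory, and cache_dir sizes are treated as MiB (1024 × 1024 bytes). The function names are hypothetical.

```python
import shutil

MIB = 1024 * 1024


def suggested_cache_mb(free_bytes, percent=70):
    """Cache size in MB for cache_dir, using at most `percent` of free space."""
    # Integer arithmetic avoids float rounding surprises.
    return free_bytes * percent // 100 // MIB


def cache_dir_line(cache_path, percent=70):
    """Build a 'cache_dir ufs' line sized from the free space at cache_path."""
    free = shutil.disk_usage(cache_path).free
    return f"cache_dir ufs {cache_path} {suggested_cache_mb(free, percent)} 16 256"
```

For example, 10 GiB of free space gives `suggested_cache_mb(10 * 1024**3) == 7168`. On the server, `print(cache_dir_line("/var/spool/squid"))` prints a line that can replace the one above (the directory must already exist).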

After that, start Squid (or restart it if it is already running) and enable the service so it starts at boot:

$ sudo systemctl start squid
$ sudo systemctl enable squid

Network setup:

The network setup is pretty simple: all we need to do is tell each machine on the network to use our proxy/cache server, which we do with dnf's proxy= option. The server running Squid must have its TCP port open to accept these requests; Squid's default proxy port is 3128.

Configure dnf on each machine on the network to use our proxy by adding a line to the [main] section of /etc/dnf/dnf.conf:

proxy=http://<server ip>:3128
where <server ip> is the IP address of the server running the Squid cache.
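When several machines need the same change, the edit can be scripted. The sketch below is a hypothetical helper, not part of dnf: it takes the text of dnf.conf, replaces an existing proxy= line in [main] or appends one at the end of that section (creating [main] if it is missing), and leaves comments and other sections untouched, which a plain configparser round-trip would not do.

```python
import re

# Matches "proxy=" / "proxy =" but not "proxy_username=" or "#proxy=".
PROXY_LINE = re.compile(r"\s*proxy\s*=")


def set_dnf_proxy(conf_text, proxy_url):
    """Return dnf.conf text with proxy=<proxy_url> set in the [main] section."""
    new_line = f"proxy={proxy_url}"
    out = []
    in_main = seen_main = done = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_main and not done:
                # Leaving [main] without a proxy= line: add it at the end.
                out.append(new_line)
                done = True
            in_main = stripped == "[main]"
            seen_main = seen_main or in_main
        elif in_main and PROXY_LINE.match(line):
            if not done:
                out.append(new_line)  # replace the first proxy= line
                done = True
            continue  # drop the old (or a duplicate) proxy= line
        out.append(line)
    if not done:
        if not seen_main:
            out.append("[main]")
        out.append(new_line)
    return "\n".join(out) + "\n"
```

Run as root on a client, something like `Path("/etc/dnf/dnf.conf").write_text(set_dnf_proxy(Path("/etc/dnf/dnf.conf").read_text(), "http://<server ip>:3128"))` (with `from pathlib import Path`) applies it; running it twice gives the same result.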

Open Squid's port 3128 on the server:

$ sudo firewall-cmd --permanent --add-port=3128/tcp
$ sudo firewall-cmd --reload
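To confirm from a client machine that the firewall change took effect, a plain TCP connection attempt is enough. This minimal sketch (the function name is hypothetical) only shows that something accepts connections on the port; it does not prove that Squid is the one answering or that caching works.

```python
import socket


def proxy_port_open(host, port=3128, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `proxy_port_open("<server ip>")` should return True once Squid is running and port 3128 is open.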



For troubleshooting, or to check that the cache is working as expected, we can monitor Squid's log file:

$ sudo tail -f /var/log/squid/access.log
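Instead of reading the log by eye, the package requests can be tallied. This sketch assumes Squid's default native log format (time, elapsed, client, code/status, bytes, method, URL, ...) and counts a request as a hit when its result code contains HIT (for example TCP_HIT or TCP_MEM_HIT), as a miss when it contains MISS, and as "other" otherwise. The function name is hypothetical.

```python
from collections import Counter


def rpm_cache_stats(lines):
    """Count hit/miss/other results for .rpm requests in a native access.log."""
    stats = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        result_code = fields[3].split("/", 1)[0]  # TCP_HIT/200 -> TCP_HIT
        url = fields[6].split("?", 1)[0]
        if not url.endswith(".rpm"):
            continue
        if "HIT" in result_code:
            stats["hit"] += 1
        elif "MISS" in result_code:
            stats["miss"] += 1
        else:
            stats["other"] += 1
    return stats
```

As root, `with open("/var/log/squid/access.log") as log: print(rpm_cache_stats(log))` gives a quick summary; a second machine installing the same package should raise the hit count.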

hobbes69
27th July 2017, 10:23 PM
Changing the regex to:
acl rpm_only urlpath_regex \.[d]?rpm

Should also match delta RPMs...

Thanks,
Richard

srakitnican
4th August 2017, 07:22 AM
Hi,

Just changing the Squid configuration is not enough; store_id_program needs that modification as well, right?

I didn't want to bother with drpms, since they are already small and cannot be applied in all cases, for example when the installed packages are too far out of date.
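The point about needing both changes can be checked with a short sketch. It compares the two ACL patterns from this thread using Python's `re.search`, on the assumption that Squid's unanchored urlpath_regex behaves the same for these simple patterns, and adds a hypothetical helper-side suffix check. If the ACL lets .drpm files into the cache but a helper only recognizes .rpm, the helper answers ERR for delta RPMs, so Squid keeps the original URL as the key and a drpm is only reused when it comes from the same mirror.

```python
import posixpath
import re

# The two ACL patterns from the thread (urlpath_regex is unanchored).
RPM_ONLY = re.compile(r"\.rpm")
RPM_OR_DRPM = re.compile(r"\.[d]?rpm")

# Hypothetical helper-side list; it should accept the same files as the ACL.
PACKAGE_SUFFIXES = (".rpm", ".drpm")


def acl_allows(pattern, url_path):
    """True if a urlpath_regex ACL with this pattern would match the path."""
    return pattern.search(url_path) is not None


def helper_accepts(url_path):
    """True if a helper using PACKAGE_SUFFIXES would rewrite the store ID."""
    return posixpath.basename(url_path).endswith(PACKAGE_SUFFIXES)


if __name__ == "__main__":
    for path in ("/fedora/updates/24/x86_64/b/bash-4.3.43-4.fc24.x86_64.rpm",
                 "/fedora/updates/24/x86_64/drpms/example-1.0-1_1.1-1.x86_64.drpm",
                 "/fedora/updates/24/x86_64/repodata/repomd.xml"):
        print(acl_allows(RPM_ONLY, path), acl_allows(RPM_OR_DRPM, path),
              helper_accepts(path), path)
```

The original `\.rpm` pattern does not match .drpm paths at all, while `\.[d]?rpm` matches both kinds of package file and still skips repository metadata.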