File: 1721261022964.jpg (1.12 MB, 1450x1337, o8y5qpaael9y.jpg)

Newspost: Not-So-Secret Anti-Spam Weapon Completed Seisatsu ## Owner 07/18/24 (Thu) 00:03:43 No.1198

I have modified the software today to be able to block the actual endpoints of the shortened URLs usually included in CP spam. This should help a lot with slowing down the spam; the spammers have infinite URL shorteners but so far they all point to the same few places. We will start checking endpoints and adding them to the blocklist soon. Hopefully browsing will become less stressful.
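The post doesn't show the patch itself. As a rough illustration of the idea only (not the vichan PR's code, which is PHP), here is a Python sketch: resolve the shortened link by following its redirects, then check the final host and its parent domains against a blocklist. The function names and the blocklist are hypothetical, and using a HEAD request is an assumption; some shorteners may only redirect on GET.

```python
from collections.abc import Callable
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def resolve_final_url(url: str, timeout: float = 5.0) -> str:
    """Follow the shortener's redirects and return the URL we end up at.

    urlopen follows HTTP redirects by default; geturl() reports the final URL.
    """
    with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
        return resp.geturl()


def host_is_blocked(host: str, blocklist: set[str]) -> bool:
    """True if host or any parent domain (sub.bad.example -> bad.example) is listed."""
    labels = host.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def endpoint_is_blocked(
    url: str,
    blocklist: set[str],
    resolve: Callable[[str], str] = resolve_final_url,
) -> bool:
    """Resolve a (possibly shortened) URL and check its final host."""
    if "://" not in url:
        url = "http://" + url  # spam posts often omit the scheme
    host = urlparse(resolve(url)).hostname or ""
    return host_is_blocked(host, blocklist)
```

Passing `resolve` as a parameter keeps the network lookup swappable, e.g. with a dict of known redirects for testing.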

This modification has also been submitted as a pull request to the main vichan repo. Hopefully they pick it up and other sites can start using it.

P.S. I also upgraded us to the newest version of the software. Stuff could break, so please let me know if something isn't working right.

Anonymous 07/18/24 (Thu) 01:21:18 No.1199

>>1198
thank you thank you thank you

4chon 07/18/24 (Thu) 05:38:01 No.1201

Won't work in the long run. The CP spammer will just look at the regex pattern you've submitted to vichan publicly and figure out how to bypass it. I've already bypassed it on regex101.com just by adding a space in between. Or he'll just put the links in the images like he's done before.
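To illustrate the bypass described here: a pattern that matches a blocked endpoint literally stops matching once a space is inserted, while stripping whitespace before matching catches it again. This is a hypothetical Python sketch with a made-up domain (`bad.example`), not the pattern from the vichan PR.

```python
import re

# Hypothetical blocked endpoint; this pattern only matches the literal text.
NAIVE = re.compile(r"bad\.example/\w+", re.IGNORECASE)


def normalize(text: str) -> str:
    """Drop all whitespace so 'bad .example/x' and 'bad.example/x' look the same."""
    return re.sub(r"\s+", "", text)


def matches_naive(text: str) -> bool:
    return NAIVE.search(text) is not None


def matches_normalized(text: str) -> bool:
    return NAIVE.search(normalize(text)) is not None
```

Removing all whitespace can also glue unrelated words together, so normalized matching may raise false positives; it's a trade-off rather than a complete fix.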

The only ironclad solution is to block all VPNs at the firewall level (e.g. UFW, iptables). This list is accurate and regularly updated.

https://github.com/X4BNet/lists_vpn/blob/main/output/datacenter/ipv4.txt

You would write a Python script that:

1. Downloads that text file locally (e.g. /opt/vichan/ipv4.txt).
2. If /opt/vichan/ipv4_old.txt exists, load its lines into a set called "old_ranges" and the lines of ipv4.txt into a set called "new_ranges". Ranges that appear in both are already blocked, so remove them from "new_ranges"; ranges that appear in "old_ranges" but not in "new_ranges" go into a set called "to_be_deleted". Add the ranges left in "new_ranges" to UFW or iptables via the corresponding external commands (subprocess etc.), and delete the ranges in "to_be_deleted" the same way. Then delete ipv4_old.txt and rename ipv4.txt to ipv4_old.txt.
3. If /opt/vichan/ipv4_old.txt does not exist, add all the ranges from ipv4.txt to UFW or iptables via the correct corresponding external commands and then rename ipv4.txt to ipv4_old.txt.
Then you would have this Python script run as a yearly cron job.
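The diffing in steps 2 and 3 boils down to two set differences. A minimal Python sketch, assuming one CIDR per line in the downloaded file; the helper names are hypothetical, and the ufw commands are only built, not executed, so they can be reviewed before running each one with `subprocess.run(cmd, check=True)`:

```python
from pathlib import Path


def read_ranges(path: Path) -> set[str]:
    """One CIDR per line; blank lines and '#' comments are skipped (assumption)."""
    if not path.exists():
        return set()
    lines = (line.strip() for line in path.read_text().splitlines())
    return {line for line in lines if line and not line.startswith("#")}


def plan_changes(old_ranges: set[str], new_ranges: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_be_added, to_be_deleted) between last run's list and this run's."""
    to_be_added = new_ranges - old_ranges    # new in this download
    to_be_deleted = old_ranges - new_ranges  # dropped from the upstream list
    return to_be_added, to_be_deleted


def ufw_commands(to_be_added: set[str], to_be_deleted: set[str]) -> list[list[str]]:
    """Build (but do not run) ufw argument lists for the changes."""
    cmds = [["ufw", "deny", "from", cidr] for cidr in sorted(to_be_added)]
    cmds += [["ufw", "delete", "deny", "from", cidr] for cidr in sorted(to_be_deleted)]
    return cmds
```

Step 3 is the same code with an empty `old_ranges` (which `read_ranges` returns when ipv4_old.txt is missing). Note that ufw evaluates rules in order, so a `deny` appended after an existing `allow` may never match; `ufw insert 1 deny from <range>` puts it first. Check this against your own ruleset.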

I'm sure ChatGPT can do all of this in a snap, but I don't have a VM with vichan installed right now to actually test it (and verify what it does with UFW/iptables, etc.), and I'm too busy to deal with this any time soon.

In the meantime, you should add these DNSBLs to your vichan config.php or instance-config.php if you haven't already; these have no false positives and block Tor effectively.

$config['dnsbl'][] = 'rbl.efnetrbl.org';
$config['dnsbl'][] = 'dnsbl-1.uceprotect.net';
$config['dnsbl'][] = 'dnsbl.dronebl.org';
$config['dnsbl'][] = 'torexit.dan.me.uk';
$config['dnsbl'][] = 'dnsbl.tornevall.org';
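For reference, a DNSBL lookup works by reversing the IPv4 octets, appending the list's zone, and resolving that name: an answer (usually in 127.0.0.0/8) means the address is listed, and NXDOMAIN means it isn't. vichan performs these lookups itself for the configured zones; the Python sketch below (hypothetical function names) only shows what such an entry triggers.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name a DNSBL lookup queries: reversed IPv4 octets + zone.

    e.g. 203.0.113.7 checked against zone Z -> 7.113.0.203.Z
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Listed if the name resolves; a failed lookup (NXDOMAIN) means not listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```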

Seisatsu ## Owner 07/18/24 (Thu) 05:46:25 No.1203

>>1201
I don't think they are using VPNs, I think they are using a botnet of compromised computers. Blocking ranges would be useless. We are already using DNSBL.

They can only abstract a URL so far before it becomes unreadable. I will work around it as much as necessary until no one can even read their URLs and their scheme is completely worthless. A war of attrition.

4chon 07/18/24 (Thu) 06:16:47 No.1204

File: 1721283406601.png (9.35 KB, 580x400, 16-bit-range-ban.png)

>>1203
>I don't think they are using VPNs, I think they are using a botnet of compromised computers.
I checked most of the banned IPs individually via https://networksdb.io/ip/[INSERT_IP]. They all belong to data centers; you can check for yourself with that link next time he spams. Packethub S.A., OVH and M247 Ltd are the most common, along with basically all the ranges known to be used by NordVPN, ExpressVPN et al. So this spammer is not that sophisticated: he's using mainstream VPN services. We've already mostly eliminated his spamming just by range-banning, because he's run out of ranges to use from his VPNs. I can see all the checkmark symbols on the ban list page where he has tried to use a range-banned IP range again over the past months.

I added a 16-bit range-ban button to the ban page (VICHAN_ROOT/templates/mod/ban_form.html) so the mods can range-ban quicker instead of having to type *.* or 0.0/16 manually. Maybe not the most elegant, since that's 65,536 IPs instead of the narrower ranges in the ipv4.txt file above, but we haven't had any appeals complaining about false positives. That's why I'm not bothering with that approach yet, since the spam appears to be gone now (unless he's gone on vacation or something lol).
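The arithmetic behind the button: a /16 fixes the first two octets and leaves 16 bits free, so it covers 2^16 = 65,536 addresses. Assuming vichan treats the `a.b.*` pattern the button writes as a.b.0.0/16, Python's standard `ipaddress` module makes this easy to check; `slash16_for` is a hypothetical helper and the addresses are just examples:

```python
import ipaddress


def slash16_for(ip: str) -> ipaddress.IPv4Network:
    """The /16 containing ip: same first two octets, last two zeroed."""
    return ipaddress.ip_network(f"{ip}/16", strict=False)


net = slash16_for("203.0.113.7")
print(net)                                          # 203.0.0.0/16
print(net.num_addresses)                            # 65536 (= 2**16)
print(ipaddress.ip_address("203.0.250.1") in net)   # True
print(ipaddress.ip_address("203.1.0.1") in net)     # False
```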

Button:

<input name="16bitrange" type="button" value="16-bit range" onclick="function t(){var i=document.getElementById('ip'),v=i.value,p=v.split('.');if(p.length>=3){i.value=p[0]+'.'+p[1]+'.*'}}t()">

In context, in ban_form.html:

<td>
	{% if not hide_ip %}
		<input type="text" name="ip" id="ip" size="20" maxlength="40" value="{{ ip|cloak_ip|e }}">
		<input name="16bitrange" type="button" value="16-bit range" onclick="function t(){var i=document.getElementById('ip'),v=i.value,p=v.split('.');if(p.length>=3){i.value=p[0]+'.'+p[1]+'.*'}}t()">
	{% else %}
		<em>{% trans 'hidden' %}</em>
	{% endif %}
</td>

4chon 07/18/24 (Thu) 06:19:08 No.1205

>>1204
><
>>
These are angle brackets btw, not sure why your code tag is screwing it up.

Anonymous 07/18/24 (Thu) 10:21:34 No.1206

File: 1721298093946.jpeg (42.97 KB, 1024x576, alice_wow.jpeg)

>>1201
You can see how our strategy differs from a typical text pattern match combined with other vichan features: we match the spammer's endpoints, not his bait. Banned text in images is a problem the vichan team solved long ago. Make sure to update your software. Sei didn't.

>>1204
Would you be so kind as to actually submit these changes to vichan directly? A dedicated range-ban button (especially one with an adjustable range size) would be awesome for all boards.

And while mass-banning datacenters to deplete the adversary's resources is feasible (your observations match mine, although I have also seen posts from unmarked IPs in the Netherlands), it's a cat-and-mouse game too. It also risks collateral damage for users who use such IPs legitimately, like me (e.g. people for whom this site is blocked by their ISP, their country or big tech conglomerates), and it would require even more human intervention to handle false positives. See >>1000 for more details.

>angle brackets
I think it's a bug where "htmlspecialchars" is applied to the whole post, including code blocks, so their contents get escaped twice. I'll check whether it's present in vanilla vichan as well.
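The symptom fits that theory: escaping text that is already escaped turns `&&` into `&amp;amp;&amp;amp;`, which the browser then shows as a literal `&amp;&amp;`, just like the code posted in this thread. A Python sketch, with `html.escape` standing in for PHP's `htmlspecialchars`:

```python
import html

raw = "if (a && b) { x = y >> 2; }"

once = html.escape(raw, quote=False)    # what the page source should contain
twice = html.escape(once, quote=False)  # escaping already-escaped text

print(once)   # if (a &amp;&amp; b) { x = y &gt;&gt; 2; }
print(twice)  # if (a &amp;amp;&amp;amp; b) { x = y &amp;gt;&amp;gt; 2; }
```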

PS: No need to sage, we love you and your posts <3

Anonymous 07/18/24 (Thu) 11:27:48 No.1207

>>1206
sage is not a downvote.

Seisatsu ## Owner 07/18/24 (Thu) 16:46:29 No.1209

>>1204
This is a neat little trick and some good info. I'd like to see how my solution works before banning whole ranges, but I will keep it up my sleeve. Thanks.

>>1206
I updated the software yesterday so that I could send my mod upstream. :v

If there is common text in the images, we will catch it. If it's shifting URLs, I will combine my unshortening solution with the OCR code and pipe the extracted text through the new filter.
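The extraction step could look something like this: a Python sketch that pulls URL-like tokens (with or without a scheme) out of OCR output so they can be handed to the endpoint check. The regex is a hypothetical heuristic, and the OCR code itself isn't shown in the thread.

```python
import re

# Candidate URLs: optional scheme, dotted host ending in a 2+ letter label, optional path.
URL_RE = re.compile(r"(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/\S*)?", re.IGNORECASE)


def extract_urls(ocr_text: str) -> list[str]:
    """Pull URL-looking tokens out of OCR output, trimming trailing punctuation."""
    return [m.rstrip(".,;:!?)") for m in URL_RE.findall(ocr_text)]
```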

Anonymous 07/19/24 (Fri) 18:07:08 No.1210

Hopefully it works at keeping out the pedosans, but assuming they are sentient and not mechanotrons, they may find a workaround. I really don't get why they spam this stuff. Assuming they are commercial spammers, they can't be getting many customers from sites like these. What if we just hunt down the pedo spammers and kill them all? No man, no spam.

Anonymous 07/20/24 (Sat) 01:51:40 No.1211

Just checking how strict the IP filters are, non-maliciously.

Anonymous 07/25/24 (Thu) 06:38:26 No.1213

File: 1721889505915.jpg (847.61 KB, 1269x1500, E0D40lxVIAE701J.jpg)

i don't think i have seen anything nasty since this!

Anonymous 07/31/24 (Wed) 15:17:35 No.1220

File: 1722439055203.jpg (449.73 KB, 1656x2477, 1721755623408.jpg)

almost two weeks without looking at cp! my psychiatrist will be in awe

Anonymous 07/31/24 (Wed) 19:51:36 No.1231

>>1220
Tachibana-san :3

Anonymous 09/06/24 (Fri) 22:34:43 No.1239

Sei's new anti-spam weapon seems to be working. Nice.

4chon 12/05/24 (Thu) 00:45:42 No.1256

File: 1733359541672.png (5.67 KB, 537x211, citations-punctuation.png)

soysatsu did you send your changes to vichan for inc/functions.php to fix horizontal cites and common punctuation surrounding the cites?

it looks like you fixed it on your site

>>1213 >>1220 >>1231 >>1239

>>1213
>>1220
>>1231
>>1239

>>1213black dragon roll >>1213? >>1213! >>1213. (>>1213) (>>1213 >>1213)
(>>1213, >>1213, >>1213)

here's the code for our site that also fixes it; it's the modified if-block under the "// Cites" comment

	// Cites: match ">>123" at line start, after whitespace, or right after "(",
	// followed by whitespace, common punctuation, or the end of the body.
	// $body is already HTML-escaped at this point, hence "&gt;&gt;".
	if (isset($board) && preg_match_all('/(^|\s|(?<=\())&gt;&gt;(\d+)(?=\s|[!?.,)\]]|$)/m', $body, $cites, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)) {
		// With PREG_SET_ORDER, count($cites) is the number of matches
		// (count($cites[0]) would only count the groups of the first match).
		if (count($cites) > $config['max_cites']) {
			error($config['error']['toomanycites']);
		}

		$skip_chars = 0;
		$body_tmp = $body;

		$search_cites = array();
		foreach ($cites as $matches) {
			$cite_id = $matches[2][0]; // group 2 holds the post number (digits only)
			$search_cites[] = '`id` = ' . $cite_id;
		}
		$search_cites = array_unique($search_cites);

		$query = query(sprintf('SELECT `thread`, `id` FROM ``posts_%s`` WHERE ' .
			implode(' OR ', $search_cites), $board['uri'])) or error(db_error());

		$cited_posts = array();
		while ($cited = $query->fetch(PDO::FETCH_ASSOC)) {
			$cited_posts[$cited['id']] = $cited['thread'] ? $cited['thread'] : false;
		}

		foreach ($cites as $matches) {
			$cite = $matches[2][0];

			// preg_match_all returns byte offsets; convert them to character offsets
			// by taking the byte prefix with substr() and counting its characters.
			foreach ($matches as &$match) {
				$match[1] = mb_strlen(substr($body_tmp, 0, $match[1]));
			}
			unset($match); // drop the reference left by foreach-by-reference

			if (isset($cited_posts[$cite])) {
				$replacement = '<a onclick="highlightReply(\''.$cite.'\', event);" href="' .
					$config['root'] . $board['dir'] . $config['dir']['res'] .
					link_for(array('id' => $cite, 'thread' => $cited_posts[$cite])) . '#' . $cite . '">' .
					'&gt;&gt;' . $cite .
					'</a>';

				// Multibyte-safe replacement; $skip_chars accounts for earlier replacements
				$start = $matches[0][1] + $skip_chars;
				$length = mb_strlen($matches[0][0]);

				$body = mb_substr($body, 0, $start) .
						$matches[1][0] . $replacement .
						mb_substr($body, $start + $length);

				$skip_chars += mb_strlen($matches[1][0] . $replacement) - $length;

				if ($track_cites && $config['track_cites'])
					$tracked_cites[] = array($board['uri'], $cite);
			}
		}
	}
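A Python port of the cite pattern (with a `|$` alternative in the lookahead so that a cite at the very end of a post also matches) makes it easy to try test strings like the ones above outside vichan. This is an illustration only; vichan itself runs the PHP version on HTML-escaped text, which is why ">>" appears as "&gt;&gt;".

```python
import re

# Group 1: line start, whitespace, or nothing right after "(".
# Group 2: the cited post number.
CITE_RE = re.compile(r"(^|\s|(?<=\())&gt;&gt;(\d+)(?=\s|[!?.,)\]]|$)", re.MULTILINE)


def find_cites(body: str) -> list[str]:
    """Return the post numbers cited in an HTML-escaped post body."""
    return [m.group(2) for m in CITE_RE.finditer(body)]
```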

4chon 12/05/24 (Thu) 00:49:07 No.1257

>>1256
o u still didnt fix ur HTML special characters issue, a bunch of characters get converted to "&amp;", "&gt;" and such

Seisatsu ## Owner 12/05/24 (Thu) 22:38:27 No.1258

>>1256
I don't remember modding anything to this effect, are you on the newest version of vichan?

>>1257
Where are you seeing this?

4chon 12/18/24 (Wed) 08:46:17 No.1267

>>1258
Your code tag is converting angle brackets and stuff for some reason.
You can try my code setting in config.php:
$config['markup_code'] = array("/\[code\](.*?)\[\/code\]/is");

By the way, your images all display blurry on your site when I expand them.

Change lines 201-203 of https://sushigirl.us/stylesheets/style.css from
.full-image {
	max-width: 98%;
}
to
.full-image {
	max-width: 100%;
}

4chon 12/25/24 (Wed) 21:45:45 No.1273

Install firewalld

sudo apt update
sudo apt install firewalld

Install Python requests module

pip3 install requests

Download vpn_block.py
https://pastebin.com/83DWqSJp
https://pastebin.com/raw/83DWqSJp <- download raw, i despise space indentation

Run script (at your own risk)

sudo python3 /path/to/vpn_block.py

disclaimer: haven't tested it yet lol. copilot, chatgpt and pylint (with style/pep warnings disabled) all say "looks good to me". made sure to ask them thoroughly about whether there could be any conflicts/issues with iptables and ufw, that this won't possibly leave junk behind, etc. they said nope.

btw you should enable quick-post-controls.js so people can report posts more easily
https://github.com/vichan-devel/vichan/blob/master/js/quick-post-controls.js

4chon 12/26/24 (Thu) 17:26:35 No.1276

>>1273
Changing my mind about this. Some people use VPNs on our site, so I think it's better to add another SQL query step in post.php and block them at the database (MySQL/MariaDB) level, with a separate table of just these CIDR ranges, instead of blocking them with firewalld; otherwise they may assume the site is down or something. I don't think a new table with 500k entries should be a problem, and a cron job can keep clearing and updating it annually.
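Storing each range as integer bounds turns the check into a single range query. A minimal sketch in Python, with the standard-library `sqlite3` standing in for MySQL/MariaDB; the table and column names are hypothetical, only IPv4 is handled, and the hook into post.php isn't shown:

```python
import ipaddress
import sqlite3


def load_ranges(conn: sqlite3.Connection, cidrs: list[str]) -> None:
    """Replace the table contents; each CIDR is stored as integer (first, last) bounds."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vpn_ranges "
        "(range_start INTEGER NOT NULL, range_end INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS vpn_ranges_start ON vpn_ranges (range_start)")
    conn.execute("DELETE FROM vpn_ranges")  # clear before reloading the new list
    rows = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        rows.append((int(net.network_address), int(net.broadcast_address)))
    conn.executemany("INSERT INTO vpn_ranges VALUES (?, ?)", rows)
    conn.commit()


def is_vpn(conn: sqlite3.Connection, ip: str) -> bool:
    """True if the IPv4 address falls inside any stored range."""
    n = int(ipaddress.IPv4Address(ip))
    row = conn.execute(
        "SELECT 1 FROM vpn_ranges WHERE ? BETWEEN range_start AND range_end LIMIT 1", (n,)
    ).fetchone()
    return row is not None
```

In MySQL/MariaDB the same schema would use `INT UNSIGNED` columns. The `DELETE` in `load_ranges` mirrors the clear-and-update cron job idea.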

Anonymous 12/26/24 (Thu) 18:08:38 No.1277

>>1276
Currently the precedent is not to blanket-ban any and all suspected datacenters and VPNs from viewing the site. Using these methods is not bannable and doing so would generally hurt our users. I see that you have figured that out yourself.

Even so, vichan already supports DNSBL. Why is this needed?

In addition, it won't generally stop CP4 as they are not using obvious datacenter IPs anymore. I still believe content-based filtering is the only long-term solution with a plausible effect. And empirically it seems to work pretty well.

Anonymous 12/29/24 (Sun) 19:36:44 No.1280

File: 1735501004537.webm (10.67 MB, 650x576, usagi_be_like.webm)

Thanks to the moderation team for their work this year.

Do you think the weaponry finally did the trick? Or do you believe we are still losing users left and right due to slow moderation?

Please do not bonk me for the vid, it took forever to render.

Anonymous 12/29/24 (Sun) 20:16:11 No.1281

>>1280
Saved. I read every post from the webm "live"… :D It was a fun year. Recently I gave up on contributing here, though; it just felt like talking to myself. Maybe my posts were not worth a reply? There are hardly any new threads, either. The board seems even more barren when you consider that sushigirl's Discord is beaming with life.

Staff isn't too fast with the cleanup either, and it's weird to me that the anal poster isn't dealt with via software, like with a "take action now.*anal" regex filter?
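One detail if that filter gets written: `.` doesn't match newlines by default, so a post that puts the two phrases on different lines would slip past `take action now.*anal`. A Python sketch with made-up sample text; assuming the filter is a PCRE regex, as vichan's filters are, the equivalent of `re.DOTALL` there is the `s` modifier.

```python
import re

PATTERN = "take action now.*anal"
text = "TAKE ACTION NOW\nclick here: anal"

# '.' does not cross newlines by default, so the multi-line post is missed.
print(re.search(PATTERN, text, re.IGNORECASE) is not None)              # False
# DOTALL lets '.*' span lines; IGNORECASE catches shouting.
print(re.search(PATTERN, text, re.IGNORECASE | re.DOTALL) is not None)  # True
```

`anal` also matches inside words like "analysis", so `\banal\b` is probably safer.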