RTPMixSound is a tool that mixes pre-recorded audio in real time with the audio (i.e. RTP) in a specified target audio stream. It and its companion tool, rtpinsertsound, take the contents of a .wav or tcpdump-format file and insert or mix that sound into the stream. Both tools require access to the RTP stream (sniffing the VoIP traffic is enough; a MITM position is not necessarily needed) so that they can properly craft sequence numbers, timestamps, and other header fields. rtpinsertsound, with the right timing, can be used to add words or phrases to a conversation. rtpmixsound can be used to merge in background audio, such as noise, sounds from a "gentleman's club", or curse words. These tools have been tested in a variety of vendor environments and work in nearly any environment where encryption isn't used.
Options:

  -a  source RTP IPv4 addr
  -A  source RTP port
  -b  destination RTP IPv4 addr
  -B  destination RTP port
  -f  spoof factor - amount by which to:
        a) increment the RTP hdr sequence number obtained from the ith
           legitimate packet to produce the RTP hdr sequence number for
           the ith spoofed packet
        b) multiply the RTP payload length and add that product to the
           RTP hdr timestamp obtained from the ith legitimate packet to
           produce the RTP hdr timestamp for the ith spoofed packet
        c) increment the IP hdr ID number obtained from the ith
           legitimate packet to produce the IP hdr ID number for the
           ith spoofed packet
      [ range: +/- 1000, default: 2 ]
  -i  interface (e.g. eth0)
  -j  jitter factor - the reception of a legitimate RTP packet in the
      target audio stream enables the output of the next spoofed packet.
      This factor determines when that spoofed packet is actually
      transmitted. The factor relates how close to the next legitimate
      packet you'd actually like the enabled spoofed packet to be
      transmitted. For example, -j 10 means 10% of the codec's
      transmission interval. If the transmission interval = 20,000 usec
      (i.e. G.711), then delay the output of the spoofed RTP packet
      until the time-of-day is within 2,000 usec (i.e. 10%) of the time
      the next legitimate RTP packet is expected. In other words, delay
      100% minus the jitter factor, or 18,000 usec in this example. The
      smaller the jitter factor, the greater the risk you run of not
      outputting the current spoofed packet before the next legitimate
      RTP packet is received. Therefore, a factor > 10 is advised.
      [ range: 0 - 80, default: 80 = output spoof ASAP ]
  -p  seconds to pause between setup and injection
  -h  help - print this usage
  -v  verbose output mode
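The arithmetic behind the -f and -j options can be illustrated with a small sketch. This is not the tool's source code: the function names `spoofed_header_fields` and `spoof_delay_usec` are hypothetical, and the wraparound masking is an assumption based on the standard field widths (16-bit RTP sequence number, 32-bit RTP timestamp, 16-bit IP ID). It only restates the usage text above as code.

```python
def spoofed_header_fields(seq, timestamp, ip_id, payload_len, spoof_factor=2):
    """Derive header values for the ith spoofed packet from the ith
    legitimate packet, following the -f description.
    Hypothetical helper; masking to field width is an assumption."""
    if not -1000 <= spoof_factor <= 1000:
        raise ValueError("spoof factor must be within +/- 1000")
    return {
        # a) sequence number is incremented by the spoof factor
        "seq": (seq + spoof_factor) & 0xFFFF,
        # b) timestamp advances by spoof factor * RTP payload length
        "timestamp": (timestamp + spoof_factor * payload_len) & 0xFFFFFFFF,
        # c) IP header ID is incremented by the spoof factor
        "ip_id": (ip_id + spoof_factor) & 0xFFFF,
    }


def spoof_delay_usec(interval_usec=20000, jitter_factor=80):
    """Delay after a legitimate packet before sending the spoofed one:
    (100% - jitter factor) of the codec's transmission interval."""
    if not 0 <= jitter_factor <= 80:
        raise ValueError("jitter factor must be in the range 0 - 80")
    return interval_usec * (100 - jitter_factor) // 100


if __name__ == "__main__":
    # 160-byte payload, as in a 20 ms G.711 packet
    print(spoofed_header_fields(1000, 160000, 500, payload_len=160))
    print(spoof_delay_usec(20000, 10))  # the -j 10 example from the usage text
```

With a 20,000 usec interval, `spoof_delay_usec(20000, 10)` gives the 18,000 usec delay from the -j example, while the default factor of 80 leaves only a 4,000 usec delay, which is why it is described as outputting the spoofed packet ASAP.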
cyborg@cyborg:~$ sudo rtpmixsound a.wav
Targeting interface eth0
libfindrtp_find_rtp(): using pcap filter "ip".
State: ip_a == | port_a == 0 | ip_b == | port_b == 0
State: ip_a == | port_a == 0 | ip_b == | port_b == 0