Product-In-Focus: Spotify

One of my favorite software products would be Spotify, since it is my primary music streaming service and I love listening to music while on a long-drive, exercising or listening to podcasts in general. From the perspective of a user, I’ve tried to cluster some of the pain points (features and functionality based) I have with this app, and made an urgency tier list (top-down: easier->harder to implement) of changes:

I’ve also attached the affinity map working layout for Spotify here:

On a concluding note, it would be imperative to point out that perhaps the next segment they should start focusing on-be it via business or health podcasts or introducing more retrofit stations- is the over 30 age-group.

The Memer’s Currency

Image

What is Dogecoin?

According to the official website: “Dogecoin is a decentralized, peer-to-peer digital currency that enables you to easily send money online”.

It was created by IBM developer Billy Markus and Adobe data scientist Jackson Palmer in 2013 as an alternative to Bitcoin and a parody of the online cult around cryptocurrencies.

Despite, or possibly because of that, it has increased in price by 1,250% since the new year – even faster than its big brother.

It is named in honour of a Japanese Shiba Inu whose photo went viral at the time: captions were added in the Comic Sans font to convey an amusing imagined insight into the dog’s inner-most thoughts.

The code is based on the Litecoin, and can be mined in the same way as many other digital currencies. It differs because its supply is limitless. 

What is going on with its price?

 Prices of crypto have rocketed in the past three months driven by fears of inflation in real-world currencies, celebrity endorsements and growing mainstream acceptance.

Dogecoin’s anarchic spirit has struck a chord in the US, with Elon Musk, Snoop Dogg and Kiss’s Gene Simmons all raising its profile in recent days.

It has also been picked up by the high-risk betters who read Reddit’s WallsStreetBets channel.

On February 4, it doubled in value after Musk tweeted: “Dogecoin is the people’s crypto”.

Musk was joking. With Simmons, who declared himself the “God of Dogecoin”,  and rapper Snoop Dogg, who pasted a  Doge over his album art, it is less clear.

Either way, the social media frenzy sent the price of Doge shooting up again last night. The goal is to push it  to $1 per coin, which would require a further gain of around 1000%.

How do I buy Dogecoin?

Most crypto exchanges do not offer Doge but Binance and Kraken are the two biggest brokers who do. It is also available in the US on the Robinhood app, whose army of armchair users helped to fuel its latest rise.

Should I buy it?

Unless you are prepared to see your investment vanish in an unpredictable puff of digital smoke (see GameStop), or are one of the high-risk betters who think Reddit’s WallsStreetBets channel constitutes reasoned and sensible investment advice, then

No.

Music Composition with GANs

In this article, I’ll introduce GANSynth, a method for generating high-fidelity audio with Generative Adversarial Networks (GANs).

Why generate audio with GANs?

GANs are a state-of-the-art method for generating high-quality images. However, researchers have struggled to apply them to more sequential data such as audio and music, where autoregressive (AR) models such as WaveNets and Transformers dominate by predicting a single sample at a time. While this aspect of AR models contributes to their success, it also means that sampling is painfully serial and slow, and techniques such as distillation or specialized kernels are required for real-time generation.

Rather than generate audio sequentially, GANSynth generates an entire sequence in parallel, synthesizing audio significantly faster than real-time on a modern GPU and ~50,000 times faster than a standard WaveNet. Unlike the WaveNet autoencoders from the original paper that used a time-distributed latent code, GANSynth generates the entire audio clip from a single latent vector, allowing for easier disentanglement of global features such as pitch and timbre. Using the NSynth dataset of musical instrument notes, we can independently control pitch and timbre.

How does it work?

GANSynth uses a Progressive GAN architecture to incrementally upsample with convolution from a single vector to the full sound. Similar to previous work we found it difficult to directly generate coherent waveforms because upsampling convolution struggles with phase alignment for highly periodic signals. Consider the figure below:

The red-yellow curve is a periodic signal with a black dot at the beginning of the wave each cycle. If we try to model this signal by chopping it into periodic frames (black dotted line), as is done for both upsampling convolutions in GANs and short-time fourier transforms (STFT), the distance between the beginning of the frame (dotted line) and the beginning of the wave (dot) changes over time (black solid line). For a strided convolution, this means the convolution needs to learn all the phase permutations for a given filter, which is very inefficient. This difference (black line) is called the phase and it precesses over time because the wave and frames have different periodicities.1

As you can see in the example above, phase is a circular quantity (yellow bars, mod 2π), but if we unwrap it (orange bars), it decreases by a constant amount each frame (red bars). We call this the instantaneous frequency (IF) because the definition of frequency is the change in phase in time. An STFT compares a frame of signal to many different frequencies, which leads to the speckled phase patterns as in the image below. In contrast, when we extract the instantaneous frequencies, we see bold consistent lines reflecting the coherent periodicity of the underlying sound.

Results

In the GANSynth ICLR Paper, we train GANs on a range of spectral representations and find that for highly periodic sounds, like those found in music, GANs that generate instantaneous frequency (IF) for the phase component outperform other representations and strong baselines, including GANs that generate waveforms and unconditional WaveNets. We also find that progressive training (P) and increasing the frequency resolution of the STFT (H) boosts performance by helping to separate closely-spaced harmonics. The graph below shows the results of user listening tests, where users were played audio examples from two different methods and asked which of the two they preferred:

Beyond the many quantitative measures in the paper, we also can see qualitatively that the GANs that generate instantaneous frequencies (IF-GANs) also produce much more coherent waveforms. The top row of the figure below shows the generated waveform modulo the fundamental periodicity of a note. Notice that the real data completely overlaps itself as the waveform is extremely periodic. The WaveGAN and PhaseGAN, however, have many phase irregularities, creating a blurry web of lines. The IF-GAN is much more coherent, having only small variations from cycle-to-cycle. In the Rainbowgrams (CQTs with color representing instantaneous frequency) below, the real data and IF models have coherent waveforms that result in strong consistent colors for each harmonic, while the PhaseGAN has many speckles due to phase discontinuities, and the WaveGAN model is very irregular.

What’s next?

This work represents an initial foray into using GANs to generate high-fidelity audio, but many interesting questions remain. While the methods above worked well for musical signals, they still produced some noticeable artifacts for speech synthesis. Some recent related work builds on this by exploring new methods for recovering the phase from generated spectrograms with fewer artifacts. Other promising directions include using multi-scale GAN conditioning, handling variable length outputs, and perhaps replacing upsampling convolution generators with flexible differentiable synthesizers.

How to cite

If you extend or use this work, please cite the paper where it was introduced:

@inproceedings{gansynth,
title = {GANSynth: Adversarial Neural Audio Synthesis},
author = {Jesse Engel and Kumar Krishna Agrawal and Shuo Chen and Ishaan Gulrajani and Chris Donahue and Adam Roberts},
year = {2019},
URL = {https://openreview.net/pdf?id=H1xQVn09FX}
}
  1. Fun fact, this “beating” is a fundamental physical phenomena that gives rise to things as diverse as moiré patternsaliasing in signal processingpolyrhythms and harmony, and the energy band structure of crystalline solids.

Number Theory Basics for Competitive Programming

Introduction

This article discusses topics that are frequently used to solve programming problems based on math. It includes the following topics:

  1. Modular arithmetic
  2. Modular exponentiation
  3. Greatest Common Divisor (GCD)
  4. Extended Euclidean algorithm
  5. Modular multiplicative inverse

1. Modular arithmetic

When one number is divided by another, the modulo operation finds the remainder. It is denoted by the % symbol.

Example

Assume that you have two numbers 5 and 2. 5%2 is 1 because when 5 is divided by 2, the remainder is 1.

Properties

  1. (a+b)%c=(a%c+b%c)%c
  2. (a∗b)%c=((a%c)∗(b%c))%c
  3. (a−b)%c=((a%c)−(b%c)+c)%c
  4. (a/b)%c=((a%c)∗(b−1%c))%c

Note: In the last property, b−1 is the multiplicative modulo inverse of b and c.

Examples

If a=5,b=3, and c=2, then:

  • (5+3)%2=8%2=0
    Similarly, (5%2+3%2)%2=(1+1)%2=0
  • (5∗3)%2=15%2=1
    Similarly, ((5%2)∗(3%2))%2=(1∗1)%2=1

If a=12,b=15, and c=4, then the answer in some languages is (12−15)%4=(12%4−15%4)%4=(0−3)%4=−3. However, the answer of the % operator cannot be negative.

Therefore, to make the answer positive, add c to the formula and compute it as follows:
(12−15)%4=(12%4−15%4+4)%4=(0−3+4)%4=1

When are these properties used?

Assume that a = 1018, b = 1018, and c = 109+7. You have to find (a∗b)%c.

When you multiply a with b, the answer is 1036, which does not conform with the standard integer data types. Therefore, to avoid this we used the properties.

(a∗b)%c=((a%c)∗(b%c))%c=(49∗49)%(109+7)=2401

2. Modular exponentiation
Exponentiation is a mathematical operation that is expressed as xn and computed as xn=x⋅x⋅…⋅x (n times).

Basic method

While calculating xn, the most basic solution is broken down into x⋅xn−1. The new problem is xn−1, which is similar to the original problem. Therefore, like in original problem, it is further broken down to x⋅x⋅xn−2.

This is a recursive way of determining the answer to xn. However, sometimes an equation cannot be broken down any further as in the case of n=0. A C++ code for this solution, considering n≥0 is as follows:

int recursivePower(int x,int n)
{
    if(n==0)
        return 1;
    return x*recursivePower(x,n-1);
}

The recursive method aligns with the explanation, however, the solution can also be written in an iterative format, which is quite ad hoc. A variable ‘result’, to which x is multiplied for n number of times, is maintained.

The iterative code is as follows:

int iterativePower(int x,int n)
{
    int result=1;
    while(n>0)
    {
        result=result*x;
        n--;
    }
    return result;
}

Time complexity

With respect to time complexity, it is a fairly efficient O(n) solution. However, when it comes to finding xn, where n can be as large as 1018, this solution will not be suitable.

Optimized method

While calculating xn, the basis of Binary Exponentiation relies on whether n is odd or even.

If n is even, then xn can be broken down to (x2)n/2. Programmatically, finding x2 is a one-step process. However, the problem is to find (x2)n/2.

Notice how the computation steps were reduced from n to n/2 in just one step? You can continue to divide the power by 2 as long as it is even.

When n is odd, try and convert it into an even value. xn can be written as x⋅xn−1. This ensures that n−1 is even.

  • If n is even, replace xn by (x2)n/2.
  • If n is odd, replace xn by x⋅xn−1n−1 becomes even and you can apply the relevant formula.

Example

You are required to compute 310. The steps are as follows:

  1. The power of 3 is 10, which is even. Break it down as follows:
    310⇒(32)5⇒95
  2. Find 95. The power of 9 is 5, which is odd. Convert it into an even power and then apply the following formula:
    95⇒9⋅94⇒9⋅(92)2⇒9⋅(812)
  3. 812 is a one-step computation process

The result is 9⋅81⋅81=59049.

This is an efficient method and the ten-step process of determining 310 is reduced to a three-step process. At every step, n is divided by 2. Therefore, the time complexity is O(log N).

The code for the process is as follows:

int binaryExponentiation(int x,int n)
{
    if(n==0)
        return 1;
    else if(n%2 == 0)        //n is even
        return binaryExponentiation(x*x,n/2);
    else                             //n is odd
        return x*binaryExponentiation(x*x,(n-1)/2);
}

An iterative version of this method is as follows:

int binaryExponentiation(int x,int n)
{
    int result=1;
    while(n>0)
    {
        if(n % 2 ==1)
            result=result * x;
        x=x*x;
        n=n/2;
    }
    return result;
}

However, storing answers that are too large for their respective datatypes is an issue with this method. In some languages the answer will exceed the range of the datatype while in other languages it will timeout due to large number multiplications. In such instances, you must use modulus (%). Instead of finding xn, you must find (xn) % m.

For example, run the implementation of the method to find 2109. The O(n) solution will timeout, while the O(logN) solution will run in time but it will produce garbage values.

To fix this you must use the modulo operation i.e. % M in those lines where a temporary answer is computed.

int modularExponentiation(int x,int n,int M)
{
    if(n==0)
        return 1;
    else if(n%2 == 0)        //n is even
        return modularExponentiation((x*x)%M,n/2,M);
    else                             //n is odd
        return (x*modularExponentiation((x*x)%M,(n-1)/2,M))%M;

}

Similarly, the iterative binary exponentiation method can be modified as follows:

int modularExponentiation(int x,int n,int M)
{
    int result=1;
    while(n>0)
    {
        if(power % 2 ==1)
            result=(result * x)%M;
        x=(x*x)%M;
        n=n/2;
    }
    return result;
}

Recursive solution analysis

  • Time complexity: O(log N)
  • Memory complexity: O(log N) because a function call consumes memory and log N recursive function calls are made

Iterative solution analysis

  • Time complexity: O(log N)
  • Memory complexity: O(1)

3. Greatest Common Divisor (GCD)

The GCD of two or more numbers is the largest positive number that divides all the numbers that are considered. For example, the GCD of 6 and 10 is 2 because it is the largest positive number that can divide both 6 and 10.

Naive approach

Traverse all the numbers from min(A, B) to 1 and check whether the current number divides both A and B. If yes, it is the GCD of A and B.

int GCD(int A, int B) {
    int m = min(A, B), gcd;
    for(int i = m; i > 0; --i)
        if(A % i == 0 && B % i == 0) {
            gcd = i;
            return gcd;
        }
}

Time Complexity

The time complexity of this function is O(min(A, B)).

Euclid’s algorithm

The idea behind this algorithm is GCD(A,B)=GCD(B,A%B). It will recurse until A%B=0.

int GCD(int A, int B) {
    if(B==0)
        return A;
    else
        return GCD(B, A % B);
}

Example

If a = 16 and B = 10, then the GCD is computed as follows:

  • GCD(16, 10) = GCD(10, 16 % 10) = GCD(10, 6)
  • GCD(10, 6) = GCD(6, 10 % 6) = GCD(6, 4)
  • GCD(6, 4) = GCD(4, 6 % 4) = GCD(4, 2)
  • GCD(4, 2) = GCD(2, 4 % 2) = GCD(2, 0)

Since B = 0, GCD(2,0) will return 2.

Time complexity
The time complexity is O(log(max(A, B))).

4. Extended Euclidean algorithm

This algorithm is an extended form of Euclid’s algorithm. GCD(A,B) has a special property so that it can always be represented in the form of an equation i.e. Ax+By=GCD(A,B).

The coefficients (x and y) of this equation will be used to find the modular multiplicative inverse. The coefficients can be zero, positive or negative in value.

This algorithm takes two inputs as A and B and returns GCD(A,B) and coefficients of the above equation as output.

Example

If A=30 and B=20, then 30∗(1)+20∗(−1)=10 where 10 is the GCD of 20 and 30.

Key idea

A.x+B.y=GCD(A,B). —(1)

You know that GCD(A,B)=GCD(B,A%B). Therefore, you can write the equation as follows: B.x1+ (A % B).y1=GCD(A,B). —(2)

You can write A%B=A−B∗⌊A/B⌋ where ⌊⌋ means floor value .B and substitute it in equation 2. Your equation will be as follows: B.x1+ (A – ⌊A/B⌋.B).y1=GCD(A,B)

When you solve it further, your equation is as follows: B.(x1 – ⌊A/B⌋.y1)+A.y1=GCD(A,B). —(3)

Comparing coefficients in equations 1 and 3, you get the following:

  • x=y1
  • y=x1 – ⌊A/B⌋.y1

These equations are key in understanding the extended Euclidean algorithm.

In this algorithm, recursive calls are made to GCD(B,A%B). The values that are returned from recursive calls are x1 and y1, which are used to get x and y.

Implementation

#include < iostream >

int d, x, y;
void extendedEuclid(int A, int B) {
    if(B == 0) {
        d = A;
        x = 1;
        y = 0;
    }
    else {
        extendedEuclid(B, A%B);
        int temp = x;
        x = y;
        y = temp - (A/B)*y;
    }
}

int main( ) {
extendedEuclid(16, 10);
cout << The GCD of 16 and 10 is  << d << endl;
cout << Coefficients x and y are ”<< x <<  and   << y << endl;
return 0;   
}

Output

The GCD of 16 and 10 is 2.
Coefficients x and y are 2 and -3.

Initially, the extended Euclidean algorithm will run as Euclid’s algorithm until you determine GCD(A,B) or until B = 0. It will then assign x = 1 and y = 0.

In the current scenario, since B = 0 and GCD(A,B) is A, the equation Ax+By=GCD(A,B) will be changed to A∗1+0∗0=A.

The values of d, x, and y in the process of the extendedEuclid( ) function are as follows:

  • d=2,x=1,y=0
  • d=2,x=0,y=1−(4/2)∗0=1
  • d=2,x=1,y=0−(6/4)∗1=−1
  • d=2,x=−1,y=1−(10/6)∗−1=2
  • d=2,x=2,y=−1−(16/10)∗2=−3

Time complexity

The time complexity of the extended Euclidean algorithm is O(log(max(A,B))).

When is this algorithm used?

This algorithm is used when A and B are co-prime. In such cases, x becomes the multiplicative modulo inverse of A under modulo B, and y becomes the multiplicative modulo inverse of B under modulo A. This has been explained in detail in the Modular multiplicative inverse section.

5. Modular multiplicative inverse

What is a multiplicative inverse? If A.B=1, you are required to find B such that it satisfies the equation. The solution is simple. The value of B is 1/A or A−1. Here, B is the multiplicative inverse of A.

What is modular multiplicative inverse? If you have two numbers A and M, you are required to find B such it that satisfies the following equation:

(A.B)%M=1

Here B is the modular multiplicative inverse of A under modulo M.

Formally, if you have two integers A and M, B is said to be modular multiplicative inverse of A under modulo M if it satisfies the following equation:

A.B≡1(modM). where B is in the range [1,M-1]

This equation is a formal representation of the equation discussed earlier.

Why is B in the range [1,M-1]?

(A∗B)%M=(A%M∗B%M)%M

Since we have B%M, the inverse must be in the range [0,M-1]. However, since 0 is invalid, the inverse must be in the range [1,M-1].

Existence of modular multiplicative inverse

An inverse exists only when A and M are coprime i.e. GCD(A,M)=1.

For example, if A=5 and M=12, then (A∗5)%M=(5∗5)%12=1. Here, 5 is the modular multiplicative inverse of 5 under modulo 12.
Though (5∗17)%12=1, but since 17 > 12, it isn’t considered.

Therefore, the answer is 5.

Approach 1 (naive approach)

int modInverse(int A,int M)
{
    A=A%M;
    for(int B=1;B<M;B++)
        if((A*B)%M)==1)
            return B;
}

Time Complexity

The algorithm mentioned above runs in O(M).

Approach 2

A and M are coprime i.e. Ax+My=1. In the extended Euclidean algorithm, x is the modular multiplicative inverse of A under modulo M. Therefore, the answer is x. You can use the extended Euclidean algorithm to find the multiplicative inverse.

For example, if A=5 and M=12, then GCD(A,B)=1. Therefore, the inverse exists.

5∗(5)+12∗(−2)=1 (which comes from the extended Euclidean algorithm). Therefore, 5 is the inverse of A=5 under modulo 12.

int d,x,y;
int modInverse(int A, int M)
{
    extendedEuclid(A,M);
    return (x%M+M)%M;    //x may be negative
}

Time complexity

O(log(max(A,M)))

Approach 3 (used only when M is prime)

This approach uses Fermat’s Little Theorem.

The theorem specifies the following: AM−1≡1(modM)

By multiplying with A−1 both sides,the equation can be rewritten as follows:

A−1≡AM−2(modM)

The formula for A−1 is in the form of exponents. Therefore, modular exponentiation can be used to determine the result.

For example, if A=5 and M=11 then AM−2(modM)=59(mod11)=9 is the inverse of 5 under modulo 11.

int modInverse(int A,int M)
{
    return modularExponentiation(A,M-2,M);
}

Time complexity O(log M)

When is modular inverse used?

Modular inverse is used to solve (A/B)%M as follows: (A/B)%M=(A∗B−1)%M After you find the inverse, you can solve this equation easily.